Calculus of Thought: Neuromorphic Logistic Regression in Cognitive Machines

About this ebook

Calculus of Thought: Neuromorphic Logistic Regression in Cognitive Machines is a must-read for scientists interested in a very simple computational method designed to simulate big-data neural processing. The book is inspired by Gottfried Leibniz's idea of a Calculus Ratiocinator: that machine computation should be developed to simulate human cognitive processes, thus avoiding problematic subjective bias in analytic solutions to practical and scientific problems.

The reduced error logistic regression (RELR) method is proposed as such a "Calculus of Thought." This book reviews how RELR's completely automated processing may parallel important aspects of explicit and implicit learning in neural processes. It emphasizes that RELR is really just a simple adjustment to already widely used logistic regression, and it presents RELR's new applications that go well beyond standard logistic regression in prediction and explanation. Readers will learn how RELR solves some of the most basic problems in today's big and small data related to high dimensionality, multicollinearity, and cognitive bias in capricious outcomes commonly involving human behavior.

  • Provides a high-level introduction and detailed reviews of the neural, statistical and machine learning knowledge base as a foundation for a new era of smarter machines
  • Argues that smarter machine learning to handle both explanation and prediction without cognitive bias must have a foundation in cognitive neuroscience and must embody similar explicit and implicit learning principles that occur in the brain
Language: English
Release date: Oct 15, 2013
ISBN: 9780124104525
Author

Daniel M. Rice

Daniel M. Rice is Principal and Senior Scientist of Rice Analytics. He founded the business in early 1996 as a sole proprietorship, but it was incorporated into its current structure in 2006. Prior to 1996, he was an assistant professor at the University of California-Irvine and the University of Southern California. Dan has almost 25 years of research project and advanced statistical modeling experience for major organizations that include the National Institute on Aging, Eli Lilly, Anheuser-Busch, Sears Portrait Studios, Hewlett-Packard, UBS, and Bank of America. He has a Ph.D. in Cognitive Neuroscience from the University of New Hampshire and postdoctoral training in Applied Statistics from the University of California-Irvine. Dan is a previous recipient of an Individual National Research Service Award from the National Institutes of Health and is the author of more than 20 publications, many of which are in conference proceedings and peer-reviewed journals in cognitive neuroscience and statistics.

    Book preview

    Calculus of Thought

    Neuromorphic Logistic Regression in Cognitive Machines

    Daniel M. Rice

    Table of Contents

    Cover image

    Title page

    Copyright

    Preface

    A Personal Perspective

    Chapter 1. Calculus Ratiocinator

    Abstract

    1 A Fundamental Problem with the Widely Used Methods

    2 Ensemble Models and Cognitive Processing in Playing Jeopardy

    3 The Brain's Explicit and Implicit Learning

    4 Two Distinct Modeling Cultures and Machine Intelligence

    5 Logistic Regression and the Calculus Ratiocinator Problem

    Chapter 2. Most Likely Inference

    Abstract

    1 The Jaynes Maximum Entropy Principle

    2 Maximum Entropy and Standard Maximum Likelihood Logistic Regression

    3 Discrete Choice, Logit Error, and Correlated Observations

    4 RELR and the Logit Error

    5 RELR and the Jaynes Principle

    Chapter 3. Probability Learning and Memory

    Abstract

    1 Bayesian Online Learning and Memory

    2 Most Probable Features

    3 Implicit RELR

    4 Explicit RELR

    Chapter 4. Causal Reasoning

    Abstract

    1 Propensity Score Matching

    2 RELR's Outcome Score Matching

    3 An Example of RELR's Causal Reasoning

    4 Comparison to Other Bayesian and Causal Methods

    Chapter 5. Neural Calculus

    Abstract

    1 RELR as a Neural Computational Model

    2 RELR and Neural Dynamics

    3 Small Samples in Neural Learning

    4 What about Artificial Neural Networks?

    Chapter 6. Oscillating Neural Synchrony

    Abstract

    1 The EEG and Neural Synchrony

    2 Neural Synchrony, Parsimony, and Grandmother Cells

    3 Gestalt Pragnanz and Oscillating Neural Synchrony

    4 RELR and Spike-Timing-Dependent Plasticity

    5 Attention and Neural Synchrony

    6 Metrical Rhythm in Oscillating Neural Synchrony

    7 Higher Frequency Gamma Oscillations

    Chapter 7. Alzheimer's and Mind–Brain Problems

    Abstract

    1 Neuroplasticity Selection in Development and Aging

    2 Brain and Cognitive Changes in Very Early Alzheimer's Disease

    3 A RELR Model of Recent Episodic and Semantic Memory

    4 What Causes the Medial Temporal Lobe Disturbance in Early Alzheimer's?

    5 The Mind–Brain Problem

    Chapter 8. Let Us Calculate

    Abstract

    1 Human Decision Bias and the Calculus Ratiocinator

    2 When the Experts are Wrong

    3 When Predictive Models Crash

    4 The Promise of Cognitive Machines

    Appendix

    A1 RELR Maximum Entropy Formulation

    A2 Derivation of RELR Logit from Errors-in-Variables Considerations

    A3 Methodology for Pew 2004 Election Weekend Model Study

    A4 Derivation of Posterior Probabilities in RELR's Sequential Online Learning

    A5 Chain Rule Derivation of Explicit RELR Feature Importance

    A6 Further Details on the Explicit RELR Low Birth Weight Model in Chapter 3

    A7 Zero Intercepts in Perfectly Balanced Stratified Samples

    A8 Detailed Steps in RELR's Causal Machine Learning Method

    Notes and References

    Index

    Copyright

    Academic Press is an imprint of Elsevier

    225 Wyman Street, Waltham, MA 02451, USA

    The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK

    Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands

    © 2014 Elsevier Inc. All rights reserved.

    No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher

    Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: permissions@elsevier.com. Alternatively you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material

    Notice

    No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made

    Library of Congress Cataloging-in-Publication Data

    A catalog record for this book is available from the Library of Congress

    British Library Cataloguing in Publication Data

    A catalogue record for this book is available from the British Library

    ISBN: 978-0-12-410407-5

    For information on all Academic Press publications visit our web site at store.elsevier.com

    Printed and bound in USA

    14 15 16 17  10 9 8 7 6 5 4 3 2 1

    Preface

    A Personal Perspective

    I first heard the phrase Calculus of Thought as a second-year graduate student in 1981 when I took a seminar called Brain and Behavior taught by our professor James Davis at the University of New Hampshire. He gave us a thorough grounding in the cognitive neuroscience of that day, and he spent significant time on various models of neural computation. The goal of cognitive research in neuroscience, he constantly reminded us, was to discover the neural calculus, which he took to be a complete and unifying understanding of how the brain performs computation in diverse functions such as coordinated motor action, perception, learning and memory, decision making, and causal reasoning. He suggested that if we had such a calculus, then there would be truly amazing practical artificial intelligence applications beyond our wildest dreams. This was before the widespread popularity of artificial neural network methods in the mid-1980s; none of the quantitative models that we learned about were artificial neural networks, but instead were grounded in empirical data from neural systems like the primary visual cortex and the basal ganglia. He admitted that the models that we were taught fell far short of such a calculus, but he did fuel an idea within me that has stayed for more than 30 years.

    I went on to do my doctoral dissertation with my thesis advisor Earl Hagstrom on timing relationships in auditory attention processing as reflected by the scalp-recorded electroencephalographic (EEG) alpha rhythm. During the early and mid-1980s, there was not a lot of interest in the EEG as a window into normal human cognition, as the prevailing sentiment was that the brain's electric field potentials were too gross a measure to contain valuable information. We now have abundant evidence that brain field potential measures, such as the EEG alpha rhythm, are sensitive to oscillating and synchronous neural signals that do reflect the rhythm and time course of cognitive processing. My dissertation study was one of a handful of initial studies to document a parallelism between cognition and oscillating neural synchrony as measured by the EEG.¹ Through that dissertation study, and the advice that I also got from other faculty committee members, John Limber and Rebecca Warner, I learned the importance of well-controlled experimental findings in providing more reliable explanations. Yet, this dissertation did not address how basic cognitive computations might be performed by oscillating neural synchrony mechanisms, and my thoughts have wandered back to how this Calculus of Thought might work ever since.

    I did postdoctoral fellowship research with Monte Buchsbaum at the University of California-Irvine Brain Imaging Center. Much of our work focused on abnormal temporal lobe EEG slowing in probable Alzheimer's patients and related temporal lobe slowing in nondemented older adults. We published one paper in 1990 that replicated other studies, including one from Monte's lab, showing abnormal temporal lobe EEG slowing in Alzheimer's patients compared to nondemented elderly adults, but we critically refined the methodology to get a higher fidelity measurement.² We published a second paper in 1991 that used this refined methodology to make the first claim that Alzheimer's disease must have an average preclinical period of at least 10 years.³ Our basic finding was that a milder form of the temporal lobe EEG slowing observed in Alzheimer's patients was seen in nondemented older adults with minor memory impairment. This observed memory impairment had exactly the same profile as what neuropsychologist Brenda Milner had observed in the famous medial temporal lobe patient H.M., the most famous clinical case study in neuroscience.⁴ The profile was normal immediate recall ability, but dramatically greater forgetting a short time later. Other than this memory deficit, we could find nothing else wrong with these nondemented older adults in terms of cognitive and intelligence tests. Given the nature of the memory deficit and the location of the EEG abnormality, we suggested that the focus of this abnormality must be in the medial temporal lobe of the brain. Given the prevalence of such EEG slowing in nondemented older adults and the prevalence of Alzheimer's disease, we calculated that this must be a preclinical sign of Alzheimer's disease that is present 10 years earlier than the clinical diagnosis. We had no idea how abnormal neural synchrony might be related to dysfunction in memory computations, but ever since, my thoughts have wandered back to how this Calculus of Thought could go awry early in Alzheimer's disease.

    Our claim that there is a long preclinical period in Alzheimer's disease with a major focal abnormality originating in the medial temporal lobe memory system has now become the prevailing view in Alzheimer's research.⁵,⁶,⁷ The evidence for the long preclinical period is such that the National Institute on Aging issued new diagnostic recommendations in 2011 to incorporate preclinical Alzheimer's disease into the clinical Alzheimer's disease diagnosis.⁸ Yet, when I moved to a new assistant professor position at the University of Southern California (USC) in the early 1990s and tried to get National Institutes of Health (NIH) funding to validate this hypothesis with a longitudinal, prospective study, my proposals were repeatedly not funded. It was not controversial that Alzheimer's starts with medial temporal lobe memory system problems, as almost everyone believed that was true. However, I was a complete newcomer, and the idea that Alzheimer's had such a long preclinical period was just too unexpected to be accepted as reasonable by the quorum needed to get NIH funding. It was then that I began to have recollections of Thomas Kuhn's theory about bias in science, taught by my undergraduate history of science professor Charles Bonwell many years earlier. Kuhn warned that the established scientific community is often overinvested in the basic theory in their field, which he called the paradigm.⁹ Kuhn argued that this bias precludes most scientists from considering new results and hypotheses that are inconsistent with the paradigm. In the early 1990s, the entire Alzheimer's clinical diagnostic process and pharmaceutical research were only concerned with patients who were already demented. So any result or explanation suggesting that there is in fact a long 10-year preclinical period was not yet welcome. As it turns out, the only hope to prevent and treat Alzheimer's disease now seems to be in this very early period when the disease is still mild,¹⁰ as drug studies in patients who are already demented have not really worked. At that time, I was not yet familiar with Leibniz's concept of Calculus Ratiocinator, or a thought calculus machine designed to generate explanations that avoid such bias. I did, though, start to think that any unbiased machine that could generate the most reasonable hypotheses based upon all available data would be useful.

    Fortunately, I had research interests wider than Alzheimer's disease, including an interest in quantitative modeling of the brain's electric activity as a means to understand the brain's computations. One day while still at UC-Irvine, I attended a seminar given by a graduate student on maximum entropy and information theory in a group organized by the mathematical cognitive psychologists Duncan Luce and Bill Batchelder. I then began to study maximum entropy on my own and became interested in the possibility that this could be the basic computational process within neurons and the brain. When I ended up teaching at USC a few years later, I was fortunate enough to collaborate with engineering professor Manbir Singh and his graduate student Deepak Khosla on modeling the EEG with the maximum entropy method.¹¹ In our maximum entropy modeling, Deepak taught me a very interesting new way to smooth error out of EEG models using what we now call L2 norm regularization. Yet, I also began to think that there might be a better approach, based upon probability theory, to model and ultimately reduce error in regression models of the brain's neural computation. This thinking eventually led to the reduced error logistic regression (RELR) method that is the proposed Calculus of Thought and the subject of this book.

    In April of 1992, I had a bird's eye view of the Los Angeles (LA) riots through my third floor laboratory windows at USC that faced the south central section of the city. I watched shops and houses burn, and I was shocked by the magnitude of the violence. But I also began to wonder whether human social behavior might be determined probabilistically, in ways similar to how causal mechanisms determine cognitive neural processes like attention and memory, so that it might be possible to predict and explain such behavior. After the riots, I listened to the heated debates about the causal forces involved in the 1992 LA riots, and again I began to wonder how objective such causal explanations of human behavior could ever be, given extremely strong biases. This was also true of most explanations of human behavior that I saw offered in social science, whether they were conservative or liberal. So it became clear to me that bias was the most significant problem in social science predictions and explanations of human behavior. And I began to believe that an unbiased machine learning methodology would be a huge benefit to the understanding of human behavior outcomes. However, I did not yet make the connection that a data-driven quantitative methodology that models neural computations could be the basis of this unbiased Calculus Ratiocinator machine.

    Eventually, it became clear that my proposed research on a longitudinal, prospective study to test the putative long preclinical period in Alzheimer's disease would not be funded. So, because NIH funding was required to get tenure at USC, I realized that I had better find another career. I resigned from USC in 1995, years prior to when my tenure decision would have been made. Even though I had a contract that would be renewed automatically until at least 1999, I saw no reason to be dead weight for several years and then face a negative tenure decision. Instead, I wanted to get started on a more entrepreneurial career where I would have more control over my fate. So, I pursued an applied analytics career. I have often enjoyed this career more than my academic career because the rewards and feedback are more immediate. Also, there is much more emphasis on problem solving as the goal rather than writing a paper or getting a grant, although sales ability is obviously always valued. Yet, I eventually realized that the same bias problems that limit understanding of human behavior in academic neural, behavioral, and social science also limit practical business analytics. Time and time again, I noticed decisions based upon biased predictive or explanatory models dictated by incorrect or questionable assumptions, so again I saw a need for a data-driven Calculus Ratiocinator to generate reasonable and testable hypotheses and conclusions that avoid human bias.

    Much of my applied analytic work has involved logistic regression, and I eventually learned that maximum entropy and maximum likelihood methods yield identical results in modeling the kind of binary signals that both neurons and business decisions produce. Thus, at some point I also realized that I could continue my research into the RELR Calculus of Thought. But instead of focusing on the brain's computations, I could focus on practical real-world business applications using the same computational method that I believed neurons use—the RELR method. In so doing, I eventually realized that RELR could be useful as the Calculus Ratiocinator that Leibniz had suggested is needed to remove biased answers to important, real-world questions.
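
    For readers who want to see the equivalence alluded to above, here is a minimal sketch in generic notation (not the book's own RELR formulation). For binary outcomes \(y_i \in \{0,1\}\) with features \(x_{ij}\), the maximum entropy model whose expected features match the observed features has the logistic form, and the Lagrange multipliers \(\beta_j\) that enforce those constraints are exactly the coefficients that maximize the standard logistic regression likelihood:

    \[
    p_i = \Pr(y_i = 1 \mid x_i) = \frac{\exp\!\left(\sum_j \beta_j x_{ij}\right)}{1 + \exp\!\left(\sum_j \beta_j x_{ij}\right)},
    \qquad
    \sum_i p_i \, x_{ij} = \sum_i y_i \, x_{ij} \quad \text{for every feature } j.
    \]

    The constraint equations on the right are precisely the score equations obtained by maximizing the log-likelihood \(\sum_i \left[\, y_i \ln p_i + (1 - y_i) \ln (1 - p_i) \,\right]\), which is why the two approaches coincide for binary data.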

    Many of today's data scientists have a background in statistics, pure mathematics, computer science, physics, engineering, and operations research. Yet, these are academic areas that are not focused on studying human behavior, but instead focus on quantitative and technical issues related to more mechanical data processes. Many other analytic scientists, along with many analytic executives, have a background in behaviorally oriented fields like economics, marketing, business, psychology, and sociology. These academic areas do focus on studying human behavior, but are not heavily quantitatively oriented. There is a real need for a theoretical bridge between the more quantitative and more behavioral knowledge areas, and that is the intention of this book. So, I believe that this book could appeal both to quantitative and behaviorally oriented analytic professionals and executives and yet fill important knowledge gaps in each case.

    Unless an analytic professional today has a specific background in cognitive or computational neuroscience, it is unlikely that they will have a very good understanding of neuroscience and how the brain may compute cognition and behavior. Yet, data-driven modeling that seeks to solve the Turing problem and mimic the performance of the human brain without its propensity for bias and error is probably the implicit goal of all machine-learning applications today. In fact, this book argues that cognitive machines will need to be neuromorphic, or based upon neuroscience, in order to simulate aspects of human cognition. So, this book covers the most fundamental and important concepts in modern cognitive neuroscience including neural dynamics, implicit and explicit learning, neural synchrony, Hebbian spike-timing dependent plasticity, and neural Darwinism. Although this book presents a unified view of these fundamental neuroscience concepts in relationship to RELR computation mechanisms, it also should serve as a much needed introductory overview of these fundamental cognitive neuroscience principles for analytic professionals who are interested in neuromorphic cognitive machines.

    Theories in science can simplify a subject and make that subject more understandable and applicable in practice, even when theories ultimately are modified with new data. So this book may help analytic professionals understand fundamental neural computational theory and cognitive neuroscientists understand practical issues surrounding real world machine learning applications. It is true that the theory of the brain's computation in this book should be viewed as a question rather than an answer. But, if cognitive neuroscientists and analytic professionals all start asking questions about whether the brain and machine learning each can work according to the information theory principles that are laid out in this book, then this book will have served its purpose.

    Daniel M. Rice

    St Louis, MO (USA)

    Autumn, 2013

    Chapter 1

    Calculus Ratiocinator

    Abstract

    There is more need than ever to implement Leibniz's Calculus Ratiocinator suggestion concerning a machine that simulates human cognition but without the inherent subjective biases of humans. This need is seen in how predictive models based upon observational data often vary widely across different blinded modelers or across the same standard automated variable selection methods. This unreliability is amplified in today's Big Data with very high-dimension confounding candidate variables. The single biggest reason for this unreliability is uncontrolled error that is especially prevalent with highly multicollinear input variables. Modelers therefore need to make arbitrary or biased subjective choices to overcome these problems, because widely used automated variable selection methods like standard stepwise methods are simply not built to handle such error. The stacked ensemble method that averages many different elementary models is reviewed as one way to avoid such bias and error and generate a reliable prediction, but it has disadvantages, including lack of automation and lack of transparent, parsimonious, and understandable solutions. A form of logistic regression called Reduced Error Logistic Regression (RELR), which also models error events as a component of the maximum likelihood estimation, is introduced as a method that avoids this multicollinearity error. An important neuromorphic property of RELR is that it shows stable explicit and implicit learning in small training samples and with high-dimension inputs, as observed in neurons. Other important neuromorphic properties of RELR consistent with a Calculus Ratiocinator machine are also introduced, including the ability to produce unbiased, automatic, stable maximum probability solutions and stable causal reasoning based upon matched-sample quasi-experiments. Given RELR's connection to information theory, these stability properties are the basis of the new stable information theory that is reviewed in this book, with wide-ranging causal and predictive analytics applications.

    Keywords

    Analytic science; Big data; Calculus Ratiocinator; Causal analytics; Causality; Cognitive neuroscience; Cognitive science; Data mining; Ensemble learning; Explanation; Explicit learning and memory; High dimension data; Implicit learning and memory; Information theory; Logistic regression; Machine learning; Matching experiment; Maximum entropy; Maximum likelihood; Multicollinearity; Neuromorphic; Neuroscience; Observational data; Outcome score matching; Prediction; Predictive analytics; Propensity score; Quasi-experiment; Randomized controlled experiment; Reduced error logistic regression (RELR); Stable information theory

    It is obvious that if we could find characters or signs suited for expressing all our thoughts as clearly and as exactly as arithmetic expresses numbers or geometry expresses lines, we could do in all matters insofar as they are subject to reasoning all that we can do in arithmetic and geometry. For all investigations which depend on reasoning would be carried out by transposing these characters and by a species of calculus.

    Gottfried Leibniz, Preface to the General Science, 1677.¹

    Contents

    1. A Fundamental Problem with the Widely Used Methods

    2. Ensemble Models and Cognitive Processing in Playing Jeopardy

    3. The Brain's Explicit and Implicit Learning

    4. Two Distinct Modeling Cultures and Machine Intelligence

    5. Logistic Regression and the Calculus Ratiocinator Problem

    At the end of his life, starting in 1703, Gottfried Leibniz engaged in a 12-year feud with Isaac Newton over who first invented the calculus and who committed plagiarism. All serious scholarship now indicates that Newton and Leibniz developed calculus independently.² Yet, stories about Leibniz's invention of calculus usually focus on this priority dispute with Newton and give much less attention to how Leibniz's vision of calculus differed substantially from Newton's. Whereas Newton was trained in mathematical physics and continued to be associated with academia during the most creative time in his career, Leibniz's early academic failings in math led him to become a lawyer by training and an entrepreneur by profession.³ So Leibniz's deep mathematical insights that led to calculus occurred away from any professional university association. Unlike Newton, whose mathematical interests seemed tied entirely to physics, Leibniz clearly had a much broader goal for calculus, with intended applications in areas well beyond physics that seem to have nothing to do with mathematics. His dream application was a Calculus Ratiocinator, which is synonymous with Calculus of Thought.⁴ This can be interpreted as a very precise mathematical model of cognition that could be automated in a machine to answer any important philosophical, scientific, or practical question that traditionally would be answered with human subjective conjecture.⁵ Leibniz proposed that if we had such a cognitive calculus, we could just say Let us calculate⁶ and always find the most reasonable answers uncontaminated by human bias.

    In a sense, this concept of Calculus Ratiocinator foreshadows today's predictive analytic technology.⁷ Predictive analytics are widely used today to generate better-than-chance longer-term projections for more stable physical and biological outcomes like climate change, schizophrenia, Parkinson's disease, Alzheimer's disease, diabetes, cancer, and optimal crop yields, and even good short-term projections for less stable social outcomes like marriage satisfaction, divorce, successful parenting, crime, successful businesses, satisfied customers, great employees, successful ad campaigns, stock price changes, and loan decisions, among many others. Until the widespread practice of predictive analytics that came with the introduction of computers in the past century, most of these outcomes were thought to be too capricious to have anything to do with mathematics. Instead, they were traditionally addressed with speculative and biased hypotheses or intuitions often rooted in culture or philosophy (Fig. 1.1).

    Figure 1.1 Gottfried Wilhelm Leibniz.

    Until just very recently, standard computer technology could only evaluate a small number of predictive features and observations. But we are now in an era of big data and high-performance massively parallel computing. So our predictive models should now become much more powerful. After all, it would seem reasonable that the traditional methods that worked to select important predictive features from small data will scale to high-dimension data and suddenly select predictive models that are much more accurate and insightful. This would give us a new and much more powerful big data machine intelligence technology that is everything that Leibniz imagined in a Calculus Ratiocinator. Big data massively parallel technology should thus theoretically allow completely new data-driven cognitive machines to predict and explain capricious outcomes in science, medicine, business, and government.

    Unfortunately, it is not this simple. This is because observation samples are still fairly small in most of today's predictive analytic applications. One reason is that most real-world data are not representative samples of the population to which one wishes to generalize. For example, the people who visit Facebook or search on Google might not be a good representative sample of many populations, so smaller representative samples will need to be taken if the analytics are to generalize very well. Another problem is that many real-world data are not independent observations and instead are often repeated observations from the same individuals. For this reason, data also need to be downsampled significantly to yield independent observations. Still another problem is that even when there are many millions of independent representative observations, there are usually far fewer individuals who do things like respond to a particular type of cancer drug, commit fraud, or respond to an advertising promotion in the recent past. The informative sample for a predictive model is the group of targeted individuals and a group of similar size that did not show such a response, but these are not usually big data samples in terms of large numbers of observations. So the biggest limitation of big data in the sense of a large number of observations is that most real-world data are not big and instead have limited numbers of observations. This is especially true because most predictive models are not built from Facebook or Google data.
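
    As a minimal sketch of the kind of sampling just described (the hypothetical data and the pandas library are illustrative assumptions, not the book's own example), one can reduce repeated observations to one row per individual and then build a balanced informative sample of responders and non-responders:

        import pandas as pd

        # Hypothetical repeated observations per customer with a rare binary response.
        df = pd.DataFrame({
            "customer_id": [1, 1, 2, 3, 3, 3, 4, 5, 6, 7],
            "spend":       [10, 12, 5, 7, 7, 8, 20, 3, 15, 9],
            "responded":   [0, 0, 0, 1, 1, 1, 0, 0, 1, 0],
        })

        # Keep one observation per individual so the rows are closer to independent.
        independent = df.groupby("customer_id", as_index=False).last()

        # Informative sample: all responders plus an equal-sized random sample of non-responders.
        responders = independent[independent["responded"] == 1]
        nonresponders = independent[independent["responded"] == 0].sample(
            n=len(responders), random_state=0)
        balanced = pd.concat([responders, nonresponders])
        print(balanced)

    Even when the raw data contain millions of rows, the balanced sample that actually informs the model is only about twice the size of the responder group.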

    Still, most real-world data are big in another sense: they are very high dimensional, given that interactions between variables and nonlinear effects are also predictive features. Previously, we did not have the technology to evaluate high dimensions of potentially predictive variables rapidly enough to be useful. The slow processing that was the reason for this curse of dimensionality is now behind us. So many might believe that this suddenly allows the evaluation of almost unfathomably high dimensions of data for the selection of important features in much more accurate and smarter big data predictive models simply by applying traditional widely used methods.

    Unfortunately, the traditional widely used methods often do not give unbiased or non-arbitrary predictions and explanations, and this problem will become ever more apparent with today's high-dimension data.

    1 A Fundamental Problem with the Widely Used Methods

    There is one glaring problem with today's widely used predictive analytic methods that stands in the way of our new data-driven science. This problem is inconsistent with Leibniz's idea of an automated machine that can reproduce the very computations of human cognition, but without the subjective biases of humans. The problem is suggested by the fact that there are probably at least hundreds of predictive analytic methods in use today. Each method makes differing assumptions that would not be agreed upon by all, and all have at least one and sometimes many arbitrary parameters. This arbitrary diversity is defended by those who believe a no free lunch theorem that argues that there is no one best method across all situations.¹⁰,¹¹ Yet, when predictive modelers test various arbitrary algorithms based upon these methods to get a best model for a specific situation, they obviously will test only a tiny subset of the possibilities. So unless there is an obvious, very simple best model, different modelers will almost always produce substantially different arbitrary models with the same data.

    As examples of this problem of arbitrary methods, there are different types of decision tree methods like CHAID and CART, which use different statistical tests to determine branching. Even with the very same method, different user-provided parameters for splitting the branches of the tree will often give quite different decision trees that generate very different predictions and explanations. Likewise, there are many widely used regression variable selection methods like stepwise and LASSO logistic regression that differ in the arbitrary assumptions and parameters employed to select important explanatory variables. Even with the very same regression method, different user choices of these parameters will almost always generate widely differing explanations and often substantially differing predictions. There are other methods like Principal Component Analysis (PCA), Variable Clustering, and Factor Analysis that attempt to avoid the variable selection problem by greatly reducing the dimensionality of the variables. These methods work well when the data match their underlying assumptions, but most behavioral data will not be easily modeled with those assumptions, such as orthogonal components in the case of PCA, or knowing how to rotate the components to be nonorthogonal in the other methods, given that there are an infinite number of possible rotations. Likewise, there are many other methods like Bayesian Networks, Partial Least Squares, and Structural Equation Modeling that modelers often use to make explanatory inferences. These methods each make differing arbitrary assumptions that often generate wide diversity in explanations and predictions. Finally, there are a large number of fairly black box methods like Support Vector Machines, Artificial Neural Networks, Random Forests, Stochastic Gradient Boosting, and various Genetic Algorithms that are not completely transparent in their explanations of how the predictions are formed, although some measure of variable importance often can be obtained. These methods can generate quite different predictions and important variables simply because of differing assumptions across the methods or differing user-defined modeling parameters within the methods.
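
    As a small, generic illustration of this parameter sensitivity (the synthetic data and the scikit-learn library below are this sketch's assumptions, not the book's own examples), the following fits the same CART-style decision tree twice, changing only one user-chosen stopping parameter, and prints the top of each tree:

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier, export_text

        rng = np.random.default_rng(0)
        n = 400
        # Two highly correlated predictors plus one pure noise predictor.
        z = rng.normal(size=n)
        X = np.column_stack([z + 0.3 * rng.normal(size=n),
                             z + 0.3 * rng.normal(size=n),
                             rng.normal(size=n)])
        y = (z + 0.5 * rng.normal(size=n) > 0).astype(int)

        # The very same tree method, with two different user-chosen stopping parameters.
        for min_leaf in (5, 100):
            tree = DecisionTreeClassifier(min_samples_leaf=min_leaf, random_state=0)
            tree.fit(X, y)
            print(f"min_samples_leaf={min_leaf}")
            print(export_text(tree, feature_names=["x1", "x2", "noise"], max_depth=2))

    The two printed trees typically differ in depth and structure, and therefore in the explanation they suggest, even though the data are identical.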

    Because there are so many methods, and because all require unsubstantiated modeling assumptions along with arbitrary user-defined parameters, if you gave exactly the same data to 100 different predictive modelers, you would likely get 100 completely different models unless there was an obvious simple solution. These differing models often would make very different predictions and almost always generate different explanations, to the extent that the methods produce transparent models that can be interpreted. In cases where regression methods are used and raw interaction or nonlinear effects are parsimoniously selected without accompanying main effects, the model's predictions are even likely to depend on how variables are scaled, so that measuring currency in Dollars versus Euros would give different predictions.¹² Because of such variability, which can even defy basic principles of logic, it is unreasonable to interpret any of these arbitrary models as reflecting a causal and/or most probable explanation or prediction.
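
    The scale dependence just described can be made visible with a deliberately exaggerated unit change (thousands of dollars versus single dollars, rather than Dollars versus Euros). The sketch below is a generic illustration rather than anything from the book: synthetic data, scikit-learn, and an arbitrarily chosen L1 penalty standing in for a parsimonious selection procedure. The same information is modeled in two unit systems; because the penalty is not invariant to units, the surviving terms and the fitted probabilities typically differ:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(7)
        n = 1000
        spend = rng.uniform(1.0, 50.0, size=n)      # customer spend, in thousands of dollars
        visits = rng.uniform(1.0, 20.0, size=n)     # store visits
        logit = 0.01 * (spend * visits - 255.0)     # outcome driven by a raw interaction
        y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

        def fit_with(spend_units, label):
            # Parsimonious candidate set: raw interaction and visits, with no main effect for spend.
            X = np.column_stack([spend_units * visits, visits])
            # The penalty strength is an arbitrary user choice, held fixed across unit systems.
            model = LogisticRegression(penalty="l1", solver="liblinear", C=5e-6, max_iter=10000)
            model.fit(X, y)
            kept = np.flatnonzero(np.abs(model.coef_[0]) > 1e-12)
            print(label, "kept:", kept, "sample probabilities:",
                  model.predict_proba(X)[:3, 1].round(3))

        fit_with(spend, "spend in thousands")
        fit_with(spend * 1000.0, "spend in dollars  ")

    A selection procedure that is not invariant to the units of measurement is one concrete way in which an arbitrary analytic choice, rather than the data, ends up driving the reported prediction.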

    Because the widely used methods yield arbitrary and even illogical models in many cases, hardly can we say Let us calculate to answer important questions such as the most likely contribution of environmental versus genetic versus other biological factors in causing Parkinson's disease, Alzheimer's disease, prostate cancer, breast cancer, and so on. Hardly can we say Let us calculate when we wish to provide a most likely explanation for why there is climate change, or why certain genetic and environmental markers correlate to diseases, or why our business is suddenly losing customers, or how we may decrease costs and yet improve quality in health care. Hardly can we say Let us calculate when we wish to know the extent to which sexual orientation and other average gender differences are determined by biological factors or by social factors, when we wish to know whether stricter gun control policies would have a positive or negative impact on crime and murder rates, or when we wish to know whether austerity as an economic intervention tool is helpful or hurtful. Because our widely used predictive analytic methods are so influenced by completely subjective human choices, predictive model explanations and predictions about human diseases, climate change, and business and social outcomes will have substantial variability simply due to our cognitive biases and/or our arbitrary modeling methods. The most important questions of our day relate to various economic, social, medical, and environmental outcomes related to human behavior by cause or effect, but our widely used predictive analytic methods cannot answer these questions reliably.

    Even when the very same method is used to select variables, the important variables that the model selects as the basis of explanation are likely to vary across independent observation samples. This sampling variability will be especially prevalent if the observations available to train the model are limited or if there are many possible features that are candidates for explanatory variables, and if there is also more than a modest correlation between at least some of the candidate explanatory variables. This problem of correlation between variables or multicollinearity is ultimately the real culprit. This multicollinearity problem is almost always seen with human behavior outcomes. Unlike many physical phenomena, behavioral outcomes usually cannot be understood in terms of easy to separate uncorrelated causal components. Models based upon randomized controlled experimental selection methods can avoid this multicollinearity problem through designs that yield variables that are orthogonal.¹³ Yet, most of today's predictive analytic applications necessarily must deal with observation data, as randomized experiments are simply not possible usually with human behavior in real-world
