Statistics and Artificial Intelligence

As a new faculty member, when I arrived on the University of Notre Dame campus in 1975, I noticed that no one was using the IBM 360 computer in The Computer Center during the football games. Since this one computer was the only one that most faculty could use, I was eager to spend the time during the football games creating my punch cards and running them through the computer. I told my wife how eager I was to use the computer during the football games, but she said adamantly, “No, we are going to the football games.” Oh, well. At least I was able to run my programs before and after, but not during, the football games.

In 1975 that one computer in The Computer Center had less computing power than the Apple Watch on your wrist. But back then it was all we had, so we had to make the best of it. Randomized trials were developed to deal with situations, especially in medicine, where sample sizes were relatively small. With larger samples, regression analysis could serve as a substitute for, or a complement to, randomized trials. Artificial intelligence requires much, much larger samples and many simulations to be effective.

The scientific method recognizes that any given sample contains two types of relationships. The relationships we want to determine are the general population relationships that appear consistently in sample after sample. But every sample also contains relationships that are unique to that particular sample; they cannot be expected to appear in other samples and are certainly not the population relationships we are after.

To avoid overfitting to any particular sample, the scientific method requires that we fully specify the procedure and functional form of any statistical method we plan to use. By prespecifying the functional form, we try to avoid overfitting to the sample, that is, picking up relationships that are unique to that particular sample and do not represent population relationships. Moreover, if you adjust the functional form to get a better fit, you are using the sample data to change your model, and that undermines our ability to track the statistical distributions. We cannot track the distributional effects of adjustments made in your head. Adjusting the functional form means you no longer get the valid t-statistics or F-statistics you need to determine the statistical significance of your results. Consequently, if you adjust the functional form of your model in any way to get a better fit to that particular sample, you lose track of the statistical distribution and are likely to overfit to that sample rather than discover the true population relationships you want. Many people have lost their shirts in the stock market by overfitting to sample data, getting great but invalid numbers labeled t-statistics, F-statistics, and R-squared values, and never getting at the true population relationships they were after. (Note: So-called bootstrap methods use simulations to at least obtain a good estimate of the variance of the distribution after the functional form has been manipulated.)

The methods used for artificial intelligence intentionally violate the scientific method: they deliberately overfit to the data. They get away with this only by using simulations and a huge volume of data. Rather than acquiring just one sample, they try to acquire as many samples as possible in order to discover the population relationships that show up in sample after sample.
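The text does not name a specific A.I. method, so the sketch below swaps in k-nearest-neighbor regression, a deliberately flexible model that can memorize its training sample, together with the common machine-learning practice of judging a model on a fresh sample from the same population. The population (y = x² plus noise), the sample sizes, and the values of k are assumptions chosen only for illustration; `make_data`, `knn_predict`, and `mse` are hypothetical names.

```python
import heapq
import random
import statistics


def make_data(rng, n):
    """Hypothetical population: y = x**2 plus noise, x uniform on [-2, 2]."""
    data = []
    for _ in range(n):
        x = rng.uniform(-2, 2)
        data.append((x, x * x + rng.gauss(0, 0.5)))
    return data


def knn_predict(train, x, k):
    """Average y of the k training points whose x is closest to x."""
    nearest = heapq.nsmallest(k, train, key=lambda p: abs(p[0] - x))
    return statistics.fmean(y for _, y in nearest)


def mse(train, test, k):
    """Mean squared prediction error on the test points."""
    return statistics.fmean((y - knn_predict(train, x, k)) ** 2 for x, y in test)


rng = random.Random(0)
train = make_data(rng, 500)
fresh = make_data(rng, 1000)  # a new sample from the same population

for k in (1, 15):
    print(k, round(mse(train, train, k), 3), round(mse(train, fresh, k), 3))
```

With k = 1 the fit to the training sample is perfect (zero error), yet the error on the fresh sample is noticeably larger than with k = 15: the perfect in-sample fit came from relationships unique to that sample, and only the check against new data exposes it.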

I don’t know the details of the Transformer model for A.I., but I know what strategy I would pursue. Find the word that most frequently starts a sentence about the subject of interest, such as Alzheimer’s disease. Find the word that most frequently follows that word, and note the correlation. Then find the word that most frequently follows those two words. Here is where it gets interesting: you need to use both the two-way and the three-way correlations. As a fourth word is introduced in the same manner, you will need all the two-way, three-way, and four-way correlations. Basically, you use these correlations to produce the first sentence of your summary essay. The procedure can then be repeated with the word that most frequently starts the second most frequent sentence beginning with that word. My paper on “Composite Dummy Variables” lays out the basic ideas for understanding interaction effects, such as those used in ChatGPT, where all available interaction effects serve as the basis for generating sentences that summarize the large literature on a topic such as Alzheimer’s disease. Follow this link to my paper on ResearchGate:
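The word-counting strategy described above is close to what is usually called an n-gram language model. The sketch below is a minimal, hypothetical version of it: it counts which word follows each preceding one-, two-, and three-word sequence, then greedily picks the most frequent next word, falling back to a shorter sequence when a longer one was never seen. It uses raw counts rather than correlations, the four-sentence corpus is invented, and it is not a description of how Transformer models work internally.

```python
from collections import Counter, defaultdict

END = "<end>"  # marks the end of a sentence


def train(sentences, max_context=3):
    """Count sentence-starting words and, for every 1- to max_context-word
    sequence, how often each word follows it."""
    starts = Counter()
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split() + [END]
        starts[words[0]] += 1
        for i in range(1, len(words)):
            for n in range(1, max_context + 1):
                if i - n < 0:
                    break
                follows[tuple(words[i - n:i])][words[i]] += 1
    return starts, follows


def generate(starts, follows, max_context=3, max_len=20):
    """Greedily build one sentence from the most frequent continuations,
    preferring the longest preceding word sequence seen in training."""
    words = [starts.most_common(1)[0][0]]
    while len(words) < max_len:
        next_word = None
        for n in range(min(max_context, len(words)), 0, -1):
            counts = follows.get(tuple(words[-n:]))
            if counts:
                next_word = counts.most_common(1)[0][0]
                break
        if next_word is None or next_word == END:
            break
        words.append(next_word)
    return " ".join(words)


# Hypothetical toy corpus; a real one would be a large body of literature.
corpus = [
    "Alzheimer's disease affects memory",
    "Alzheimer's disease affects memory and thinking",
    "Alzheimer's disease is progressive",
    "Memory loss is an early sign",
]
starts, follows = train(corpus)
print(generate(starts, follows))
```

`Counter.most_common` breaks ties by the order in which words were first seen, so the output is deterministic; with this corpus it reproduces the most common opening, "alzheimer's disease affects memory".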

Why Doctors and Nurses Should Not Ignore Anecdotal Evidence

Randomized trials may not be definitive even when they show clear and indisputable evidence of a difference, both statistically significant and practically important, between the average in the treatment group and the average in the control group. Statistically significant randomized trial results are not always useful to a doctor or nurse when there is substantial practical variation within the treatment group. That is why a statistically significant randomized trial is not definitive proof of causation. Think of a situation where the randomized trial ignores gender. What if a treatment works really well for men but does a little harm to women, with no benefit? The “average person” may do fairly well with the treatment and even show statistically significant improvement. But the “average person” does not exist. There are only men and women in the trial. The “average person” represents no one. The solution is to use control variables (such as gender) in a randomized trial to zero in on subsets within the treatment group and sort out who will really benefit from the treatment and who will not. This means using regression analysis (or analysis of covariance) on data from randomized trials. Hopefully, the new generation of doctors and nurses understands this and is not led astray by “highly significant” results from some randomized trials. This opens the door to anecdotal evidence as a useful way of alerting a doctor or nurse that the results of a particular randomized trial may not be definitive for some types of patients.
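A simulation along the lines of the gender example shows how the average can hide the subgroups. Everything here is assumed for illustration: 800 hypothetical patients, random assignment to treatment, a true benefit of 3.0 units for men and a harm of 0.5 units for women, normally distributed noise, and Welch's t-statistic for the difference in means. `simulate_trial` and `estimated_effect` are made-up helper names. Comparing treated and control patients within each gender gives the same estimates as a regression with treatment, gender, and a treatment × gender interaction term, because that model fits each of the four group means exactly.

```python
import random
import statistics


def diff_in_means(treated, control):
    """Treatment-minus-control difference in means and its Welch t-statistic."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    return diff, diff / se


def simulate_trial(n=800, seed=7):
    """Hypothetical trial: (gender, treated, outcome) for each patient."""
    rng = random.Random(seed)
    true_effect = {"M": 3.0, "F": -0.5}  # assumed: helps men, mildly harms women
    patients = []
    for i in range(n):
        gender = "M" if i % 2 == 0 else "F"
        treated = rng.random() < 0.5  # random assignment
        outcome = rng.gauss(0, 1) + (true_effect[gender] if treated else 0.0)
        patients.append((gender, treated, outcome))
    return patients


def estimated_effect(patients, gender=None):
    """Difference in means, overall or within one gender."""
    rows = [p for p in patients if gender is None or p[0] == gender]
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return diff_in_means(treated, control)


patients = simulate_trial()
overall, t_overall = estimated_effect(patients)
men, _ = estimated_effect(patients, "M")
women, _ = estimated_effect(patients, "F")
print(f"overall effect {overall:.2f} (t = {t_overall:.1f})")
print(f"men {men:.2f}, women {women:.2f}")
```

The overall effect comes out positive and "highly significant," yet the within-gender comparisons show the treatment helping men and harming women.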