The previous post, “Random Numbers Predict Future Temperatures”, used random numbers to predict climate. Random numbers can themselves be predicted, and this is a major difference between models and natural phenomena: random numbers generated by a computer can always be predicted exactly given knowledge of the code, and so have a deterministic generating mechanism, or model.

The above series of numbers appears to be a temperature trend such as 20th century warming: random fluctuations with a slight upward drift. On fitting a linear regression one might find the coefficients are significant, and then use the model to predict that the upward temperature trend will continue.
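As a sketch of that trap, using hypothetical data (not the series plotted above; the seed, trend slope, and noise level below are illustrative assumptions): simulate random fluctuations around a slight upward trend, fit a linear regression, and the slope will typically test as significant, inviting extrapolation.

```r
set.seed(42)                       # assumed seed, for reproducibility only
n <- 100
t <- 1:n
y <- 0.01 * t + rnorm(n, sd = 0.5) # slight trend plus random fluctuations
fit <- lm(y ~ t)                   # ordinary least squares fit
summary(fit)$coefficients          # slope usually looks "significant"
predict(fit, newdata = data.frame(t = 101:110))  # extrapolates the trend onward
```

The significance test only says the slope is nonzero in the fitted sample; it says nothing about whether a linear trend is the true generating mechanism.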

It is actually possible to predict every subsequent number in the series exactly, using the following deterministic R code for generating pseudo-random numbers.

dkrand <- function(n) {
  k <- 16807               # multiplier: assumed (Park-Miller); not defined in the original snippet
  m <- 2147483647          # modulus 2^31 - 1: assumed; not defined in the original snippet
  q <- numeric(n + 1)
  x <- numeric(n)
  q[1] <- 1426594706       # seed, from the original code
  for (i in 1:n) {
    q[i + 1] <- (k * q[i]) %% m  # linear congruential step
    x[i] <- q[i + 1] / m         # scale into [0, 1)
  }
  x
}

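Running the generator twice makes the point concrete: given the code and the seed, every “random” value is exactly reproducible, and hence perfectly predictable. (This usage sketch assumes dkrand() as defined above.)

```r
a <- dkrand(10)
b <- dkrand(10)
identical(a, b)   # TRUE: same code, same seed, identical "random" numbers
a[1:3]            # and any future value, e.g. dkrand(11)[11], is known in advance
```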
This example shows why you should be very concerned if you find a paper in which the authors use a model-building technique such as regression and then use it for prediction, treating the resulting model and coefficients as though the model were true.

The confusion is well described in Gerard Dallal’s essay “The Most Important Lesson You’ll Ever Learn About Multiple Linear Regression Analysis”, and well stated by Chris Chatfield (1995):

It is “well known” to be “logically unsound and practically misleading” to make inference as if a model is known to be true when it has, in fact, been selected from the same data to be used for estimation purposes.


Chris Chatfield, “Model Uncertainty, Data Mining and Statistical Inference”, Journal of the Royal Statistical Society, Series A, 158 (1995), 419–486 (p. 421).
