One of the main assumptions of linear regression is, ahem, linearity. Here is an example drawn from dendroclimatology, the reconstruction of past climates using tree rings, of the trouble one can get into by blindly assuming linearity. This subject was dealt with some time ago at ClimateAudit, in the post "Upside-Down Quadratic Proxy Response".
From the summary of Chapter 9 of my book (niche-modeling-chap-9):
These results demonstrate that procedures with linear assumptions are unreliable when applied to the non-linear responses of niche models. Reliability of reconstruction of past climates depends, at minimum, on the correct specification of a model of response that holds over the whole range of the proxy, not just the calibration period. Use of a linear model of non-linear response can cause apparent growth decline with higher temperatures, signal degradation with latitudinal variation, temporal shifts in peaks, period doubling, and depressed long time-scale amplitude.
Niche Modeling: Predictions from Statistical Distributions. Chapman & Hall/CRC, Boca Raton, FL., 2007.
I notice Craig Loehle converged on similar results in a post about a publication on the Divergence Problem. In the abstract, Craig reports a similar quantitative depression in the range of the recovered signal:
If trees show a nonlinear growth response, the result is to potentially truncate any historical temperatures higher than those in the calibration period, as well as to reduce the mean and range of reconstructed values compared to actual.
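The truncation described above can be reproduced with a toy model. The following is a minimal sketch, not Craig's method or the code from my chapter: it assumes a hypothetical quadratic growth response with an optimum `T_OPT` and half-width `WIDTH`, calibrates a straight line on temperatures below the optimum, and then applies that line to a wider "historical" range. All numbers are illustrative assumptions.

```python
# Hypothetical quadratic ("niche") growth response; every number is illustrative.
T_OPT = 15.0   # assumed optimal growing temperature
WIDTH = 5.0    # assumed half-width of the niche


def growth(temp):
    """Ring-width index: 1 at T_OPT, falling to 0 at T_OPT +/- WIDTH."""
    return max(0.0, 1.0 - ((temp - T_OPT) / WIDTH) ** 2)


def fit_line(xs, ys):
    """Ordinary least squares for ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Calibration period: temperatures stay below the optimum, so growth
# rises with temperature and a straight line looks like a fair model.
calib_temps = [11.0 + 0.25 * i for i in range(13)]   # 11.0 .. 14.0
calib_growth = [growth(t) for t in calib_temps]
a, b = fit_line(calib_growth, calib_temps)           # temperature = a + b * growth

# "Historical" period: the true temperatures range past the optimum.
true_temps = [10.0 + 0.5 * i for i in range(19)]     # 10.0 .. 19.0
recon_temps = [a + b * growth(t) for t in true_temps]

print("true  max, mean:", max(true_temps), sum(true_temps) / len(true_temps))
print("recon max, mean:", max(recon_temps), sum(recon_temps) / len(recon_temps))
```

Because growth can never exceed its value at the optimum, the reconstructed temperature can never exceed the value the line assigns to maximum growth, however warm the true climate was. In this sketch both the mean and the range of the reconstruction come out lower than the true series.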
By far the most interesting result I find is the introduction of ‘doubling’ from assuming a linear response of a non-linear variable. This is illustrated by Craig’s figure here:
Because over the course of one climate cycle, the tree passes through two optimal growth periods, the tree is, in electrical terms, a frequency doubler. This would create enormous difficulties in trying to detect major features such as Medieval Warm Periods and Little Ice Ages from such a responder.
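The frequency-doubling effect is easy to check numerically. The sketch below assumes a sinusoidal temperature cycle centered on the optimum of a hypothetical quadratic growth response; the constants (`T_OPT`, `WIDTH`, `AMPLITUDE`, `PERIOD`, `PHASE`) are illustrative choices, not values from Craig's paper or my chapter. It simply counts local maxima in each series.

```python
import math

# Illustrative assumptions, not fitted values.
T_OPT = 15.0      # assumed optimal temperature
WIDTH = 5.0       # assumed half-width of the growth niche
AMPLITUDE = 3.0   # amplitude of the climate cycle (kept below WIDTH)
PERIOD = 100      # time steps per climate cycle
N_CYCLES = 4      # number of climate cycles simulated
PHASE = 0.3       # arbitrary phase so peaks do not fall on the series ends


def temperature(step):
    """Sinusoidal climate cycle centered on the growth optimum."""
    return T_OPT + AMPLITUDE * math.sin(2 * math.pi * step / PERIOD + PHASE)


def growth(temp):
    """Quadratic response: maximal at T_OPT, lower on either side."""
    return 1.0 - ((temp - T_OPT) / WIDTH) ** 2


def count_peaks(series):
    """Count strict local maxima among the interior points of a series."""
    return sum(
        1
        for i in range(1, len(series) - 1)
        if series[i - 1] < series[i] > series[i + 1]
    )


temps = [temperature(s) for s in range(PERIOD * N_CYCLES)]
rings = [growth(t) for t in temps]

temp_peaks = count_peaks(temps)
ring_peaks = count_peaks(rings)
print("temperature peaks:", temp_peaks)
print("growth peaks:     ", ring_peaks)
```

The temperature crosses the optimum twice per cycle, once rising and once falling, and each crossing is a growth maximum. With these settings the growth series has twice as many peaks as the temperature series that drives it.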
But the problems do not end there. According to the latitudinal (or altitudinal) location of the tree, relative to its optimal growth zone, the location of the doubled peaks is shifted temporally. This shifting of the peaks is illustrated in the figure below, taken from my chapter.
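The temporal shift can also be worked out directly. As a sketch under stated assumptions: suppose the site's mean temperature sits `offset` degrees away from the optimum and the climate follows a sine cycle of amplitude `amplitude`. Growth peaks whenever the temperature crosses the optimum, that is, where sin(θ) = -offset / amplitude. The helper `growth_peak_phases` below is hypothetical and returns those crossing points as fractions of one climate cycle.

```python
import math


def growth_peak_phases(offset, amplitude):
    """Fractions of a climate cycle at which growth peaks.

    Assumes temperature = optimum + offset + amplitude * sin(theta) and a
    growth response that is maximal exactly at the optimum.
    """
    ratio = -offset / amplitude
    if ratio >= 1.0:
        # Site is always colder than the optimum: one peak, at the warmest point.
        return [0.25]
    if ratio <= -1.0:
        # Site is always warmer than the optimum: one peak, at the coldest point.
        return [0.75]
    base = math.asin(ratio)
    two_pi = 2 * math.pi
    first = base % two_pi               # crossing on the rising limb
    second = (math.pi - base) % two_pi  # crossing on the falling limb
    return sorted(p / two_pi for p in (first, second))


# Sites at different positions relative to the optimum (illustrative values).
for site_offset in (-3.0, -1.5, 0.0, 1.5, 3.0):
    phases = growth_peak_phases(site_offset, amplitude=3.0)
    print(site_offset, [round(p, 3) for p in phases])
```

A site sitting exactly at the optimum gives two evenly spaced peaks per cycle. As the site moves toward the warm or cold edge of the niche, the two peaks slide toward each other, and once the offset reaches the amplitude of the cycle they merge into one, so trees from different latitudes or altitudes record the same climate cycle with peaks at different times.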
If one then imposes two non-linear responses, such as temperature and rainfall, the response becomes even more choppy, as shown in another graphic from the chapter.
The recovery of a climate signal in the face of nonlinearity of response is fraught with difficulties. When the fundamental growth response of trees, and of all living things actually, is known to be a non-linear, niche-like response, there is more onus on modelers to prove their methods are adequate.
While researchers are not unaware of the problem, most often in climate (and ecological) science, risky statistical prediction methods are used with inadequate validation, or, like the drought modeling efforts by CSIRO here, results from GCMs are used with no attempt to demonstrate they are 'fit-for-purpose' at all. Rob Wilson argues at ClimateAudit that while linear modeling of tree-growth relationships is not ideal, the field is ripe for some fancy non-linear modeling. Given the range of exotic features introduced by non-linearities, as I showed above, I would argue that fancy non-linear modeling would probably lead more surely to self-deception, and that a better path is robust validation.