The coefficient of determination, or r2, is the ratio of explained variation to total variation of two variables X and Y. The correlation coefficient r ranges from −1 to 1, so its square r2 ranges from 0 to 1. A value of 1 indicates an exact linear relationship, with all data points lying on the same line; a value of 0 shows no linear relationship between the variables. If r2 is high, most people assume X and Y are related. That inference is reasonable if the errors are well behaved, say normal or independent and identically distributed. But r2 may also be high when X is not related to Y at all, because natural series can exhibit “spurious significance”.
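As a concrete illustration of the definition above, here is a minimal sketch (the data and variable names are my own, not from any article discussed here) computing r2 as explained variation over total variation for a simple least-squares fit:

```python
import numpy as np

# Hypothetical data: y depends roughly linearly on x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

# Ordinary least-squares fit y = a + b*x.
b, a = np.polyfit(x, y, 1)
y_hat = a + b * x

# r2 = explained variation / total variation.
ss_total = np.sum((y - y.mean()) ** 2)
ss_explained = np.sum((y_hat - y.mean()) ** 2)
r2 = ss_explained / ss_total

# Equivalently, r2 is the square of the correlation coefficient r,
# which is why r2 lies between 0 and 1 even though r can be negative.
r = np.corrcoef(x, y)[0, 1]
```

Either route gives the same number; the ratio-of-variation form makes the “explained versus total” interpretation explicit.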
It is hard to do better than this set of articles on high correlation statistics from Steve McIntyre at ClimateAudit to explain “spurious significance”.
“Spurious significance” is a term in statistics describing a situation where a statistic returns a value that is “statistically significant” even though no real relationship exists. It’s a topic that sounds easy but quickly gets difficult.
This is the second of a planned series of notes on spurious significance, intended to give a sense of the statistical background. Granger and Newbold, posted up here, is an extremely famous article that starts off the modern discussion of the problem of spurious regression. Granger is a recent Nobel laureate in economics.
Granger and Newbold provided examples of spurious significance in a random walk context. This has since been extended by various authors to a number of other persistent processes.
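The Granger-Newbold effect is easy to reproduce by simulation. The sketch below (with parameters of my own choosing, not theirs) regresses one random walk on another, completely independent random walk and records how often the ordinary t-statistic on the slope exceeds the conventional |t| > 2 cutoff. For genuinely unrelated i.i.d. series this would happen about 5% of the time; for random walks it happens most of the time:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sims = 50, 500
rejections = 0

for _ in range(n_sims):
    # Two independent random walks (cumulative sums of white noise).
    x = np.cumsum(rng.standard_normal(n))
    y = np.cumsum(rng.standard_normal(n))

    # OLS slope and intercept of y on x.
    xc = x - x.mean()
    beta = (xc @ (y - y.mean())) / (xc @ xc)
    alpha = y.mean() - beta * x.mean()
    resid = y - alpha - beta * x

    # Conventional (and here invalid) t-statistic for the slope.
    s2 = (resid @ resid) / (n - 2)
    se_beta = np.sqrt(s2 / (xc @ xc))
    if abs(beta / se_beta) > 2.0:
        rejections += 1

reject_rate = rejections / n_sims
# Far above the nominal 5% level, echoing Granger and Newbold's point.
```

The rejection rate is far above 5% because the usual t-statistic assumes independent errors, an assumption the random walks violate badly.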
One of the reasons for discussing Granger-Newbold and Phillips is to show their approach to “spurious” regression statistics (here t-statistics and F-statistics) and why other statistics need to be considered to ensure that the model is not mis-specified.
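One such diagnostic, emphasized by Granger and Newbold themselves, is the Durbin-Watson statistic on the regression residuals: values near 2 indicate little residual autocorrelation, while values near 0 are a red flag (their rule of thumb was to be suspicious whenever R2 exceeds DW). A minimal sketch, using two simulated independent random walks as an assumed example:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Two independent random walks, so any fitted relationship is spurious.
x = np.cumsum(rng.standard_normal(n))
y = np.cumsum(rng.standard_normal(n))

# OLS fit of y on x.
xc = x - x.mean()
beta = (xc @ (y - y.mean())) / (xc @ xc)
alpha = y.mean() - beta * x.mean()
resid = y - alpha - beta * x

# Durbin-Watson statistic: approximately 2*(1 - rho1), where rho1 is
# the lag-1 autocorrelation of the residuals; near 0 for persistent
# residuals, near 2 for uncorrelated ones.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
```

Here the residuals inherit the persistence of the random walks, so DW comes out far below 2 even when the t-statistic looks impressive.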
What is the standard deviation (or variance) of an autocorrelated series? It sounds like an easy question, but it isn’t. This issue turns out to affect the spurious regression problem.
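To see why the question is hard, consider a stationary AR(1) process x_t = phi*x_{t-1} + e_t. Positive autocorrelation leaves the marginal variance of a single observation unchanged, but it inflates the variance of the sample mean by roughly (1+phi)/(1-phi) relative to the i.i.d. case, so the “effective” sample size is much smaller than n. A simulation sketch under assumed parameters (phi = 0.8, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
phi, n, n_reps = 0.8, 200, 2000

means = np.empty(n_reps)
for i in range(n_reps):
    # Stationary AR(1), initialized from its stationary distribution.
    x = np.empty(n)
    x[0] = rng.standard_normal() / np.sqrt(1 - phi**2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    means[i] = x.mean()

# Marginal variance of x_t is 1/(1 - phi^2); under independence the
# sample mean would have variance (1/(1 - phi^2)) / n.
var_iid = (1.0 / (1.0 - phi**2)) / n
inflation = means.var() / var_iid
# For phi = 0.8 the asymptotic inflation factor (1+phi)/(1-phi) is 9,
# so naive standard errors are badly understated.
```

This is why plugging the naive sample standard deviation into significance tests for persistent series overstates confidence, which is the heart of the spurious regression problem.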