Scale Invariance of the Aggregate

For computational, statistical, or display reasons, daily data are often aggregated to a coarser time scale. This is done by splitting the sequence into subsets along a coarser index grid and calculating a summary statistic, such as the mean value, for each segment.

Missing values cause problems when calculating the mean. By default in R, the presence of a single NA returns NA for most arithmetic operations. There is an option (na.rm=TRUE) to calculate the mean after omitting the NAs. In the first case, the calculated means are valid, but data are lost as every segment containing an NA becomes NA. In the second, no data are lost, but the means can deviate wildly when the data come from strongly cyclical series such as temperature.
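The two behaviours can be mimicked outside R with numpy's `mean` and `nanmean`. A toy sinusoidal "temperature" series (my own made-up example, not the Rutherglen data) shows why omitting NAs biases the mean when the gap is not random:

```python
import numpy as np

# Two years of daily pseudo-temperatures with a strong annual cycle.
days = np.arange(730)
temp = 15 + 10 * np.sin(2 * np.pi * days / 365)

# Knock out a block of winter days: non-random missingness.
temp_missing = temp.copy()
temp_missing[250:300] = np.nan

year1 = temp_missing[:365]
print(np.mean(year1))       # nan: the R default (na.rm=FALSE) loses the whole year
print(np.nanmean(year1))    # ~16.5: na.rm=TRUE keeps the year but biases it warm
print(np.mean(temp[:365]))  # 15.0: the true annual mean
```

Dropping the coldest 50 days shifts the "annual mean" up by about a degree and a half, which is exactly the biasing effect described above.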


Figure 1 shows the reduction in monthly aggregate data when na.rm=F.


Figure 2 shows the reduction when na.rm=F, with almost total loss on annual aggregation. While data are not lost with the option na.rm=T, the outliers at the start and end of the Rutherglen minimum data series illustrate its unexpected biasing effect.

The figures illustrate that a heterogeneous sequence is not ‘invariant’ with respect to aggregation by the mean. The only way to ensure invariance, which confers a degree of reliability under aggregation, is for the missing data to be randomly distributed within each section of the coarse index.

Most studies define rules about the number of allowable missing values, but these are either not clearly stated or do not guarantee invariance, such as allowing a fixed number of missing values (e.g. CAWCR).

Because heterogeneous data are not invariant under aggregation, it is best to analyze data at their original resolution.

Q: Where Do Climate Models Fail? A: Almost Everywhere

“How much do I fail thee? Let me count the ways.”

Ben Santer’s latest model/observation comparison paper demonstrates that climate realists were right and climate models exaggerate warming:

The multimodel average tropospheric temperature trends are outside the 5–95 percentile range of RSS results at most latitudes.

Where do the models fail?

1. Significantly warmer than reality (95% CI) in the lower troposphere at all latitudes, except for the Arctic.

2. Significantly warmer than reality (95% CI) in the mid-troposphere at all latitudes, except possibly the polar regions.

3. Significantly warmer than reality (95% CI) in the lower stratosphere at all latitudes, except possibly the polar regions.

Answer: Everywhere except for polar regions where uncertainty is greater.
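The percentile comparison behind these statements is simple to reproduce: take an ensemble of model trends, compute the 5–95 percentile range, and check whether the observed trend falls outside it. The numbers below are invented for illustration and are not Santer's values:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical ensemble of 40 model trends (deg C/decade) and one observed trend.
model_trends = rng.normal(loc=0.25, scale=0.05, size=40)
observed = 0.05

# 5-95 percentile range of the model ensemble.
lo, hi = np.percentile(model_trends, [5, 95])
outside = not (lo <= observed <= hi)
print(f"5-95% model range: [{lo:.2f}, {hi:.2f}]; observed {observed} outside: {outside}")
```

When the observation sits below the 5th percentile of the ensemble, the models are "significantly warmer than reality" in the sense used above.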

East Pacific Region Temperatures: Climate Models Fail Again

Bob Tisdale, author of the awesome book “Who Turned on the Heat?” presented an interesting problem that turns out to be a good application of robust statistical tests called empirical fluctuation processes.

Bob notes that sea surface temperature (SST) in a large region of the globe in the Eastern Pacific does not appear to have warmed at all in the last 30 years, in contrast to model simulations (CMIP SST) for that region that show strong warming. The region in question is shown below.

The question is, what is the statistical significance of the difference between model simulations and the observations? The graph comparing the models with observations from Bob’s book shows two CMIP model projections strongly increasing at 0.15C per decade for the region (brown and green) and the observations increasing at 0.006C per decade (magenta).

However, there is a lot of variability in the observations, so the natural question is whether the difference is statistically significant. A simple-minded approach would be to compare the temperature change between 1980 and 2012 relative to the standard deviation, but this would be a very low power test, and only used by someone who wanted to obfuscate the obvious failure of climate models in this region.

Empirical fluctuation processes are a natural way to examine such questions in a powerful and generalized way, as we can ask of a strongly autocorrelated series — Has there been a change in level? — without requiring the increase to be a linear trend.

To illustrate the difference: if we assume a linear regression model, as is the usual practice, Y = mt + c, the statistical test for a trend is whether the trend coefficient m is greater than zero.

H0: m = 0; Ha: m > 0

If we test for a change in level, the EFP statistical test is whether m is constant for all of time t:

H0: m_i = m_0 for all times t_i.

For answering questions similar to tests of trends in linear regression, the EFP path determines if and when a simple constant model Y = c deviates from the data. In R this is represented as the model Y ~ 1. If we were to use the full model Y ~ t, this would test whether the trend of Y is constant, not whether the level of Y is constant. This will be clearer if you have fitted linear models in R.
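In R this level-stability test is efp(y ~ 1, type = "OLS-CUSUM") from the strucchange package. A minimal Python sketch of the same idea, assuming i.i.d. errors (an assumption that understates uncertainty for autocorrelated series like SST):

```python
import numpy as np

def ols_cusum(y):
    # OLS-based CUSUM empirical fluctuation process for the model y ~ 1:
    # scaled cumulative sums of residuals about the full-sample mean.
    # Under a stable level the path behaves like a Brownian bridge, and
    # |path| crossing ~1.36 rejects level constancy at about the 5% level.
    y = np.asarray(y, dtype=float)
    resid = y - y.mean()
    return np.cumsum(resid) / (resid.std(ddof=1) * np.sqrt(len(y)))

stable = np.tile([0.0, 1.0], 50)                       # oscillates, constant level
shifted = np.concatenate([np.zeros(50), np.ones(50)])  # step change mid-series

print(np.abs(ols_cusum(stable)).max())   # ~0.1: well inside the boundary
print(np.abs(ols_cusum(shifted)).max())  # ~5: far outside, level change detected
```

The step-change series drives the path far past the boundary while the equally variable but level-stable series does not, which is why this test has more power against level shifts than a naive endpoint comparison.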

Moving on to the analysis, below are the three data series given to me by Bob, and available with the R code here.

The figure below shows, for each series in question, the EFP path as a black line, with the 95% significance boundaries for the path in red.

It can be seen clearly that while the EFP path for the SST observation series shows a little unusual behavior, with a significant change in level in 1998 and again in 2005, the level is currently not significantly above the level in 1985.

The EFP path for the CMIP3 model (CMIP5 is similar), however, exceeds the 95% significance level in 1990 and continues to increase, clearly indicating a structural increase in level in the model that has continued to intensify.

Furthermore, we can ask whether there is a change in level between the CMIP models and the SST observations. The figure below shows the EFP path for the differences CMIP3-SST and CMIP5-SST. After some deviation from zero at about 1990, around 2000 the difference becomes very significant at the 5% level, and continues to increase. Thus the EFP test shows a very significant and widening disagreement between the CMIP temperature simulations and the observational SST series in the Eastern Pacific region after the year 2000.

While the average of multiple model simulations shows a significant change in level over the period, in the parlance of climate science there is not yet a detectable change in level in the observations.

One could say I am comparing apples and oranges, as the models are average behavior while the SST observations are a single realization. But the fact remains that only the model simulations show warming; there is no support for warming of the region in the observations. This is consistent with the previous post on Santer’s paper showing failure of the models to match the observations over most latitudinal bands.

Not cointegrated, so global warming is not anthropogenic – Beenstock

Cointegration has been mentioned previously and is one of the highest ranking search terms on landshape.

We have also discussed the cointegration manuscript from 2009 by Beenstock and Reingewertz, and I see he has picked up another author and submitted it to an open access journal here.

Here is the abstract.

Polynomial cointegration tests of anthropogenic impact on global warming M. Beenstock, Y. Reingewertz, and N. Paldor

Abstract. We use statistical methods for nonstationary time series to test the anthropogenic interpretation of global warming (AGW), according to which an increase in atmospheric greenhouse gas concentrations raised global temperature in the 20th century. Specifically, the methodology of polynomial cointegration is used to test AGW since during the observation period (1880–2007) global temperature and solar irradiance are stationary in 1st differences whereas greenhouse gases and aerosol forcings are stationary in 2nd differences. We show that although these anthropogenic forcings share a common stochastic trend, this trend is empirically independent of the stochastic trend in temperature and solar irradiance. Therefore, greenhouse gas forcing, aerosols, solar irradiance and global temperature are not polynomially cointegrated. This implies that recent global warming is not statistically significantly related to anthropogenic forcing. On the other hand, we find that greenhouse gas forcing might have had a temporary effect on global temperature.

The bottom line:

Once the I(2) status of anthropogenic forcings is taken into consideration, there is no significant effect of anthropogenic forcing on global temperature.

They do, however, find a possible effect of the CO2 first difference:

The ADF and PP test statistics suggest that there is a causal effect of the change in CO2 forcing on global temperature.

They suggest “… there is no physical theory for this modified theory of AGW”, although I would think the obvious one is that the surface temperature adjusts over time to higher CO2 forcing, such as through intensified heat loss by convection, thereby returning to equilibrium. However, when revised solar data are used the relationship disappears, so the point is probably moot.

When we use these revised data, Eqs. (11) and (12) remain polynomially uncointegrated. However, Eq. (15) ceases to be cointegrated.


For physical reasons it might be expected that over the millennia these variables should share the same order of integration; they should all be I(1) or all I(2), otherwise there would be persistent energy imbalance. However, during 150 yr there is no physical reason why these variables should share the same order of integration. However, the fact that they do not share the same order of integration over this period means that scientists who make strong interpretations about the anthropogenic causes of recent global warming should be cautious. Our polynomial cointegration tests challenge their interpretation of the data.
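The I(1)/I(2) bookkeeping that the argument turns on can be sketched with a toy unit-root loop: difference a series until a Dickey–Fuller test rejects a unit root, and the number of differences needed is the estimated order of integration. This is a crude stand-in for the ADF/PP machinery in the paper (constant included, no lag augmentation, asymptotic 5% critical value of about -2.86):

```python
import numpy as np

def df_stat(x):
    # Dickey-Fuller t-statistic with a constant: regress the first
    # difference on the (demeaned) lagged level; a strongly negative
    # statistic rejects the unit-root null.
    dx, lag = np.diff(x), x[:-1]
    dx, lag = dx - dx.mean(), lag - lag.mean()
    beta = (lag @ dx) / (lag @ lag)
    resid = dx - beta * lag
    se = np.sqrt((resid @ resid) / (len(dx) - 2) / (lag @ lag))
    return beta / se

def integration_order(x, crit=-2.86, max_d=3):
    # Difference until the DF statistic rejects a unit root; the number
    # of differences needed estimates the order of integration I(d).
    for d in range(max_d + 1):
        if df_stat(x) < crit:
            return d
        x = np.diff(x)
    return max_d  # inconclusive beyond max_d

rng = np.random.default_rng(0)
noise = rng.standard_normal(500)
walk = np.cumsum(noise)        # a pure random walk is I(1)
print(integration_order(noise))  # 0: stationary series
print(integration_order(walk))   # 1 for most draws
```

In this framework, temperature and solar irradiance being I(1) while greenhouse-gas forcing is I(2) is what blocks ordinary cointegration between them and motivates the polynomial cointegration tests.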

Should the ABS take over the BoM?

I read an interesting article about Peter Martin, head of the Australian Bureau of Statistics.

He has a refreshing, mature attitude to his job.

‘I want people to challenge our data – that’s a good thing, it helps us pick things up,’ he says.

A big contrast to the attitude of climate scientists. Examples of their belief that they cannot be challenged are legion, from meetings to peer review. For example, emails expressing disagreement with the science are treated as threatening, as shown by the text of eleven emails released under ‘roo shooter’ FOI by the Climate Institute at Australian National University.

Australia’s Chief Statistician is also egalitarian. In response to a complaint by the interviewer about employment figures, he responds:

He says he doesn’t believe there is a problem, but gives every indication he’ll put my concerns to his staff, giving them as much weight as if they came from the Treasurer.

This is a far cry from the stated policy of the CSIRO/BoM (Bureau of Meteorology) of responding only to peer-reviewed publications. Even when one does publish statistical audits identifying problems with datasets, as I have done, one is likely to get a curt review stating that “this paper should be thrown out because its only purpose is criticism”. It takes a certain type of editor to proceed with publication under those circumstances.

When the Federal Government changes this time, as appears inevitable, one initiative they might consider is a greater role for the ABS in overseeing the BoM responsibilities. Although the BoM is tasked with the collection of weather and water data by Acts of Parliament, it would benefit from an audit and ongoing supervision by the ABS, IMHO.

Two New Numerate Blogs

A couple of new entries in the links section:

Sabermetric Research does its own sports research and reviews statistical studies of sports. I added this after reading one of my Chrissy gifts – Moneyball: The Art of Winning an Unfair Game – by Michael Lewis, now a movie starring Brad Pitt, a David-vs-Goliath story of stats over precedent.

Status Iatrogenicus by Scott K. Aberegg, M.D., an ER physician in Salt Lake City who also has a Medical Evidence Blog I follow. This blog is about how a lack of common sense leads to common nonsense in medical practice, and aims a critical eye at various aspects of medical practice that just plain don’t make sense.