The next step in the statistical forensics process is to break down the data in ways that reveal where the anomalous divergences are coming from. Here I am indulging in classical scientific reductionism, examining an overall phenomenon as the sum of its parts.
The previous post in the series identified significant divergence in the distribution of the last digits of two global temperature data sets, from GISS (Pr<0.05) and CRU (Pr<0.01). Two other data sets based on satellite data, from RSS and UAH, were cleared of non-randomness.
Luboš Motl confirmed my results on GISS and CRU. Steve McIntyre initially disagreed, but then found an order-of-magnitude mistake in his calculations, which he reported in a comment here. So there can be no doubt that the anomaly in the distribution of digits in these datasets is real. This can be caused by many factors, only one of which is ‘manipulation’ of the data. How do we find the cause?
The graphs below show changes in chi-sq values (red) over the time scale of the GISS and CRU temperature series (blue) from 1880 to the present. I show them now to indicate where I am going. Sorry they are very basic, but I am developing the code in PHP from scratch so it can be used on the WikiChecks website. I used a 100-data-point window and plotted the significance of the divergence from a uniform distribution over time (red).
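For readers who want to try the windowed calculation themselves, here is a minimal sketch in Python. It is not the PHP code behind the graphs. It assumes the temperature values are stored as integers (for example, hundredths of a degree), that the last digit is taken as the absolute value modulo 10, and that the window advances one point at a time. The function names `last_digit_chi2` and `rolling_chi2` are made up for illustration. It returns the raw chi-sq statistic rather than a probability; with ten digits there are 9 degrees of freedom, so a value above about 16.919 corresponds to Pr<0.05.

```python
from collections import Counter

# Chi-square critical value for 9 degrees of freedom at alpha = 0.05.
CHI2_CRIT_05 = 16.919


def last_digit_chi2(values):
    """Chi-square statistic of the last digits against a uniform distribution.

    Assumes `values` are integers, e.g. anomalies in hundredths of a degree.
    """
    counts = Counter(abs(v) % 10 for v in values)
    expected = len(values) / 10  # uniform: each digit equally likely
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))


def rolling_chi2(values, window=100):
    """Chi-square statistic for every full window of consecutive values."""
    return [
        last_digit_chi2(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]
```

Plotting the output of `rolling_chi2` against the series, with a horizontal line at `CHI2_CRIT_05`, would give a rough picture of where the digits diverge most, under the assumptions stated above.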
The regions where the distribution of digits diverges are shown clearly, and will be the basis for more detailed examination.
GISS temperature and digit divergence.
CRU temperature and digit divergence.