Detecting ‘massaging’ of data by human hands is an area of statistical analysis I have been working on for some time, and I devoted one chapter of my book, Niche Modeling, to its application to environmental data sets.

The WikiChecks web site now incorporates a script for doing a Benford’s analysis of digit frequency, sometimes used in numerical analysis of tax and other financial data.

I have posted some initial tests on the site: random numbers and the like. I also ran each of the major monthly global temperature indices through the site: GISS, RSS, UAH and CRU. The results, from lowest deviation to highest, are listed below.

RSS – Pr<1

UAH – Pr<1 based on global data series; Pr<0.001 for whole file (see note)

GISS – Pr<0.05

CRU – Pr<0.01

Numbers such as missing values in the UAH data (-99.990) may have caused its high deviation. I don't know about the others.
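
The effect of missing-value codes on a final-digit count can be illustrated with a small sketch. How the WikiChecks script actually tokenises a file is not shown here, so the regular expression, the function name `final_digit_counts` and the two-line `sample` below are all assumptions made for illustration. Only the sentinel `-99.990` comes from the UAH file.

```python
import re
from collections import Counter

# Assumed tokeniser: signed decimals or signed integers (hypothetical, not the site's).
NUMBER = re.compile(r"-?\d+\.\d+|-?\d+")

def final_digit_counts(text, missing=("-99.990",)):
    """Count the last digit of every number in text, skipping missing-value codes."""
    counts = Counter()
    for token in NUMBER.findall(text):
        if token in missing:  # literal string match against the sentinel
            continue
        counts[int(token[-1])] += 1
    return [counts[d] for d in range(10)]

# Made-up lines in the style of a monthly data file (year, month, values).
sample = "1979 1 -0.213 0.05 -99.990\n1979 2 0.147 -99.990 0.31"

print(final_digit_counts(sample))              # sentinels skipped
print(final_digit_counts(sample, missing=()))  # sentinels counted as final digit 0
```

In this toy input, each sentinel that is left in adds one to the count for digit 0. The year and month columns also contribute digits, which is a separate reason to analyse the data series alone rather than the whole file.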

**Table of results for GISS monthly global temperature data.**

Frequency of each final digit: observed vs. expected

Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Totals |
---|---|---|---|---|---|---|---|---|---|---|---|
Observed | 297 | 300 | 267 | 268 | 243 | 262 | 255 | 227 | 253 | 235 | 2607 |
Expected | 260 | 260 | 260 | 260 | 260 | 260 | 260 | 260 | 260 | 260 | 2607 |
Variance | 4.92 | 5.77 | 0.13 | 0.18 | 1.13 | 0.00 | 0.10 | 4.23 | 0.20 | 2.44 | 19.10 |
Significant | * | * | | | | | | * | | | |

Statistic | DF | Obtained | Prob | Critical |
---|---|---|---|---|
Chi Square | 9 | 19.10 | <0.05 | 16.92 |

RESULT: Significant management detected.

Significant variation in digit 0: (Pr<0.05) indicates rounding up or down.

Significant variation in digit 1: (Pr<0.05) indicates management.

Significant variation in digit 7: (Pr<0.05) indicates management.
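
The figures in the table can be reproduced by hand. The sketch below assumes a uniform expectation (total divided by ten) and a Yates continuity correction of 0.5 on each cell, which is what the row labelled "Variance" appears to contain. The per-digit threshold of 3.84 (the 5% chi-square critical value for one degree of freedom) is my assumption about how individual digits are flagged, not something the site documents here. The function name `yates_chi_square` is hypothetical.

```python
# Observed final-digit counts for digits 0-9, copied from the GISS table.
observed = [297, 300, 267, 268, 243, 262, 255, 227, 253, 235]

def yates_chi_square(counts):
    """Return (per-digit contributions, total) against a uniform expectation."""
    expected = sum(counts) / len(counts)  # 2607 / 10 = 260.7
    contributions = [
        max(abs(obs - expected) - 0.5, 0.0) ** 2 / expected  # Yates-corrected cell
        for obs in counts
    ]
    return contributions, sum(contributions)

contributions, statistic = yates_chi_square(observed)

CRITICAL_DF9 = 16.92  # 5% critical value for 9 degrees of freedom, from the table
CRITICAL_DF1 = 3.84   # assumed per-digit threshold (5%, 1 degree of freedom)

flagged = [d for d, c in enumerate(contributions) if c > CRITICAL_DF1]

print([round(c, 2) for c in contributions])
print(round(statistic, 2), statistic > CRITICAL_DF9)
print("flagged digits:", flagged)
```

Under these assumptions the total comes to 19.10 and the flagged digits are 0, 1 and 7, matching the output above.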

One of the main sources of global warming information, the GISS data set from NASA, showed significant management, particularly an excess of zeros and ones. Interestingly, the moving window mode of the algorithm identified two years, 1940 and 1968 (see here).

Considerable controversy has surrounded the 1940 period, related to possible adjustments for bucket sampling of water temperatures. I am not aware of controversy surrounding 1968 temperature measurements, although 1968 was a year marked by violent protests and the assassinations of Martin Luther King Jr. and Senator Robert Kennedy.

At this stage I am in exploratory mode. The chi-square test is prone to produce false positives for small samples. Also, there are a number of innocent reasons that digit frequency may diverge from expected. However, the tests are very sensitive. Even if arithmetic operations are performed on data after the manipulations, the ‘fingerprint’ of human intervention can remain.
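
The moving window mode mentioned above can be sketched roughly as follows. This is a guess at the general idea, not the site's implementation: the window length, the step of one value, and the names `chi_square_uniform` and `windowed_scores` are all assumptions, and the digit series is synthetic.

```python
def chi_square_uniform(counts):
    """Yates-corrected chi-square of ten digit counts against a uniform expectation."""
    expected = sum(counts) / 10
    return sum(max(abs(c - expected) - 0.5, 0.0) ** 2 / expected for c in counts)

def windowed_scores(digits, window):
    """Slide a fixed-length window along the digits; return (start, statistic) pairs."""
    scores = []
    for start in range(len(digits) - window + 1):
        chunk = digits[start:start + window]
        counts = [chunk.count(d) for d in range(10)]
        scores.append((start, chi_square_uniform(counts)))
    return scores

# Synthetic final digits: evenly spread, with a block of only 0s and 5s in the middle.
digits = [i % 10 for i in range(200)] + [0, 5] * 50 + [i % 10 for i in range(200)]

scores = windowed_scores(digits, window=100)
worst_start, worst_score = max(scores, key=lambda pair: pair[1])
print(worst_start, round(worst_score, 2))
```

The window with the largest statistic sits over the block of rounded values, which is how a localised divergence could be tied to a particular period. Note that overlapping windows are not independent tests, so many windows exceeding a critical value is weaker evidence than it looks.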

**Update:**

Thanks to Luboš Motl, who checked this data, UAH was confirmed to be manipulation free.

Dear David, a great idea. I’ve reproduced your qualitative results but obtained an even stronger signal, i.e. a lower probability that this non-uniformity appeared by chance, namely 0.4 percent or so for GISS. Click my name to see the details, including a Mathematica notebook.

Dear Luboš,

Thanks for checking these results for GISS and UAH. The reason for the high UAH result is that the whole file contains data, such as the number of days in each month, that will give a signal for non-uniform digit frequency. When I run on just the global data series (after extraction into Excel), I get the same result as you, which clears UAH of manipulation.

This illustrates another way to get false positives. Rounding errors are another.

GISS and CRU divergences are significant, and the files don’t have these data inhomogeneities. The localization of the GISS divergences at 1940, the year where the controversy over the ‘warmest year of the century’ erupted, is intriguing though.

The test I use is the standard Chi-square with DF=9 and Yates correction for small samples. I am looking for a more reliable test for small samples.
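
One candidate for small samples is a Monte Carlo version of the same test: simulate many samples of uniform final digits of the same size and see how often the statistic is at least as large as the observed one. This is a swapped-in technique offered as a sketch, not the test used on the site. The function names, the number of trials, the seed and the 30-value `small_sample` are all made up for illustration.

```python
import random

def pearson_statistic(counts):
    """Plain Pearson chi-square against a uniform expectation (no Yates term)."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def monte_carlo_p_value(counts, trials=2000, seed=0):
    """Estimate Pr(statistic >= observed) by simulating uniform final digits."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = sum(counts)
    observed = pearson_statistic(counts)
    hits = 0
    for _ in range(trials):
        draws = rng.choices(range(10), k=n)
        simulated = [draws.count(d) for d in range(10)]
        if pearson_statistic(simulated) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)

# Illustrative counts only, not taken from any temperature file.
small_sample = [9, 3, 2, 4, 1, 5, 2, 3, 1, 0]
print(monte_carlo_p_value(small_sample))
```

Because the reference distribution is simulated at the actual sample size, this avoids leaning on the chi-square approximation when expected counts per digit are small.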

David,

As an exercise, I put the numbers from your table of results above through the check. The result: Pr<0.001 – “Extremely significant management detected”!

What have you been up to?

Nick, Oh right, very funny. You have all the repeated 0 digits from the expected value of 260.

You should do this on all the lotto draws from last year. You may find THEY have been massaged too.

I like the disclaimer at the bottom of the web page:

“Disclaimer: Statistical forensic methods are prone to false positives. Findings must be verified independently. No responsibility is taken for misuse of the tools on this website. ”

tee hee…

Would it be possible to run this on raw station data too? I’m guessing the intervention (if there is any) could be there rather than at GISS.

Bishop, I would work backwards from the final data towards the station data, analysing intermediate data sets and trying to show how they change from stage to stage.

“You should do this on all the lotto draws from last year. You may find THEY have been massaged too.”

I’m sure they have been :- I never win!!!

All very interesting.

Ang on a sec. Benford’s law is about the distribution of initial digits which follow a power law distribution. The test above is chi-squared against an assumed uniform distribution. Not the same animal at all.

Can you make your script available? I don’t understand how the Pr values were derived.

#Rich, Benford’s Law tends to uniform in the subsequent digits. Anyway, it doesn’t apply to measurement data, which can have a constant initial digit.

#John, http://landshape.org/check/check.txt

David Stockwell,

You state yourself that “there are a number of innocent reasons that digit frequency may diverge from expected”. Why, then, do you consider it proper to refer to “fraud detection” in your title and “cheating” in your subtitle? It seems to me that you are improperly insinuating such motives on the part of GISS, without respecting the responsibility of actually presenting any proof to support your implications. Evidence of divergence is not proof of fraud or cheating.

Steven: “Evidence of divergence is not proof of fraud or cheating.” That is right, and I have said that. There is no insinuation in the title; detecting cheating is what the web site and these methods are mainly for.

Here is an analogy. You go to your doctor and he suggests a blood test for a condition, prostate cancer say. He tells you the test is not definitive, but if it comes back negative you are clear. The test comes back positive, and he suggests more extensive testing.

That is what is happening here. Do you accuse the doctor of insinuating you have prostate cancer? If it turns out you don’t have prostate cancer, do you tell him he’s wearing a tin hat for reporting that the first test came back positive?

Thanks for your comment anyway.

#16 davids

That was rather my point. If what you’re doing is testing against a uniform distribution (and the table of results shows that you are) why mention Benford’s Law at all? True, the distribution of the Nth digit tends to uniform as N increases but so what?

No, it doesn’t matter but it confused me at the outset.
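
The point that the Nth digit tends to uniform can be checked numerically. The sketch below uses the standard Benford formulas: the first digit d has probability log10(1 + 1/d), and for later positions the probability of digit d is the sum of log10(1 + 1/(10k + d)) over all possible leading blocks k. The function name `benford_digit_probs` is hypothetical.

```python
import math

def benford_digit_probs(position):
    """Benford probabilities of digits 0-9 at a 1-based digit position."""
    if position == 1:
        # A leading digit cannot be zero.
        return [0.0] + [math.log10(1 + 1 / d) for d in range(1, 10)]
    lo, hi = 10 ** (position - 2), 10 ** (position - 1)
    return [
        sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))
        for d in range(10)
    ]

first = benford_digit_probs(1)
print([round(p, 3) for p in first])

# Gap between the most and least likely digit at positions 2, 3 and 4.
spreads = []
for position in (2, 3, 4):
    probs = benford_digit_probs(position)
    spreads.append(max(probs) - min(probs))
print([round(s, 4) for s in spreads])
```

The gap between the most and least likely digit shrinks by roughly a factor of ten with each position, so for a final digit several places in, the uniform expectation used in the table is a close approximation.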

Rich: “why mention Benford’s Law at all?” Well on the plus side it puts it into context so people can research it. First digit has been used a lot, see Nigrini, second digit I have used in my book on measurement data. This is the first time I have seen last digits used. It’s not a law really.

In my laboratory times in the 70s, instruments often had meter readouts that the operator had to approximate while they moved gently to and fro. Unknown to the operators, we developed signatures for each of them based on the last digit. It was not infallible, but when you asked “Why did you do XYZ’s readings yesterday?” the body language and sometimes the admission often suggested you were right. So I explained to the staff what we were doing and the practice soon died out. It was sometimes complicated by log scales on the meters.
