The Intellectual Property card was played today, so I cannot verify the statistical significance (or otherwise) of the Drought Exceptional Circumstances Report. But I found out enough about the statistical tests (performed after publication of the report, at my prompting) to determine that autocorrelation in the temperature series was probably not taken into account.
If this is the case, then it is highly likely that the confidence intervals are far too narrow, and so it is also likely that only one or two regions (notably SWWA) show a statistically significant increase in predicted droughts, not 3 or 4 as claimed by the authors. I am more confident in my original assessment that the results show no significant increase in drought due to greenhouse warming in almost all regions of Australia.
As an aside, it is normal practice, and in fact a requirement of publication in major scientific journals, that scientists document a good faith attempt to resolve points of contention prior to submitting a comment or request for correction. I think this is a good policy, as it avoids filling the literature with pointless disputes that could have been resolved between the disputants. Often the erring party issues a correction or withdraws the paper altogether (as Xian-Jin Li recently did with a faulty proof of the Riemann Hypothesis).
Drought Exceptional Circumstances funding is massive, so getting it right is very important. The client should be confident that the interpretation of the results is free of researcher bias. Payments to farmers involve billions of taxpayer dollars, and government has a duty of diligence to ensure policy is based on statistically sound information. Then there is the reputation of CSIRO (Australia’s NASA equivalent) and the public interest at stake.
To date I am very happy with Kevin Hennessy’s promptness, and understand he may be constrained by CSIRO IP policy. If anyone has any specific information on this please let me know. However, my suspicions were raised when the report quoted only increases in mean values, and did not disclose whether these results were significant or not. His initial inquiry into my concerns did not reveal a problem. I am quite happy to be proved wrong on this, but I still think there is a problem. CSIRO data policy is not helping to resolve the issue.
Below is my request for the data used in the previous tests and for more details of the statistical tests used.
Thank you for your explanation and summary of the results of your
significance tests. Sweeping other issues to the side, I would simply like to check the
significance of your results of increasing droughts in Australia. To do this I think
it would be sufficient to have:
1. The individual 13 values for areal % used to obtain each of the
mean and extreme values in tables 4, 7 and 9.
2. The data you used in the significance tests you quote below.
Delimited text files are best.
3. A description of the method you used to determine your significance.
I am assuming that the return period is a deterministic function of
areal % and so additional tests of significance will be redundant. If not, the respective data
for return period would also be of interest.
The results you quote below were interesting and I would like to
resolve any conflicting results that arise.
I note that your quoted significances reconcile with your claims that
“more declarations would be likely, and over larger areas, in the SW,
SWWA and Vic&Tas regions, with little
detectable change in the other regions.”
Many thanks in advance.
I’m not able to hand over the data from the 13 models, due to restrictions on Intellectual Property, but I can describe the methods used to determine statistical significance.
Dewi Kirono says:
· I have used a number of statistical tests (parametric and non-parametric) and found that most of them show agreement. I used the 5% significance level. One marginal case was the change in percentage area for exceptionally low rainfall in NSW, in which the T-test was insignificant at the 5% level while the Kolmogorov-Smirnov test was significant at the 5% level. I feel the non-parametric test is more objective since it doesn't assume a Normal distribution.
· For the percentage area (temp, rain), the 13-model-mean sample is the 108 yr time series for 1900-2007 and the 31 yr time series for 2010-2040. For percentage area (soil moisture), the sample is the 50 yr time series for 1957-2006 and the same 50 yr time series modified for a period centred on 2030.
· For the frequency (temp and rain), the sample is the number of models (13) as each period (i.e. 1900-2007 and 2010-2040) only produces one return period value.
· For soil moisture frequency, I cannot perform the test as we only have one value for the obs (1957-2006).
· At the moment I've only applied the tests to the “mean” data, not the “90th” and “10th” percentiles. This is because we cannot do that for soil moisture and because we deal with lots of zero values for the 10th percentile.
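For readers unfamiliar with the two kinds of test mentioned in this description, here is a minimal Python sketch of a parametric test statistic (Welch's t) and a non-parametric one (the two-sample Kolmogorov-Smirnov statistic). It is not the CSIRO code, which I have not seen. The data are synthetic stand-ins with the same group sizes (108 and 31), and the function names and illustrative means and spreads are assumptions. Note that both statistics, as computed here, treat each yearly value as an independent observation.

```python
import math
import random

def welch_t(a, b):
    """Welch's t statistic for a difference in means (unequal variances allowed)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (mb - ma) / math.sqrt(va / na + vb / nb)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    na, nb = len(sa), len(sb)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(sa[i], sb[j])
        # Step both empirical CDFs past every value equal to x (handles ties).
        while i < na and sa[i] <= x:
            i += 1
        while j < nb and sb[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d

def ks_critical_5pct(na, nb):
    """Large-sample approximation to the 5% critical value of the KS statistic."""
    c_alpha = math.sqrt(-0.5 * math.log(0.05 / 2))  # about 1.358
    return c_alpha * math.sqrt((na + nb) / (na * nb))

# Synthetic, purely illustrative data: NOT values from the report.
random.seed(0)
group_a = [random.gauss(10.0, 5.0) for _ in range(108)]  # stand-in for 1900-2007
group_b = [random.gauss(14.0, 5.0) for _ in range(31)]   # stand-in for 2010-2040

print("Welch t:", round(welch_t(group_a, group_b), 2))
print("KS D:", round(ks_statistic(group_a, group_b), 3),
      "5% critical value:", round(ks_critical_5pct(108, 31), 3))
```

The KS test drops the Normality assumption, but it still assumes independent observations, so it is no protection against autocorrelation.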
I then asked for further clarification of how the statistical tests were performed,
and asked again for the data.
Further explanation of the statistical tests reveals that they consisted simply of a comparison of the means
for two time periods, with the % area in each individual year treated as a single data point. This simple
test assumes the data points are independent, an assumption that autocorrelation makes unjustified.
Failing to account for autocorrelation grossly overstates the effective sample size, and with it the apparent significance of any
differences (see my post Scale Invariance for Dummies or Chapter 10 of my book). Also see the results of Breusch and Vahid 2008 from the Draft Garnaut Report (reviewed here), where t-test scores for the rate of temperature increase dropped from more than 4 to less than 2 when autocorrelation was taken into account.
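The effect is easy to demonstrate by simulation. The following is a minimal Python sketch, not the CSIRO analysis: it generates synthetic AR(1) series containing no shift in mean at all, splits each into groups of 108 and 31 points (the group sizes reported for the rainfall and temperature tests), and counts how often a naive t-test declares a difference at roughly the 5% level. The AR(1) model, the values of `phi`, and all function names are assumptions made for illustration.

```python
import math
import random

def ar1_series(n, phi, rng):
    """n values of a zero-mean AR(1) process x[t] = phi * x[t-1] + noise (|phi| < 1)."""
    # Draw the first value from the stationary distribution.
    x = rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi * phi)
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + rng.gauss(0.0, 1.0)
    return out

def welch_t(a, b):
    """Welch's t statistic, treating every point as independent."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (mb - ma) / math.sqrt(va / na + vb / nb)

def false_positive_rate(phi, n_a=108, n_b=31, trials=1000, seed=1):
    """Fraction of no-change series in which the naive test reports a change."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        series = ar1_series(n_a + n_b, phi, rng)
        t = welch_t(series[:n_a], series[n_a:])
        if abs(t) > 1.96:  # approximate two-sided 5% threshold
            hits += 1
    return hits / trials

for phi in (0.0, 0.3, 0.6):  # hypothetical autocorrelation strengths
    print("phi =", phi, "false positive rate =", false_positive_rate(phi))
```

With `phi = 0` the rate should sit near the nominal 5%. As `phi` increases, the naive test reports "significant" differences far more often, even though nothing has changed in the underlying process.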
I don’t know the exact autocorrelation, as I can’t get the data, but the temperature and rainfall variables from which the % area figures are derived show very high autocorrelation (or ‘bursty’ behaviour), and so these data must inherit that character.
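If the series were available, one common first-order correction would be to estimate the lag-1 autocorrelation and shrink the sample size accordingly. The sketch below uses the standard AR(1) approximation n_eff = n(1 - r1)/(1 + r1). The value r1 = 0.6 is hypothetical, chosen only to show the size of the effect, and the function names are assumptions made for illustration.

```python
import math

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

def effective_n(n, r1):
    """Effective sample size under an AR(1) approximation."""
    r1 = max(r1, 0.0)  # do not inflate n when the estimate is negative
    return n * (1.0 - r1) / (1.0 + r1)

def deflate_t(t_naive, r1):
    """Rescale a naive t statistic, since the standard error grows as 1/sqrt(n_eff)."""
    r1 = max(r1, 0.0)
    return t_naive * math.sqrt((1.0 - r1) / (1.0 + r1))

r1 = 0.6  # hypothetical value, NOT estimated from the report's data
print("effective n for 108 years:", effective_n(108, r1))
print("effective n for 31 years:", effective_n(31, r1))
print("naive t of 4.0 becomes:", deflate_t(4.0, r1))
```

This correction is only a rough guide. Series with long-range persistence can lose even more information than an AR(1) model suggests.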
Answers to your questions are embedded in the email below.
From: David Stockwell [mailto:email@example.com]
Sent: Tuesday, 15 July 2008 10:37 AM
To: Hennessy, Kevin (CMAR, Aspendale)
Subject: Re: Exceptional circumstances report supplementary information
Thank you for relaying the description of the significance tests.
Just to be clear on what you did:
Was the standard deviation or the standard error of
the 13 model averages over the future periods used to determine
the significance of the decreasing areal % of rainfall and soil
No. For rainfall and temperature, the tests were performed to assess
differences between means from two groups: group A (1900-2007) VS group
B (2010-2040) for temperature and rainfall. Thus, groups A and B had 108
and 31 data points, respectively. For soil moisture, group A (1957-2006)
and group B (50 yrs centred on 2030) both had 50 data points. We also
did a quick test of whether the results were sensitive to treating the
13 models separately or as multi-model means, and the answer is no.
Was the 108 yr time series for 1900-2007 one or 13 points?
One point (N = 108). See above.
Was the 31 yr time series for 2010-2040 13 points and used to get the
One point (N = 31). Yes.
Was a one or two-tailed t-test applied?
Both. They suggest the same conclusions, either above or below the 5%
IMO the 13 values of model predictions are summaries of output that
are necessary to make a determination of the quality of evidence in
your report. Whether they are covered by IP or not, it would be in
the best interests of science for you to allow them to be used in an
As you might be aware, using the minimal data provided in your report,
I determined that only one region showed significance for % area
exceptional low rainfall (SWWA) and only two areas showed
significance soil moisture (SWWA and Vic&Tas).
This is fewer than the 3 areas and 4 areas where you claim
significance (SW and NSW being the additional areas). I would like
to reconcile this difference and the quickest way would be for me to
check it with the data.
Thanks again for your cooperation up to this point, and I hope you
will reconsider your decision regarding the data.
As indicated in my previous email, we are not at liberty to distribute
the data due to IP limitations. We have checked our data and believe the
results and conclusions are correct.