Rainfall Drought Index

As a quick look at the data provided by CSIRO for the Drought Exceptional Circumstances Report, I made density plots (smoothed frequency histograms) of the rainfall data over two periods, 1900-2010 and 2010-2040, for the South-west of Western Australia, the region with the highest drought predictions. The plot below is the result, with the past in blue and the future in red. The distribution is extremely skewed, with a large number of zeros (years with no area in drought). The future data (red) have a bump at the right-hand end of the plot, which indicates that years with 100% of the area declared in drought are more frequent than in the past data.

The question is: what is the appropriate test of the difference between these two populations?

Here is the R code. Until I develop a script that processes all the data files in their native format, you will have to delete the summary rows at the bottom of the file and save it as CSV.

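d <- read.table("RSWWA.csv", header = TRUE, sep = ",")
pr <- as.matrix(d[1:110, 2:14])    # past years, columns 2-14
dim(pr) <- NULL                     # flatten to a single vector
fr <- as.matrix(d[111:141, 2:14])  # future years, 2010-2040
dim(fr) <- NULL
plot(density(fr), col = "red")     # future in red
lines(density(pr), col = "blue")   # past in blue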
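One way to explore the question is to run several candidate tests side by side. The sketch below does this on made-up, zero-inflated stand-ins for pr and fr. The helper make_area and its proportions are hypothetical and are not fitted to the CSIRO file. It compares a Welch t-test, a Wilcoxon rank-sum test and a two-sample Kolmogorov-Smirnov test, all from R's stats package. All three treat each year and model value as an independent observation, and autocorrelated yearly series may not justify that.

```r
# Hypothetical stand-ins for the past (pr) and future (fr) % area vectors:
# mostly zeros, some partial-drought years, and a few years at 100%.
set.seed(1)
make_area <- function(n, p_zero, p_full) {
  u <- runif(n)
  x <- runif(n, 0, 100)        # partial-drought years
  x[u < p_zero] <- 0           # years with no area in drought
  x[u > 1 - p_full] <- 100     # years with the whole region in drought
  x
}
pr <- make_area(110 * 13, p_zero = 0.70, p_full = 0.02)
fr <- make_area(31 * 13, p_zero = 0.55, p_full = 0.10)

# Welch t-test: compares means only
tt <- t.test(fr, pr)
# Wilcoxon rank-sum: compares location via ranks (normal approximation here)
wt <- suppressWarnings(wilcox.test(fr, pr))
# Kolmogorov-Smirnov: compares the whole distributions
kt <- suppressWarnings(ks.test(fr, pr))

p_values <- c(t = tt$p.value, wilcoxon = wt$p.value, ks = kt$p.value)
print(round(p_values, 4))
```

The Kolmogorov-Smirnov test responds to any difference in shape, including the bump at 100%. The t-test looks only at means. Because there are so many tied zeros, the Wilcoxon and Kolmogorov-Smirnov p-values are approximate.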

10 Reasons to Share Your Data

This started out as a request to CSIRO for the data underlying the conclusions of the Drought Exceptional Circumstances Report. The request spawned a series of posts and some furore when the data were not immediately forthcoming. Kevin Hennessy of CSIRO informs me that the data are now available on the BoM website. Credit is due to Kevin for making this happen. Before the data could be released, he had to get the permission of all parties to the report: the Dept of Agriculture, Fisheries and Forestry, the Bureau of Meteorology and CSIRO. I will post the results when I have them. In the meantime, I have been reflecting on reasons why people should share data:

1. Greater recognition of your work through increased citations and follow-on research.
2. Multiple copies protect it from accidental erasure.
3. Because many eyes have looked at it, it will be of higher quality.
4. You receive good press as a result.
5. Economic benefits can accrue.
6. It fulfils obligations, such as a corporate vision of sharing research outcomes.
7. Satisfaction is gained from benefiting the broader community.
8. It supports other areas of your business.
9. It helps to build a community of researchers around a common resource.
10. Independent lines of research based on it might result in the right answer.

David Evans on Greenhouse Gas

David Evans, aka ‘rocket scientist’, shared his progression from believer in anthropogenic global warming (AGW) to skeptic in response to new evidence (longer article here). Bayesian updating is a way of modeling rational changes of mind, and I want to see whether DE is a rational, thinking person. Dr. Jim Peacock thinks not. He is Chief Scientist of Australia, immediate past President of the Australian Academy of Science and a Fellow of the Royal Society of London. Using the evidence presented in DE's article and a very simple Bayesian updating system, we can model the changing probability of AGW in DE’s mind. Bayesian updating can get very complicated, but I am going to show a very simple way of simulating it.

The only rigorous relationship I am going to use is Bayes’ Theorem. It is named after the British cleric Thomas Bayes and his posthumous “An Essay Towards Solving a Problem in the Doctrine of Chances” (Bayes 1764). Bayes’ Theorem relates the probability of a hypothesis given the evidence, P(H|E) (the posterior), to three other quantities: the probability of the evidence given the hypothesis, P(E|H) (the likelihood); the overall probability of the evidence, P(E); and the prior probability of the hypothesis, P(H).

P(H|E) = P(E|H)P(H)/P(E)

As DE tells it, prior to his conversion:

The evidence was not conclusive, but why wait until we were certain when it appeared we needed to act quickly? Soon government and the scientific community were working together and lots of science research jobs were created. We scientists had political support, the ear of government, big budgets, and we felt fairly important and useful (well, I did anyway). It was great. We were working to save the planet.

P(H) is called the prior probability of the hypothesis. As stated in the paragraph quoted above, DE’s motivation for working on global warming was basically self-interest. In other words, he was in some sense economically rational and uncommitted on AGW. The Bayesian term for this is an uninformative prior, expressed as a 50/50 chance of AGW vs. not AGW. We write this as a matrix (0.5,0.5), where the first place is the probability of AGW and the second is the probability of not AGW. DE goes on:

But since 1999 new evidence has seriously weakened the case that carbon emissions are the main cause of global warming, and by 2007 the evidence was pretty conclusive that carbon played only a minor role and was not the main cause of the recent global warming. As Lord Keynes famously said, “When the facts change, I change my mind. What do you do, sir?”

P(E|H) is the probability of the evidence given the hypothesis. His first piece of evidence is:


1. The greenhouse signature is missing. We have been looking and measuring for years, and cannot find it.

Given AGW, the probability of seeing a greenhouse signature must be greater than 0.5. But given all the noise in the climate system, that probability is not 1 either. Let’s say the probability of seeing a signature given AGW is 0.75. Then the probability of the observed evidence E1, no signature, is P(E1|H) = 0.25. If we take the probability of no signature when AGW is false to be 0.75, the evidence that no signature has been seen can be written as the matrix (0.25,0.75).

We could try to estimate P(E1), the probability of finding no greenhouse signature, and then use Bayes’ Theorem to calculate P(H|E1), the probability of AGW given that no signature has been found. The difficulty is that I am not sure how to estimate P(E1) directly. Besides, there is a simpler way. Because probabilities sum to one, we can simulate the updating of the probability of AGW by multiplying the matrices element by element and then normalizing so that the entries sum to one, i.e.

(0.5,0.5).(0.25,0.75) = (0.125,0.375), which normalizes to (0.25,0.75)

Thus the evidence reduces the original probability of AGW of 0.5 to 0.25. The same can be done for the other pieces of evidence.
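Written out in full, the normalizing step does use P(E1) after all. By the law of total probability, the sum of the element-wise products is P(E1). Here E1 is taken to be the observed evidence that no signature has been found, so P(E1|H) = 0.25 and P(E1|¬H) = 0.75, the two entries of the matrix (0.25,0.75):

```latex
\begin{aligned}
P(E_1) &= P(E_1 \mid H)\,P(H) + P(E_1 \mid \neg H)\,P(\neg H)
        = 0.25 \times 0.5 + 0.75 \times 0.5 = 0.5 \\
P(H \mid E_1) &= \frac{P(E_1 \mid H)\,P(H)}{P(E_1)} = \frac{0.125}{0.5} = 0.25
\end{aligned}
```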

2. [T]heory suggests that carbon emissions should raise temperatures (though by how much is hotly disputed) but there are no observations by anyone that implicate carbon emissions as a significant cause of the recent global warming.

For P(E2|H), the probability that carbon dioxide is the cause of any given warming depends on how many other causes of warming there are. Let’s say that historically, carbon dioxide has caused 50% of warmings, or accounts for 50% of the warmth of a warming. Then P(E2|H) is (0.5,0.5). In other words, the theory of global warming due to greenhouse gases (AGW) provides no effective evidence for AGW as the cause. If CO2 were the cause of warmings 90% of the time, say, then multiplying (0.9,0.1) with the prior would lend considerable support to AGW.

3. The satellites that measure the world’s temperature all say that the warming trend ended in 2001, and that the temperature has dropped about 0.6C in the past year (to the temperature of 1980).

Here we can estimate P(E3|H) with Monte Carlo simulations of how often decadal temperatures increase or decrease when the globe is warming. Say that temperatures over a ten-year period are increasing 75% of the time when the earth is warming. Then P(E3|H) is (0.25,0.75).
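The following is a minimal sketch of the kind of Monte Carlo simulation described above. Every parameter is a hypothetical choice for illustration:

- a steady warming trend of 0.017 C per year;
- AR(1) year-to-year noise with coefficient 0.6 and innovation standard deviation 0.1 C;
- the sign of an ordinary least squares slope as the test of "increasing".

The resulting frequency depends heavily on these choices, so it need not match the 75% figure assumed in the text.

```r
# Hypothetical parameters, not estimated from any temperature record.
set.seed(42)
trend_per_year <- 0.017  # assumed warming rate, deg C per year
ar_coef        <- 0.6    # assumed lag-1 autocorrelation of the noise
noise_sd       <- 0.1    # assumed innovation standard deviation, deg C
window         <- 10     # decade length in years
n_sims         <- 2000

decade_rising <- replicate(n_sims, {
  # sd is passed through arima.sim to rnorm for the innovations
  noise <- as.numeric(arima.sim(model = list(ar = ar_coef), n = window, sd = noise_sd))
  years <- seq_len(window)
  temps <- trend_per_year * years + noise
  slope <- coef(lm(temps ~ years))[["years"]]  # OLS decadal trend
  slope > 0
})

# Estimated probability that a decade shows warming while the globe warms
p_rise <- mean(decade_rising)

# Evidence pair in the text's convention (first entry: no rising decade
# given AGW; second entry set to its complement, as in (0.25,0.75))
evidence_E3 <- c(1 - p_rise, p_rise)
print(round(evidence_E3, 3))
```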

4. The new ice cores show that in the past six global warmings over the past half a million years, the temperature rises occurred on average 800 years before the accompanying rise in atmospheric carbon.

While CO2 lagging temperature does not disprove AGW outright, it does tend to support the alternative. My guess, then, is that the probability of CO2 lagging when AGW is true is less than 0.5, say (0.25,0.75).

Now we use a variation of the Bayesian chain rule to calculate the final probability of the hypothesis that global warming is caused by greenhouse gases, given the four pieces of evidence. Below we multiply the prior matrix element by element by the four evidence matrices and normalize the result.

P(H).P(E1|H).P(E2|H).P(E3|H).P(E4|H)
=> (0.5,0.5).(0.25,0.75).(0.5,0.5).(0.25,0.75).(0.25,0.75)
=> (0.0039, 0.1055)/0.1094
=> (0.036,0.964)
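The same calculation can be written as a short R sketch. The names prior and evidence are illustrative, and the numbers are the ones used above. Multiplying likelihoods this way assumes that the four pieces of evidence are conditionally independent given the hypothesis. The simple model above makes that assumption implicitly.

```r
# First entry: AGW; second entry: not AGW.
prior <- c(agw = 0.5, not_agw = 0.5)
evidence <- list(
  E1 = c(0.25, 0.75),  # greenhouse signature missing
  E2 = c(0.50, 0.50),  # theory alone: uninformative
  E3 = c(0.25, 0.75),  # no satellite warming trend since 2001
  E4 = c(0.25, 0.75)   # CO2 lags temperature in ice cores
)

# Multiply element by element (names are kept from prior),
# then normalize so the two entries sum to one.
unnormalized <- Reduce(`*`, evidence, prior)
posterior <- unnormalized / sum(unnormalized)

print(round(unnormalized, 4))  # agw 0.0039, not_agw 0.1055
print(round(posterior, 3))     # agw 0.036, not_agw 0.964
```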

That is, the probability of AGW given evidence E1-E4 is 0.036, or 3.6%. This falls below the conventional 5% threshold (95% confidence) commonly used to reject a hypothesis. In this very rough model, the change in David Evans’s mind is a rational outcome of updating probabilities in response to new evidence. It leads to a formalized decision to dismiss AGW once AGW became sufficiently improbable. I think Dr. Jim Peacock, Chief Scientist of Australia, owes David Evans, ‘rocket scientist’, an apology.

AGW: Where is the evidence?

A superb opinion piece published recently in The Australian charts one scientist’s conversion from AGW believer to skeptic after he failed to find evidence. David Evans is the self-confessed rocket scientist who wrote the custom carbon accounting model (FullCAM) that measures Australia’s carbon credit in the land use change and forestry sector. In disagreeing with mainstream science as represented by the IPCC, he is among those whom Professor Garnaut indecorously refers to as ‘dissenters’ here or even ‘deniers’ here.

Below are some quotes, but the full article is well worth reading as an example of the evidential mindset. In this view, climate models rank very low in the pecking order of evidential support, and direct evidence for AGW is lacking.

But since 1999 new evidence has seriously weakened the case that carbon emissions are the main cause of global warming, and by 2007 the evidence was pretty conclusive that carbon played only a minor role and was not the main cause of the recent global warming. As Lord Keynes famously said, “When the facts change, I change my mind. What do you do, sir?”

There has not been a public debate about the causes of global warming and most of the public and our decision makers are not aware of the most basic salient facts:

1. The greenhouse signature is missing. We have been looking and measuring for years, and cannot find it.

2. There is no evidence to support the idea that carbon emissions cause significant global warming. None. There is plenty of evidence that global warming has occurred, and theory suggests that carbon emissions should raise temperatures (though by how much is hotly disputed) but there are no observations by anyone that implicate carbon emissions as a significant cause of the recent global warming.

3. The satellites that measure the world’s temperature all say that the warming trend ended in 2001, and that the temperature has dropped about 0.6C in the past year (to the temperature of 1980). Land-based temperature readings are corrupted by the “urban heat island” effect: urban areas encroaching on thermometer stations warm the micro-climate around the thermometer, due to vegetation changes, concrete, cars, houses. Satellite data is the only temperature data we can trust, but it only goes back to 1979. NASA reports only land-based data, and reports a modest warming trend and recent cooling. The other three global temperature records use a mix of satellite and land measurements, or satellite only, and they all show no warming since 2001 and a recent cooling.

4. The new ice cores show that in the past six global warmings over the past half a million years, the temperature rises occurred on average 800 years before the accompanying rise in atmospheric carbon. Which says something important about which was cause and which was effect.

None of these points are controversial. The alarmist scientists agree with them, though they would dispute their relevance.

The world has spent $50 billion on global warming since 1990, and we have not found any actual evidence that carbon emissions cause global warming. Evidence consists of observations made by someone at some time that supports the idea that carbon emissions cause global warming. Computer models and theoretical calculations are not evidence, they are just theory.

Credit to Geoff Sherrington for spotting this.

CSIRO Wars

Attempts to get some summary data from the Drought Exceptional Circumstances report out of Australia’s scientific organization CSIRO, in order to check the statistical significance of the results, have been described as a saga. The way this has been picked up on various blogs and comments shows the depth of concern people have about data access for checking scientific work.

Steve McIntyre at ClimateAudit describes the saga as “a recent lurid report on Australian drought, only to be stonewalled on grounds of ‘Intellectual Property Rights’, a pretext familiar to CA readers.” In another post he finds fault with another aspect of the report, writing that CSIRO produced “an interesting example of a promotional press release, that would daunt the most adventurous stock promoter, followed by mealy-mouthed and untrue excuses by the government department.”

Further afield, Agmates Rural News linked in with a story headlined Scientists & Farmers Question CSIRO Scare Mongering Reports. The very readable SeaBlogger refers to it as “the Australia drought hysteria.”

It doesn’t matter that paleoclimatology shows two modes for (Australian) regional climate: dry and drier. Any modern drought must be climate change caused by your SUV.

Meanwhile, I have been following up on Mr Hennessy's claim that “I’m not able to hand over the data from the 13 models, due to restrictions on Intellectual Property”. I wanted a copy of the CSIRO IPR policy to see whether this was true, or simply a case of an over-zealous employee. I emailed a Dr Tendulkar, listed as a contact for IP on the CSIRO web site. I asked for a link to the policy and whether it might restrict data access in this way. I also emailed a Ms Caldwell in the Freedom of Information (FOI) Unit for information about starting an FOI request for the data. To date I have received neither a reply nor an acknowledgment of my emails.

Virtually no information is provided on starting an FOI request. The website states only that:

FOI applications should be accompanied by the statutory A$30 application fee. There are some additional charges associated with processing requests including search, retrieval and photocopying fees.

I asked what costs could potentially be involved. My concern comes from the experience of Peter Garrett, then opposition environment minister. He put an (unsuccessful) FOI application to the Great Barrier Reef Marine Park Authority (GBRMPA) for documents on the effect of global warming on the Great Barrier Reef and was hit with an administration charge of more than $12,000. The $12,718.80 total included charges for 107.6 hours of search and retrieval time, 539 hours of decision-making time and photocopying of more than 3250 pages at 10 cents per page. I can’t complain about an hourly rate of less than $20, but they seem to work exceedingly slowly.
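Here is a quick check of the implied hourly rate. It assumes that photocopying, taken at the minimum of 3250 pages, is the only charge besides the 646.6 hours of search, retrieval and decision-making time. Any further charges or pages would push the rate lower still.

```r
# Figures as reported for the GBRMPA FOI charge
total_charge   <- 12718.80  # dollars
search_hours   <- 107.6
decision_hours <- 539
pages          <- 3250      # "more than 3250", so this gives an upper bound on the rate

photocopy_cost <- pages * 0.10
hourly_rate <- (total_charge - photocopy_cost) / (search_hours + decision_hours)
print(round(hourly_rate, 2))  # about 19.17 dollars per hour
```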

Peter Garrett is now environment minister in the newly elected Rudd Labor Government. He might be sympathetic to an appeal, given his experiences with getting information out of government research organizations.

UPDATE: I rang Kevin Hennessy yesterday morning (July 21), and he said the data would be on the website in a couple of days. I will do a post when it appears, and give credit where it is due.

CSIRO Data Policy: Go Pound Sand

The Intellectual Property card was played today, so I cannot verify the statistical significance (or otherwise) of the Drought Exceptional Circumstances Report. But I found out enough about the statistical tests to determine that autocorrelation in the temperature series was probably not taken into account. The tests were performed after publication of the report, at my prompting.

If this is the case, the confidence intervals were very likely far too narrow. It is then also likely that only one or two regions (SWWA) show a statistically significant increase in predicted droughts, not 3 or 4 as claimed by the authors. I am now more confident in my original assessment that the results show no significant increase in drought due to greenhouse warming in almost all regions of Australia.

As an aside, it is normal practice, and in fact a requirement of publication in major scientific journals, that scientists document a good-faith attempt to resolve points of contention before submitting a comment or request for correction. I think this is a good policy, as it avoids filling the literature with pointless disputes that could have been resolved between the disputants. Often the erring party issues a correction or withdraws the paper altogether (as Xian-Jin Li recently did with a faulty proof of the Riemann Hypothesis).

Drought Exceptional Circumstances funding is massive, so getting it right is very important. The client should be confident that the interpretation of the results is free of researcher bias. Payments to farmers involve billions of taxpayer dollars, and government has a duty of diligence to ensure policy is based on statistically sound information. Then there is the reputation of CSIRO (Australia’s NASA equivalent) and the public interest at stake.

To date I am very happy with Kevin’s promptness, and understand he may be constrained by CSIRO IP policy. If anyone has any specific information on this please let me know. However, my suspicions were raised when the report only quoted increases in mean values, and did not disclose whether these results were significant or not. His initial inquiry into my concerns did not reveal a problem. I am quite happy to be proved wrong on this, but I still think there is a problem. CSIRO data policy is not helping resolve the issue.

Below is my request for the data used in the previous tests and for more details of the statistical tests used.

Dear Kevin,

Thank you for your explanation and summary of the results of your
significance tests. Sweeping other issues to the side, I would simply like to check the
significance of your results of increasing droughts in Australia. To do this I think
it would be sufficient to have:

1. The individual 13 values for areal % used to obtain each of the
mean and extreme values in tables 4, 7 and 9.
2. The data you used in the significance tests you quote below.
Delimited text files are best.
3. A description of the method you used to determine your significance.

I am assuming that the return period is a deterministic function of
areal % and so additional tests of significance will be redundant. If not, the respective data
for return period would also be of interest.

The results you quote below were interesting and I would like to
resolve any conflicting results that arise.
I note that your quoted significances reconcile with your claims that
“more declarations would be likely, and over larger areas, in the SW,
SWWA and Vic&Tas regions, with little
detectable change in the other regions.”

Many thanks in advance.

Dear David,

I’m not able to hand over the data from the 13 models, due to restrictions on Intellectual Property, but I can describe the methods used to determine statistical significance.

Dewi Kirono says:

· I have used a number of statistical tests (parametric and non-parametric) and found that most of them show agreement. I used the 5% significance level. One marginal case was the change in percentage area for exceptionally low rainfall in NSW, in which the T-test was insignificant at the 5% level while Kolmogorov-Smirnov test was significant 5% level. I feel the non-parametric test is more objective since it doesn’t assume a Normal distribution.

· For the percentage area (temp, rain), the 13-model-mean sample is the 108 yr time series for 1900-2007 and the 31 yr time series for 2010-2040. For percentage area (soil moisture), the sample is the 50 yr time series for 1957-2006 and the same 50 yr time series modified for a period centred on 2030

· For the frequency (temp and rain), the sample is the number of models (13) as each period (i.e. 1900-2007 and 2010-2040) only produces one return period value.

· For soil moisture frequency, I cannot perform the test as we only have one value for the obs (1957-2006).

· At the moment I’ve only applied the tests to the “mean” data, not the “90th” and “10th” percentiles. This is because we cannot do that for soil moisture and because we deal with lots of zero values for the 10th percentile.

Regards

Kevin Hennessy

I then asked for further clarification of how the statistical tests were performed, and asked again for the data.

Further explanation of the statistical tests reveals that they consisted simply of a comparison of the means for two time periods, with the % area in each individual year treated as a single data point. This simple test assumes that the data points are independent, an assumption that autocorrelation makes unjustified. Failing to account for autocorrelation grossly overstates the ability of the test to detect significant differences (see my post Scale Invariance for Dummies or Chapter 10 of my book). See also the results of Breusch and Vahid 2008 from the Draft Garnaut Report (reviewed here). There, t-statistics for the rate of temperature increase dropped from more than 4 to less than 2 when autocorrelation was taken into account.

I don’t know the exact autocorrelation, as I can’t get the data. However, the temperature and rainfall variables that produce these data are highly autocorrelated (or ‘bursty’), and so these data must inherit that character.
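The following minimal sketch shows why ignoring autocorrelation matters. It uses simulated AR(1) series rather than the unavailable CSIRO data. Both periods come from one series with no change in mean, so every "significant" result is a false positive. The group sizes of 108 and 31 years follow the replies quoted in this post. The lag-1 autocorrelation of 0.7 is an arbitrary assumption. The sketch also shows the usual AR(1) approximation for effective sample size, n(1 - rho)/(1 + rho).

```r
set.seed(2008)
rho    <- 0.7   # assumed lag-1 autocorrelation (hypothetical)
n_past <- 108   # 1900-2007
n_fut  <- 31    # 2010-2040
n_sims <- 2000

false_positive <- replicate(n_sims, {
  # One continuous AR(1) series with constant mean, split into two periods
  x <- as.numeric(arima.sim(model = list(ar = rho), n = n_past + n_fut))
  past   <- x[seq_len(n_past)]
  future <- x[n_past + seq_len(n_fut)]
  t.test(past, future)$p.value < 0.05
})

# The nominal rate is 5%; with positive autocorrelation it comes out much higher
print(mean(false_positive))

# Approximate effective sample sizes under AR(1)
n_eff <- function(n, rho) n * (1 - rho) / (1 + rho)
print(round(c(past = n_eff(n_past, rho), future = n_eff(n_fut, rho)), 1))
```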

Dear David,

Answers to your questions are embedded in the email below.

Regards

Kevin Hennessy



-----Original Message-----

From: David Stockwell [mailto:davids99us@gmail.com]

Sent: Tuesday, 15 July 2008 10:37 AM

To: Hennessy, Kevin (CMAR, Aspendale)

Subject: Re: Exceptional circumstances report supplementary information

Dear Kevin,

Thank you for relaying the description of the significance tests.
Just to be clear on what you did:

Was the standard deviation or the standard error of
the 13 model averages over the future periods used to determine
the significance of the decreasing areal % of rainfall and soil
moisture?

No. For rainfall and temperature, the tests were performed to assess
differences between means from two groups: group A (1900-2007) VS group
B (2010-2040) for temperature and rainfall. Thus, groups A and B had 108
and 31 data points, respectively. For soil moisture, group A (1957-2006)
and group B (50 yrs centred on 2030) both had 50 data points. We also
did a quick test of whether the results were sensitive to treating the
13 models separately or as multi-model means, and the answer is no.

Was the 108 yr time series for 1900-2007 one or 13 points?

One point (N = 108). See above.

Was the 31 yr time series for 2010-2040 13 points and used to get the
SD?

One point (N = 31). Yes.

Was a one or two-tailed t-test applied?

Both. They suggest the same conclusions, either above or below the 5%
significance level.

IMO the 13 values of model predictions are summaries of output that
are necessary to make a determination of the quality of evidence in
your report. Whether they are covered by IP or not, it would be in
the best interests of science for you to allow them to be used in an
independent check.

As you might be aware, using the minimal data provided in your report,
I determined that only one region showed significance for % area
exceptional low rainfall (SWWA) and only two areas showed
significance soil moisture (SWWA and Vic&Tas).

http://landshape.org/enm/dought-exceptional-circumstances-review/

This is fewer than the 3 areas and 4 areas where you claim
significance (SW and NSW being the additional areas). I would like
to reconcile this difference and the quickest way would be for me to
check it with the data.

Thanks again for your cooperation up to this point, and I hope you
will reconsider your decision regarding the data.

As indicated in my previous email, we are not at liberty to distribute
the data due to IP limitations. We have checked our data and believe the
results and conclusions are correct.

Drought Exceptional Circumstances Reply

Today I received a reply from CSIRO regarding the Drought Exceptional Circumstances Report. I was very pleased that Mr Hennessy expressed an interest in providing the data needed to check the results. I will refrain from further comment until I get the data and analyze it.

-----Original Message-----

From: David Stockwell

Sent: Monday, 7 July 2008 1:56 PM

To: Enquiries

Subject: Exceptional circumstances report supplementary information

Could you please direct this enquiry to the appropriate person.

I am interested in obtaining the supplementary information for the Exceptional Circumstances Report. In particular, I would like to obtain the results for the individual 13 models used in the summary tables 4, 7, and 9. If possible, I would like information on the tests that were conducted to determine the statistical significance of projected increases in % area of temperature, rainfall and soil moisture, supporting such statements as follows:

http://www.abc.net.au/rural/news/content/200807/s2296263.htm

“A new report is predicting a dramatic loss of soil moisture, increased evaporation and reduced ground water levels across much of Australia’s farming regions, as temperatures begin to rise exponentially.”

I also note from the BOM site that the supplementary information is not due to be released until 31 July. Would you please let me know why the detailed information necessary for checking the report has not been released in conjunction with the main report.

Regards

The response was as follows:

Dear David,

Thanks for your enquiry.

Firstly, the report (with typos fixed) and Supplementary Information are available at http://www.bom.gov.au/climate/droughtec/ . I have asked DAFF to provide a link to the Supplementary Information.

Secondly, some of the media reports have misinterpreted the findings of the report. We have little control over this.

Thirdly, the Terms of Reference (Appendix 1) state that “it will be presented in a form that will enable it to be used in future drought policy discussions, including stakeholder consultation”. Our first draft of the report was considered too technical by the client (DAFF), since the target audience is for lay-people, so we had to spend considerable time simplifying the language, diagrams and tables. Therefore, statistical tests and results from individual climate models were not presented.

However, in response to your request, we have undertaken further work. I can confirm that for the mean scenarios:

  • the % area for exceptionally hot years is significant at the 5% level for all regions
  • the return period for exceptionally hot years is significant at the 5% level for all regions
  • the % area for exceptionally low rainfall years is significant at the 5% level in the SW, SW WA and Vic&Tas regions
  • the return period for exceptionally low rainfall years are significant at the 5% level in the SW, SW WA and Vic&Tas regions
  • the % area for exceptionally low soil moisture years is significant at the 5% level in the SW, SW WA, NSW and Vic&Tas regions
  • the return period for exceptionally low soil moisture years cannot be calculated since the “past” sample has only 1 value (for the period 1957-2006)

I have also attached sample plots of individual model results for % area with exceptionally low rainfall years in SW WA. We’d rather not plot any more regions or variables until we know whether this type of plot is of any use to you.

Regards

Kevin Hennessy

Principal Research Scientist

Climate Change Research Group

Centre for Australian Weather and Climate Research

A partnership between CSIRO (Marine and Atmospheric Research) and the Bureau of Meteorology

PB1 Aspendale Victoria

Attachment: tes1.JPG