Antarctic Snowfall Data Visualisation

The issue in question here is whether recent snowfall at Law Dome is unusually high relative to the 750-year-long record (and therefore, so the argument goes, probably due to AGW).

Below is the snowfall at Law Dome from the ice core. The upper panel shows the actual snowfall, and the lower panel shows the cumulative sum of the series minus its mean (using the R function cumsum), indicating where snowfall is above or below average.

[Figure fig1LD: Law Dome snowfall (upper panel) and cumulative sum of the series minus its mean (lower panel)]
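
For anyone who wants to reproduce these panels before grabbing the full script linked further down, here is a minimal R sketch. The file name and column names (lawdome.csv, year, snowfall) are placeholders of mine, not necessarily the names in the actual data file.

# Minimal sketch of the two panels above; file and column names are placeholders.
ld <- read.csv("lawdome.csv")
anom <- ld$snowfall - mean(ld$snowfall)      # departure from the long-term mean
cum_anom <- cumsum(anom)                     # rises while snowfall is above average
par(mfrow = c(2, 1))
plot(ld$year, ld$snowfall, type = "l", xlab = "Year", ylab = "Snowfall")
abline(h = mean(ld$snowfall), col = "red")   # long-term mean
plot(ld$year, cum_anom, type = "l", xlab = "Year", ylab = "Cumulative anomaly")
abline(h = 0, col = "red")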

This simple approach is not used in the paper. While the accumulation of snow at present (relative to the mean) is high, it was just as high earlier in the record, around 1400. Figure 3 in the paper shows a 10-year Gaussian filter of these data, so I illustrate that below, with a similar result.

[Figure fig1LD5: as above, with a 10-year Gaussian filter (half-width 5) applied]
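
The filter I used is along these lines; I do not know exactly how the authors implemented theirs, so the kernel details (weights truncated at three half-widths) are my own choice.

# A simple Gaussian smoother; hw is the half-width in years, so the paper's
# 10-year filter corresponds to hw = 5. Truncating the kernel at 3*hw is my choice.
gauss_smooth <- function(x, hw) {
  k <- dnorm(-(3 * hw):(3 * hw), sd = hw)
  as.numeric(stats::filter(x, k / sum(k), sides = 2))
}
smoothed <- gauss_smooth(ld$snowfall, hw = 5)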

The approach in the paper records the size of filtered accumulation ‘events’. These would be the (approximately 60) areas under discrete sections of the curve in the upper panel of the figures, positive if above and negative if below the mean (red line). The event of most interest is the last one, where snow has accumulated since 1970, after filtering.
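
My reading of the event calculation, as an R sketch carrying on from the snippets above; this is a reconstruction from the text of the paper, not the authors' code.

# Split the smoothed anomaly at its sign changes and sum within each run.
sm_anom <- smoothed - mean(ld$snowfall)       # smoothed series minus the long-term mean
ok <- !is.na(sm_anom)                         # drop the NAs at the filter edges
runs <- rle(sm_anom[ok] >= 0)                 # consecutive runs above/below the mean
grp <- rep(seq_along(runs$lengths), runs$lengths)
events <- tapply(sm_anom[ok], grp, sum)       # integrated anomaly for each event
length(events)                                # number of events
tail(events, 1)                               # the final, post-1970 event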

I start to get concerned with an approach like this because it is unfamiliar to me, and I don’t recall it being characterized anywhere in the statistics books. So I check the distribution. The figure below shows the distribution of these events and compares them to a normal distribution.

[Figure fig2D: distribution of event sizes compared with a normal distribution]

The distribution of event sizes appears to be leptokurtic (peaked, with fat tails), although the supplementary information states that the distribution at a Gaussian half-filter size of 5 did not differ from the normal distribution. I have to check whether this deviation is real or an artefact of the binning of the data.
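
The kind of checks I have in mind, using the events computed above; this is a rough sketch, and the kurtosis figure is the crude moment ratio rather than a formal test.

# Rough checks on the shape of the event-size distribution.
qqnorm(events); qqline(events)                # fat tails show up as curvature at the ends
m2 <- mean((events - mean(events))^2)
m4 <- mean((events - mean(events))^4)
m4 / m2^2 - 3                                 # excess kurtosis; > 0 suggests fat tails
shapiro.test(events)                          # base-R normality test for comparison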

Finally, I looked at the improbability of the final event by calculating the set of events and their standard deviations for a range of filter sizes:

[Figure fig2LDL: size of the final event in standard deviations, for a range of Gaussian half-filter sizes]

The dashed red line is the 2-sigma level, and the dotted red line is the 3-sigma level. The size of the final event becomes 3-sigma significant at around half-filter size 4 and declines thereafter. In other words, the final event is significant at scales between about 8 and 40 years.
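
The scan itself is straightforward to sketch, reusing the helpers above; again, this is my reconstruction rather than the paper's code.

# Size of the final event, in standard deviations of the earlier events,
# for a range of Gaussian half-widths.
event_sizes <- function(x, hw) {
  sm <- gauss_smooth(x, hw) - mean(x)
  ok <- !is.na(sm)
  runs <- rle(sm[ok] >= 0)
  tapply(sm[ok], rep(seq_along(runs$lengths), runs$lengths), sum)
}
hws <- 1:20
sigmas <- sapply(hws, function(hw) {
  ev <- event_sizes(ld$snowfall, hw)
  tail(ev, 1) / sd(head(ev, -1))              # final event relative to the rest
})
plot(hws, sigmas, type = "b", xlab = "Gaussian half-width (years)", ylab = "Final event (sigma)")
abline(h = 2, col = "red", lty = 2)           # dashed 2-sigma line
abline(h = 3, col = "red", lty = 3)           # dotted 3-sigma line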

Here is a link to the R code and data used to generate these plots.

Tas van Ommen has been a really good sport in providing data and information to allow me to check the figures. The relevant passage from the paper is as follows:

Long-term climate variability
In assessing the degree to which recent trends are natural or anthropogenic, it is useful to consider the size of the anomaly in the context of 750 years of data (Fig. 3). We define decadal-scale anomalies as the difference between the ten-year Gaussian smoothed series and the long-term mean, and use the sign changes in this anomaly to define successive anomalous intervals. The size of the event is the integrated accumulation anomaly over each interval. We find the event that commenced in 1970 is the largest in the record. It is significantly larger than the distribution of the 56 other events (P=7×10−4, 1-tail t-test; Supplementary Information). Only a single such event is expected in around 38,000 years for a climate unchanged from that of the past 750 years. P values were computed using one-sided t-tests.

In contrast, I get a significance of 99.7%, or P=3×10−3: an expectation of one such event every 333 years, or about twice in the record, a similar level to the accumulation plot I presented first up.

There is an inconsistency in the passage quoted above. By my calculation, a P=7×10−4 event is 1/0.0007, equalling an occurrence every 1,428 years, so I am not sure where the figure of 38,000 years comes from.
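
For what it is worth, the return-period arithmetic, assuming one independent opportunity per year:

1 / 7e-4   # about 1,430 years for the paper's P value
1 / 3e-3   # about 333 years for my estimate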

Another puzzle is that the statement “It is significantly larger than the distribution of the 56 other events (P=7×10−4, 1-tail t-test; Supplementary Information)” indicates the test is based on the size of the final accumulation event. However, the relevant statement in the supplementary information suggests this figure comes from a test of a difference of distributions.

Law Dome 750-Year Accumulation Statistics
We computed decadal scale anomalies for the LD precipitation series as described in the main text, and found that the largest anomaly was the most recent decadal event, which began in 1970. This event appears to belong to a population with a larger mean than the remaining 56 decadal events – we reject the null hypothesis (that it comes from the same distribution) using a 1-tailed t-test, (P = 7×10-4).

The test of the size of the final event (comparable to the test I performed above) is described in the following paragraph. The result is an order of magnitude different from my replication.

The 56 anomalies in the period 1250-1969 have a mean precipitation of –0.0188 m and standard deviation, 0.719 m (ice-equivalent, i.e. for glacial ice density 917 kg.m-3). The post 1970 anomaly is 2.42 m (i.e.), which is extremely improbable from the distribution of 1250-1969 anomalies (P = 3.4×10-4, normal z-score).
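
As a quick sanity check, the P value in that paragraph does follow from the quoted mean, standard deviation and anomaly as a one-tailed normal probability (my arithmetic, not theirs):

# z-score of the post-1970 anomaly against the quoted 1250-1969 distribution.
z <- (2.42 - (-0.0188)) / 0.719
z                               # about 3.4
pnorm(z, lower.tail = FALSE)    # roughly 3.5e-4, essentially the quoted value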

It seems that the significance of the event might be based on the distribution of the (possibly filtered) annual snowfalls during that event. If so, put your statistician on danger money! I will try to replicate the stated values more closely tomorrow.

Even though this is an initial visual examination of the data, there are some questions that need to be looked at more closely:

Where do the test values and the 38,000-year figure come from?

Are the data really leptokurtic, meaning that even my initial estimate of P=0.003 is inflated (that is, the significance is overstated)?

The snowfall at Law Dome does not look particularly impressive here, given that the present accumulation of snow is not greater than at other times in the 750-year record.

Reference:

van Ommen, T. D. and V. Morgan (2010). Snowfall increase in coastal East Antarctica linked with southwest Western Australian drought. Nature Geoscience, Advance Online Publication, doi:10.1038/ngeo761.


Comments on “Antarctic Snowfall Data Visualisation”

  1. I thought I knew what was tested from the term “integrated accumulation anomaly”. But it can’t be that. Perhaps Tas will clear it up, as well as the 38,000 year event.

    • Math error, statistical error, logic error — you can call it any of those. It appears to be the relatively common problem of assuming normality and extrapolating to assume something is a very rare event when it isn’t.

  2. Given the subjective impression that Perth weather shows up in Adelaide a day or two later, then another day to Melbourne, would not the hypothesis be strengthened by comparing these locations with Law Dome as well? If the correlations between Perth and Adelaide are worse than Perth and Law Dome, I’d be a bit worried.

  3. Hi David,

    A few comments on your post (Antarctic Snowfall Data Visualisation).
    First, I’m more than happy to see others play around with the data, and have a moment free right now, but I can’t necessarily respond to any and every post, and I don’t check your blog frequently – I just happened across this today.

    You plot something you call the accumulation of the series (in glaciology, confusingly, we call the original precip series itself snowfall accumulation) – in any case it is the cumulative sum of the original accumulation series minus its mean, then centred to zero mean. This is the integral of the snowfall anomaly, and indeed we have looked at it, but for the purposes of this paper it isn’t the quantity of direct interest. It is, actually, of glaciological interest, because it corresponds to the snowfall mass-balance anomaly and, assuming ice flow response is slow (which it is) compared with the fluctuations, we expect to see surface height variations. Indeed there is evidence from old survey work on Law Dome, and from modern GPS and repeat measurements, that the surface height has increased during the present positive growth anomaly.

    Anyway, since the cumulative sum you plot is the _integral_ of precipitation rate the high and relatively steady levels in the late 1300s-1400s correspond to a steady snowfall (i.e. derivative of this curve ~0) about the same as the long term AVERAGE – i.e. snowfall rate is not high. If you look more closely, this positive anomaly results from excess snowfall events over the period AD1316-1355 when, in a couple of separate spurts, about +1.8m was added over 39 years.

    This is to be compared with the modern-period positive anomaly at the end, in which, in a single spurt of 36 years, +2.4m was added. Note that this is of the same order as the height increases seen in survey work.

    I understand that the approach I’ve used is unfamiliar to you and indeed this is what research is all about – doing novel things. The key of course is not to reinvent wheels. In doing the work, I have had input from professional statisticians as well as climate professionals. So, on to the matter of distributions – you will know of course that from any small sample, the task of discerning the shape of a parent distribution and deciding if it is normal is not a straightforward issue. This is why, as stated in the paper, I did some appropriate tests – a Q-Q plot, which is better than just plotting the binned data, a Lilliefors test and Kolmogorov-Smirnov test, which all indicated the validity of a normal distribution. EVEN SO, I then also repeated my significance tests using a non-parametric Wilcoxon test, to avoid the assumption of normality.

    Finally, to the issue of probabilities. The P=7×10^-4 value is the confidence with which a t-test isolates the post-1970s anomaly as NOT coming from the same distribution as the other 56 anomalies. It is a parametric test – you will also read that using the non-parametric Wilcoxon rank-sum, the value falls to P=0.04: still significant, albeit not such a stand-out.

    The 38000 year number comes from the converse argument – say the event is in the same distribution and ask how unlikely it would be. As shown in detail in the supplementary data, the distribution of events has mean -0.0188m and sigma=0.719m, which yields P=3.4×10-4, or once in 2930 occurrences. Here occurrences aren’t yearly, but correspond to our decadal anomalies which (as stated in the supplementary information) occur every 12.86 years in our record. 12.86×2930 should give the 38000 year figure. All of which is not meant to do more than illustrate the unusual nature of this in an otherwise stationary climate (which of course is not something we have over 38000 year periods!).

    By the way, as I dug back into my notes and calculations, your email prompted me to find a typo in my Supplementary Information – it has P=3×10^-4, where it should have the more significant P=3×10^-5. This will be fixed.

    • Hi Tas, Thanks for the clarifications. It’s clear you have put thought into the statistics, which makes it a more interesting paper to dissect, and given the data is simple, it’s easy for people to get into.

      Probably I will put together a summary of the main issues that remain at the end, as right at the moment I mainly want to understand what has been done. The fact that these events are of varying lengths adds a lot of parameters, and is a concern. In fact, the most recent ‘event’ appears longer than most, and so of course would have a higher cumulative value.

      When the data are aggregated with a constant interval, the significance drops. When you add in smoothing and procedures like this, I don’t think you can necessarily clear up concerns by just testing the distribution, because the data set is finite. No doubt others would have something to say about it.

      Ultimately I think that if it’s a robust result, it will hold up under different approaches. If the result only appears with a specific approach and set of parameters, that’s a problem, and attempting to justify a novel technique just takes up a lot of time and energy. As they say, if you torture data enough it will tell you what you want to hear.

      So I think one should say, let’s look at what we have got with the simplest possible approach, considering the validity of the basic assumptions.

      But I don’t have time for discussion of methodology of science, so I am not expecting a reply, just letting you know why I am doing what I am doing. There are still some of these numbers I have to replicate, and your explanation helps a lot. Thanks.


