Creating a Statistical Model with a Cherry-picking Process

Steve McIntyre, always gracious in his acknowledgments, mentioned my note in the Australian Institute of Geologists newsletter (AIG News No. 83, March 2006, p. 14) in a post yesterday, “The Full Network”.

We’ve discussed on many occasions that you can “get” a HS merely from picking upward-trending series from networks of red noise (David Stockwell had a good note on this phenomenon on his blog a couple of years ago. My first experiments of this type were on the cherry picks in the original Jacoby network.)

This note, published way back in May 2006 (citeable but not peer reviewed), was probably the first of my posts to get picked up by other blogs, such as the American Thinker. The graph of reconstructed temperature anomalies over 2000 years, generated using completely random numbers with autocorrelation, bears a strong resemblance to other published reconstructions, particularly in the prominent ‘hockey-stick’ shape, the cooler temperatures around the 1500s, and the Medieval Warm Period around the 1000s. This demonstrates that the dendroclimatological method of choosing proxies based on correlation with the reference period (aka cherry-picking) will generate plausible climate reconstructions even from random numbers.

This undermines the credibility of proxy reconstructions built with this selection process, particularly where this source of uncertainty has not been recognized and confidence intervals have not been expanded to incorporate the additional uncertainty.
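
To make the selection effect concrete, here is a minimal R sketch of the process described above. The parameters (AR(1) coefficient, number and length of series, calibration window) are illustrative assumptions, not the exact settings of the AIG note:

```r
# Sketch: a hockey stick from trendless red noise plus selection.
# All parameters are illustrative assumptions.
set.seed(1)
n.years  <- 2000              # length of each pseudoproxy
n.series <- 1000              # number of red-noise series
cal      <- 1901:2000         # "instrumental" calibration window
temp     <- scale(1:100 + rnorm(100, sd = 3))[, 1]  # rising temperature series

# Trendless AR(1) ("red") noise, one series per column
proxies <- replicate(n.series, arima.sim(list(ar = 0.9), n = n.years))

# Keep only series that slope upward and correlate significantly
# with the calibration-period temperature
p.vals <- apply(proxies[cal, ], 2, function(x) cor.test(x, temp)$p.value)
slopes <- apply(proxies[cal, ], 2, function(x) coef(lm(x ~ temp))[2])
keep   <- which(p.vals < 0.05 & slopes > 0)

# The "reconstruction" is the mean of the survivors: a flat handle
# reverting to the common series mean, with an instrumental-era blade
recon <- rowMeans(proxies[, keep])
plot(recon, type = "l", xlab = "Year", ylab = "Anomaly (arbitrary units)")
```

By construction the selected subset tracks the calibration upswing, while outside the calibration window the survivors are mutually uncorrelated, so their average reverts toward the common mean and yields the flat handle.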

Comments on “Creating a Statistical Model with a Cherry-picking Process”

  1. Pardon me, but the “hockey stick” “blade” is forced by the … ohhh, “cherry picking”, shall we say? …. of “red noise” data series that correlate to the instrumental temperatures. Once you’ve done that, what you have is no longer random, but in fact highly correlated with the instrumental data record (which shows a sharp positive slope in the 20th century time frame; the “blade” region).

    What happens away from that is that as this correlation gets attenuated (due to the weak temporal correlation of the “red noise”), the variance increases and the data settles to an “arbitrary” (to use your word) baseline. Selecting different parameters would result in a different baseline far away from the “blade” region. Which is not exactly what the MWP is either; it is an anomalous warmer period with lower temperatures on either side, or at least that’s what people are saying.

    In addition, as expected for low-pass filtered random series (your “red noise”), your reconstructions undershoot the instrumental data at the early end, and overshoot the instrumental data at the late end. This feature is not present in the Mann reconstructions.

      • If proxies contain a temperature signal, then they *are not* random red noise. Random red noise is the null hypothesis and proxies – as legitimate temp signal – are the alternative hypothesis. Since calibrating random red noise to insturmental data generates a hockey stick, this is the de facto null.

        If calibrating random red noise to insturmental temperatures generates a “hockey stick” shaped series, then if you are trying to demonstrate that climate proxies *are not* random red noise, then

      • Sorry David, I did not tidy up my post before submitting. Everything after the first paragraph should have been deleted.

      • As I said above, “red noise” is not “random”. It, at the very least, contains “information” in the structure of the noise spectrum (I’ll pass over the meta-discussion as to whether purely random noise is also ‘non-random’ in having the explicit structure [which is testable] of pure randomness….)

        Besides the bandpass characteristics of “red noise”, it may well have (extreme) LF attenuation as well; I suspect that all the numbers that Stockwell generated fell in a certain range (although his “Methods” section is not explicit on this), or there was normalisation, so that there was no DC bias (i.e., “component”) in these series either.

        The absence of a DC component (or LF components) forces the regression away from the calibration towards an “arbitrary” (to use Stockwell’s word) baseline, giving you the “stick” portion of the “hockey stick”.

        We know two things about the “red noise” series: first, that they’re highly correlated with the instrumental data for the instrumental data period, and second, that they are not correlated with the instrumental data outside this time frame (or more specifically, that any such correlation is more and more attenuated the farther away, although Stockwell doesn’t report his parameters for the “red noise” generator, so we don’t know how far). What we’re saying is that we get a “hockey stick” where there is imperfect temporal correlation and where two different mechanisms pertain in two regions of the reconstruction. This is entirely consistent with physical changes in this timeframe as well, say, oh, CO₂ staying pretty much stable up to the industrial period, and then taking a sharp increase as we start to burn more and more fossil fuel….

      • We could refer to it as trendless red noise, or a random walk if you prefer.

        Consider the null case which can be simulated. Trendless “red” psuedoproxies yield some series which correlate with insturmental data and some which do not. The calibration techniques of reconstructions select those series which by chance mirror the insturmental data. By definition of the simulation, these are obviously spurious correlations. The selected psuedoproxies can have no value to reconstruct past climate in spite of correlating with the insturmental series.

        If you add a signal to the psuedoproxies – both in the calibration period and the pre-insturmental period – the calibration selects both true and false predictors of the historical signal. Yet the process assumes 100% certainty in selecting true proxies, and therefore the historic climate signal will be attenuated by the selected trendless red noise, while the climate of the insturmental/calibration period is unaffected. The amount of attenuation of the historic signal would depend on the SNR.
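
        A minimal R sketch of this attenuation argument (the signal shape, noise level, and the CPS-style variance matching are illustrative assumptions, not any particular paper’s method):

        ```r
        # Sketch: signal + red noise, select on calibration correlation,
        # then scale to calibration variance; the historical "bump" shrinks.
        set.seed(2)
        n.years <- 1000; n.series <- 500
        cal    <- 901:1000                            # calibration window
        signal <- c(rep(0, 400), rep(1, 200), rep(0, 300),
                    seq(0, 1, length.out = 100))      # bump = "MWP", ramp = modern
        noise  <- replicate(n.series, arima.sim(list(ar = 0.9), n = n.years))
        proxies <- noise + signal                     # same signal in every series
        temp    <- signal[cal]                        # "instrumental" record

        keep <- which(apply(proxies[cal, ], 2, function(x) cor(x, temp)) > 0.3)

        # CPS-like step: scale each survivor to the calibration-period variance
        scaled <- apply(proxies[, keep], 2, function(x) x * sd(temp) / sd(x[cal]))
        recon  <- rowMeans(scaled)

        mean(recon[401:600]) - mean(recon[1:400])     # recovered bump vs true value 1
        ```

        Selection favors series whose noise happens to reinforce the calibration ramp, so the variance matching treats that noise as signal and scales it away; the recovered bump comes out well below its true height, with the shortfall growing as the SNR falls.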

      • Trendless “red” psuedoproxies yield some series which correlate with insturmental data and some which do not.
        Yes. The ones that do mirror some component — some information — in the instrumental data.

        The calibration techniques of reconstructions select those series which by chance mirror the insturmental data.
        They don’t “by chance mirror the insturmental [sic] data”. They mirror it because you (“cherry”?)picked them for that very reason.

        By definition of the simulation, these are obviously spurious correlations.

        No. You’ve said (probably with some degree of truthfulness) that climate data show at least “red noise” characteristics; they show some time persistence. Therefore you picked “red noise” rather than “white noise”. And apparently (from another post here), the actual parameters of the “red noise” distribution were picked to match climate data as well. Then you picked those “red noise” sequences that actually correlated well with actual instrumental temperature data over a certain period, further departing from your supposed randomness (or “spuriousness”). How can you call the match spurious when you made sure as best you could that you would see a match?

        The selected psuedoproxies can have no value to reconstruct past climate in spite of correlating with the insturmental series.

        How so? Let’s say, for argument, there was a constant long-term trend in the actual temperatures. The instrumental data reflect this long term trend (very LF signal), and IF we had collected instrumental data outside the calibration period, we would have seen this. Let’s say also that our pseudoproxies are LF-weighted; they are not random but contain a significant LF spectrum. The ones that match the phase and amplitude of the LF components in the instrumental data would be selected for, and you’d have “long term” sequences left that mirrored the actual long term (LF) components of the instrumental data in the period you didn’t calibrate against. IOW, you’re essentially doing filtering of your data in a somewhat roundabout way, and saying that your ‘filtered’ data match what was found using other methods, at least as to some particulars (such as LF components: the “hockey stick” blade and handle).

      • Sorry about formatting; seems that HTML tag facility here is crude and there’s no “preview”….

      • “How can you call the match spurious when you made sure as best you could that you would see a match?”

        This is a calibration exercise that uses correlations with insturmental to select predictors of historical climate. How can selected predictors in a psuedocase of trendless red noise (and therefore no connection with climate) be considered anything but spurious?

        I’m sorry, I don’t see how the example in your second paragraph shows how my example of calibrating trendless red psuedoproxies (with actual paleo methods) is going to successfully select psuedoproxies which mirror simulated real life pre-insturmental. IMHO you would have to change the calibration method, or change the nature of the psuedoproxies (the degree of persistence of autocorrelation), or both. In any event, characteristics of the actual proxies used put constraints on data simulations.

        Jeff Id has already shown that the calibration method used in Mann 08 is sensitive to the red noise characteristics of the actual proxies he selected from (Sorry David, I’m not as familiar with your work 🙂 ) by calibrating Mann’s proxies against fake temp signals.

      • “[Guest]: How can you call the match spurious when you made sure as best you could that you would see a match?

        “[Layman Lurker]: This is a calibration exercise that uses correlations with insturmental to select predictors of historical climate. How can selected predictors in a psuedocase of trendless red noise (and therefore no connection with climate) be considered anything but spurious?”

        Because you’ve tossed all the ones that don’t calibrate. If you had kept them all, you’d look at them and say that the small subset of those that do ‘match’ were spurious. Once you’ve deliberately chucked the rest as not ‘matching’, the ones that do ‘match’ are not “spurious” at all, but in fact are exactly what you were looking for.

        “I’m sorry, I don’t see how the example in your second paragraph shows how my example of calibrating trendless red psuedoproxies (with actual paleo methods) is going to successfully select psuedoproxies which mirror simulated real life pre-insturmental.”

        Pardon me, but I’m having a bit of a time taking you seriously when you continually misspell “insturmental” and “psuedoproxies”. Just a little nit, but it suggests perhaps less familiarity with the subject than might be expected.

        In answer to your question, whether or not the “red noise” series ‘mirror’ the real-life instrumental data depends on what that real-life instrumental data is, doesn’t it? If the real-life instrumental data approximates a flat-line, no-trend regime prior to the calibration period, then enough random series with no DC component (i.e. normalised to some “arbitrary” baseline mean) will average out to about the same thing: a flat line (with perhaps a deviation from that down towards the actual instrumental data at the beginning of the calibration region, due to the autocorrelation (low-pass) characteristics of the “red noise”). There’s your “stick” portion of the “hockey stick”. This is what happens when you have series with no “trend” (at least far away from the calibration region). But if you have real proxies with no such trend away from the calibration region (say, due [at least in part] to relatively constant CO₂ levels), you will get the same thing (but perhaps with less variance).

        IIRC, Mann et al. did Monte Carlo simulations to show “skill” of the proxies, comparing with “red noise” sequences:

        [Mann et al., PNAS, 2008]: Results from the early and late validation experiments were then averaged for the purpose of estimating skill metrics and uncertainties. So-called “reduction of error” (RE) and “coefficient of efficiency” (CE) skill scores for the decadal reconstructions were used as metrics of validation skill as in past work (20, 32). Because of its established deficiencies as a diagnostic of reconstruction skill (32, 42), the squared correlation coefficient r² was not used for skill evaluation. Statistical significance of RE and CE scores were estimated by Monte Carlo simulations based on the null hypothesis of first-order autoregressive “red noise” (20, 32). Only those reconstructions that passed validation at the P = 0.05 (i.e., 95% significance) level based on both metrics were retained.

        “Jeff Id has already shown that the calibration method used in Mann 08 is sensitive to the red noise characteristics of the actual proxies he selected from (Sorry David, I’m not as familiar with your work 🙂 ) by calibrating Mann’s proxies against fake temp signals.”

        Cites for this?
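
        For reference, the RE and CE scores quoted above from Mann et al. have standard definitions in the verification literature: RE benchmarks the reconstruction against the calibration-period mean, CE against the verification-period mean. A minimal sketch, with illustrative variable names:

        ```r
        # Verification skill scores (sketch). obs and pred are hypothetical
        # vectors over the verification period; cal.mean is the mean of the
        # instrumental data over the calibration period. Both scores equal 1
        # for a perfect reconstruction; values above 0 indicate skill over
        # the respective benchmark.
        re.ce <- function(obs, pred, cal.mean) {
          sse <- sum((obs - pred)^2)
          c(RE = 1 - sse / sum((obs - cal.mean)^2),
            CE = 1 - sse / sum((obs - mean(obs))^2))
        }
        ```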

      • You say:
        “where two different mechanisms pertain in two regions of the reconstruction.”

        But the generator of the series is the same throughout. As I say, the series that go into the hockey stick are 20% of a total that goes in all directions in the calibration period:

        “About 20% (204) of those sequences had a positive slope and a very high significance P(> |t|) < 0.001 in a simple linear regression of values against temperatures."

        You then say:

        "This is entirely consistent with [a theory of] physical changes in this timeframe"

        but my point is that this shape is forced by the selection methodology, and so cannot be used to support the theory, as that would be circular. Blind men are blind. You can't select a group of blind men and then use it to prove men are blind.

      • You say:
        “where two different mechanisms pertain in two regions of the reconstruction.”

        But the generator of the series is the same throughout.

        The “generator of the series” must include within this “generator” any subsequent filtering/selection/etc.

        To make this clear, let’s say that I ran a random number generator generating thousands of numbers from 0 to 99. Then I processed these numbers, selecting only those that exactly matched the lottery numbers drawn for that day. I then used these “validated” numbers as randomly selected numbers (which they were) and looked to see how well such a process selected that day’s lottery numbers. Strangely enough, I find that such a random process is significantly better than chance at picking lottery tickets. Would you agree then that “random” selection (as I — ummm, “modified” — it) is as good as any other process I could come up with in picking that day’s numbers? I could even use “white noise” random numbers, and I bet it would work….

        As I say, the series that go into the hockey stick are 20% of a total that goes in all directions in the calibration period:

        Yes. You’ve picked series that show LF trends matching the calibration data (which shows a sharp positive slope). Not all your series did that, and you tossed the ones that didn’t show this characteristic.

        You then say:

        “This is entirely consistent with [a theory of] physical changes in this timeframe”

        but my point is that this shape is forced by the selection methodology, and so cannot be used to support the theory, as that would be circular. Blind men are blind. You can’t select a group of blind men and then use it to prove men are blind.

        What I say is that you select series that must correspond closely to the sharply upturned calibration data in that region, and which show what is likely a LF bias towards a “baseline” (your word: “arbitrary”) outside the region where you demand the correlation. The regression towards baseline is probably inherent in your “red noise” generation: a LF bias but DC elimination (normalisation, standardisation, or range restriction on the randomly generated values) so that, far away from the region where correlation with the instrumental data is demanded, the time correlation effect is lost and the average of such series approaches the arbitrary baseline (or “zero”). Thus a regression towards baseline (along with more variance). But all pretty much as expected given how you selected your sequences. You chose values that would produce a “hockey stick”.

        My point is that if I say “A is a fruit” and you show that “B is a fruit”, it still doesn’t mean that “A is B”. You’ve managed to produce a “hockey stick” with not-quite-so-random data, and you’re saying that random data can account for what Mann saw, but I say that Mann’s data is consistent with the proxies having some intrinsic significance (which external research also shows to be plausible if not quite likely), and that the proxies do reflect a change in the slope of the temperature records in the last century (a “hockey stick”). Your simulations show this slope change too, but that’s because you only insist on the sharp positive slope in one region (the calibration region).

      • I think we are not disagreeing, and the following statements are not inconsistent:

        “you’re saying that random data can account for what Mann saw, but I say that Mann’s data is consistent with the proxies having some intrinsic significance (which external research also shows to be plausible if not quite likely),”

        The way to look at it is, the proportion of random proxies vs those with ‘intrinsic significance’ is unknown as it has not been controlled for. It may be impossible to control for it. That’s why you need proxies that are a priori temperature proxies without the selection process. The gist of the Nature exchange (which I still can’t find) is that if you regard the proxies as bona fide a priori then the 20th century increase is significant, but if you regard it as coming from a selection process, then the confidence limits blow out beyond the present temperature.

      • “The gist of the Nature exchange (which I still can’t find) is that if you regard the proxies as bona fide a priori then the 20th century increase is significant, but if you regard it as coming from a selection process, then the confidence limits blow out beyond the present temperature.”

        The 20th century increase is a given here (it’s what’s used to “calibrate” or “validate” (or both) the proxies for periods where the instrumental temperature records are not available). That is to say, the “blade” is there pretty much any way you want to look at it (unless you want to dispute the accuracy of the 20th century instrumental data, but that has nothing to do with the proxies).

        Whether this increase is unusual is a different question.

        It wouldn’t be unusual if what we’re seeing in the instrumental record is consistent with longer term trends.

        But that leads one to ask why the pseudoproxies are constructed with an apparent baseline for the data, which assumes that the data over the period far from the validation approximate (or average to) some “arbitrary” value. Why? Why not sequences that show LF/extremely-LF components that simply follow the slope of the 20th century instrumental data (and that don’t show the bent “hockey stick”)? The variability away from the calibration period should increase with “random” noise, I agree, but why does it settle out (with higher variance) at something akin to ‘average’ values for the calibration period?

        I think it’s because of restrictions on the range imposed on the generation of the “red noise”; otherwise weak [and getting weaker with distance but still non-zero] effects of the autocorrelation requirement [also an “ad hoc” restriction] would tend to produce plots that simply continued the ‘trend’ of the calibration period, which in fact your sample graph shows in any case for some short distance, but which seems to be overcome over longer distances with regression towards the “arbitrary” baseline. Putting in such range restrictions as seems to have been the case is another ad hoc choice and in fact the insertion of “information” into the “random” data: To wit, that there is some long-term baseline.

        Once you’ve assumed that with your random data generation, you will have the “hockey stick”, but that’s kind of like assuming your conclusion, which is that conditions in the far past were pretty stable (no changing “baseline”) and comparable to today’s. You take that (“stable” conditions outside the calibration period) and the forced “blade” to correspond to the instrumental temperatures, and your conclusion is unsurprisingly that we had stable temperatures a ways back, but a recent dramatic uptick. And this is true whether or not temperatures actually were stable, simply by your choice of assumptions in generating the “random” data.

      • Are you saying that the overshoot and undershoot are intrinsic to the “red noise” model (I’d agree with that; in fact that was my point), and that the observed temperatures and Mann’s reconstructions outside the ‘calibration’ instrumental data timeframe are just “small deviations”? What are the error bars on your “red noise” series (look pretty small near the instrumental data) and does the Mann reconstruction fall within them? If not, then something else besides pure red noise is going on, no?

      • Yes. There are a couple of comments in Nature that address your question about Mann’s reconstruction, and significance etc. If I can find them today I will post them FYI. My point was just that the hockeystick shape IS the null model, the expectation, and any claim needs to demonstrate deviation from a hockeystick shape, not the existence of a hockeystick shape (which would be circular).

      • Another way of looking at it: the over- and undershoot are intrinsic to the small sample of the calibration temperature series. Because the calibration temperature is largely increasing, a large (and unknown) proportion of spurious red noise series slip into the sample. If there were more ‘wiggles’ in the calibration, the proportion of ‘real’ temperature proxies would be much higher. The parameter representing the fraction of spurious red noise series is non-zero, and cannot be assumed to be zero.

      • Well, yes, the over- and under-shoot are intrinsic to a short series of sharply increasing values (which is forced by the requirement that the “red noise” series you … ummm, “cherry pick” … correlate strongly with the short instrumental temperature record). This forces a “blade”, and is unremarkable, because all we’ve done is essentially inject information into the “red noise” samples by insisting on that correlation to a “blade” in the instrumental series.

        The over- and under-shoot arise because of the use of “red noise” (another “cherry-picking” constraint of yours). “Red noise” isn’t information-free: It is essentially low-pass-filtered, so it has a structure that is not completely random. Part of that structure is the lack of HF components that would allow a sharp bend near the ends of the instrumental data region, so we have the under- and overshoots. But the take-home point here is: By insisting on “red noise” (and with specific structural parameters), we may simply be saying that we find that there are similarities between the proxy data and other series that have a pronounced LF enhancement (or equivalently HF attenuation). Which may be just because we’ve managed to “describe” the data (possibly in an ad hoc fashion) through the careful selection of our supposedly “random” proxy values.

        What do the simulations look like with “white noise” (pure random data)? With “red noise” of differing “saturation” (i.e., different attenuation slopes, cutoff frequencies, etc.)?

  2. These are small deviations from the null model of a hockeystick generated by the proxy selection process.

  3. Guest,

    You should read McIntyre and McKitrick 2005, GRL found here:

    http://climateaudit.files.wordpress.com/2009/12… (mcintyre-grl-2005.pdf)

    They go into considerable detail about how the red noise series were generated using autocorrelation functions derived from the actual tree ring proxy data. Using white noise would make no sense as the noise structure of the tree ring data isn’t white.
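
    In simplified outline (an AR(1) sketch only; MM05 matched the full autocorrelation structure of each tree-ring series, not just lag one), persistence-matched surrogates look like this:

    ```r
    # Sketch: surrogate red noise matched to a proxy's lag-1 autocorrelation
    # and variance. `proxy` is a hypothetical numeric vector of ring widths.
    make.surrogate <- function(proxy) {
      rho <- acf(proxy, lag.max = 1, plot = FALSE)$acf[2]  # lag-1 coefficient
      arima.sim(list(ar = rho), n = length(proxy),
                sd = sd(proxy) * sqrt(1 - rho^2))          # match the variance
    }
    ```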

    • Do you deny that “red noise” has an information component; that it has ‘structure’? And what you’re saying is that they ‘cherry picked’ the “red noise” that had certain parameters that matched the tree ring data?!?!? And what result do you get with “white noise”, eh?

      • The main concern is that the simulation matches the noise structure of the proxies. Proxies are reddened.

      • In other words, when you make the “red noise” look like the proxies (in at least some respects), you get the same results.

        When you say “noise structure”, that’s a bit of an oxymoron. What you’re saying is that the structure of your data (be it [formerly] “noise” or “proxy”) is matched.

        I don’t disagree that the “red noise” series have some randomness in them (I guess we can disagree as to how much). But I’d expect that you would agree that the proxies also have some ‘noise’ — some randomness — in them.

        The question in the end is whether it is the ‘randomness’ of your “red noise” series that produces the “hockey stick” or whether it is the constraints you place on your random noise (that is to say, the filtering you do, the structure you force, the ‘information’ you add).

      • Red noise is still noise. It does not contain information. It does have a power spectrum different from white noise, but that’s not information. Individual red noise series correlate with each other only by chance. If you have a red noise series and test it against 10,000 other red noise series at the 95% confidence level, you will find approximately 500 that correlate.

      • How can you say it has no information? Yes, it has a different power spectrum. So does a standard AM broadcast. An AM broadcast also has noise as well, but to say it doesn’t have information would be a bit absurd.

        “Individual red noise series correlate with each other only by chance. If you have a red noise series and test it against 10,000 other red noise series at the 95% confidence level, you will find approximately 500 that correlate.”

        Yes, but here the series that were picked are the ones that correlated with the instrumental temperature data, not with each other.

        And your statement that such series will correlate with each other 500 times at the 95% confidence level is a bit of a tautology, no?

        A better question is whether “red noise” series will correlate with each other more frequently (or more closely) than will “white noise” sequences. The answer is “probably yes”, because the autocorrelation reduces effective sample size (or violates the assumption of sample independence, as you choose to look at it), and a measure of correlation that uses sample size for confidence intervals will overestimate correlation of the “red noise” sequences.

      • The signal to noise ratio of red noise is zero. It contains no information. An AM broadcast has a signal to noise ratio greater than zero. It contains information. The trick is to extract the signal from the noise without distorting it. The argument is that the statistical methods used to reconstruct the past climate can produce what looks like a signal when there is no signal present and worse, attenuate any signal that is present.

        If you think that red noise series will correlate more frequently than white noise, why not do the math? It wouldn’t be difficult in R. I don’t have any interest in doing it because I’m reasonably certain you’re wrong. It’s up to you to do something other than make unsupported assertions.
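
        The suggested experiment is indeed a few lines of R. A minimal sketch (series length, AR coefficient, and trial count are assumptions), counting how often independent pairs pass a naive 95% correlation test:

        ```r
        # Sketch: naive correlation tests between independent series reject
        # far more often for red noise than the nominal 5%, because
        # autocorrelation shrinks the effective sample size the test assumes.
        set.seed(3)
        n <- 100; trials <- 2000
        p.red   <- replicate(trials,
                             cor.test(arima.sim(list(ar = 0.9), n = n),
                                      arima.sim(list(ar = 0.9), n = n))$p.value)
        p.white <- replicate(trials, cor.test(rnorm(n), rnorm(n))$p.value)
        mean(p.red   < 0.05)   # substantially above 0.05 in typical runs
        mean(p.white < 0.05)   # close to the nominal 0.05
        ```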

      • “The signal to noise ratio of red noise is zero. It contains no information. An AM broadcast has a signal to noise ratio greater than zero. It contains information.”

        Oh, does it?!?!? What if it is broadcasting “noise”? What if what they want to broadcast to researchers is a specific “noise” pattern to be used in blog articles?

        “The argument is that the statistical methods used to reconstruct the past climate can produce what looks like a signal when there is no signal present…”

        I understand that this is the ‘argument’. And I say that when you pick your “red noise” series carefully so that they end up matching your proxy data, and then throw away the series that don’t correlate strongly with your instrumental data, you may end up with a “reconstruction” that shares the LF characteristics of your proxy reconstructions. I will also say that the “red noise” reconstructions show a more uniform LF “stick” than the proxy reconstructions, and they show an overshoot and undershoot (possibly due to HF sparseness) near the ends of the calibration period, while both show a quite pronounced “hockey stick” “blade” because we insisted on very close correlation with instrumental data in this region in both cases.

        “If you think that red noise series will correlate more frequently than white noise, why not do the math?”

        No. I think that if the hypothesis is that random signal will produce “hockey sticks”, this ought to be tried with random data of all stripes (including “white noise”, and “red noise” of varying frequency/BP attributes). If we find that only specific types of “noise” do this (and that this is true of specific types of “noise” that carefully match LF characteristics of our proxies), we might ask ourselves if what we’re doing is simply putting in poorer proxies and seeing the same general features.

      • Obviously white noise would have very few simulations selected in the calibration period at the same level of significance. That is what the whole concern about spurious trends is over, so it’s hardly novel.

        The main reason that my graph is more uniform on the handle of the stick is that it’s composed of 1000s of proxies. Real studies have used far fewer proxies and are correspondingly more variable. The original Mann paper applied PCA incorrectly, but dating uncertainty and other things depress the red component, as has been well documented.

      • “Obviously white noise would have very few simulations selected in the calibration period at the same level of significance.”

        First, it’s not obvious. Second, even if so, NP: You just produce more sequences until you have enough that match. Then you look to see if they have the same “stick” portion. If they don’t, then it’s time to look at exactly what it is in your carefully selected “red noise” that matches the same general contours of the Mann reconstruction (I’ve already suggested a reversion to an “arbitrary” baseline inherent in the DC component [or lack thereof] in the “red noise”).

        “The main reason that my graph is more uniform on the handle of the stick is that it’s composed of 1000s of proxies.”

        Why don’t you show the “lumpiness” and/or “trend” of runs with the same number of proxies then?

        And can we agree that the “blade” that shows up tells us nothing particularly useful and that all we’re arguing about is what happens outside the calibration period (what kind of a “stick” if any shows up)?

  4. Picking a no-signal red noise series that ‘matches’ instrumental tells you exactly nothing about climate prior to that period (provided the “redness” is properly constrained by reality). The match is meaningless if it does not yield a valid proxy. You would agree that we are not ‘picking’ matches per se, we are using matches to ‘pick’ proxies. Yes?

    A question for guest: if calibrating trendless red noise (no added signal) always produces hockey sticks, no matter what the known or unknown temperature signal is, how do we know whether the actual sorted data does or does not contain at least some amount of red noise when the reconstruction is shaped like a hockey stick? Such a condition would necessarily attenuate the historical signal at least to some degree. As I said before, Jeff Id has a post which shows that the Mann 08 calibration is sensitive to the red noise of the actual data, by calibrating with fake temp data: http://noconsensus.wordpress.com/2009/06/20/hockey-stick-cps-revisited-part-1/

    In contrast, more recent posts from Jeff verify that the calibration method is relatively insensitive to Mann’s red noise simulations. So the salient question then becomes this: are Mann’s simulations consistent with the actual data used? http://noconsensus.wordpress.com/2010/07/19/9657/

    http://noconsensus.wordpress.com/

  25. “The jist of the nature exchange, (which I still cant find)is that if you regard the proxies as bona fide a priori then the 20 century increase is significant, but if you regard it as coming from a selection process, then the confidence limits blow out beyond the present temperature.”The 20th century increase is a given here (it's what's used to “calibrate” or “validate” (or both) the proxies for periods where the instrumental temperature records are not available. That is to say, the “blade” is there pretty much any way you want to look at it (unless you want to dispute the accuracy of the 20th century instrumental data, but that has nothing to do with the proxies).Whether this increase is unusual is a different question.It wouldn't be unusual if what we're seeing in the instrumental record is consistent with longer term trends.But that leads one to ask why the pseudoproxies are constructed with an apparent baseline for the data, that assume that the data over the period far from the validation approximate (or average to) some “arbitrary” value. Why? Why not sequences that show LF/extremely-LF components that simply follow the slope of the 20th century instrumental data (and that don't show the bent “hockey stick”? The variability away from the calibration period should increase with “random” noise, I agree, but why does it settle out (with higher variance) at something akin to 'average' values for the calibration period? I think it's because of restrictions on the range imposed on the generation of the “red noise”, otherwise weak [and getting weaker with distance but still non-zero] effects of the autocorrelation requirement [also an “ad hoc” restriction] would tend to produce plots that simply continued the 'trend' of the calibration period, which in fact your sample graph shows in any case for some short distance, but which seems to be overcome over longer distances with regression towards the “arbitrary” baseline. Putting in such range restrictions as seems to have been the case is another ad hoc choice and in fact teh insertion of “information” into the “random” data: To wit, that there is some long-term baseline. Once you've assumed that with your random data generation, you will have the “hockey stick”, but that's kind of like assuming your conclusions which is that conditions in the far past were pretty stable (no changing “baseline”) and comparable to today's. You take that (“stable” conditions outside the calibration period) and the forced “blade” to correspond to the instrumental temperatures, and your conclusion is unsurprisingly that we had stable temperatures a ways back, but a recent dramatic uptick. And this is true whether or not temperatures actually were stable, simply by your choice of assumptions in generating the “random” data.

  26. Picking a no-signal red noise series that ‘matches’ instrumental tells you exactly nothing about climate prior to that period (provided the “redness” is properly constrained by reality). The match is meaningless if it does not yield a valid proxy. You would agree that we are not ‘picking’ matches per se; we are using matches to ‘pick’ proxies. Yes?

  27. The signal to noise ratio of red noise is zero. It contains no information. An AM broadcast has a signal to noise ratio greater than zero. It contains information. The trick is to extract the signal from the noise without distorting it. The argument is that the statistical methods used to reconstruct the past climate can produce what looks like a signal when there is no signal present and, worse, attenuate any signal that is present.

      If you think that red noise series will correlate more frequently than white noise, why not do the math? It wouldn’t be difficult in R. I don’t have any interest in doing it because I’m reasonably certain you’re wrong. It’s up to you to do something other than make unsupported assertions.
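Taking up the “wouldn’t be difficult in R” suggestion in its simplest form, one might count how often white versus red noise clears a fixed correlation threshold against a trending calibration target. This is a sketch under assumed parameters (threshold, trend slope, AR coefficient), not anyone’s published test:

```r
# Spurious "pass" rates for a correlation screen against a trending target:
# white noise vs. AR(1) red noise. Threshold and parameters are assumptions.
set.seed(3)
n.cal  <- 100
target <- 0.01 * seq_len(n.cal) + rnorm(n.cal, sd = 0.2)  # rising target series
thresh <- 0.3

white <- replicate(10000, cor(rnorm(n.cal), target))
red   <- replicate(10000, cor(arima.sim(list(ar = 0.9), n = n.cal), target))

mean(abs(white) > thresh)   # fraction of white-noise series passing the screen
mean(abs(red)   > thresh)   # fraction of red-noise series passing (much larger)
```

Autocorrelation shrinks the effective sample size, so the sampling distribution of the correlation is much wider for red noise, and it passes a fixed screen far more often than white noise does.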

  28. “The signal to noise ratio of red noise is zero. It contains no information. An AM broadcast has a signal to noise ratio greater than zero. It contains information.”

      Oh, does it?!? What if it is broadcasting “noise”? What if what they want to broadcast to researchers is a specific “noise” pattern to be used in blog articles?

      “The argument is that the statistical methods used to reconstruct the past climate can produce what looks like a signal when there is no signal present…”

      I understand that this is the ‘argument’. And I say that when you pick your “red noise” carefully so that it ends up matching your proxy data, and then throw away the series that don’t correlate strongly with your instrumental data, you may end up with a “reconstruction” that shares the LF characteristics of your proxy reconstructions. I will also say that the “red noise” reconstructions show a more uniform LF “stick” than the proxy reconstructions, and they show an overshoot and undershoot (possibly due to HF sparseness) near the ends of the calibration period, while both show a quite pronounced “hockey stick” “blade”, because we insisted on very close correlation with instrumental data in this region in both cases.

      “If you think that red noise series will correlate more frequently than white noise, why not do the math?”

      No. I think that if the hypothesis is that a random signal will produce “hockey sticks”, this ought to be tried with random data of all stripes (including “white noise” and “red noise” of varying frequency/band-pass attributes). If we find that only specific types of “noise” do this (and that this is true of specific types of “noise” that carefully match the LF characteristics of our proxies), we might ask ourselves whether what we’re doing is simply putting in poorer proxies but seeing the same general features.

  29. Obviously white noise would have very few simulations selected in the calibration period at the same level of significance. That is what the whole concern about spurious trends is over, so it’s hardly novel. The main reason that my graph is more uniform on the handle of the stick is that it’s composed of 1000s of proxies. Real studies have used far fewer proxies and are correspondingly more variable. The original Mann paper applied PCA incorrectly, but dating uncertainty and other things depress the red component, as has been well documented.
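The uniformity point in this comment is ordinary averaging: the standard deviation of a mean of k roughly independent series shrinks like 1/sqrt(k), so a composite of thousands of pseudoproxies will have a much smoother handle than one built from a few dozen. A quick R comparison (the counts 30 and 3000 are illustrative, not the numbers from any particular study):

```r
# Handle smoothness vs. number of averaged red-noise series (counts illustrative).
set.seed(4)
n    <- 500
make <- function(k) rowMeans(replicate(k, arima.sim(list(ar = 0.9), n = n)))

small <- make(30); big <- make(3000)
plot(small, type = "l", col = "grey", xlab = "Time", ylab = "Composite")
lines(big, lwd = 2)              # far flatter with 100x the series
c(sd(small), sd(big))            # spread shrinks roughly as 1/sqrt(k)
```

That also suggests a direct answer to the request in the next comment: rerun the composite with the same number of series as the real studies and compare the lumpiness of the handles directly.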

  30. “Obviously white noise would have very few simulations selected in the calibration period at the same level of significance.”

      First, it’s not obvious. Second, even if so, no problem: you just produce more sequences until you have enough that match. Then you look to see if they have the same “stick” portion. If they don’t, then it’s time to look at exactly what it is in your carefully selected “red noise” that matches the same general contours of the Mann reconstruction (I’ve already suggested a reversion to an “arbitrary” baseline inherent in the DC component [or lack thereof] of the “red noise”).

      “The main reason that my graph is more uniform on the handle of the stick is that it’s composed of 1000s of proxies.”

      Why don’t you show the “lumpiness” and/or “trend” of runs with the same number of proxies, then? And can we agree that the “blade” that shows up tells us nothing particularly useful, and that all we’re arguing about is what happens outside the calibration period (what kind of a “stick”, if any, shows up)?
