This week I am posting another quiz, although no-one has yet solved the Spaghetti Graph Quiz. This one, suggested by Demetris Koutsoyiannis, may require some statistical analysis to solve. I have plotted the points and converted them to an R statement below.

The Quiz: The following numbers are synthetic, generated by a mathematical model. Can anybody decompose it into components such as trends, periodicities or whatever, and can one infer the type of the generating model?

data <- c(0.057,0.204,0.469,0.108,0.422,0.046,0.437,0.175,0.371,0.085,0.487,0.602,0.633,0.854,0.529,0.579,0.260,0.695,0.564,0.181,0.991,0.679,0.657,0.648,0.392,0.543,0.293,0.769,0.183,0.932,0.538,0.339,0.335,0.978,0.732,0.325,0.760,0.821,0.651,0.554,0.374,0.692,0.982,0.922,0.604,0.815,0.969,0.986,0.859,0.940)

This dataset appears to me to be consistent with a simple Gaussian random walk, e.g., data = cumsum(rnorm(50,0,0.3)). The mean step difference is not significantly different from zero, and on a probability plot the differences appear normal. You might have cherry-picked this run to get a nice-looking “trend”, but you can usually get one this good in fewer than 20 runs, so it isn’t significant.
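Evan's checks can be sketched in base R as below. This is a sketch, not his code: the Shapiro-Wilk test stands in for eyeballing a probability plot, and the seed and the 2000 replicates are arbitrary choices. The last part asks how often a trend-free random walk `cumsum(rnorm(50, 0, 0.3))` shows a least-squares slope at least as steep as the quiz data's.

```r
# Quiz data as posted above.
data <- c(0.057, 0.204, 0.469, 0.108, 0.422, 0.046, 0.437, 0.175, 0.371, 0.085,
          0.487, 0.602, 0.633, 0.854, 0.529, 0.579, 0.260, 0.695, 0.564, 0.181,
          0.991, 0.679, 0.657, 0.648, 0.392, 0.543, 0.293, 0.769, 0.183, 0.932,
          0.538, 0.339, 0.335, 0.978, 0.732, 0.325, 0.760, 0.821, 0.651, 0.554,
          0.374, 0.692, 0.982, 0.922, 0.604, 0.815, 0.969, 0.986, 0.859, 0.940)

steps <- diff(data)                       # the 49 step differences

# Is the mean step distinguishable from zero?
step_test <- t.test(steps, mu = 0)

# Numeric stand-in for a normal probability plot (qqnorm(steps) draws one).
step_normality <- shapiro.test(steps)

# OLS slope of a series against its index 1..n.
slope_of <- function(y) coef(lm(y ~ seq_along(y)))[[2]]
obs_slope <- slope_of(data)

# How often does a trend-free random walk look at least this steep?
set.seed(1)                               # arbitrary seed, for repeatability
rw_slopes <- replicate(2000, slope_of(cumsum(rnorm(50, 0, 0.3))))
prop_as_steep <- mean(rw_slopes >= obs_slope)

cat("mean step:", round(mean(steps), 4), "  t-test p:", round(step_test$p.value, 3), "\n")
cat("Shapiro-Wilk p (steps):", round(step_normality$p.value, 3), "\n")
cat("share of random walks at least as steep:", prop_as_steep, "\n")
```

A random walk's slope over 50 steps varies a lot from run to run, which is why a "trend" like this one turns up readily under that model.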

Thanks Evan, so just to clarify, you think this is an AR(1) model with a s.d. of 0.3? Cheers

David,

What I described is an AR(1) with sd of 0.3 (and no trend). However, I agree with Mary Ringo on the other thread that you can’t infer a model from this data alone. An infinite number of algorithms could have produced this data. But as it appears consistent with the simplest AR model, it would be hard to defend any other choice.

Have you looked at acf(data) and spectrum(data)? There appears to be some periodicity but I have yet to model it.
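For anyone following along, both can be computed without drawing plots. This is a minimal base R sketch; the lag range of 15 and the ±1.96/√n white-noise bounds (the ones `plot.acf` draws by default) are choices made here, not part of the comment. Note that `spectrum()` detrends the series by default, so the zero-frequency (DC) component is removed.

```r
data <- c(0.057, 0.204, 0.469, 0.108, 0.422, 0.046, 0.437, 0.175, 0.371, 0.085,
          0.487, 0.602, 0.633, 0.854, 0.529, 0.579, 0.260, 0.695, 0.564, 0.181,
          0.991, 0.679, 0.657, 0.648, 0.392, 0.543, 0.293, 0.769, 0.183, 0.932,
          0.538, 0.339, 0.335, 0.978, 0.732, 0.325, 0.760, 0.821, 0.651, 0.554,
          0.374, 0.692, 0.982, 0.922, 0.604, 0.815, 0.969, 0.986, 0.859, 0.940)

# Sample autocorrelations at lags 1..15, returned rather than plotted.
r <- acf(data, lag.max = 15, plot = FALSE)
acf_vals <- drop(r$acf)[-1]                  # drop lag 0, which is always 1

# Approximate 95% bounds for white noise: +/- 1.96 / sqrt(n).
bound <- qnorm(0.975) / sqrt(length(data))
significant_lags <- which(abs(acf_vals) > bound)

# Raw periodogram; frequencies are in cycles per sample (0 to 0.5).
s <- spectrum(data, plot = FALSE)
peak_freq <- s$freq[which.max(s$spec)]

print(round(acf_vals, 3))
cat("lags outside +/-", round(bound, 3), ":", significant_lags, "\n")
cat("frequency of the largest periodogram ordinate:", peak_freq, "\n")
```

With only 50 points the periodogram is very noisy, so a single tall ordinate is weak evidence of a real periodicity.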

Some descriptive statistics:

1st moment: 0.5604

2nd moment: 0.2777

3rd moment: -0.1313

4th moment: 2.0133

The null hypothesis that the data are normal is not rejected at the 5% significance level by either the Lilliefors test or the Bera-Jarque test, so it seems reasonably safe to assume normally distributed data.
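The figures above appear to be the sample mean, standard deviation, skewness and kurtosis (a "2nd moment" of 0.2777 is about what a standard deviation would be here; the variance would be about 0.077). A base R sketch of those statistics and of the Jarque-Bera (Bera-Jarque) statistic follows. The Lilliefors test is not in base R (the add-on nortest package provides `lillie.test`), so Shapiro-Wilk is shown as a stand-in. Small differences from the quoted values could come from the bias conventions of the skewness and kurtosis estimators.

```r
data <- c(0.057, 0.204, 0.469, 0.108, 0.422, 0.046, 0.437, 0.175, 0.371, 0.085,
          0.487, 0.602, 0.633, 0.854, 0.529, 0.579, 0.260, 0.695, 0.564, 0.181,
          0.991, 0.679, 0.657, 0.648, 0.392, 0.543, 0.293, 0.769, 0.183, 0.932,
          0.538, 0.339, 0.335, 0.978, 0.732, 0.325, 0.760, 0.821, 0.651, 0.554,
          0.374, 0.692, 0.982, 0.922, 0.604, 0.815, 0.969, 0.986, 0.859, 0.940)
n <- length(data)
dev <- data - mean(data)

# Central moments, then population-form skewness and kurtosis.
m2 <- mean(dev^2); m3 <- mean(dev^3); m4 <- mean(dev^4)
skew <- m3 / m2^1.5
kurt <- m4 / m2^2                        # equals 3 for a normal distribution

# Jarque-Bera statistic: asymptotically chi-squared with 2 df under normality.
jb <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
jb_p <- pchisq(jb, df = 2, lower.tail = FALSE)

sw_p <- shapiro.test(data)$p.value       # Shapiro-Wilk, in base R

cat("mean", round(mean(data), 4), " sd", round(sd(data), 4),
    " skewness", round(skew, 4), " kurtosis", round(kurt, 4), "\n")
cat("Jarque-Bera", round(jb, 3), " p =", round(jb_p, 3),
    "  Shapiro-Wilk p =", round(sw_p, 3), "\n")
```

The Jarque-Bera test relies on a large-sample approximation, so with n = 50 its p-value should be read loosely.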

The spectrum looks quite flat (except for the large DC offset) – perhaps a little too flat for an AR(1) model. However, the data fails the Runs test for randomness at the 95% level so probably isn’t white noise with a DC offset either.
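A runs test is short enough to write out directly. The sketch below is the Wald-Wolfowitz runs test about the median with the usual normal approximation; it is an assumption that this is the variant used above, since tools differ in the cut point and in the approximation.

```r
data <- c(0.057, 0.204, 0.469, 0.108, 0.422, 0.046, 0.437, 0.175, 0.371, 0.085,
          0.487, 0.602, 0.633, 0.854, 0.529, 0.579, 0.260, 0.695, 0.564, 0.181,
          0.991, 0.679, 0.657, 0.648, 0.392, 0.543, 0.293, 0.769, 0.183, 0.932,
          0.538, 0.339, 0.335, 0.978, 0.732, 0.325, 0.760, 0.821, 0.651, 0.554,
          0.374, 0.692, 0.982, 0.922, 0.604, 0.815, 0.969, 0.986, 0.859, 0.940)

# Wald-Wolfowitz runs test about the median (normal approximation).
runs_test <- function(x) {
  med <- median(x)
  above <- x[x != med] > med            # drop any values tied with the median
  n1 <- sum(above); n2 <- sum(!above); n <- n1 + n2
  runs <- 1 + sum(above[-1] != above[-n])
  mu <- 2 * n1 * n2 / n + 1             # expected number of runs
  v <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))
  z <- (runs - mu) / sqrt(v)
  list(runs = runs, expected = mu, z = z, p.value = 2 * pnorm(-abs(z)))
}

rt <- runs_test(data)
cat("runs:", rt$runs, " expected:", rt$expected,
    " z:", round(rt$z, 2), " p:", round(rt$p.value, 3), "\n")
```

For the quiz data this gives 18 runs against the 26 expected for a random sequence with 25 values on each side of the median (z of about -2.3, two-sided p of about 0.02), in line with a rejection at the 95% level.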

The autocorrelation function looks quite interesting, with what looks like a step function in the middle. This may hold some clues perhaps?

A linear regression gives an increasing trend of around 0.6 over the 50 measurements, but the confidence limits are quite wide – the noise is probably a little too much to be confident of a trend. I should do some proper tests on it but enough for now!
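The trend estimate and its interval can be reproduced with `lm()`. Note that `confint()` assumes independent, identically distributed residuals; if the noise is persistent, the real uncertainty in the slope is larger than this interval shows.

```r
data <- c(0.057, 0.204, 0.469, 0.108, 0.422, 0.046, 0.437, 0.175, 0.371, 0.085,
          0.487, 0.602, 0.633, 0.854, 0.529, 0.579, 0.260, 0.695, 0.564, 0.181,
          0.991, 0.679, 0.657, 0.648, 0.392, 0.543, 0.293, 0.769, 0.183, 0.932,
          0.538, 0.339, 0.335, 0.978, 0.732, 0.325, 0.760, 0.821, 0.651, 0.554,
          0.374, 0.692, 0.982, 0.922, 0.604, 0.815, 0.969, 0.986, 0.859, 0.940)
s <- seq_along(data)                     # sample number 1..50
fit <- lm(data ~ s)

slope <- coef(fit)[["s"]]
ci <- confint(fit, "s", level = 0.95)    # assumes i.i.d. residuals

cat("slope per sample:", round(slope, 4), "\n")
cat("implied change over the record:", round(slope * (length(data) - 1), 2), "\n")
cat("95% CI for the slope:", round(ci, 4), "\n")
```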

In reality, while it is possible to reject certain models as quite unlikely, it would be impossible to positively and unquestionably identify the real model. That doesn’t stop us guessing though, I guess :-)

Spence, I like the approach of eliminating the unlikely candidates. Very Popperian. Thanks.

Spence is right – you can reject unlikely models. But given that all we know in the quiz is that the data were produced by a mathematical model, there are an infinite number of possible models that explain the data. Reject as many as you like — an infinity remains. This illustrates the problem of fitting mathematical (or statistical) models to data out of context. You can always fit a 50-point series with a 49th order polynomial, but it means nothing. If, however, the data were real (e.g., time series) measurements of a physical parameter, then real-world physical/chemical/biological constraints would limit the model choices. In a well-designed experiment – or even a fortuitous set of observations – these data might be more than adequate to choose among alternative hypotheses (predictive models).

If I were a scientist observing this process for the first time, my initial thought would be that the observed parameter is gradually increasing with time (according to the linear regression model). If that idea were at all consistent with the rest of my scientific knowledge, I would include it among possible explanatory hypotheses. (Even if it weren’t consistent, I should probably include it among possible hypotheses anyway, just in case my ‘scientific knowledge’ happens to be wrong.) Hopefully, future observations will force a choice among hypotheses.

If I had to predict the value of this series at t=100, I would probably go with the linear regression. My primary alternative hypothesis – AR(1) – would predict random variation about the t=50 value. Eventually, the next 50 or 100 or 1000 measurements will decide between these, or quite possibly eliminate both while constraining the range of possible models. If eventually the constraints are narrowed enough that any of the remaining models works for all practical purposes, then I would be a happy engineer, but I would remain an unsatisfied scientist until I found THE model that explained what was happening. The quality of an answer depends in no small part on why you are asking the question.
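The two forecasts can be put side by side. This sketch treats the AR(1) alternative as the driftless random walk described earlier in the thread (a cumulative sum of Gaussian steps), for which the best forecast of any future value is simply the last observation. The trend forecast comes with a 95% prediction interval that assumes i.i.d. residuals.

```r
data <- c(0.057, 0.204, 0.469, 0.108, 0.422, 0.046, 0.437, 0.175, 0.371, 0.085,
          0.487, 0.602, 0.633, 0.854, 0.529, 0.579, 0.260, 0.695, 0.564, 0.181,
          0.991, 0.679, 0.657, 0.648, 0.392, 0.543, 0.293, 0.769, 0.183, 0.932,
          0.538, 0.339, 0.335, 0.978, 0.732, 0.325, 0.760, 0.821, 0.651, 0.554,
          0.374, 0.692, 0.982, 0.922, 0.604, 0.815, 0.969, 0.986, 0.859, 0.940)
s <- seq_along(data)
fit <- lm(data ~ s)

# Hypothesis 1: linear trend, extrapolated to s = 100.
trend_pred <- predict(fit, newdata = data.frame(s = 100), interval = "prediction")

# Hypothesis 2: driftless random walk; the forecast is the last observed value.
rw_pred <- data[length(data)]

print(trend_pred)                        # columns: fit, lwr, upr
cat("random-walk forecast:", rw_pred, "\n")
```

The two hypotheses already disagree by about half a unit at t=100, which is why a further 50 observations could discriminate between them.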

Evan, I agree entirely. Without some understanding of what kind of a process lies behind the model, there are an infinite number of solutions to the problem.

In fact, even my approach of rejecting certain models has dangers. For example, I reject pure white noise on the basis that the data fails the runs test at the 95% confidence level. But one in twenty white noise data sets will fail the runs test at this level anyway, so I could have just been “unlucky”.
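The one-in-twenty point is easy to check by simulation. This sketch applies a median-based runs test (normal approximation) to white noise, taken here to be `runif(50)` series as an assumption, and counts how often it rejects at the 5% level. Because the number of runs is discrete, the rate comes out near 5% rather than exactly at it.

```r
# Two-sided p-value of the Wald-Wolfowitz runs test about the median.
runs_p <- function(x) {
  med <- median(x)
  above <- x[x != med] > med
  n1 <- sum(above); n2 <- sum(!above); n <- n1 + n2
  runs <- 1 + sum(above[-1] != above[-n])
  mu <- 2 * n1 * n2 / n + 1
  v <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))
  2 * pnorm(-abs((runs - mu) / sqrt(v)))
}

set.seed(42)                                     # arbitrary seed
rejected <- replicate(2000, runs_p(runif(50)) < 0.05)
rate <- mean(rejected)
cat("share of white-noise series failing the runs test at 5%:", rate, "\n")
```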

The way we view data tends to affect our opinions as well. I can reject pure white noise, but found I could not reject a relationship of:

data = 0.012s + N(0.5, 0.25)

where s is the sample number from 1 to 50, on the basis of the descriptive statistics, the autocorrelation function, the power spectrum and some simple statistical tests. You could try to come up with more elaborate and esoteric tests to reject this hypothesis, but the more tests you create, the more likely you are to reject it by chance alone, or to inadvertently apply a posteriori knowledge in a test.

In conventional science, such a hypothesis would be proposed and then probably tested over the next 50 samples or so. If the underlying data displays fractal-like qualities, the hypothesis might even pass but it would not reflect the underlying model. We would have to view the data on different scales to determine these qualities. The difficulty is that we (as humans, and scientists) tend to like simple, elegant relationships and eschew things that are more complex and difficult to understand, so the simple linear relationship plus noise becomes more compelling than a system that exhibits chaotic behaviours. Of course, that does not make it correct!

Finally emerging again after a marathon coding session for the new WW application (directions test version at http://landshape.org/surf/ww2). This reminded me of the convergence way of looking at things, which probably comes of doing a maths degree. That is, what we would like is for our answers to converge in some (possibly infinite) limit. So, re the comments from Spence, while we cannot have confidence in finding the ‘right’ answer, we can at least be assured that we will converge to it eventually, by some process such as eliminating the unlikely possibilities. By paying attention to the convergence properties of the process, either a series or a methodology, we avoid approaches that might result in a complete reversal, or a flip-flopping, of belief.

That said, one of the problems with the approach of fitting a linear model and asserting “this is trend + noise” is that at some point we have to change our mind about the trend in a natural system, because they don’t continue. This might be captured in assumptions of stationary versus non-stationary. If we assume stationarity, that is, a finite constant mean, and eschew the trend, then we should not have to change our mind as we get more data. The extreme nature of the problem can be seen in global temperature series: decreasing since 1998, increasing since 1800, decreasing since 10,000BC, increasing since the last glacial, … and so on, flip-flopping according to the observed time scale. If we give up trends we don’t have that arbitrariness…

I don’t know all the implications of this, but it makes sense of a lot of things for me, and I need to explore it more.

I have been experimenting a bit with this, now I have some time. Here is what I tried:

1. I fitted a FARIMA model and got a “Hessian singular” error, which I am told is because there are too many parameters.

2. I fitted an ARMA(1,1) model (arima(data, order = c(1,0,1))) and got ar = 0.97 and ma = -0.77. When I tried to simulate it, I got an error saying the ‘ar’ part of the model was not stationary.

3. I plotted the acf. There were not the high correlations between lags that you would expect with a random walk series. The acf decays quickly to non-significance, ruling out a differenced AR(1) generator. The acf is not entirely flat either, so that rules out a simple MA model.

4. So, having eliminated the AR(1) and MA(1) models, I tried a FARIMA(0,d,0) model and got d = 0.275 (the fractional differencing parameter). When I simulated this and looked at it with acf, the decay in correlations was similar to that of the given series. The code was:

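> library(fracdiff)  # assumption: FARIMA fitted and simulated with the fracdiff package

> fit <- fracdiff(data, nar = 0, nma = 0); est <- fit$d

> time <- 1:50; slopes <- numeric(100)

> for (i in 1:100) { b <- fracdiff.sim(50, d = est)$series; l <- lm(b ~ time); slopes[i] <- coef(l)[2] }

> hist(slopes)

The slope on the given data was 0.0118. The number of simulated series with a slope greater than this was 25, i.e.:

> length(slopes[slopes > 0.0118])

So one would expect about one in four fractionally differenced series with d = 0.275 to have a similar or greater slope than the example. Under the assumptions of a FARIMA(0,0.275,0) model, the slope of the given data is therefore not statistically significant, and the model can't be rejected on this basis.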
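The same experiment can be sketched without the fracdiff package by simulating FARIMA(0,d,0) from its MA(∞) representation, with weights psi_0 = 1 and psi_k = psi_(k-1) (k - 1 + d)/k, truncated here at 1000 terms (an approximation). One choice matters for the answer: the helper `sim_farima0d0` (a hypothetical name) scales the innovations so that each simulated series has the same variance as the quiz data, using Var(x) = sigma^2 Gamma(1 - 2d) / Gamma(1 - d)^2. With unit-variance innovations the simulated slopes would be several times larger, and so would the share exceeding 0.0118.

```r
data <- c(0.057, 0.204, 0.469, 0.108, 0.422, 0.046, 0.437, 0.175, 0.371, 0.085,
          0.487, 0.602, 0.633, 0.854, 0.529, 0.579, 0.260, 0.695, 0.564, 0.181,
          0.991, 0.679, 0.657, 0.648, 0.392, 0.543, 0.293, 0.769, 0.183, 0.932,
          0.538, 0.339, 0.335, 0.978, 0.732, 0.325, 0.760, 0.821, 0.651, 0.554,
          0.374, 0.692, 0.982, 0.922, 0.604, 0.815, 0.969, 0.986, 0.859, 0.940)

# Hypothetical helper: n values of FARIMA(0,d,0) with marginal sd `target_sd`,
# from the MA(infinity) form truncated at `trunc` weights.
sim_farima0d0 <- function(n, d, target_sd, trunc = 1000) {
  k <- seq_len(trunc)
  psi <- cumprod(c(1, (k - 1 + d) / k))             # psi_0, ..., psi_trunc
  sd_e <- target_sd * gamma(1 - d) / sqrt(gamma(1 - 2 * d))
  e <- rnorm(n + trunc, sd = sd_e)
  # sides = 1: x[t] = sum_j psi_j * e[t - j]; the first `trunc` outputs are NA
  x <- as.numeric(stats::filter(e, psi, method = "convolution", sides = 1))
  tail(x, n)
}

d <- 0.275
s <- seq_along(data)
obs_slope <- coef(lm(data ~ s))[["s"]]

set.seed(123)                                        # arbitrary seed
slopes <- replicate(1000, {
  y <- sim_farima0d0(length(data), d, sd(data))
  coef(lm(y ~ s))[["s"]]
})

cat("observed slope:", round(obs_slope, 4), "\n")
cat("share of simulated series at least as steep:", mean(slopes >= obs_slope), "\n")
```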

Having eliminated simple MA models (including linear trends with noise) and simple AR models with the ACF, the FARIMA(0,0.275,0) remains (actually d=0.2753412). There are other ways of generating series with the appearance of fractional differencing, but I don’t know how to discriminate between them.

Actually, fractional differencing is very cool, and I should write something about it. The idea of something like water levels depending on the previous time step, as in an integer differenced model, is OK, but the idea that the current value depends on an infinite integration over all past values freaks me out. It suggests that an influence propagates through all time, like everything is connected, man.

As an aside, fractional differencing gives long term persistence, or long term memory, but integer differencing gives only short term persistence or memory. AR models have insufficient long term correlations to explain a multitude of phenomena, from rainfall to internet traffic.
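The difference between short and long memory shows up clearly in the theoretical autocorrelation functions. For an AR(1), rho_k = phi^k decays geometrically; for FARIMA(0,d,0), rho_k = rho_(k-1) (k - 1 + d)/(k - d), which decays hyperbolically, roughly like k^(2d - 1). The sketch below uses d = 0.275 from the fit above and picks phi = d/(1 - d) so that both models share the same lag-1 correlation; that matching is a choice made here for the comparison.

```r
d <- 0.275
phi <- d / (1 - d)                    # AR(1) coefficient matching FARIMA's lag-1 correlation
lags <- 1:100

rho_ar <- phi^lags                                  # geometric decay
rho_fd <- cumprod((lags - 1 + d) / (lags - d))      # hyperbolic decay, ~ k^(2d - 1)

sel <- c(1, 5, 10, 50, 100)
print(data.frame(lag = sel,
                 AR1 = signif(rho_ar[sel], 3),
                 FARIMA = signif(rho_fd[sel], 3)))
```

By lag 10 the AR(1) correlation is negligible, while the FARIMA correlation is still clearly non-zero; that residual tail is the "long memory".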

Anyway, that’s my guess, FARIMA(0,0.275,0).

I am afraid I have nothing important to add to the interesting discussion. The points I wished to make with the “quiz” have already been made above, so I am quoting some:

“There are an infinite number of possible models that explain the data. Reject as many as you like — an infinity remains” (Evan Englund)

[Here I would add that perhaps any given model (even the rejected ones) can generate the given data series, given enough time and provided the model domain is consistent with that of the series.]

“Without some understanding of what kind of a process lies behind the model, there are an infinite number of solutions to the problem.” (Spence_UK)

“One of the problems with the approach of fitting a linear model and asserting ‘this is trend + noise’ is that at some point we have to change our mind about the trend in a natural system, because they don’t continue. […] If we give up trends we don’t have that arbitrariness.” (David Stockwell)

—–

PS1. For completeness, the data series was generated assuming stationarity and using one of the simplest possible “models”: the uniform (pseudo-) random number generator x_i = q_i / m, where q_i = k * q_(i-1) mod m, with k = 7^5, m = 2^31 - 1 and random seed q_0 = 1 426 594 706. For the calculations I used Excel.
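The generator in PS1 is the "minimal standard" multiplicative congruential (Lehmer) generator, and it takes only a few lines of R. Double precision represents every intermediate product exactly here, since k * q stays below 2^53. Run from the stated seed, its first two values round to 0.057 and 0.204, matching the start of the quiz data.

```r
k <- 7^5                 # multiplier 16807
m <- 2^31 - 1            # modulus 2147483647 (a prime)
q <- 1426594706          # seed q_0 from PS1

x <- numeric(50)
for (i in seq_along(x)) {
  q <- (k * q) %% m      # exact in double precision: k * q < 2^53
  x[i] <- q / m          # uniform value in (0, 1)
}
print(round(x, 3))
```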

PS2. Again for completeness, I had included this as an example in the first draft of my paper “Nonstationarity vs. scaling in hydrology”. One of the reviewers found it “one of the most annoying examples of the paper”, showing “lack of basic statistical understanding!” (his/her exclamation mark). He/she more or less invited me to his/her “statistical lectures to illustrate significance levels”. To avoid trouble, I removed it from the paper. But I did not attend the lectures – because the review was anonymous.

I want to thank Demetris for this interesting and educational example. I have looked into his solution by implementing his random number generator. The following reproduces the data points exactly.

However, when I generate series with a uniform distribution using the R runif function, I do not get the same acf profile, which indicated low levels of persistence; instead I get more or less zero correlation at all lags, as you would expect with random numbers.

Also, I ran runif and dkrand (modified to vary the random seed) 1000 times to see how many series would have slopes greater than that of the given points – zero, 0, nada. This modified dkrand also gives a flatter acf profile than the given sequence.

So I would conclude from this that the given series is a highly unusual case, with a slope you would expect in fewer than 1 in 1000 cases. To see how unlikely, I ran it 10,000 times – again zero cases > 0.0118. So it is less likely than 1 chance in 10,000.
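The Monte Carlo count can be vectorised: with one series per matrix column, all the OLS slopes come out of a single `colSums()`. The sketch also gives a normal-theory cross-check: for i.i.d. uniform noise the slope's standard error is sqrt(1/12) / sqrt(sum((s - mean(s))^2)), so 0.0118 lies roughly four standard errors out. The seed is arbitrary, and `runif` stands in for the modified dkrand.

```r
set.seed(2024)                          # arbitrary seed
n <- 50
reps <- 10000
obs_slope <- 0.0118                     # slope of the quiz data, as quoted above

cs <- seq_len(n) - (n + 1) / 2          # centred time index
Y <- matrix(runif(n * reps), nrow = n)  # one uniform series per column
slopes <- colSums(cs * Y) / sum(cs^2)   # OLS slope of every column at once

exceed <- sum(slopes > obs_slope)

# Normal-theory cross-check: slope s.e. for i.i.d. U(0,1) noise (variance 1/12).
se_iid <- sqrt(1 / 12) / sqrt(sum(cs^2))
p_normal <- pnorm(obs_slope / se_iid, lower.tail = FALSE)

cat("series (of", reps, ") steeper than", obs_slope, ":", exceed, "\n")
cat("normal-approximation tail probability:", signif(p_normal, 2), "\n")
```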

Still… it illustrates DK’s point that:

I was considering making a comment in one of my posts that computers (being deterministic beasts) never generate true random numbers at all, but are typically the product of pseudo-random sequences from multiplicative congruential generators. It doesn’t seem half as clever to point this out after being told that is what the model was, of course :-)

The point that you can only reject a model to a certain significance is important, and in my comments above I’m generally fairly careful to note the significance level at which I’m rejecting a model. The interesting point here is that I would expect (typically) a uniform distribution to fail the tests for normality (particularly the Bera-Jarque test). The fact that it didn’t was a real 1-in-20 chance but goes to prove the point that the tests are certainly not infallible!

Another aside often missed about these pseudo-random number generators (and a pet subject of mine – that might be obvious by now!) is their validity in large Monte Carlo tests. These days, with great computing power at our fingertips, running millions of test cases seems trivial. But the pseudo-random number generator used here by Demetris is typical of the sort of generator used by many libraries, yet holds its statistical properties for only around 46,000 samples. (The square root of the sequence length is a good rule of thumb; Demetris’ example is one of maximal length.) On top of that, normally distributed number generators often use more than one uniform sample, further reducing the limit on the number of valid samples. You’d be surprised how many people overlook that issue in their software….
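The arithmetic behind the 46,000 figure: the generator has period m - 1 = 2^31 - 2, and the square root of that is about 46,341. As an illustration of the second point (which normal method a given library uses is an assumption here), the Marsaglia polar method consumes two uniforms per attempt and accepts a fraction pi/4 of attempts, each acceptance yielding two normals, so it uses about 4/pi, roughly 1.27, uniforms per normal deviate.

```r
m <- 2^31 - 1
period <- m - 1                     # full period of q_i = 7^5 * q_(i-1) mod m
usable_uniforms <- sqrt(period)     # square-root rule of thumb
uniforms_per_normal <- 4 / pi       # polar method: 2 uniforms/attempt, accept pi/4, 2 normals/accept
usable_normals <- usable_uniforms / uniforms_per_normal

cat("usable uniforms ~", round(usable_uniforms), "\n")
cat("usable normals (polar method) ~", round(usable_normals), "\n")
```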

I think my comment crossed yours there David, although I think we were drawing very similar conclusions!

Yes Spence. Though I think we do differ in one respect. I think you are referring to limitations of pseudo-random number generators. But it is not these limitations that produce the high slope; it is that the example is an extreme outlier w.r.t. its slope.

Absolutely, I wanted to stress that our posts crossed, because my observations on random number generators are to do with assuming they behave like true random numbers, as opposed to your example, where you are examining the behaviour of this particular generator.

Much unintended confusion possible on this one :-)

Well I think I have to say no-one got the real model (in R):

I would have to grudgingly give the prize to Spence for being the closest, as his model uses i.i.d. errors:

Perhaps DK should be the judge.

David,

I also liked the answer of an AR(1) without trend, combined with the observation that you can’t infer a model from the data alone. But I was more impressed by the discussion than by the technical answers. So, may I propose that the three quotations I used in #11 (Evan, Spence, David) share the prize.

Many thanks again for an entertaining quiz. Even if it could never be infallible, I think it would be very useful to have a summary of the steps one might follow in choosing a model in this domain. This has probably been done somewhere.

I also wish to add another quotation here, which I think suits the discussion very well. This is from Cohn and Lins, GRL, 1995:

“From a practical standpoint, however, it may be preferable to acknowledge that the concept of statistical significance is meaningless when discussing poorly understood systems.”

This may mean that we should first acquire some understanding of the system and then formulate models and test hypotheses. In turn, this may run contrary to the modern trend of “data-driven” models such as so-called “artificial neural networks” (another bad term in my opinion; there is nothing “neural” in them) and “attractor reconstructions” from data. Such tools may work for very simple nonlinear systems but may fail to provide any insight into complex natural systems. In contrast, the old concepts of probability and statistics at least provide indications of the uncertainty and of the limitations in modelling.
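
One way to see why significance can mislead for a poorly understood system is to test for trends in series that have none. A textbook trend t-test assumes independent errors. When the series is in fact persistent, the test reports "significant" trends far more often than its nominal 5%. The sketch below (Python with NumPy, an assumption) simulates trendless series and counts how often the naive test rejects. A Gaussian random walk serves as a simple, extreme form of persistence; that choice is illustrative and is not the model in the quoted paper. The 1.96 cut-off is a normal approximation to the Student t critical value.

```python
import numpy as np


def naive_trend_rejection_rate(n=50, runs=5_000, persistent=True, seed=0):
    """Share of trendless series for which an OLS trend t-test (which
    assumes i.i.d. errors) reports a 'significant' slope at about 5%."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=(runs, n))
    series = np.cumsum(noise, axis=1) if persistent else noise

    t = np.arange(n) - (n - 1) / 2            # centred time index
    slopes = series @ t / (t @ t)             # OLS slope of each series
    fitted = series.mean(axis=1, keepdims=True) + np.outer(slopes, t)
    resid = series - fitted
    s2 = (resid ** 2).sum(axis=1) / (n - 2)   # residual variance
    t_stat = slopes / np.sqrt(s2 / (t @ t))
    # 1.96: normal approximation to the two-sided 5% t critical value
    return float(np.mean(np.abs(t_stat) > 1.96))


print("independent noise:", naive_trend_rejection_rate(persistent=False))
print("random walk:      ", naive_trend_rejection_rate(persistent=True))
```

The test itself is not wrong. Its stated error rate simply depends on an assumption about the system that the data alone cannot confirm.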

Of course I meant Cohn and Lins, GRL, 2005.

Re #20. My dilemma is that the prompting to ‘understand the system first’ leads to a range of traps as well. This focus on understanding the system seems to lead to highly parameterized models, such as we see in CGCMs, with large parts abstracted and vague, poorly understood parameters. This gets back to the discussion on realclimate where I first encountered your comments.

Perhaps I am not understanding what you mean by understanding, but your views here seem to run counter to your work which seems to me to be strongly based in observations, and induction (if that is the right term) of system character from the observations.

I am not saying any approach leads to definitive answers. All approaches have limitations, and finding those limits is one of the most worthwhile things you can do, I think. And whether they are neural nets or whatever, the proof is in rigorous validation: framing severe tests and making ‘surprising’ predictions, not tautological ones. I don’t think neural nets are intended to provide insight; they are black boxes by design. Their main purpose is to predict (or, more precisely, to fit). Surely when you praise the old concepts of probability and statistics you include significance estimates (i.e. probabilities of events), or are you suggesting a statistics without significance testing?

In fact, I really don’t know what you mean by ‘understand a system’. You can break it into parts, but then it isn’t a system. The way you break it up is arbitrary and introduces artifacts. You can look for causation, but that is a philosophical black hole. When talking about systems, only statements like ‘minimizes some objective function’ (such as certainty) or ‘satisfies some equality’ (such as energy conservation), along with simple concepts like probability distributions, have a sense of meaning to me.

Perhaps it was my failure to say “first” and “second”.

In my view, understanding is not identical to making a deterministic (e.g. mechanistic) conceptualization of the system. A good example is statistical physics, which in my opinion offers a better understanding of what happens in a litre of gas than a mechanistic or a classical thermodynamic description does. For example, if one tries to describe this litre as a deterministic system of a great many molecules, I do not think one will derive any result for the system. But it is important to understand that the macroscopic behaviour is related to the microscopic movement of molecules, and to infer the macroscopic behaviour using probability, entropy (which is a probabilistic concept), etc.
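
The statistical-physics point can be illustrated numerically. The kinetic energy of any single molecule is unpredictable, yet the average over many molecules is sharply determined, with fluctuations shrinking roughly as 1/sqrt(N). The sketch below (Python with NumPy, an assumption) draws each velocity component from a normal distribution. This corresponds to a Maxwell-Boltzmann gas in assumed units with m = 1 and kT = 1, so the expected energy per molecule is 1.5. The function name `mean_energy_stats` is hypothetical.

```python
import numpy as np


def mean_energy_stats(n_molecules, samples=100, seed=0):
    """Mean and spread (across independent 'gas' samples) of the average
    kinetic energy per molecule. Units assumed: m = 1, kT = 1."""
    rng = np.random.default_rng(seed)
    # Each velocity component ~ N(0, kT/m) = N(0, 1)
    v = rng.normal(size=(samples, n_molecules, 3))
    energy = 0.5 * (v ** 2).sum(axis=2)      # kinetic energy of each molecule
    per_sample_mean = energy.mean(axis=1)    # macroscopic average per sample
    return float(per_sample_mean.mean()), float(per_sample_mean.std())


for n in (1, 100, 10_000):
    mean, spread = mean_energy_stats(n)
    print(f"N = {n:>6}: mean energy = {mean:.3f}, spread = {spread:.4f}")
```

A litre of gas contains of order 10^22 molecules, so the relative fluctuation of such averages is negligible. This is why a probabilistic description yields definite macroscopic results without tracking individual molecules.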

In another example, each leaf of every tree certainly plays a role in the hydrological cycle of a catchment. Anyone who tried to estimate the evaporation in the catchment by analysing each tree and each of its leaves separately would certainly be ploughing the sand. Here probability would help to acquire a macroscopic picture of the catchment evaporation without the need to examine each leaf. On the other hand, if one does not care about the process of evaporation at all (for instance, because one uses a neural network whose input and output are merely rainfall and river flow), one will probably not acquire any understanding. In the latter case it would be impossible to extrapolate, for example to cases that are not represented in the data (e.g. extreme floods).

So, my view is that concepts such as probability and entropy do provide understanding of complex systems (allowing a kind of integration of microscopic behaviours into macroscopic ones), whereas black-box neural networks may not.

Still, I am not disputing the predictive capacity of neural networks, especially for simple nonlinear systems. And of course I am not disputing significance testing in statistics. My concern is about its blind application, without knowing anything about the behaviour of the system that produced the data. At the same time, I think that probability is much more than significance testing; it is a way of thinking and understanding.

Re #24. Fascinating stuff, but there is an important difference between a human and a natural system in their behavior that is illustrated by the very informative game you posed. I see modeling as a game with two players, very like ‘Mastermind’ where one holds a secret code and the other tries to guess it by probing and receiving partial clues.

The difference between playing against a human and a natural system is that the human can be accommodating or adversarial, while (we may assume) the natural system is indifferent to our success. The human can make the game very hard or very easy depending on the clues, for example by selecting typical or atypical examples.

In this framework, where all information is simply ‘clues’, the statement that “statistical significance is meaningless when discussing poorly understood systems” seems an odd aversion. All we have are clues, whether they are gained from macroscopic or microscopic probes. And we need all the clues we can get from all angles. I think the quote is actually talking about jumping to conclusions.

Like the adversary, the learner has skills that may include observation and experiment. Capacity to manipulate the system adds power to our probing. You can construct theoretical experimenters that are capable of an infinite number of experiments in a finite time, and they are capable of overcoming the Popperian limitation to falsification, and can prove universal statements (theories).

This framework came out of work on machine learning by Clark Glymour, including hierarchies of learners, such as the pantheon of Greek Gods, each capable of more powerful discoveries. I see from a Google search that his recent book is on the Android Mind, looking at the powers of computer minds for discovery.

“… for the editors of Thinking about Android Epistemology, there should be theories about other sorts of minds, other ways that physical systems can be organized to produce knowledge and competence.”

References:

Peter Spirtes, Clark Glymour and Richard Scheines, Causation, Prediction, and Search, 2nd edition.

Thanks for the thoughtful comments. I will need to read your citations and discuss them later. Generally, I have to say that I adopt Sir Roger Penrose’s idea that understanding is not an algorithmic process. In this respect I do not believe in artificial intelligence, if we assume that intelligence should be combined with some understanding.

With pleasure, and an ice cold beer. Regards