What are the conditions for valid extrapolation of statistical predictions? Answer II.

Demetris Koutsoyiannis

Before I attempt to describe my answer, I would like to make some clarifications on the nature of a statistical prediction and mention some points that need caution.

1. A statistical prediction should be distinguished from a deterministic prediction. In a deterministic prediction, some deterministic dynamics of the form y = f(x1, …, xk) are assumed, where y is the predicted value, i.e. the output of the deterministic model f( ), and x1, …, xk are the inputs, i.e. explanatory variables. The model f( ) could be either a physically based one or a black-box, data-driven one. The latter case is very frequent, e.g. in local linear (chaotic) models and in connectionist (artificial neural network) models.

Now, in a statistical prediction we assume some stochastic dynamics of the form Y = f(X1, …, Xk, V). There are two fundamental differences from the deterministic case. The first, apparent in the notation (the upper-case convention), is that the variables are no longer algebraic variables but random variables. Random variables are not numbers, as algebraic variables are, but functions defined on the sample space. This is very important. The second difference is that an additional random variable V has been inserted in the dynamics. This is sometimes regarded as a prediction error that is additive to a deterministic part, i.e. f(X1, …, Xk, V) = fd(X1, …, Xk) + V. However, I prefer to think of it as a random variable manifesting the intrinsic randomness in nature.

2. To avoid confusion it is always advisable to formulate the stochastic model in such a way that all X1, …, Xk are observable or, better, observed, so that we can directly apply it to obtain predictions that are conditioned on X1 = x1, …, Xk = xk, where x1, …, xk denote observations of X1, …, Xk. Predictions can be of point or interval type. The point prediction is y = E[f(X1, …, Xk, V) | X1 = x1, …, Xk = xk] = E[f(x1, …, xk, V)]. Here E[ ] denotes expectation, and the last equality assumes that V is independent of X1, …, Xk. Interval predictions are intervals (yb, ya) satisfying P{yb < Y < ya | X1 = x1, …, Xk = xk} = alpha, where P{ } denotes probability and alpha is a confidence coefficient. In simple cases these are calculated analytically; in other cases analytical solutions are not feasible and the method of choice is Monte Carlo simulation.
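As an illustration of point 2, here is a minimal Monte Carlo sketch of a point prediction and an interval prediction conditioned on observed inputs. The model f( ), the lognormal distribution of V, the input values and the function name monte_carlo_prediction are all hypothetical choices made for the example; the only thing taken from the text is the assumption that V is independent of X1, …, Xk.

```python
import random
import statistics


def f(x1, x2, v):
    # Hypothetical stochastic dynamics, chosen only for illustration:
    # a nonlinear deterministic part multiplied by a random factor V.
    return (2.0 * x1 + 0.5 * x2 ** 2) * v


def monte_carlo_prediction(x1, x2, alpha=0.95, n_draws=100_000, seed=1):
    """Return (point, yb, ya) conditioned on X1 = x1, X2 = x2."""
    rng = random.Random(seed)
    # Assumption: V is lognormal and independent of X1, X2, so we can
    # sample V on its own and plug in the observed x1, x2.
    draws = sorted(
        f(x1, x2, rng.lognormvariate(0.0, 0.3)) for _ in range(n_draws)
    )
    # Point prediction: sample estimate of E[f(x1, x2, V)].
    point = statistics.fmean(draws)
    # Interval prediction: central interval holding a fraction alpha
    # of the simulated values (empirical quantiles).
    lo_idx = int((1.0 - alpha) / 2.0 * n_draws)
    hi_idx = int((1.0 + alpha) / 2.0 * n_draws) - 1
    return point, draws[lo_idx], draws[hi_idx]


point, yb, ya = monte_carlo_prediction(x1=1.0, x2=2.0)
print(f"point prediction: {point:.3f}")
print(f"95% prediction interval: ({yb:.3f}, {ya:.3f})")
```

Because V enters multiplicatively in this example, the interval is not symmetric about the point prediction, which is one of the cases where simulation is more convenient than an analytical solution.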

3. In addition to the inherent uncertainty that is described by the variable V, we also have uncertainty in the parameters of the model f( ), because these parameters are usually estimated from a sample rather than derived by theoretical reasoning. This obviously influences our predictions, both point and interval, and should be taken into account for a consistent description of uncertainty. It can be quantified using the notion of confidence limits of estimation, a notion very different from the prediction limits discussed in point 2.
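The difference between confidence limits (parameter uncertainty only) and prediction limits (parameter uncertainty plus V) can be sketched for the simplest case, a straight line fitted by least squares. Everything in the example is an assumption made for illustration: the linear model, the synthetic data, the function names, independent Gaussian errors, and the use of a normal quantile in place of the Student t quantile. In particular, the textbook formulas used here rely on independence.

```python
import math
import random
from statistics import NormalDist, fmean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns estimates and summaries."""
    n = len(xs)
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    resid_ss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(resid_ss / (n - 2))  # residual standard deviation
    return a, b, s, xbar, sxx


def half_widths(x0, n, s, xbar, sxx, alpha=0.95):
    """Half-widths of the confidence and prediction intervals at x0."""
    # Assumption: normal quantile as an approximation of the t quantile.
    z = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
    leverage = 1.0 / n + (x0 - xbar) ** 2 / sxx
    conf = z * s * math.sqrt(leverage)        # parameter uncertainty only
    pred = z * s * math.sqrt(1.0 + leverage)  # parameters plus V
    return conf, pred


# Synthetic sample (hypothetical): x in [0, 4.9], independent errors.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

a, b, s, xbar, sxx = fit_line(xs, ys)
inside = half_widths(2.5, len(xs), s, xbar, sxx)    # within the data range
outside = half_widths(10.0, len(xs), s, xbar, sxx)  # extrapolation
print("x0 = 2.5  confidence, prediction half-widths:", inside)
print("x0 = 10.0 confidence, prediction half-widths:", outside)
```

In this sketch the prediction limits are always wider than the confidence limits, and both widen as x0 moves away from the centre of the sample, which is the situation faced in extrapolation.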

4. In natural systems, all variables X1, …, Xk are dependent on each other. This is usually missed. For instance, the classical statistical law that makes the width of confidence intervals inversely proportional to the square root of the sample size is no longer valid if there is dependence, particularly long-range dependence. Sadly, numerous (if not most) published results on related issues have been based on this and other classical statistical laws that are valid only when X1, …, Xk are independent. The error in statistical predictions from such misuses could be huge.
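To give a feeling for the size of the effect in point 4, the sketch below compares the standard error of a sample mean under three assumptions: independence, short-range dependence in the form of an AR(1) process, and long-range dependence in the form of a Hurst-Kolmogorov (fractional Gaussian noise) process. The choice of these two dependence models, the parameter values and the function names are assumptions made for the example, not something prescribed above.

```python
import math


def se_mean_iid(sigma, n):
    # Classical law: standard error proportional to 1 / sqrt(n).
    return sigma / math.sqrt(n)


def se_mean_ar1(sigma, n, rho):
    # Exact variance of the mean of n consecutive values of a stationary
    # process with lag-k autocorrelation rho**k (AR(1)):
    # Var = sigma^2 / n * [1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho^k]
    factor = 1.0 + 2.0 * sum((1.0 - k / n) * rho ** k for k in range(1, n))
    return sigma * math.sqrt(factor / n)


def se_mean_hk(sigma, n, hurst):
    # Hurst-Kolmogorov process with Hurst coefficient H:
    # standard error = sigma / n^(1 - H); H = 0.5 recovers the iid law.
    return sigma / n ** (1.0 - hurst)


sigma, n = 1.0, 100
results = {
    "independent": se_mean_iid(sigma, n),
    "AR(1), rho = 0.7": se_mean_ar1(sigma, n, 0.7),
    "HK, H = 0.8": se_mean_hk(sigma, n, 0.8),
}
for label, se in results.items():
    print(f"{label:20s} standard error of mean = {se:.3f}")
```

With n = 100 and H = 0.8, the standard error is sigma / 100^0.2, roughly four times the value given by the classical law, so confidence intervals computed as if the data were independent would be far too narrow.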

5. To summarize, I think that the most important conditions for obtaining valid statistical predictions are (1) to be aware of the fundamentals of probability, statistics and stochastics, (2) to formulate the problem as clearly as possible, (3) to know the statistical/stochastic properties of the variables involved, such as marginal and dependence properties (in particular, the behaviour of the distribution tails is very important for extrapolations), and (4) to use correct statistical results (formulae, estimators etc.), i.e. those results that correspond to the nature of the problem and the variables involved.

