# Autocorrelation in GAM and GRASP

Autocorrelation in GAM and GRASP models is an important topic of discussion, since these models are widely used for predictive animal and plant distribution modeling in ecology.

The most widely used statistical models in the fields of ecological modeling, biodiversity and conservation are Generalized Linear Models (GLM) and Generalized Additive Models (GAM), a semi-parametric extension of GLM. GRASP stands for Generalized Regression Analysis and Spatial Prediction (http://www.cscf.ch/grasp/grasp-s/welcome.html). GRASP is a combination of advanced S-Plus functions and GIS (Geographical Information System) tools. Many of these applications can be run through the software "R" (www.r-project.org).

## What is Autocorrelation?

Autocorrelation describes the correlation between a process, say Xt, and the same process at a different point in time, Xs. The autocorrelation function can be written as

R(s, t) = E[(Xt − μ)(Xs − μ)] / σ²

where Xt has variance σ² and mean μ, and E is the expected value. The result ranges between −1 and 1: 1 indicates perfect correlation, while −1 indicates perfect anti-correlation. Note that the function is only well defined when the variance is finite and non-zero.
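As a concrete illustration, the lag-k sample autocorrelation can be computed directly from this definition. A minimal Python/numpy sketch (the function name `autocorr` is ours, purely for illustration):

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag, following
    R(s, t) = E[(X_t - mu)(X_s - mu)] / sigma^2 with s = t + lag."""
    x = np.asarray(x, dtype=float)
    mu, var = x.mean(), x.var()
    n = len(x)
    return np.sum((x[:n - lag] - mu) * (x[lag:] - mu)) / (n * var)

# A deterministic trend is strongly autocorrelated at lag 1,
# and the value always stays within [-1, 1].
trend = np.arange(100.0)
print(autocorr(trend, 1))  # close to 1
```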

If you use time-series data in regression analysis, autocorrelation of the residuals is a problem area, since it leads to an upward bias in the statistical significance of coefficient estimates. A Durbin-Watson test can be used to detect the presence of autocorrelation. You can use Durbin's h statistic if your explanatory variables include a lagged dependent variable. To avoid autocorrelation-related problems you may use differencing of the data and lag structures in estimation. (http://en.wikipedia.org/wiki/Autocorrelation)
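The Durbin-Watson statistic itself is simple to compute: the sum of squared successive residual differences divided by the sum of squared residuals, with values near 2 indicating no first-order autocorrelation. A minimal numpy sketch (the simulated series are illustrative only):

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals over their sum of squares. Values near 2 suggest
    no first-order autocorrelation; values well below 2 suggest
    positive autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)
white = rng.normal(size=500)        # uncorrelated residuals
ar1 = np.zeros(500)                 # positively autocorrelated residuals
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + rng.normal()

print(durbin_watson(white))  # near 2
print(durbin_watson(ar1))    # well below 2
```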

## Biogeographical Predictions of Distributions of Species

Biogeographical predictions of the distribution of species are very important in assessing the impact of changing environmental conditions on the distribution of species, ecosystems and natural communities. Many statistical models ignore important issues related to the distribution of species, such as spatial autocorrelation, dispersal and migration, and biotic and environmental interactions. When modeling the spatial distribution of species, the following should be taken into consideration (GLF+06):

1. Incorporation of spatial concepts
2. Optimal use of artificially generated data and existing data
3. Integration of environmental and ecological interactions
4. Prediction of errors and uncertainties
5. Prediction of the distribution of communities

In most spatial analyses, spatial autocorrelation is an important source of bias. Although GAM, CTA and GLM models are all vulnerable to spatial autocorrelation, GAM and CTA models perform better. The reliability of niche modeling can be improved if certain procedures and techniques, such as a 'null model approach', are taken into consideration during the modeling process (SAK06).
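Spatial autocorrelation is usually quantified before modeling; a common index (not discussed in SAK06's terms here) is Moran's I, which is positive when neighbouring sites carry similar values. A minimal numpy sketch, where the tiny weight matrix is a hypothetical four-site example:

```python
import numpy as np

def morans_i(values, weights):
    """Moran's I index of spatial autocorrelation.
    values:  observations at n sites.
    weights: n x n spatial weight matrix, w[i, j] > 0 for neighbours."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

# Four sites along a transect; neighbouring sites share an edge.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(morans_i([1.0, 2.0, 3.0, 4.0], w))  # positive: neighbours are similar
```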

## GRASP Versions

The R version shares the basic idea and architecture of the S-Plus version, but has evolved independently in ways specific to the R environment. Both versions are available free of charge to promote the use of spatial prediction in environmental management and other niche modeling.

## The Need for Robust Distribution Models

You need robust species' distribution models and documentation in order to predict the effects of changing environmental conditions on biological communities and ecosystems. Autocorrelation in GAM and GRASP models should be taken seriously. An improved collaborative effort between theoretical and functional ecologists, ecological modelers and statisticians is needed to build better distribution models.

## References

GLF+06 – Guisan, A., Lehmann, A., Ferrier, S., Austin, M., Overton, J. McC., Aspinall, R. and Hastie, T. (2006) Making better biogeographical predictions of species' distributions. Journal of Applied Ecology, 43, 386, June 2006. (http://www.blackwell-synergy.com/doi/abs/10.1111/j.1365-2664.2006.01164.x)

SAK06 – Segurado, P., Araújo, M. B. and Kunin, W. E. (2006) Consequences of spatial autocorrelation for niche-based models. Journal of Applied Ecology, 43, 433, June 2006. (http://www.blackwell-synergy.com/doi/abs/10.1111/j.1365-2664.2006.01162.x)

## Comments

1. Martin Ringo says:

David,

Let me offer a little experiment. Generate a series with LTP noise. Now run a regression of it on a constant and linear trend. Check the D-W stat. It should generally be in the danger zone — well below 2.0. Now run the regression with the series on a constant, the linear time trend and the lagged value of the series. This should remove almost all of your first order autocorrelation and give a commensurately pretty D-W stat.

Take the residuals from that regression and run them on the trend and their lagged values up to, say, 10 to 20 lags. Then do a Wald test of the constraint that all the coefficients of the lagged residuals are zero. This is a sort of rough version of the Breusch-Godfrey serial correlation Lagrange multiplier test. Run the regression on the squared residuals without the exogenous variables and you have the basic construct of the ARCH LM test. For each there is a test statistic that is asymptotically chi-squared. I am certain that these tests are somewhere in one of the R packages.
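That rough Breusch-Godfrey recipe can be sketched in a few lines. The version below is a simplified, lag-only variant (no trend term) in Python/numpy, and the helper name `bg_lm_stat` is ours, not from any R package:

```python
import numpy as np

def bg_lm_stat(residuals, p):
    """Rough Breusch-Godfrey LM statistic: regress the residuals on
    their first p lags and return n * R^2, which is asymptotically
    chi-squared with p degrees of freedom under the null of no serial
    correlation up to order p."""
    e = np.asarray(residuals, dtype=float)
    n = len(e) - p
    y = e[p:]
    lags = [e[p - k:len(e) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n)] + lags)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return n * r2

rng = np.random.default_rng(0)
white = rng.normal(size=500)
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + rng.normal()

print(bg_lm_stat(white, 5))  # small: compare to a chi-squared(5) critical value
print(bg_lm_stat(ar1, 5))    # large: serial correlation detected
```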

Anyway, the purpose of this comment is to note that for many of the variables in whose autocorrelations you and others have expressed interest, e.g. temperature series, there is often more than first-order autocorrelation, and it might be wise to use test statistics that are a little more general. (Note D&W came up with a D-W stat for higher-order autocorrelations, but I forgot just what it was. Also there are things more hip than the B-G and ARCH LM tests, but I wasn't impressed enough to bother to implement them into my regression routines.)

Also, for cases where one is interested in LTP per se, the Andrew Lo (of Campbell, Lo and MacKinlay fame) paper "Long-Term Memory in Stock Prices," Econometrica, Vol. 59, No. 5, Sep 1991, shows how to test a function of a quasi Hurst exponent that accounts for the short-term autocorrelation function. (Lo and MacKinlay's book "A Non-Random Walk Down Wall Street" shows the test in a somewhat more digestible form.) The trick Lo uses is part of several tests or adjusted estimates of standard errors, including the Newey-West estimates (which correct for autocorrelation and heteroskedasticity, well, to a degree, in the S.E.s of the regression coefficients). Anyone who wants to assert long-term persistence, or correlation or memory or what have you, needs a test that distinguishes the "short term" correlation effects from the long-term correlation effects. Lo gives one.

2. Thanks for the information on the tests Martin. If I were writing a paper for a journal I would use them.

I have been interested in building intuition for LTP also, with posts like Scale Invariance for Dummies. The difference between LTP and, say, AR(1) is easily seen on a plot of log standard deviation vs log aggregation (i.e. daily, weekly, yearly averaging). LTP has this constant increasing relationship of variance to scale, but with AR(1) (Markov or STP) the variance 'fades' at the higher aggregations or longer time scales. This shows clearly the danger of finding apparent 'trends' that are really noise at longer scales.
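That log-log diagnostic is easy to reproduce. The sketch below (pure numpy, illustrative only) computes the standard deviation of block averages of a simulated AR(1) series: for STP the log-log slope falls to about -1/2 at long scales, like white noise, whereas an LTP series with Hurst exponent H would keep the shallower constant slope H - 1:

```python
import numpy as np

def aggregated_std(x, m):
    """Standard deviation of the series averaged in blocks of length m."""
    x = np.asarray(x, dtype=float)
    n = (len(x) // m) * m
    return x[:n].reshape(-1, m).mean(axis=1).std()

rng = np.random.default_rng(1)
ar1 = np.zeros(100_000)
for t in range(1, len(ar1)):
    ar1[t] = 0.7 * ar1[t - 1] + rng.normal()

# Slope of log std vs log aggregation at long scales: AR(1) 'fades'
# towards the white-noise slope of -1/2, while an LTP series would
# keep a shallower, constant slope of H - 1.
s100, s1000 = aggregated_std(ar1, 100), aggregated_std(ar1, 1000)
slope = (np.log(s1000) - np.log(s100)) / (np.log(1000) - np.log(100))
print(slope)  # near -0.5
```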

I would be interested in what you think of the arguments for LTP at a basic level. For something like financial series that 'clear' daily, one could imagine AR(1) applying, but for natural series the timing of measurements is arbitrary (above yearly). So the argument is that AR(1) is a function of the observer's arbitrary choice of time scale. Natural series can be imagined to be like AR(1) simultaneously at all time scales, a model approximated by fractional differencing (Random Numbers Predict Future Temperatures). Thus there is an argument underpinning LTP similar to special relativity: Newtonian physics is based on an absolute inertial frame, and special relativity drops that assumption. AR(1) is based on an absolute time scale, but dropping that assumption produces LTP behaviour. The evidence is that this is seen in all natural phenomena.

In terms of detecting LTP, another visual approach is the partial autocorrelation I showed in Options for ACF in R, which shows you the possible order of the lags. What do you think of partial ACF?
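The partial ACF at lag k can be read off as the coefficient on the k-th lag in a regression of the series on its first k lags. A minimal numpy sketch (the function name `pacf_at` is ours, not R's `pacf`):

```python
import numpy as np

def pacf_at(x, k):
    """Partial autocorrelation at lag k: the coefficient on lag k in a
    least-squares regression of x_t on x_{t-1}, ..., x_{t-k}."""
    x = np.asarray(x, dtype=float)
    n = len(x) - k
    lags = [x[k - j:len(x) - j] for j in range(1, k + 1)]
    X = np.column_stack([np.ones(n)] + lags)
    beta, *_ = np.linalg.lstsq(X, x[k:], rcond=None)
    return beta[-1]  # coefficient on lag k

rng = np.random.default_rng(2)
ar1 = np.zeros(2000)
for t in range(1, 2000):
    ar1[t] = 0.7 * ar1[t - 1] + rng.normal()

# For AR(1), the partial ACF is large at lag 1 and near zero beyond it,
# which is how the plot reveals the order of the process.
print(pacf_at(ar1, 1))
print(pacf_at(ar1, 3))
```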

Thanks again Martin.
