Statistical Predictive Models in Ecology: Comparison of Performances and Assessment of Applicability
Authors: Can Ozan Tan, Uygar Ozesmi, Meryem Beklioglu, Esra Per, Bahtiyar Kurt
Comments: Submitted to Ecological Informatics
This interesting study on the open archive site arXiv (now in Ecological Informatics 1:195-211) compares some predictive models for species distribution not examined in the study of EG+06, reviewed in Novel methods continue to improve prediction of species’ distributions. They used nearest neighbor (k-NN, ARTMAP) and neural net methods (not evaluated in EG+06) and generalized linear models and discriminant analysis (LDA and QDA) (evaluated in EG+06). The GLM method is common to both TO+06 and EG+06. They found:
The methods considered were k-NN, LDA, QDA, generalized linear models (GLM), feedforward multilayer backpropagation networks, and the pseudo-supervised network ARTMAP. For ecosystems involving time-dependent dynamics and periodicities whose frequency are possibly less than the time scale of the data considered, GLM and connectionist neural network models appear to be most suitable and robust, provided that a predictive variable reflecting these time-dependent dynamics included in the model either implicitly or explicitly. For spatial data, which does not include any time-dependence comparable to the time scale covered by the data, on the other hand, neighborhood based methods such as k-NN and ARTMAP proved to be more robust than other methods considered in this study.
Both of the nearest neighbor methods performed better than GLM methods on bird breeding data sets. To the extent results are comparable, this would place them above GLM in the EG+06 study and possibly in the best performing techniques. The traditional neural net methods performed worse than GLM.
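For readers unfamiliar with neighbourhood-based methods, the following is a minimal sketch of a k-NN presence/absence classifier. It is not the implementation used in TO+06; the function name `knn_predict`, the two predictors (temperature and rainfall), and the toy records are all assumptions made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs, where features are
    equal-length tuples of numbers.
    """
    # Rank training points by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy records: (temperature, rainfall) -> presence (1) or absence (0).
train = [
    ((10.0, 200.0), 0), ((11.0, 220.0), 0), ((12.0, 210.0), 0),
    ((20.0, 800.0), 1), ((21.0, 780.0), 1), ((22.0, 820.0), 1),
]

warm_wet_site = knn_predict(train, (20.5, 790.0))
cool_dry_site = knn_predict(train, (10.5, 205.0))
```

In this sketch the predictors are left on their raw scales, so rainfall dominates the distance. A real application would normally standardize the variables first and choose k by cross-validation.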
I don't really want to use this review as an opportunity for advertising, but the good results for neighbourhood-based methods are encouraging for my favourite method at the moment, WhyWhere. It uses a categorization heuristic related to nearest neighbour approaches and hasn’t been evaluated with respect to other methods. I got interested in these approaches when simple categorization approaches gave superior performance over GLMs in Effects of sample size on accuracy of species distribution models. WhyWhere of course has the additional feature that the best variable(s) are selected from up to 1000 environmental variables. Results in Improving ecological niche models by data mining large environmental datasets for surrogate models show better performance than GARP models restricted to a few environmental variables such as temperature and rainfall.
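The general idea of screening many candidate variables with a simple categorization rule can be sketched as follows. This is only an illustration of the concept and should not be read as WhyWhere's actual algorithm: the functions `category_accuracy` and `best_variable`, the variable names, and the data are all hypothetical. Each variable is scored by how well the majority label within each of its categories predicts presence or absence.

```python
from collections import Counter, defaultdict


def category_accuracy(values, labels):
    """Accuracy obtained by predicting, for each record, the majority label of its category."""
    by_category = defaultdict(Counter)
    for value, label in zip(values, labels):
        by_category[value][label] += 1
    # Within each category, the majority label is correct for its own count of records.
    correct = sum(counts.most_common(1)[0][1] for counts in by_category.values())
    return correct / len(labels)


def best_variable(variables, labels):
    """Return (name, accuracy) for the single variable that best separates the labels."""
    scores = {name: category_accuracy(values, labels) for name, values in variables.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical data: presence (1) / absence (0) at six sites, two categorical layers.
labels = [1, 1, 1, 0, 0, 0]
variables = {
    "landcover": ["forest", "forest", "forest", "urban", "urban", "urban"],
    "soil": ["clay", "sand", "clay", "sand", "clay", "sand"],
}

chosen = best_variable(variables, labels)
```

The score here is computed on the same records used to form the categories, so it favours variables with many categories. With hundreds of candidate layers, a held-out test set would be needed to guard against selecting a variable by chance.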
I am not claiming that WhyWhere is an optimized algorithm for niche modelling. WhyWhere is at a proof of concept stage and could be greatly improved in many directions. However, I was struck by the following as the main reason for trying the current implementation, summed up nicely in TO+06.
In addition, for predictive modeling purposes, first a suitable, computationally inexpensive method should be applied to the problem at hand a good predictive performance of which would render the computational cost and efforts associated with complex variants unnecessary.
The idea of viewing model selection as a cost-benefit analysis is a good one. Using WhyWhere initially to sift through many possible correlates costs very little time, given the internet version together with its available datasets. In comparison, most other methods do not come with an environmental database.
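The "cheap method first" recommendation can be written down as a simple escalation rule. The sketch below is one possible reading of it, not a procedure given in TO+06: the function `select_model`, the method names, the cost figures, and the scores are placeholders invented for illustration and are not results from any study.

```python
def select_model(candidates, evaluate, good_enough):
    """Try methods in order of increasing cost and stop at the first adequate one.

    candidates:  list of (name, cost) pairs
    evaluate:    function mapping a method name to a performance score
    good_enough: score at which more expensive methods are no longer tried
    Returns the best (name, score) seen and the list of methods actually tried.
    """
    best = None
    tried = []
    for name, _cost in sorted(candidates, key=lambda c: c[1]):
        score = evaluate(name)
        tried.append(name)
        if best is None or score > best[1]:
            best = (name, score)
        if score >= good_enough:
            break  # the cheap method suffices, so skip the costlier variants
    return best, tried


# Placeholder costs and scores, purely illustrative.
candidates = [("complex_method", 100.0), ("cheap_method", 1.0), ("mid_method", 10.0)]
placeholder_scores = {"cheap_method": 0.85, "mid_method": 0.80, "complex_method": 0.90}

best, tried = select_model(candidates, placeholder_scores.get, good_enough=0.8)
```

With these placeholder numbers only the cheapest method is ever evaluated. Raising `good_enough` above its score makes the loop fall through to the more expensive candidates, which is the trade-off the quoted passage describes.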