WhyWhere 2.0 update

I received a number of emails about [tag]WhyWhere[/tag] recently, and I thought I would answer them all here with an update on progress on the new version. This is my highest priority now, and a beta should be available in a week or so. The old version was too hard to maintain, having been built up by a number of students and postdocs over many years. The new version will be in [tag]R[/tag] and so have far fewer lines of code. It will also be more consistent with subscription trends: it will consist of a small block of HTML code that you cut and paste into your web page. Then it will (hopefully!) generate a dynamic display of the prediction of the best model so far as it mines through the database for correlations. Some of the questions I have been asked are below.

I was wondering if the dataset used with WhyWhere includes attributes of river variables. I am speaking about channel width, reach gradient, mean annual flow, etc. I am interested in modeling potential species distributions in a riverine network. Any insight would be valuable…

The HYDRO1k data set is included. A list of variables is at http://edc.usgs.gov/products/elevation/gtopo30/hydro/namerica.html

The [tag]Biodiversity[/tag] section is our latest initiative and we want to use DesktopGARP. It seems very useful for our research work. We are having problems running it and understanding the results. We were hoping there was a detailed manual that could help us understand this software better.

All questions regarding DesktopGARP should go to the list set up for that purpose at http://www.cria.org.br/mailman/listinfo/desktopgarp.

I would like to use WhyWhere for predicting coral species distributions. Since I’m not familiar with the software and whether the model can meet my needs, I would like to try the web version first. However, I cannot open the webpage to the web version of WhyWhere. Is the link removed? Where will I be able to get access to WhyWhere web version? Or must I download the window version?

I have unlinked the web server version of WhyWhere as I don’t intend to support it long-term, and will soon have a new server version. The desktop version can be downloaded and works fine. It does however require a few packages to be installed for use by the Perl modules. People have found this a bit of a hurdle.

Do you have a time estimate available for when the WhyWhere version 2.0 portal and FAQ and ToDo List documents will be available? As part of a project for class and as part of my masters work I am investigating the various models available. I am interested in trying the WhyWhere model out for the class project, but want to make sure I’ll have all of the necessary information in time to complete the project. If not everything will be available, I will likely do my class project with Desktop GARP instead.

My best estimate is a week or two. Sorry I can’t be more precise. I have had to move all my stuff onto a new server, but this will be a permanent home from now on. Version 2.0 should be much better and easier to use.

I would like to know if the GARP model is suitable for use on smaller areas (roughly 150 000 ha) and if so, would the pre-processed dataset of North America be usable in this case or would we have to create our own? Also, I’ve noticed that the section in the user’s manual on Climate Change is under construction and I was wondering if there has been any progress or anywhere else I might be able to learn a bit more about the model’s ability to function using climate change scenarios. Lastly, if possible, would you be able to let me know what the 0 values of the parameters in the dataset layers for North America represent?

If you are interested in DesktopGARP go to the list above. To work at that scale you would need vegetation data from satellite. I have a few layers, such as the continuous fields dataset from the University of Maryland, that will be available in WhyWhere. Other than that you would need to get your own. Climate change predictions are possible (by saving a model, then reapplying it to shifted variables); however, I have no shifted variables available in the WhyWhere database.

AIG Article

The Australian Institute of Geoscientists News has published my article “Reconstruction of past climate using series with red noise” online, on page 14. Many thanks to the editor, Louis Hissink, for the rapid publication. It is actually a very interesting newsletter, with articles on the IPCC, a summary of the state of the hockey stick (or hokey stick), the K-T boundary controversy, and how to set up an exploration company.

Reconstructing the hokey stick with random data neatly illustrates the circular reasoning in a general context, showing that the form of the hokey stick is essentially encoded in the assumptions and procedures of the methodology. The fact that 20% of LTP series (or 40% if you count the inverted ones) correlate significantly with the instrumental temperature record of the last 150 years illustrates (1) that 150 years is an inadequate constraint on the possible models on which to base an extrapolation over 1000 years, and (2) the propensity of natural series with LTP to exhibit ‘trendiness’, or apparent long runs that can be mistaken for real trends. Check back shortly for the code; I have been playing around with RE and R2 and trying some ideas suggested by blog readers to tighten things up.
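The ‘trendiness’ of persistent series is easy to demonstrate numerically. Here is a minimal Python sketch; note it uses an AR(1) generator as a simple stand-in for true LTP (fractional Gaussian noise) series, and the 0.3 correlation cutoff is an illustrative threshold, not the significance test used in the post:

```python
import numpy as np

rng = np.random.default_rng(42)

def red_noise(n, phi=0.9):
    """AR(1) 'red noise' series: a simple stand-in for LTP."""
    x = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x

n_years, n_series = 150, 1000
t = np.arange(n_years)
series = [red_noise(n_years) for _ in range(n_series)]

# Correlation of each purely random series with the time index
# (a crude proxy for the rising instrumental record).
r = np.array([np.corrcoef(t, s)[0, 1] for s in series])

frac_pos = np.mean(r > 0.3)           # apparent positive 'trends'
frac_any = np.mean(np.abs(r) > 0.3)   # counting inverted ones too
print(frac_pos, frac_any)
```

A substantial fraction of these trendless series show an apparent trend over 150 points, which is the point about mistaking long excursions for real trends.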

With the hokey stick discredited from all angles, even within the paleo community itself, where recent reconstructions by Esper and Moberg show large variations in temperature over the last 1000 years, including temperatures on a par with the present day, one wonders why it is taking so long for the authors of the hokey stick to recant and admit natural climate variability. While the past variability of climate may or may not be important to the attribution debate, it is obviously important on the impacts side, as an indicator of the potential tolerances of most species.

Hurst, Joseph, colours and noises

Demetris Koutsoyiannis contributed the following excellent piece as a comment on a previous post. I have made it into a post to ensure it gets the widest distribution.

Hurst, Joseph, colours and noises: The importance of names in an important natural behaviour

“What’s in a name? That which we call a rose
By any other name would smell as sweet.”

William Shakespeare, Romeo and Juliet, Act 2, Scene 2

Is the name given to a physical phenomenon or to a scientific concept (e.g. a mathematical object) really unimportant? Let us start with a characteristic example, the term “regression”. The term was coined by Francis Galton, who studied biological data and noticed that the offspring population were closer to the overall mean size than the parent population. For example, sons of unusually short fathers typically have heights closer to the mean height than their fathers. Today we know that this does not manifest a peculiar biological phenomenon but a normal and general statistical behaviour. The slope of the least-squares straight line of two variables x and y is r_xy * s_y / s_x, where s_x and s_y are the standard deviations of the variables and r_xy is the correlation coefficient. In the example of the heights of fathers and sons, s_x = s_y, so the slope is precisely r_xy, which (by definition) is not greater than one; hence the “regression” towards the mean. Today no one has any problem with this generally accepted term, even though it is clearly not a good name. No one has trouble understanding the statistical (rather than biological or physical) origin of the “regression” and its irrelevance to the direction of time: for example, the fathers of exceptionally short people also tend to be closer to the mean than their sons. Just interchange y and x (and the axes in the graph) and you will have another line whose slope (in the new graph) will again be r_xy, that is, not greater than unity. However, until people understood these simple truths, the improper term must have caused several fallacies (see “Regression fallacies” in the Wikipedia article “Regression toward the mean”, http://en.wikipedia.org/wiki/Regression_toward_the_mean).
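The slope formula r_xy * s_y / s_x, and the symmetry of the effect, are easy to verify numerically. Here is a small Python sketch; the simulated father/son heights and the correlation of 0.5 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated father/son heights (cm) with equal marginal spread,
# so the least-squares slope equals the correlation coefficient.
n = 10_000
mean, sd = 175.0, 7.0
r_true = 0.5
fathers = rng.normal(mean, sd, n)
# Sons correlated with fathers (r ~ 0.5) but with the same spread.
sons = mean + r_true * (fathers - mean) \
       + np.sqrt(1 - r_true**2) * rng.normal(0, sd, n)

# Least-squares slope: r_xy * s_y / s_x
r_xy = np.corrcoef(fathers, sons)[0, 1]
slope = r_xy * sons.std() / fathers.std()
# Because s_x is approximately s_y, the slope is approximately r_xy < 1:
# 'regression' towards the mean.

# The effect is symmetric: regressing fathers on sons gives a slope
# (in that direction) that is again approximately r_xy < 1.
slope_rev = r_xy * fathers.std() / sons.std()
print(slope, slope_rev)
```

Both slopes come out below one, in either direction of regression, which is the statistical rather than biological origin of the term.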


How to start a science blog (nice version)

You might have noticed that the URL for this site has changed to http://www.landshape.org/enm. I have had to set up a site with a web hoster and move the blog over, as the old server couldn’t cope with the traffic. Here are some of my thoughts on blogs for others who might be interested in starting their own.

There are many reasons a scientist might start a blog:

  • Prepublication of work to enable review by others
  • Outreach to the general community
  • Dissemination of research notes
  • Review of the literature
  • Advocacy of a position or idea
  • Project management
  • Making money

Of these the last is probably the most tricky, but I will say something about that too. After deciding to start a blog, the next question is how to do it. There are a range of possibilities available. Following are my notes on the experience.


Blogs on random temperature reconstruction

A new temperature reconstruction has certainly resonated with many people. Here is a summary of what some of the blogs have been saying, and my corrections of some small inaccuracies.

American Thinker wrote a very upbeat but over-the-top piece.

The scientific argument that humans have caused global warming – a major underpinning of the “Kyoto Protocols” – suffered a major blow last week, with the publication of a new study. The implications have not yet spread very far beyond the rarified circles of specialists, but the gospel of “anthropogenic” – human-caused – global warming has lost one of its intellectual foundations.

However, the article has not yet been through the rigors of publication; some preliminary results will appear in the Australian Institute of Geoscientists newsletter next month.


Cross validation as a test of random reconstructions

To recap previous posts (http://www.climateaudit.org/?p=566): I replicated the cross-validation procedure used in MBH98 to assess the reconstruction skill of randomly generated series on raw and filtered CRU temperatures. The RE statistic correctly indicated no skill for the reconstruction on both the raw and filtered temperature data. The R2 statistic indicated no skill on the raw temperature data, but skill at predicting the filtered temperature data. The importance of these ‘tests’ is that they are the basis for accepting or rejecting a reconstruction. The question addressed is: are tests using RE and R2 capable of discriminating between meaningful proxy data and a reconstruction developed from random data?
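For concreteness, here is a minimal Python sketch of the two statistics as used here (the RE baseline is the calibration-period mean; the function names and the toy data are mine, not from MBH98):

```python
import numpy as np

def re_stat(obs, pred, cal_mean):
    """Reduction of Error: 1 - SSE(prediction) / SSE(calibration mean).
    RE > 0 is conventionally read as skill over climatology."""
    sse = np.sum((obs - pred) ** 2)
    sse_mean = np.sum((obs - cal_mean) ** 2)
    return 1.0 - sse / sse_mean

def r2_stat(obs, pred):
    """Squared Pearson correlation between observed and predicted."""
    return np.corrcoef(obs, pred)[0, 1] ** 2

# A 'reconstruction' that is pure noise, evaluated against noisy
# observations over a 45-year verification window (cf. 1856-1900).
rng = np.random.default_rng(1)
obs = rng.standard_normal(45)
pred = rng.standard_normal(45)
print(re_stat(obs, pred, cal_mean=0.0))  # typically negative: no skill
print(r2_stat(obs, pred))                # small, but nonzero by chance
```

A perfect prediction gives RE = 1, the calibration mean itself gives RE = 0, and anything worse than the mean goes negative, which is why RE is the stricter of the two tests here.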


RE of random reconstructions

To follow up on the last post, I have calculated the RE as well as the R2 statistics for the reconstruction from the random series. The same approach was used, i.e. generate 1000 sequences with LTP, select those with a positive slope and R2 > 0.1, calibrate with a linear model, and average. Here is the reconstruction again, with the test and training periods marked with a horizontal dashed line (test period to the left, training period to the right of the temperature values):
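The generate-select-calibrate-average procedure can be sketched in Python as follows. The AR(1) generator is a simple stand-in for the LTP series, and the linear ‘temperature’ target, the period lengths, and the seed are illustrative assumptions, not the actual data:

```python
import numpy as np

rng = np.random.default_rng(2)

def red_noise(n, phi=0.9):
    """AR(1) red-noise series, standing in for LTP sequences."""
    x = np.zeros(n)
    e = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

n_years = 135                      # cf. 1856-1990
train = slice(45, n_years)         # cf. calibration on 1901-1990
# A toy rising 'temperature' record with some noise.
temp = np.linspace(-0.3, 0.4, n_years) + 0.1 * rng.standard_normal(n_years)

selected = []
for _ in range(1000):
    s = red_noise(n_years)
    # Calibrate each series against temperature over the training
    # interval only; polyfit returns [slope, intercept] for degree 1.
    slope, intercept = np.polyfit(s[train], temp[train], 1)
    r2 = np.corrcoef(s[train], temp[train])[0, 1] ** 2
    if slope > 0 and r2 > 0.1:     # the selection rule from the post
        selected.append(intercept + slope * s)

# The averaged 'reconstruction' over the full period.
recon = np.mean(selected, axis=0)
print(len(selected), recon.shape)
```

The selection step is the crux: conditioning on a positive slope and R2 > 0.1 in the training interval builds the calibration-period trend into series that are, by construction, trendless.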


R2 statistics for random reconstructions

As a follow-up on the previous post, I have examined the correlation statistics for the reconstruction of past climate from random series with red noise. I have tried to use the same approach as MBH98, where the model is tested on data for years held back from the main analysis and model development. Different intervals could be chosen, but in the case of MBH98 the model is trained on the years 1901-1990 and tested on the years 1856-1900. The distribution of R2 values is as follows:

Figure 1. The frequency distribution of R2 values for all series (trees) over the training interval in blue, and the test interval in red. The distribution of R2 before selection is shown by the solid line and after selection by the dashed line. Series are selected if the R2 value is greater than 0.1 and have a positive slope.
