The WhyWhere system integrates a large number of environmental data sets of many different kinds using a robust method. This lets you search for correlates of any set of geographic points, not just species occurrences. The user does not have to prepare the data sets, just enter the coordinates. I thought it would be interesting to see what correlated with recent temperature anomalies. We all know average annual temperatures have increased in the last 30 years, but the spatial pattern of those increases is less well understood.
Just as a test, I downloaded the PC version of WhyWhere onto a new machine to see what problems I might encounter and to record the results. Here are the steps and results…
Description of steps
1. I downloaded the temperature anomaly data for the last 30 years as a text file from http://data.giss.nasa.gov/gistemp/maps/.
Figure 1. This is what the GHCN_GISS_HR2SST_250km_Anom0112_1975_2005_1951_1980 map looked like.
2. I used an R script to output it as a file of x and y coordinates suitable for entry. Actually, I had to go back and modify the script to keep only points with anomalies > 0.2 and with latitude > -0.45, to cut out Antarctica. The problems with Antarctica are described later.
d <- subset(data[data$t > 0.2 & !is.na(data$t) & data$lat > -0.45, ], select = c("lon", "lat"))
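As a minimal sketch of that filtering step, the snippet below applies the same condition to a small made-up data frame. The column names lon, lat and t follow the script line above. The sample values, the helper name filter_points and the output file name points.txt are assumptions for illustration only, and the real GISS text file would first need to be read in (for example with read.table).

```r
# Toy stand-in for the anomaly table; all values are invented for illustration.
# Latitudes are in whatever units the input file uses, chosen here to
# straddle the -0.45 cutoff quoted in the text.
data <- data.frame(
  lon = c(-120, 10, 35, 150, 0),
  lat = c(0.50, 0.20, -0.60, 0.10, 0.30),
  t   = c(0.8, 0.1, 0.9, NA, 0.4)
)

# Hypothetical helper: keep rows with a known anomaly above min_anom and a
# latitude above min_lat, and return only the coordinate columns.
filter_points <- function(data, min_anom = 0.2, min_lat = -0.45) {
  keep <- !is.na(data$t) & data$t > min_anom & data$lat > min_lat
  data[keep, c("lon", "lat")]
}

d <- filter_points(data)

# Write plain "x y" pairs, one point per line (file name is hypothetical).
out_file <- file.path(tempdir(), "points.txt")
write.table(d, out_file, row.names = FALSE, col.names = FALSE)
print(d)
```

Testing for missing values first keeps the NA rows from slipping through the comparison, which is the same job the !is.na(data$t) term does in the one-line version.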
1. I went to http://biodi.sdsc.edu/ww_home.html and downloaded the three zip files: source, data and support. I then unzipped them, taking care that the contents all went into the same directory, called WW.
2. I installed the source packages, as there was no ActivePerl on the machine. This went according to the install instructions in INSTALL.txt.
3. I went to WW and clicked on t.bat to start the three servers. Then I clicked on PROGRAMS.html to bring up the list of programs in IE.
4. I clicked on separate_portal and got a login interface in the browser.
5. I clicked new directory, then Get_Points, pasted the locations of positive temperature anomaly into the box, and clicked the button. 3891 points were uploaded (Figure 2).
Figure 2. Here is a picture of the points of anomaly greater than 0.2C.
6. Going to Get_GIS, I selected the All_Terrestrial_Data data set and a working resolution of 0.5 degree. This returned the 184 data sets in this category into the scratch file system guest_WW2968 (about 15 minutes). Later, I selected Remote_All_Data to download the entire data set from the SRB, and left it running overnight. The data only has to be downloaded once; after that I can use it for a series of analyses. Oh, I also clicked no_delay on Remote_All_Data so the browser wouldn’t be waiting for the script to return. When I did this I had to wait for the file Agetgisdone to appear in the working directory. It helped to watch a listing of the working directory to see the files appearing.
7. I clicked on Predict and waited for Apredictdone to appear in the working directory. The final statistics of the model were as follows.
Best model used vars .33.36. lctmp04 has objective internal 0.622,0.622 external 0.553 Z-score 0.70
Result loop 1 model .33 accuracy max 0.613 int 0.613 ext 0.498 sig. 9.105
Result loop 2 model .33.36 accuracy max 0.622 int 0.622 ext 0.553 sig. 0.704
8. I clicked on Explain to see the results. The variables correlated with temperature anomalies from 1975 to present were as follows.
33. lctmp01 Leemans and Cramer January Temperature (0.1C)
Range -583 to 406 0.1 degrees Celsius
36. lctmp04 Leemans and Cramer April Temperature (0.1C)
Range -430 to 432 0.1 degrees C
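Steps 6 and 7 above both involve waiting for a marker file (Agetgisdone, Apredictdone) to appear in the working directory. Rather than refreshing a directory listing by hand, a small polling loop can do the watching. The sketch below is in R. The function name wait_for_file, the timeout and the polling interval are assumptions for illustration, and the demonstration uses a temporary directory as a stand-in for a real WhyWhere working directory.

```r
# Poll until `path` exists or `timeout` seconds have passed.
# Returns TRUE if the file appeared, FALSE on timeout.
wait_for_file <- function(path, timeout = 60, interval = 1) {
  deadline <- Sys.time() + timeout
  while (Sys.time() < deadline) {
    if (file.exists(path)) return(TRUE)
    Sys.sleep(interval)
  }
  file.exists(path)
}

# Demonstration: a temporary directory stands in for the working directory.
work_dir <- tempfile("ww_demo_")
dir.create(work_dir)
marker <- file.path(work_dir, "Apredictdone")

# The marker does not exist yet, so this call times out after one second.
missing_result <- wait_for_file(marker, timeout = 1, interval = 0.2)

# Simulate the server finishing, then wait again.
file.create(marker)
found_result <- wait_for_file(marker, timeout = 1, interval = 0.2)

print(c(missing = missing_result, found = found_result))
```

For an overnight download like Remote_All_Data, a longer interval (say a minute) and a timeout of several hours would be more sensible than the defaults shown here.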
Figure 3. Here is an image of the frequency histograms in 1D.
The line through the green crosses was added by me, as the package doesn’t fit a smooth model yet. It represents the posterior probability, and the crosses could also show the expected variance. A smooth fit would be a welcome addition, as it would provide smoother predictions with fewer ‘speckles’ due to random noise. Also, this variable is the most significant, but there could be many others that are also very significant but not shown.
Figure 4. Here is the predicted probability surface for temperature anomalies.
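To make the idea of a smoothed line through the histogram concrete, here is a rough sketch in R. It is not WhyWhere's own code. It bins a simulated environmental variable, estimates for each bin the probability that a value came from the points rather than the background, and fits a quadratic through the per-bin values with lm. The simulated data, the number of bins, the equal weighting of points and background, and the choice of a quadratic are all assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-ins: values of one variable over all background cells,
# and values of the same variable at the anomaly points.
background <- runif(5000, min = -400, max = 400)   # e.g. temperature in 0.1 C
points_env <- rnorm(1000, mean = -50, sd = 120)    # points favour cooler cells
points_env <- points_env[points_env > -400 & points_env < 400]

# Use the same bins for both samples and convert counts to frequencies.
breaks <- seq(-400, 400, length.out = 21)
mids   <- head(breaks, -1) + diff(breaks) / 2
f_pts  <- hist(points_env, breaks = breaks, plot = FALSE)$counts / length(points_env)
f_bg   <- hist(background, breaks = breaks, plot = FALSE)$counts / length(background)

# Per-bin probability that a value belongs to the points rather than the
# background, assuming equal prior weight for the two samples.
p_bin <- f_pts / (f_pts + f_bg)

# Smooth the noisy per-bin values with a quadratic in the bin midpoint,
# then clamp the fitted values to the valid probability range.
fit      <- lm(p_bin ~ poly(mids, 2, raw = TRUE))
p_smooth <- pmin(pmax(fitted(fit), 0), 1)

print(round(cbind(mids, p_bin, p_smooth), 3))
```

The per-bin values play the role of the green crosses and the fitted curve plays the role of the hand-drawn line. A quadratic is only one choice of smoother; a local regression such as lowess would make fewer assumptions about the shape.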
Note that this is actually a fairly low accuracy compared with species predictions. Looking at the histogram you can see a single large spike in the middle. This is the Antarctica problem I mentioned earlier: Antarctica is a large area on the image with a single value for the Leemans and Cramer January temperature. When I ran it again with the latitude cutoff, I think this reduced but didn’t eliminate the problem. Nevertheless, the algorithm was able to function despite the very considerable inhomogeneity in the data sets and produce an interesting answer.
The interpretation? Increasing temperatures have generally been attributed to increasing greenhouse gases (GHG), and it is well known that the increase is largely due to milder winters; this is perhaps the pattern being picked up by the January temperatures. Looking at the histogram, you can see the anomalously mild winters are spread throughout the cooler climates of the Northern Hemisphere rather than the tropics.
The first variable (33 – January temperatures) is significant, but adding the second variable (36 – April temperatures) is not significant. The 2D surface nevertheless seems to have better visual agreement with the original map of anomalies.
Figure 5. For interest here is the 2D histogram and the predicted surface.
Figure 6. Here is the predicted surface for the 2D model.
So the result is not earth-shattering but does illustrate some important issues.
- WhyWhere works, is easy to install and use, and can provide interesting insights in a convenient framework.
- I like the transparency that the low dimensional model gives, as it is easy to see the data and associations you are working with.
- One source of significant associations between variables is arithmetic derivation of one from another (e.g. January temperature and annual temperature). These are, however, ‘spurious’ in the sense that they do not speak to causal explanations.
- The form of relationships (in this case parabolic) can be discovered, not assumed.
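The point about arithmetically derived variables can be illustrated with a small simulation in R. The monthly values below are independent random numbers (an assumption purely for illustration, not real climate data), yet January still correlates with the annual mean simply because it is one of the twelve terms in that mean.

```r
set.seed(42)

# 2000 hypothetical grid cells with 12 independent monthly values each.
n_cells <- 2000
monthly <- matrix(rnorm(n_cells * 12), nrow = n_cells, ncol = 12)

january <- monthly[, 1]
annual  <- rowMeans(monthly)           # derived arithmetically from all 12 months
rest    <- rowMeans(monthly[, 2:12])   # mean of the other eleven months

r_derived     <- cor(january, annual)  # nonzero by construction
r_independent <- cor(january, rest)    # close to zero

cat("January vs annual mean:        ", round(r_derived, 3), "\n")
cat("January vs other eleven months:", round(r_independent, 3), "\n")
```

For independent months with equal variance, the correlation between one month and the mean of all twelve is 1/sqrt(12), about 0.29, even though no month influences any other. Real monthly temperatures are also correlated with each other, so the observed association would be stronger still, but part of it is there by construction.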
To be continued …