The new [tag]WhyWhere[/tag] application is starting to work smoothly on [tag]large datasets[/tag] now (http://landscape.sdsc.edu/ww-testform.html). I have added a list of all terrestrial data (All_Terrestrial); there will be some errors until I clean that lot up, but it should be usable. I have been thinking about how to deal with the number of data sets in a [tag]streaming data[/tag] framework.
The issue with specifying environmental datasets to use in an analysis comes down to accommodating a series of options:
1. User prepares custom datasets. Three sub-options:
1.1 upload them onto the server for others to use too,
1.2 upload them for their own use, for a fee, if they are not made public,
1.3 install the program themselves and use the datasets locally.
2. Use a growing collection of global data sets. Sub-options for organization:
2.1 separate lists for high resolution and low resolution
2.2 separate lists for marine, terrestrial and freshwater (though some variables overlap)
2.3 user generates customized lists
3. Issues to do with the kind of questions:
3.1 what is the best predictor(s) for this species in all data sets
3.2 what is the specific relationship to selected datasets (e.g. veg)
3.3 what variable other than a given variable (say climate) is important.
4. Various modalities
4.1 distribution modeling
4.2 invasive species
4.3 climate change
4.4 ensembles of species
Rather than allow development to go in all directions, I have the idea (vision) of a new type of data-streaming web, where customized information, including model predictions, flows continuously downstream rather than being chunked into stages (data preparation and variable selection, modeling, writing, publication, etc.). So it would make sense to envision custom filters applied to the environmental data. One approach would be to have a text selection capability such as regular expressions. Another would be to have a custom editable list in the user's scratch directory, rather than have a multitude of options available.
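As a rough illustration of the two filtering ideas, here is a minimal Python sketch. It is not how WhyWhere is implemented; the dataset names, the function names and the list format (one name per line, with '#' comments) are all assumptions made up for the example.

```python
import re

# Hypothetical dataset names; the real WhyWhere catalogue will differ.
DATASETS = [
    "annual_temperature",
    "annual_rainfall",
    "vegetation_cover",
    "road_density",
    "sea_surface_temperature",
]


def filter_by_pattern(names, pattern):
    """Keep the names that match a regular expression, in original order."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]


def filter_by_list(names, list_text):
    """Keep the names that appear in a user-edited list.

    Assumed format: one dataset name per line; anything after '#'
    is treated as a comment and blank lines are ignored.
    """
    wanted = set()
    for line in list_text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            wanted.add(entry)
    return [name for name in names if name in wanted]


# Regular expression filter: climate-like variables only.
climate = filter_by_pattern(DATASETS, r"temperature|rainfall")

# Custom list filter: text as it might sit in a file in the scratch directory.
my_list = "vegetation_cover\n# not needed for now\nroad_density  # roads\n"
custom = filter_by_list(DATASETS, my_list)
```

Either filter takes the full list of datasets and passes a subset downstream, so the two could be chained in any order.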
These filters could be applied to each of the channels of the image. For example, the red channel could be used for temperature and rainfall, and the green channel for another category of variable, such as transportation. This would ensure that variables from the required class are incorporated. A channel could also be set to a specific variable (e.g. red is annual temperature), ensuring that this variable is used. This forcing method was a feature of the previous version, but the filtering approach would make it a specific behavior in a more general system.
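The per-channel idea could look something like the following Python sketch. Again this is only an assumption about how it might be arranged: the dataset names, the channel names as dictionary keys and the `channel_candidates` function are hypothetical, not part of the existing program.

```python
import re

# Hypothetical dataset names, for illustration only.
DATASETS = [
    "annual_temperature",
    "annual_rainfall",
    "road_density",
    "vegetation_cover",
]


def channel_candidates(names, channel_filters):
    """Map each image channel to the dataset names its filter admits.

    channel_filters maps a channel name to a regular expression.
    A pattern anchored to one exact name forces that variable.
    """
    return {
        channel: [name for name in names if re.search(pattern, name)]
        for channel, pattern in channel_filters.items()
    }


# Red and green are restricted to a class of variables; blue is unrestricted.
by_class = channel_candidates(DATASETS, {
    "red": r"temperature|rainfall",
    "green": r"road|rail",
    "blue": r".*",
})

# Forcing: red is pinned to annual temperature, so it has exactly one candidate.
forced = channel_candidates(DATASETS, {"red": r"^annual_temperature$"})
```

Seen this way, forcing a variable is just a filter narrow enough to leave one candidate on a channel, which is the sense in which it becomes a specific behavior of the more general system.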