Heterogeneous Weather Sequences

The sequences of temperature and rainfall recorded at weather stations are often strikingly heterogeneous, with many different formats, protocols, and disruptions in the records. The Australian temperature record we usually see is the product of smoothing algorithms, used to produce graphic displays, that hide this structure. Parameter estimation must therefore be approached with methods that approximate the ideal homogeneous case.

While a standard format for water data exists (WDF), there do not appear to be standards for temperature data. The function sequences reads and detects two types of data file downloaded from the Australian Bureau of Meteorology: Climate Data Online (CDO) and the ACORN-SAT reference data set. You can select the files you want with a wildcard, or name a specific one.


The sequences function returns a ‘zoo’ series, a particularly powerful time series structure in R. Zoo series can be combined on the union or intersection of their dates with the ‘merge’ command.
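As a minimal sketch of union versus intersection merging, the snippet below builds two made-up daily series directly with zoo (it does not use the sequences function, whose arguments are not shown here) and merges them both ways. The values and dates are purely illustrative.

```r
library(zoo)

# Two made-up daily series with partly overlapping dates
a <- zoo(c(10.1, 11.3, 9.8),
         as.Date(c("2000-01-01", "2000-01-02", "2000-01-03")))
b <- zoo(c(12.0, 12.5, 13.1),
         as.Date(c("2000-01-02", "2000-01-03", "2000-01-04")))

# Union of dates: days missing from one series are filled with NA
u <- merge(a, b, all = TRUE)

# Intersection of dates: only days present in both series are kept
i <- merge(a, b, all = FALSE)
```

The union keeps all four dates and exposes the gaps as NAs, while the intersection keeps only the two dates the series share.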

Zoo series can represent time of day as well with the ‘YYYY-mm-dd hh:mm:ss’ format, allowing separate maximum and minimum temperature series to be effectively combined to achieve a single daily temperature series.
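One way this might look in practice is sketched below. The observation times (06:00 for the minimum, 15:00 for the maximum), the values, and the use of the daily mid-range as the combined value are all assumptions made for illustration, not details taken from the data sets.

```r
library(zoo)

days <- c("2000-01-01", "2000-01-02")

# Assumed observation times: minimum at 06:00, maximum at 15:00 (UTC)
tmin <- zoo(c(8.2, 9.1),
            as.POSIXct(paste(days, "06:00:00"), tz = "UTC"))
tmax <- zoo(c(21.4, 23.0),
            as.POSIXct(paste(days, "15:00:00"), tz = "UTC"))

# rbind on zoo series interleaves the observations in time order
temps <- rbind(tmin, tmax)

# Collapse to one value per calendar day, here the mean of min and max
daily <- aggregate(temps, as.Date(index(temps)), mean)
```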


Figure 1 plots the Rutherglen minimum daily temperature from the raw CDO data and the homogenized ACORN network. The difference between the raw CDO data and the ACORN series, shown in blue, represents the adjustments made to the Rutherglen minimum series.
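The adjustment series can be sketched as a simple difference of two zoo series, since zoo arithmetic aligns the operands on their common dates. The numbers below are invented, and taking the sign as ACORN minus raw is an assumption.

```r
library(zoo)

dates <- as.Date("1950-01-01") + 0:3

# Made-up values standing in for the raw and homogenized series
raw   <- zoo(c(5.0, 6.2, 4.8, 7.1), dates)
acorn <- zoo(c(4.6, 5.8, 4.8, 7.1), dates)

# Adjustment applied by homogenization (assumed sign: ACORN minus raw)
adjustment <- acorn - raw
```

A non-zero stretch in the adjustment series marks a period where the homogenized record departs from what was actually recorded.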

All of the sequences that match given criteria can also be loaded with a single command.


The summary function returns descriptive statistics about the heterogeneity of single or multiple series, such as the dates of the first and last values and the number and proportion of NAs.
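The statistics described above can be computed by hand for a single series, as in the following sketch. It is not the implementation of the summary function itself; the series is made up, and the names in the stats list are hypothetical.

```r
library(zoo)

# Made-up daily series with gaps at the start, middle, and end
z <- zoo(c(NA, 12.1, NA, 13.4, 12.9, NA),
         as.Date("2001-03-01") + 0:5)

missing <- is.na(coredata(z))
obs <- z[!missing]              # observed (non-NA) values only

stats <- list(
  first   = start(obs),         # date of the first observed value
  last    = end(obs),           # date of the last observed value
  n_na    = sum(missing),       # number of NAs
  prop_na = mean(missing)       # proportion of NAs
)
```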


Figure 2 shows the number of active stations (those with non-NA values) among 176 stations in south-eastern Australia. Note the extremely uneven collection of weather data over time. Such extreme heterogeneity can easily bias any analysis that is sensitive to the number of missing values over time.
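A count of active stations per date can be sketched by merging the station series into one multi-column zoo object and counting the non-NA entries in each row. The three stations below (s1, s2, s3) are hypothetical stand-ins for the 176 real ones.

```r
library(zoo)

dates <- as.Date("1900-01-01") + 0:3

# Three made-up stations that report over different periods
stations <- merge(
  s1 = zoo(c(10, 11, NA, NA), dates),
  s2 = zoo(c(NA, 12, 13, NA), dates),
  s3 = zoo(c(NA, NA, 14, 15), dates)
)

# Number of stations with a recorded (non-NA) value on each date
active <- zoo(rowSums(!is.na(coredata(stations))), index(stations))
```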

For example, a mean temperature sequence could only be calculated reliably if the missing values were distributed uniformly. If the weather stations that recorded during the latter 20th century tended to be situated inland, where the weather is hotter, a simple average would be biased warm over that period. If the sequences were standardized on a common time period such as the 1960s, there would be a bias between the periods before and after the standard period.
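The inland-station effect can be illustrated with a toy example in which neither station warms at all. The two stations and their constant temperatures are invented for the purpose: a coastal station at 15 degrees reports throughout, while an inland station at 25 degrees only starts reporting halfway through.

```r
library(zoo)

years <- 1950:1959

# Made-up stations with constant temperatures, so no real trend exists
coastal <- zoo(rep(15, 10), years)
inland  <- zoo(c(rep(NA, 5), rep(25, 5)), years)  # starts reporting in 1955

both <- merge(coastal, inland)

# Naive average over whichever stations happen to report in each year
naive <- zoo(rowMeans(coredata(both), na.rm = TRUE), index(both))
```

The naive average sits at 15 for the first five years and jumps to 20 once the inland station comes online, an apparent warming of 5 degrees produced entirely by the change in station coverage.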

Clearly, averaging or any similar operation is fraught with danger when applied to extremely heterogeneous sequences such as these.

