Successful modeling relies heavily on a few basic concepts in mathematics and statistics. This post summarizes the major areas you need to know for ecological niche modeling, illustrated with examples in the vector language R.

## Elements

We assume that readers have a basic knowledge of mathematics. For readers not familiar with the R language, a summary of the major types and operations is helpful.

R is a very powerful vector language that supports the basic data types integer, numeric, logical, and character (string), the more advanced types factor, complex, and raw, and containers such as vectors, lists and matrices.

Some types that are not supported in most languages are as follows:

**Factor** is used to express categories or enumerated types and consists of a finite set of named levels, which can also be ordered. Factors are important to know about because versions of R before 4.0 imported text columns of data tables as factors by default (later versions do so only when asked, via `stringsAsFactors = TRUE`), and this can get confusing when you want them to be numbers. The example shows a factor of population density classes of a species.

    > factor(c("1", "2", "3", "4"), ordered = TRUE)
    [1] 1 2 3 4
    Levels: 1 < 2 < 3 < 4
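The confusion mentioned above, where factors stand in for numbers, can be sketched as follows. The density values are hypothetical; the point is only that `as.numeric()` applied directly to a factor returns the internal level codes rather than the labels.

```r
# Hypothetical density classes that arrived as a factor instead of numbers
dens <- factor(c("10", "20", "20", "40"))

# as.numeric() alone returns the internal level codes, not the labels
codes <- as.numeric(dens)

# Converting through character recovers the original numbers
values <- as.numeric(as.character(dens))

print(codes)
print(values)
```

Converting through `as.character()` first is the usual way to get the printed labels back as numbers.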

**Complex** numbers are of the form *x+yi* where *x* and *y* are real numbers and *i* is the square root of -1. The two numbers are called the real and imaginary parts. These are useful abstract quantities, as meaningful solutions often result from basic numeric operations on them. For example, the two parts can represent the coordinates of a point in a plane.

    > j <- 154.1 - 22.3i
    > j^2
    [1] 23249.52-6872.86i

Type **Raw** holds bytes and is useful for handling binary data in a compact form. The operations defined on raw values are bitwise: AND, OR and NOT. A byte value is displayed in hex notation, where the digit values 0 to 15 are written as 0 to 9 and a to f. Raw values are most frequently used in images, where the numbers represent intensity, e.g. 255 is white and 0 is black. However, when images are used in a numeric context, the raw values can also represent categories such as vegetation types in a vegetation map, or normalized values such as average temperature or rainfall. Bitwise operations can then be useful for tasks such as masking out areas.

    > as.raw(255)
    [1] ff
    > as.raw(15) | as.raw(255)
    [1] ff
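Masking with bitwise operations might look like the following minimal sketch. The cell values and the mask are hypothetical: a mask byte of `ff` keeps a cell unchanged and `00` blanks it out.

```r
# Hypothetical raster cells stored as raw bytes (values 0 to 255)
cells <- as.raw(c(255, 18, 7, 200))

# Hypothetical mask: ff keeps a cell, 00 blanks it out
mask <- as.raw(c(255, 0, 255, 0))

# Bitwise AND, applied element by element
masked <- cells & mask

print(masked)
print(as.integer(masked))
```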

**Vectors** are an ordered set of items of identical type and are one

of the most versatile features of R. Vectors can be created in a number of

ways. Below are some of the most common:

    > x <- c(2, 1, 4, 6, 5)
    > x
    [1] 2 1 4 6 5
    > y <- rep(3, 10)
    > y
     [1] 3 3 3 3 3 3 3 3 3 3
    > z <- seq(1980, 1990)
    > z
     [1] 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990

**Lists** contain an ordered collection of items, usually named, that may be of different types. They are a general purpose type for holding all kinds of data. Below is an example of a list holding a vector of locations of a species and the species name.

    > list(coords = c(123.12 - (0+45i), 122 - (0+41i), 130 - (0+40i)),
    +     species = "Puma concolor")
    $coords
    [1] 123.12-45i 122.00-41i 130.00-40i

    $species
    [1] "Puma concolor"

**Data frames** are an extremely useful construct for organizing

data in R, very similar to tables or spreadsheets. A data frame is essentially

a list of vectors of equal length. That is, each column in a table can be a

different type, but they must all have the same number of items. Data frames

are very commonly used in reading in data for analysis via the read.table command.
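As an illustration of columns with different types but equal length, here is a small sketch with hypothetical occurrence records. In practice the data frame would usually come from `read.table` rather than being typed in.

```r
# Hypothetical occurrence records: one row per site, one column per variable
sites <- data.frame(
  site    = c("A", "B", "C"),
  temp    = c(12.5, 18.1, 24.3),   # numeric column
  present = c(TRUE, TRUE, FALSE)   # logical column
)

# Columns are ordinary vectors and can be extracted by name
mean_temp <- mean(sites$temp)

# Rows can be selected with a logical vector
occupied <- sites[sites$present, ]

str(sites)
print(occupied)
```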

**Time series** are another useful construct for handling series whose elements are sampled at a regular frequency, which may differ from one series to another. The ts command creates a time series with information on the start, end, frequency and actual data.

    > ts(1:10)
    Time Series:
    Start = 1
    End = 10
    Frequency = 1
     [1]  1  2  3  4  5  6  7  8  9 10
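The role of the frequency is easier to see with a series that is not annual. The sketch below assumes a hypothetical monthly series of 24 values starting in January 2000.

```r
# Hypothetical monthly series: 24 values starting January 2000
monthly <- ts(1:24, start = c(2000, 1), frequency = 12)

# The frequency is stored with the data
freq <- frequency(monthly)

# window() extracts a sub-series by time rather than by position
second_year <- window(monthly, start = c(2001, 1))

print(freq)
print(second_year)
```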

A **matrix** is a two dimensional vector with a defined number of rows and columns.

The number of columns and rows is set or returned by the dim command.

    > m <- ...
    > dim(m)
    [1] 20 20

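Below is a table of the basic types with examples.

| Type | Examples |
|---|---|
| Integer | `7L` |
| Numeric | `5.6` |
| Logical | `TRUE`, `FALSE` |
| Character | `"here"` |
| Factor | `factor(1)` |
| Complex | `0+0i` |
| Raw | `as.raw(255)`, displayed as `ff` |
| Constants | `pi`, `NULL`, `Inf`, `-Inf`, `NaN`, `NA` |
| Vectors | `1:10`, `rep(1, 10)`, `seq(0, 10, 1)` |
| Matrices | `matrix(0, 3, 3)`, `array(0, c(3, 3))` |
| Lists | `list(x = 1, y = 1)` |
| Data frames | `data.frame(x = numeric(10), y = character(10))` |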

## Operations

Being an interpreted language, R has the flexibility that types are usually cast into the correct form, e.g. integer + numeric = numeric. Each of these types uses the usual operators available in most computer languages:

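| Kind | Operators |
|---|---|
| Numeric | `x + y`, `x - y`, `x * y`, `x / y`, `x ^ y` |
| Logical | `!x`, `x & y`, `x && y`, `x \| y`, `x \|\| y`, `xor(x, y)`, `isTRUE(x)` |
| Bitwise | `!`, `\|`, `&` |
| Relational | `x < y`, `x > y`, `x <= y`, `x >= y`, `x == y`, `x != y` |
| Assignment | `x <- value`, `x <<- value`, `value -> x`, `value ->> x` |
| Accessors | `x$y`, `x[y]`, `pkg::name`, `pkg:::name` |
| Constructors | `x:y`, `x = y`, `y ~ model` |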

But unlike most languages, R, being a vector language, overloads these operators even further to operate on sets of numbers. For example, let's use the vectors we constructed using ‘c’, ‘seq’ and ‘rep’ and perform some operations on them.

In a vector language, more complex structures such as vectors and lists can be treated as basic types because many of the basic operators apply to them. Basic arithmetic operations like addition are elementwise on vectors. When vectors are of unequal length, the shorter one is recycled (it wraps around) to match the longer. Logical operations on numeric vectors produce logical vectors, which are very important for building complex expressions.

    > x + y
     [1] 5 4 7 9 8 5 4 7 9 8
    > !y
     [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    > x > y
     [1] FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
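One way logical vectors feed into larger expressions is as an index or as something to count. The sketch below assumes values for `x` and `y` chosen to be consistent with the output printed above.

```r
# Assumed values, chosen to match the printed results above
x <- c(2, 1, 4, 6, 5)
y <- rep(3, 10)

# The shorter vector x is recycled twice to match the length of y
s <- x + y

# A logical vector can select elements ...
big <- x[x > 3]

# ... or be summed to count how many elements satisfy a condition
n_big <- sum(x > 3)

print(s)
print(big)
print(n_big)
```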

## Functions

There are many ways to introduce functions. In most languages they are given special treatment, but in R they can in many ways be regarded as another simple type because, just like the other simple types, they can be passed as arguments to other functions. In mathematical terms they are described as a mapping from one domain to another, such as numbers to numbers. The identity function, which returns whatever is input, would be defined in the following way:

    > f <- function(x) x
    > f("this")
    [1] "this"
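Passing a function as an argument, as described above, can be sketched like this. The names `apply_twice` and `square` are hypothetical helpers introduced only for illustration; `sapply` is a standard R function.

```r
# Hypothetical helper: takes a function f and applies it twice to x
apply_twice <- function(f, x) f(f(x))

# Hypothetical function to pass around
square <- function(x) x^2

# square is handed over like any other value
fourth_power <- apply_twice(square, 3)

# Built-in functions such as sapply also accept functions as arguments
squares <- sapply(1:3, square)

print(fourth_power)
print(squares)
```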

Another approach to functional relationships is to define them on the Cartesian product of two sets *X* and *Y*. A relation *f* is a function provided there is at most one element *y* of *Y* that is related to each *x* via *f*. When *y* is uniquely determined by the value of *x*, *x* acts like an index. Under this definition, indexing of vectors can be regarded as a basic function, where *x* is the position of the element.

    > f <- function(x, y) y[x]
    > f(3, y) == y[3]
    [1] TRUE

The examples so far have returned single values, but typical R functions return much more complex values: vectors, lists and data frames. Many functions operate on whole vectors, so they can be viewed as parallel functions. The first example below simulates an annual cycle. The second example is a very important application of indexing a vector with a vector. The third shows the difference between the simple max operation and the parallel pmax operation.

    > daylight <- function(x) -cos(2 * pi * x / 12)
    > daylight(1:12)
     [1] -8.660254e-01 -5.000000e-01 -6.123234e-17  5.000000e-01  8.660254e-01
     [6]  1.000000e+00  8.660254e-01  5.000000e-01  1.836970e-16 -5.000000e-01
    [11] -8.660254e-01 -1.000000e+00
    > z[x]
    [1] 1981 1980 1983 1985 1984
    > max(x, 4)
    [1] 6
    > pmax(x, 4)
    [1] 4 4 4 6 5

R also contains a rich set of built in functions that make many programming

tasks simple and efficient. Here is a list of some of the ones used throughout this book.

| Function | Description |
|---|---|
| `aggregate` | Splits the data into subsets and computes summary statistics for each |
| `cumsum` | Returns a vector whose elements are the cumulative sums |
| `filter` | Applies linear filtering to a univariate time series |
| `hist` | Computes and plots a histogram of the given data values |
| `spectrum` | Estimates the spectral density of a time series |
| `acf` | Computes and plots estimates of the autocorrelation function |
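Two of these functions can be tried on a small data set. The counts and habitat labels below are hypothetical.

```r
# Hypothetical counts of individuals recorded in two habitats
counts <- data.frame(
  habitat = c("forest", "forest", "grass", "grass"),
  n       = c(3, 5, 2, 4)
)

# Running total of the counts
running <- cumsum(counts$n)

# Mean count per habitat, using the formula interface of aggregate
by_habitat <- aggregate(n ~ habitat, data = counts, FUN = mean)

print(running)
print(by_habitat)
```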

# Basic models used in niche modeling

A basic understanding of various kinds of functions is necessary for ecological niche modelling. There are two main types. The first are those describing the response of a species to an environmental driver, such as nutrients, rainfall, temperature, or many others. The second are those describing the biological response over time or space; these are modeled by functions that incorporate a large degree of random error, called stochastic functions. These functions can also be used in higher dimensions.

## Response functions

The first example shows basic relationships used for describing populations: linear, exponential and power. Linear relations are the basic idealized way of describing the functional response of organisms to anything from nutrients to predator populations. Real responses are typically non-linear, and exponential or power relationships are often seen and used.

    > y <- ...
    > plot(i, y, type = "l")
    > lines(i, exp(i), lty = 2)
    > lines(i, i^2, lty = 4)

While useful, these cannot capture the basic concept of a niche model. A niche is, at minimum, the tendency of a species to prefer a particular set or range of values. To express this with a function requires a ‘hump’ or ‘inverted U’ shape centered on the values optimal for the species' growth and reproduction. Below are three functions that do this: a step function, a truncated quadratic, and an exponential.

    > i <- ...
    > y <- ...
    > plot(i, y, type = "l")
    > sf <- ...
    > lines(i, sf(i))
    > lines(i, exp(-(i^2)), lty = 2)
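The three hump-shaped responses could be written as in the sketch below. The range of `i`, the step limits of -1 and 1, and the function names are assumptions made for illustration; only the exponential form `exp(-(i^2))` comes from the example above.

```r
# Assumed range of the environmental variable, centered on the optimum at 0
i <- seq(-3, 3, by = 0.1)

# Step function: response 1 inside the assumed limits, 0 outside
step_fn <- function(x, lo = -1, hi = 1) as.numeric(x >= lo & x <= hi)

# Truncated quadratic: an inverted parabola cut off at zero
trunc_quad <- function(x) pmax(0, 1 - x^2)

# Exponential hump, as in the example above
exp_hump <- function(x) exp(-(x^2))

# All three peak at the optimum and fall away on either side
peak_at <- i[which.max(exp_hump(i))]
print(peak_at)
```

Each curve can be drawn with, for example, `plot(i, trunc_quad(i), type = "l")` followed by `lines(i, step_fn(i))`.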

Another frequently encountered relationship is the periodic function, useful for describing everything from daily and annual to multiyear cycles. Relationships between periodic series to be aware of are period doubling and additive cycles.

    > y1 <- ...
    > y2 <- ...
    > plot(i, y1 + y2, type = "l")
    > lines(i, y1, lty = 2)
    > lines(i, y2, lty = 4)
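A minimal sketch of additive cycles with a doubled period follows. The choice of sine waves, the grid, and the periods (2π and 4π) are assumptions, not the values used in the original example.

```r
# Assumed grid covering two full periods of the slower cycle
i <- seq(0, 8 * pi, length.out = 401)

y1 <- sin(i)        # base cycle with period 2*pi
y2 <- sin(i / 2)    # cycle with double the period, 4*pi

# Additive combination of the two cycles
total <- y1 + y2

# The sum repeats with the longer period (200 grid steps = 4*pi)
repeat_error <- max(abs(total[1:201] - total[201:401]))
print(repeat_error)
```

The combined series is itself periodic, repeating with the longer of the two periods.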

## Stochastic functions or series

The next important functions to know about are stochastic functions.

These are used to describe various forms of randomness in models.

While on the one hand a stochastic series can be regarded as a time series,

it can also be regarded as a function, with the input as the index according

to the equivalence above. The primary example is the rnorm function, which returns a vector of normally distributed random numbers, by default with mean zero and standard deviation one.

**IID** refers to the statistics of independent and identically distributed random numbers where

every *y* value is a simple random number with no reference to any other

number.

    > par(mfrow = c(2, 1))
    > y <- ...
    > plot(y)
    > acf(y)
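A minimal sketch of an IID series and its autocorrelation, assuming a series length of 1000 and a fixed seed so the run is repeatable. The `plot = FALSE` argument returns the estimates instead of drawing them.

```r
set.seed(1)              # assumed seed, only for repeatability
y <- rnorm(1000)         # assumed length; IID normal numbers

# Autocorrelation estimates without plotting
a <- acf(y, plot = FALSE)

# The first element is lag zero, which is always 1
lag0 <- a$acf[1]

# Approximate 95% significance bound, as drawn by the ACF plot
bound <- qnorm(0.975) / sqrt(length(y))

# Fraction of non-zero lags that fall inside the bound
frac_inside <- mean(abs(a$acf[-1]) < bound)
print(c(lag0 = lag0, bound = bound, frac_inside = frac_inside))
```

Calling `plot(y)` and `acf(y)` on the same series draws the two panels described in the text.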

The second graph is the autocorrelation function, a useful tool for discriminating between different types of random variables. Correlation is a quantification of the degree to which two variables are linearly related to each other. Autocorrelation, then, is the degree to which a variable is related to itself. The autocorrelation function (ACF) is the correlation at each distance between points, called lags. The IID series has an autocorrelation of one at zero lag, as every number correlates with itself. Correlations at all other lags are below the level of significance, indicated by the dashed line.

**Moving Average** is the application of a moving window over the points in the series. As can be seen in the ACF plot, the averaging process produces some autocorrelation between neighboring points at low lags. Note that we must remove some values at each end of the averaged series, as the filter operation consumes end points. The moving average is often called a low pass filter, as it removes high frequency variations and leaves low frequency ones.

    > par(mfrow = c(2, 1))
    > y <- ...
    > plot(y)
    > acf(y)
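A moving average can be sketched with the `filter` function from the stats package. The window width of 5 and the series length are assumptions; the call is written as `stats::filter` in case another package masks the name.

```r
set.seed(1)                       # assumed seed, only for repeatability
x <- rnorm(1000)                  # assumed length; IID input series

# Centered moving average over an assumed window of 5 points
ma <- stats::filter(x, rep(1 / 5, 5))

# The filter leaves NA at the ends, where the window runs off the series
y <- as.numeric(ma)
n_lost <- sum(is.na(y))
y <- y[!is.na(y)]

# Neighboring points now share most of their window, so lag 1 is correlated
r1 <- acf(y, plot = FALSE)$acf[2]
print(c(n_lost = n_lost, r1 = r1))
```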

A **random walk** is a series where each value is the previous value plus some noise. It is easily generated by taking the cumulative sum of IID random numbers using the cumsum function in R. The ACF plot shows that autocorrelations in a random walk are extremely persistent, with significant correlation at lags of up to 100.

Because an IID process has a finite mean (zero) and variance, the series tends to stay roughly level with the starting point no matter how many numbers are generated. A random walk, however, has a variance that grows without bound as the series lengthens. Over time the random walk appears to ‘trend’, and the series can diverge arbitrarily far from the starting point.

    > par(mfrow = c(2, 1))
    > y <- ...
    > plot(y, type = "l")
    > acf(y, lag.max = 100)
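The construction described above, a cumulative sum of IID numbers, can be sketched as follows. The series length and seed are assumptions.

```r
set.seed(1)                  # assumed seed, only for repeatability
steps <- rnorm(1000)         # assumed length; IID increments
walk <- cumsum(steps)        # each value is the previous value plus noise

# Differencing the walk recovers the increments
recovered <- diff(walk)

# Autocorrelation at lag 1 is close to one
a <- acf(walk, lag.max = 100, plot = FALSE)
r1 <- a$acf[2]
print(r1)
```

Replacing `plot = FALSE` with the default draws the slowly decaying ACF described in the text.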

The **Markov process** is a series where each value is

partially dependent on the previous value, but no other previous

values. That is, there is no additional knowledge to be gained

about the future from the past values as it is all captured in the present value.

This variable describes such things

as water levels where the daily evaporation is a random variable

and allows an intermediate level of autocorrelation.

    > par(mfrow = c(2, 1))
    > y <- ...
    > plot(y)
    > acf(y)
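One common series of this kind is a first-order autoregressive process, used here as an assumed stand-in for the example above. The coefficient `phi = 0.7`, the length, and the seed are all illustrative choices.

```r
set.seed(1)          # assumed seed, only for repeatability
n <- 1000            # assumed series length
phi <- 0.7           # assumed fraction of the previous value carried forward

e <- rnorm(n)        # IID noise
y <- numeric(n)
y[1] <- e[1]

# Each value depends only on the previous value plus fresh noise
for (t in 2:n) {
  y[t] <- phi * y[t - 1] + e[t]
}

# Lag-1 autocorrelation sits between the IID case (0) and the random walk (1)
r1 <- acf(y, plot = FALSE)$acf[2]
print(r1)
```

Setting `phi` to 0 gives back an IID series, and setting it to 1 gives a random walk.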

# Two dimensional models

All of these forms of function can also be applied in two dimensions. A biological response function might represent the response of a species to two environmental factors, such as temperature and rainfall. The most common use of 2D functions is a distribution over part of the earth’s surface, i.e. a map.

In R we can generate a two dimensional matrix of random numbers

and display it with the image command. We can also examine

the correlation structure treating the 2D matrix as a 1D vector.
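A minimal sketch of this step, assuming a 100 by 100 matrix of IID normal values. R stores matrices column by column, so `as.vector` lays the columns end to end.

```r
set.seed(1)                                   # assumed seed
m <- matrix(rnorm(100 * 100), nrow = 100, ncol = 100)

# Flatten the matrix column by column into a 1D vector
v <- as.vector(m)

# Element 101 of the vector is the first element of the second column
same_cell <- v[101] == m[1, 2]

# Autocorrelation of the flattened matrix, out to beyond one column length
a <- acf(v, lag.max = 150, plot = FALSE)
print(same_cell)
print(max(abs(a$acf[-1])))
```

The matrix itself can be displayed with `image(m)`.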

We can generate different types of autocorrelation structure with the random fields library. Here we can see the autocorrelation between neighboring points. The ACF also shows a second peak of correlation at a lag of 100, corresponding to the dimensions of the matrix, where each point is paired with its neighbor in the adjacent row or column. Such features are called artifacts and are something to watch out for in any form of analysis.
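The lag-100 artifact can be reproduced without the random fields library. The sketch below is a simplified stand-in, not the original example: it builds a 100 by 100 matrix in which each column is correlated with the column before it (coefficient 0.9, an assumption), then flattens it.

```r
set.seed(1)          # assumed seed, only for repeatability
n <- 100             # assumed matrix dimension

# Each column is 0.9 times the previous column plus fresh noise
m <- matrix(0, n, n)
m[, 1] <- rnorm(n)
for (j in 2:n) {
  m[, j] <- 0.9 * m[, j - 1] + rnorm(n)
}

# Flattening column by column puts horizontal neighbors 100 places apart
v <- as.vector(m)
a <- acf(v, lag.max = 150, plot = FALSE)

r50 <- a$acf[51]     # lag 50: unrelated cells
r100 <- a$acf[101]   # lag 100: the same row in the adjacent column
print(c(r50 = r50, r100 = r100))
```

The peak at lag 100 reflects the layout of the matrix in memory rather than any long-range structure in the data.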

The third graph shows another useful diagnostic for autocorrelated series, the spectral density, which gives the intensity of variation at particular frequencies. IID series have a flat spectrum. Autocorrelated series, however, have greater variation at the lower frequencies (longer periods), as shown in the figure. The artifacts are more clearly revealed as regular ‘spikes’ in intensity in this plot. Normally the periodogram should be smooth.
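The contrast between a flat and a low-frequency-dominated spectrum can be checked numerically. The sketch below compares an IID series with a random walk; the series length, the 0.05 frequency cutoff, and the helper `low_share` are assumptions made for illustration.

```r
set.seed(1)                         # assumed seed, only for repeatability
iid <- rnorm(1024)                  # IID series
walk <- cumsum(rnorm(1024))         # strongly autocorrelated series

# Periodogram estimates without plotting
s_iid <- spectrum(iid, plot = FALSE)
s_walk <- spectrum(walk, plot = FALSE)

# Hypothetical helper: share of total intensity below a frequency cutoff
low_share <- function(s, cutoff = 0.05) {
  sum(s$spec[s$freq < cutoff]) / sum(s$spec)
}

share_iid <- low_share(s_iid)
share_walk <- low_share(s_walk)
print(c(iid = share_iid, walk = share_walk))
```

For the IID series the share is roughly proportional to the width of the frequency band, while for the random walk nearly all of the intensity sits at the lowest frequencies.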

There are many more forms of mathematical and statistical relationships that could be described and used in models. This chapter covers the most basic and frequently used. Others are introduced in subsequent chapters, but these will be used throughout. It is worthwhile to install R and experiment with them yourself to become familiar with the various options and their effects. Playing with them in this way will improve your intuition for models and, most importantly, prepare you to recognize the things that can go wrong.

