In this section we show how to use R as a stand-alone database for niche modeling.
Even though R is a vector programming language, R has powerful
operations that replicate relational database operations including select and join.
Using R in this way avoids the need
for setting up and for analysis and interacting with an additional piece of software.
describing basic database operations in R helps to build knowledge of the
R’s powerful indexing operations.
Loading and saving a database
One of the main languages used in databases is the SQL or structured query language.
While not going into the syntax of this language, we will use it to compare with
similar operations written in R.
The first set of commands you encounter is for loading and saving a database.
This role can be played by the user environment, the set of
all data and functions defined by the user.
R allows data consisting of many different forms of information
to be loaded or saved as a distinct object. This is easier that dealing
with separate objects. The command $load$ brings a saved R object
into the current environment.
The command $ls$ or $object$ lists the data sets and
functions defined in the currently by the user. The command $save$ writes R objects to file,
while $save.image$ saves all user-defined objects in the current environment.
> objects()  "g" "last.warning" "m" "obs" > save.image(file = "test.Rdata") > load("test.Rdata")
External transactions for importing and exporting
data to other programs are a necessary, such moving too and from a spreadsheet.
Where one might use import or mysqlimport here we read a small local database
of locations where a species was sighted and write it back out to a table.
We can also get this data from the web.
> obs write.table(obs, file = "obs.txt") > obs X Y Code A 3.4 4.3 a B 1.0 1.0 b C 2.0 2.0 c D 3.4 4.3 d
Creating a database
The first thing we need to do in creating a database is to generate tables.
The R data structure called data frame serves the purpose. A data frame is
really a list of vectors (columns) of equal length,
ideal for storing all kinds of information. Relative to relational databases,
column names will be used as variable names, and the variable name will serve as the table name.
Comprehensive information about a data.frame, or any R object, can be listed with the command $attributes$
> locations names(locations)  "id" "X" "Y"
Manipulating and entering data in R is perhaps easier than mose databases.
To insert rows at the bottom of a table we use rbind. To insert a column we would use cbind.
> locations locations locations locations id X Y MyCode 1 PC101 3.4 4.3 a 11 PC101 1 1 b 2 PC101 2 2 c 3 PC101 3.4 4.3 d
Similarly altering tables is easy. Whereas a table is modified in relational database with
the ‘drop’ command, in R a column is deleted by assigning NULL to it.
R automatically assignes names to rows but not the ones we want.
Say we didn’t want to use the existing codes, but wanted to use a
a unique index for each observation. These could be added to the
table as row names providing they are unique.
We can easily change the name of a table with a reassignment.
The old $locations$ object can be deleted.
As shown, changing the name of a column is also simpler that
the SQL command.
> row.names(locations) obs rm(locations)
sql> UPDATE obs SET newfield=’Code’ WHERE existingfield=’MyCode’;
> names(obs) obs id X Y Code 12 PC101 3.4 4.3 a 13 PC101 1 1 b 14 PC101 2 2 c 15 PC101 3.4 4.3 d
Of course the previous are preliminaries relative to
the main purpose of a relational database, which is to perform queries.
As before R can be coerced to replicate SQL statements
for queries. The following are examples of different ways
the indexing functions of R can be used to mimic the SQL
select statement on rows.
sql> select * from obs
> obs id X Y Code 12 PC101 3.4 4.3 a 13 PC101 1 1 b 14 PC101 2 2 c 15 PC101 3.4 4.3 d
sql> select * from obs where rownum >= 2 and rownum < =3
> obs[2:3, ] id X Y Code 13 PC101 1 1 b 14 PC101 2 2 c > obs[c(2, 3), ] id X Y Code 13 PC101 1 1 b 14 PC101 2 2 c > obs[2:3, ] id X Y Code 13 PC101 1 1 b 14 PC101 2 2 c > obs[c(FALSE, TRUE, TRUE, FALSE), ] id X Y Code 13 PC101 1 1 b 14 PC101 2 2 c
Operations on columns are easily defined. The dollar
is often used.
sql> select id from obs
> obs$id  PC101 PC101 PC101 PC101 Levels: PC101
sql> select X,Y from obs
> subset(obs, select = c("X", "Y")) X Y 12 3.4 4.3 13 1 1 14 2 2 15 3.4 4.3 > subset(obs, select = c(2, 3)) X Y 12 3.4 4.3 13 1 1 14 2 2 15 3.4 4.3
sql> select * from obs where code = a
> obs[obs$Code == "a", ] id X Y Code 12 PC101 3.4 4.3 a > obs$Code == "a"  TRUE FALSE FALSE FALSE > subset(obs, Code == "a") id X Y Code 12 PC101 3.4 4.3 a
Another important feature of database languages are
functions that aggregate data, such as summing and
counting. The following are examples of R implementations
of typical SQL queries with aggregation.
sql> select code, count(*) from obs
> dim(obs[obs$Code == "a"])  4
sql> select * from obs group by code
> aggregate(obs, list(obs$id), FUN = mean) Group.1 id X Y Code 1 PC101 NA NA NA NA
The next most important operation of relational databases apart from
selects are joins. This is where the columns in two tables are brought together
under the guidance of a common index value. To demonstrate this
we will create another table for our species database called
species. The merge operation can be used together with subset
to obtain the results wanted from a join. In the second case, the
result is the list of location where the Puma has been observed.
> species < - data.frame(id = c("PC101", "ML240", "J2"), Name = c("Puma", + "Mountain Lion", "Jaguar"))
sql> SELECT * FROM species, obs WHERE obs.id = species.id
> merge(species, obs, by.species = "id", by.obs = "id") id Name X Y Code 1 PC101 Puma 3.4 4.3 a 2 PC101 Puma 1 1 b 3 PC101 Puma 2 2 c 4 PC101 Puma 3.4 4.3 d
sql> SELECT X, Y FROM Obs, Species WHERE Obs.id = Species.id AND Species.name=Puma;
> subset(merge(species, obs), Name == "Puma", select = c(X, + Y)) X Y 1 3.4 4.3 2 1 1 3 2 2 4 3.4 4.3
This chapter has demonstrated how to manage data in R using data frames as tables,
including replication the basic spreadsheet and relational database operations.
While a database is necessary for large scale data management, it is clear that
R can serve the basic functions of a database system. This can be an
advantage, as data does
not need to be moved between systems, causing the conversion problems
and increasing numbers of errors.