GRASS Short Course -
Notes for Part 6: GRASS, R-stats and PostgreSQL

1) Installation and preparations

R-stats and extra packages installation

We assume that you have a working installation of R-1.8 or later.

For the R/PostgreSQL connection we need two extra packages. Since October 2003 there is a new Rdbi/PostgreSQL interface for R-stats. Download it from BioConductor (not the old SourceForge version!).

The BioConductor version may soon contain our fixes (should be Rdbi 1.0.2 or later)...

Interface installation (enter in a shell terminal, adjusting the version numbers):

R CMD INSTALL Rdbi_1.0.2.tar.gz
R CMD INSTALL RdbiPgSQL_1.0.2.tar.gz

Note: The RODBC interface is an alternative, but its data transfer is rather slow.

PostgreSQL installation

We assume that you have a working installation of PostgreSQL 7.3 or later. For GNU/Linux, RPM packages are available; for Mac OS X, packages are available via Fink.

2) Using PostgreSQL (PG)

The first step is to check whether PostgreSQL is running - we start 'psql', the PostgreSQL interactive terminal, a command-line tool to maintain and query PG databases:
psql template1
Welcome to psql 7.3.2, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

template1=> \q

#maybe only user "postgres" is available, so start like this:
psql -U postgres template1
Welcome to psql 7.3.2, the PostgreSQL interactive terminal.
If you see a similar message, PostgreSQL is running well. PostgreSQL runs as a server to which several clients (such as 'psql', MapServer, GRASS, ...) may connect, even simultaneously.

Server not running? Starting it requires 'root' permissions:


#start postgresql daemon:
service postgresql start

#To enable the PostgreSQL server to launch after boot:
chkconfig --list postgresql
chkconfig postgresql on
chkconfig --list postgresql
#... some runlevels should now show 'on'.

Now retry the 'psql' command above.

Server running? Good (it may still tell you that your user name does not exist).

We add a new user now:

#become 'root':
su

#become the special user 'postgres':
su postgres

#create a new PostgreSQL user (warning, we don't use a password here!):
createuser -a -d pablo
#... CREATE USER should be printed


Now retry to connect (see above, 'psql ...').

Then we explore a bit more (note that the TAB key expands command and table names). Every query has to be terminated with a semicolon:

psql template1
template1=> SELECT * from pg_user;

#show all tables (called 'relations' in SQL speech):
\dt
No relations found.

#what time is it?
SELECT now();
              now
-------------------------------
 2003-11-01 13:24:42.960403+00
(1 row)

#get some help for all backslash commands:
\?

#get some help for all SQL commands:
\h
Available help:
  ABORT                     CREATE TABLE              EXECUTE

#print help for the 'CREATE TABLE' command:
\h CREATE TABLE

#get out of PostgreSQL:
\q

Now we want to create our own database.
This requires 'root' permissions:

su
#stop the postgresql daemon (never modify database files while PostgreSQL is running!):
service postgresql stop

#just notes, we won't do it now:
# NOTE 1: new users are added in: /var/lib/pgsql/data/pg_hba.conf
#         it is recommended to use "md5" encryption when defining users
# NOTE 2: if network access is desired, activate "tcpip_socket = true" in
#         /var/lib/pgsql/data/postgresql.conf

#so we start postgresql daemon again:
service postgresql start
#-> should print "ok"

#to create a new database, we log in as the 'postgres' user:
su postgres

createdb grasscourse

#exit from 'postgres', then from 'root':
exit
exit
#now we are a normal user again.

#Check available databases (the new DB 'grasscourse' should be listed):
psql -l
The still empty database 'grasscourse' is ready.

Now we create a new table within this database:

psql grasscourse

#Note: column definitions are comma separated, with no comma after the last column.
#To make it more sophisticated, we add a constraint (Y, N) to
#column 'enjoyed' so that only these two values are accepted:
CREATE TABLE cities (
 id           integer,
 name         varchar(50),
 lastvisit    date,
 enjoyed      varchar(1) check (enjoyed in ('Y','N'))
);
# -> CREATE TABLE should be printed

#list all tables:
\dt

#get column info:
\d cities

#Insert data into the new table (use single quotes!):
#Date order: YYYY-mm-dd
INSERT INTO cities VALUES ( 1, 'Trento', '2003-11-01', 'Y');
# -> INSERT some_numbers should be printed

INSERT INTO cities VALUES ( 2, 'Milano', '2003-10-12', 'N');
INSERT INTO cities VALUES ( 3, 'Povo', '2003-11-25', 'Y');

#test for the constraint:
INSERT INTO cities VALUES ( 4, 'Bolzano', '2003-10-23', 'A');
# -> should fail: 'A' not accepted

#test for the date
INSERT INTO cities VALUES ( 4, 'Bolzano', '2003-10-33', 'Y');
# -> should fail: day 33 not accepted
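
The CHECK constraint syntax above is standard SQL, not PostgreSQL-specific. If no PostgreSQL server is at hand, the same rejection behaviour can be reproduced with Python's built-in sqlite3 module - a minimal sketch (note that SQLite, unlike PostgreSQL, has no real date type, so the invalid day 33 would not be caught there):

```python
import sqlite3

# in-memory database; SQLite enforces CHECK constraints like PostgreSQL does
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE cities (
  id        integer,
  name      varchar(50),
  lastvisit date,
  enjoyed   varchar(1) CHECK (enjoyed IN ('Y','N'))
)""")
conn.execute("INSERT INTO cities VALUES (1, 'Trento', '2003-11-01', 'Y')")

# the constraint rejects anything but 'Y' or 'N':
try:
    conn.execute("INSERT INTO cities VALUES (4, 'Bolzano', '2003-10-23', 'A')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)

# only the valid row made it into the table:
print(conn.execute("SELECT count(*) FROM cities").fetchone()[0])
```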

#now we query the table:
SELECT * FROM cities;
 id |  name  | lastvisit  | enjoyed
----+--------+------------+---------
  1 | Trento | 2003-11-01 | Y
  2 | Milano | 2003-10-12 | N
  3 | Povo   | 2003-11-25 | Y
(3 rows)

#now we query the table, ordered by one column:
SELECT * FROM cities order by lastvisit;
 id |  name  | lastvisit  | enjoyed
----+--------+------------+---------
  2 | Milano | 2003-10-12 | N
  1 | Trento | 2003-11-01 | Y
  3 | Povo   | 2003-11-25 | Y
(3 rows)

#now we query the table, but only two columns:
SELECT name,lastvisit FROM cities order by lastvisit;
  name  | lastvisit
--------+------------
 Milano | 2003-10-12
 Trento | 2003-11-01
 Povo   | 2003-11-25
(3 rows)

#leave the 'psql' terminal:
\q

3) Using R-stats

Starting R-stats is pretty straightforward:

R

#something like this will appear:
R : Copyright 2003, The R Development Core Team
Version 1.8.0  (2003-10-08)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for a HTML browser interface to help.
Type 'q()' to quit R.

Now you have reached the R prompt ('>'). Let's draw a quick plot and leave R again:
plot(1:10)
q()
Save workspace image? [y/n/c]: n

Since you are able to start and leave R, we can try a sample session:


#get help in the terminal window, e.g. for the 'plot' command:
help(plot)
#use the q key to leave the help, cursor and space keys to browse the text.

#open help in the browser:
help.start()

#simple arithmetic:
1+1
# [1] 2

#NOTE: if you get lost while entering a command, CTRL-C is your friend to
#get back to the prompt.

x <- 1+1
x
# [1] 2

#make a sequence of 12 numbers:
seq(1,12)

#assign result to a variable:
x <- seq(1,12)

#print the variable contents:
x
# [1]  1  2  3  4  5  6  7  8  9 10 11 12

#what kind of object is this variable?
str(x)

#simple statistics of this variable:
summary(x)

#see the command help (in the browser, if still open):
help(summary)

#make some categorical data (called "factors" in R speech):
rgb <- c("red", "green", "blue")

#have some more:
infra <- c("nir", "mir", "fir")
thermal <- c("tir", NA, NA)

#for convenience we can handle variable composites in a "data frame":
landsat5 <- data.frame(rgb, infra, thermal)

#Now we look at some sample maps. The built-in 'volcano' data set
#(see '?volcano') is described as follows:
# Maunga Whau (Mt Eden) is one of about 50 volcanos in the Auckland
# volcanic field.  This data set gives topographic information for
# Maunga Whau on a 10m by 10m grid.

#what kind of object is it?
str(volcano)
#... is a matrix of numerics

#print the raw matrix:
volcano
#...looks strange!

#but a nice map:
filled.contour(volcano, color = terrain.colors, asp = 1)
title(main = "Maunga Whau volcano: filled contour map")

Up to now we have used only base functions and data. There are many extensions, called R libraries, which provide enormous functionality. One is the "GRASS" library, which links R and GRASS:

#load the GRASS library (GRASS/R interface):
library(GRASS)

#there are sample data built-in, e.g. river Maas soil pollution data:

#many commands in R provide an example. You can just run it:

#get out of this R session:
q()
Save workspace image? [y/n/c]: n

The next task is to use the Rdbi interface to connect R and PostgreSQL:

library(Rdbi)
library(RdbiPgSQL)

conn <- dbConnect(PgSQL(), host="localhost", dbname="grasscourse")

#list available tables:
dbListTables(conn)

Reading data (the table we created above) from PostgreSQL into R:

res <- dbSendQuery(conn, "select * from cities")
cities <- dbGetResult(res)

#inspect the structure of the result:
str(cities)
'data.frame':   3 obs. of  4 variables:
 $ id       : int  1 2 3
 $ name     : chr  "Trento" "Milano" "Povo"
 $ lastvisit: chr  "11-01-2003" "10-12-2003" "11-25-2003"
 $ enjoyed  : chr  "Y" "N" "Y"


#print the downloaded data frame:
cities
#     date order: mm-dd-YYYY
  id   name  lastvisit enjoyed
1  1 Trento 11-01-2003       Y
2  2 Milano 10-12-2003       N
3  3   Povo 11-25-2003       Y

#make the character date a real date:
tmp <- strptime(cities$lastvisit, "%m-%d-%Y")
tmp
[1] "2003-11-01" "2003-10-12" "2003-11-25"
#    YYYY-mm-dd is the new date format, identical to the PostgreSQL order

#update the date column in the data frame:
cities$lastvisit <- strptime(cities$lastvisit, "%m-%d-%Y")
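
R's strptime() uses the same C-style conversion specifications ('%m', '%d', '%Y', ...) as many other languages. As a cross-check, Python's datetime.strptime parses the same strings identically:

```python
from datetime import datetime

# the same conversion specification as in the R call above
lastvisit = ["11-01-2003", "10-12-2003", "11-25-2003"]
dates = [datetime.strptime(d, "%m-%d-%Y") for d in lastvisit]
print([d.strftime("%Y-%m-%d") for d in dates])
# ['2003-11-01', '2003-10-12', '2003-11-25']
```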

So far, so nice. The next step is to learn how to write data from R to PostgreSQL (e.g. your results). For illustration we use some meteo data, a time series of average air temperatures at Nottingham Castle in degrees F over 20 years (note that the MASS library is part of the VR bundle):

library(MASS)
data(nottem)

#let's look at the data (note: Fahrenheit, not Celsius!):
nottem

#connect to PostgreSQL:
conn <- dbConnect(PgSQL(), host="localhost", dbname="grasscourse")

#write data to PG database as new table "nottingtemp":
dbWriteTable(conn, nottem, "nottingtemp")
Now run (in another shell) the 'psql' terminal to see the new table written from R:
psql grasscourse
SELECT * from nottingtemp;


To use GRASS within R, you have to run R from within the GRASS shell. First start GRASS (Spearfish location):

#reset region:
g.region -dP

#within GRASS, start R:
R

#load the GRASS library and read the GRASS environment into R:
library(GRASS)
G <- gmeta()

#see the current GRASS location settings:
G

# print list of available raster maps:
system("g.list rast")

#NOTE: You always need to specify "G" for the next commands!
#load the 'elevation.dem' raster map into R:
elev <- rast.get(G,"elevation.dem")

#look at the map:
plot(G, elev$elevation.dem)
plot(G, elev$elevation.dem, col=terrain.colors(20))

#draw an elevation histogram (truehist is in the MASS library):
library(MASS)
truehist(elev$elevation.dem, nbins=200)

#Is that normally distributed?
qqnorm(elev$elevation.dem)
qqline(elev$elevation.dem)
#-> no, it deviates from the qqline

#delete the object from R memory:
rm(elev)

#Now we import two raster maps into a data frame:
#cat: T (true) or F (false) selects whether category labels are copied
#over rather than numeric values:
spearfish <- rast.get(G, c("elevation.dem","vegcover"),cat=(c(F,T)))

#we can also plot the categorical 'vegcover' map, converting it to numeric:
plot(G, unclass(spearfish$vegcover))

#get basic statistics on the distribution of vegetation related to elevation:
tapply(spearfish$elevation.dem, spearfish$vegcover, summary)

#make a boxplot:
boxplot(tapply(spearfish$elevation.dem, spearfish$vegcover, summary))

#vice versa (warning: long calculation...):
tapply(spearfish$vegcover, spearfish$elevation.dem, summary)
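
tapply() splits its first argument into groups defined by the second and applies the given function to each group. A minimal plain-Python sketch of the same idea, using made-up elevation/vegcover values (the real Spearfish values will differ):

```python
from collections import defaultdict
from statistics import mean

# made-up sample cells: (elevation in m, vegcover category)
cells = [(1320, "forest"), (1450, "forest"), (1600, "forest"),
         (1100, "grassland"), (1180, "grassland"), (1050, "shrubs")]

# group elevations by vegetation category, like tapply(elevation, vegcover, ...)
groups = defaultdict(list)
for elev, veg in cells:
    groups[veg].append(elev)

# per-group summary statistics
for veg in sorted(groups):
    vals = groups[veg]
    print(veg, "min:", min(vals), "mean:", round(mean(vals), 1), "max:", max(vals))
```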

Simple regression exercise

For this exercise, we start GRASS with Spearfish location. You need the LANDSAT-TM7 surface temperature map ("temp_celsius_12Jul2000") from lecture 4. We want to study the correlation between temperature and elevation.

#set resolution/region to Celsius temperature map from lecture 4:
g.region rast=temp_celsius_12Jul2000 -p

#within GRASS, start R:
R

#load the GRASS library and read the GRASS environment:
library(GRASS)
G <- gmeta()

#NOTE: You always need to specify "G" for the next commands!
#load the 'elevation.dem' raster map into R:
elev <- rast.get(G,"elevation.dem")

#load the 'temp_celsius_12Jul2000' raster map into R:
celsiusmap <- rast.get(G,"temp_celsius_12Jul2000")
#Note that R changes underscores to dots

#now we extract sample points. First get 200 coordinate pairs:

E.random <- sample(G$xseq,200)
N.random <- sample(G$yseq,200)
#merge to a data frame:
randompoints <- data.frame(E.random,N.random)

#nice, but complicated. It is easier to fetch random cell indexes:
randompoints <- sample(G$Ncells,200)

#now we query the maps at the randomly selected cell numbers:
elevsamples <- elev$elevation.dem[randompoints]
celsiussamples <- celsiusmap$temp.celsius.12Jul2000[randompoints]
#...the sample data are ready.

#look at the plot:
plot(celsiussamples, elevsamples)

#Linear regression analysis (Linear Model):
lm(elevsamples ~ celsiussamples)

summary(lm(elevsamples ~ celsiussamples))
#For a related discussion, compare Dalgaard 2002, p.97ff

#plot data and regression line in one graph (note the parameter order!):
plot(celsiussamples, elevsamples)
abline(lm(elevsamples ~ celsiussamples))

#The result: temperature decreases with height, but the linear
#correlation is weak (as expected).

(Useless) Exercise: Time series of meteo data simulation

#simulate 4 time series for 4 stations, 500 days:
sim1 <- as.double(arima.sim(list(order=c(1,1,0),ar=0.7),n=499))
sim2 <- as.double(arima.sim(list(order=c(1,1,0),ar=0.7),n=499))
sim3 <- as.double(arima.sim(list(order=c(1,1,0),ar=0.7),n=499))
sim4 <- as.double(arima.sim(list(order=c(1,1,0),ar=0.7),n=499))
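
arima.sim(list(order=c(1,1,0), ar=0.7), n=499) draws 499 innovations from an AR(1) process with coefficient 0.7 and cumulatively sums them (the 'I' in ARIMA), yielding 500 values including the starting point. A rough Python sketch of the same recipe (a simplification: R additionally discards a burn-in period):

```python
import random

random.seed(1)

def sim_arima_1_1_0(n, ar=0.7):
    """AR(1) innovations with coefficient 'ar', cumulatively summed (ARIMA(1,1,0))."""
    level, prev, series = 0.0, 0.0, [0.0]    # start at 0, like arima.sim
    for _ in range(n):
        prev = ar * prev + random.gauss(0, 1)   # AR(1) step
        level += prev                            # integration step
        series.append(level)
    return series

sim1 = sim_arima_1_1_0(499)
print(len(sim1))   # 500 values for n=499, as in R
```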

#station heights:
height1 <- seq(122,122,l=500)
height2 <- seq(332,332,l=500)
height3 <- seq(1543,1543,l=500)
height4 <- seq(534,534,l=500)

#generate 500 days:
ISOdate(2002, 7, 11) - ISOdate(2001, 2, 26)
days <- seq(ISOdate(2001, 2, 26),ISOdate(2002, 7, 10), by="d")
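
The ISOdate() subtraction above is just a sanity check on the length of the sequence: 2001-02-26 to 2002-07-10 spans 500 days, endpoints included. The same arithmetic with Python's datetime module:

```python
from datetime import date, timedelta

start, end = date(2001, 2, 26), date(2002, 7, 10)
ndays = (end - start).days + 1   # endpoints included
print(ndays)
# 500

# the full sequence of days, like seq(..., by="d") in R:
days = [start + timedelta(n) for n in range(ndays)]
print(days[0], days[-1])
# 2001-02-26 2002-07-10
```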

#we need a special table structure:
tmpsim <- print(c(sim1,sim2,sim3,sim4))
tmpdate <- print(c(days,days,days,days))
tmpheight <- print(c(height1,height2,height3,height4))

#put all in one data frame:
meteo <- data.frame(tmpheight,tmpdate,tmpsim)
names(meteo) <- c("height","date","var")

#plot the series per station (xyplot is in the lattice library):
library(lattice)
xyplot(var ~ date  | height, data=meteo, type="l")

© 2003 Markus Neteler
Last change: $Date: 2004-11-25 10:46:31 +0100 (Thu, 25 Nov 2004) $