How to plot climate data

Many dataloggers dump their collected data directly into Microsoft Excel. Users of dataloggers that do not export to Excel often transfer the data there anyway for processing, because Excel has become the standard tool for scientific data reduction.

However, there is at least one other program, gnuplot, which does a better job of climate plotting, mainly because it is much quicker and can handle far more data: it plots twenty years of hourly data from twenty measuring points without noticeable hesitation. It will also calculate derived values, such as dew point, on the fly and plot them. It expects data in a plain text file, which is the only reliable long-term archival format. There is no need to make and preserve extra columns of derived data, since the calculation formulae can be stored compactly in a plain text gnuplot instruction set.
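
As a small illustration of such an on-the-fly calculation, the following gnuplot lines define dew point as a function of temperature and relative humidity and print two test values. This is only a sketch: it assumes a Magnus-type approximation with the constants 610.78 Pa, 17.2694 and 238.3, and the function names are chosen for illustration.

```gnuplot
# Magnus-type approximation (assumed constants: 610.78 Pa, 17.2694, 238.3)
# saturation vapour pressure in Pa at temperature t in degrees C
svp(t) = 610.78*exp(17.2694*t/(t + 238.3))
# actual vapour pressure in Pa at relative humidity rh in %
vp(t,rh) = svp(t)*rh/100.0
# dew point in degrees C, obtained by inverting svp()
w(t,rh) = log(vp(t,rh)/610.78)
dpt(t,rh) = 238.3*w(t,rh)/(17.2694 - w(t,rh))
# at 100 %RH the dew point should equal the air temperature
print dpt(20.0, 100.0)
# an indoor reading of 24.1 C and 82.8 %RH
print dpt(24.1, 82.8)
```

The first value should come out at 20 (apart from rounding), because the dew point formula is the exact inverse of the saturation formula. The second should be roughly 21 degrees C.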

gnuplot is a command line program. That means that you write plain text instructions into a file and then call gnuplot to chew through the file and follow the instructions. No mouse is required. This way of doing things still has merit. It takes a little time to learn what to write, but after that one can use the same script with slight variations to plot varied input data into a standardised graph layout. Gnuplot is widely used on web servers because it can be automated to plot a graph of accumulating data, without human intervention.

It runs on all common operating systems and is free software.

Here is a graph made by gnuplot. It plots inside and outside temperature and relative humidity, and also a derived value which is not in the data set: the difference between inside and outside water vapour concentration.


Here is a fragment of the data file:

#Date time t-in rh-in t-out rh-out
1/10/07 0:00	24.1	82.8	25.7	73.9
1/10/07 0:15	24.1	82.9	26	72.1
1/10/07 0:30	24.1	83	26	72.2
1/10/07 0:45	24.1	82.8	26	70.9
1/10/07 1:00	24.1	82.4	25.9	71.2
1/10/07 1:15	24.1	82.1	25.8	72.7
1/10/07 1:30	24.1	81.9	25.3	74.2
1/10/07 1:45	24.1	81.6	25.5	73.1

And here is the script that constructed the graph. The lines beginning with a # are comment lines, so the actual instructions are quite brief.

# Gnuplot script for climate data
# giving png output (term png)
# (use term postscript for scalable vector graph)
#command prompt>gnuplot
# > call '' 'datafile' 'graph.png'
# q to exit gnuplot
#functions for dew point, vapour pressure and g/m3
#t is in degrees C, rh in %; vapour pressure is in Pa
svp(t) = 610.78*exp(t/(t+238.3)*17.2694)
vp(t,rh) = svp(t)*rh/100
w(t,rh) = log(vp(t,rh)/610.78)
dpt(t,rh) = (w(t,rh)*238.3)/(17.2694-w(t,rh))
gm3(t,rh) = vp(t,rh)*2.166/(t + 273.16)
#png terminal specification
#the font spec is computer dependent, can be omitted
set terminal png enhanced font "/usr/share/fonts/truetype/DejaVuSans.ttf" 12 size 750,550 \
xffffff x000000 x666666 xff9c6c x77ccdb x507e00 xc90000 x1200ff
#white (background) black (text) grey (grid) 
#then line colours beginning with 1:pale orange 2:pale blue 3:green 
#4:red 5:blue
# end of png spec
# set output file to the third item on 'call' command line
set output "$1"
set xlabel "Year/Month/Day"
set key below left
# indicate that the x axis is time
set xdata time
# set the time format as it is in the input data file
# default spacer is a space, so tabs need to be put
#explicitly in the format string as \t
set timefmt "%d/%m/%y %H:%M"
#x range must be in the same format, though hours can be omitted
# commenting out xrange allows gnuplot to do it
#set xrange ["1/10/07":"1/11/07"]
#The display format can be different
set format x "%Y/%m/%d"
# no ticks between months, because they would be meaningless
set nomxtics
set yrange [-10:100]
set ytics 0,10,100
set nomytics
set ylabel "%RH"
# the right y axis is used for temperature and weight/cu m.
set y2label "C and g/m^3"
# take care with y2 range, so its ticks coincide with the y1 ticks
set y2range [-5:50]
set y2tics -10,5,25
set grid
# plot the data file listed on the command line
plot "$0"\
 using 1:5 axes x1y2 t 't-out' with lines lt 1 lw 1\
,'' using 1:6 t 'rh-out' with lines lt 2 lw 1\
,'' using 1:3 axes x1y2 t 't-in' with lines lt 3 lw 2\
,'' using 1:4 t 'rh-in' with lines lt 4 lw 2\
,'' using 1:(gm3($$5,$$6) - gm3($$3,$$4)) axes x1y2 t 'g/m^3 out-in' with impulses lt 5
# notes on the plot instruction:
# the \ at end of line continues the single line instruction.
# the repeated ,'' indicates that the same input data file is used for all graph traces.
# the double $$5 distinguishes the fifth data column from the 
#fifth item on the command line
# the line with the $$ calls the functions to generate a derived trace.
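
The $0, $1 and $$5 notation above belongs to older gnuplot releases. In gnuplot 5 and later the items on the 'call' command line are, to the best of my knowledge, available as the string variables ARG1, ARG2 and so on, and data columns are written with a single $. The following is a sketch under that assumption, reduced to three traces. The script name climate.gp is hypothetical.

```gnuplot
# Sketch for gnuplot 5 or later (assumption: ARG1, ARG2 hold the call parameters)
# gnuplot> call 'climate.gp' 'datafile' 'graph.png'
# or from the shell: gnuplot -c climate.gp datafile graph.png
svp(t) = 610.78*exp(17.2694*t/(t + 238.3))
gm3(t,rh) = svp(t)*rh/100.0*2.166/(t + 273.16)
set terminal png size 750,550
# second item after the script name: the output file
set output ARG2
set xdata time
set timefmt "%d/%m/%y %H:%M"
set format x "%Y/%m/%d"
set ylabel "%RH"
set y2label "C and g/m^3"
set yrange [-10:100]
set y2range [-5:50]
set ytics 0,10,100 nomirror
set y2tics 0,5,50
set grid
# first item after the script name: the data file
# column numbers are written with a single $ here
plot ARG1 using 1:4 title 'rh-in' with lines, \
     '' using 1:3 axes x1y2 title 't-in' with lines, \
     '' using 1:(gm3($5,$6) - gm3($3,$4)) axes x1y2 title 'g/m^3 out-in' with impulses
# close the output file
set output
```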

Command line programs tend to give cryptic error messages, which are very frustrating when one first begins to use them. The biggest frustration in gnuplot is probably getting the date format right. There are two date formats. One describes the date format of the data file: this is the 'set timefmt' instruction. The other describes the appearance of the time scale on the graph: this is the 'set format x' instruction.
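
The two formats can be tried out side by side before any data is plotted. The lines below are a minimal sketch. They assume a reasonably recent gnuplot in which the built-in functions strptime and strftime are available, and they use one timestamp in the style of the data fragment.

```gnuplot
set xdata time
# input: how dates are written in the data file, e.g. 1/10/07 0:15
set timefmt "%d/%m/%y %H:%M"
# output: how the same moment is labelled on the x axis
set format x "%Y/%m/%d"
# test both formats on a single timestamp, without a data file:
# strptime reads the text with the input format,
# strftime writes the result with the output format
print strftime("%Y/%m/%d", strptime("%d/%m/%y %H:%M", "1/10/07 0:15"))
```

If the input format matches the text, the print line should show 2007/10/01. A wrong day, month or year here is a quick sign that the 'set timefmt' string does not match the data file.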

The output graphic file format is controlled by the 'set term' command. The simplest format is png (portable network graphics), which is used in the example. A more versatile format is postscript, a vector format which can be scaled to give very smooth, publishable graphs. However, on the MS Windows operating system, one needs to install helper programs to see the result: ghostscript and a ghostscript viewer.
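
Switching between the two formats takes only a change of terminal and output file. The following sketch writes the same test curve first as png and then as encapsulated postscript. The file names are hypothetical, and the png terminal is assumed to be compiled into the installed gnuplot.

```gnuplot
# bitmap output for the web (file names are hypothetical)
set terminal png size 750,550
set output "graph.png"
plot sin(x) title 'test curve'
# the same graph as scalable postscript
set terminal postscript eps color
set output "graph.eps"
replot
# close the output file
set output
```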

Gnuplot is versatile and can make graphs of many types, but what we usually want is a graphical record of the climate which we can scan quickly for deviations from safe levels. One can write many subtle variations and tests into a gnuplot script, such as changing the line colour when the RH exceeds a certain level. There are many possibilities and, fortunately, a large online community.
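
One simple way to make high readings stand out is sketched below. Rather than recolouring the line itself, it overlays a second trace that contains only the readings above a limit. The file name climate.txt and the limit of 75 %RH are hypothetical, and the column layout is assumed to be that of the data fragment shown earlier. In a script run with the older 'call' syntax the column references would have to be written $$4.

```gnuplot
# hypothetical data file and limit
datafile = "climate.txt"
limit = 75.0
set xdata time
set timefmt "%d/%m/%y %H:%M"
set format x "%Y/%m/%d"
set ylabel "%RH"
# 1/0 is undefined, so gnuplot skips those points:
# the second trace shows only the readings above the limit
plot datafile using 1:4 title 'rh-in' with lines, \
     '' using 1:($4 > limit ? $4 : 1/0) title 'rh-in above limit' with points pt 7
```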

It is possible to construct a nearly automatic quality control system for museum climate by combining gnuplot with helper programs to join up data files and to display the graph on a web server. These programs, such as 'awk' for manipulating raw data files and shell scripts for chaining a set of programs to make the final product, are unknown to most users of desktop computers, which are almost universally fitted with the MS Windows or Mac OS X operating systems. These ancient and by now well-debugged programs have many advantages over modern programs designed to do everything you want by a succession of mouse clicks, if only you can work out how. All these programs are free. Using gnuplot, and manipulating climate data generally, becomes easier with a Unix-type operating system such as Linux, for which these programs were originally designed. Using Linux is getting easier all the time, as more user-friendly distributions are made. These can now run from a CD without disturbing the installed operating system.
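
For the simplest case of joining data files, gnuplot can call a helper program itself. The sketch below assumes a Unix-type system on which gnuplot can read the output of a shell command, and two hypothetical monthly files with the same column layout as the data fragment shown earlier.

```gnuplot
# hypothetical monthly files with identical column layout
set xdata time
set timefmt "%d/%m/%y %H:%M"
set format x "%Y/%m/%d"
set ylabel "%RH"
# a leading < makes gnuplot read the output of a shell command;
# cat joins the files end to end, and gnuplot ignores
# the header lines because they begin with #
plot "< cat 2007-10.txt 2007-11.txt" using 1:4 title 'rh-in' with lines
```

Anything more elaborate, such as reordering columns or converting date formats, is better done beforehand with awk or a shell script, as described above.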

ImageMagick image conversion utility:


This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.