By Andy May

It doesn’t matter if you think fossil fuel CO2 emissions are going to end the world or lead us to a greener and more beautiful one. Either way, to be a true climate change geek you need access to climate data! Unfortunately, much of the climate raw and gridded raw data comes in a very difficult to read format called NetCDF. Worldwide temperature data is complex, the dimensions of climate arrays are latitude, longitude, depth (or the equivalent pressure in the atmosphere or ocean), year, and month. It is very hard to simplify it more than that. NetCDF does handle data of this complexity in a machine independent way, but you will not be able to read it with Excel. So how do we do this?

R is a wonderful statistical package that is free. It is very widely used and lots of books and websites offer help. It is not easy to learn, but with work you can get a lot out of it. You can download it here. If you plan on working with climate data I recommend the 64 bit version, which requires a 64 bit computer running 64 bit Windows. The arrays are very large, often larger than the 32 bit limit of 2 GB. Try and use a computer with at least 12 GB of memory. The computer I use for this work has 18 GB and runs 64 bit Windows 10.

You can use the gui or text based interface built into R for all of your work, but I’ve found that RStudio is a better environment. RStudio can be downloaded here. It is also free for personal use. There are numerous books on R and I’ve certainly not read them all, but two books stand out for most R users. A good beginning book is R for Dummies. Once you’ve gotten started and you want to get in deeper you will need The R Book. R has powerful map making modules, called “packages,” and you will want to learn them, start here.

The NetCDF files we will be working with contain the JAMSTEC MOAA GPV worldwide grid of ocean temperatures to 2000 meters. You will want to download all of the files with names like “TS_YearMo_GLB.nc”. There are 176 files to download so I suggest using an ftp utility that offers multiple file downloading like Filezilla, which is also free. Each file contains the temperatures for one month, for example TS_201508_GLB.nc contains a world-wide grid of ocean temperatures from the surface to a depth of 2,000 meters for August, 2015.

JAMSTEC Ocean Temperature Dataset overview

The final lines of the R program described below build a summary matrix “MTYr” that holds the yearly mean global temperatures for 25 depths. Below is a plot of ocean depth versus temperature for 2001, 2005, 2010, and 2014 from the summary matrix.

The lines are indistinguishable at this scale below 100 meters. According to the JAMSTEC MILA-GPV grid, which can also be downloaded and read with a similar R program, the average mixed layer depth is about 59 meters. The mixed layer begins immediately below the surface layer, which is affected by the “skin” layer. The skin layer is less than a millimeter thick, but it can affect the surface water temperature down to 10 meters or so in some conditions during the daylight hours. Below 10 meters we encounter the sea surface foundation temperature.

The mixed layer has a nearly constant vertical temperature and density due to waves, wind and turbulence or “mixing.” The ocean skin and the mixed layer are the portions of the ocean most affected by the Sun and atmosphere. Obviously, the mixed layer varies in thickness and temperature over time, but at any point in time the temperature and water density of the layer is very uniform vertically. Notice the variability in the ocean surface temperature in the plot below and the similar temperature variability at 40 meters depth. Contrast it with the lack of variability deeper in the ocean. For perspective, the ocean mixed layer contains 23 times the heat capacity of the entire atmosphere to 22 km, the entire ocean contains 1,500 times the heat of the atmosphere. The heat and heat capacity calculations and the constants used are described in an Excel spreadsheet here, with references. Some heat is lost from the mixed layer to the deeper ocean over time mostly through deep ocean thermohaline currents, but generally the mixed layer exchanges heat with the atmosphere. The thermohaline currents move very slowly which is why the heat transfer from the surface to the deep ocean can take as much as a thousand years or more.

The mixed layer is showing some apparent warming over the period 2001 to 2014. A best fit line through the 40 meter data, shown in green above, suggests 0.154°C per decade of warming or 1.54°C per century. However, this warming is less than the JAMSTEC MOAA-GPV gridding error. The ARGO and Triton float instrument error is less than +-0.005°C, but the error in gridding the data is larger and more dependent upon the number of working floats and their distribution. The method of computing this error is described in Hosada, Ohira and Nakamura, 2008. The gridding error for the period of interest is shown below and it is never less than the apparent warming for the period, so no statistically significant warming can be claimed for this period from this data. The error is much larger in the early years due to fewer floats, the goal of at least 3,000 floats was not achieved until 2007.

So, this is a summary of the data we can extract from the JAMSTEC NetCDF files. Below we will discuss how this is done.

Using ncdump to examine the data

Before we start learning how to use R to read NetCDF files, we will need one other essential free tool, “ncdump.” You can download this tool here. Ncdump is a DOS command line program that will dump the contents of a NetCDF file to a text file. You can then see what is inside using a text editor like Ultraedit or Notepad ++. If the NetCDF installer cannot update your path (likely) you can right click on the start button, chose “system”, click on “advanced system settings”, then “environment variables” to change the path manually. That way Windows can see the ncdump program from any directory. Otherwise, simply copy the files listed in the screen shot below to the directory where your NetCDF files (“.nc” files) are and run the program in that directory. To open a DOS command line window, right click on the start button on the lower left of your screen, choose “run” and type “cmd.” Then a command line window will open. Below is what the window looks like, I’ve copied the necessary files to run ncdump into the directory and copied an “nc” file into the same directory. When ready to go, the directory contains the following files:

To see the ncdump command syntax, type “ncdump -?” as I’ve done below:

To dump the entire NetCDF contents to a text file, showing the arrays in Fortran style, type the following command:

ncdump –f fortran TS_201508_GLB.nc >nc_201508.txt

The greater than symbol (>) directs the output to the file nc_201508.txt. The file created is large, but it contains all of the data in the “nc” file in an easily read text format. Most important, the array values are given with their exact dimensions so you can make sure that the R array we will construct is populated as we expect. We will be creating a five dimensional array and it is easy to confuse the array indices. So save this file, we will be referring to it later.

Running the program and verifying the array

Now we will run the R program and read the JAMSTEC MOAA GPV NetCDF files into a five dimension array called “DBT.” The dimensions of the array are Year, Month, Longitude, Latitude, and Pressure in decibars. The full program is available in a Word docx file here. After downloading it save it as a text file so that R or RStudio can read it. Place the program in the same directory as the JAMSTEC MOAA GPV files, this will be the working directory. For convenience you will also want to copy the ncdump output text file into the working directory. As discussed above R has its own command line interface, but it is very basic and I prefer RStudio. Below is an image of the RStudio interface as it looks after running the R program.

The upper left window contains the R program. Placing the cursor on any line and pressing ctrl-enter causes the line to execute. Don’t execute the program yet, it will not work until we download and install two additional R modules or “packages” that I will discuss in the next section. Lines in green are comments and lines in black are actual R statements. Character strings are also shown in green, even in the R statements. The lower left window is the console, it shows the statements executed and any messages. The upper right window shows the variables created. The main objective of the program is to create DBT, the five dimension array of temperatures. As stated in the window DBT has 213,840,000 values and is 1.6 GB in size. This array has all of the world-wide monthly temperatures from January, 2001 to August, 2015 at 25 different depths (or pressures) in the ocean.

Before we begin explaining the program statement by statement, let’s look at the data. We can open the ncdump output file created above with an editor, following are the first few lines in the file:

netcdf TS_201508_GLB {

dimensions:

    LONGITUDE = 360 ;

    LATITUDE = 132 ;

    PRES = 25 ;

variables:

    float LONGITUDE(LONGITUDE) ;

        LONGITUDE:name = “LONGITUDE” ;

        LONGITUDE:units = “degrees_east” ;

    float LATITUDE(LATITUDE) ;

        LATITUDE:name = “LATITUDE” ;

        LATITUDE:units = “degrees_north” ;

    float PRES(PRES) ;

        PRES:name = “PRES” ;

        PRES:long_name = “Pressure” ;

        PRES:positive = “down” ;

        PRES:units = “decibar” ;

    float TOI(PRES, LATITUDE, LONGITUDE) ;

        TOI:name = “TOI” ;

        TOI:long_name = “Temperature.(ITS90)” ;

        TOI:_FillValue = 99999.f ;

        TOI:units = “degree_Celsius” ;

The first line simply identifies the filename, the second gives the dimensions of the array the file contains. “PRES” is pressure in decibars. In the ocean, pressure in decibars is also the approximate depth in meters, 2000 decibars is 1986 meters and 10 decibars is a depth of 9.9 meters. The variables stored in the file are listed below, I’m showing the top four. The third one is particularly interesting since it contains the temperatures by pressure, latitude, and longitude.

Now let’s check the values in our array. We can type the following:

DBT[15,8,354,132,25]

And press ctrl-Enter and R will tell us the value of that element. It is -0.983. If we go to our ncdump output, which is the year 2015 and the month “8”, on line 1,188,559, we see this:

-0.983, // TOI(354,132,25)

So the values match, we know the dimensions of the array are correct, they are [year, month, longitude, latitude, pressure]. The longitude and latitude indexes can be translated to actual longitude and latitude values through two single dimension arrays given in the NetCDF file. For example, latitude[1] is -60.5 or 60.5 degrees south. Longitude[1] is 0.5 degrees east and longitude[268] is -92.5 west. In the array we have all longitude values, but we only have latitude values from 60.5 degrees south to 70.5 degrees north.

The R program to read the JAMSTEC ocean temperature grid

Now we will go through the program line-by-line and finally we can run it. The two lines load the external R packages RNetCDF and lubridate:

library(RNetCDF)

library(lubridate)

These packages are not delivered with R, but they are readily available on many CRAN sites. To download and install them, press the packages tab in the lower right window in RStudio. Then press the install tab and enter “RNetCDF, lubridate” in the second line and then press the install button. These two functions will then be downloaded and installed for you. They will not be available for use in any program until the library statements given above are executed. So, put the cursor on the first library statement and press ctrl-Enter, then press it again to execute the next library statement and you have loaded these critical libraries.

fname<-paste(“TS_200101_GLB.nc”,sep=””)

nc=open.nc(fname)

#

# TOI is the temperature array extracted from the NCDF file

TOI = var.get.nc(nc,”TOI”,start=c(1,1,1),count=c(360,132,25))

Value<-TOI[1,1,1]

#

# DBT is the Database Array

# Dimensions in order: “Year”, “Month”, “Longitude”, “Latitude”, “Pressure”

DBT<-array(Value,dim=c(15,12,360,132,25))

The next statements set up a file name character variable that we will use to read all 176 files in one go, then we open the first file with open.nc and copy the TOI array from the first file to our TOI array with var.get.nc. The first element of TOI is stored in the variable value. Finally we create the database array DBT and initialize all elements of it with the variable “value”.

YrMo <- as.Date(“20010101”,format(“%Y%m%d”))

for (k in 1:14) {

year(YrMo)<-k+2000

#

for(i in 1:12) {

month(YrMo)<-i

X<-format(YrMo,”%Y%m”)

fname<-paste(“TS_”,X,”_GLB.nc”,sep=””)

nc=open.nc(fname)

TOI = var.get.nc(nc,”TOI”,start=c(1,1,1),count=c(360,132,25))

DBT[k,i,,,]<-TOI[,,]

}}

This next section of code works through all of the files, reading each TOI array through the end of 2014 and then loading it into the proper year and month of DBT.

# Read in 2015 through August, all we have right now

year(YrMo)<-2015

for(i in 1:8) {

month(YrMo)<-i

X<-format(YrMo,”%Y%m”)

fname<-paste(“TS_”,X,”_GLB.nc”,sep=””)

nc=open.nc(fname)

TOI = var.get.nc(nc,”TOI”,start=c(1,1,1),count=c(360,132,25))

DBT[15,i,,,]<-TOI[,,]

}

This section reads January through August of 2015, the latest data available.

MTYr<-array(Value,dim=c(16,25))

MTYr<-apply(DBT,c(1,5),mean,na.rm=TRUE)

write.table(MTYr,file=”MTYr.csv”,sep=”,”,eol=”\r\n”)

We build the MTYr array and then compute the averages and populate the array using the R “apply” function. Finally we write out the matrix as a comma delimited text file suitable for loading into excel.

We have extracted all of the gridded ocean temperatures from January 2001 through August of 2015, we have also summarized the data by depth and year and formatted the results for Excel. The next step, to be covered in a future post, is how to map and analyze the data spatially with R.