By Andy May
Our last post covered reading and plotting IGRA2 weather balloon radiosonde data. This post shows how to map the data. The IGRA2 data can be downloaded from the NOAA ftp site (ftp.ncei.noaa.gov/pub/data/igra). Once the data are prepared, some further tricky programming is needed to map them. This post introduces the key concepts and code structure required; writing complete programs will require understanding these concepts and working with Grok.
Computing the mappable values
As mentioned in the first post, the volume of IGRA2 data is so large that several steps are required to prepare the data for analysis, mapping, and plotting. The initial R program that reads the IGRA2 data was described in the last post; here we show how to map its output. An example map is shown in figure 1.

First, since the goal is to map the IGRA2 data, we need to group the data by radiosonde ascent, then summarize the detailed data by station. ‘dec_month’ is a computed decimal-month value that combines year, month, day, and hour; it is unique for each radiosonde ascent, and the loop variable ‘ckdate’ holds the current value. The following code uses data.table package syntax to create a data.table called ‘ascent.’ The code is from the R program ‘station_analysis_no_plots.r’ (May, 2025b). It selects the rows of ‘station’ that match the current ‘ckdate’ and copies only the listed columns (id, lat, …) into ‘ascent’; the rest of the columns in ‘station’ are not copied. The period ‘.’ before the list of columns is data.table shorthand for ‘list.’
# Read file as data.table
station <- readRDS(fname)
# Calculate decimal month
station[, dec_month := (year - 1990) * 12 + (month - 1) + day / 30.4375 + hour / 730.5]
# One row per unique ascent time (dec_month)
ascents <- unique(station[, .(dec_month)])
# Process each ascent
for (ckdate in ascents$dec_month) {
# Extract ascent data
ascent <- station[dec_month == ckdate, .(id, lat, long, month, year, hPa, temp, mol_den, gph, rh, wdir, wspd, q_gkg)]
Next, the key variables are interpolated to fill in missing values, if necessary. After interpolation, the next bit of code computes the molar density intersection and the values of various variables at it, in the same way as described in the previous post. The last preparation step groups the results into files by month.
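The interpolation step itself is not reproduced here. As a minimal sketch (not the author's code, and with illustrative variable names), base R's ‘approx’ function can fill gaps in a profile variable against pressure level, assuming simple linear interpolation is acceptable:

```r
# Sketch: linear interpolation of missing values in a sounding profile
# using base R's approx(). Column names are illustrative only.
fill_gaps <- function(x, y) {
  ok <- !is.na(y)
  if (sum(ok) < 2) return(y)            # not enough data to interpolate
  filled <- approx(x[ok], y[ok], xout = x, rule = 1)$y
  ifelse(is.na(y), filled, y)           # keep original values where present
}

hPa  <- c(1000, 925, 850, 700, 500)     # pressure levels
temp <- c(25.0, NA, 15.2, NA, -5.6)     # temperature with gaps
temp_filled <- fill_gaps(hPa, temp)     # the NAs at 925 and 700 hPa are filled
```

The `rule = 1` argument leaves values outside the observed range as NA rather than extrapolating.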
In order to make the maps we need to set up some key map variables first.
The Robinson map projection (‘proj=robin’) is attractive and reasonable to use when making global maps. The code below stores the Robinson projection parameters in a character string that we will need often; this way we only need to type them all once. The next three parameters specify that the Greenwich/Prime Meridian is zero longitude and there is no false easting or northing. The reference ellipsoid is WGS84, the modern standard. The ‘no_defs’ flag says don’t add any defaults to the projection; leave it as we have defined it.
# Robinson projection (defined once)
rob_crs <- "+proj=robin +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
We need country outlines to orient the viewer. The country outlines in ‘ne_countries’ come from the ‘rnaturalearth’ R package. We are loading the medium resolution version, which is suitable for most publications. We have asked that the returned data (stored in ‘world’) be an ‘sf’ or simple feature object. The R function ‘st_transform’ changes the ‘world’ geometry to conform to the ‘rob_crs’ (Robinson and WGS84) projection as defined above.
# Load world map and transform once
world <- ne_countries(scale = "medium", returnclass = "sf")
world_rob <- st_transform(world, rob_crs)
We need a latitude and longitude grid on the map. This R code segment creates a ‘graticule’ or latitude/longitude grid for a Robinson projection. The grid is from 60°S to 60°N by 15° and from the International date line (-180 and +180) all the way around the world in 30° steps. ‘st_crs(4326)’ refers to the WGS 84 coordinate reference system (CRS). After creation it is transformed to rob_crs (Robinson and WGS84) by st_transform.
It is interesting that the code for WGS 84 latitude/longitude coordinates is EPSG:4326, where EPSG stands for the European Petroleum Survey Group, which maintains the registry of coordinate reference systems. Beginning in the oil and gas industry, the EPSG registry is now the worldwide mapping standard. Google Maps uses a variant called EPSG:3857, officially named WGS 84/Pseudo-Mercator, for Web Mercator. Web Mercator is a simplified projection that assumes the Earth is a perfect sphere.
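To illustrate that spherical assumption, here is a minimal base-R sketch (not part of the author's programs) of the spherical Web Mercator forward equations, using the WGS 84 semi-major axis as the sphere radius:

```r
# Spherical Web Mercator (EPSG:3857) forward projection, base R.
R_earth <- 6378137  # WGS 84 semi-major axis, meters

web_mercator <- function(lon_deg, lat_deg) {
  lon <- lon_deg * pi / 180
  lat <- lat_deg * pi / 180
  list(x = R_earth * lon,
       y = R_earth * log(tan(pi / 4 + lat / 2)))
}

p <- web_mercator(0, 0)      # the origin maps to (0, 0)
q <- web_mercator(180, 0)    # x at the date line is about 20,037,508 m
```

The x-extent of the whole map, ±20,037,508 m, is simply the sphere radius times ±π, which is why Web Mercator coordinates are so large.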
# Graticule (lat: -60 to 60, lon: every 30°)
graticule <- st_graticule(lat = seq(-60, 60, by = 15),
lon = seq(-180, 180, by = 30),
crs = st_crs(4326)) %>%
st_transform(rob_crs)
The location of the ITCZ changes constantly, so we need to display a reasonable average monthly latitude for the ITCZ to see how the climate parameters change with it. Reasonable average latitudes for the ITCZ were determined using the method described in (May, 2025) and will be discussed in a future post. The mean latitudes are read from ‘ITCZ_mean_latitudes.csv.’ The map color scales and other particulars for each map are read from ‘map_color_scale_top_base.csv.’
# Load ITCZ latitudes and legend scales
itcz_lat <- fread(file.path(root_dir, "ITCZ_mean_latitudes.csv"))
legend_limits <- fread(file.path(root_dir, "variable_lists_details/map_color_scale_top_base.csv"))
The maps to make are specified in ‘map_indices’ using the row index in ‘legend_limits.’
# Define which variables to map (by row index in legend_limits)
# Update this list if you add/remove variables
map_indices <- c(2, 3, 6, 7, 8, 15, 16, 17, 25, 26, 27, 35, 36, 37, 41, 42, 43, 44)
maps_to_make <- legend_limits[map_indices]
if (nrow(maps_to_make) != length(map_indices)) {
stop("Some selected map indices are out of range in legend_limits.")
}
The code below uses the terra package ‘vect’ function to convert the global map of countries to a ‘SpatVector’ object; ‘ext’ then extracts the extent (boundaries) of the object and stores it in ‘ext_rob.’ The vect function is the primary constructor and reader function for SpatVector objects.
The terra function ‘rast’ sets the bounding box of the raster dataset to be created to that of the country outlines, so they match. The raster cell size is set to 100,000 m, that is, 100 km x 100 km cells. Finally, the rob_crs (Robinson/WGS84) projection is assigned to keep everything on the same projection.
# Pre-compute template raster for interpolation (100 km resolution)
ext_rob <- ext(vect(world_rob))
template_rast <- rast(ext_rob, resolution = 100000, crs = rob_crs)
Next we loop through the months (‘month_idx’) to make the maps.
# Main loop over months
for (month_idx in 1:12) {
Read data for the month
dt <- readRDS(file_name)
Aggregate all the data for a station by taking the means of the columns in the data frame ‘dt.’ First ‘dt’ is grouped ‘by’ the station id. The data.table special variable ‘.SD’ (Subset of Data) contains the columns of the current ‘by’ group, excluding the grouping variable ‘station_id.’
The base R function ‘lapply’ is used to take the mean of every variable in ‘.SD’ (ignoring missing values, because na.rm = TRUE).
This sort of statement combines ‘lapply,’ ‘.SD,’ and ‘by’ and is one of the most powerful features of data.table. It works with any number of variables and you do not have to list them. The means in ‘agg’ have the same names as the raw data in ‘dt,’ but only one mean value per station.
# Aggregate by station: mean of all computed variables
agg <- dt[, lapply(.SD, mean, na.rm = TRUE), by = station_id]
Here are the column names after the above statement is executed:
> colnames(agg)
[1] "station_id" "d_month" "latitude" "longitude"
[5] "Month" "Year" "int_hPa" "int_mol_den"
[9] "int_gph" "int_wspd" "int_wdir" "int_temp"
[13] "int_rh" "int_q_gkg" "int_spDen" "int_M_flux"
[17] "spd_lower" "dir_lower" "spDen_lower" "M_flux_lower"
[21] "NS_flux_lower" "mean_u_lower" "mean_v_lower" "temp_lower"
[25] "rh_lower" "q_gkg_lower" "spd_middle" "dir_middle"
[29] "spDen_middle" "M_flux_middle" "NS_flux_middle" "mean_u_middle"
[33] "mean_v_middle" "temp_middle" "rh_middle" "q_gkg_middle"
[37] "spd_upper" "dir_upper" "spDen_upper" "M_flux_upper"
[41] "NS_flux_upper" "mean_u_upper" "mean_v_upper" "temp_upper"
[45] "rh_upper" "q_gkg_upper" "mean_M_flux" "mean_NS_flux"
[49] "mean_u_i" "mean_v_i"
We do not want all the variables in ‘agg.’ The following statement keeps the variable names that begin with ‘latitude,’ ‘longitude,’ ‘d_month,’ ‘int_,’ or ‘mean_.’ The ‘grepl’ R function returns TRUE or FALSE for each element of a character vector, depending upon whether the pattern matches that element. This excludes ‘station_id,’ ‘Month,’ ‘Year,’ and some other variables. The second statement then subsets ‘agg’ to just those columns, effectively deleting the variables we don’t want.
# Keep only relevant columns: location + all int_* and mean_* variables
cols_to_keep <- names(agg)[grepl("^(latitude|longitude|d_month|int_|mean_)", names(agg))]
agg <- agg[, ..cols_to_keep]
Here are the column names after the two statements above:
> colnames(agg)
[1] "d_month" "latitude" "longitude" "int_hPa"
[5] "int_mol_den" "int_gph" "int_wspd" "int_wdir"
[9] "int_temp" "int_rh" "int_q_gkg" "int_spDen"
[13] "int_M_flux" "mean_u_lower" "mean_v_lower" "mean_u_middle"
[17] "mean_v_middle" "mean_u_upper" "mean_v_upper" "mean_M_flux"
[21] "mean_NS_flux" "mean_u_i" "mean_v_i"
>
The ‘mean_u_*’ and ‘mean_v_*’ columns are the average east/west (positive means blowing toward the east) and north/south (positive means blowing toward the north) wind components, respectively. The following code builds the column names for the upper, middle, and lower troposphere and computes the vector-mean wind speed and direction for each level.
# Vector-averaged wind speed and direction (upper, middle, lower troposphere)
wind_levels <- c("upper", "middle", "lower")
for (lvl in wind_levels) {
u_col <- paste0("mean_u_", lvl)
v_col <- paste0("mean_v_", lvl)
Create the vector-mean wind speed for each level and store it in ‘agg,’ the aggregate set of map values. This statement computes the vector-average wind speeds spd_upper, spd_middle, and spd_lower.
agg[[paste0("spd_", lvl)]] <- sqrt(agg[[u_col]]^2 + agg[[v_col]]^2)
The statement below computes the “from” wind direction averages dir_upper, dir_middle, and dir_lower. ‘u_col’ and ‘v_col’ are the east/west and north/south wind components in the TO convention (positive = toward east or north). This code converts them to the meteorological convention (the direction the wind blows FROM).
# Direction FROM (meteorological: 0° = from north, 90° = from east)
agg[[paste0("dir_", lvl)]] <- (atan2(agg[[u_col]], agg[[v_col]]) * 180 / pi + 180) %% 360
}
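As a quick sanity check of the direction convention (illustrative only, not part of the author's program): a wind blowing toward the north (u = 0, v = 1) is a southerly, so it comes FROM 180°, and a wind blowing toward the east comes FROM 270°:

```r
# Meteorological "from" direction of a (u, v) wind vector, in degrees.
# u: eastward component, v: northward component (positive = blowing toward).
dir_from <- function(u, v) (atan2(u, v) * 180 / pi + 180) %% 360

a <- dir_from(0, 1)    # toward north -> from 180 (southerly)
b <- dir_from(1, 0)    # toward east  -> from 270 (westerly)
d <- dir_from(0, -1)   # toward south -> from 0   (northerly)
```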
Here are column names in agg after adding the direction and speed variables from the statements above:
> colnames(agg)
[1] "d_month" "latitude" "longitude" "int_hPa"
[5] "int_mol_den" "int_gph" "int_wspd" "int_wdir"
[9] "int_temp" "int_rh" "int_q_gkg" "int_spDen"
[13] "int_M_flux" "mean_u_lower" "mean_v_lower" "mean_u_middle"
[17] "mean_v_middle" "mean_u_upper" "mean_v_upper" "mean_M_flux"
[21] "mean_NS_flux" "mean_u_i" "mean_v_i" "spd_upper"
[25] "dir_upper" "spd_middle" "dir_middle" "spd_lower"
[29] "dir_lower"
Convert the latitude and longitude in ‘agg’ to Robinson map coordinates:
# Convert to sf (WGS84 → Robinson)
points_sf <- st_as_sf(agg, coords = c("longitude", "latitude"), crs = 4326)
points_rob <- st_transform(points_sf, rob_crs)
points_vect <- vect(points_rob)
Create a line across the map at the average ITCZ (Intertropical Convergence Zone) latitude for the current month and convert the line to Robinson coordinates. First a data.frame of 361 points around the world is created at the mean ITCZ latitude for this month. The data.frame is then converted into an sf object of 361 points with ‘st_as_sf’ in WGS 84/EPSG:4326 geographic coordinates. Next ‘summarise’ combines the 361 points into one feature with multiple points (a MULTIPOINT) using ‘st_combine.’ Finally, ‘st_cast’ converts the combined points into a line (a LINESTRING).
# Create ITCZ line for this month
itcz_this_month <- itcz_lat$ITCZ_lat[month_idx]
itcz_line <- data.frame(
lon = seq(-180, 180, by = 1),
lat = rep(itcz_this_month, 361)
) %>%
st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
summarise(geometry = st_combine(geometry)) %>%
st_cast("LINESTRING") %>%
st_transform(rob_crs)
Loop through selected variables and make the maps. For the current map store the variable name, map title, and contour scale.
# Loop over variables to map
for (i in seq_len(nrow(maps_to_make))) {
var_name <- maps_to_make$Name[i]
var_title <- maps_to_make$Title[i]
var_min <- maps_to_make$base[i]
var_max <- maps_to_make$top[i]
if (!var_name %in% names(agg)) {
message("Variable ", var_name, " not found in data. Skipping.")
next
}
Next we interpolate the station values onto the raster grid and store the variable name explicitly; the contours are drawn later. The function ‘interpIDW,’ from the terra package, performs moving-window inverse distance weighted interpolation. ‘template_rast’ is the target raster grid, a SpatRaster. ‘points_vect’ holds the data points to be rasterized. The search radius for data to be included is 300,000 m, or 300 km. The inverse-distance power of 2 is a good compromise value for contouring.
# Interpolate using Inverse Distance Weighting
interpolated <- interpIDW(template_rast, points_vect, var_name,
radius = 300000, power = 2)
names(interpolated) <- var_name
# Define color breaks
breaks <- seq(var_min, var_max, length.out = 11)
Make the maps
We use ggplot2 to build the map and store the digital map in ‘p.’
# Build plot
p <- ggplot() +
Below, the tidyterra ‘geom_spatraster’ function creates a raster map layer from the ‘interpolated’ SpatRaster object created above. Each cell of the map is colored according to the interpolated value of ‘var_name’ at the grid location. ‘alpha’ makes the layer slightly transparent so underlying features and layers can be seen.
geom_spatraster(data = interpolated, aes(fill = .data[[var_name]]), alpha = 0.8) +
Here we add the country outlines from ‘rnaturalearth’.
geom_sf(data = world_rob, fill = "gray50", color = "gray90", linewidth = 0.5) +
Below we add contours using the tidyterra function ‘geom_spatraster_contour.’ The data is ‘interpolated’ and the variable is ‘var_name.’ The ‘breaks’ are the contour levels used.
geom_spatraster_contour(data = interpolated,
aes(z = .data[[var_name]]),
breaks = breaks,
color = "black",
linewidth = 0.3) +
Here we add the station locations and color them using the same color scale as the raster. The points use the same ‘plasma’ palette and limits (via ‘scale_color_viridis_c’ below), so their colors match the raster fill.
geom_sf(data = points_rob,
aes(color = .data[[var_name]]),
size = 1.8,
show.legend = "point") +
Here we add the ITCZ line and outline it. It is plotted once as a thick white line and then again as a thinner black line to get the outline effect.
geom_sf(data = itcz_line, color = "white", linewidth = 1.5, linetype = "solid") +
geom_sf(data = itcz_line, color = "black", linewidth = 0.8, linetype = "solid") + # outline for visibility
Below we add the latitude/longitude lines, the ‘graticule.’
geom_sf(data = graticule, color = "gray60", linewidth = 0.2, alpha = 0.6) +
Below we add a colorblind-friendly color scale and legend (the ‘plasma’ option from the ‘viridis’ family).
scale_fill_viridis_c(option = "plasma",
name = var_title,
limits = c(var_min, var_max),
guide = guide_colorbar(title.position = "top", title.hjust = 0.5)) +
scale_color_viridis_c(option = "plasma",
name = var_title,
limits = c(var_min, var_max),
guide = "none") +
The ‘coord_sf’ ggplot2 function ensures that all objects plotted conform to the Robinson projection, that is what ‘crs=rob_crs’ means. Anything not already converted to rob_crs is now projected properly.
coord_sf(crs = rob_crs, expand = FALSE) +
Below ‘labs’ places and plots the title and subtitle for the whole map
labs(title = paste(var_title, "–", month.name[month_idx], "1990–2025"),
subtitle = paste("Mean ITCZ Latitude:", round(itcz_this_month, 1), "°N")) +
The ‘theme_void’ ensures that no automatic non-data elements are added to the plot so the programmer has total control.
theme_void(base_size = 13) +
The ggplot2 ‘theme’ function provides a way to add non-data elements to the map explicitly.
theme(
plot.background = element_rect(fill = "lightblue", color = NA),
panel.background = element_rect(fill = "lightblue", color = NA),
plot.title = element_text(size=16, face="bold", hjust = 0.5, color = "navy"),
plot.subtitle = element_text(size = 13, hjust = 0.5, color = "navy"),
legend.position = "bottom",
legend.title = element_text(size = 12, color = "navy"),
legend.text = element_text(size = 10, color = "navy"),
legend.key.width = unit(1.5, "cm")
)
If uncommented, the print command below displays the map on the screen.
# print(p)
Below, the map is saved to disk as a 300-dpi PNG file.
# Save
out_file <- file.path(output_dir,
paste0(month.name[month_idx], "_", var_name, ".png"))
ggsave(out_file, plot = p, width = 12, height = 7, dpi = 300, bg = "white")
Summary
Map aesthetics are a matter of taste and practicality. I tried to make the maps as attractive as possible while keeping them interpretable for the colorblind. The points are not contoured over land so the country outlines can be seen; another option is to make the overlying layers more transparent.
When the full set of maps is viewed, it is easy to see how the peak height and lowest pressure at the molar density intersection move with the ITCZ average latitude, as described in (May, 2025). The full set of programs to make all the maps and plots can be downloaded here (May, 2025b).
The programs written just for this series can be downloaded here. The full suite of IGRA2 R programs can be downloaded here.
Works Cited
May, A. (2025). The Molar Density Tropopause Proxy and its relation to the ITCZ and Hadley Circulation. OSF. https://doi.org/10.17605/OSF.IO/KBP9S
May, A. (2025b, November 28). Supplementary Materials: The Molar Density Tropopause Proxy and Its Relation to the ITCZ and Hadley Circulation. https://doi.org/10.5281/zenodo.17752293