By Andy May
Our last post covered reading and plotting IGRA2 weather balloon radiosonde data. This post shows how to map the data. The IGRA2 data can be downloaded from the NOAA ftp site (ftp.ncei.noaa.gov/pub/data/igra). Once the data are prepared, some further tricky programming is needed to map them. This post introduces the key concepts and code structure required; writing complete programs will require understanding these concepts and working with Grok.
Computing the mappable values
As mentioned in the first post, the volume of IGRA2 data is so large that several steps are required to prepare the data for analysis, mapping, and plotting. The initial R program that reads the IGRA2 data was described in the last post; here we show how to map its output. An example map is shown in figure 1.

First, since the goal is to map the IGRA2 data, we need to group the data by radiosonde ascent, then summarize the detailed data by station. ‘dec_month’ is a computed decimal-month value that combines year, month, day, and hour; it is unique for each radiosonde ascent, and the loop variable ‘ckdate’ holds the current value. The following code uses data.table package syntax to create a data.table called ‘ascent.’ The code is from the R program ‘station_analysis_no_plots.r’ (May, 2025b). It selects the rows of ‘station’ that match the current ‘ckdate’ and copies only the listed columns (id, lat, …) into ‘ascent’; the rest of the columns in ‘station’ are not copied. The period ‘.’ before the list of columns is data.table shorthand for ‘list.’
# Read file as data.table
station <- readRDS(fname)
# Calculate decimal month
station[, dec_month := (year - 1990) * 12 + (month - 1) + day / 30.4375 + hour / 730.5]
# One row per unique ascent time (dec_month)
ascents <- unique(station[, .(dec_month)])
# Process each ascent
for (ckdate in ascents$dec_month) {
# Extract ascent data
ascent <- station[dec_month == ckdate, .(id, lat, long, month, year, hPa, temp, mol_den, gph, rh, wdir, wspd, q_gkg)]
Next, the key variables are interpolated to fill in missing values, if necessary. After interpolation, the next bit of code computes the molar density intersection and the values of various variables at it, in the same way as described in the previous post. The last preparation step groups the results into files by month.
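The interpolation step itself is not reproduced here. As a minimal sketch (not the author's code, and with illustrative variable names), base R's ‘approx’ function can fill gaps in a profile variable against pressure level, assuming simple linear interpolation is acceptable:

```r
# Sketch: linear interpolation of missing values in a sounding profile
# using base R's approx(). Column names are illustrative only.
fill_gaps <- function(x, y) {
  ok <- !is.na(y)
  if (sum(ok) < 2) return(y)            # not enough data to interpolate
  filled <- approx(x[ok], y[ok], xout = x, rule = 1)$y
  ifelse(is.na(y), filled, y)           # keep original values where present
}

hPa  <- c(1000, 925, 850, 700, 500)     # pressure levels
temp <- c(25.0, NA, 15.2, NA, -5.6)     # temperature with gaps
temp_filled <- fill_gaps(hPa, temp)     # the NAs at 925 and 700 hPa are filled
```

The `rule = 1` argument leaves values outside the observed range as NA rather than extrapolating.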
In order to make the maps we need to set up some key map variables first.
The Robinson map projection (‘proj=robin’) is attractive and reasonable to use when making global maps. The code below stores the Robinson projection parameters in a character string that we will need often; this way we only need to type them all once. The next three parameters specify that the Greenwich/Prime Meridian is zero longitude and there is no false easting or northing. The reference ellipsoid is WGS84, the modern standard. The ‘no_defs’ flag says don’t add any defaults to the projection; leave it as we have defined it.
# Robinson projection (defined once)
rob_crs <- "+proj=robin +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
We need country outlines to orient the viewer. The country outlines in ‘ne_countries’ come from the ‘rnaturalearth’ R package. We are loading the medium resolution version, which is suitable for most publications. We have asked that the returned data (stored in ‘world’) be an ‘sf’ or simple feature object. The R function ‘st_transform’ changes the ‘world’ geometry to conform to the ‘rob_crs’ (Robinson and WGS84) projection as defined above.
# Load world map and transform once
world <- ne_countries(scale = "medium", returnclass = "sf")
world_rob <- st_transform(world, rob_crs)
We need a latitude and longitude grid on the map. This R code segment creates a ‘graticule’ or latitude/longitude grid for a Robinson projection. The grid is from 60°S to 60°N by 15° and from the International date line (-180 and +180) all the way around the world in 30° steps. ‘st_crs(4326)’ refers to the WGS 84 coordinate reference system (CRS). After creation it is transformed to rob_crs (Robinson and WGS84) by st_transform.
It is interesting that the code for WGS 84 latitude/longitude coordinates is EPSG:4326, where EPSG stands for the European Petroleum Survey Group, which maintains the registry of coordinate reference systems. Beginning in the oil and gas industry, the EPSG registry is now the worldwide mapping standard. Google Maps uses a variant called EPSG:3857, officially named WGS 84/Pseudo-Mercator, for Web Mercator. Web Mercator is a simplified projection that assumes the Earth is a perfect sphere.
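To illustrate that spherical assumption, here is a minimal base-R sketch (not part of the author's programs) of the spherical Web Mercator forward equations, using the WGS 84 semi-major axis as the sphere radius:

```r
# Spherical Web Mercator (EPSG:3857) forward projection, base R.
R_earth <- 6378137  # WGS 84 semi-major axis, meters

web_mercator <- function(lon_deg, lat_deg) {
  lon <- lon_deg * pi / 180
  lat <- lat_deg * pi / 180
  list(x = R_earth * lon,
       y = R_earth * log(tan(pi / 4 + lat / 2)))
}

p <- web_mercator(0, 0)      # the origin maps to (0, 0)
q <- web_mercator(180, 0)    # x at the date line is about 20,037,508 m
```

The x-extent of the whole map, ±20,037,508 m, is simply the sphere radius times ±π, which is why Web Mercator coordinates are so large.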
# Graticule (lat: -60 to 60, lon: every 30°)
graticule <- st_graticule(lat = seq(-60, 60, by = 15),
lon = seq(-180, 180, by = 30),
crs = st_crs(4326)) %>%
st_transform(rob_crs)
The location of the ITCZ changes constantly, so we need to display a reasonable average monthly latitude for the ITCZ to see how the climate parameters change with it. Reasonable average latitudes for the ITCZ were determined using the method described in (May, 2025) and will be discussed in a future post. The mean latitudes are read from ‘ITCZ_mean_latitudes.csv.’ The map color scales and other particulars for each map are read from ‘map_color_scale_top_base.csv.’
# Load ITCZ latitudes and legend scales
itcz_lat <- fread(file.path(root_dir, "ITCZ_mean_latitudes.csv"))
legend_limits <- fread(file.path(root_dir, "variable_lists_details/map_color_scale_top_base.csv"))
The maps to make are specified in ‘map_indices’ using the row index in ‘legend_limits.’
# Define which variables to map (by row index in legend_limits)
# Update this list if you add/remove variables
map_indices <- c(2, 3, 6, 7, 8, 15, 16, 17, 25, 26, 27, 35, 36, 37, 41, 42, 43, 44)
maps_to_make <- legend_limits[map_indices]
if (nrow(maps_to_make) != length(map_indices)) {
stop("Some selected map indices are out of range in legend_limits.")
}
The code below uses the terra package ‘vect’ function to convert the global map of countries to a ‘SpatVector’ object; ‘ext’ then extracts the extent (boundaries) of the object and stores it in ‘ext_rob.’ The vect function is the primary constructor and reader function for SpatVector objects.
The terra function ‘rast’ sets the bounding box of the raster dataset to be created to that of the country outlines, so they match. The raster cell size is set to 100,000 m, that is, 100 km x 100 km cells. Finally, the rob_crs (Robinson/WGS84) projection is assigned to keep everything on the same projection.
# Pre-compute template raster for interpolation (100 km resolution)
ext_rob <- ext(vect(world_rob))
template_rast <- rast(ext_rob, resolution = 100000, crs = rob_crs)
Next we loop through the months (‘month_idx’) to make the maps.
# Main loop over months
for (month_idx in 1:12) {
Read data for the month
dt <- readRDS(file_name)
Aggregate all the data for a station by taking the means of the columns in the data frame ‘dt.’ First ‘dt’ is grouped ‘by’ the station id. The data.table special variable ‘.SD’ (Subset of Data) contains the columns of the current ‘by’ group, excluding the grouping variable ‘station_id.’
The base R function ‘lapply’ is used to take the mean of every variable in ‘.SD’ (ignoring missing values, because na.rm = TRUE).
This sort of statement combines ‘lapply,’ ‘.SD,’ and ‘by’ and is one of the most powerful features of data.table. It works with any number of variables and you do not have to list them. The means in ‘agg’ have the same names as the raw data in ‘dt,’ but only one mean value per station.
# Aggregate by station: mean of all computed variables
agg <- dt[, lapply(.SD, mean, na.rm = TRUE), by = station_id]
Here are the column names after the above statement is executed:
> colnames(agg)
[1] "station_id" "d_month" "latitude" "longitude"
[5] "Month" "Year" "int_hPa" "int_mol_den"
[9] "int_gph" "int_wspd" "int_wdir" "int_temp"
[13] "int_rh" "int_q_gkg" "int_spDen" "int_M_flux"
[17] "spd_lower" "dir_lower" "spDen_lower" "M_flux_lower"
[21] "NS_flux_lower" "mean_u_lower" "mean_v_lower" "temp_lower"
[25] "rh_lower" "q_gkg_lower" "spd_middle" "dir_middle"
[29] "spDen_middle" "M_flux_middle" "NS_flux_middle" "mean_u_middle"
[33] "mean_v_middle" "temp_middle" "rh_middle" "q_gkg_middle"
[37] "spd_upper" "dir_upper" "spDen_upper" "M_flux_upper"
[41] "NS_flux_upper" "mean_u_upper" "mean_v_upper" "temp_upper"
[45] "rh_upper" "q_gkg_upper" "mean_M_flux" "mean_NS_flux"
[49] "mean_u_i" "mean_v_i"
We do not want all the variables in ‘agg.’ The following statement keeps the variable names that begin with ‘latitude,’ ‘longitude,’ ‘d_month,’ ‘int_,’ or ‘mean_.’ The ‘grepl’ R function returns TRUE or FALSE for each element of a character vector, depending upon whether the pattern matches that element. This excludes ‘station_id,’ ‘Month,’ ‘Year,’ and some other variables. The second statement then subsets ‘agg’ to just those columns, effectively deleting the variables we don’t want.
# Keep only relevant columns: location + all int_* and mean_* variables
cols_to_keep <- names(agg)[grepl("^(latitude|longitude|d_month|int_|mean_)", names(agg))]
agg <- agg[, ..cols_to_keep]
Here are the column names after the two statements above:
> colnames(agg)
[1] "d_month" "latitude" "longitude" "int_hPa"
[5] "int_mol_den" "int_gph" "int_wspd" "int_wdir"
[9] "int_temp" "int_rh" "int_q_gkg" "int_spDen"
[13] "int_M_flux" "mean_u_lower" "mean_v_lower" "mean_u_middle"
[17] "mean_v_middle" "mean_u_upper" "mean_v_upper" "mean_M_flux"
[21] "mean_NS_flux" "mean_u_i" "mean_v_i"
>
The ‘mean_u_*’ and ‘mean_v_*’ columns are the average east/west (positive means blowing toward the east) and north/south (positive means blowing toward the north) wind components, respectively. The following code builds the column names for the upper, middle, and lower troposphere and computes the vector-mean wind speed and direction for each level.
# Vector-averaged wind speed and direction (upper, middle, lower troposphere)
wind_levels <- c("upper", "middle", "lower")
for (lvl in wind_levels) {
u_col <- paste0("mean_u_", lvl)
v_col <- paste0("mean_v_", lvl)
Create the vector-mean wind speed for each level and store it in ‘agg,’ the aggregate set of map values. This statement computes the vector-average wind speeds spd_upper, spd_middle, and spd_lower.
agg[[paste0("spd_", lvl)]] <- sqrt(agg[[u_col]]^2 + agg[[v_col]]^2)
The statement below computes the “from” wind direction averages dir_upper, dir_middle, and dir_lower. ‘u_col’ and ‘v_col’ are the east/west and north/south wind components in the TO convention (positive = toward east or north). This code converts them to the meteorological convention (the direction the wind blows FROM).
# Direction FROM (meteorological: 0° = from north, 90° = from east)
agg[[paste0("dir_", lvl)]] <- (atan2(agg[[u_col]], agg[[v_col]]) * 180 / pi + 180) %% 360
}
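As a quick sanity check of the direction convention (illustrative only, not part of the author's program): a wind blowing toward the north (u = 0, v = 1) is a southerly, so it comes FROM 180°, and a wind blowing toward the east comes FROM 270°:

```r
# Meteorological "from" direction of a (u, v) wind vector, in degrees.
# u: eastward component, v: northward component (positive = blowing toward).
dir_from <- function(u, v) (atan2(u, v) * 180 / pi + 180) %% 360

a <- dir_from(0, 1)    # toward north -> from 180 (southerly)
b <- dir_from(1, 0)    # toward east  -> from 270 (westerly)
d <- dir_from(0, -1)   # toward south -> from 0   (northerly)
```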
Here are column names in agg after adding the direction and speed variables from the statements above:
> colnames(agg)
[1] "d_month" "latitude" "longitude" "int_hPa"
[5] "int_mol_den" "int_gph" "int_wspd" "int_wdir"
[9] "int_temp" "int_rh" "int_q_gkg" "int_spDen"
[13] "int_M_flux" "mean_u_lower" "mean_v_lower" "mean_u_middle"
[17] "mean_v_middle" "mean_u_upper" "mean_v_upper" "mean_M_flux"
[21] "mean_NS_flux" "mean_u_i" "mean_v_i" "spd_upper"
[25] "dir_upper" "spd_middle" "dir_middle" "spd_lower"
[29] "dir_lower"
Convert the latitude and longitude in ‘agg’ to Robinson map coordinates:
# Convert to sf (WGS84 → Robinson)
points_sf <- st_as_sf(agg, coords = c("longitude", "latitude"), crs = 4326)
points_rob <- st_transform(points_sf, rob_crs)
points_vect <- vect(points_rob)
Create a line across the map at the average ITCZ (Intertropical Convergence Zone) latitude for the current month and convert the line to Robinson coordinates. First a data.frame of 361 points around the world is created at the mean ITCZ latitude for this month. The data.frame is then converted into an sf object of 361 points with ‘st_as_sf’ in WGS 84/EPSG:4326 geographic coordinates. Next ‘summarise’ combines the 361 points into one feature with multiple points (a MULTIPOINT) using ‘st_combine.’ Finally, ‘st_cast’ converts the combined points into a line (a LINESTRING).
# Create ITCZ line for this month
itcz_this_month <- itcz_lat$ITCZ_lat[month_idx]
itcz_line <- data.frame(
lon = seq(-180, 180, by = 1),
lat = rep(itcz_this_month, 361)
) %>%
st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
summarise(geometry = st_combine(geometry)) %>%
st_cast("LINESTRING") %>%
st_transform(rob_crs)
Loop through selected variables and make the maps. For the current map store the variable name, map title, and contour scale.
# Loop over variables to map
for (i in seq_len(nrow(maps_to_make))) {
var_name <- maps_to_make$Name[i]
var_title <- maps_to_make$Title[i]
var_min <- maps_to_make$base[i]
var_max <- maps_to_make$top[i]
if (!var_name %in% names(agg)) {
message("Variable ", var_name, " not found in data. Skipping.")
next
}
Next we interpolate the station values onto the raster grid and store the variable name explicitly; the contours are drawn later. The function ‘interpIDW,’ from the terra package, performs moving-window inverse distance weighted interpolation. ‘template_rast’ is the target raster grid, a SpatRaster. ‘points_vect’ holds the data points to be rasterized. The search radius for data to be included is 300,000 m, or 300 km. The inverse-distance power of 2 is a good compromise value for contouring.
# Interpolate using Inverse Distance Weighting
interpolated <- interpIDW(template_rast, points_vect, var_name,
radius = 300000, power = 2)
names(interpolated) <- var_name
# Define color breaks
breaks <- seq(var_min, var_max, length.out = 11)
Make the maps
We use ggplot2 to build the map and store the digital map in ‘p.’
# Build plot
p <- ggplot() +
Below, the tidyterra ‘geom_spatraster’ function creates a raster map layer from the ‘interpolated’ SpatRaster object created above. Each cell of the map is colored according to the interpolated value of ‘var_name’ at the grid location. ‘alpha’ makes the layer slightly transparent so underlying features and layers can be seen.
geom_spatraster(data = interpolated, aes(fill = .data[[var_name]]), alpha = 0.8) +
Here we add the country outlines from ‘rnaturalearth’.
geom_sf(data = world_rob, fill = "gray50", color = "gray90", linewidth = 0.5) +
Below we add contours using the tidyterra function ‘geom_spatraster_contour.’ The data is ‘interpolated’ and the variable is ‘var_name.’ The ‘breaks’ are the contour levels used.
geom_spatraster_contour(data = interpolated,
aes(z = .data[[var_name]]),
breaks = breaks,
color = "black",
linewidth = 0.3) +
Here we add the station locations and color them using the same color scale as the raster. The points use the same ‘plasma’ palette and limits (via ‘scale_color_viridis_c’ below), so their colors match the raster fill.
geom_sf(data = points_rob,
aes(color = .data[[var_name]]),
size = 1.8,
show.legend = "point") +
Here we add the ITCZ line and outline it. It is plotted once as a thick white line and then again as a thinner black line to get the outline effect.
geom_sf(data = itcz_line, color = "white", linewidth = 1.5, linetype = "solid") +
geom_sf(data = itcz_line, color = "black", linewidth = 0.8, linetype = "solid") + # outline for visibility
Below we add the latitude/longitude lines, the ‘graticule.’
geom_sf(data = graticule, color = "gray60", linewidth = 0.2, alpha = 0.6) +
Below we add a colorblind-friendly color scale and legend (the ‘plasma’ option from the ‘viridis’ family).
scale_fill_viridis_c(option = "plasma",
name = var_title,
limits = c(var_min, var_max),
guide = guide_colorbar(title.position = "top", title.hjust = 0.5)) +
scale_color_viridis_c(option = "plasma",
name = var_title,
limits = c(var_min, var_max),
guide = "none") +
The ‘coord_sf’ ggplot2 function ensures that all objects plotted conform to the Robinson projection, that is what ‘crs=rob_crs’ means. Anything not already converted to rob_crs is now projected properly.
coord_sf(crs = rob_crs, expand = FALSE) +
Below ‘labs’ places and plots the title and subtitle for the whole map
labs(title = paste(var_title, "–", month.name[month_idx], "1990–2025"),
subtitle = paste("Mean ITCZ Latitude:", round(itcz_this_month, 1), "°N")) +
The ‘theme_void’ ensures that no automatic non-data elements are added to the plot so the programmer has total control.
theme_void(base_size = 13) +
The ggplot2 ‘theme’ function provides a way to add non-data elements to the map explicitly.
theme(
plot.background = element_rect(fill = "lightblue", color = NA),
panel.background = element_rect(fill = "lightblue", color = NA),
plot.title = element_text(size=16, face="bold", hjust = 0.5, color = "navy"),
plot.subtitle = element_text(size = 13, hjust = 0.5, color = "navy"),
legend.position = "bottom",
legend.title = element_text(size = 12, color = "navy"),
legend.text = element_text(size = 10, color = "navy"),
legend.key.width = unit(1.5, "cm")
)
If uncommented, the print command below displays the map on the screen.
# print(p)
Below, the map is saved to disk as a 300-dpi PNG file.
# Save
out_file <- file.path(output_dir,
paste0(month.name[month_idx], "_", var_name, ".png"))
ggsave(out_file, plot = p, width = 12, height = 7, dpi = 300, bg = "white")
Summary
Map aesthetics are a matter of taste and practicality. I tried to make the maps as attractive as possible while keeping them interpretable for the colorblind. The points are not contoured over land so the country outlines can be seen; another option is to make the overlying layers more transparent.
When the full set of maps is viewed, it is easy to see how the peak height and lowest pressure at the molar density intersection move with the ITCZ average latitude, as described in (May, 2025). The full set of programs to make all the maps and plots can be downloaded here (May, 2025b).
The programs written just for this series can be downloaded here. The full suite of IGRA2 R programs can be downloaded here.
Works Cited
May, A. (2025). The Molar Density Tropopause Proxy and its relation to the ITCZ and Hadley Circulation. OSF. https://doi.org/10.17605/OSF.IO/KBP9S
May, A. (2025b, November 28). Supplementary Materials: The Molar Density Tropopause Proxy and Its Relation to the ITCZ and Hadley Circulation. https://doi.org/10.5281/zenodo.17752293