By Andy May
This is an introductory post to a series on using R to read IGRA2 radiosonde data, process it, and produce both plots and maps of the data. I started using R over 10 years ago mainly because it was a free and very powerful language for statistical analysis (download the current 64-bit Windows version here). At the time, it was a clunky programming language and difficult to use, but that has recently changed. While working on new R programs to analyze the radiosonde data I saw the many substantial improvements to the language added since around 2020. It is now a very impressive language and much easier to use and to read. Before we get into the radiosonde analysis, I’d like to cover the recent improvements in the language. Future posts in this series will provide more details about the R language and my analysis of IGRA2.
Reading and writing data
I used the original “base” R “readLines” function to read the IGRA2 files because each record had to be read as a text string and later parsed into its component values. For this purpose, readLines is ideal and efficient. Once the records were parsed and prepared for processing in R, I used “fwrite” from the data.table R package to write the resulting data frames or tibbles to disk, because it is much faster than the base R write functions. A data frame is R’s standard table structure: a list of equal-length columns, each of which can hold a different data type. A tibble is the more modern tidyverse version of a data frame. In both, observations are rows and variables or measurements are columns, and different columns can have different types, for example character, double (floating point), or integer.
The function fwrite was written by Otto Seiskari and Matt Dowle and first released in 2016. It became fully parallel by 2020 and is often 10 to 100 times faster than alternatives such as base R’s write.csv, depending on the data and the hardware. For reading comma-delimited (CSV) files, the companion fread function is also very fast and efficient.
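As a minimal sketch of this read, parse, and write workflow (not the actual IGRA2 code), the example below reads fixed-width text records with readLines, slices them with substr, and round-trips the result through fwrite and fread. The three-column record layout and the column names are hypothetical and are not the real IGRA2 format.

```r
library(data.table)

# Hypothetical fixed-width records (NOT the real IGRA2 layout):
# columns 1-5 = station, 7-10 = year, 12-17 = pressure in Pa
tmp_txt <- tempfile(fileext = ".txt")
writeLines(c("STN01 2020 101325",
             "STN01 2021  85000",
             "STN02 2020  50000"), tmp_txt)

# readLines returns one character string per record
raw <- readLines(tmp_txt)

# Parse each record into typed columns by character position
records <- data.frame(
  station  = substr(raw, 1, 5),
  year     = as.integer(substr(raw, 7, 10)),
  pressure = as.integer(substr(raw, 12, 17)),  # leading blanks are ignored
  stringsAsFactors = FALSE
)

# Write with fwrite and read back with fread (both from data.table)
tmp_csv <- tempfile(fileext = ".csv")
fwrite(records, tmp_csv)
back <- fread(tmp_csv)  # returns a data.table, which is also a data frame
```

Parsing by position keeps the slow, string-heavy step in one place; after that, the data live in a typed table that fwrite and fread can move to and from disk quickly.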
dplyr: data manipulation
The very useful dplyr R package, written by Hadley Wickham and collaborators, was first released in 2014 and reached version 1.0.0 in mid-2020. The operators and functions in this package are widely used in my IGRA2 processing. The %>% operator (called a ‘pipe’) is especially useful because it makes R code so much more readable and intuitive (see below). Dplyr is used to organize data into subsets (the filter, select, arrange, and group_by functions), add variables to an existing data frame or tibble (the mutate function), or compute values that summarize a group (the summarise function). Other useful column functions are across, rename, relocate, and pull. Row (observation) manipulation functions include rowwise, slice, and distinct.
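To make these verbs concrete, here is a small sketch that chains filter, mutate, group_by, summarise, and arrange on a made-up table of sounding levels. The tibble, its column names, and its values are invented for illustration only.

```r
library(dplyr)

# Synthetic sounding levels (illustrative values only)
soundings <- tibble(
  station      = c("A", "A", "B", "B", "B"),
  pressure_hpa = c(1000, 500, 1000, 500, 250),
  temp_c       = c(15, -20, 25, -10, -50)
)

summary_tbl <- soundings %>%
  filter(pressure_hpa >= 500) %>%          # keep rows at or below the 500 hPa level
  mutate(temp_k = temp_c + 273.15) %>%     # add a new column
  group_by(station) %>%                    # one group per station
  summarise(n_levels    = n(),             # one summary row per group
            mean_temp_k = mean(temp_k),
            .groups = "drop") %>%
  arrange(station)
```

Each verb takes a table and returns a table, which is why the steps chain so naturally.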
Pipes (%>%) for making code more readable
Pipes are supplied by the magrittr package. Dplyr re-exports the %>% operator, so it is available whenever dplyr is loaded. In summary, the value on the left-hand side (lhs) of the pipe is passed as the first argument of the function on the right-hand side (rhs), that is:
lhs %>% rhs is the same as rhs(lhs).
Simple Example
Without a pipe, the code must be nested and read from the inside out:
round(mean(sqrt(c(1, 4, 9, 16)), na.rm = TRUE), 2)
# → 2.5
With a pipe we can write this chained and readable equivalent code:
c(1, 4, 9, 16) %>%
sqrt() %>%
mean(na.rm = TRUE) %>%
round(2)
# → 2.5
Use the dot ‘.’ placeholder when you need the piped value in a non-first position:
df %>% lm(y ~ x, data = .) # ‘.’ represents the lhs (df)
tidyverse: integrating data manipulation and plotting
Hadley Wickham and his collaborators, including Romain François, Winston Chang, Garrett Grolemund, Lionel Henry, and others, developed and integrated the components of the tidyverse, a suite of consistent packages for modern R workflows. The key components of tidyverse are ggplot2, tidyr, dplyr, readr, purrr, and tibble. Tidyverse was a mature set of programming tools by 2019, when the paper “Welcome to the Tidyverse” was published (Wickham et al., 2019).
Using concepts that Hadley Wickham described in his paper “Tidy Data,” tidyverse is designed to help the user clean up messy datasets like IGRA2 (Wickham, 2014). As Wickham notes, it is often said that 80% of data analysis is spent on cleaning and preparing the data. Much of that time goes to reshaping the output from one tool or device so it can be input to another. Tidyverse provides a wide range of tools and display functions that all work on basic R data frames and tidyverse tibbles. Further, tibbles are easily converted into data frames.
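A short sketch of what “tidying” means in practice: the wide table below stores one temperature column per pressure level, and tidyr’s pivot_longer reshapes it so that each row is a single observation. The table and its column names are invented for illustration. The last line converts the tibble to a plain data frame.

```r
library(tidyr)
library(tibble)

# A "messy" wide table: one temperature column per pressure level
wide <- tibble(
  station = c("A", "B"),
  t_1000  = c(15, 25),
  t_500   = c(-20, -10)
)

# Tidy form: one row per station and pressure level
long <- pivot_longer(
  wide,
  cols         = starts_with("t_"),
  names_to     = "pressure_hpa",
  names_prefix = "t_",          # strip the prefix, leaving "1000" and "500"
  values_to    = "temp_c"
)
long$pressure_hpa <- as.integer(long$pressure_hpa)

# A tibble converts to a base R data frame with as.data.frame
df <- as.data.frame(long)
```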
ggplot2: the plotting workhorse
Ggplot2 is a very powerful part of tidyverse that allows useful and attractive graphs and maps to be made. All the graphs in my most recent paper, except one, were made using ggplot2 (May, 2025).
Hadley Wickham created the initial version of ggplot2 during his PhD studies at Iowa State University, inspired by Leland Wilkinson’s 1999 book The Grammar of Graphics. It provided a more structured alternative to R’s base graphics. Ggplot2 first appeared on CRAN in 2007 and reached version 1.0.0 in 2014, but a full and modern mapping capability had to wait for terra, which Robert Hijmans first released on CRAN in 2020 and which matured over the following few years. Terra did not integrate very well with ggplot2 and tidyverse at first, and this required a major update to ggplot2, which was completed in September of 2025.
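The layered “grammar” is easiest to see in a small example. The sketch below builds a plot from a data frame, an aesthetic mapping, two geometry layers, a scale, and labels. The temperature profile is synthetic and the column names are invented for illustration.

```r
library(ggplot2)

# Synthetic temperature profile (illustrative values only)
profile <- data.frame(
  pressure_hpa = c(1000, 850, 500, 250, 100),
  temp_c       = c(15, 8, -20, -52, -60)
)

p <- ggplot(profile, aes(x = temp_c, y = pressure_hpa)) +
  geom_path() +            # connect the levels in row order
  geom_point() +           # mark each level
  scale_y_reverse() +      # pressure decreases upward
  labs(x = "Temperature (°C)",
       y = "Pressure (hPa)",
       title = "Synthetic temperature profile")

# print(p) draws the plot; ggsave("profile.png", p) would write it to disk
```

Because each `+` adds one component, a plot can be built up and modified piece by piece.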
terra: mapping
Terra was developed to replace the raster and sp packages (first released around 2010 and 2005 respectively), which were the cornerstones of mapping in R before terra matured around 2023. Robert J. Hijmans (University of California, Davis) created terra to address problems with the earlier R mapping system. Terra has a simpler interface, faster performance, expanded capabilities (e.g., better vector integration), and streamlined data classes.
Terra provides efficient methods for geometric operations, local/focal/zonal/global computations, spatial predictions (e.g., via interpolation or machine learning models), and processing of very large files. Key data classes are SpatRaster (replacing multiple raster classes) and SpatVector, which provides robust vector handling. Terra is written in C++ for speed.
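A minimal sketch of the two key classes, using a tiny made-up grid: it builds a SpatRaster, computes a global statistic, builds a SpatVector of points, and extracts the raster value under each point. The grid extent, the layer name, and the point coordinates are invented for illustration.

```r
library(terra)

# A 2 x 3 raster covering x = 0..3, y = 0..2
r <- rast(nrows = 2, ncols = 3,
          xmin = 0, xmax = 3, ymin = 0, ymax = 2,
          crs = "EPSG:4326")
values(r) <- 1:6          # cells fill row by row from the top-left corner
names(r)  <- "temp_c"

# Global computation: one summary value for the whole layer
g <- global(r, "mean")    # data frame with a column named "mean"

# A SpatVector of two points built from a data frame of coordinates
pts <- vect(data.frame(lon = c(0.5, 2.5), lat = c(1.5, 0.5)),
            geom = c("lon", "lat"), crs = "EPSG:4326")

# Raster values at the point locations (data frame: ID, temp_c)
vals <- extract(r, pts)
```

The first point falls in the top-left cell and the second in the bottom-right cell, so extract returns the first and last cell values.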
tidyterra: integrating data manipulation and mapping
Tidyterra is a package developed by Diego Hernangómez that successfully allows tidyverse and ggplot2 to work with the spatial mapping package terra. Finally, ggplot2 display functions and the tidyverse data manipulation functions can work seamlessly with the terra mapping functions (Hernangómez, 2023).
Tidyterra extends the functionality of the ggplot2 package by providing additional functions specific to mapping, like geom_spatraster and geom_spatvector, as well as other functionality specifically designed for map production. SpatVector objects are points, lines, and polygons, and SpatRaster objects consist of equal-sized rectangles (cells) that contain one or more values (Hernangómez, 2023).
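As a hedged sketch of how the pieces fit together, the example below applies a dplyr-style mutate to a SpatRaster and then draws the raster and a point layer with geom_spatraster and geom_spatvector. The grid, the layer names, and the station coordinates are made up for illustration.

```r
library(terra)
library(dplyr)
library(tidyterra)
library(ggplot2)

# Synthetic global grid: 10 rows x 20 columns
r <- rast(nrows = 10, ncols = 20,
          xmin = -180, xmax = 180, ymin = -90, ymax = 90,
          crs = "EPSG:4326")
values(r) <- seq_len(ncell(r))
names(r)  <- "value"

# tidyterra supplies a mutate method for SpatRaster: add a rescaled layer
r2 <- mutate(r, scaled = value / max(value))

# Two hypothetical station locations as a SpatVector
stations <- vect(data.frame(lon = c(-100, 20), lat = c(40, -10)),
                 geom = c("lon", "lat"), crs = "EPSG:4326")

# Raster and vector layers combined in one ggplot2 map
p <- ggplot() +
  geom_spatraster(data = r2, aes(fill = scaled)) +
  geom_spatvector(data = stations, colour = "red") +
  scale_fill_viridis_c()

# print(p) draws the map
```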
Summary
Thus, with the completion of all this work between 2023 and 2025, R became a very robust data processing and display programming language. In the next few posts, I will provide some critical details about how I used this powerful new version of R to make the data displays in May (2025) and in the paper’s supplementary materials (May, 2025b).
Works Cited
Hernangómez, D. (2023). Using the tidyverse with terra objects: the tidyterra package. Journal of Open Source Software, 8(91). https://doi.org/10.21105/joss.05751
May, A. (2025). The Molar Density Tropopause Proxy and its relation to the ITCZ and Hadley Circulation. OSF. https://doi.org/10.17605/OSF.IO/KBP9S
May, A. (2025b, November 28). Supplementary Materials: The Molar Density Tropopause Proxy and Its Relation to the ITCZ and Hadley Circulation. https://doi.org/10.5281/zenodo.17752293
Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10). https://doi.org/10.18637/jss.v059.i10
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., et al. (2019). Welcome to the Tidyverse. Journal of Open Source Software, 4(43). https://doi.org/10.21105/joss.01686