R Programming – Improvements in the Language

By Andy May

This is an introductory post to a series on using R to read IGRA2 radiosonde data, process it, and produce both plots and maps of the data. I started using R over 10 years ago, mainly because it was a free and very powerful language for statistical analysis (download the current 64-bit Windows version here). At the time it was a clunky programming language and difficult to use, but that has recently changed. While working on new R programs to analyze the radiosonde data, I noticed the many substantial improvements added to the language since around 2020. It is now a very impressive language, much easier to use and to read. Before we get into the radiosonde analysis, I’d like to cover these recent improvements. Future posts in this series will provide more details about the R language and my analysis of IGRA2.

Reading and writing data

I used the original “base” R “readLines” function to read the IGRA2 files because each record had to be read as a text string and later parsed into its component values. For this purpose, readLines is ideal and efficient. Once the records were parsed and prepared for processing in R, I used “fwrite” from the data.table R package, for efficiency, to write the resulting data frames or tibbles to disk. A data frame is R’s basic tabular structure, whose columns can hold different data types; a tibble is the tidyverse’s more modern version of the same idea. Both data frames and tibbles are organized as tables in which observations are rows and variables or measurements are columns, and different columns can have different data types, for example character, float, or integer.
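
As a rough illustration, the read-and-parse step looks something like the sketch below. The file name, the “#” header flag, and the character positions are placeholders for this example, not the actual IGRA2 record layout.

lines <- readLines("USM00072520-data.txt")      # one text string per record
obs   <- lines[substr(lines, 1, 1) != "#"]      # assume header records start with '#'
parsed <- data.frame(
  press = as.integer(substr(obs, 10, 15)),      # pressure field (illustrative positions)
  temp  = as.integer(substr(obs, 17, 21)) / 10  # temperature field (illustrative positions)
)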

The fwrite function was written by Otto Seiskari and Matt Dowle and first released in 2016. It became fully parallel by 2020 and is 10-100 times faster than the alternative write functions. For reading comma-delimited (CSV) files, the companion fread function is also very fast and efficient.
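
A minimal, self-contained sketch of the round trip with data.table (the toy data and file name are only for illustration):

library(data.table)
parsed <- data.frame(press = c(100000L, 50000L), temp = c(15.2, -20.1))  # toy data
fwrite(parsed, "station_parsed.csv")   # fast, multi-threaded CSV write
dt <- fread("station_parsed.csv")      # fast, multi-threaded CSV read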

dplyr: data manipulation

The very useful dplyr R package, written by Hadley Wickham, was first released in 2014, and its major 1.0.0 version arrived in 2020. The operators and functions in this package are widely used in my IGRA2 processing. The %>% operator (called a ‘pipe’) is especially useful because it makes R code much more readable and intuitive (see below). Dplyr is used to organize data into subsets (the filter, select, arrange, and group_by functions), add variables to an existing data frame or tibble (mutate), or compute values that summarize a group (summarise). Other useful column functions are across, rename, relocate, and pull. Row (observation) manipulation functions include rowwise, slice, and distinct. A minimal example on a toy tibble is sketched below.
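
This sketch chains a few of these verbs on a made-up tibble of soundings (the data and column names are invented for illustration):

library(dplyr)
soundings <- tibble::tibble(
  station = c("A", "A", "B", "B"),
  press   = c(1000, 500, 1000, 500),
  temp_c  = c(15.2, -20.1, 14.8, -21.5)
)
soundings %>%
  filter(press == 500) %>%              # keep only the 500 hPa level
  mutate(temp_k = temp_c + 273.15) %>%  # add a new column
  group_by(station) %>%                 # one group per station
  summarise(mean_k = mean(temp_k))      # one summary row per group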

Pipes (%>%) for making code more readable

Pipes are supplied by magrittr, which is loaded when dplyr is loaded. In short, the value on the left-hand side of the pipe (lhs) is placed as the first argument of the function on the right-hand side (rhs), that is:

lhs %>% rhs is the same as rhs(lhs).

Simple Example

Without a pipe, the following nested, inside-out code is needed:

round(mean(sqrt(c(1, 4, 9, 16)), na.rm = TRUE), 2) 
# → 2.5

With a pipe we can write this chained and readable equivalent code:

c(1, 4, 9, 16) %>%
  sqrt() %>%
  mean(na.rm = TRUE) %>%
  round(2)
# → 2.5

Use the dot ‘.’ placeholder when you need the piped value in a non-first position:

df %>% lm(y ~ x, data = .) # ‘.’ represents the lhs (df)

tidyverse: integrating data manipulation and plotting

Hadley Wickham and his collaborators, including Romain François, Winston Chang, Garrett Grolemund, Lionel Henry, and others, developed and integrated the components of the tidyverse, a suite of consistent packages for modern R workflows. The key components of the tidyverse are ggplot2, tidyr, dplyr, readr, purrr, and tibble. The tidyverse was a mature set of programming tools by 2019, when the paper “Welcome to the Tidyverse” was published (Wickham et al., 2019).

Using concepts developed by Hadley Wickham and described in his paper “Tidy Data” (Wickham, 2014), the tidyverse is designed to help the user clean up messy datasets like IGRA2. As Wickham says, 80% of data analysis is spent on the process of cleaning and preparing the data, and much of that preparation time is spent reformatting the output of one tool or device so it can be input to another. The tidyverse provides a wide range of tools and display functions that all work on basic R data frames and tidyverse tibbles. Further, it allows tibbles to be easily converted to and from data frames.
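
For instance, a “messy” table with one column per year can be reshaped into the tidy one-observation-per-row form with tidyr, and tibbles and data frames convert freely (the table below is invented for illustration):

library(tidyverse)
messy <- tibble(
  station = c("A", "B"),
  `2023`  = c(10.1, 11.2),
  `2024`  = c(10.4, 11.0)
)
tidy <- messy %>%
  pivot_longer(-station, names_to = "year", values_to = "temp")  # one observation per row
as.data.frame(tidy)   # a tibble converts directly to a base data frame
as_tibble(mtcars)     # and a data frame converts to a tibble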

ggplot2: the plotting workhorse

Ggplot2 is a very powerful part of the tidyverse that allows useful and attractive graphs and maps to be made. All the graphs in my most recent paper, except one, were made using ggplot2 (May, 2025).
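
A minimal example using R’s built-in pressure dataset (vapor pressure of mercury versus temperature) shows the basic grammar: map data columns to aesthetics, then add layers.

library(ggplot2)
ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_line() +
  geom_point() +
  labs(x = "Temperature (deg C)", y = "Pressure (mm Hg)",
       title = "A minimal ggplot2 example")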

Hadley Wickham created the initial version of ggplot2 during his PhD studies at Iowa State University, inspired by Leland Wilkinson’s 1999 book The Grammar of Graphics. It provided a more structured alternative to R’s base graphics. The first general version was available in 2014, but a full and modern mapping capability had to wait for the introduction of terra in 2022 by Robert Hijmans. Terra did not integrate very well with ggplot2 and the tidyverse at first, and this required a major update to ggplot2, which was completed in September of 2025.

Terra: Mapping

Terra was developed to replace the raster and sp packages (first released around 2010 and 2005, respectively), which were the cornerstones of mapping in R before terra was released and matured around 2023. Robert J. Hijmans (University of California, Davis) created terra to address problems with the earlier R mapping system: terra has a simpler interface, faster performance, expanded capabilities (e.g., better vector integration), and streamlined data classes.

Terra provides efficient methods for geometric operations, local/focal/zonal/global computations, spatial predictions (e.g., via interpolation or machine learning models), and processing of very large files. Key data classes are SpatRaster (replacing multiple raster classes) and SpatVector, which provides robust vector handling. Terra is written in C++ for speed.
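
A minimal sketch of the two data classes (the grid size and the shapefile name here are placeholders):

library(terra)
r <- rast(nrows = 18, ncols = 36)   # a small 10-degree global SpatRaster
values(r) <- runif(ncell(r))        # fill with random values
global(r, "mean")                   # a 'global' summary computation
# Points, lines, and polygons are SpatVector objects, e.g.
# v <- vect("countries.shp")        # hypothetical shapefile
# cr <- crop(r, v)                  # clip the raster to the vector's extent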

Tidyterra: integrating data manipulation and mapping

Tidyterra is a package developed by Diego Hernangómez that successfully allows tidyverse and ggplot2 to work with the spatial mapping package terra. Finally, ggplot2 display functions and the tidyverse data manipulation functions can work seamlessly with the terra mapping functions (Hernangómez, 2023).

Tidyterra extends the functionality of the ggplot2 package by providing additional functions specific to mapping, like geom_spatraster and geom_spatvector, as well as other functionality specifically designed for map production. SpatVector objects hold points, lines, and polygons, and SpatRaster objects consist of a grid of equal-sized rectangular cells, each containing one or more values (Hernangómez, 2023).
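
A minimal sketch of displaying a terra raster with ggplot2 through tidyterra (the raster here is random data created only for illustration):

library(terra)
library(tidyterra)
library(ggplot2)
r <- rast(nrows = 18, ncols = 36)
values(r) <- runif(ncell(r))
ggplot() +
  geom_spatraster(data = r) +              # tidyterra raster layer
  scale_fill_viridis_c(name = "value") +
  labs(title = "SpatRaster displayed with ggplot2 + tidyterra")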

Summary

Thus, with the completion of all this work between 2023 and 2025, R became a very robust data processing and display programming language. In the next few posts, I will provide some critical details about how I used this powerful new version of R to make the data displays in my recent paper (May, 2025) and in the paper’s supplementary materials (May, 2025b).

Works Cited

Hernangómez, D. (2023). Using the tidyverse with terra objects: the tidyterra package. Journal of Open Source Software, 8(91). https://doi.org/10.21105/joss.05751

May, A. (2025). The Molar Density Tropopause Proxy and its relation to the ITCZ and Hadley Circulation. OSF. https://doi.org/10.17605/OSF.IO/KBP9S

May, A. (2025b, November 28). Supplementary Materials: The Molar Density Tropopause Proxy and Its Relation to the ITCZ and Hadley Circulation. https://doi.org/10.5281/zenodo.17752293

Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10). https://doi.org/10.18637/jss.v059.i10

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., & François, R. (2019). Welcome to the Tidyverse. Journal of Open Source Software, 4(43). https://doi.org/10.21105/joss.01686

Comments
Willis Eschenbach (Editor)
January 11, 2026 10:12 am

Thanks, Andy. I use R exclusively for my work. A question.

You recommend fread and fwrite for saving and reading csv files and data frames to-from disk. Do they work for arrays?

I ask because I’ve never used a csv or data.frame file so large that read or write times were an issue. The big data I use are all 3-d arrays, which (near as I can tell) are not read or written by fread/fwrite.

Great article,

w.

Bob Armstrong
Reply to  Andy May
January 11, 2026 5:00 pm

Reading & writing various formats including particularly CSVs is very important . CoSy has an extensive vocabulary mainly at https://cosy.com/4thCoSy/Code/CoSy/Furniture.f .

bdgwx
Reply to  Willis Eschenbach
January 11, 2026 2:12 pm

Are you using netCDF or GRIB file formats? If so, which packages in R are you using to read these files?

January 11, 2026 10:28 am

Thanks for this, finally something I can agree with. As a hobbyist, discovering the Tidyverse packages revolutionised how I used R. I definitely need to look into Terra now.

One small quibble – I’m pretty sure that dplyr has been around since well before 2021.

January 11, 2026 11:11 am

A good write up, thanks

Eng_Ian
January 11, 2026 11:45 am

round(mean(sqrt(c(1, 4, 9, 16)), na.rm = TRUE), 2)

# → 2.9

I must be missing something here. Isn’t the mean of the sqrt of 1, 4, 9 and 16 going to be 2.5?

eg (1+2+3+4)/4

Or should it be the sqrt of (1+4+9+16)/4 = 2.7

Where did 2.9 come from?

Mr.
January 11, 2026 12:09 pm

So SuperCalc and Lotus 123 aren’t in vogue any more?

Bob Vislocky
January 11, 2026 12:13 pm

I use R all the time. Any idea how I can get Excel and R to “talk” with each other? For example, to make a prediction on new live data feeding into Excel it would be nice to call the predict function from R to make the forecast. For simple linear regression models I just copy the regression coefficients from R over to Excel to make predictions on the fly in Excel. But that’s more difficult to do with more complicated models created using GAM or LOESS. Any thoughts?

bdgwx
Reply to  Bob Vislocky
January 11, 2026 2:07 pm

I use the readxl package. Here is a quick snippet for pulling values out of Excel.

library("readxl")
sheet <- read_excel("path-to-file", sheet = "name-of-sheet")
yvalues <- sheet[["XValues"]]
yvalues <- yvalues[!is.na(yvalues)]
xvalues <- sheet[["YValues"]]
xvalues <- xvalues[1:length(yvalues)]
fah
January 11, 2026 1:05 pm

Thanks for this discussion. I myself have been using MATLAB since before it was called MATLAB, when it was just a couple of matrix inversion packages for mainframe computing. Over the years, I have eschewed using R in favor of the (in my view then) more powerful MATLAB statistical toolboxes, coupled with the associated other capabilities and toolboxes involving image analysis, optimization, PDEs, ML, symbolic math, large scale intrinsic vectorization, etc. etc. and I have been teaching students to use it in analysis of lab experimental data for some years. But I have been re-thinking that perspective as AI and LLMs have become ever more powerful owing to virtually unlimited availability of memory and computational power by virtue of the cloud.

The world of AI seems to be comprised of largely non-proprietary codes such as Python, C++, JavaScript, etc. and I presume R in its present form has or will find a place in that world. As a result I have been revisiting my familiarity with all of the open source codes and this convinces me to add R to the list. And continue rethinking inclusion of these codes in teaching practice. One nice thing I have found about AI is that if you have some code in MATLAB (or other platforms) it will convert that code to whatever open source context you want and debug issues for you along the way. I have not tried it, but I suspect it will also write code for you if you just ask it nicely. It is an interesting world we live in.

bdgwx
January 11, 2026 2:04 pm

This is a cool article Andy. I use R for data analysis as well. It can do so much. I wish I knew it better than I do. It’s just that I already have to deal with so many other programming languages on a daily basis that I find my R skills to be subpar and I forget how to do even basic stuff occasionally because I don’t use it as frequently. I am working on a project in my professional life in which I’m trying to incorporate R so hopefully I can become more proficient.

Anyway, take a look at the vcovHAC function. This is how I compute the uncertainty of the linear regression trends I post here.

Bob Armstrong
January 11, 2026 4:45 pm

Learning APL in the mid 70s to understand the multidimensional geometry and its algebra underlying pattern recognition and associative memory thru being able to interactively execute the succinct expression of the math .

R , I believe , openly borrows some concepts from APL .

CoSy is my notekeeping environment evolved since then in open FORTH . Its rules are simple . It's a true human level language in which one thinks in terms of defining words rather than writing programs . CoSy is human level also in being a rich vocabulary . The currently ~ 2300 word vocabulary is open from parsing and array operators to the ( currently x86/7 ) chip .

The example under Pipes above would be expressed in current development CoSy as

 f( 1 4 9 19 )f sqrtf avgf 
which returns 2.5897… on the data stack for the next word down the line .
I’d leave the rounding until outputting .

To make a word out of that operation would simply be :

: sqrtavg sqrtf avgf ;

Then I can write
10 _iotaf sqrtavg | resulting |>| 1.93060 
for the numbers 0 thru 9 .

This allows the definition of the Planck thermal power function in a line :

| ( WaveLength Temperature — EnergyDensity )

: Planckl *f [ h 2. _f *f c ^2f *f ]+ $ [ h c boltz %f *f ]+ $ %f expf f1. -f %f ;
| Atomic : ie: applies to entire lists of lists of WLs & Ts .

I’ll be adding this verb to https://cosy.com/4thCoSy/Code/Physics/general.f .

CoSy is open and freely downloadable at CoSy.com/4thCoSy/ . Contact me if you’d like to schedule a Zoom , or have a specific domain you’d like to tame .

Bob Armstrong
Reply to  Bob Armstrong
January 11, 2026 5:06 pm

Whoops .
 f( 1 4 9 16 )f sqrtavg  |>| 2.50000 

Bob Armstrong
Reply to  Bob Armstrong
January 11, 2026 5:27 pm

And , for the heck of it , here’s the result for the squares of 1 thru 100000 .

100000 _iotaf f 1 +f f 2 ^f sqrtavg |>| 50000.5000