R Programming – Improvements in the Language

By Andy May

This is an introductory post to a series on using R to read IGRA2 radiosonde data, process it, and produce both plots and maps of the data. I started using R over 10 years ago, mainly because it was a free and very powerful language for statistical analysis (download the current 64-bit Windows version here). At the time it was a clunky programming language and difficult to use, but that has recently changed. While working on new R programs to analyze the radiosonde data, I noticed the many substantial improvements added to the language since around 2020. It is now a very impressive language, much easier to use and to read. Before we get into the radiosonde analysis, I’d like to cover these recent improvements. Future posts in this series will provide more details about the R language and my analysis of IGRA2.

Reading and writing data

I used the original “base” R “readLines” function to read the IGRA2 files because each record had to be read as a text string and later parsed into its component values. For this purpose, readLines is ideal and efficient. Once the records were parsed and prepared for processing in R, I used “fwrite” from the data.table R package, for efficiency, to write the resulting data frames or tibbles to disk. A data frame is R’s basic tabular structure, whose columns can hold different data types; a tibble is the tidyverse’s more modern version of the same idea. Both data frames and tibbles are organized as tables in which observations are rows and variables or measurements are columns, and different columns can have different data types, for example character, float, or integer.
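
As a rough illustration, the read-and-parse step looks something like the sketch below. The file name, the “#” header flag, and the character positions are placeholders for this example, not the actual IGRA2 record layout.

lines <- readLines("USM00072520-data.txt")      # one text string per record
obs   <- lines[substr(lines, 1, 1) != "#"]      # assume header records start with '#'
parsed <- data.frame(
  press = as.integer(substr(obs, 10, 15)),      # pressure field (illustrative positions)
  temp  = as.integer(substr(obs, 17, 21)) / 10  # temperature field (illustrative positions)
)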

The fwrite function was written by Otto Seiskari and Matt Dowle and first released in 2016. It became fully parallel by 2020 and is 10-100 times faster than the alternative write functions. For reading comma-delimited (CSV) files, the companion fread function is also very fast and efficient.
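
A minimal, self-contained sketch of the round trip with data.table (the toy data and file name are only for illustration):

library(data.table)
parsed <- data.frame(press = c(100000L, 50000L), temp = c(15.2, -20.1))  # toy data
fwrite(parsed, "station_parsed.csv")   # fast, multi-threaded CSV write
dt <- fread("station_parsed.csv")      # fast, multi-threaded CSV read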

dplyr: data manipulation

The very useful dplyr R package, written by Hadley Wickham, was first released in 2014, and its major 1.0.0 version arrived in 2020. The operators and functions in this package are widely used in my IGRA2 processing. The %>% operator (called a ‘pipe’) is especially useful because it makes R code much more readable and intuitive (see below). Dplyr is used to organize data into subsets (the filter, select, arrange, and group_by functions), add variables to an existing data frame or tibble (mutate), or compute values that summarize a group (summarise). Other useful column functions are across, rename, relocate, and pull. Row (observation) manipulation functions include rowwise, slice, and distinct. A minimal example on a toy tibble is sketched below.
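
This sketch chains a few of these verbs on a made-up tibble of soundings (the data and column names are invented for illustration):

library(dplyr)
soundings <- tibble::tibble(
  station = c("A", "A", "B", "B"),
  press   = c(1000, 500, 1000, 500),
  temp_c  = c(15.2, -20.1, 14.8, -21.5)
)
soundings %>%
  filter(press == 500) %>%              # keep only the 500 hPa level
  mutate(temp_k = temp_c + 273.15) %>%  # add a new column
  group_by(station) %>%                 # one group per station
  summarise(mean_k = mean(temp_k))      # one summary row per group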

Pipes (%>%) for making code more readable

Pipes are supplied by magrittr, which is loaded when dplyr is loaded. In short, the value on the left-hand side of the pipe (lhs) is placed as the first argument of the function on the right-hand side (rhs), that is:

lhs %>% rhs is the same as rhs(lhs).

Simple Example

Without a pipe, the following nested, inside-out code is needed:

round(mean(sqrt(c(1, 4, 9, 16)), na.rm = TRUE), 2) 
# → 2.5

With a pipe we can write this chained and readable equivalent code:

c(1, 4, 9, 16) %>%
  sqrt() %>%
  mean(na.rm = TRUE) %>%
  round(2)
# → 2.5

Use the dot ‘.’ placeholder when you need the piped value in a non-first position:

df %>% lm(y ~ x, data = .) # ‘.’ represents the lhs (df)

tidyverse: integrating data manipulation and plotting

Hadley Wickham and his collaborators, including Romain François, Winston Chang, Garrett Grolemund, Lionel Henry, and others, developed and integrated the components of the tidyverse, a suite of consistent packages for modern R workflows. The key components of the tidyverse are ggplot2, tidyr, dplyr, readr, purrr, and tibble. The tidyverse was a mature set of programming tools by 2019, when the paper “Welcome to the Tidyverse” was published (Wickham et al., 2019).

Using concepts developed by Hadley Wickham and described in his paper “Tidy Data” (Wickham, 2014), the tidyverse is designed to help the user clean up messy datasets like IGRA2. As Wickham says, 80% of data analysis is spent on the process of cleaning and preparing the data, and much of that preparation time is spent reformatting the output of one tool or device so it can be input to another. The tidyverse provides a wide range of tools and display functions that all work on basic R data frames and tidyverse tibbles. Further, it allows tibbles to be easily converted to and from data frames.
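
For instance, a “messy” table with one column per year can be reshaped into the tidy one-observation-per-row form with tidyr, and tibbles and data frames convert freely (the table below is invented for illustration):

library(tidyverse)
messy <- tibble(
  station = c("A", "B"),
  `2023`  = c(10.1, 11.2),
  `2024`  = c(10.4, 11.0)
)
tidy <- messy %>%
  pivot_longer(-station, names_to = "year", values_to = "temp")  # one observation per row
as.data.frame(tidy)   # a tibble converts directly to a base data frame
as_tibble(mtcars)     # and a data frame converts to a tibble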

ggplot2: the plotting workhorse

Ggplot2 is a very powerful part of the tidyverse that allows useful and attractive graphs and maps to be made. All the graphs in my most recent paper, except one, were made using ggplot2 (May, 2025).
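
A minimal example using R’s built-in pressure dataset (vapor pressure of mercury versus temperature) shows the basic grammar: map data columns to aesthetics, then add layers.

library(ggplot2)
ggplot(pressure, aes(x = temperature, y = pressure)) +
  geom_line() +
  geom_point() +
  labs(x = "Temperature (deg C)", y = "Pressure (mm Hg)",
       title = "A minimal ggplot2 example")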

Hadley Wickham created the initial version of ggplot2 during his PhD studies at Iowa State University, inspired by Leland Wilkinson’s 1999 book The Grammar of Graphics. It provided a more structured alternative to R’s base graphics. The first general version was available in 2014, but a full and modern mapping capability had to wait for the introduction of terra in 2022 by Robert Hijmans. Terra did not integrate very well with ggplot2 and the tidyverse at first, and this required a major update to ggplot2, which was completed in September of 2025.

Terra: Mapping

Terra was developed to replace the raster and sp packages (first released around 2010 and 2005, respectively), which were the cornerstones of mapping in R before terra was released and matured around 2023. Robert J. Hijmans (University of California, Davis) created terra to address problems with the earlier R mapping system: terra has a simpler interface, faster performance, expanded capabilities (e.g., better vector integration), and streamlined data classes.

Terra provides efficient methods for geometric operations, local/focal/zonal/global computations, spatial predictions (e.g., via interpolation or machine learning models), and processing of very large files. Key data classes are SpatRaster (replacing multiple raster classes) and SpatVector, which provides robust vector handling. Terra is written in C++ for speed.
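
A minimal sketch of the two data classes (the grid size and the shapefile name here are placeholders):

library(terra)
r <- rast(nrows = 18, ncols = 36)   # a small 10-degree global SpatRaster
values(r) <- runif(ncell(r))        # fill with random values
global(r, "mean")                   # a 'global' summary computation
# Points, lines, and polygons are SpatVector objects, e.g.
# v <- vect("countries.shp")        # hypothetical shapefile
# cr <- crop(r, v)                  # clip the raster to the vector's extent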

Tidyterra: integrating data manipulation and mapping

Tidyterra is a package developed by Diego Hernangómez that successfully allows tidyverse and ggplot2 to work with the spatial mapping package terra. Finally, ggplot2 display functions and the tidyverse data manipulation functions can work seamlessly with the terra mapping functions (Hernangómez, 2023).

Tidyterra extends the functionality of the ggplot2 package by providing additional functions specific to mapping, like geom_spatraster and geom_spatvector, as well as other functionality specifically designed for map production. SpatVector objects hold points, lines, and polygons, and SpatRaster objects consist of a grid of equal-sized rectangular cells, each containing one or more values (Hernangómez, 2023).
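
A minimal sketch of displaying a terra raster with ggplot2 through tidyterra (the raster here is random data created only for illustration):

library(terra)
library(tidyterra)
library(ggplot2)
r <- rast(nrows = 18, ncols = 36)
values(r) <- runif(ncell(r))
ggplot() +
  geom_spatraster(data = r) +              # tidyterra raster layer
  scale_fill_viridis_c(name = "value") +
  labs(title = "SpatRaster displayed with ggplot2 + tidyterra")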

Summary

Thus, with the completion of all this work between 2023 and 2025, R became a very robust data processing and display programming language. In the next few posts, I will provide some critical details about how I used this powerful new version of R to make the data displays in my recent paper (May, 2025) and in the paper’s supplementary materials (May, 2025b).

Works Cited

Hernangómez, D. (2023). Using the tidyverse with terra objects: the tidyterra package. Journal of Open Source Software, 8(91). https://doi.org/10.21105/joss.05751

May, A. (2025). The Molar Density Tropopause Proxy and its relation to the ITCZ and Hadley Circulation. OSF. https://doi.org/10.17605/OSF.IO/KBP9S

May, A. (2025b, November 28). Supplementary Materials: The Molar Density Tropopause Proxy and Its Relation to the ITCZ and Hadley Circulation. https://doi.org/10.5281/zenodo.17752293

Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10). https://doi.org/10.18637/jss.v059.i10

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., & François, R. (2019). Welcome to the Tidyverse. Journal of Open Source Software, 4(43). https://doi.org/10.21105/joss.01686

Comments
Willis Eschenbach (Editor)
January 11, 2026 10:12 am

Thanks, Andy. I use R exclusively for my work. A question.

You recommend fread and fwrite for saving and reading csv files and data frames to-from disk. Do they work for arrays?

I ask because I’ve never used a csv or data.frame file so large that read or write times were an issue. The big data I use are all 3-d arrays, which (near as I can tell) are not read or written by fread/fwrite.

Great article,

w.

Bob Armstrong
Reply to  Andy May
January 11, 2026 5:00 pm

Reading & writing various formats including particularly CSVs is very important . CoSy has an extensive vocabulary mainly at https://cosy.com/4thCoSy/Code/CoSy/Furniture.f .

bdgwx
Reply to  Willis Eschenbach
January 11, 2026 2:12 pm

Are you using netCDF or GRIB file formats? If so, which packages in R are you using to read these files?

January 11, 2026 10:28 am

Thanks for this, finally something I can agree with. As a hobbyist, discovering the Tidyverse packages revolutionised how I used R. I definitely need to look into Terra now.

One small quibble – I’m pretty sure that dplyr has been around since well before 2021.

January 11, 2026 11:11 am

A good write up, thanks

Eng_Ian
January 11, 2026 11:45 am

round(mean(sqrt(c(1, 4, 9, 16)), na.rm = TRUE), 2)

# → 2.9

I must be missing something here. Isn’t the mean of the sqrt of 1, 4, 9 and 16 going to be 2.5?

eg (1+2+3+4)/4

Or should it be the sqrt of (1+4+9+16)/4 = 2.7

Where did 2.9 come from?

Mr.
January 11, 2026 12:09 pm

So SuperCalc and Lotus 123 aren’t in vogue any more?

Bob Vislocky
January 11, 2026 12:13 pm

I use R all the time. Any idea how I can get Excel and R to “talk” with each other? For example, to make a prediction on new live data feeding into Excel it would be nice to call the predict function from R to make the forecast. For simple linear regression models I just copy the regression coefficients from R over to Excel to make predictions on the fly in Excel. But that’s more difficult to do with more complicated models created using GAM or LOESS. Any thoughts?

bdgwx
Reply to  Bob Vislocky
January 11, 2026 2:07 pm

I use the readxl package. Here is a quick snippet for pulling values out of Excel.

library("readxl")
sheet <- read_excel("path-to-file", sheet = "name-of-sheet")
yvalues <- sheet[["XValues"]]
yvalues <- yvalues[!is.na(yvalues)]
xvalues <- sheet[["YValues"]]
xvalues <- xvalues[1:length(yvalues)]
fah
January 11, 2026 1:05 pm

Thanks for this discussion. I myself have been using MATLAB since before it was called MATLAB, when it was just a couple of matrix inversion packages for mainframe computing. Over the years, I have eschewed using R in favor of the (in my view then) more powerful MATLAB statistical toolboxes, coupled with the associated other capabilities and toolboxes involving image analysis, optimization, PDEs, ML, symbolic math, large scale intrinsic vectorization, etc. etc. and I have been teaching students to use it in analysis of lab experimental data for some years. But I have been re-thinking that perspective as AI and LLMs have become ever more powerful owing to virtually unlimited availability of memory and computational power by virtue of the cloud.

The world of AI seems to be comprised of largely non-proprietary codes such as Python, C++, JavaScript, etc. and I presume R in its present form has or will find a place in that world. As a result I have been revisiting my familiarity with all of the open source codes and this convinces me to add R to the list. And continue rethinking inclusion of these codes in teaching practice. One nice thing I have found about AI is that if you have some code in MATLAB (or other platforms) it will convert that code to whatever open source context you want and debug issues for you along the way. I have not tried it, but I suspect it will also write code for you if you just ask it nicely. It is an interesting world we live in.

bdgwx
January 11, 2026 2:04 pm

This is a cool article Andy. I use R for data analysis as well. It can do so much. I wish I knew it better than I do. It’s just that I already have to deal with so many other programming languages on a daily basis that I find my R skills to be subpar and I forget how to do even basic stuff occasionally because I don’t use it as frequently. I am working on a project in my professional life in which I’m trying to incorporate R so hopefully I can become more proficient.

Anyway, take a look at the vcovHAC function. This is how I compute the uncertainty of the linear regression trends I post here.

Bob Armstrong
January 11, 2026 4:45 pm

Learning APL in the mid 70s to understand the multidimensional geometry and its algebra underlying pattern recognition and associative memory thru being able to interactively execute the succinct expression of the math .

R , I believe , openly borrows some concepts from APL .

CoSy is my notekeeping environment evolved since then in open FORTH . Its rules are simple . It's a true human level language in which one thinks in terms of defining words rather than writing programs . CoSy is human level also in being a rich vocabulary . The currently ~ 2300 word vocabulary is open from parsing and array operators to the ( currently x86/7 ) chip .

The example under Pipes above would be expressed in current development CoSy as

 f( 1 4 9 19 )f sqrtf avgf 
which returns 2.5897… on the data stack for the next word down the line .
I’d leave the rounding until outputting .

To make a word out of that operation would simply be :

: sqrtavg sqrtf avgf ;

Then I can write
10 _iotaf sqrtavg | resulting |>| 1.93060 
for the numbers 0 thru 9 .

This allows the definition of the Planck thermal power function in a line :

| ( WaveLength Temperature — EnergyDensity )

: Planckl *f [ h 2. _f *f c ^2f *f ]+ $ [ h c boltz %f *f ]+ $ %f expf f1. -f %f ;
| Atomic : ie: applies to entire lists of lists of WLs & Ts .

I’ll be adding this verb to https://cosy.com/4thCoSy/Code/Physics/general.f .

CoSy is open and freely downloadable at CoSy.com/4thCoSy/ . Contact me if you’d like to schedule a Zoom , or have a specific domain you’d like to tame .

Bob Armstrong
Reply to  Bob Armstrong
January 11, 2026 5:06 pm

Whoops .
 f( 1 4 9 16 )f sqrtavg  |>| 2.50000 

Bob Armstrong
Reply to  Bob Armstrong
January 11, 2026 5:27 pm

And , for the heck of it , here’s the result for the squares of 1 thru 100000 .

100000 _iotaf f 1 +f f 2 ^f sqrtavg |>| 50000.5000