R Programming tips to process IGRA2 Radiosonde data

By Andy May

In the last two posts (here and here) I showed how to read, plot, and map IGRA2 data. In this post I will discuss how to efficiently process large R data frames and lists to compute useful variables.

One of the more difficult things to do in R is to write readable code. This problem has been around for a long time, but over the past decade a very useful tool has appeared: the “pipe”, or %>%.

The %>% operator originates from the magrittr R package developed by Stefan Milton Bache in 2013/2014. Originally, he named it plumbr but later changed the name to magrittr. The name is a playful reference to Belgian surrealist painter René Magritte and his famous 1929 painting The Treachery of Images (French: La Trahison des images), which depicts a pipe with the caption Ceci n’est pas une pipe (“This is not a pipe”). The magrittr package’s logo cleverly adapts this idea to represent the pipe operator.

Figure 1. Magritte’s painting and the logo for the R magrittr package. Magritte’s image is public domain in the U.S. (see here). The tidyverse magrittr image is also public domain.

Basic Usage

Bache’s pipe forwards the result of the left-hand side into the right-hand side, typically as the first argument of a function. This concept was introduced in the first post in this series.

x %>% f() # Equivalent to f(x)

x %>% f(y) # Equivalent to f(x, y)

This enables chaining multiple operations in a readable, sequential manner.
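The following is a minimal sketch of these equivalences, and it runs in base R. It uses the native pipe |> (R 4.1 or later) instead of %>%, so no package is needed. For simple calls like these, the two pipes give the same result. The values of x are arbitrary.

```r
# Arbitrary example values
x <- c(4, 9, 16)

# x |> f() is equivalent to f(x)
a_nested <- sqrt(x)
a_piped  <- x |> sqrt()

# x |> f(y) is equivalent to f(x, y): the left-hand side becomes the first argument
b_nested <- round(2.71828, 2)
b_piped  <- 2.71828 |> round(2)

# Chaining: each result is passed to the next call, read left to right
chained <- x |> sqrt() |> sum()   # same as sum(sqrt(x)) = 2 + 3 + 4

print(identical(a_nested, a_piped))
print(identical(b_nested, b_piped))
print(chained)
```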

The pipe’s primary value lies in improving code readability and maintainability, especially in data analysis workflows. It eliminates the need for:

  • Nested function calls (hard to parse).
  • Intermediate temporary variables (which clutter the environment).
  • Repeated references to the same object.
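To make the list concrete, here is the same small calculation written three ways: nested, with a temporary variable, and piped. The temperatures are made-up values. The native pipe |> stands in for %>% so that the snippet runs without any packages.

```r
# Made-up temperatures in deg C, with one missing value
temps <- c(12.3, NA, 15.1, 9.8)

# 1. Nested calls: read from the inside out
t_nested <- round(mean(temps, na.rm = TRUE), 1)

# 2. Intermediate temporary variable: readable, but adds t_mean to the environment
t_mean <- mean(temps, na.rm = TRUE)
t_temp <- round(t_mean, 1)

# 3. Pipe: read left to right, no temporaries, 'temps' named only once
t_piped <- temps |> mean(na.rm = TRUE) |> round(1)

print(c(t_nested, t_temp, t_piped))
```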

This is particularly powerful when combined with tidyverse packages like dplyr, where functions (e.g., filter(), mutate(), summarise(), group_by()) are designed to take a data frame as the first argument and return a modified data frame.
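A few base R functions follow the same “data frame in, data frame out” design, which is why they chain well. In this sketch, base R’s subset() and transform() play roles similar to dplyr’s filter() and mutate(). They are swapped in only so that the example runs without packages, and the small table is hypothetical.

```r
# Hypothetical table of yearly values
obs <- data.frame(
  cur_year = c(1992, 1995, 1995, 2001),
  temp_c   = c(14.0, 15.5, 16.1, 15.0)
)

# Each function takes a data frame as its first argument and returns a data frame,
# so the pipe can hand the result straight to the next step
result <- obs |>
  subset(cur_year >= 1994) |>          # keep rows, similar to dplyr::filter()
  transform(temp_k = temp_c + 273.15)  # add a column, similar to dplyr::mutate()

print(result)
```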

Tidyverse/magrittr Example

The following example illustrates all of these advantages of the pipe. It is taken from the R code described in (May, 2025) and available for download in (May, 2025b).

In the tidyverse code below, we create a data frame, ‘plot_values’, by processing the existing data frame ‘maps’.

# Process the data: keep 1994-2025, group by year, compute yearly averages (plot_values)
plot_values <- maps %>%

Below, the dplyr ‘filter’ function, created by Hadley Wickham and colleagues, removes all data from before 1994 and after 2025.

  filter(cur_year >= 1994 & cur_year <= 2025) %>%    ### 1991-1993 data is crap

Below, the data are grouped by year so that ‘summarise’ can compute averages for each year.

  group_by(cur_year) %>%

For each year, ‘summarise’ computes the mean of every column from n_obs through q_gkg_upper (q_gkg is specific humidity, g/kg). It ignores missing (NA) values because of na.rm = TRUE. ‘.groups = "drop"’ removes the grouping, so the result is an ordinary, ungrouped tibble; cur_year remains as a column. Summarise was developed by Hadley Wickham as part of his split-apply-combine strategy of data processing and preparation. It was first released in 2014. It received a major upgrade in 2020 (dplyr 1.0.0), when the ‘.groups’ argument and the ability to return multiple rows per group were added.

  summarise(
    across(n_obs:q_gkg_upper, ~mean(., na.rm = TRUE)),
    .groups = "drop"
  ) %>%
  filter(n_obs > 0) %>%

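After ‘filter(n_obs > 0)’ removes any year without observations, ‘mutate’ works on each yearly row. It computes and adds the vector-average wind speed and direction for the upper, middle, and lower troposphere. These come from the averaged u (eastward) and v (northward) wind components. Mutate was also created by Hadley Wickham and his team and first released in 2014. It received a very major upgrade in 2020 (dplyr 1.0.0), which introduced ‘across()’ and the ‘.keep’, ‘.before’, and ‘.after’ arguments. In early 2023, dplyr 1.1.0 added the ‘.by’ argument for per-operation grouping, along with many other fixes and changes.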

  mutate(
    # Vector average speed
    avg_wspd_upper  = sqrt(mean_u_upper^2 + mean_v_upper^2),
    avg_wspd_middle = sqrt(mean_u_middle^2 + mean_v_middle^2),
    avg_wspd_lower  = sqrt(mean_u_lower^2 + mean_v_lower^2),

    # Average direction FROM (meteorological convention: 0° = from north).
    # atan2(u, v) gives the direction the wind blows TOWARD; adding 180 converts
    # it to FROM, and %% 360 keeps the result in [0, 360)
    avg_dir_upper  = (180 + 180 / pi * atan2(mean_u_upper, mean_v_upper) + 360) %% 360,
    avg_dir_middle = (180 + 180 / pi * atan2(mean_u_middle, mean_v_middle) + 360) %% 360,
    avg_dir_lower  = (180 + 180 / pi * atan2(mean_u_lower, mean_v_lower) + 360) %% 360
  ) %>%
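Why average the u and v components rather than the speeds and directions themselves? Here is a small stand-alone sketch, not part of the pipeline. It shows the problem with averaging directions directly: winds from 350° and from 10° average to 180°, which is the opposite direction. The two observations are made up. The conversion from speed and direction to u and v uses the standard meteorological convention, which is an assumption; this post does not show how the mean_u_* and mean_v_* columns were built.

```r
# Two made-up observations: 10 m/s FROM 350 deg and 10 m/s FROM 10 deg
spd      <- c(10, 10)
dir_from <- c(350, 10)

# Speed and FROM-direction to u (eastward) and v (northward) components
# (assumed convention); the minus signs appear because a wind FROM d blows TOWARD d + 180
u <- -spd * sin(dir_from * pi / 180)
v <- -spd * cos(dir_from * pi / 180)

mean_u <- mean(u)
mean_v <- mean(v)

# Same formulas as in the mutate() step
avg_wspd <- sqrt(mean_u^2 + mean_v^2)
avg_dir  <- (180 + 180 / pi * atan2(mean_u, mean_v) + 360) %% 360

naive_dir <- mean(dir_from)  # 180: a wind from the south, which is wrong

print(c(avg_wspd = avg_wspd, avg_dir = avg_dir, naive_dir = naive_dir))
```

The vector average points north, as it should. Because of floating-point rounding, it may print as 0 or as 360, which are the same direction. Its speed is about 9.85 m/s rather than 10 m/s, because the two east-west components cancel.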

The final ‘select’ call drops the spd_* and dir_* columns (note the ‘-’ prefix). They are no longer needed once the vector averages have been computed.

  select(-spd_lower, -dir_lower, -spd_middle, -dir_middle,
         -spd_upper, -dir_upper)  # Clean up
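For readers without the tidyverse, here is a rough base R analogue of the whole pipeline. It runs on a tiny made-up stand-in for ‘maps’ that has only the upper-troposphere wind columns. It is a sketch of the same steps, not the code from (May, 2025b). The swapped-in functions are:

  • aggregate() in place of group_by() plus summarise(across(...))
  • subset() in place of filter() and select()
  • transform() in place of mutate()

```r
# Hypothetical stand-in for 'maps' (made-up values, upper level only)
maps_demo <- data.frame(
  cur_year     = c(1993, 1994, 1994, 1995, 1995),
  n_obs        = c(10, 12, NA, 8, 9),
  mean_u_upper = c(5, 10, 14, -3, -5),
  mean_v_upper = c(1, 2, 4, 0, 2),
  spd_upper    = c(5.1, 10.2, 14.6, 3.0, 5.4),
  dir_upper    = c(259, 259, 254, 90, 112)
)
vars <- c("n_obs", "mean_u_upper", "mean_v_upper", "spd_upper", "dir_upper")

# filter(): keep 1994-2025
kept <- subset(maps_demo, cur_year >= 1994 & cur_year <= 2025)

# group_by() + summarise(across(...)): one row per year, mean of each column, NAs ignored
yearly <- aggregate(kept[vars], by = list(cur_year = kept$cur_year),
                    FUN = mean, na.rm = TRUE)

plot_values_demo <- yearly |>
  subset(n_obs > 0) |>                                   # filter(n_obs > 0)
  transform(                                             # mutate(...)
    avg_wspd_upper = sqrt(mean_u_upper^2 + mean_v_upper^2),
    avg_dir_upper  = (180 + 180 / pi * atan2(mean_u_upper, mean_v_upper) + 360) %% 360
  ) |>
  subset(select = -c(spd_upper, dir_upper))              # select(-spd_upper, -dir_upper)

print(plot_values_demo)
```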

Note on the Native Pipe (|>)

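Since R 4.1 (2021), base R has included a native pipe, |>, which for basic use behaves like the %>% pipe. It is built in (no package needed). It is also slightly faster, because the parser simply rewrites x |> f() as f(x) instead of calling an operator function. It is stricter, however. The right-hand side must be a function call (x |> f(), not x |> f), and its placeholder, _ (added in R 4.2), must be passed to a named argument. %>% remains widely used in 2026, especially in tidyverse code. This is partly for its extra features, such as the flexible . placeholder, and partly for backward compatibility. Many prefer %>% for complex pipelines, while |> suffices for simpler ones.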
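Here is a short sketch of the native-pipe rules just described. It needs R 4.2 or later for the _ placeholder. The data values are made up, and the magrittr forms appear only in comments, for comparison.

```r
# Made-up values
temps <- c(14.2, 15.8, 13.9)

# Basic use: same result as temps %>% mean() with magrittr
m <- temps |> mean()

# The right-hand side must be a call: 'temps |> mean' is a parse error,
# whereas magrittr accepts temps %>% mean

# The _ placeholder (R >= 4.2) must be passed to a named argument;
# the magrittr equivalent would be df %>% lm(y ~ x, data = .)
df  <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 8.1, 9.8))
fit <- df |> lm(y ~ x, data = _)

print(m)
print(coef(fit))
```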

In summary, the %>% operator revolutionized R coding by making chained operations intuitive and clean, significantly boosting productivity in data wrangling and analysis. It’s a cornerstone of modern R style.

The next post in this series explains how I selected the average ITCZ (Intertropical Convergence Zone) latitudes for each month.

Works Cited

May, A. (2025). The Molar Density Tropopause Proxy and its relation to the ITCZ and Hadley Circulation. OSF. https://doi.org/10.17605/OSF.IO/KBP9S

May, A. (2025b, November 28). Supplementary Materials: The Molar Density Tropopause Proxy and Its Relation to the ITCZ and Hadley Circulation. Zenodo. https://doi.org/10.5281/zenodo.17752293

Published by Andy May

Petrophysicist, details available here: https://andymaypetrophysicist.com/about/
