The R Data Scientist

September 30, 2025

Data Scientist (with R)

R community, package updates, quarto, visualisation

🫶 R community, conferences & roundups

Ten year anniversary of Free Range Statistics by @ellis2013nz (freerangestats​.info). Ten years of Free Range Statistics blogging: 225 posts, 350,000 words, analytics in R with loess smoothing and markup removal

Does Traveling to a Professional Conference Make Sense if You’re Retired? (NextChapter​.machlis​.com). Retired data scientist attends in-person data science conference in Atlanta to reconnect with R community, gain focus, and stress learning over career growth

R Weekly 2025-W40 Ducklake, Slidecrafting, Shiny & LLMs (rweekly​.org). Ducklake, Slidecrafting with reveal.js and Quarto; Shiny & LLMs; testing with testthat; new and updated R packages and CRANberries

rOpenSci News Digest, September 2025 (ropensci​.org). Monthly digest highlighting rOpenSci activities, training, software reviews, new packages, and community updates

2025-09-26 AI Newsletter (posit​.co). Posit AI newsletter covers Anthropic Claude reliability, Codex updates, GPT-5-Codex, agent definitions, and Posit news and partner programs

Issue 2025-W39 Highlights (serve​.podhome​.fm). R Weekly Highlights episode 211 covers posit::conf(2025), Reading & Writing Markdown with programming, Vibe-coding a new package to learn Japanese, and quality control exams.

Software for Phylogenetic Trees: SSEC co-organizes Workshop in London (escience​.washington​.edu). SSEC co-organizes a London workshop on Phylo2Vec with Rust core rewrite and Python/R APIs across Imperial College London and University of Copenhagen

SSEC collaborates on Workshop at the Barcelona Supercomputing Center (escience​.washington​.edu). SSEC aids BSC workshop on ecological forecasting and Docker-based R package development for biodiversity projections

📦 R packages: releases, testing & maintenance

shinystate 0.1.0 is available on CRAN! (shinydevseries​.com). shinystate 0.1.0 enables server-side bookmarking with StorageClass, snapshot/restore, and multiple session management for Shiny apps

Piwik Pro doesn't offer a free plan anymore (rstats-tips​.net). Piwik Pro free plan discontinued, pricing at least 420€ yearly; impact on piwikproR CRAN package maintenance for R developers

Testing with {testthat} (jumpingrivers​.com). Overview of testthat features for R package testing, including expectations, structure, and visual tests with doppelganger

ggsci 4.0.0: 400+ new color palettes (nanx​.me). ggsci 4.0.0 adds 400+ discrete color palettes from Primer, Atlassian, and iterm2-color-schemes for ggplot2 and plotnine

Dependencies and reverse dependencies: Python vs. R (spatialists​.ch). Reverse dependency checks in CRAN shape empathetic maintenance, easing migrations for researchers and data scientists

R-multiverse: a new way to publish R packages (ropensci​.org). R-multiverse creates a dual repository for R packages, enabling central installation, automated quarterly snapshots, and GitHub/GitLab-based releases

New version of phytools on CRAN (blog​.phytools​.org). New phytools 2.5-2 on CRAN adds models, updates, and new simulation and plotting tools for comparative biology

📝 Quarto, R Markdown & APA-style reporting

Moving to quarto and LaTeX… (blog​.ouseful​.info). Moving to Quarto and LaTeX: from Jupyter Book, PDFs, and EPUBs to print-on-demand booklets with LaTeX templates and build automation

Weekly recap (Sep 26, 2025) (blog.stephenturner.us). Apple enters protein folding with SimpleFold (3B parameters); parsing R Markdown and Quarto; vibe coding an R package; AI in biosecurity and writing; de-extinction conservation applications

Recreating APA Manual Table 7.16 in R with apa7 (wjschne.github.io). Recreating APA Table 7.16 in R using apa7, flextable, tidyverse, and lavaan to simulate and format path-model results

Recreating APA Manual Table 7.19 in R with apa7 (wjschne​.github​.io). Recreating APA Table 7.19 in R using apa7, flextable, ftExtra, tidyverse, and easystats

📈 ggplot2 visualizations, charts & mapping in R

Analyzing ICE Arrest Data - Part 2 (jefworks-lab​.github​.io). Analyzes ICE arrest data from deportationdata.org with R (tidyverse, dplyr, ggplot2, gganimate) to compare criminality categories and apprehension methods over time

Tuesdays and Travels (seanlunsford​.com). Reflection on globalization, data visualization with Tidy Tuesday, and visa policies through a Sankey chart highlighting US vs global visa access

Exploring ggbot2: Creating a volcano plot with your voice (tomsing1​.github​.io). Voice-controlled volcano plot creation using ggbot2, ggplot2, ggrastr, ggrepel, and Shiny within R; with Mattila et al. data and LLM-driven code generation

Vizualizing global testosterone levels by country (mihiretukebede​.com). Scrapes testosterone-by-country data from WorldPopulationReview and builds a Python/R-style choropleth map using tidyverse, rvest, sf, and viridis

Still presenting regression results in tables? why not forest plots? (mihiretukebede​.com). Reproduces an elegant JAMA forest plot in R with meta package, comparing regression results to tables

UK Household Spending Inequality by Income Level in FYE 2024 (stevenponce​.netlify​.app). UK housing costs drive largest spending gap across quintiles using ONS Family Spending data in an R viz with tidyverse, ggridges, patchwork, and custom themes

Spurious Correlations in R - Correlation is not Causation (pacha​.dev). Spurious correlations in R with spuriouscorrelations: plotting, lm modeling, and double y-axis visuals highlighting correlation vs causation

🔬 Applied analyses & workflows with R (health, bio & maps)

NHANES Activity using MIMS (Monitor-Independent Movement Summary) (hopstat​.wordpress​.com). NHANES MIMS analysis with MIMSunit in R, comparing default vs custom MIMS for 80Hz NHANES 2012 data

Learning And Exploring The Workflow of RNA-Seq Analysis - A Note To Myself (kenkoonwong​.com). RNA-Seq workflow in C. difficile: fastp, kallisto, DESeq2; QC, transcriptome reference, Tximport, PCA, and differential expression

Lake Hornborgasjön cranes: seasonal peaks and long-term growth (stevenponce​.netlify​.app). Spring migration peaks and long-term growth in crane counts shown with tidyverse, ggplot2, and TidyTuesday data

Mapping Bike Rides (Part III) (rasterweb​.net). GPX files on a map via gpx.studio, Mapbox tiles, and other free/open tools for mapping bike rides

🧮 Statistical inference, simulations & Bayesian thinking

Type S and M errors as a “rhetorical tool” (daniellakens​.blogspot​.com). Gelman and Carlin's Type S/M errors discussed as rhetorical tools vs. practical methods; author critiques their use in study design and interpretation

A warning about data-driven simulations (garstats​.wordpress​.com). Cautions on data-driven simulations: sampling distributions, population vs. sample, and power bias in RT lexical decision data

A Chess Scandal Revisited – Why Nakamura is Right About Cherry-Picking (bayesianspectacles​.org). Bayesian analysis debates cherry-picking in Nakamura-Kramnik chess controversy; discusses likelihood principle, optional stopping, change-point models, and data selection

Some notes on survey weights (blog​.djnavarro​.net). Survey weights in NHANES: correcting for stratified sampling when modeling height with GAMLSS in R

scalable Monte Carlo for Bayesian learning book review (xianblog​.wordpress​.com). Review of scalable Monte Carlo for Bayesian learning, covering stochastic gradient MCMC, non-reversible MCMC, continuous-time MCMC, and convergence diagnostics

Tomorrow's Causal I Workshop at Mixtape Sessions, A Fight I Saw at the Patriots-Steelers Game, and Thoughts About My Pedagogy in My Gov 50 Class at Harvard (causalinf​.substack​.com). Harvard Gov 50 causal inference pedagogy, classroom tools like Cosmos and ChatGPT, and a vivid Patriots-Steelers game anecdote

Why MissForest Fails in Prediction Tasks: A Key Limitation You Need to Keep in Mind (towardsdatascience​.com). MissForest's standard imputation lacks stored models for predictions; MissForestPredict preserves imputation parameters for train/test, MAR/MCAR/MNAR handling, and out-of-time validation

📚 Academic Research

An Interpretable Single-Index Mixed-Effects Model for Non-Gaussian National Survey Data (arxiv:stat). Interpretable single-index mixed-effects model for non-Gaussian survey data with skewed random effects, heavy-tailed residuals, monotone single index, grouped horseshoe, and survey weights in periodontal CAL/PD analysis using MSIMST

Measuring Partial Exchangeability with Reproducing Kernel Hilbert Spaces (arxiv:stat). Measuring partial exchangeability in Bayesian multilevel models via reproducing kernel Hilbert spaces for a priori and posterior dependence

Detecting gene-environment interactions to guide personalized intervention: boosting distributional regression for polygenic scores (arxiv:stat). Cyclical gradient boosting for Gaussian location-scale models to derive sparse polygenic scores for mean and variance, revealing GxE interactions with statins and lifestyle

Improving Disease Risk Estimation in Small Areas by Accounting for Spatiotemporal Local Discontinuities (arxiv:stat). Greedy scan clustering integrated into Bayesian spatiotemporal modelling improves cancer mortality risk estimation in Spanish municipalities
