The R Data Scientist

January 6, 2026

The R Data Scientist 06-01-2026

2025 wraps, data infrastructures, mapping trends, fitness data

🌍 Open Science & Community

2025, With Notes (yabellini​.netlify​.app). Reflections on 2025 across travel, speaking, teaching, writing, and community building in open science with rOpenSci, The Carpentries, R-Ladies, and LatinR

on the future of ISBA meetings (and others) (xianblog​.wordpress​.com). The future of ISBA conferences: mirror sites, multi-hub formats, sustainability, inclusion, a survey, Venezia 2024, and researchers unable to travel

Q&A with ASA Founder Mark Glickman (magazine​.amstat​.org). Q&A with ASA founder Mark Glickman discusses statistics, teaching, and open-source tools used in data analysis and measurement

An Open Letter to the BMJ Editorial Board (deevybee​.blogspot​.com). Open letter urging the BMJ to retract Attar et al.'s PREVENT-TAHA8 study over data concerns and alleged fabrication

🔄 Interop & Data Infra

R + Python: From polyglot to crosspolination (emilyriederer​.com). Diversity in open source, cross-pollination of R and Python tooling, and sessions from posit::conf(2025) with Rich Iannone, Michael Chow, and others

Drop #749 (2025-12-31): 2025 — Dropped (dailydrop​.hrbrmstr​.dev). Year-end retrospective on 2025: DuckDB, CLI tools, fonts, RSS, and data wrangling in a digitally hoarded year

pytest-r-snapshot: Verifying Python code against R outputs at scale (nanx​.me). pytest-r-snapshot enables Python test snapshots against R outputs, recording with R and replaying in CI across environments

🧰 R Tooling & Publishing

Paired Ends Wrapped: Top 10 Posts From 2025 (blog​.stephenturner​.us). Top 10 2025 posts on R, AI, RAG with Zotero, Quarto books, and Positron tooling by Stephen Turner

Testing the R-universe build workflow from your own GitHub repository (ropensci​.org). Discusses testing the R-universe build workflow from a GitHub repo using a reusable CI workflow to mirror R-universe on Linux, Windows, and macOS

Weekly Recap (January 2, 2026) (blog​.stephenturner​.us). NSF reorg, Genomics in 2026, Claude Code course, AI labor shift, uv speed, and 2025 LLM recap across R+Python and R Data Scientist

Multiplet Function Now Handles I > 1/2 (chemospec​.org). Bryan Hanson updates the Multiplet function in R to support nuclei with spin greater than 1/2 using SpecHelpers

ME:: tl;dr-ing: How to make a static website on Codeberg using R and Quarto ; Ken Butler:: Quarto websites and Codeberg pages (part 1) (rolandtanglao​.com). How to publish a static Quarto site on Codeberg Pages using R and Quarto, with guidance for beginners

🎨 Data Viz Craft

Learning data viz from the best: New America and Datawrapper (danielroelfs​.com). Learning data viz with New America and Datawrapper using ggplot2 and tidyverse in R

60% started Geralt’s journey. Only 22% finished it. (stevenponce​.netlify​.app). A data-visualization post by Steven Ponce using R, ggplot2, and custom themes to display Witcher 3 progression via Steam achievements

Interesting thoughts about aesthetics aes() in ggplot2 (joshuamarie​.com). Explores aes() in ggplot2, implementation challenges, and Python’s plotnine comparison from an R-centric perspective

Data Strips: Quintiles vs. Box Plots (rawdatastudies​.com). Compares quintile area strip plots and related views with box plots to reveal skewness and data density in large biological datasets

What is the most “middle” name? (erdavis​.com). Analyzes most common middle names using voter data, cleaning with name grouping and R-based processing

🗺️ Mapping & Place Data

Interactive Dashboard for Mapping Police Violence Data (ianadamsresearch​.com). An interactive Shiny dashboard in R for Mapping Police Violence data, exploring temporal, demographic, and geographic patterns

Analyzing Police Violence in America: Updated Data Through 2025 (ianadamsresearch​.com). Updated MPV data through 2025 analyzed with R, showing demographic, temporal, and geographic patterns in police violence

Visualizing the Los Angeles Microclimate (conormclaughlin​.net). A data-driven look at LA microclimates using Open-Meteo hourly data and R visualizations

New Caledonia's nickel exports (freerangestats​.info). Tracking New Caledonia nickel exports (ore and metal) 2008–2025 with R, dplyr, readxl, and FRED nickel prices

🏃 Fitness Data in R

Running Around: an R package to analyse Garmin running data (quantixed​.org). R package GarminCSVr analyzes Garmin activity data with R, offering annual summaries and year-over-year comparisons

Mapping runkeeper data (blog​.djnavarro​.net). Uses R to parse GPX files from Runkeeper and make maps with leaflet and Stadia Maps

Running Around: 2025 running dataviz in R (quantixed​.org). Stephen Royle uses R to visualise and recap 2025 running data from Garmin, exploring distances, training load and marathons

📐 Stats & Modeling Notes

Testing Super Learner's Coverage - A Note To Myself (kenkoonwong​.com). Explores SuperLearner with TMLE in R, comparing XGBoost, Random Forest, GLM, NNLS, and parallel computation

Forecasting benchmark: Dynrmf (a new serious competitor in town) vs Theta Method on M-Competitions and Tourism competition (thierrymoudiki​.github​.io). A benchmarking study comparing Dynrmf and the Theta Method on the M3, M1, and Tourism datasets using R, parallel processing, and standard accuracy metrics

Why does a least squares fit appear to have a bias when applied to simple data? (stats​.stackexchange​.com). Explains why ordinary least squares can appear biased on bivariate data and contrasts with TLS, PCA, and orthogonal regression

📚 Academic Research

friends.test: rank-based method for feature selection in interaction matrices (arxiv:q-bio). friends.test detects specific interactions in large heterogeneous matrices using rank profiles and mixture breakpoints. Fast O(nk log n) R implementation aids omics feature selection analysis

Benchmarking Preprocessing and Integration Methods in Single-Cell Genomics (arxiv:q-bio). Benchmark compares normalization, integration, and dimensionality-reduction pipelines for multimodal single-cell data. Highlights Seurat/Harmony plus UMAP tradeoffs, guiding reproducible R workflows and visualization at scale

Matrix Decomposition-Based Approach to Estimate the STARTS Model (arxiv:stat). New two-stage eigenvalue-based estimation for STARTS structural equation models reduces improper solutions. Useful for longitudinal SEM in R, guiding sensitivity analyses without specifying Bayesian priors

Robust reduced rank regression under heavy-tailed noise and missing data via non-convex penalization (arxiv:stat). Robust reduced-rank regression with Huber loss and SCAD/MCP spectral penalties handles outliers and missing responses. Includes the rrpackrobust R package for more accurate high-dimensional multivariate prediction