The R Data Scientist 06-01-2026
2025 wraps, data infrastructures, mapping trends, fitness data
🌍 Open Science & Community
2025, With Notes (yabellini.netlify.app). Reflections on 2025 across travel, speaking, teaching, writing, and community building in open science with rOpenSci, The Carpentries, R-Ladies, and LatinR
on the future of ISBA meetings (and others) (xianblog.wordpress.com). Discusses the future of ISBA conferences: mirror sites, multi-hub formats, sustainability, inclusion, a survey, lessons from Venezia 2024, and researchers unable to travel
Q&A with ASA Founder Mark Glickman (magazine.amstat.org). An interview with Mark Glickman on statistics, teaching, and the open-source tools he uses for data analysis and measurement
An Open Letter to the BMJ Editorial Board (deevybee.blogspot.com). An open letter urging the BMJ to retract the PREVENT-TAHA8 paper by Attar et al. over data concerns and alleged fabrication
🔄 Interop & Data Infra
R + Python: From polyglot to cross-pollination (emilyriederer.com). Diversity in open source, cross-pollination between R and Python tooling, and sessions from posit::conf(2025) with Rich Iannone, Michael Chow, and others
Drop #749 (2025-12-31): 2025 — Dropped (dailydrop.hrbrmstr.dev). A year-end retrospective on 2025 covering DuckDB, CLI tools, fonts, RSS, and data wrangling from a year of digital hoarding
pytest-r-snapshot: Verifying Python code against R outputs at scale (nanx.me). pytest-r-snapshot enables Python test snapshots against R outputs, recording with R and replaying in CI across environments
🧰 R Tooling & Publishing
Paired Ends Wrapped: Top 10 Posts From 2025 (blog.stephenturner.us). Stephen Turner's top 10 posts of 2025, covering R, AI, RAG with Zotero, Quarto books, and Positron tooling
Testing the R-universe build workflow from your own GitHub repository (ropensci.org). How to run the R-universe build workflow from your own GitHub repository via a reusable CI workflow that mirrors R-universe builds on Linux, Windows, and macOS
Weekly Recap (January 2, 2026) (blog.stephenturner.us). The NSF reorganization, genomics in 2026, a Claude Code course, the AI labor shift, uv's speed, and a 2025 LLM recap, plus mentions of R+Python and The R Data Scientist
Multiplet Function Now Handles I > 1/2 (chemospec.org). Bryan Hanson updates the multiplet function in the SpecHelpers R package to support nuclei with spin greater than 1/2
ME:: tl;dr-ing: How to make a static website on Codeberg using R and Quarto; Ken Butler:: Quarto websites and Codeberg pages (part 1) (rolandtanglao.com). How to publish a static Quarto site on Codeberg Pages using R and Quarto, with guidance for beginners
🎨 Data Viz Craft
Learning data viz from the best: New America and Datawrapper (danielroelfs.com). Learning data-viz techniques from New America and Datawrapper charts, applied with ggplot2 and the tidyverse in R
60% started Geralt’s journey. Only 22% finished it. (stevenponce.netlify.app). Steven Ponce uses R, ggplot2, and custom themes to visualize player progression through The Witcher 3 from Steam achievement data
Interesting thoughts about aesthetics aes() in ggplot2 (joshuamarie.com). Explores aes() in ggplot2 and the challenges of implementing it, and compares it with Python’s plotnine from an R-centric perspective
Data Strips: Quintiles vs. Box Plots (rawdatastudies.com). Quintile area strip plots and related views are compared with box plots for revealing skewness and data density in large biological datasets
What is the most “middle” name? (erdavis.com). Analyzes the most common middle names in voter data, with cleaning by name grouping and processing in R
🗺️ Mapping & Place Data
Interactive Dashboard for Mapping Police Violence Data (ianadamsresearch.com). An interactive Shiny dashboard in R for Mapping Police Violence data, exploring temporal, demographic, and geographic patterns
Analyzing Police Violence in America: Updated Data Through 2025 (ianadamsresearch.com). Mapping Police Violence (MPV) data updated through 2025 and analyzed in R, showing demographic, temporal, and geographic patterns in police violence
Visualizing the Los Angeles Microclimate (conormclaughlin.net). A data-driven look at LA microclimates using Open-Meteo hourly data and R visualizations
New Caledonia's nickel exports (freerangestats.info). Tracking New Caledonia nickel exports (ore and metal) 2008–2025 with R, dplyr, readxl, and FRED nickel prices
🏃 Fitness Data in R
Running Around: an R package to analyse Garmin running data (quantixed.org). The GarminCSVr R package analyzes Garmin activity data, offering annual summaries and year-over-year comparisons
Mapping runkeeper data (blog.djnavarro.net). Uses R to parse Runkeeper GPX files and map the routes with leaflet and Stadia Maps
Running Around: 2025 running dataviz in R (quantixed.org). Stephen Royle uses R to visualise and recap 2025 running data from Garmin, exploring distances, training load and marathons
📐 Stats & Modeling Notes
Testing Super Learner's Coverage - A Note To Myself (kenkoonwong.com). Examines the coverage of SuperLearner with TMLE in R, using XGBoost, random forest, and GLM learners, NNLS ensemble weighting, and parallel computation
Forecasting benchmark: Dynrmf (a new serious competitor in town) vs Theta Method on M-Competitions and Tourism competition (thierrymoudiki.github.io). A benchmarking study comparing Dynrmf and the Theta method on the M3, M1, and Tourism datasets using R, parallel processing, and standard accuracy metrics
Why does a least squares fit appear to have a bias when applied to simple data? (stats.stackexchange.com). Explains why ordinary least squares can appear biased on bivariate data and contrasts it with total least squares (TLS), PCA, and orthogonal regression
📚 Academic Research
friends.test: rank-based method for feature selection in interaction matrices (arxiv:q-bio). friends.test detects specific interactions in large heterogeneous matrices using rank profiles and mixture breakpoints. A fast O(nk log n) R implementation supports feature selection in omics data
Benchmarking Preprocessing and Integration Methods in Single-Cell Genomics (arxiv:q-bio). Benchmark compares normalization, integration, and dimensionality-reduction pipelines for multimodal single-cell data. Highlights the trade-offs of Seurat/Harmony with UMAP, offering guidance for reproducible R workflows and visualization at scale
Matrix Decomposition-Based Approach to Estimate the STARTS Model (arxiv:stat). A new two-stage, eigenvalue-based estimation method for STARTS structural equation models reduces improper solutions. Useful for longitudinal SEM in R, guiding sensitivity analyses without specifying Bayesian priors
Robust reduced rank regression under heavy-tailed noise and missing data via non-convex penalization (arxiv:stat). Robust reduced-rank regression with Huber loss and SCAD/MCP spectral penalties handles outliers and missing responses. Comes with the rrpackrobust R package for more accurate high-dimensional multivariate prediction