The R Data Scientist 30-12-2025
🧵 R Ecosystem Radar
Weekly Recap (December 24, 2025) (blog​.stephenturner​.us). R updates, AI in wet lab biology, Docker hardened images, LLMs in review, funding debates, and new papers from Stephen Turner and peers
Drop #746 (2025-12-26): Boxing Day Grab Bag (dailydrop​.hrbrmstr​.dev). mq, UBLOCKAI, and Friendly SQL showcased for Markdown processing, AI content blocking, and DuckDB data tricks
🧰 R Package Engineering
revdeprun 2.1.0: hunting bottlenecks and a new speedrun record (nanx​.me). Revdeprun 2.1.0 speeds up reverse dependency checks with an optimized install scheduler, parallel tarball downloads, and streamlined prep for data.table on a 256-core machine using Rust and R
R Package Development Advent Calendar 2025: A Complete Journey (drmowinckels​.io). Modern R package development walkthrough using usethis, devtools, GitHub Actions, pkgdown, and testthat for CI/CD and CRAN workflows
Creating an R package with C++ and Armadillo code (video) (pacha​.dev). Create an R package with C++ and Armadillo using armadillo4r; covers setting up Rtools, templates, and a workflow for Windows users
🧪 Hands-on R Analysis
Railway population (r​.iresmi​.net). Using R with sf, osmdata, and ggplot2 to map railway corridors and population exposure in France
Implementation of DBSCAN Clustering in R (jmsallan​.netlify​.app). DBSCAN clustering in R with dbscan and tidyverse, exploring core/border/noise points on irregular shapes
sfReapportion (f​.briatte​.org). sfReapportion enables areal-weighted interpolation for sf and sp objects in R, porting spReapportion to sf and enabling reproducible French census data mapping
Understanding Data Import and Export in R: Working with CSV and Excel Files (mfatihtuzen​.netlify​.app). Tutorial on importing and exporting CSV and Excel files in R, using the tips dataset, read.table, read.csv, openxlsx, and related workflows
🎲 Stats & Inference
A problem with correlations: rare traits usually have small correlations to other things, just by virtue of being rare (spencergreenberg​.com). Rare traits skew correlations; introduces Generalized Cohen’s d and practical guidance for binary and non-binary variables
The Raven Paradox (allendowney​.com). Bayesian analysis of the Raven Paradox covering scenarios, priors, and sampling ambiguity, worked through in Python
Two Ways to See Abadie-Imbens Bias Correction (And Why It Might Matter) (causalinf​.substack​.com). Scott Cunningham explains Abadie-Imbens bias correction for nearest-neighbor matching, showing imputation and augmentation are equivalent
📚 Academic Research
Learning from Neighbors with PHIBP: Predicting Infectious Disease Dynamics in Data-Sparse Environments (arxiv:stat). Details Poisson Hierarchical Indian Buffet Process for sparse count prediction, borrowing strength across regions to forecast outbreaks with coherent uncertainty; relevant for Bayesian R pipelines
Ranked Set Sampling in Survival Analysis (arxiv:stat). Extends Kaplan–Meier and Nelson–Aalen estimators to ranked set sampling with censoring, derives asymptotics and variance estimators; promises efficiency gains and R package for survival analysis