The R Data Scientist 23-12-2025
Festive ML, next-gen R contributors, and R's return to the top ten languages
🌐 R Community Pulse
R Weekly 2025-W52 Festive ML, next gen R contributors, The Economist style ggplot2 (rweekly.org). This week's R Weekly roundup, covering festive ML, the next generation of R contributors, and The Economist-style charts in ggplot2
Weekly Recap (December 19, 2025) (blog.stephenturner.us). Weekly recap covering R updates, local LLMs, AI in peer review, Biothreat Benchmark Generation, code quality tools, and insights from notable AI researchers and practitioners
rOpenSci News Digest, December 2025 (ropensci.org). rOpenSci shares updates on LatinR, uRos talks, podcast features, coworking, new packages, and peer review activity in December 2025
2025 in Review: Growth, Community, & Momentum (r-consortium.org). Global R community growth in 2025: RUGs, R-Ladies+, R/Medicine, R+AI, governance, and industry/regulatory momentum with ISC funding
Empowering Government Professionals in Nepal Using R programming for Forestry Data Analysis (r-consortium.org). Nepalese government forestry professionals learn data wrangling, visualization, statistics, and geospatial mapping in R through a seven-day training program
2025 Year in Review (rfortherestofus.com). Reflection on 2025: course updates, Clarity Data Studio spin-off, charitable giving, and upcoming AI-enhanced teaching and reporting projects
RSMF: Enabling the Next Generation of Contributors to R (blog.r-project.org). Mentoring a cohort of expert contributors and improving governance, communication, and sustainability for R via RSMF funding and community initiatives
R climbs back up into the top ten programming languages (flowingdata.com). R re-enters the top ten of the TIOBE index, signaling growing interest in statistics and data visualization tools like R and Mathematica
🔓 Open Science & Publishing
Co-Creating the Future of Research Assessment: Highlights from DORA’s RFO Guide Workshop (sfdora.org). DORA hosts a full-day co-creation workshop in Copenhagen to refine the RFO Guide with GRC RRA WG and Science Europe
Weekly digest: AI literacy, open publishing models and equity in OA (openpharma.blog). Practical copyright guidance, AI literacy, transparency in global health research, equity in OA, and a shift to Publish, Review, Curate models
Code Hosting Options Beyond GitHub (ropensci.org). Explores mirroring GitHub repos to Codeberg and GitLab, managing multiple remotes, and keeping primary code locations while reducing platform dependence
🧰 R Performance & Setup
Finally figured out a way to port python packages to R using uv and reticulate: example with nnetsauce (thierrymoudiki.github.io). Using uv and reticulate to port Python nnetsauce into R with examples and benchmarks
R Code Optimization III: Hardware Utilization and Performance (blasbenito.com). Explores vectorization, parallelization, and memory management in R, leveraging SIMD, BLAS/LAPACK, and tools like data.table, Arrow, and DuckDB
R Code Optimization IV: Practical Tools and Workflow (blasbenito.com). Practical profiling with profvis, benchmarking with microbenchmark and bench, and a structured optimization workflow in R
Installing R Packages in Your Own Directories (nas.nasa.gov). Installing R packages in a user directory on NAS x86_64 with R_LIBS and sample steps for xts and zoo packages
A Quack-Packed Fall (motherduck.com). MotherDuck showcases sessions from Big Data London, Small Data SF, and AI-focused events, highlighting DuckDB, serverless Lakehouse ideas, and cost-sensitive analytics with expert speakers
📑 Reporting & Tables
Explore the Pharmaverse Examples: Your Gateway to Clinical Reporting with Open-Source Tools (pharmaverse.github.io). Explore open-source tooling for end-to-end clinical reporting with Pharmaverse examples and step-by-step code, interactive teal apps, and community-driven guidance
Introducing docorator to the pharmaverse (pharmaverse.github.io). docorator neatly decorates gt, ggplot2, and related outputs in R for production-ready PDFs and downstream reuse
What’s New in gt 1.2.0: Better Tables Through Collaboration (posit.co). gt 1.2.0 advances table collaboration with centralized management for R, Python, and cloud environments
Drop #743 (2025-12-22): Monday Afternoon Grab Bag (dailydrop.hrbrmstr.dev). Daff diff tool for tabular data, MDXport converts Markdown to PDFs in-browser, and GrAIphViz structures AI instructions with GraphViz DOT
🗺️ Spatial Data & Maps
App: visualizador de mapas comunales del Censo 2024 por manzanas (bastianoleah.netlify.app). Shiny app in R to map Censo 2024 data by comuna and manzana using arrow-backed datasets
2025 geocompx report: advancing spatial data analysis across languages (geocompx.org). Geocompx reports 2025 milestones across R, Python, Julia; books, blogs, translations, and new visualisation and Julia resources
Into the void (dosull.github.io). Geospatial exploration of Paparoa Track viewsheds using R (terra, sf, tmap) and QGIS data in a guided walk by David O’Sullivan, December 2025
Visualiza datos del Censo 2024 en mapas a nivel de manzana con R (bastianoleah.netlify.app). Visualize Chile's Censo 2024 data at the block (manzana) level in R using ggplot2 and mapgl
Estimated crime rates are ~134% higher in London’s most deprived neighborhoods (stevenponce.netlify.app). London crime gradient across income-deprivation deciles, analyzed in R using per-capita rates and equal-population estimates
📈 Inference & Modeling
Frequently Asked Questions (metafor-project.org). Overview of metafor package for R, validation, funding, usage, and technical details on I2, H2, R2, and Freeman–Tukey transformations
Predicting survival using a super learner and right-censored data (aliceinstatisticsland.wordpress.com). Survival analysis with a super learner using right-censored data in R (survivalSL, flexsurv, glmnet) and methods like randomSRC and survival neural networks
Power analysis – A flexible simulation approach using R (nicolaromano.net). Monte Carlo power analysis in R comparing designs for plant growth using nlme and custom simulations
Corrupción, libertad y por qué debemos ser criteriosos al usar regresión lineal (pacha.dev). A critique of linear regression applied to corruption vs economic freedom, using R (ggplot2, dplyr, readxl) and Cook's distance, in a Spanish-language blog
Construcción de intervalos de confianza para gráficos de calibración vía "bootstrap" y algunos asuntos más (datanalytics.com). Bootstrap confidence intervals for calibration plots, plus related topics in statistics and ML, using R and Python
🤖 Bayesian & ML Notes
Local models are not there (yet) (posit.co). Local models are not there yet; Posit discusses R, Python, Jupyter, and Shiny alongside partnerships and tooling
Machine Learning Powered Naughty List: A Festive Jumping Rivers Story (jumpingrivers.com). Festive ML demo using R and Random Forest to classify 'naughty' team traits with playful features
a (sunny, crisp) day at ICSDS 2025 (xianblog.wordpress.com). Bayesian learning sessions at ICSDS 2025; proper prior minimaxity, variational inference, DIC, AI priors, martingale prediction, and urn-based math discussed by George, Margossian, Christensen, Rockova, Ng, Cappello, Ghiglietti
Good if make prior after data instead of before (dynomight.substack.com). Explores Bayesian priors, data-driven categories, and infinite possibilities using aliens as a thought experiment
Elo rating systems via Markov Chains (xianblog.wordpress.com). Explores Elo ratings via Markov Chains, Bradley–Terry–Luce models, spectral gap optimization, SGD updates, and Bayesian ranking discussions
New ZeMKI Working Paper on Longitudinal Social Media Engagement (nicolarighetti.net). Longitudinal Bayesian multilevel analysis of anger-driven climate-skeptic propagation on Facebook during the 2021 German election
📚 Academic Research
Inference for high dimensional repeated measure designs with the R package hdrm (arxiv:stat). hdrm adds high-dimensional repeated-measures mean tests to R, using unbiased trace estimators, subsampling, and Pearson-type approximations. Useful for EEG/omics when d≈N, delivering practical split-plot inference
Deep Gaussian Processes with Gradients (arxiv:stat). Introduces deep Gaussian processes that incorporate gradient observations, improving nonstationary surrogate modeling. Provides CRAN deepgp code with Vecchia scaling for faster Bayesian inference in R
Hazard-based distributional regression via ordinary differential equations (arxiv:stat). Models survival hazards via autonomous ODE systems, letting covariates change hazard shapes beyond proportional hazards. Bayesian computation and asymptotics aid flexible, interpretable inference for trials
Bayesian Markov-Switching Partial Reduced-Rank Regression (arxiv:stat). Bayesian Markov-switching partial reduced-rank regression mixes low-rank linear and GP components, learning groups and rank over time. Useful for multivariate time-series forecasting with uncertainty quantification
Enhancing Line Density Plots with Outlier Control and Bin-based Illumination (arxiv:cs). Proposes bin-based illumination for line density plots, separating structure from density to preserve paths and highlight outliers. Inspires better trajectory visualizations in R at scale