The R Data Scientist 18-11-2025
Posit's 2025 contest winners, AI+R, new packages and more
📄 Community & Roundups
Winners of Posit’s 2025 Table Contest (wjschne.github.io). Highlights Martin Stavro’s interactive fleet-characteristics app and a Best Tutorial prize for a 24-part recreation of the APA Manual
Recapping posit::conf 2025 (tshafer.com). Tom Shafer reflects on posit::conf 2025, a practitioner-focused Posit/RStudio conference held in Atlanta
Get Involved in the Data Science Community at our Free Meetups (jumpingrivers.com). Jumping Rivers hosts free in-person data science meetups in Newcastle and Leeds, with talks on LLMs, forecasting, and MLOps, plus R and Python workshops
Weekly Recap (Nov 14, 2025) (blog.stephenturner.us). Weekly recap covers posit::conf(2025), Nextflow Summit talks, AI reviews, R updates, Python for R, and new papers
R Weekly 2025-W47 Plots, Tables, Pipes (rweekly.org). Weekly digest on R plots, tables, and pipes with updates from R Core, new packages, events, and community highlights
🤖 AI & LLMs in R
Keeping LLMs in Their Lane: Focused AI for Data Science and Research (r-consortium.org). Explores responsible, focused use of LLMs in data science with R, using ellmer and Posit’s Databot with an emphasis on correctness, transparency, and reproducibility
Me, Myself, and Claude: Scaling an R Consultancy with AI-Assisted Development (r-consortium.org). Jasmine Daly describes scaling an R consultancy with AI-assisted development, turning boilerplate into packages and dashboards with Claude Code and Shiny
When plotting, LLMs see what they expect to see (posit.co). Posit shows that LLMs interpreting plots tend to see what they expect to see rather than what the data actually show
🛠️ R Packages & Dev
readtextgrid now uses C++ (and ChatGPT helped) (tjmahr.github.io). The readtextgrid package moves its parser for Praat TextGrid files to C++, with notes on R integration and on LLM-assisted programming
testthat 3.3.0 (tidyverse.org). Hadley Wickham announces testthat 3.3.0 with R 4.1 support, mocking lifecycle changes, improved expectations, and new features for testing in R
unifiedml: A Unified Machine Learning Interface for R, is now on CRAN + Discussion about AI replacing humans (thierrymoudiki.github.io). unifiedml delivers a unified R interface for ML algorithms, with automatic task detection, cross-validation, and model interpretation
side::kick(), a coding agent for RStudio (simonpcouch.com). Simon Couch introduces side::kick(), an open-source coding agent for RStudio, built in R, that works with files and the active R session
utf8ify your text! (rolkra.github.io). Shows how to use an R package to format text with UTF-8 characters and typographic tricks
How to Make High-Quality PDFs with Quarto and Typst (rfortherestofus.com). Brings Quarto and Typst together to produce high-quality PDFs with custom Typst templates and branding
🔧 Data Wrangling & Ops
Best Practices for Cleaning Data in R (eringrand.github.io). Best practices for cleaning, deduplicating, and validating educational datasets in R with janitor, dplyr, and assertr
How to access HomeAssistant's InfluxDB from R (rstats-tips.net). Uses R to query HomeAssistant's InfluxDB v1 through its API with influxdbr, httr, and jsonlite, then wrangles the data with the tidyverse
How to deploy a Shiny app for production (pacha.dev). A guide to deploying golem-packaged Shiny apps to production with Shiny Server, Kamatera, AWS, and Let’s Encrypt
🧭 Geospatial & Mapping
Graph Neural Nets for Spatial Data Science (josiahparry.com). Connects spatial lags, graph convolutional networks (GCNs), and SLX models using R with igraph, sfdep, spdep, dplyr, ggplot2, and torchgnn
30 Day Map Challenge 2025 (dosull.github.io). Renders Tanaka-style illuminated contours from Mars DEMs in R (sf, terra, dplyr, stars, ggplot2), with the metR and tanaka packages
Spatial autocorrelation: what’s the problem? (dosull.github.io). Uses different sampling schemes and R tools (terra, spatstat) to show how spatial autocorrelation affects estimates of the mean
Creating a London Population Map with D3po (pacha.dev). Builds a London population map in R with D3po, using sf and rvest for the data and po_geomap() and po_labels() for the visualization
2125 (r.iresmi.net). Pyrenean glaciers in 2125 visualized with R: dplyr, ggplot2, sf, elevatr, terra, rnaturalearth, osmdata, ggrepel
Portafolio: mapas de Áreas Metropolitanas ["Portfolio: Metropolitan Area Maps"] (bastianoleah.netlify.app). Maps of Chile's metropolitan areas made with R scripts, showing regional proposals and related statistics
📊 Stats & Inference
Which variables to control for, and why (pedermisager.org). Explains which variables to control for in causal inference, using DAGs to distinguish confounders, colliders, and mediators, and discusses practical limits of adjustment and RCT alternatives
ROC Curves in Two Lines of Code (rworks.dev). R Horton explains ROC curves with R code, scoring with logistic regression, and intuition from turtle graphics
Where Are Fisher, Neyman, Pearson in 1919? Opening of Excursion 3, snippets from 3.1 (errorstatistics.com). Discusses the 1919 eclipse tests of the general theory of relativity (GTR), Popperian severe testing, and where Fisher, Neyman, and Pearson stood at the time, as Excursion 3 opens
Modeling approaches in meta-analysis: from sandwich estimators to correlated hierarchical models (methodsblog.com). Reviews how to handle dependence in ecological and evolutionary meta-analyses, from cluster-robust variance estimation (CRVE) to multilevel and phylogenetic models, with R-based approaches
Approximate Bayesian Computation with Statistical Distances for Model Selection [OWABI, 27 Nov] (xianblog.wordpress.com). Announces Clara Grazian's talk on approximate Bayesian computation (ABC) with statistical distances for model selection, covering full-data approaches and simulated toad movement models
Two notes after wrapping up some writing projects this week (blog.miljko.org). Covers quick estimation of 95% CIs for event rates when no events are observed, and extracting references for Zotero or Mendeley
🔎 Data Case Studies
Hairy Football Challenge (datannery.com). Explores the mean time between winning streaks in football using the R tidyverse, tarchetypes, sliding windows, and data stored in SQLite
Emmanuel Clase and Luis Ortiz Were Just a Little Too Obvious About Rigging Pitches (conormclaughlin.net). A data-driven look at pitch rigging using Statcast data, local outlier factor (LOF) analysis in R, and 3D release-point visuals
Choose Your Fighter: data-driven selection of the best marathon (quantixed.org). Stephen Royle chooses a marathon using elevation and weather data, with GPX tools in R (ggplot2, dplyr, openmeteo)
Using R/anomalize to identify delays in games of Australian Rules football (nsaunders.wordpress.com). Covers data scraping, EDA, and anomaly detection with anomalize to find delays in AFL games
The Sherlock Holmes Canon Thematic Word Networks (stevenponce.netlify.app). Uses R, tidytext, ggraph, and TF-IDF to find distinctive dialogue words across 15 Sherlock Holmes stories
📚 Academic Research
Diagnostics for Semiparametric Accelerated Failure Time Models with R Package afttest (arxiv:stat). Introduces the afttest R package, which implements diagnostic tests for semiparametric AFT models (rank-based and least-squares) using a multiplier bootstrap and graphical tools. Useful for survival-analysis practitioners working in R
A tutorial for propensity score weighting methods under violations of the positivity assumption (arxiv:stat). A comprehensive tutorial and the ChiPS R package for propensity score (PS) weighting under positivity violations, guiding estimand selection, implementation, and diagnostics, with simulations and case studies. Useful for causal inference in R
rfBLT: Random Feature Bayesian Lasso Takens Model for time series forecasting (arxiv:stat). Proposes the rfBLT R package, which combines Takens embeddings, random features, and the Bayesian Lasso for probabilistic time-series forecasting. It provides credible intervals and reports strong performance on real data. Useful for time-series modeling in R