
The R Data Scientist

November 18, 2025

The R Data Scientist 18-11-2025

Posit's 2025 contest winners, AI+R, new packages and more

📄 Community & Roundups

Winners of Posit’s 2025 Table Contest (wjschne.github.io). Highlights Martin Stavro’s interactive fleet-characteristics app and the Best Tutorial prize for a 24-part APA Manual recreation

Recapping posit::conf 2025 (tshafer.com). Tom Shafer reflects on posit::conf 2025, the practitioner-focused Posit/RStudio conference held in Atlanta

Get Involved in the Data Science Community at our Free Meetups (jumpingrivers​.com). Free in-person data science meetups in Newcastle and Leeds with talks on LLMs, forecasting, MLOps, and R/Python workshops

Weekly Recap (Nov 14, 2025) (blog​.stephenturner​.us). Weekly recap covers posit::conf(2025), Nextflow Summit talks, AI reviews, R updates, Python for R, and new papers

R Weekly 2025-W47 Plots, Tables, Pipes (rweekly​.org). Weekly digest on R plots, tables, and pipes with updates from R Core, new packages, events, and community highlights

🤖 AI & LLMs in R

Keeping LLMs in Their Lane: Focused AI for Data Science and Research (r-consortium.org). Explores responsible, focused use of LLMs in data science with R, ellmer, and Posit’s Databot, with an emphasis on correctness, transparency, and reproducibility

Me, Myself, and Claude: Scaling an R Consultancy with AI-Assisted Development (r-consortium.org). Jasmine Daly describes how AI-assisted development with Claude Code and Shiny scales an R consultancy, turning boilerplate into packages and dashboards

When plotting, LLMs see what they expect to see (posit.co). Posit examines how LLMs interpreting plots tend to report what they expect to see rather than what the plot shows

🛠️ R Packages & Dev

readtextgrid now uses C++ (and ChatGPT helped) (tjmahr.github.io). The readtextgrid package now parses TextGrid files in C++, with notes on R integration and insights on LLM-assisted programming

testthat 3.3.0 (tidyverse.org). Hadley Wickham announces testthat 3.3.0, which now requires R 4.1 or later and brings mocking lifecycle changes, improved expectations, and new features for testing in R

unifiedml: A Unified Machine Learning Interface for R, is now on CRAN + Discussion about AI replacing humans (thierrymoudiki​.github​.io). unifiedml delivers a unified R interface for ML algorithms, with automatic task detection, cross-validation, and model interpretation

side::kick(), a coding agent for RStudio (simonpcouch.com). side::kick() is an open-source coding agent for RStudio, built in R, that can interact with files and the active R session

utf8ify your text! (rolkra.github.io). Shows how to use an R package to format text with UTF-8 characters and typography tricks

How to Make High-Quality PDFs with Quarto and Typst (rfortherestofus​.com). Brings Quarto and Typst together to produce high-quality PDFs with custom Typst templates and branding

🔧 Data Wrangling & Ops

Best Practices for Cleaning Data in R (eringrand.github.io). Best practices for data cleaning, deduplication, and validation in R, using janitor, dplyr, and assertr on educational datasets

How to access HomeAssistant's InfluxDB from R (rstats-tips​.net). Using R to access HomeAssistant's InfluxDB v1 with influxdbr, httr, and jsonlite for API queries and data wrangling in tidyverse

How to deploy a Shiny app for production (pacha.dev). A guide to deploying Shiny apps to production with Shiny Server, Kamatera, AWS, and Let’s Encrypt, using golem-based packaging

🧭 Geospatial & Mapping

Graph Neural Nets for Spatial Data Science (josiahparry.com). Uses R with igraph, sfdep, spdep, dplyr, ggplot2, and torchgnn to connect spatial lags, GCNs, and SLX modeling

30 Day Map Challenge 2025 (dosull​.github​.io). Explores Tanaka illuminated contours using R (sf, terra, dplyr, stars, ggplot2) and Mars DEMs, with metR and tanaka packages

Spatial autocorrelation: what’s the problem? (dosull​.github​.io). Explores spatial autocorrelation, sampling schemes, and R/terra/spatstat tools to show how autocorrelation affects mean estimates

Creating a London Population Map with D3po (pacha.dev). Builds a London population map with d3po in R, using sf and rvest for the data and po_geomap() and po_labels() for the visualization

2125 (r​.iresmi​.net). Pyrenean glaciers in 2125 visualized with R: dplyr, ggplot2, sf, elevatr, terra, rnaturalearth, osmdata, ggrepel

Portafolio: mapas de Áreas Metropolitanas (bastianoleah​.netlify​.app). Maps of Metropolitan Areas in Chile created with R scripts, showing regional proposals and related statistics

📊 Stats & Inference

Which variables to control for, and why (pedermisager​.org). Explains which variables to control for in causal inference, using DAGs, confounders, colliders, mediators, and practical limits with RCT alternatives

ROC Curves in Two Lines of Code (rworks​.dev). R Horton explains ROC curves with R code, logistic regression scoring, and turtle graphics intuition

Where Are Fisher, Neyman, Pearson in 1919? Opening of Excursion 3, snippets from 3.1 (errorstatistics.com). Discusses the 1919 eclipse tests of the general theory of relativity (GTR), Popperian severe testing, and the historical figures Fisher, Neyman, and Pearson, within Excursion 3

Modeling approaches in meta-analysis: from sandwich estimators to correlated hierarchical models (methodsblog.com). Meta-analysis modeling under dependence: cluster-robust variance estimation (CRVE), multilevel models, and phylogenetic models in ecology and evolution, using R-like approaches

Approximate Bayesian Computation with Statistical Distances for Model Selection [OWABI, 27 Nov] (xianblog​.wordpress​.com). Clara Grazian discusses ABC with statistical distances for model selection, using full-data approaches and simulated toad movement models

Two notes after wrapping up some writing projects this week (blog​.miljko​.org). Quick estimation of 95% CIs for event rates with no events; reference extraction for Zotero/Mendeley

🔎 Data Case Studies

Hairy Football Challenge (datannery.com). Explores the mean time between winning streaks in football using the R tidyverse, tarchetypes, sliding windows, and SQLite data

Emmanuel Clase and Luis Ortiz Were Just a Little Too Obvious About Rigging Pitches (conormclaughlin​.net). A data-driven look at pitch rigging using Statcast data, LOF analysis in R, and 3D release-point visuals

Choose Your Fighter: data-driven selection of the best marathon (quantixed.org). Stephen Royle makes a data-driven marathon choice using elevation, weather, and GPX tools in R (ggplot2, dplyr, openmeteo)

Using R/anomalize to identify delays in games of Australian Rules football (nsaunders.wordpress.com). Identifies delays in AFL games through data scraping, EDA, and anomaly detection with anomalize

The Sherlock Holmes Canon Thematic Word Networks (stevenponce​.netlify​.app). R, tidytext, and ggraph explore 15 Sherlock Holmes stories with TF-IDF for distinctive dialogue words

📚 Academic Research

Diagnostics for Semiparametric Accelerated Failure Time Models with R Package afttest (arxiv:stat). Introduces afttest R package implementing diagnostic tests for semiparametric AFT models (rank-based/least-squares) with multiplier bootstrap and graphical tools. Vital for survival-analysis practitioners using R

A tutorial for propensity score weighting methods under violations of the positivity assumption (arxiv:stat). Comprehensive tutorial and ChiPS R package for PS weighting under positivity violations; guides estimand selection, implementation, diagnostics, simulations, and case studies. Essential for causal inference in R

rfBLT: Random Feature Bayesian Lasso Takens Model for time series forecasting (arxiv:stat). Proposes rfBLT R package combining Takens embeddings, random features, and Bayesian Lasso for probabilistic time-series forecasting. Offers credible intervals and strong real-data performance—useful for R time-series modeling
