The R Data Scientist

January 21, 2026

The R Data Scientist 21-01-2026

🌐 Community & Roundups

2025 Posit Open Source Year in Review (posit​.co). Posit Open Source 2025 highlights: speed, AI features, R and Python tooling, Shiny apps, OpenTelemetry, and community partnerships across RStudio, Jupyter, and VS Code

Weekly Recap (January 16, 2026) (blog​.stephenturner​.us). Weekly recap covering AIxBio, biotechnology, Claude Code, Positron tips, cell fate engineering, R updates, RAG in R, and recent papers and preprints

AI Newsletter 2026-01-16 (posit​.co). Posit shares AI news and tools updates, including Claude Code, Opus 4.5, LLM harnesses, Databricks integration, and RStudio/Jupyter support

Social Coworking and Office Hours - Let it go! (ropensci​.org). Social Coworking and Office Hours in January focusing on 'Let it go!' with Yanina Bellini Saibene and Steffi LaZerte and Zoom access details

Will AI chatbots put an end to open source software? (kucharski​.substack​.com). AI chatbots, open-source funding, and the risk of losing human attention and documentation behind paywalls

💊 Pharma R Ecosystem

Pharmaverse and Containers (pharmaverse​.github​.io). Pharmaverse container image for R with 40+ packages speeds up GitHub Actions publishing workflow

admiral 1.4 release (pharmaverse​.github​.io). The admiral 1.4 release introduces AI integration, experimental functions for time-point analysis, and an anti-drug antibody template in R

Open Pharma Day in six big ideas (openpharma​.blog). Six ideas on truth, transparency and trust in medical research publishing, featuring Tracey Brown, Helen Pearson, Richard Smith and AI-enabled tools like DKGs

🧰 R Tooling & Workflow

Setting Up A Cluster of Tiny PCs For Parallel Computing - A Note To Myself (kenkoonwong​.com). Setting up an Ubuntu cluster of tiny PCs with passwordless SSH, automated R package installs, and parallel simulations using multicore futures and TMLE in R

CodeSOD: A Pirate's Confession (thedailywtf​.com). Confessional look at R code gotchas: looping, NA handling, assignment operators, and the arrr-themed joke

How to make your data analysis life easier using Positron, Raycast, and Espanso (andrewheiss​.com). Data science workflow with Positron, Raycast, Espanso; Visual Studio Code, Quarto, R, Python; Andrew Heiss shares setup for Posit’s Data Science Lab

Announcing {typeR}: simulate live typing of R scripts (federicagazzelloni​.com). Announcing the typeR package to simulate live typing of R scripts for teaching, demos, and video recordings

Version control your data, not just your code, with the pins library for R and Python (dabblingwithdata​.amedcalf​.com). Version control data with the pins library in R and Python, enabling versioned pins to share, read, and prune datasets

Apache Arrow ADBC Database Drivers (confessionsofadataguy​.com). Apache Arrow ADBC enables end-to-end Arrow-based database connectivity, reducing serialization by moving RecordBatches between applications and databases

Document-Type Dispatching in Quarto Typst Extensions (mickael​.canouil​.fr). Mickaël Canouil explains a Typst-based document-type dispatcher pattern for Quarto extensions using multiple document types (report, letter, cv) and YAML bridging

🗺️ Spatial & Visualisation

Getting Over It (datannery​.com). Visualising a Transalp bike ride with R, tidyverse, terra, elevatr, patchwork, tarchetypes, geotargets, sf, and related tools

Spatial data science languages (spatialists​.ch). Common challenges and cross-language recommendations for R, Python, and Julia in spatial data science across tools like sf, GeoPandas, GeoParquet, GeoArrow

How Far Does APOD Take Us? (stevenponce​.netlify​.app). A data viz in R exploring APOD distances using tidyverse, tidytuesdayR, ggplot2, and custom utilities by Steven Ponce

From field trip to first paper: the colorful arable fields of Lemnos, Greece (blog​.pensoft​.net). Field trip leads to first paper on Lemnos arable fields using Vegetation Classification and Survey, with R for data analysis and collaboration from Bergmeier, Meyer, and Rinne

📈 Inference & Bayesian

Studying social transmission using STbayes (methodsblog​.com). NBDA and STbayes for social transmission analysis using Stan, R, dynamic networks, and Bayesian modeling in animal culture (Chimento, Hoppitt, Farine, Wild)

Version 1.4.0 of NIMBLE released, plus new quadrature-based functionality in nimbleQuad package (r-nimble​.org). NIMBLE 1.4.0 released with nimbleQuad quadrature support and macro enhancements for models

How to interpret hazard ratios (thestatsgeek​.com). Hazard ratios, frailty, and causal interpretation in Cox models explored with a simple example and discussion by Jonathan Bartlett, Dominic Magirr, and Tim Morris

(JAN #2) Leisurely cruise January 2026: Excursion 4 Tour II: 4.4 “Do P-Values Exaggerate the Evidence?” (errorstatistics​.com). Discusses P-values versus Bayesian measures and other evidential accounts in statistics, with references to Berger & Sellke, Casella & Berger, and the philosophy of error statistics

Monte Carlo with infinite variances [a surveyal guide] (xianblog​.wordpress​.com). Survey on infinite-variance Monte Carlo methods; Pareto correction; Rao-Blackwellization; antithetic variables; reparameterisation; variance reduction techniques

📚 Academic Research

trud: An R interface to the NHS England Technology Reference data Update Distribution (TRUD) API (joss​.theoj​.org). trud provides an R interface to NHS England’s TRUD API, simplifying access to clinical terminology reference datasets. Ideal for reproducible healthcare analytics workflows in R

Tree Estimation and Saddlepoint-Based Diagnostics for the Nested Dirichlet Distribution: Application to Compositional Behavioral Data (arxiv:stat). Introduces greedy tree estimation for Nested Dirichlet models of compositional data and saddlepoint pseudo-residual diagnostics. Includes an R package for fit-checking, influence analysis, and interpretation

Generalized Heterogeneous Functional Model with Applications to Large-scale Mobile Health Data (arxiv:stat). Proposes GHFM to learn subgroup-specific functional effects in generalized scalar-on-function regression, scalable via pre-clustering. Applied to UK Biobank accelerometry, it reveals activity–dementia heterogeneity relevant to prediction accuracy
