
The R Data Scientist

October 14, 2025

Data Scientist (with R)

R community, visualisation, AI apps, shiny engineering

🌐 R Community News

2025-10-10 AI Newsletter (posit.co). Posit's newsletter covers Claude Sonnet 4.5, coding models, Databot, R/Python packaging, and AI market dynamics

Weekly recap (Oct 10, 2025) (blog.stephenturner.us). A weekly roundup of AI and biosecurity, R updates, genome engineering, RAG, Lost Science, AI in medicine, ggplot2, Quarto, and DNA forensics

A Primer on Domain Verification (ropensci.org). How domain verification on Mastodon, GitHub, and GitHub Pages establishes cross-site authenticity using DNS TXT records and metadata validation

2025 Annual Conference (mapor.org). MAPOR's 50th Annual Conference in Chicago features a short course on complex survey data analysis in R and a keynote by the former Census Bureau director

R Weekly 2025-W42 Aaaaand… they’re off!, Generative AI for Data Visualisation, and Behavior-Driven Development (rweekly.org). Generative AI for Data Visualisation, Behavior-Driven Development in Shiny, and more R Weekly highlights

📊 R Visualisation

Generative AI for Data Visualisation (nrennie.rbind.io). Generative AI tools (ChatGPT, Claude, Copilot, Gemini) are tested on weather and CEO data visualisation prompts, with mixed results and practical guidance

Explore #TidyTuesday literary prizes with Positron’s Data Explorer (juliasilge.com). A tour of Positron's Data Explorer and its new features, using the #TidyTuesday dataset on British literary prizes

Halloween in the Round (kieranhealy.org). Aggregates FARS pedestrian fatality data and builds polar and donut-style charts with ggplot2's coord_polar() and coord_radial(), plus geom_textsegment()

The World Started Tracking Severe Food Insecurity in 2016 (stevenponce.netlify.app). FAO severe food insecurity indicators, tracked since 2016, visualised in R with ggplot2 using data from tidytuesdayR

🤖 R + AI Apps

Extracting location from text with AI (jla-data.net). Using Gemini structured output in R (via gemini.R) to summarize texts and geocode locations, prompting for the single most important location in each

My Attempt To Reproduce Stanford HIVdb Sequence and Mutation Analysis From Scratch (kenkoonwong.com). Rebuilding Stanford HIVdb resistance logic using R with Bioconductor tools, DECIPHER, and BLAST on HIV reference genomes

R port of llama2.c (thierrymoudiki.github.io). An R port of llama2.c, with a Shiny app, installation steps, and API access, intended for educational use

🧩 Shiny Engineering

foreach: Making All %dopar% Behave Like %dofuture% Everywhere (jottr.org). Overview of doFuture 1.10.0 features: registerDoFuture('%dofuture%'), foreach integration, and improved error handling and RNG consistency

Behavior-Driven Development in R Shiny: A Step-By-Step Example (jakubsobolewski.com). Behavior-Driven Development with R Shiny: write specifications in testthat or cucumber, build a driver, implement the app and its storage, and evolve the code via TDD

Deploy Multiple Shiny Apps from One R Package (jakubsobolewski.com). Deploy several Shiny apps from a single R package in a monorepo: the apps live in inst/apps, share common functions from R/, and are published with a custom rsconnect deploy function

🎲 Statistical Inference

If you have two measures of the same confounder, you can just include both of them in your regression model (the100.ci). Two correlated measures of the same confounder need not derail a regression: including both reduces bias and improves precision in estimating the effect of X on Y

Excursion 1 Tour I (2nd Stop): Probabilism, Performance, and Probativeness (1.2) (errorstatistics.com). Probabilism vs. performance in statistical inference: severity, error probes, and examples such as Potti, Bristol-Roach, and the Texas sharpshooter

Regression to the mean (blog.engora.com). Regression to the mean explained with cars, heights, sports, schools, and business examples, plus implications for analysis

How much should we trust medicine? (emilkirkegaard.com). Review of Medical Nihilism by Jacob Stegenga, examining Bayes, meta-analysis, SEU, biases, and pharmaceutical incentives in medicine

📚 Academic Research

Examining the Interface Design of Tidyverse (arxiv:stat). Examines the Tidyverse's interface design for data visualisation and wrangling through an HCI lens; advocates iterative development driven by user feedback

Zero-Inflated Bayesian Multi-Study Infinite Non-Negative Matrix Factorization (arxiv:stat). Bayesian non-parametric multi-study NMF with zero-inflation for cross-study dietary pattern analysis and cancer risk association

Generating CodeMeta using declarative mapping rules: An open-ended approach using ShExML (arxiv:cs). Uses declarative ShExML mapping rules to generate CodeMeta from existing crosswalks, validated with SHACL/ShEx, to make research software more FAIR

Automated Gating for Flow Cytometry Data Using a Kernel-Smoothed EM Algorithm (arxiv:stat). Automated gating of phytoplankton flow cytometry data via a kernel-smoothed EM algorithm for time-evolving Gaussian mixtures

