Data Scientist (with R)
R community, visualisation, AI apps, shiny engineering
🌐 R Community News
2025-10-10 AI Newsletter (posit.co). Posit newsletter covers Claude Sonnet 4.5, coding models, Databot, R/Python packaging, and AI market dynamics
Weekly recap (Oct 10, 2025) (blog.stephenturner.us). Weekly AI and biosecurity, R updates, genome engineering, RAG, Lost Science, AI in medicine, ggplot2, Quarto, and DNA forensics
A Primer on Domain Verification (ropensci.org). Domain verification across Mastodon, GitHub, and GitHub Pages demonstrates cross-site authenticity using DNS TXT records and metadata validation
2025 Annual Conference (mapor.org). MAPOR's 50th Annual Conference in Chicago features a short course on complex survey data analysis in R and a keynote by the former Census Bureau director
R Weekly 2025-W42 Aaaaand… they’re off!, Generative AI for Data Visualisation, and Behavior-Driven Development (rweekly.org). Generative AI for Data Visualisation, Behavior-Driven Development in Shiny, and more R Weekly highlights
📊 R Visualisation
Generative AI for Data Visualisation (nrennie.rbind.io). Generative AI tools (ChatGPT, Claude, Copilot, Gemini) tested on weather and CEO data visualisation prompts, with varying results and guidance
Explore #TidyTuesday literary prizes with Positron’s Data Explorer (juliasilge.com). A tour of Positron's Data Explorer and its new features, using the #TidyTuesday dataset on British literary prizes
Halloween in the Round (kieranhealy.org). Explores aggregating FARS pedestrian fatalities, uses ggplot2: coord_polar/coord_radial, geom_textsegment, forte in polar plots and donut-style visuals
The World Started Tracking Severe Food Insecurity in 2016 (stevenponce.netlify.app). FAO indicators tracked since 2016; R packages and ggplot2-based visuals with tidytuesdayR data
🤖 R + AI Apps
Extracting location from text with AI (jla-data.net). Using Gemini structured output in R (gemini.R) to summarize texts and geocode their locations, with prompts that ask for the single most important location
My Attempt To Reproduce Stanford HIVdb Sequence and Mutation Analysis From Scratch (kenkoonwong.com). Rebuilding Stanford HIVdb resistance logic using R with Bioconductor tools, DECIPHER, and BLAST on HIV reference genomes
R port of llama2.c (thierrymoudiki.github.io). R port of llama2.c with Shiny app, installation steps, and API access for educational use
🧩 Shiny Engineering
foreach: Making All %dopar% Behave Like %dofuture% Everywhere (jottr.org). Overview of doFuture 1.10.0 features: registerDoFuture('%dofuture%'), foreach integration, and improved error handling and RNG consistency
Behavior-Driven Development in R Shiny: A Step-By-Step Example (jakubsobolewski.com). Behavior-Driven Development with R Shiny: write specs in testthat or cucumber, build a driver, implement app and storage, evolve via TDD
Deploy Multiple Shiny Apps from One R Package (jakubsobolewski.com). Deploy multiple Shiny apps from one R package using a monorepo, sharing common R/ functions via inst/apps with a custom rsconnect deploy function
🎲 Statistical Inference
If you have two measures of the same confounder, you can just include both of them in your regression model (the100.ci). Two correlated covariates need not derail regression: including both reduces bias and improves predictive precision for the effect of X on Y
Excursion 1 Tour I (2nd Stop): Probabilism, Performance, and Probativeness (1.2) (errorstatistics.com). Probabilism vs. performance in statistical inference; severity, error probes, and examples like Potti, Bristol-Roach, and Texas sharpshooter
Regression to the mean (blog.engora.com). Regression to the mean explained with cars, heights, sports, schools, and business examples, plus implications for analysis
How much should we trust medicine? (emilkirkegaard.com). Review of Medical Nihilism by Jacob Stegenga, examining Bayes, meta-analysis, SEU, biases, and pharmaceutical incentives in medicine
📚 Academic Research
Examining the Interface Design of Tidyverse (arxiv:stat). Examines Tidyverse interface design through an HCI lens for data viz and wrangling; advocates iterative, user feedback-driven development
Zero-Inflated Bayesian Multi-Study Infinite Non-Negative Matrix Factorization (arxiv:stat). Bayesian non-parametric multi-study NMF with zero-inflation for cross-study dietary pattern analysis and cancer risk association
Generating CodeMeta using declarative mapping rules: An open-ended approach using ShExML (arxiv:cs). Declarative mapping rules with ShExML for generating CodeMeta across crosswalks, validated by SHACL/ShEx to enhance FAIR research software
Automated Gating for Flow Cytometry Data Using a Kernel-Smoothed EM Algorithm (arxiv:stat). Automated gating of flow cytometry phytoplankton via a kernel-smoothed EM algorithm for time-evolving Gaussian mixtures