The R Data Scientist

September 2, 2025

Data Scientist (with R) - September 2nd, 2025

📦 Package Updates & Community

July 2025 Top 40 New CRAN Packages (rworks​.dev). July 2025 CRAN Top 40: ciflyr, hmde, RBaM, ArgentinAPI, bcRP, gilmour, ROCnGO, tEDM, topolow, flowcluster, mantis, spanning a diverse mix of causal inference, Bayesian modeling, ML, time series, utilities, and visualization

movepub 0.4.0 (oscibio​.inbo​.be). movepub 0.4.0 introduces new functions and updates write_dwc() for Movebank data, metadata standardization, and publication to GBIF/OBIS

rOpenSci News Digest, August 2025 (ropensci​.org). rOpenSci News Digest: community calls on R-multiverse, useR! 2025 and posit::conf(2025), new packages trud, sasquatch, dataset, and updates to rOpenSci events and peer review

Closing my tabs (Aug 29, 2025) (blog​.stephenturner​.us). RAND Defining Hazardous Capabilities of Biological AI; OpenAI/Retro Biosciences stem cell reprogramming 50x; Bluesky for science; LLMs in education; PyPI vs CRAN; R/Python programming book; SIGCSETS; The Test Set podcast

Send Me Your Questions and Ideas (pacha​.dev). Request for reader questions and ideas on R, Shiny, and C++, with a form submission and GitHub organization

🎓 Education & Workshops

R-kurser i höst (statistikakademin​.se). Swedish-language listing of this fall's R courses: Basic, Medium, and Complete packages of online courses covering R, regression, visualization, SEM, survival analysis, biomarker data, ML, and AI

R courses this fall (statistikakademin​.se). R courses this fall: Basic, Medium, and Complete packages; R 1–5 online courses covering introduction, regression, visualization, survival analysis, biomarker data, and ML/AI, with discounts

Skytorial and Linkedtorial 1: Introduction to GitHub for Researchers (yabellini​.netlify​.app). Linkedtorials introducing GitHub for researchers, focusing on projects, repositories, version control, collaboration, reproducibility

Some Stuff I Learned About Harvard This Week (causalinf​.substack​.com). Harvard Gov50 class design, quantitative reasoning with data, R coding, GitHub workflows, PERMA well-being model, AI policy, podcasts and papers on Imbens, Angrist, and King, and causal inference pedagogy

Single and multi-omics analysis and integration with mixOmics (mixomics​.org). Status: open

🛠️ Posit Tools & Development

Databot is not a flotation device (posit​.co). Databot introduction, risks, and governance by Posit; includes LLM usage warnings and data science best practices

Announcing posit::conf(2025) Virtual Day, Sept 16th! (posit​.co). Announcing posit::conf(2025) Virtual Day, Sept 16th with AMA, virtual talks, Data Science Hangout, and Discord participation

posit::glimpse() Newsletter – August 2025 (posit​.co). Positron IDE, Code OSS-based data science workflow, Shiny/Streamlit/Dash apps, Quarto 1.8, Shiny for Python 1.4.0, package releases and tutorials, posit::conf(2025) in Atlanta

Using OpenAI Codex in Positron (blog​.stephenturner​.us). Developing an R package with OpenAI Codex in Positron; Codex integration, tests, devtools, usethis, Roxygen, testthat, GPT-5 vs Claude, GitHub workflows, and cost considerations

🔬 Applied Research & Visualization

Learning The Basics of Phylogenetic Analysis (kenkoonwong​.com). Workflow in R/Bioconductor: extract 16S rRNA, barrnap, DECIPHER alignment, Jukes-Cantor distances, rapidNJ/PHYLIP, ggtree/FigTree visualization

Wildlife management (openanalytics​.eu). Open Analytics visualizes Flemish wildlife data via Faunabeheer, e-loket fauna en flora, waarnemingen.be, Wilder; R/Shiny apps, GitHub Actions, ShinyProxy, INBO/ANB collaboration

The social and spatial effects of fare cuts on public transport (urbandemographics​.blogspot​.com). Fare cuts, public transport demand, and spatial effects analyzed with agent-based modeling, space syntax, and R; urban mobility, induced demand, and social equity

Plotting with ukmaps v0.0.4 and ggplot2 (pacha​.dev). ukmaps, ggplot2, dplyr, sf, boundaries, London, Barnet, Golders Green, LADs, counties, country(), tintin, election_results, r counts

🤖 Machine Learning & Modeling

I was wrong about tidymodels and LLMs (simonpcouch​.com). Databot and Predictive: tidymodels usage, runrcode, run_experiment, evaluative findings, and model performance across Claude Sonnet 4 and Gemini Pro 2.5

external regressors in ahead::dynrmf’s interface for Machine learning forecasting (thierrymoudiki​.github​.io). External regressors in ahead::dynrmf interface demonstrated with USAccDeaths, AirPassengers, fpp2 a10, fdeaths; xreg creation; runs with ridge and glmnet cv.glmnet

NRL Predictions for Round 27 (statschat​.org​.nz). NRL predictions, team ratings, performance metrics, and background on author David Scott, from Stats Chat

A guide to actuarial techniques in R and Python (posit​.co). A guide to actuarial techniques using R and Python, Posit tools, and open-source data science resources

📊 Statistics & Probability Methods

A One-Page Primer on: Statistical Power (carlislerainey​.com). Power analysis overview: SESOI, SE, Cohen-type references, R^2, pre–post design gains, and practical rules of thumb for readers and researchers

Bad BBC stats (bristoliver​.substack​.com). BBC statistics linking first-week pupil absence to later absence; correlation vs causation; group composition; 18% persistently absent; solving 0.57p + 0.14(1 - p) = 0.18 for p
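
For readers who want the arithmetic spelled out, here is a short worked solution of the equation quoted above. It reads the equation as a two-group mixture, which is an assumption: 0.57 and 0.14 are taken to be the persistent-absence rates in the two groups, 0.18 the overall rate, and p the share of pupils in the first group. The linked post is the authority on what each number means.

```latex
\begin{align*}
0.57\,p + 0.14\,(1 - p) &= 0.18 \\
0.57\,p - 0.14\,p &= 0.18 - 0.14 \\
0.43\,p &= 0.04 \\
p &= \frac{0.04}{0.43} \approx 0.093
\end{align*}
```

Under that reading, roughly 9% of pupils fall in the first group.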

The sisters "paradox" - counter-intuitive probability (blog​.engora​.com). Counter-intuitive probability: two-child problem, sample space, 1/3 vs 1/2, elder-youngest conditioning, Python simulation suggestion
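
The entry mentions a Python simulation; a minimal sketch of one follows. It assumes the classic two-child setup (each child independently a boy or a girl with probability 1/2), which may differ in detail from the framing in the linked post. The function name `simulate_two_children` and its parameters are illustrative only.

```python
import random


def simulate_two_children(trials=100_000, seed=0):
    """Estimate P(both girls) under two different conditioning events.

    Assumes each child is independently a girl ("G") or boy ("B")
    with probability 1/2.
    """
    rng = random.Random(seed)
    any_girl = both_and_any_girl = 0
    elder_girl = both_and_elder_girl = 0

    for _ in range(trials):
        elder = rng.choice("BG")
        younger = rng.choice("BG")
        both_girls = elder == "G" and younger == "G"

        # Condition 1: at least one of the two children is a girl.
        if elder == "G" or younger == "G":
            any_girl += 1
            both_and_any_girl += both_girls

        # Condition 2: the elder child specifically is a girl.
        if elder == "G":
            elder_girl += 1
            both_and_elder_girl += both_girls

    return both_and_any_girl / any_girl, both_and_elder_girl / elder_girl


p_given_any, p_given_elder = simulate_two_children()
print(f"P(both girls | at least one girl) ~ {p_given_any:.3f}")
print(f"P(both girls | elder is a girl)   ~ {p_given_elder:.3f}")
```

The first estimate should land near 1/3 and the second near 1/2, since "at least one girl" leaves three equally likely families (GG, GB, BG) while "elder is a girl" leaves two (GG, GB).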

You can’t have everything you want: beta edition (johndcook​.com). Beta priors for binomial likelihood; conjugacy, alpha+beta as prior sample size, non-informative beta(0.9,0.1) vs beta(1.8,0.2), singularities at 0 and 1, improper beta(0,0) debates
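
To make the conjugacy point concrete, here is a small sketch of the standard beta-binomial update: a beta(a, b) prior combined with s successes and f failures gives a beta(a + s, b + f) posterior, and a + b acts like a prior sample size. The two priors are the ones named in the entry; the data (3 successes, 7 failures) and the helper names are hypothetical and not taken from the linked post.

```python
def beta_mean(a, b):
    """Mean of a beta(a, b) distribution."""
    return a / (a + b)


def posterior_params(a, b, successes, failures):
    """Conjugate update: beta prior + binomial data -> beta posterior."""
    return a + successes, b + failures


priors = {
    "beta(0.9, 0.1)": (0.9, 0.1),
    "beta(1.8, 0.2)": (1.8, 0.2),
}
successes, failures = 3, 7  # hypothetical data, for illustration only

results = {}
for name, (a, b) in priors.items():
    post_a, post_b = posterior_params(a, b, successes, failures)
    results[name] = {
        "prior_mean": beta_mean(a, b),
        "prior_sample_size": a + b,
        "posterior_mean": beta_mean(post_a, post_b),
    }
    print(
        f"{name}: prior mean {results[name]['prior_mean']:.2f}, "
        f"prior sample size {results[name]['prior_sample_size']:.1f}, "
        f"posterior mean {results[name]['posterior_mean']:.3f}"
    )
```

Both priors share the same mean (0.9), but the second has twice the prior sample size, so it pulls the posterior mean further from the observed proportion of 0.3.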

📚 Academic Research

An analysis of the effects of open science indicators on citations in the French Open Science Monitor (arxiv:cs). Statistical analysis of 900K publications showing open science practices increase citations by 8-19%. Relevant for R data scientists interested in open science impact and reproducible research practices
