The R Data Scientist 11-11-2025
posit::conf talks, new software updates, geospatial analysis
🌐 Community & News
posit::conf(2025) talks on YouTube (blog.stephenturner.us). Over 100 talks from posit::conf(2025) on YouTube, featuring R, Python, Quarto, Posit tools, and experts like Hadley Wickham, Wes McKinney, and Rich Iannone
promises v1.5.0 (shiny.posit.co). promises v1.5.0 adds hybrid sync/async execution, OpenTelemetry tracing, mirai support, and performance improvements for R/Shiny
pkgdown 2.2.0 (tidyverse.org). pkgdown 2.2.0 introduces build_llm_docs for LLM-friendly docs, adds llms.txt, and exports Markdown versions of HTML content
Research Software Directory adopts Software Heritage IDs (softwareheritage.org). RSD adopts SWHIDs to archive software, enabling reproducible research and interoperable citations across 1,020 packages
Computo - A Journal for Transparent and Reproducible Research in Statistics and Machine Learning (ropensci.org). Computo promotes transparent, reproducible research in statistics and machine learning using notebooks, Quarto, GitHub, and open peer review
Reverse dependency check speedrun: a data.table case study (nanx.me). Benchmarking reverse dependency checks for data.table using Rust revdeprun, xfun, and pak on a 48-core cloud instance
Anaconda Ends R Support — Here’s What It Means and How to Move Forward (posit.co). Anaconda ends R support as Posit offers Package Manager for secure, reproducible R and Python workflows across enterprises
🎨 Interactive Visuals
Convierte gráficos {ggplot2} en visualizaciones interactivas con {ggiraph} [Turn {ggplot2} plots into interactive visualizations with {ggiraph}] (bastianoleah.netlify.app). A practical guide, written in Spanish, to converting ggplot2 plots into interactive visualizations with ggiraph in R
Mushroom plots to visualise gain and loss on a phylogenetic tree (rowena-h.github.io). Visualising gene gain and loss on phylogenetic trees with ggplot2 and ggtree in R using mushroom plots
How to Analyze bluesky Posts and Trends with R (storybench.org). Exploring Bluesky data with R: TF-IDF, word frequency, and network ideas using bskyr and other tools
🗺️ Geospatial Analysis
Day 6: Dimensions (dewey.dunnington.ca). Dimensions day explores M values in XYM/XYZM, with R and Python workflows using argodata, sedonaDB, and Parquet, plus Argo ocean data
Mamdani vs Sliwa and Cuomo (kieranhealy.org). R-based spatial mapping of NYC precinct results with dot-density visuals and diverging choropleths, using sf and GTFS data
Crime and school access (urbandemographics.blogspot.com). Explores crime, school access, transit, spatial analysis, and policy using R, agent-based modeling, and urban data
The Changing Face of TB Mortality (stevenponce.netlify.app). HIV-associated TB deaths by region, with Africa leading the decline, visualized using weighted mortality figures and R tools
Inaccessibility (r.iresmi.net). Using R (sf, dplyr, ggplot2, glue, purrr, polylabelr) to compute France's pole of inaccessibility with adminexpress data
Reversed (r.iresmi.net). Reversing Mediterranean bathymetry with R (terra, sf) to reveal inverted coastlines and basins
📊 Statistical Methods
Examining the longley Dataset (jmsallan.netlify.app). Explores the Longley dataset, multicollinearity effects, and macroeconomic variable modeling using R with tidyverse, corrplot, car, and broom
Distribution of p-values under the null hypothesis for discrete data (freerangestats.info). Explores how discrete data affects p-value distributions under null hypotheses using Fisher's test in R
How to get data analysis very wrong: sample size effects (blog.engora.com). Explores sample size effects, variance, and misinterpretation in data analysis using coin tosses, dice, school results, A/B tests, with references to Wainer and van Belle
Imbens et al new undergrad course on causal inference compared to my own, next week's diff-in-diff workshop and turning 50 (aka the second third) (causalinf.substack.com). Scott’s Mixtape: a comparison of the new Imbens et al. undergrad causal inference course with the author's own, plus the upcoming diff-in-diff workshop and reflections on teaching with R, potential outcomes, ATE/ATT, and covariate balance
Snow and memory (leancrew.com). Analysis of Chicago snowfall since 1932 using Python (pandas, matplotlib, statsmodels) and R, with NOWData data cleaning
🤖 ML & Modeling
simglm v0.9.26: Multi-level Propensity Score Modeling (brandonlebeau.org). simglm v0.9.26 adds multi-level propensity score modeling for nested data in R, showcasing classroom- and district-level treatment effects with lme4-style syntax
tune version 2.0.0 (tidyverse.org). tune 2.0.0 introduces parallel processing with future/mirai and postprocessing via tailor for model tuning with tidymodels in R
unifiedml in R: A Unified Machine Learning Interface (thierrymoudiki.github.io). A unified ML interface for R with sklearn-like API, automatic task detection, cross-validation, and model interpretability across glmnet, randomForest, and e1071
📚 Academic Research
fgwqsr: An R package for Frequentist Grouped Weighted Quantile Sum Regression (joss.theoj.org). Introduces fgwqsr, an R package implementing grouped weighted quantile-sum regression for chemical mixture analysis. Adds reproducible estimation, inference, and visualization tools that R users can integrate into workflows
Function on Scalar Regression with Complex Survey Designs (arxiv:stat). Presents svyfosr: R methods for function-on-scalar regression that properly account for complex survey designs (e.g., NHANES). Enables valid pointwise and joint inference for wearable-device functional outcomes
Inference for the Extended Functional Cox Model: A UK Biobank Case Study (arxiv:stat). Develops scalable extended functional Cox models and inferential tools applied to UK Biobank accelerometer and mortality data. Equips R statisticians for survival analysis with high-dimensional functional predictors at biobank scale