In our research work, we usually fit models to experimental data. Our aim is to estimate some biologically relevant parameters, together with their standard errors. Very often, these parameters are interesting in themselves, as they represent means,...
You can read the original post in its original format on the Rtask website by ThinkR here: Signature.py: Award-Winning Application at the 2024 Shiny Contest
🏆 We are excited to announce that {signature.py} is the grand winner of the 2024 Shiny Contest in the category ‘Best Shiny Application with Python’! This year, Posit relaunched the Shiny Contest, a competition dedicated to the development of Shiny applications. Participants are asked to create a personal or professional application that addresses a specific problem. The applications are then ...
In a recent post, I showed that we can build linear combinations of model parameters (see here). For example, if we have two parameter estimates, say Q and W, with standard errors respectively equal to \(\sigma_Q\) and \(\sigma_W\), we can build...
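As a general reminder (not a reconstruction of the truncated post), the standard error of a linear combination depends on the covariance between the estimates as well as on \(\sigma_Q\) and \(\sigma_W\): \(\mathrm{Var}(Q + W) = \sigma_Q^2 + \sigma_W^2 + 2\sigma_{QW}\). The minimal base-R sketch below uses the built-in `mtcars` data and the model `mpg ~ hp + wt`. Both are illustrative choices, not taken from the original post. It computes the estimate and standard error of the sum of two coefficients as \(k^\top \hat\beta\) and \(\sqrt{k^\top V k}\):

```r
# Illustrative model: mtcars and this formula are assumptions for the example
fit <- lm(mpg ~ hp + wt, data = mtcars)

# Contrast vector k selects the combination hp + wt (intercept weight 0)
k <- c(0, 1, 1)

# Point estimate of the linear combination k' beta
est <- sum(k * coef(fit))

# Var(k' beta) = k' V k, where V is the coefficient variance-covariance matrix;
# with two terms this is var(Q) + var(W) + 2 cov(Q, W)
V <- vcov(fit)
se <- sqrt(drop(t(k) %*% V %*% k))

c(estimate = est, se = se)
```

Changing `k` gives any other weighted combination, for example `c(0, 1, -1)` for a difference. Dropping the covariance term would give a wrong standard error whenever the estimates are correlated.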
Are you new to Linux and looking to learn the basics of text editing? Look no further than vi (or Vim), the ubiquitous text editor that comes pre-installed on nearly every Linux distribution. While it may seem intimidating at first with its uniq...
By Fenne Riemslagh
We had the pleasure of sitting down with Kirsten Bulsink, a data scientist at the Dutch National Institute for Public Health and the Environment (RIVM). Our discussion covered her journey from pandemic response to R-package developmen...
Ciclista atropellado – CC-BY by Nicanor Arenas Bermejo
Day 20 & 21 of 30DayMapChallenge: « OpenStreetMap » and « Conflict » (previously).
Mapping the accidents between bicycles and cars in 2023 in France. We have had a few sad accidents recently, and they have drawn growing attention to cyclist safety and to conflicts on the road. We’ll use the annual databases of road traffic injuries on an OSM background.
Config
library(dplyr)
library(tidyr)
library(readr)
library(janitor)
library(sf)
library(glue)
library(leaflet)
Data
The data guide is available (in French).
# vehicules-2023.csv
vehicles filter(catv_7 > 0 & catv_1 > 0) |>
  pull(num_acc)
# bikers injuries
bikers filter(num_acc %in% bike_car_acc, catv == 1) |>
  left_join(user, join_by(num_acc, id_vehicule)) |>
  left_join(severity, join_by(grav)) |>
  count(num_acc, severity)
bikers_display mutate(outcome = glue("{severity} ({n})")) |>
  arrange(severity) |>
  summarise(.by = num_acc, outcome = glue_collapse(outcome, sep = ""))
# accident locations
bike_accidents filter(num_acc %in% bike_car_acc) |>
  st_as_sf(coords = c("long", "lat"), crs = "EPSG:4326") |>
  left_join(bikers_display, join_by(num_acc))
That’s 2858 accidents and 772 bikers killed.
Map
bike_accidents |>
  leaflet() |>
  addTiles(attribution = r"( r.iresmi.net. data: Ministère de l'intérieur 2023; map: OpenStreetMap)") |>
  addCircleMarkers(popup = ~ glue("{an}-{mois}-{jour} biker status: {outcome}"),
                   clusterOptions = markerClusterOptions())
Figure 1: Bike accidents in France – 2023
Introduction As an R programmer, you often need to compare two columns within a data frame to identify similarities, differences, or perform various analyses. In this comprehensive guide, we’ll explore several methods to compare two columns in ...
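The guide's own list of methods is cut off here. The following is a hedged base-R sketch of a few common ways to compare two columns, using a small hypothetical data frame `dat`. It covers row-wise equality, the rows that differ, whole-column identity, and a per-row difference:

```r
# Hypothetical example data, for illustration only
dat <- data.frame(a = c(1, 2, 3, 4), b = c(1, 5, 3, 0))

# Element-wise comparison: TRUE where the two columns match in a row
dat$same <- dat$a == dat$b

# Row indices where the columns differ
which(dat$a != dat$b)      # 2 4

# Whole-column comparison
identical(dat$a, dat$b)    # FALSE

# Signed per-row difference
dat$diff <- dat$a - dat$b
dat
```

For floating-point columns, `isTRUE(all.equal(dat$a, dat$b))` is usually safer than `identical()`, because it tolerates tiny rounding differences.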
What is the Wald Test, a method used to test the significance of individual regression coefficients, and how can it transform your regression analysis in R? Have you ever wondered how to assess the significance of specific predictors in your regression model, or how to decide whether a variable should remain in your analysis? The Wald Test in R offers a straightforward and efficient answer to these questions. It evaluates whether coefficients in a model are significantly different from zero, helping you test hypotheses and refine your models with precision.
Key points
The Wald Test evaluates the significance of predictors in regression models by testing whether coefficients differ from zero, aiding in model refinement and hypothesis testing.
R provides efficient tools such as the aod package (which provides wald.test()) and the sandwich package (which provides robust covariance estimators), making it easy to perform the Wald Test, visualize results, and ensure robust statistical analysis.
A real-world example using a medical dataset shows how the Wald Test identifies significant predictors (e.g., treatment type), with clear code walkthroughs and interpretations.
Compared with the Likelihood Ratio Test (LRT), the Wald Test is computationally efficient. However, alternatives like the Score Test or Bayesian approaches may be preferable for small datasets or complex scenarios.
What is the Wald Test?
The Wald Test is a statistical method used to evaluate the significance of parameters in a regression model. Specifically, it tests whether specific coefficients are significantly different from zero, which helps determine the relevance of predictors in explaining the dependent variable. By quantifying this relationship, the test helps you decide whether to retain or exclude variables from your model, a choice that also changes the model's degrees of freedom.
For example, suppose you're analyzing car performance using R's built-in mtcars dataset. You can use the Wald Test to evaluate whether factors such as horsepower or weight have a significant effect on fuel efficiency.
The Wald Test is widely used in economics, medicine, and the social sciences, making it a versatile tool for data-driven research.
Why Use the Wald Test in R?
R is renowned for its powerful statistical capabilities, and it offers a variety of packages that streamline performing a Wald test. The aod and sandwich packages allow for easy implementation and robust testing. The Wald Test is also computationally efficient: it requires fitting only a single (unrestricted) model, whereas the Likelihood Ratio Test (LRT) requires fitting both the restricted and the unrestricted model. R's open-source ecosystem also provides visualization and reporting tools, making it well suited to students and researchers aiming to improve their regression analyses. You can uncover insights with precision and efficiency by leveraging the Wald Test in R.
Understanding the Wald Test
At its core, the Wald Test evaluates whether specific predictors in your model contribute significantly to the outcome. It's especially useful in regression analysis when you need to test the null hypothesis (H₀), which states that a coefficient equals zero, against the alternative hypothesis (H₁) that it does not. For example:
Null Hypothesis (H₀): Horsepower does not affect fuel efficiency.
Alternative Hypothesis (H₁): Horsepower affects fuel efficiency.
If the Wald Test shows that the coefficient for horsepower is statistically significant, you reject the null hypothesis, suggesting that horsepower is a meaningful predictor in your model. This makes the test invaluable for refining models by focusing on relevant variables.
Statistical Basis
The Wald Test relies on a test statistic derived from the estimated coefficient and its standard error: for a single coefficient, \(W = \left(\hat\beta / \mathrm{SE}(\hat\beta)\right)^2\).
Under the null hypothesis, this statistic approximately follows a chi-squared distribution (with one degree of freedom for a single coefficient), which is used to calculate p-values and determine significance.
Key assumptions:
Homoscedasticity: equal variance of residuals across levels of the predictors.
Normality: residuals follow a normal distribution.
Violating these assumptions may compromise the test's reliability. Under heteroscedasticity, for example, you can use robust (heteroscedasticity-consistent) standard errors from the sandwich package to improve accuracy. When these assumptions hold, the Wald Test is a robust and reliable tool for hypothesis testing in regression models.
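To make the statistic concrete, here is a minimal base-R sketch on the mtcars example mentioned above, with fuel efficiency modelled on horsepower and weight. The specific formula is our assumption. The sketch computes the single-coefficient Wald statistic for horsepower and a joint test that both slopes are zero, using \(W = b^\top V^{-1} b\) referred to a chi-squared distribution with as many degrees of freedom as tested coefficients:

```r
# Illustrative model on the built-in mtcars data (formula chosen for the example)
fit <- lm(mpg ~ hp + wt, data = mtcars)

beta <- coef(fit)
V <- vcov(fit)

# Single coefficient, H0: beta_hp = 0
# W = (estimate / standard error)^2, compared with chi-squared(1)
W_hp <- unname((beta["hp"] / sqrt(V["hp", "hp"]))^2)
p_hp <- pchisq(W_hp, df = 1, lower.tail = FALSE)

# Joint test, H0: beta_hp = beta_wt = 0
# W = b' V^{-1} b, compared with chi-squared(2)
b <- beta[c("hp", "wt")]
Vb <- V[c("hp", "wt"), c("hp", "wt")]
W_joint <- drop(t(b) %*% solve(Vb) %*% b)
p_joint <- pchisq(W_joint, df = 2, lower.tail = FALSE)

c(W_hp = W_hp, p_hp = p_hp, W_joint = W_joint, p_joint = p_joint)
```

For a linear model, `W_hp` equals the square of the t value shown by `summary(fit)`. `summary()` refers it to a t distribution rather than the asymptotic chi-squared, so the p-values differ slightly in small samples. If the aod package is installed, `aod::wald.test(Sigma = vcov(fit), b = coef(fit), Terms = 2:3)` should report the same joint statistic. For a robust version, replace `vcov(fit)` with a sandwich estimator such as `sandwich::vcovHC(fit)`.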
This is reposted from the original at https://blog.stephenturner.us/p/expand-your-bluesky-network-with-r.
I’m encouraging everyone I know online to join the scientific community on Bluesky.
Bluesky for Science, Stephen Turner, Nov 16 ...
For a beginner C programmer, understanding conditional logic and the increment and decrement operators is essential for writing efficient and dynamic code. In this in-depth guide, we’ll explore the power of the conditional operator (?:), increment (++), and dec...
Join our workshop on Crafting Custom and Reproducible PDF Reports with Quarto and Typst in R, which is part of our workshops for Ukraine series! Here’s some more info:
Title: Crafting Custom and Reproducible PDF Reports with Quarto and Typst in R
Date: Thursday, December 19th, 18:00 – 20:00 CET (Rome, Berlin, Paris timezone) …
Crafting Custom and Reproducible PDF Reports with Quarto and Typst in R workshop was first posted on November 19, 2024 at 4:37 pm.
You can read the original post in its original format on the Rtask website by ThinkR here: You’ve Been Waiting for Native Mobile Apps with R? The Wait Is Over.
webR, and the next generation of apps with R
For the past couple of months, I’ve been sharing how webR will transform the way we build apps with R inside. If you’re unfamiliar, webR is a WebAssembly compilation of R. In simpler terms, it enables R to run within JavaScript environments. If you are familiar, you know it’s a bit ...
Introduction Combining vectors is a fundamental operation in R programming. As an R programmer, you’ll often need to merge datasets, create new variables, or prepare data for further processing. This comprehensive guide will explore various met...
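The guide's list of methods is truncated here. This hedged base-R sketch covers three standard ways to combine vectors: `c()`, `append()`, and `cbind()`/`rbind()`. It also notes type coercion; the example values are arbitrary:

```r
x <- c(1, 2, 3)
y <- c(4, 5)

# c() concatenates vectors end to end
combined <- c(x, y)                 # 1 2 3 4 5

# append() inserts one vector after a given position in another
inserted <- append(x, y, after = 1) # 1 4 5 2 3

# cbind()/rbind() combine equal-length vectors into a matrix by column/row
m <- cbind(first = c(1, 2), second = c(3, 4))

# Mixing types coerces everything to the most general type (here, character)
mixed <- c(1, "a")
```

The coercion in the last line is a frequent source of surprises. If elements of different types must keep their types, a `list()` is usually the better container.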
Introduction As a beginner R programmer, you may often need to compare two vectors to check for equality, find common elements, or identify differences. In this article, we’ll explore various methods to compare vectors in base R, including matc...
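The excerpt mentions equality checks, common elements, differences, and `match()`. The following is a hedged sketch of the standard base-R functions for each task, using two arbitrary example vectors:

```r
a <- c(1, 2, 3, 4)
b <- c(3, 4, 5, 6)

a == b            # element-wise: FALSE FALSE FALSE FALSE
identical(a, b)   # whole-vector equality: FALSE

intersect(a, b)   # common elements: 3 4
setdiff(a, b)     # in a but not in b: 1 2

a %in% b          # membership: FALSE FALSE TRUE TRUE
match(a, b)       # position in b, or NA: NA NA 1 2

# Floating-point values need a tolerance-based comparison
0.1 + 0.2 == 0.3                  # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3)) # TRUE
```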
Turning learner personas into LLM agents
Part of the Epiverse-TRACE initiative involves the development of training materials that span early, middle and late stage outbreak analysis and modelling tasks. To ensure that our tutorials are accessible ...
So, some months ago, I spent a few hours over a few days puzzled by something that turned out to be straightforwardly written up in the Stata manual, but not easily findable anywhere else. So I want to write it up, if only to have somewhere for future m...
Methods for comparing spatial patterns in raster data
This is the sixth part of a blog post series on comparing spatial patterns in raster data. More information about the whole series can be found in part one. The blog post series on...
Table of Contents
Understanding Environment Variables
The printenv Command
Working with set Command
The export Command
Using alias Command
Practical Applications
Your Turn! (Interactive Section)
Best Practices and Common Pitfalls
Quick Takeawa...
Introduction
Data manipulation is a cornerstone of R programming, and selecting specific col...
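As a hedged illustration of column selection with `subset()`, the following sketch uses the built-in `mtcars` data. The choice of dataset and columns is ours. It shows selecting by name, by a name range, by exclusion, and combined with a row condition:

```r
# Select columns by name
cols <- subset(mtcars, select = c(mpg, hp))

# A contiguous range of columns, using : between column names
range_cols <- subset(mtcars, select = mpg:disp)   # mpg, cyl, disp

# Drop a column with a minus sign
no_carb <- subset(mtcars, select = -carb)

# Combine a row condition with column selection
heavy <- subset(mtcars, wt > 4, select = c(mpg, wt))
```

`subset()` evaluates column names without quotes, which is convenient interactively. Its documentation recommends standard indexing such as `mtcars[, c("mpg", "hp")]` inside functions and packages.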
Need to learn how to create a volcano plot in R and visualize differential gene expression effectively? Creating a volcano plot in R is essential for any researcher working with bioinformatics and RNA-Seq data. It lets you easily identify which genes are significantly upregulated or downregulated between conditions. Imagine visualizing hundreds of genes on a simple, elegant plot and instantly spotting those that stand out due to their statistical significance. That's the power of a volcano plot.
Key points
A volcano plot is a type of scatter plot used in genomics to visualize significant changes in gene expression, usually between different conditions (e.g., treated vs. untreated). It helps researchers easily identify the most important genes to study further.
To create a volcano plot, the log2 fold change is plotted on the x-axis and the negative log10 p-value (-log10 p) on the y-axis. Genes on the right are upregulated, while those on the left are downregulated. Genes higher on the plot are more statistically significant, and genes farther from the center horizontally have larger fold changes.
Typical cut-offs for volcano plots are a p-value below 0.05 and an absolute log2 fold change greater than 1, but these values vary. Adjusted p-values are often preferred to reduce false positives in the analysis.
Volcano plots can be created using tools like ggplot2 or EnhancedVolcano in R, or Excel for simpler visualizations. EnhancedVolcano provides easy customization for publication-quality plots.
Volcano plots are used to quickly identify key genes in sequencing studies like RNA-Seq. They are more informative than standard scatter plots because they show both the magnitude and the significance of changes. Additionally, they can be made as models for educational purposes using materials like clay or papier-mâché.
A volcano plot in R is essential for anyone working with bioinformatics and RNA-Seq data. It helps you quickly see which genes are upregulated (increased expression) or downregulated (decreased expression) between different conditions.
Imagine looking at hundreds of genes on a simple plot and immediately noticing which ones have significant changes; that's the power of a volcano plot.
Volcano Plots in R
Volcano plots are widely used in bioinformatics to show differential gene expression. This section explains what volcano plots are, why they are essential in gene expression analysis, and how they help researchers see significant changes in their data.
What is a Volcano Plot?
A volcano plot is a type of scatter plot that shows statistical significance (usually the negative log10 of the p-value) against fold change (log2 fold change) of gene expression. It helps researchers quickly find differentially expressed genes that are either upregulated or downregulated.
Why Use Volcano Plots?
Volcano plots are very helpful for finding key genes in RNA-Seq or proteomics experiments. By plotting fold change against statistical significance, researchers can see which genes have important changes, making it easier to focus on the most interesting ones. Creating a volcano plot in R is a great way to see significant changes in gene expression, which helps find essential genes in bioinformatics research.
Feature: Volcano plot benefit
Visualization type: Scatter plot showing changes in gene expression
Key metrics displayed: Log2 fold change vs. -log10 p-value
Upregulated/downregulated genes: Quickly identifies which genes are more or less active between conditions
Quick identification: Enables researchers to spot significant genes at a glance
Data interpretation: Makes it simple to understand large datasets of gene activity
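The axes and cut-offs described above can be sketched in base R. This is a minimal illustration, not the post's own code, which likely uses ggplot2 or EnhancedVolcano. The helper `classify_volcano()` is a hypothetical name, and the four genes are toy values rather than real expression results. Each gene is labelled up, down, or not significant using |log2 fold change| > 1 and p < 0.05, and then plotted as log2 fold change against -log10(p):

```r
# Hypothetical helper: label genes using the usual volcano-plot cut-offs
classify_volcano <- function(log2fc, pvalue, fc_cut = 1, p_cut = 0.05) {
  status <- rep("not significant", length(log2fc))
  status[pvalue < p_cut & log2fc >  fc_cut] <- "up"
  status[pvalue < p_cut & log2fc < -fc_cut] <- "down"
  status
}

# Toy values for illustration only (not real expression data)
res <- data.frame(
  gene   = c("g1", "g2", "g3", "g4"),
  log2fc = c(2.5, -1.8, 0.3, 3.0),
  pvalue = c(0.001, 0.01, 0.0001, 0.2)
)
res$status <- classify_volcano(res$log2fc, res$pvalue)

# x: log2 fold change; y: -log10(p-value), so smaller p-values sit higher
plot(res$log2fc, -log10(res$pvalue),
     col = c(up = "red", down = "blue", "not significant" = "grey")[res$status],
     pch = 19, xlab = "log2 fold change", ylab = "-log10(p-value)")
abline(v = c(-1, 1), h = -log10(0.05), lty = 2)  # cut-off guides
```

Gene g3 is highly significant but barely changes, and g4 changes a lot but is not significant. Both stay grey, which is why a volcano plot needs both axes. With real results, you would typically adjust the p-values first, for example with `p.adjust(pvalue, method = "BH")`, as recommended above.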
Introduction to Logical Operators
Logical operators are fundamental building blocks in C programming that allow us to make decisions and control program flow based on multiple conditions. These operators work with Boolean values (true/false) an...
Introduction Data manipulation is a crucial skill in R programming, and subsetting data frames is one of the most common operations you’ll perform. This comprehensive guide will walk you through four powerful methods to subset data frames in R,...
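The guide's four methods are not listed in this excerpt, so the following is not a reconstruction of them. It is a hedged base-R sketch of common ways to subset rows of `mtcars`: bracket indexing, `subset()`, and `which()`. It also shows one practical difference, namely how each handles `NA` in the condition, using a small hypothetical data frame:

```r
# Bracket indexing: rows by condition, columns by name
four_cyl <- mtcars[mtcars$cyl == 4, c("mpg", "cyl")]

# The same subset with subset()
four_cyl2 <- subset(mtcars, cyl == 4, select = c(mpg, cyl))

# which() turns the condition into row positions
four_cyl3 <- mtcars[which(mtcars$cyl == 4), c("mpg", "cyl")]

# NA handling differs: `[` keeps a row of NAs, subset() drops it
df_na <- data.frame(x = c(1, NA, 3))
with_na <- df_na[df_na$x > 1, , drop = FALSE]  # 2 rows (one all-NA)
without_na <- subset(df_na, x > 1)             # 1 row
```

Wrapping the condition in `which()` gives bracket indexing the same NA-dropping behaviour as `subset()`.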
Introduction
Social network analysis (SNA) examines individual entities and the relationships among them. The data is represented as a “graph” in which individual entities are referred to as “nodes” and the relationships between them as “edges”. A primary area of study in SNA is the interconnectivity of nodes, called “communities”, and the identification of clusters through… Continue reading Smith-Pittman Algorithm: Enhancing Community Detection in Networks
Meltwater in crevasses in Greenland – CC-BY-NC by NASA’s Marshall Space Flight Center
Day 11 of 30DayMapChallenge: « Arctic » (previously).
We’ll use the Greenland 5 km DEM, Ice Thickness, and Bedrock Elevation Grids (J. Bamber 2001) from J. L. Bamber, Layberry, and Gogineni (2001) and Layberry and Bamber (2001). Download here (after registration).
Data
The data needs some wrangling, as the format is not straightforward: it’s a wrapped fixed-width ASCII file (check the user guide). We need to make one row out of every 31 lines of the file, reverse the order of the lines, and set the correct projection and extent.
library(terra)
library(readr)
library(dplyr)
library(tidyr)
thick mutate(row = ceiling(row_number() / 31)) |>
  group_by(row) |>
  group_modify(~ as_tibble(as.vector(t(as.matrix(.x))))) |>
  ungroup() |>
  mutate(name = rep(paste0("x", 1:310), 561)) |>
  drop_na(value) |>
  pivot_wider(values_from = value, names_from = name) |>
  arrange(desc(row)) |>
  select(-row) |>
  as.matrix() |>
  rast(crs = "+proj=stere +lat_0=90 +lat_ts=71 +lon_0=-39 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +type=crs")
ext(thick) = c(-800000, 700000, -3400000, -600000)
Map
Here is the raw map in a polar stereographic projection:
thick |>
  plot(main = "Greenland Ice thickness", col = map.pal("magma"))
Figure 1: Greenland Ice thickness in a polar stereographic projection. data: Bamber J., 2021. NASA National Snow and Ice Data Center
And on an interactive map after reprojection:
library(leaflet)
# native resolution is 5 km, so at 66° N it's about 0.1 degree
# -10 - -75 = 65 ; 65 / 0.1 = 650 pixels wide
thick_wgs84 project(rast(nrows = 250, ncols = 650, xmin = -75, xmax = -10, ymin = 60, ymax = 85, crs = "EPSG:4326")) |>
  subst(x = _, 0, NA)
range_m addLegend(pal = pal_rev, title = "Greenland Ice thickness (m)", values = range_m,
            labFormat = labelFormat(transform = function(x) sort(x + 500, decreasing = TRUE)))
Figure 2: Greenland Ice thickness
References
Bamber, J. L., R. L. Layberry, and S. P. Gogineni. 2001. “A New Ice Thickness and Bed Data Set for the Greenland Ice Sheet: 1. Measurement, Data Reduction, and Errors.” Journal of Geophysical Research: Atmospheres 106 (D24): 33773–80. https://doi.org/10.1029/2001JD900054.
Bamber, Jonathan. 2001. “Greenland 5 Km DEM, Ice Thickness, and Bedrock Elevation Grids, Version 1.” NASA National Snow and Ice Data Center Distributed Active Archive Center. https://doi.org/10.5067/01A10Z9BM7KP.
Layberry, R. L., and J. L. Bamber. 2001. “A New Ice Thickness and Bed Data Set for the Greenland Ice Sheet: 2. Relationship Between Dynamics and Basal Topography.” Journal of Geophysical Research: Atmospheres 106 (D24): 33781–88. https://doi.org/10.1029/2001JD900053.
This post describes a particular use case for Python in Excel and how it was solved using the R package reticulate 1.39.0 (https://cran.r-project.org/web/packages/reticulate/index.html) along with the ExcelRAddIn (https://github.com/Adam-Gladstone/Office365AddIns). A while back, I read an interesting post on LinkedIn that identified a number of criteria that might be useful when selecting stocks for a portfolio.
The tilde operator (~) is a fundamental component of R programming, especially in statistical modeling and data analysis. This comprehensive guide will help you master its usage, from basic concepts to advanced applications. Introduction The ti...
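As a small hedged illustration of what the tilde does, the following base-R sketch builds formula objects and inspects them. It then passes a formula to `lm()` and `aggregate()` on the built-in `mtcars` data, which is our choice of example:

```r
# The tilde builds a formula object; its variables are not evaluated yet
f <- mpg ~ hp + wt
class(f)        # "formula"
all.vars(f)     # "mpg" "hp" "wt"
length(f)       # 3: the `~` call, left-hand side, right-hand side
f[[2]]          # mpg (the response)

# A one-sided formula has no response
g <- ~ hp
length(g)       # 2

# Modelling functions interpret the formula against a data frame
fit <- lm(f, data = mtcars)

# Other functions reuse the same notation, e.g. grouped summaries
aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
```

Because a formula also captures the environment where it was created, functions such as `lm()` can look up variables there when they are not found in `data`.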
It’s Hallowe’en; that time of year when LinkedIn is full of photos of the recruiters who ghosted you dressed up as ghosts. But ghosting isn’t the only bad thing about the modern-day job market. Another plague on job seekers is the endless rounds of int...
Introduction
It can be fun to drive a problem all the way into the ground. I don’t always get to do that on paying projects; however, sometimes I can do it with hobby projects. In this case I am going to re-solve Dudeney’s Remainder Problem again and again to argue […]