R-bloggers | R news and tutorials contributed by (750) R bloggers - Software

Bike accidents | R-bloggers

Ciclista atropellado – CC-BY by Nicanor Arenas Bermejo

Day 20 & 21 of 30DayMapChallenge: « OpenStreetMap » and « Conflict » (previously).

Mapping the accidents between bicycles and cars in France in 2023. A few sad accidents recently have drawn growing attention to cyclist safety and to conflicts on the road. We’ll use the Annual databases of road traffic injuries on an OSM background.

Config

    library(dplyr)
    library(tidyr)
    library(readr)
    library(janitor)
    library(sf)
    library(glue)
    library(leaflet)

Data

The data guide is available (in French).

    # vehicules-2023.csv
    vehicles filter(catv_7 > 0 & catv_1 > 0) |>
      pull(num_acc)

    # bikers injuries
    bikers filter(num_acc %in% bike_car_acc, catv == 1) |>
      left_join(user, join_by(num_acc, id_vehicule)) |>
      left_join(severity, join_by(grav)) |>
      count(num_acc, severity)

    bikers_display mutate(outcome = glue("{severity} ({n})")) |>
      arrange(severity) |>
      summarise(.by = num_acc,
                outcome = glue_collapse(outcome, sep = ""))

    # accident locations
    bike_accidents filter(num_acc %in% bike_car_acc) |>
      st_as_sf(coords = c("long", "lat"), crs = "EPSG:4326") |>
      left_join(bikers_display, join_by(num_acc))

That’s 2858 accidents and 772 bikers killed.

Map

    bike_accidents |>
      leaflet() |>
      addTiles(attribution = r"( r.iresmi.net. data: Ministère de l'intérieur 2023; map: OpenStreetMap)") |>
      addCircleMarkers(popup = ~ glue("{an}-{mois}-{jour} biker status: {outcome}"),
                       clusterOptions = markerClusterOptions())

Figure 1: Bike accidents in France – 2023
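The selection step above keeps accidents in which at least one bicycle (`catv` 1) and at least one car (`catv` 7) were involved. As a minimal base R sketch of that idea, using a made-up vehicle table (the `vehicles_toy` data and the `_toy` names are hypothetical and not part of the original post):

```r
# Toy vehicle table: one row per vehicle involved in an accident (made-up values)
vehicles_toy <- data.frame(
  num_acc = c("A", "A", "B", "B", "C", "D", "D", "D"),
  catv    = c(1,   7,   7,   7,   1,   1,   33,  7)
)

# Cross-tabulate accidents (rows) against vehicle categories (columns)
counts <- table(vehicles_toy$num_acc, vehicles_toy$catv)

# Keep accidents with at least one bicycle (catv 1) and one car (catv 7)
has_bike <- counts[, "1"] > 0
has_car  <- counts[, "7"] > 0
bike_car_acc_toy <- rownames(counts)[has_bike & has_car]

print(bike_car_acc_toy)
```

Accident B (two cars) and accident C (a bicycle alone) are dropped; only accidents with both categories remain.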

How to Perform a Wald Test in R | wald.test function in R | R-bloggers

What is the Wald Test, and how can it improve your regression analysis in R? Have you ever wondered how to assess the significance of specific predictors in your regression model, or how to decide whether a variable should remain in your analysis? The Wald Test offers a straightforward and efficient answer. It evaluates whether coefficients in a model are significantly different from zero, helping you test hypotheses and refine your models.

Key points

- The Wald Test evaluates the significance of predictors in regression models by testing whether coefficients differ from zero, aiding model refinement and hypothesis testing.
- R provides tools such as the aod and sandwich packages that make it easy to perform the Wald Test, visualize results, and obtain robust standard errors.
- A real-world example using a medical dataset demonstrates how the Wald Test identifies significant predictors (e.g., treatment type), with code walkthroughs and interpretations.
- Compared to the Likelihood Ratio Test (LRT), the Wald Test is computationally efficient. However, alternatives such as the Score Test or Bayesian approaches may be preferable with small datasets or in complex scenarios.

What is the Wald Test?

The Wald Test is a statistical method used to evaluate the significance of parameters in a regression model. Specifically, it tests whether specific coefficients are significantly different from zero, which helps determine the relevance of predictors in explaining the dependent variable. The result indicates whether to retain or exclude variables from your model, a choice that also changes the model's degrees of freedom. For example, if you are analyzing car performance using the popular mtcars dataset in R, you can use the Wald Test to evaluate which factors (e.g., horsepower or weight) significantly affect fuel efficiency.

The Wald Test is widely used in economics, medicine, and the social sciences, making it a versatile tool for data-driven research.

Why Use the Wald Test in R?

R is renowned for its statistical capabilities, and it offers several packages that streamline a Wald test. The aod and sandwich packages allow for easy implementation and robust testing. The Wald Test is also computationally efficient: it requires estimating only a single model, whereas the Likelihood Ratio Test (LRT) requires fitting both the full and the restricted model. R's open-source ecosystem also provides visualization and reporting tools, making it well suited to students and researchers aiming to improve their regression analyses.

Understanding the Wald Test

At its core, the Wald Test evaluates whether specific predictors in your model contribute significantly to the outcome. In regression analysis, it tests the null hypothesis (H₀) that a coefficient equals zero against the alternative hypothesis (H₁) that it does not. For example:

- Null Hypothesis (H₀): Horsepower does not affect fuel efficiency.
- Alternative Hypothesis (H₁): Horsepower significantly impacts fuel efficiency.

If the Wald Test shows that the coefficient for horsepower is statistically significant, you reject the null hypothesis, suggesting that horsepower is a meaningful predictor in your model. This helps refine models by focusing on relevant variables.

Statistical Basis

The Wald Test relies on a test statistic derived from the estimated coefficient and its standard error. Under the null hypothesis, this statistic (asymptotically) follows a chi-squared distribution, which allows p-values to be calculated to determine significance.

Key assumptions:

- Homoscedasticity: Equal variance of residuals across predictors.
- Normality: Residuals follow a normal distribution.

Violating these assumptions may compromise the reliability of the test. In such cases, you can apply robust standard errors using the sandwich package. When the assumptions are met, the Wald Test is a reliable tool for hypothesis testing in regression models.
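To make the statistical basis concrete, here is a minimal sketch that computes the Wald statistic by hand in base R. It assumes a linear model of fuel efficiency on horsepower and weight from mtcars (the model formula and the names `fit`, `w_hp`, and `w_joint` are illustrative choices, not taken from the original post). For one coefficient the statistic is the squared ratio of the estimate to its standard error, with 1 degree of freedom; for several coefficients it is the quadratic form b' V⁻¹ b.

```r
# Fit a linear model on the built-in mtcars data (illustrative model choice)
fit <- lm(mpg ~ hp + wt, data = mtcars)

beta <- coef(fit)   # estimated coefficients
V    <- vcov(fit)   # estimated covariance matrix of the coefficients

# Single-coefficient Wald statistic for hp: (estimate / standard error)^2
w_hp <- unname(beta["hp"]^2 / V["hp", "hp"])
p_hp <- pchisq(w_hp, df = 1, lower.tail = FALSE)

# Joint Wald statistic for hp and wt: b' V^-1 b, with 2 degrees of freedom
idx     <- c("hp", "wt")
w_joint <- as.numeric(t(beta[idx]) %*% solve(V[idx, idx]) %*% beta[idx])
p_joint <- pchisq(w_joint, df = 2, lower.tail = FALSE)

print(c(w_hp = w_hp, p_hp = p_hp, w_joint = w_joint, p_joint = p_joint))

# Optional cross-check with aod::wald.test(), only if the package is installed
if (requireNamespace("aod", quietly = TRUE)) {
  print(aod::wald.test(Sigma = V, b = beta, Terms = 2:3))
}
```

For robust standard errors, `V` could be replaced by a heteroscedasticity-consistent covariance matrix from the sandwich package; the rest of the computation stays the same.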

Create and Interpret an Interactive Volcano Plot in R | What & How | R-bloggers

Need to learn how to create a volcano plot in R and visualize differential gene expression effectively? Creating a volcano plot in R is essential for any researcher working with bioinformatics and RNA-Seq data. It allows you to easily identify which genes are upregulated or downregulated with significant changes between conditions. Imagine visualizing hundreds of genes on a simple, elegant plot and instantly spotting those that stand out due to their statistical significance. That's the power of a volcano plot.

Key points

- A volcano plot is a type of scatter plot used in genomics to visualize significant changes in gene expression, usually between different conditions (e.g., treated vs. untreated). It helps researchers easily identify the most important genes to study further.
- To create a volcano plot, the log2 fold change is plotted on the x-axis, and the -log10 p-value is plotted on the y-axis. Genes on the right are upregulated, while those on the left are downregulated. Genes higher on the plot are more statistically significant, and genes farther from the center have larger fold changes.
- Typical cut-offs for volcano plots are a p-value less than 0.05 and an absolute log2 fold change greater than 1, but these values vary. Adjusted p-values are often preferred to reduce false positives in the analysis.
- Volcano plots can be created using tools like ggplot2 or EnhancedVolcano in R, or Excel for simpler visualizations. EnhancedVolcano provides easy customization for publication-quality plots.
- Volcano plots are used to quickly identify key genes in sequencing studies like RNA-Seq. They are more informative than standard scatter plots because they show both effect size and significance. Additionally, they can be made as models for educational purposes using materials like clay or papier-mâché.

A volcano plot in R is essential for anyone working with bioinformatics and RNA-Seq data. It helps you quickly see which genes are upregulated (increased expression) or downregulated (decreased expression) between different conditions. Imagine looking at hundreds of genes on a simple plot and immediately noticing which ones have significant changes: that's the power of a volcano plot.

Volcano Plots in R

Volcano plots are widely used in bioinformatics to show differential gene expression. This section explains what volcano plots are, why they are essential in gene expression analysis, and how they help researchers see significant changes in their data.

What is a Volcano Plot?

A volcano plot is a type of scatter plot that shows statistical significance (usually the negative log10 of the p-value) against fold change (log2 fold change) of gene expression. It helps researchers quickly find differentially expressed genes that are either upregulated or downregulated.

Why Use Volcano Plots?

Volcano plots are very helpful for finding key genes in RNA-Seq or proteomics experiments. By plotting fold change against statistical significance, researchers can see which genes have important changes, making it easier to focus on the most interesting ones.

Feature | Volcano Plot Benefits
Visualization Type | Scatter plot showing changes in gene expression
Key Metrics Displayed | Log2 fold change vs. -log10 p-value
Upregulated/Downregulated Genes | Quickly identifies which genes are more or less active between conditions
Quick Identification | Enables researchers to spot significant genes at a glance
Data Interpretation | Makes it simple to understand large datasets of gene activity
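The axes and cut-offs described above can be sketched in a few lines of base R. This is only an illustration on simulated data: the data frame `de`, its column names, and the gene names are hypothetical, and the thresholds (p < 0.05, absolute log2 fold change > 1) are the typical values mentioned in the text rather than fixed rules. A ggplot2 or EnhancedVolcano version would follow the same logic.

```r
set.seed(42)  # simulated data only, for illustration

n <- 500
de <- data.frame(
  gene   = paste0("gene", seq_len(n)),     # hypothetical gene names
  log2fc = rnorm(n, mean = 0, sd = 1.5),   # simulated log2 fold changes
  pvalue = runif(n)^3                      # simulated p-values, skewed toward 0
)

# y-axis of the volcano plot: negative log10 of the p-value
de$neg_log10_p <- -log10(de$pvalue)

# Adjusted p-values (Benjamini-Hochberg), often preferred for the cut-off
de$padj <- p.adjust(de$pvalue, method = "BH")

# Classify genes with the typical cut-offs: p < 0.05 and |log2FC| > 1
de$status <- "not significant"
de$status[de$pvalue < 0.05 & de$log2fc >  1] <- "up"
de$status[de$pvalue < 0.05 & de$log2fc < -1] <- "down"

# Draw the plot: fold change on x, significance on y, colored by status
cols <- c(up = "firebrick", down = "steelblue", "not significant" = "grey70")
plot(de$log2fc, de$neg_log10_p,
     col = cols[de$status], pch = 16,
     xlab = "log2 fold change", ylab = "-log10(p-value)",
     main = "Volcano plot (simulated data)")
abline(v = c(-1, 1), h = -log10(0.05), lty = 2)  # threshold lines

print(table(de$status))
```

Swapping `de$pvalue` for `de$padj` in the classification step applies the stricter adjusted cut-off.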

Greenland ice thickness | R-bloggers

Meltwater in crevasses in Greenland – CC-BY-NC by NASA’s Marshall Space Flight Center

Day 11 of 30DayMapChallenge: « Arctic » (previously).

We’ll use the Greenland 5 km DEM, Ice Thickness, and Bedrock Elevation Grids (J. Bamber 2001) from J. L. Bamber, Layberry, and Gogineni (2001) and Layberry and Bamber (2001). Download here (after registration).

Data

The data needs some wrangling because the format is not straightforward: it’s a wrapped fixed-width ASCII file (check the user guide). We need to make one row out of every 31 lines of the file, reverse the order of the rows, and assign the correct projection and extent.

    library(terra)
    library(readr)
    library(dplyr)
    library(tidyr)

    thick mutate(row = ceiling(row_number() / 31)) |>
      group_by(row) |>
      group_modify(~ as_tibble(as.vector(t(as.matrix(.x))))) |>
      ungroup() |>
      mutate(name = rep(paste0("x", 1:310), 561)) |>
      drop_na(value) |>
      pivot_wider(values_from = value, names_from = name) |>
      arrange(desc(row)) |>
      select(-row) |>
      as.matrix() |>
      rast(crs = "+proj=stere +lat_0=90 +lat_ts=71 +lon_0=-39 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs +type=crs")

    ext(thick) <- c(-800000, 700000, -3400000, -600000)

Map

Here is the raw map in a polar stereographic projection:

    thick |>
      plot(main = "Greenland Ice thickness", col = map.pal("magma"))

Figure 1: Greenland Ice thickness in a polar stereographic projection. data: Bamber J., 2001. NASA National Snow and Ice Data Center

And on an interactive map after reprojection:

    library(leaflet)

    # native resolution is 5 km, so at 66° N it's about 0.1 degree
    # -10 - -75 = 65 ; 65 / 0.1 = 650 pixels wide
    thick_wgs84 project(rast(nrows = 250, ncols = 650,
                             xmin = -75, xmax = -10, ymin = 60, ymax = 85,
                             crs = "EPSG:4326")) |>
      subst(x = _, 0, NA)

    range_m addLegend(pal = pal_rev,
                      title = "Greenland Ice thickness (m)",
                      values = range_m,
                      labFormat = labelFormat(transform = function(x) sort(x + 500, decreasing = TRUE)))

Figure 2: Greenland Ice thickness

References

Bamber, J. L., R. L. Layberry, and S. P. Gogineni. 2001. “A New Ice Thickness and Bed Data Set for the Greenland Ice Sheet: 1. Measurement, Data Reduction, and Errors.” Journal of Geophysical Research: Atmospheres 106 (D24): 33773–80. https://doi.org/10.1029/2001JD900054.

Bamber, Jonathan. 2001. “Greenland 5 Km DEM, Ice Thickness, and Bedrock Elevation Grids, Version 1.” NASA National Snow and Ice Data Center Distributed Active Archive Center. https://doi.org/10.5067/01A10Z9BM7KP.

Layberry, R. L., and J. L. Bamber. 2001. “A New Ice Thickness and Bed Data Set for the Greenland Ice Sheet: 2. Relationship Between Dynamics and Basal Topography.” Journal of Geophysical Research: Atmospheres 106 (D24): 33781–88. https://doi.org/10.1029/2001JD900053.
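The reshaping step in the Greenland post above (rebuilding one grid row from a fixed number of wrapped file lines, then reversing the row order) can be illustrated on a tiny made-up grid. This is a base R sketch under simplified assumptions: the toy file has 2 lines per grid row instead of 31, no padding values, and all names (`wrapped`, `grid_flipped`, and so on) are hypothetical.

```r
# Toy "wrapped" file: 3 grid rows of 6 values, each wrapped over 2 lines of 3 values
wrapped <- matrix(1:18, ncol = 3, byrow = TRUE)  # 6 file lines x 3 values per line
lines_per_row <- 2                               # 31 in the post

# Number of grid rows encoded in the file
n_rows <- nrow(wrapped) / lines_per_row

# Read the values in file order (line by line) ...
vals <- as.vector(t(wrapped))

# ... and cut the stream into one grid row per group of wrapped lines
grid <- matrix(vals, nrow = n_rows, byrow = TRUE)

# Reverse the row order, as arrange(desc(row)) does in the post
grid_flipped <- grid[nrow(grid):1, , drop = FALSE]

print(grid_flipped)
```

In the real file the last wrapped line of each group may carry padding, which the post removes with `drop_na()` before pivoting.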