1 Intro and Overview

This document contains a set of exercises for practising your R programming skills, using the tidyverse packages to process and analyse example datasets.

You will need:

  • to have been introduced to R and tidyverse
  • R and RStudio installed
  • the tidyverse package installed

Datasets:

  • Exercises will use built-in datasets
  • Built-in datasets are already loaded in R and ready to use
  • You should read the help pages of the datasets you analyze
  • The titanic dataset is not built-in, but it will be accessible via a URL
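
As a minimal sketch of what "ready to use" means, the snippet below inspects the built-in trees table with base R functions only. The variable name trees_dim is introduced here purely for illustration.

```r
# Base R only: inspect a built-in dataset before working with it.
# `trees` ships with R's datasets package, so no loading step is needed.
str(trees)              # column names, types and first values

trees_dim <- dim(trees) # number of rows and columns
trees_dim

# help("trees")         # opens the dataset's help page in an interactive session
```

The help page is worth reading first, since it documents the unit of each column.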

Solutions to the exercises can be revealed by clicking the [Code] buttons displayed on the right-hand side of the exercises.

2 Preparation

Load the tidyverse package.

library(tidyverse)

3 Datasets

3.1 Built-in dataset: trees

This data set provides measurements of the diameter, height and volume of timber in 31 felled black cherry trees. Note that the diameter (in inches) is erroneously labelled Girth in the data. It is measured at 4 ft 6 in above the ground.

  • Show the head of table trees
trees %>% head()
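  • Create a trees2 variable by copying trees and
    • Renaming column Girth to Diameter
    • Converting Diameter and Height to centimeters (Diameter is in inches, 1 inch = 2.54 cm; Height is in feet, 1 foot = 30.48 cm)
    • Converting Volume to cubic meters (1 cubic foot = 0.0283168 cubic meters)
trees2 <- trees %>% 
  rename(Diameter = Girth) %>% 
  # Diameter: inches -> cm; Height: feet -> cm (see the trees help page for units)
  mutate(Diameter = Diameter * 2.54, Height = Height * 30.48) %>% 
  # Volume: cubic feet -> cubic meters
  mutate(Volume = Volume * 0.0283168)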
  • Show the head of table trees2
trees2 %>% head()
  • Calculate the mean value of each column
trees2 %>% 
  summarise(
    mean.diameter=mean(Diameter),
    mean.height=mean(Height),
    mean.vol=mean(Volume)
    )
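
When every column needs the same summary, listing each one by hand gets repetitive. A possible alternative, sketched here under the assumption that dplyr 1.0 or later is installed, uses across() to apply mean() to all columns at once. It is shown on the original trees table so that it runs on its own; the same call should work on trees2. The name trees_means is only illustrative.

```r
library(dplyr)

# Mean of every column in one call: across(everything(), mean) applies
# mean() to each column and keeps the original column names.
trees_means <- trees %>%
  summarise(across(everything(), mean))

trees_means
```

Unlike the explicit version, this keeps the column names unchanged instead of producing mean.diameter, mean.height and mean.vol.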
  • Save in variable trees2.plot a scatter plot of the diameter vs height
    • color points by Volume
    • add a title to the plot using ggtitle()
trees2.plot <- trees2 %>% 
  ggplot(aes(x=Diameter, y=Height, color=Volume)) +
  geom_point() + 
  ggtitle("Scatter Plot")
  • Save the plot as a PNG image file on your computer
    • use ggsave(trees2.plot, filename = "your_file.png", …) with appropriate parameters for ggsave
    • read the help of the function to create a 10x10 cm plot named "trees2.plot.png"
ggsave(trees2.plot, filename = "trees2.plot.png", width = 10, height = 10, units = "cm")

3.2 Built-in dataset: PlantGrowth

Results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions.

  • Show a summary of the table using summary(TABLE) (not a tidyverse function)
summary(PlantGrowth)
     weight       group   
 Min.   :3.590   ctrl:10  
 1st Qu.:4.550   trt1:10  
 Median :5.155   trt2:10  
 Mean   :5.073            
 3rd Qu.:5.530            
 Max.   :6.310            
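
summary() describes the weight column as a whole, across all three groups. As a rough tidyverse counterpart, the sketch below computes a few statistics per group with group_by() and summarise(). The output column names n, mean.weight and sd.weight, and the variable group_summary, are chosen here for illustration.

```r
library(dplyr)

# One row per group: number of plants, mean and standard deviation of weight.
group_summary <- PlantGrowth %>%
  group_by(group) %>%
  summarise(
    n = n(),
    mean.weight = mean(weight),
    sd.weight = sd(weight)
  )

group_summary
```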
  • Show a density plot of the weight values, split by group, in a single plot
PlantGrowth %>% 
  ggplot(aes(x=weight, fill=group)) +
  geom_density()

  • Tuning a plot is sometimes as simple as passing an extra parameter to a ggplot layer
    • redraw the same plot with alpha=0.2 in geom_density() to set the transparency
    • alpha can take values from 0 to 1; also try alpha=0.5 and alpha=0.8
PlantGrowth %>% 
  ggplot(aes(x=weight, fill=group)) +
  geom_density(alpha=0.2)

PlantGrowth %>% 
  ggplot(aes(x=weight, fill=group)) +
  geom_density(alpha=0.5)