1 Intro and Overview

1.1 Pre-requisites

1.1.1 R and RStudio

Make sure that R and RStudio are installed on your computer. Create a new R script in RStudio so that you can copy example source code from this tutorial and write code for the exercises. Save the script in your home directory (or a subdirectory of your choice). To run the code on the current line (or on the selected lines), click the Run button or press CTRL+ENTER. Results are displayed either in RStudio’s console or, for graphics, in the Plots panel. To read the documentation for a function (Help panel), place the cursor on the function’s name in the script and press F1.
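To check that your setup works, you could paste a few lines like the following into your new script and run them one at a time. This is only a minimal sketch: the vector x and its values are made up for illustration and are not part of the tutorial's data.

```r
# Place the cursor on a line and press CTRL+ENTER to run it.
# The values below are arbitrary and only serve as a quick test.
x <- c(4, 8, 15, 16, 23, 42)

# The result of this call is printed in the console
mean(x)

# A simple plot, to check that graphics are displayed
plot(x)

# Opens the documentation for mean() in the Help panel,
# like pressing F1 with the cursor on the function's name
?mean
```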

1.1.2 The tidyverse

This tutorial is based on a powerful set of packages for data manipulation and visualization known as the tidyverse. We will also need the readxl package to read data from Microsoft Excel files. Make sure that these two packages (tidyverse and readxl) are installed in your R environment (see RStudio’s menu Tools/Install Packages…).
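If you prefer the console to the menu, the same installation can be sketched in code. The variable names needed and missing_pkgs are illustrative, and the snippet assumes an internet connection and a configured CRAN mirror.

```r
# Packages this tutorial relies on
needed <- c("tidyverse", "readxl")

# Keep only the packages that are not installed yet
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

# Install whatever is missing (requires an internet connection)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Load the packages for the current R session
library(tidyverse)
library(readxl)
```

Installing is needed only once per R installation, whereas the library() calls have to be run in every new session.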

1.2 Introduction

Because of the ever-growing number of publicly available and large datasets in life sciences, computational and statistical skills are becoming a key part of the life scientist’s curriculum.

The goal of this tutorial is to give you a foundation in the most important tools for data analysis in R. You can then complement and expand your knowledge on your own in order to perform more complex analyses.

This tutorial is an adaptation of Dr. Alanis-Lobato’s tutorial that borrows materials and concepts from the book R for Data Science by Garrett Grolemund and Hadley Wickham.

1.3 Overview

Most data analysis projects follow the steps described below. The tidyverse set of packages includes functions to tackle each part of the data analysis process.

1.3.1 Import

The first step in data science is to import your data into R in order to manipulate it with R functions. This means that you will take data stored in a file, database or website and load it into a variable in R. These datasets can be the output of a genomics project, the result of a survey, measurements done in the lab, etc. In this tutorial, you will use functions from the readr and readxl packages to load data tables into R.
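As a rough illustration of the import step, the sketch below loads one table with readr and one with readxl. The CSV content (gene names and expression values) is made up and written to a temporary file so that the example is self-contained. The Excel file is an example file shipped with the readxl package. In your own projects you would pass the path to your own file instead.

```r
library(readr)
library(readxl)

# Write a tiny, made-up table to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("gene,expression", "geneA,2.5", "geneB,7.1"), csv_path)

# readr: load the CSV file into a variable (a tibble)
expr <- read_csv(csv_path, show_col_types = FALSE)
print(expr)

# readxl: locate an example Excel file bundled with the package
xlsx_path <- readxl_example("datasets.xlsx")

# By default, read_excel() loads the first sheet of the workbook
first_sheet <- read_excel(xlsx_path)
head(first_sheet)
```

Both functions return a data table stored in an ordinary R variable, which other R functions can then manipulate.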