5 Data Import and Export
5.1 Tidy Workflow
For an economist, working with empirical data is crucial for making data-driven decisions, and data science helps transform raw data into understanding, knowledge, and insights that support those decisions. A tidy workflow focuses on the tools needed to carry out this process effectively.
When working with data, the first step is to import it into our data science environment. Next, we tidy up the data to make it clean and usable. Then, as data scientists, our main task is to understand the data using three key tools: transformation, visualization, and modeling. Finally, we communicate our results to the right people to support decision-making.
The tidyverse package is a collection of R packages that work together to make data analysis easier. When you load tidyverse, it also loads several useful packages, including dplyr, readr, forcats, stringr, ggplot2, tibble, lubridate, tidyr, and purrr.
library(tidyverse)
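To check which packages the call above actually attached, one option is to inspect R's search path. This is a small sketch, assuming tidyverse is installed; the exact set of attached packages depends on the installed tidyverse version (lubridate, for instance, is attached by default only from tidyverse 2.0.0 onward).

```r
library(tidyverse)

# Packages named in the text as loaded together with tidyverse
core <- c("dplyr", "readr", "forcats", "stringr", "ggplot2",
          "tibble", "lubridate", "tidyr", "purrr")

# search() lists attached packages as "package:<name>"
attached <- paste0("package:", core) %in% search()
names(attached) <- core
attached
```

Any package that shows FALSE can still be loaded on its own with library().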
Let’s dive into each of these steps in the tidy workflow and explore the tools available in the R ecosystem.
5.2 Import data
When you work with data, you typically keep your raw data in a separate file, such as a CSV file. You can easily load this external data into the data science environment using the readr package.
5.2.1 Read data files into a tibble
Download the dataset: touristsl.csv
data1 <- read_csv("touristsl.csv")
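Since the contents of touristsl.csv are not shown here, the sketch below uses a small made-up file to illustrate what read_csv returns. The column names and values are hypothetical and chosen only for illustration.

```r
library(readr)

# Hypothetical example file: column names and values are invented
tmp <- tempfile(fileext = ".csv")
writeLines(c("year,arrivals",
             "2018,120",
             "2019,135"), tmp)

# read_csv() guesses the column types and returns a tibble
example_data <- read_csv(tmp, show_col_types = FALSE)

class(example_data)   # includes "tbl_df", i.e. a tibble
spec(example_data)    # the column types readr guessed
```

Checking the guessed column types right after import is a cheap way to catch numbers that were read as text.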
If the data file is in another folder within the current working directory, you can use the here function from the here package. The here function builds the file path starting from the top-level folder of your project, which is normally the current working directory when you work inside an RStudio project.
For example, suppose you have a file called “touristsl.csv” inside a folder called “data” in the current working directory. To reach the CSV file, you first need to open the data folder. In the here function, you define the path as here("data", "touristsl.csv"). Each folder you want to open is listed in order, separated by commas, and the file name comes last.
library(here)
data <- read_csv(here("data", "touristsl.csv"))
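Before reading a file through here, it can help to look at the path it builds. The sketch below is only an illustration, assuming the here package is installed; the printed path will differ on every machine, and file.exists returns FALSE unless the file has been downloaded into a data folder.

```r
library(here)

# here() with no arguments returns the folder that paths are built from
here()

# Each argument is one step further down; the file name comes last
path <- here("data", "touristsl.csv")
path

# Check that the file is actually there before trying to read it
file.exists(path)
```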
5.3 Export data
Similarly, to save a data file, we can use the write_csv function.
weight <- c(50, 44, 60)
height <- c(150, 160, 163)
ds <- tibble(weight, height)
write_csv(ds, "ds.csv")
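A quick way to confirm that an export worked is to read the file back and compare it with the original. The sketch below repeats the small weight and height example but writes to a temporary file (an assumption made here so that nothing in your project is overwritten).

```r
library(tibble)
library(readr)

weight <- c(50, 44, 60)
height <- c(150, 160, 163)
ds <- tibble(weight, height)

# Write to a temporary file so nothing in the project is overwritten
tmp <- tempfile(fileext = ".csv")
write_csv(ds, tmp)

# Read the file back and compare each column with the original
ds_back <- read_csv(tmp, show_col_types = FALSE)
all.equal(ds$weight, ds_back$weight)
all.equal(ds$height, ds_back$height)
```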
If you want to save the data file in another folder within the current working directory, use the here function to define the file path.
library(here)
write_csv(ds, here("data", "ds.csv"))
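Note that write_csv writes into an existing folder; it does not create the folder for you. A minimal sketch, assuming the data folder may not exist yet, creates it first with base R's dir.create. The ds tibble is rebuilt here so that the example runs on its own.

```r
library(here)
library(tibble)
library(readr)

ds <- tibble(weight = c(50, 44, 60), height = c(150, 160, 163))

# Create the "data" folder if it is missing; do nothing if it exists
out_dir <- here("data")
if (!dir.exists(out_dir)) dir.create(out_dir)

write_csv(ds, here("data", "ds.csv"))

# Confirm that the file was written
file.exists(here("data", "ds.csv"))
```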