Working as a data scientist, you have to work with many different types of data storage and there are chances when you need to pull data from enterprise data warehouse into your analysis environment. Simply, we can write SQL queries which will retrieve data from database tables, and save the data locally into CSV file.

Alternatively, R Markdown supports database connection which can extract data directly from a remote database sever. R Markdown is a documentation tool which enables reproducibility of data analysis, and help data scientists turn their analysis and findings into high quality documents.

Connect to MySQL

Reading data from MySQL into R workspace, it requires two R libraries, RMySQL, DBI.

The connection data should not be embedded in analysis code. Separate the connection code in another script. The script should set up the connection and save it into the workspace.

The saved connection is accessible by its name in the analysis code.

library(RMySQL)
library(DBI)
mydb <- dbConnect(RMySQL::MySQL(),dbname='db',username='u',password='p', host='host',port=9999)
knitr::opts_chunk$set(connection = "mydb")

Read Data From Table

After a connection has been established, we can read data from database table into the workspace.

artist <- as_tibble(DBI::dbReadTable(mydb,"artist"))
artist %>% slice(1:5)
## # A tibble: 5 x 9
##   ConstituentID DisplayName ArtistBio Nationality Gender BeginDate EndDate
##           <dbl> <chr>       <chr>     <chr>       <chr>      <dbl>   <dbl>
## 1             1 Robert Arn~ American~ American    Male        1930    1992
## 2             2 Doroteo Ar~ Spanish,~ Spanish     Male        1936       0
## 3             3 Bill Arnold American~ American    Male        1941       0
## 4             4 Charles Ar~ American~ American    Male        1946       0
## 5             5 Per Arnoldi Danish, ~ Danish      Male        1941       0
## # ... with 2 more variables: Wiki.QID <chr>, ULAN <dbl>

Once you have the table in a tibble, you can easily tidy the data and explore it by using the fantastic tidyverse package.

frequency <- artist %>%
  count(Nationality) %>%
  arrange(desc(n))

frequency$Nationality <- 
  factor(frequency$Nationality,level=frequency$Nationality[order(frequency$n)])

frequency %>% 
  slice(1:10) %>%
  ggplot(aes(x=Nationality,y=n))+
    geom_bar(stat="identity", fill="#87cefa") + 
    coord_flip() + xlab("") + ylab("") + theme_light()

Write Data to Table

DBI::dbWriteTable can write a data frame into a database table.

DBI::dbWriteTable(mydb, "iris", iris)

Disconnect from Database

DBI::dbDisconnect(mydb)
## [1] TRUE