Working as a data scientist, you have to work with many different types of data storage and there are chances when you need to pull data from enterprise data warehouse into your analysis environment. Simply, we can write SQL queries which will retrieve data from database tables, and save the data locally into CSV file.
Alternatively, R Markdown supports database connection which can extract data directly from a remote database sever. R Markdown is a documentation tool which enables reproducibility of data analysis, and help data scientists turn their analysis and findings into high quality documents.
Connect to MySQL
Reading data from MySQL into R workspace, it requires two R libraries,
The connection data should not be embedded in analysis code. Separate the connection code in another script. The script should set up the connection and save it into the workspace.
The saved connection is accessible by its name in the analysis code.
library(RMySQL) library(DBI) mydb <- dbConnect(RMySQL::MySQL(),dbname='db',username='u',password='p', host='host',port=9999) knitr::opts_chunk$set(connection = "mydb")
Read Data From Table
After a connection has been established, we can read data from database table into the workspace.
artist <- as_tibble(DBI::dbReadTable(mydb,"artist")) artist %>% slice(1:5)
## # A tibble: 5 x 9 ## ConstituentID DisplayName ArtistBio Nationality Gender BeginDate EndDate ## <dbl> <chr> <chr> <chr> <chr> <dbl> <dbl> ## 1 1 Robert Arn~ American~ American Male 1930 1992 ## 2 2 Doroteo Ar~ Spanish,~ Spanish Male 1936 0 ## 3 3 Bill Arnold American~ American Male 1941 0 ## 4 4 Charles Ar~ American~ American Male 1946 0 ## 5 5 Per Arnoldi Danish, ~ Danish Male 1941 0 ## # ... with 2 more variables: Wiki.QID <chr>, ULAN <dbl>
Once you have the table in a tibble, you can easily tidy the data and explore it by using the fantastic
frequency <- artist %>% count(Nationality) %>% arrange(desc(n)) frequency$Nationality <- factor(frequency$Nationality,level=frequency$Nationality[order(frequency$n)]) frequency %>% slice(1:10) %>% ggplot(aes(x=Nationality,y=n))+ geom_bar(stat="identity", fill="#87cefa") + coord_flip() + xlab("") + ylab("") + theme_light()
Write Data to Table
DBI::dbWriteTable can write a data frame into a database table.
DBI::dbWriteTable(mydb, "iris", iris)
Disconnect from Database
##  TRUE