ggplot on jDataLab
https://www.jdatalab.com/tags/ggplot/
Recent content in ggplot on jDataLab
Linear Regression Analysis
https://www.jdatalab.com/data_science_and_data_mining/2017/02/08/linear-regression.html
Wed, 08 Feb 2017 12:45:03 -0600
Predictive learning is a process in which a model is trained on observations whose predictor values and outcomes are already known. The trained model is then used to predict either a continuous value or a categorical label for a new observation. This gives rise to two types of data mining techniques: classification, which predicts a categorical label, and regression, which predicts a continuous value.
Linear regression is one of the earliest and the simplest of regression techniques. As the name indicates, linear regression computes a linear model, that is, the line of best fit for a set of data points.
Data Binning and Plotting in R
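The line of best fit is usually computed by ordinary least squares, which chooses the intercept and slope that minimize the sum of squared vertical distances between the points and the line. The following is a minimal Python sketch of the single-predictor case, assuming the data arrive as plain lists of numbers. The function names `fit_line` and `predict` and the toy data are hypothetical and not taken from the post; in R, the same model would typically be fitted with `lm()`.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x for a single predictor.

    Returns (b0, b1): the intercept and the slope of the line of best fit.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxx and Sxy: centered sum of squares of x and centered cross-products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept: the line passes through (mean_x, mean_y)
    return b0, b1


def predict(b0, b1, x):
    """Predict a continuous value for a new observation x."""
    return b0 + b1 * x


if __name__ == "__main__":
    # Hypothetical toy data, roughly following y = 1 + 2x with some noise
    xs = [1, 2, 3, 4, 5]
    ys = [3.1, 4.9, 7.2, 8.8, 11.0]
    b0, b1 = fit_line(xs, ys)
    print(f"y = {b0:.3f} + {b1:.3f} * x; prediction at x = 6: {predict(b0, b1, 6):.3f}")
```

The closed-form slope `Sxy / Sxx` is used here because it is exact for one predictor and needs no libraries; with several predictors, a matrix-based solver is the usual choice.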
https://www.jdatalab.com/data_science_and_data_mining/2017/01/30/data-binning-plot.html
Mon, 30 Jan 2017 01:45:03 -0600
Updated on 9/28/2019
Data binning is a basic skill that every knowledge worker or data scientist should have. When we want to study patterns collectively rather than individually, individual values must first be categorized into a number of groups. We can group values by ranges of values, by percentiles, or by data clustering.
Grouping by a range of values is referred to as data binning or bucketing in data science, i.
Handling Overplotting in Large Datasets
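Two of the grouping strategies mentioned above, binning by equal-width ranges of values and binning by percentiles, can be sketched in a few lines. The following Python sketch uses only the standard library; the function names and sample data are hypothetical, and clustering-based grouping is not shown. In R, the same idea is usually expressed with `cut()`, passing break points computed by `quantile()` for percentile bins. Note that the intervals below are closed on the left, whereas `cut()` closes them on the right by default (`right = TRUE`).

```python
from bisect import bisect_right
from statistics import quantiles


def equal_width_bins(values, k):
    """Bin by a range of values: split [min, max] into k equal-width intervals.

    Returns a bin index in 0..k-1 for every value.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # min(..., k - 1) keeps the maximum value in the last bin instead of bin k
    return [min(int((v - lo) / width), k - 1) for v in values]


def quantile_bins(values, k):
    """Bin by percentiles: each of the k bins receives roughly the same number of values."""
    cuts = quantiles(values, n=k)  # k - 1 cut points
    # bisect_right counts how many cut points are <= v, which is the bin index
    return [bisect_right(cuts, v) for v in values]


if __name__ == "__main__":
    # Hypothetical, right-skewed sample values
    scores = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13, 21, 34]
    print("equal width:", equal_width_bins(scores, 4))
    print("quartiles:  ", quantile_bins(scores, 4))
```

On skewed data like this, equal-width bins put most values into the first bin, while percentile bins spread them evenly; which is preferable depends on whether the bin edges or the bin sizes should be meaningful.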
https://www.jdatalab.com/data_science_and_data_mining/2017/01/26/overplotting-r.html
Thu, 26 Jan 2017 01:45:03 -0600
Scatterplots can reveal relationships among the variables in a data set and are a popular way of visualizing data before applying learning algorithms. As more and more data points are drawn on a scatterplot, many of them overlap and dark regions appear on the plot; this is referred to as overplotting. Overplotting can obscure clusters and patterns.
A dataset of 10,000 rows is used here to demonstrate overplotting. The first 10 rows are listed to show the data schema, which includes three variables: sales_total, num_of_orders, and gender.
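One common remedy for overplotting, not necessarily the one the post uses, is 2D binning: divide the plane into a grid, count the points that fall into each cell, and draw one tile per cell shaded by its count. This is what ggplot2's `geom_bin2d()` layer does. The Python sketch below shows only the counting step, on synthetic random data rather than the post's dataset; the function name `bin2d_counts` and the bin widths are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def bin2d_counts(points, x_width, y_width):
    """Count how many (x, y) points fall into each rectangular grid cell.

    Cells are keyed by integer (column, row) indices. Plotting one tile per
    cell, shaded by its count, replaces thousands of overlapping markers.
    """
    if x_width <= 0 or y_width <= 0:
        raise ValueError("bin widths must be positive")
    counts = Counter()
    for x, y in points:
        # math.floor (not int) so that negative coordinates land in the correct cell
        cell = (math.floor(x / x_width), math.floor(y / y_width))
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    # Synthetic data only: 10,000 random points, not the dataset from the post
    rng = random.Random(42)
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10_000)]
    counts = bin2d_counts(pts, 0.5, 0.5)
    densest, n = counts.most_common(1)[0]
    print(f"{len(counts)} non-empty cells; densest cell {densest} holds {n} points")
```

Shading by count turns regions that look uniformly dark in a scatterplot into a readable density map; the bin widths control the trade-off between detail and noise.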