
1.2 Question
Can we predict Boston house prices using a multivariable regression model?
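For reference, a multivariable regression model in its usual linear form can be written as below. This is a sketch under the assumption that the model is linear; the symbols are notation chosen here for illustration, not taken from the notebook.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p
```

Here \hat{y} stands for the predicted house price, x_1, ..., x_p for the explanatory variables (for example CRIM or RM), and \beta_0, ..., \beta_p for the coefficients estimated from the data.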
1.3 Gather Data
Source: Original research paper
For this project we’re going to work with a pre-built dataset that the scikit-learn
Python library provides. Documentation: sklearn load_boston
1.3.1 Python Code for generating csv dataset
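# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release.
from sklearn import datasets
import pandas as pd

# Load the bundled Boston housing data (features plus target).
data = datasets.load_boston()
# One column per feature, with the target appended as PRICE.
df = pd.DataFrame(data=data['data'], columns=data['feature_names'])
df['PRICE'] = data['target']
# Write a comma-separated file without the index column, for use in R.
df.to_csv('boston.csv', sep=',', index=False)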
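Before handing the file to R, it can help to confirm that the export has the expected header and that every row has the same number of fields. The following is a minimal sketch using only the Python standard library. `check_csv` and `EXPECTED_COLUMNS` are hypothetical names introduced here, and the column list assumes the standard 13 Boston feature names plus the PRICE column added by the script above.

```python
import csv

# Assumed column order: the 13 standard Boston feature names followed by
# the PRICE target appended by the export script.
EXPECTED_COLUMNS = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "PRICE",
]


def check_csv(path, expected_columns=EXPECTED_COLUMNS):
    """Return (n_rows, n_columns) after checking the header and row widths.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        if header != list(expected_columns):
            raise ValueError(f"unexpected header: {header}")
        n_rows = 0
        for row in reader:
            # Every data row must have exactly one field per header column.
            if len(row) != len(header):
                raise ValueError(f"row {n_rows + 1} has {len(row)} fields")
            n_rows += 1
    return n_rows, len(header)
```

For the full dataset, `check_csv("boston.csv")` would be expected to return `(506, 14)`, which should match what `nrow` and `ncol` report once the file is read in R.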
1.4 Exploring the data
[33]: boston_dataset <- read.csv(file="boston.csv")
[34]: summary(boston_dataset)
str(boston_dataset)
cat("class : ", class(boston_dataset), "\n")
cat("number of rows : ", nrow(boston_dataset), "\n")
cat("number of features : ", ncol(boston_dataset), "\n")
cat("dataset dimension : ", dim(boston_dataset), "\n")
      CRIM                ZN             INDUS            CHAS
 Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000
 1st Qu.: 0.08205   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000
 Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000
 Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917
 3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000
 Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000
      NOX               RM             AGE              DIS
 Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130
 1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100
 Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207
 Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795
 3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188
 Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127
      RAD              TAX           PTRATIO            B
 Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32
 1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38
 Median : 5.000   Median :330.0   Median :19.05   Median :391.44
 Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67
 3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23