From 2af79ce27dc04f2b73071a910809fbc94e1ddcfa Mon Sep 17 00:00:00 2001
From: ada
Date: Tue, 7 Aug 2018 12:47:12 +0800
Subject: [PATCH 1/2] add md file for Day3

---
 Code/Day3_Multiple_Linear_Regression.md | 54 +++++++++++++++++++++++++
 1 file changed, 54 insertions(+)
 create mode 100644 Code/Day3_Multiple_Linear_Regression.md

diff --git a/Code/Day3_Multiple_Linear_Regression.md b/Code/Day3_Multiple_Linear_Regression.md
new file mode 100644
index 0000000..792bb0d
--- /dev/null
+++ b/Code/Day3_Multiple_Linear_Regression.md
@@ -0,0 +1,54 @@
+# Multiple Linear Regression
+
+
+
+
+
+
+
+## Step 1: Data Preprocessing
+
+### Importing the libraries
+```python
+import pandas as pd
+import numpy as np
+```
+### Importing the dataset
+```python
+dataset = pd.read_csv('50_Startups.csv')
+X = dataset.iloc[ : , :-1].values
+Y = dataset.iloc[ : , 4 ].values
+```
+
+### Encoding categorical data
+```python
+from sklearn.preprocessing import LabelEncoder, OneHotEncoder
+labelencoder = LabelEncoder()
+X[: , 3] = labelencoder.fit_transform(X[ : , 3])
+onehotencoder = OneHotEncoder(categorical_features = [3])
+X = onehotencoder.fit_transform(X).toarray()
+```
+
+### Avoiding the dummy variable trap
+```python
+X = X[: , 1:]
+```
+
+### Splitting the dataset into the training set and test set
+```python
+from sklearn.cross_validation import train_test_split
+X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)
+```
+## Step 2: Training using the multiple linear regression model on the training set
+```python
+from sklearn.linear_model import LinearRegression
+regressor = LinearRegression()
+regressor.fit(X_train, Y_train)
+```
+
+## Step 3: Predicting the test set results
+```python
+y_pred = regressor.predict(X_test)
+```
+
+
From f333040f6e0ad445dc09ebce66133202839fb659 Mon Sep 17 00:00:00 2001
From: ada
Date: Tue, 7 Aug 2018 12:48:53 +0800
Subject: [PATCH 2/2] update md file

---
 Code/Day3_Multiple_Linear_Regression.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Code/Day3_Multiple_Linear_Regression.md b/Code/Day3_Multiple_Linear_Regression.md
index 792bb0d..adfcbf5 100644
--- a/Code/Day3_Multiple_Linear_Regression.md
+++ b/Code/Day3_Multiple_Linear_Regression.md
@@ -39,7 +39,7 @@ X = X[: , 1:]
 from sklearn.cross_validation import train_test_split
 X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)
 ```
-## Step 2: Training using the multiple linear regression model on the training set
+## Step 2: Training the multiple linear regression model on the training set
 ```python
 from sklearn.linear_model import LinearRegression
 regressor = LinearRegression()
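
Review note: the tutorial code in this series imports `sklearn.cross_validation` and passes `categorical_features` to `OneHotEncoder`. Both were removed in later scikit-learn releases, so the snippets may not run on a current install. The following is a minimal sketch of the same three steps against the newer API, not the author's code. It assumes a synthetic stand-in for `50_Startups.csv`, whose contents are not shown in the patch; the column names and values are hypothetical. As in the patch, column 3 is categorical and column 4 is the target.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for 50_Startups.csv (hypothetical names and values).
rng = np.random.default_rng(0)
n = 50
dataset = pd.DataFrame({
    "spend_a": rng.uniform(0, 100, n),
    "spend_b": rng.uniform(0, 100, n),
    "spend_c": rng.uniform(0, 100, n),
    "category": np.array(["A", "B", "C"])[np.arange(n) % 3],
})
dataset["target"] = (
    2.0 * dataset["spend_a"] + 0.5 * dataset["spend_c"] + rng.normal(0, 1, n)
)

# Step 1: features are all columns but the last; the target is column 4.
X = dataset.iloc[:, :-1]
Y = dataset.iloc[:, 4].values

# One-hot encode column 3. drop="first" removes one dummy column,
# which plays the role of the patch's `X = X[:, 1:]` step.
encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(drop="first"), [3])],
    remainder="passthrough",  # keep the numeric columns unchanged
    sparse_threshold=0,       # always return a dense array
)
X = encoder.fit_transform(X)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# Step 2: fit the multiple linear regression model on the training set.
regressor = LinearRegression()
regressor.fit(X_train, Y_train)

# Step 3: predict on the test set.
y_pred = regressor.predict(X_test)
```

`ColumnTransformer` places the encoded dummy columns first and the passthrough columns after them, so the resulting layout matches what the original `OneHotEncoder(categorical_features = [3])` call followed by `X[:, 1:]` was aiming for.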