100-Days-Of-ML-Code/Code/Day 3_Multiple_Linear_Regression.ipynb
2018-09-06 22:26:22 +08:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 机器学习100天——第3天多元线性回归Multiple Linear Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 第1步数据预处理"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**导入库**"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**导入数据集**"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"dataset = pd.read_csv('../datasets/50_Startups.csv')\n",
"X = dataset.iloc[ : , :-1].values\n",
"Y = dataset.iloc[ : , 4 ].values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**将类别数据数字化**"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.preprocessing import LabelEncoder, OneHotEncoder\n",
"labelencoder = LabelEncoder()\n",
"X[: , 3] = labelencoder.fit_transform(X[ : , 3])\n",
"onehotencoder = OneHotEncoder(categorical_features = [3])\n",
"X = onehotencoder.fit_transform(X).toarray()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**躲避虚拟变量陷阱**"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X = X[: , 1:]"
]
},
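{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of why one dummy column is dropped, using a small made-up category array rather than the startup data. When every dummy column is kept next to an intercept column, the dummies sum to the intercept, so the design matrix loses full column rank. The names `categories`, `dummies`, `with_all` and `with_dropped` are illustrative assumptions, not part of the original notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical category codes for six rows (illustration only, not the startup data)\n",
"categories = np.array([0, 1, 2, 0, 1, 2])\n",
"dummies = np.eye(3)[categories]  # one-hot encoding: one column per category\n",
"intercept = np.ones((len(categories), 1))\n",
"\n",
"# The three dummies sum to the intercept column, so one of the four columns is redundant\n",
"with_all = np.hstack([intercept, dummies])\n",
"# Dropping the first dummy removes the redundancy\n",
"with_dropped = np.hstack([intercept, dummies[:, 1:]])\n",
"\n",
"# Compare the rank of each design matrix with its number of columns\n",
"print(np.linalg.matrix_rank(with_all), 'of', with_all.shape[1], 'columns')\n",
"print(np.linalg.matrix_rank(with_dropped), 'of', with_dropped.shape[1], 'columns')"
]
},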
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**拆分数据集为训练集和测试集**"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"source": [
"from sklearn.model_selection import train_test_split\n",
"X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 第2步在训练集上训练多元线性回归模型"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.linear_model import LinearRegression\n",
"regressor = LinearRegression()\n",
"regressor.fit(X_train, Y_train)"
]
},
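{
"cell_type": "markdown",
"metadata": {},
"source": [
"`LinearRegression` fits its coefficients by ordinary least squares. The sketch below, on synthetic data rather than the startup dataset, compares it with a direct least-squares solve on a design matrix that has an explicit column of ones. The names `X_demo`, `y_demo`, `model` and `beta`, and the chosen coefficients, are assumptions made up for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"# Synthetic data: 20 samples, 3 features, known coefficients plus small noise\n",
"rng = np.random.RandomState(0)\n",
"X_demo = rng.rand(20, 3)\n",
"y_demo = 2.0 + X_demo @ np.array([1.0, -3.0, 0.5]) + 0.01 * rng.randn(20)\n",
"\n",
"model = LinearRegression().fit(X_demo, y_demo)\n",
"\n",
"# Solve the same problem directly: prepend a column of ones for the intercept\n",
"A = np.hstack([np.ones((len(X_demo), 1)), X_demo])\n",
"beta, *_ = np.linalg.lstsq(A, y_demo, rcond=None)\n",
"\n",
"# The two solutions should agree up to floating-point error\n",
"print(np.allclose(beta[0], model.intercept_), np.allclose(beta[1:], model.coef_))"
]
},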
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 第3步在测试集上预测结果"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"y_pred = regressor.predict(X_test)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 103015.20159796 132582.27760815 132447.73845175 71976.09851258\n",
" 178537.48221056 116161.24230166 67851.69209676 98791.73374687\n",
" 113969.43533013 167921.06569551]\n"
]
}
],
"source": [
"print(y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<b>完整的项目请前往Github项目<a href=\"https://github.com/MachineLearning100/100-Days-Of-ML-Code\">100-Days-Of-ML-Code</a>查看。有任何的建议或者意见欢迎在issue中提出~</b>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}