100-Days-Of-ML-Code/Code/Day 2_Simple_Linear_Regression.ipynb
2018-11-28 10:45:48 +08:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 100 Days of ML Code: Day 2, Simple Linear Regression\n",
"## Step 1: Data Preprocessing"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
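"First, import the libraries we need. Compared with Day 1 there is one new import, `matplotlib.pyplot`. matplotlib is a 2D plotting library for Python, and its `pyplot` module is a collection of command-style functions,\n",
"each of which makes a change to the current figure (for example, adding a plot, a title, or axis labels).\n",
"We use `matplotlib.pyplot` at the end to visualize the results."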
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
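{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of what command-style means in practice. The plotted numbers and the names `ax`, `n_lines`, and `title` are hypothetical and used only for illustration. Each `pyplot` call acts on an implicit current figure and axes, so no figure object has to be passed around. `plt.gca()` returns that current axes, which lets us inspect what the preceding calls did. The demo figure is closed at the end so it is not displayed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.figure()                    # start a new current figure\n",
"plt.plot([1, 2, 3], [2, 4, 6])  # draws into the current axes (created on demand)\n",
"plt.title('pyplot demo')        # sets the title of that same current axes\n",
"ax = plt.gca()                  # the current axes modified by the calls above\n",
"n_lines = len(ax.get_lines())   # one Line2D object from plt.plot\n",
"title = ax.get_title()\n",
"print(n_lines, title)\n",
"plt.close('all')                # discard the demo figure instead of showing it"
]
},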
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load the dataset (25 rows with the columns `Hours` and `Scores`) from a CSV file"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Hours Scores\n",
"0 2.5 21\n",
"1 5.1 47\n",
"2 3.2 27\n",
"3 8.5 75\n",
"4 3.5 30\n",
"5 1.5 20\n",
"6 9.2 88\n",
"7 5.5 60\n",
"8 8.3 81\n",
"9 2.7 25\n",
"10 7.7 85\n",
"11 5.9 62\n",
"12 4.5 41\n",
"13 3.3 42\n",
"14 1.1 17\n",
"15 8.9 95\n",
"16 2.5 30\n",
"17 1.9 24\n",
"18 6.1 67\n",
"19 7.4 69\n",
"20 2.7 30\n",
"21 4.8 54\n",
"22 3.8 35\n",
"23 6.9 76\n",
"24 7.8 86\n"
]
}
],
"source": [
"dataset = pd.read_csv('../datasets/studentscores.csv')\n",
"print(dataset)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
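"Next, split the data into the feature matrix X and the target vector Y with the pandas `iloc` indexer. Unlike `loc`, which selects by index label, `iloc` selects by integer position. The first argument selects rows (`:` means all rows) and the second selects columns: `:1` is a slice that stops before column 1, so it keeps only column 0 (`Hours`) and returns a 2D array of shape (25, 1), the feature layout scikit-learn expects. The plain integer `1` selects column 1 (`Scores`) as a 1D array of shape (25,)."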
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X: [[2.5]\n",
" [5.1]\n",
" [3.2]\n",
" [8.5]\n",
" [3.5]\n",
" [1.5]\n",
" [9.2]\n",
" [5.5]\n",
" [8.3]\n",
" [2.7]\n",
" [7.7]\n",
" [5.9]\n",
" [4.5]\n",
" [3.3]\n",
" [1.1]\n",
" [8.9]\n",
" [2.5]\n",
" [1.9]\n",
" [6.1]\n",
" [7.4]\n",
" [2.7]\n",
" [4.8]\n",
" [3.8]\n",
" [6.9]\n",
" [7.8]]\n",
"Y: [21 47 27 75 30 20 88 60 81 25 85 62 41 42 17 95 30 24 67 69 30 54 35 76\n",
" 86]\n"
]
}
],
"source": [
"X = dataset.iloc[:, :1].values  # features: Hours, shape (25, 1)\n",
"Y = dataset.iloc[:, 1].values   # target: Scores, shape (25,)\n",
"print(\"X:\", X)\n",
"print(\"Y:\", Y)"
]
},
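{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the difference between `iloc` and `loc`, and why `:1` and `1` produce different shapes, here is a small sketch on the first five rows of the dataset printed above. The names `df`, `X_demo`, `y_demo`, and `rev` are hypothetical; `rev` relabels the rows in reverse order so that position and label no longer agree."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# First five rows of studentscores.csv, copied from the printed output above\n",
"df = pd.DataFrame({'Hours': [2.5, 5.1, 3.2, 8.5, 3.5],\n",
"                   'Scores': [21, 47, 27, 75, 30]})\n",
"\n",
"X_demo = df.iloc[:, :1].values  # slice of columns -> 2-D array\n",
"y_demo = df.iloc[:, 1].values   # single column position -> 1-D array\n",
"print(X_demo.shape, y_demo.shape)  # (5, 1) (5,)\n",
"\n",
"# Relabel the rows 4, 3, 2, 1, 0 so labels and positions disagree\n",
"rev = df.copy()\n",
"rev.index = [4, 3, 2, 1, 0]\n",
"by_position = rev.iloc[0]['Hours']  # first row by position\n",
"by_label = rev.loc[0]['Hours']      # row whose label is 0 (the last row)\n",
"print(by_position, by_label)        # 2.5 3.5"
]
},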
{
"cell_type": "markdown",
"metadata": {},
"source": [
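"Use the `train_test_split` function from `sklearn.model_selection` to split the data into a training set and a test set. (Older tutorials import it from `sklearn.cross_validation`, which was deprecated in scikit-learn 0.18 and removed in 0.20.)"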
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.model_selection import train_test_split\n",
"# Hold out 1/4 of the samples as the test set; random_state=0 makes the shuffle reproducible\n",
"X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=1/4, random_state=0)"
]
},
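{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of how `test_size=1/4` translates into row counts, using the 25 values printed above. The names `hours`, `scores`, `Xtr`, `Xte`, `ytr`, and `yte` are hypothetical, chosen so they do not overwrite the notebook's own variables. When `test_size` is a float, scikit-learn rounds the number of test rows up, so 25 * 0.25 = 6.25 becomes 7 test rows and the remaining 18 rows go to training. Fixing `random_state` reproduces the same shuffle on every call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# Hours and Scores copied from the printed dataset above\n",
"hours = np.array([2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7, 7.7, 5.9, 4.5,\n",
"                  3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8, 3.8, 6.9, 7.8]).reshape(-1, 1)\n",
"scores = np.array([21, 47, 27, 75, 30, 20, 88, 60, 81, 25, 85, 62, 41,\n",
"                   42, 17, 95, 30, 24, 67, 69, 30, 54, 35, 76, 86])\n",
"\n",
"Xtr, Xte, ytr, yte = train_test_split(hours, scores, test_size=1/4, random_state=0)\n",
"print(len(Xtr), len(Xte))  # 18 7\n",
"\n",
"# The same random_state gives an identical split\n",
"_, Xte_again, _, _ = train_test_split(hours, scores, test_size=1/4, random_state=0)\n",
"print(np.array_equal(Xte, Xte_again))  # True"
]
},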
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training the Linear Regression Model"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.linear_model import LinearRegression\n",
"# Fit an ordinary least squares model on the training set\n",
"regressor = LinearRegression()\n",
"regressor = regressor.fit(X_train, Y_train)"
]
},
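{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a single feature, the ordinary least squares fit that `LinearRegression` computes has a closed form: slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x. The sketch below is a check under one assumption: it fits on all 25 rows rather than on the training split. It computes both values by hand with NumPy and compares them with `coef_` and `intercept_`. The names `slope`, `intercept`, and `model` are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"# Hours and Scores copied from the printed dataset above\n",
"hours = np.array([2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7, 7.7, 5.9, 4.5,\n",
"                  3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8, 3.8, 6.9, 7.8])\n",
"scores = np.array([21, 47, 27, 75, 30, 20, 88, 60, 81, 25, 85, 62, 41,\n",
"                   42, 17, 95, 30, 24, 67, 69, 30, 54, 35, 76, 86])\n",
"\n",
"# Closed-form least squares for one feature\n",
"x_mean, y_mean = hours.mean(), scores.mean()\n",
"slope = np.sum((hours - x_mean) * (scores - y_mean)) / np.sum((hours - x_mean) ** 2)\n",
"intercept = y_mean - slope * x_mean\n",
"\n",
"# scikit-learn expects a 2-D feature matrix, hence reshape(-1, 1)\n",
"model = LinearRegression().fit(hours.reshape(-1, 1), scores)\n",
"print(slope, model.coef_[0])\n",
"print(intercept, model.intercept_)"
]
},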
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Predicting the Test Set Results"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"Y_pred = regressor.predict(X_test)"
]
},
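{
"cell_type": "markdown",
"metadata": {},
"source": [
"`predict` evaluates the fitted line, intercept_ + coef_ * x, for each row of its input. A minimal sketch confirming this, again fitted on all 25 rows; `model` and the study times in `new_hours` are hypothetical values, not part of the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LinearRegression\n",
"\n",
"# Hours and Scores copied from the printed dataset above\n",
"hours = np.array([2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7, 7.7, 5.9, 4.5,\n",
"                  3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8, 3.8, 6.9, 7.8]).reshape(-1, 1)\n",
"scores = np.array([21, 47, 27, 75, 30, 20, 88, 60, 81, 25, 85, 62, 41,\n",
"                   42, 17, 95, 30, 24, 67, 69, 30, 54, 35, 76, 86])\n",
"model = LinearRegression().fit(hours, scores)\n",
"\n",
"new_hours = np.array([[2.0], [9.25]])  # hypothetical study times\n",
"pred_api = model.predict(new_hours)\n",
"# b0 + b1 * x: (2, 1) @ (1,) -> shape (2,)\n",
"pred_manual = model.intercept_ + new_hours @ model.coef_\n",
"print(pred_api, pred_manual)"
]
},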
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualizing the Training Set Results"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
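"# Scatter plot of the actual training data\n",
"plt.scatter(X_train, Y_train, color='red')\n",
"# Regression line fitted on the training set\n",
"plt.plot(X_train, regressor.predict(X_train), color='blue')\n",
"plt.title('Hours vs Scores (training set)')\n",
"plt.xlabel('Hours')\n",
"plt.ylabel('Scores')\n",
"plt.show()"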
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualizing the Test Set Results"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
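"# Scatter plot of the actual test data\n",
"plt.scatter(X_test, Y_test, color='red')\n",
"# Predictions of the model trained on the training set (they lie on the same fitted line)\n",
"plt.plot(X_test, Y_pred, color='blue')\n",
"plt.title('Hours vs Scores (test set)')\n",
"plt.xlabel('Hours')\n",
"plt.ylabel('Scores')\n",
"plt.show()"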
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}