{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 100 Days of Machine Learning, Day 3: Multiple Linear Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Data Preprocessing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Importing the libraries**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Importing the dataset**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "dataset = pd.read_csv('../datasets/50_Startups.csv')\n", "# Features: every column except the last one\n", "X = dataset.iloc[:, :-1].values\n", "# Target: column 4 (Profit)\n", "Y = dataset.iloc[:, 4].values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Encoding categorical data**" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from sklearn.compose import ColumnTransformer\n", "from sklearn.preprocessing import OneHotEncoder\n", "# One-hot encode the categorical State column (index 3) and pass the numeric columns through unchanged.\n", "# The encoded columns come first in the output, followed by the passthrough columns.\n", "ct = ColumnTransformer([('state', OneHotEncoder(), [3])], remainder='passthrough', sparse_threshold=0)\n", "X = np.array(ct.fit_transform(X), dtype=float)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Avoiding the dummy variable trap**\n", "\n", "The one-hot `State` columns always sum to 1, so together they are perfectly collinear with the intercept term. Dropping one of them removes this redundancy." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Drop the first dummy column\n", "X = X[:, 1:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Splitting the dataset into the training set and test set**" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "# Hold out 20% of the samples as the test set; random_state makes the split reproducible\n", "X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 2: Fitting Multiple Linear Regression to the Training Set" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)" ] }, 
"execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import LinearRegression\n", "# Fit an ordinary least squares model on the training set\n", "regressor = LinearRegression()\n", "regressor.fit(X_train, Y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step 3: Predicting the Test Set Results" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Predict the target for the held-out test samples\n", "y_pred = regressor.predict(X_test)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 103015.20159796 132582.27760815 132447.73845175 71976.09851258\n", " 178537.48221056 116161.24230166 67851.69209676 98791.73374687\n", " 113969.43533013 167921.06569551]\n" ] } ], "source": [ "print(y_pred)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the complete project, see the GitHub repository 100-Days-Of-ML-Code. Suggestions and feedback are welcome in the issues." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 2 }