Delete Day 6_Logistic_Regression.ipynb

This commit is contained in:
郑雨薇
2018-09-07 12:14:28 +08:00
committed by GitHub
parent 9e99ad670c
commit d2e76ee9d1

View File

@ -1,202 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 机器学习100天——第6天逻辑回归Linear Regression"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 第1步数据预处理\n",
"### 导入库"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as numpy\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 导入数据集\n",
"<a href = \"https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master/datasets/Social_Network_Ads.csv\">这里</a>获取数据集"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"dataset = pd.read_csv('../datasets/Social_Network_Ads.csv')\n",
"X = dataset.iloc[:, [2, 3]].values\n",
"Y = dataset.iloc[:,4].values"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 将数据集分成训练集和测试集"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.cross_validation import train_test_split\n",
"X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 特征缩放"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/ymao/usr/miniconda/lib/python3.6/site-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype int64 was converted to float64 by StandardScaler.\n",
" warnings.warn(msg, DataConversionWarning)\n"
]
}
],
"source": [
"from sklearn.preprocessing import StandardScaler\n",
"sc = StandardScaler()\n",
"X_train = sc.fit_transform(X_train)\n",
"X_test = sc.transform(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 第二步:逻辑回归模型\n",
"该项工作的库将会是一个线性模型库,之所以被称为线性是因为逻辑回归是一个线性分类器,这意味着我们在二维空间中,我们两类用户(购买和不购买)将被一条直线分割。然后导入逻辑回归类。下一步我们将创建该类的对象,它将作为我们训练集的分类器。\n",
"### 将逻辑回归应用于训练集"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
" intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n",
" penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n",
" verbose=0, warm_start=False)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"classifier = LogisticRegression()\n",
"classifier.fit(X_train, y_train)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 第3步预测\n",
"### 预测测试集结果"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"y_pred = classifier.predict(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 第4步评估预测\n",
"我们预测了测试集。 现在我们将评估逻辑回归模型是否正确的学习和理解。因此这个混淆矩阵将包含我们模型的正确和错误的预测。\n",
"### 生成混淆矩阵"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.metrics import confusion_matrix\n",
"cm = confusion_matrix(y_test, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 可视化\n",
"![](https://github.com/MachineLearning100/100-Days-Of-ML-Code/blob/master/Other%20Docs/LR_training.png?raw=true)\n",
"![](https://github.com/MachineLearning100/100-Days-Of-ML-Code/blob/master/Other%20Docs/LR_test.png?raw=true) "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}