diff --git a/Code/Day 1_Data Preprocessing.md b/Code/Day 1_Data Preprocessing.md
new file mode 100644
index 0000000..3508943
--- /dev/null
+++ b/Code/Day 1_Data Preprocessing.md
@@ -0,0 +1,52 @@
+# 数据预处理
+
+As shown in the infographic, data preprocessing can be completed in six steps.
+This example uses this [dataset](https://github.com/Avik-Jain/100-Days-Of-ML-Code/blob/master/datasets/Data.csv).
+
+## Step 1: Importing the libraries
+```python
+import numpy as np
+import pandas as pd
+```
+## Step 2: Importing the dataset
+```python
+dataset = pd.read_csv('Data.csv')
+X = dataset.iloc[:, :-1].values   # every column except the last one: the features
+Y = dataset.iloc[:, 3].values     # the fourth column (index 3): the label
+```
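+As an optional sanity check (not one of the original six steps), you can inspect the shapes of the loaded arrays; the linked Data.csv is assumed to hold three feature columns followed by one label column.
+```python
+# Optional check: confirm the feature/label split looks right
+print(dataset.head())       # first rows of the raw DataFrame
+print(X.shape, Y.shape)     # feature matrix and label vector shapes
+```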
+## Step 3: Handling the missing data
+```python
+# sklearn.preprocessing.Imputer was removed; SimpleImputer is its replacement
+from sklearn.impute import SimpleImputer
+imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
+imputer = imputer.fit(X[:, 1:3])
+X[:, 1:3] = imputer.transform(X[:, 1:3])
+```
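+Besides `"mean"`, SimpleImputer also accepts the `"median"`, `"most_frequent"` and `"constant"` strategies. A minimal sketch using the median, which is more robust to outliers:
+```python
+# Alternative: fill missing values with the column median instead of the mean
+median_imputer = SimpleImputer(missing_values=np.nan, strategy="median")
+X[:, 1:3] = median_imputer.fit_transform(X[:, 1:3])
+```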
+## Step 4: Encoding categorical data
+```python
+from sklearn.preprocessing import LabelEncoder, OneHotEncoder
+labelencoder_X = LabelEncoder()
+X[:, 0] = labelencoder_X.fit_transform(X[:, 0])   # encode the categorical first column as integers
+```
+### Creating dummy variables
+```python
+# OneHotEncoder no longer takes categorical_features; ColumnTransformer now
+# selects which columns to one-hot encode
+from sklearn.compose import ColumnTransformer
+ct = ColumnTransformer([("onehot", OneHotEncoder(), [0])], remainder="passthrough")
+X = ct.fit_transform(X)
+labelencoder_Y = LabelEncoder()
+Y = labelencoder_Y.fit_transform(Y)
+```
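+With `remainder="passthrough"`, the dummy columns produced for column 0 come first in the transformed array and the remaining columns are appended unchanged. A quick optional check:
+```python
+# Optional check: the leading columns are the dummy variables, the rest pass through
+print(X[0])       # one encoded row
+print(X.shape)    # one column per category, plus the passthrough columns
+```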
+## Step 5: Splitting the dataset into training and test sets
+```python
+# sklearn.cross_validation was renamed to sklearn.model_selection
+from sklearn.model_selection import train_test_split
+X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
+```
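+`test_size=0.2` reserves 20% of the samples for the test set, and `random_state=0` makes the split reproducible. An optional check of the resulting shapes:
+```python
+# Optional check: an 80/20 split of the rows
+print(X_train.shape, X_test.shape)
+print(Y_train.shape, Y_test.shape)
+```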
+## Step 6: Feature Scaling
+```python
+from sklearn.preprocessing import StandardScaler
+sc_X = StandardScaler()
+X_train = sc_X.fit_transform(X_train)
+# use transform (not fit_transform) on the test set so it is scaled with the
+# statistics learned from the training set, avoiding data leakage
+X_test = sc_X.transform(X_test)
+```
\ No newline at end of file
diff --git a/README.md b/README.md
index a4dedb3..23a16f8 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
For the original English version, see [Avik-Jain](https://github.com/Avik-Jain/100-Days-Of-ML-Code).
-## Data Preprocessing | Day 1
+## Data Preprocessing | [Day 1]()