In this project we try out the most common nearest-neighbors technique: K-NN (k-nearest neighbors).
Data Info
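The data was presumably loaded with pandas and summarized with `df.info()`. A minimal sketch, using a stand-in frame since the original file name isn't given here (in the real notebook this would be a `pd.read_csv(...)` call on the project data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-in for the project file -- same shape and column names
# as the summary below (1000 rows, 10 float features + target).
df = pd.DataFrame(
    rng.normal(size=(1000, 10)),
    columns=['XVPM', 'GWYH', 'TRAT', 'TLLZ', 'IGGA',
             'HYKR', 'EDFS', 'GUUB', 'MGJM', 'JHZC'],
)
df['TARGET CLASS'] = rng.integers(0, 2, size=1000)

df.info()
```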
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 11 columns):
XVPM 1000 non-null float64
GWYH 1000 non-null float64
TRAT 1000 non-null float64
TLLZ 1000 non-null float64
IGGA 1000 non-null float64
HYKR 1000 non-null float64
EDFS 1000 non-null float64
GUUB 1000 non-null float64
MGJM 1000 non-null float64
JHZC 1000 non-null float64
TARGET CLASS 1000 non-null int64
dtypes: float64(10), int64(1)
memory usage: 86.0 KB
Data Exploration
Since this data is artificial, we'll just do a large pairplot with seaborn.
We call seaborn's pairplot on the dataframe, with the hue set by the TARGET CLASS column.
**Converted the scaled features to a dataframe**
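The standardized values in the table suggest scikit-learn's `StandardScaler` (zero mean, unit variance per column). A sketch of scaling the features and wrapping the result back into a dataframe, using a small stand-in frame:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
feature_cols = ['XVPM', 'GWYH', 'TRAT']  # the real frame has ten such columns
df = pd.DataFrame(rng.normal(loc=5, scale=2, size=(100, 3)),
                  columns=feature_cols)

# Fit the scaler on the feature columns and transform them
scaler = StandardScaler()
scaled = scaler.fit_transform(df[feature_cols])

# Wrap the scaled array back into a dataframe, as shown in the head() below
df_feat = pd.DataFrame(scaled, columns=feature_cols)
```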
|   | XVPM | GWYH | TRAT | TLLZ | IGGA | HYKR | EDFS | GUUB | MGJM | JHZC |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.568522 | -0.443435 | 1.619808 | -0.958255 | -1.128481 | 0.138336 | 0.980493 | -0.932794 | 1.008313 | -1.069627 |
| 1 | -0.112376 | -1.056574 | 1.741918 | -1.504220 | 0.640009 | 1.081552 | -1.182663 | -0.461864 | 0.258321 | -1.041546 |
| 2 | 0.660647 | -0.436981 | 0.775793 | 0.213394 | -0.053171 | 2.030872 | -1.240707 | 1.149298 | 2.184784 | 0.342811 |
| 3 | 0.011533 | 0.191324 | -1.433473 | -0.100053 | -1.507223 | -1.753632 | -1.183561 | -0.888557 | 0.162310 | -0.002793 |
| 4 | -0.099059 | 0.820815 | -0.904346 | 1.609015 | -0.282065 | -0.365099 | -1.095644 | 0.391419 | -1.365603 | 0.787762 |

Predictions and Evaluations
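The notebook presumably splits the scaled data into train and test sets and fits a first model with K=1. A sketch under those assumptions (the `test_size=0.3` and `random_state=101` values are guesses, though they are consistent with the test support of 300 shown below); synthetic data stands in for the scaled features:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the scaled feature matrix and target column
rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# 70/30 split -- gives the 300 test samples seen in the reports below
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

knn = KNeighborsClassifier(n_neighbors=1)  # start with K=1
knn.fit(X_train, y_train)
pred = knn.predict(X_test)
```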
**Confusion matrix and classification report**
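These metrics come from scikit-learn's `confusion_matrix` and `classification_report`. A minimal sketch with hypothetical labels standing in for the notebook's `y_test` and `pred`:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical values -- in the notebook these are y_test and knn.predict(X_test)
y_test = np.array([0, 0, 1, 1, 1, 0])
pred   = np.array([0, 1, 1, 1, 0, 0])

cm = confusion_matrix(y_test, pred)
print(cm)
print(classification_report(y_test, pred))
```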
[[109 43]
[ 41 107]]
precision recall f1-score support
0 0.73 0.72 0.72 152
1 0.71 0.72 0.72 148
avg / total 0.72 0.72 0.72 300
Choosing a better K Value
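The usual elbow method: refit the model for a range of K values, record the test error rate for each, and plot the curve to find where it flattens out. A self-contained sketch (synthetic data stands in for the project's scaled features; the plotting call is shown as a comment):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 10))
y = (X[:, 0] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

# Mean test error for each candidate K
error_rate = []
for k in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    pred_k = knn.predict(X_test)
    error_rate.append(np.mean(pred_k != y_test))

# In the notebook this list is then plotted, e.g.:
# plt.plot(range(1, 40), error_rate); plt.xlabel('K'); plt.ylabel('Error rate')
```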
(Plot: error rate vs. K value.)
Retrained with the new K value
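Retraining is the same fit/predict/report cycle with the better K. The K chosen in the notebook isn't stated here, so the value below is purely illustrative; synthetic data again stands in for the scaled features:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

# n_neighbors=23 is illustrative -- use the K where your error curve flattens
knn = KNeighborsClassifier(n_neighbors=23)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)

print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
```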
[[124 28]
[ 24 124]]
precision recall f1-score support
0 0.84 0.82 0.83 152
1 0.82 0.84 0.83 148
avg / total 0.83 0.83 0.83 300