In this project we try out the most common nearest-neighbors technique: K-NN (k-nearest neighbors).
Data Info
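The data was presumably loaded with pandas and summarized with `df.info()`. A minimal sketch, using a stand-in frame since the original file name isn't given here (in the real notebook this would be a `pd.read_csv(...)` call on the project data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-in for the project file -- same shape and column names
# as the summary below (1000 rows, 10 float features + target).
df = pd.DataFrame(
    rng.normal(size=(1000, 10)),
    columns=['XVPM', 'GWYH', 'TRAT', 'TLLZ', 'IGGA',
             'HYKR', 'EDFS', 'GUUB', 'MGJM', 'JHZC'],
)
df['TARGET CLASS'] = rng.integers(0, 2, size=1000)

df.info()
```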
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 11 columns):
XVPM 1000 non-null float64
GWYH 1000 non-null float64
TRAT 1000 non-null float64
TLLZ 1000 non-null float64
IGGA 1000 non-null float64
HYKR 1000 non-null float64
EDFS 1000 non-null float64
GUUB 1000 non-null float64
MGJM 1000 non-null float64
JHZC 1000 non-null float64
TARGET CLASS 1000 non-null int64
dtypes: float64(10), int64(1)
memory usage: 86.0 KB
Data Exploration
Since this data is artificial, we'll just do a large pairplot with seaborn.
We call seaborn's pairplot on the dataframe, with the hue set by the TARGET CLASS column.
**Converted the scaled features to a dataframe**
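The standardized values in the table suggest scikit-learn's `StandardScaler` (zero mean, unit variance per column). A sketch of scaling the features and wrapping the result back into a dataframe, using a small stand-in frame:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
feature_cols = ['XVPM', 'GWYH', 'TRAT']  # the real frame has ten such columns
df = pd.DataFrame(rng.normal(loc=5, scale=2, size=(100, 3)),
                  columns=feature_cols)

# Fit the scaler on the feature columns and transform them
scaler = StandardScaler()
scaled = scaler.fit_transform(df[feature_cols])

# Wrap the scaled array back into a dataframe, as shown in the head() below
df_feat = pd.DataFrame(scaled, columns=feature_cols)
```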
|   | XVPM | GWYH | TRAT | TLLZ | IGGA | HYKR | EDFS | GUUB | MGJM | JHZC |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.568522 | -0.443435 | 1.619808 | -0.958255 | -1.128481 | 0.138336 | 0.980493 | -0.932794 | 1.008313 | -1.069627 |
| 1 | -0.112376 | -1.056574 | 1.741918 | -1.504220 | 0.640009 | 1.081552 | -1.182663 | -0.461864 | 0.258321 | -1.041546 |
| 2 | 0.660647 | -0.436981 | 0.775793 | 0.213394 | -0.053171 | 2.030872 | -1.240707 | 1.149298 | 2.184784 | 0.342811 |
| 3 | 0.011533 | 0.191324 | -1.433473 | -0.100053 | -1.507223 | -1.753632 | -1.183561 | -0.888557 | 0.162310 | -0.002793 |
| 4 | -0.099059 | 0.820815 | -0.904346 | 1.609015 | -0.282065 | -0.365099 | -1.095644 | 0.391419 | -1.365603 | 0.787762 |

Predictions and Evaluations
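The notebook presumably splits the scaled data into train and test sets and fits a first model with K=1. A sketch under those assumptions (the `test_size=0.3` and `random_state=101` values are guesses, though they are consistent with the test support of 300 shown below); synthetic data stands in for the scaled features:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the scaled feature matrix and target column
rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# 70/30 split -- gives the 300 test samples seen in the reports below
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

knn = KNeighborsClassifier(n_neighbors=1)  # start with K=1
knn.fit(X_train, y_train)
pred = knn.predict(X_test)
```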
**Confusion matrix and classification report**
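These metrics come from scikit-learn's `confusion_matrix` and `classification_report`. A minimal sketch with hypothetical labels standing in for the notebook's `y_test` and `pred`:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical values -- in the notebook these are y_test and knn.predict(X_test)
y_test = np.array([0, 0, 1, 1, 1, 0])
pred   = np.array([0, 1, 1, 1, 0, 0])

cm = confusion_matrix(y_test, pred)
print(cm)
print(classification_report(y_test, pred))
```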
[[109 43]
[ 41 107]]
precision recall f1-score support
0 0.73 0.72 0.72 152
1 0.71 0.72 0.72 148
avg / total 0.72 0.72 0.72 300
Choosing a better K Value
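The usual elbow method: refit the model for a range of K values, record the test error rate for each, and plot the curve to find where it flattens out. A self-contained sketch (synthetic data stands in for the project's scaled features; the plotting call is shown as a comment):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 10))
y = (X[:, 0] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

# Mean test error for each candidate K
error_rate = []
for k in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    pred_k = knn.predict(X_test)
    error_rate.append(np.mean(pred_k != y_test))

# In the notebook this list is then plotted, e.g.:
# plt.plot(range(1, 40), error_rate); plt.xlabel('K'); plt.ylabel('Error rate')
```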
(Plot: error rate vs. K value.)
Retrained with the new K value
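Retraining is the same fit/predict/report cycle with the better K. The K chosen in the notebook isn't stated here, so the value below is purely illustrative; synthetic data again stands in for the scaled features:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

# n_neighbors=23 is illustrative -- use the K where your error curve flattens
knn = KNeighborsClassifier(n_neighbors=23)
knn.fit(X_train, y_train)
pred = knn.predict(X_test)

print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))
```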
[[124 28]
[ 24 124]]
precision recall f1-score support
0 0.84 0.82 0.83 152
1 0.82 0.84 0.83 148
avg / total 0.83 0.83 0.83 300