Let's take a break to learn Linear Discriminant Analysis.
Ref: http://sebastianraschka.com/Articles/2014_python_lda.html
LDA computes the directions (the linear discriminants) that maximize the separation between multiple classes, whereas PCA finds the directions (the principal components) that maximize the variance in the data, ignoring class labels.
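To make the contrast concrete, here is a minimal NumPy sketch on synthetic two-class data (not the Iris data). It assumes two Gaussian classes that spread widely along x but are separated along y. The names `X0`, `X1`, `pca_dir`, and `lda_dir` are illustrative only, and the LDA part is the textbook two-class Fisher direction, not necessarily how the referenced article computes it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic classes: large spread along x, class means separated along y.
n = 200
cov = np.array([[9.0, 0.0],
                [0.0, 0.25]])
X0 = rng.multivariate_normal(mean=[0.0, -1.0], cov=cov, size=n)
X1 = rng.multivariate_normal(mean=[0.0, 1.0], cov=cov, size=n)
X = np.vstack([X0, X1])

# PCA: eigenvector of the total covariance matrix with the largest eigenvalue.
# Class labels are not used at all.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pca_dir = eigvecs[:, np.argmax(eigvals)]

# Two-class LDA (Fisher's discriminant): w is proportional to S_W^{-1} (m1 - m0),
# where S_W is the within-class scatter matrix.
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
S_W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
lda_dir = np.linalg.solve(S_W, m1 - m0)
lda_dir /= np.linalg.norm(lda_dir)

print("PCA direction:", np.round(pca_dir, 2))
print("LDA direction:", np.round(lda_dir, 2))
```

With this setup the PCA direction should line up (up to sign) with the x axis, where the variance is largest, while the LDA direction should line up with the y axis, where the classes actually differ.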
Reading in the Iris dataset
In [2]:
import pandas as pd

# Map column index -> feature name for the four measurements.
feature_dict = {i: label for i, label in zip(
    range(4),
    ('sepal length in cm',
     'sepal width in cm',
     'petal length in cm',
     'petal width in cm'))}

# The raw file has no header row, so read it with header=None.
df = pd.read_csv(
    filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None,
    sep=',',
)

# Name the four feature columns in index order, then the label column.
df.columns = [label for _, label in sorted(feature_dict.items())] + ['class label']
df.dropna(how='all', inplace=True)  # drop the empty line at the end of the file

df.tail()
Out[2]: