Pie Thought On

LDA and PCA

Let's take a break to learn Linear Discriminant Analysis (LDA).

Ref: http://sebastianraschka.com/Articles/2014_python_lda.html

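LDA computes the directions (the linear discriminants) that maximize the separation between multiple classes, whereas Principal Component Analysis (PCA) finds the directions (the principal components) that maximize the variance in the data and ignores the class labels.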
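To make the contrast concrete, here is a minimal NumPy sketch on synthetic two-class data (not the Iris dataset). The data, the variable names, and the scatter-matrix formulation of LDA are assumptions chosen for illustration. The classes are spread widely along x but separated along y, so the two methods should pick nearly perpendicular directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative assumption): both classes have a large spread
# along x and a small spread along y, and differ only in their mean along y.
n = 200
class_a = rng.normal(loc=[0.0, 0.0], scale=[5.0, 0.5], size=(n, 2))
class_b = rng.normal(loc=[0.0, 3.0], scale=[5.0, 0.5], size=(n, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * n + [1] * n)

# PCA: leading eigenvector of the covariance matrix (class labels unused).
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
pca_dir = eigvecs[:, np.argmax(eigvals)]

# LDA: leading eigenvector of inv(S_W) @ S_B, where S_W is the within-class
# scatter matrix and S_B is the between-class scatter matrix.
overall_mean = X.mean(axis=0)
S_W = np.zeros((2, 2))
S_B = np.zeros((2, 2))
for c in np.unique(y):
    Xc = X[y == c]
    mean_c = Xc.mean(axis=0)
    S_W += (Xc - mean_c).T @ (Xc - mean_c)
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += len(Xc) * (diff @ diff.T)

lda_vals, lda_vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
lda_dir = np.real(lda_vecs[:, np.argmax(np.real(lda_vals))])
lda_dir = lda_dir / np.linalg.norm(lda_dir)

print("PCA direction:", np.round(pca_dir, 2))
print("LDA direction:", np.round(lda_dir, 2))
```

On data like this, the PCA direction should lie close to the x-axis (the direction of largest variance) and the LDA direction close to the y-axis (the direction that separates the classes). The sign of each vector is arbitrary.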

Reading in the Iris dataset

In [2]:
import pandas as pd

# Map column index -> feature name for the four measurements
feature_dict = {i: label for i, label in enumerate((
    'sepal length in cm',
    'sepal width in cm',
    'petal length in cm',
    'petal width in cm',
))}

# The UCI file has no header row, so the column names are assigned below
df = pd.read_csv(
    filepath_or_buffer='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None,
    sep=',',
)
df.columns = [label for _, label in sorted(feature_dict.items())] + ['class label']
df.dropna(how='all', inplace=True)  # drop the empty line at the end of the file

df.tail()
Out[2]:
     sepal length in cm  sepal width in cm  petal length in cm  petal width in cm     class label
145                 6.7                3.0                 5.2                2.3  Iris-virginica
146                 6.3                2.5                 5.0                1.9  Iris-virginica
147                 6.5                3.0                 5.2                2.0  Iris-virginica
148                 6.2                3.4                 5.4                2.3  Iris-virginica
149                 5.9                3.0                 5.1                1.8  Iris-virginica
