Python code examples of PCA vs. SVD

Yang Zhang
Jun 2, 2018


Some Python code and numerical examples illustrating the relationship between PCA and SVD (including Truncated SVD), and specifically how PCA can be performed by SVD.

Mathematical explanations can be found in many other posts; this one sticks to code. The short version: if the SVD of the mean-centered data is X_0mean = U·S·Vᵀ, then PCA's components are the rows of Vᵀ, its singular values are the diagonal of S, and the transformed data is U·S.

Imports:

import numpy as np
import pandas as pd
from numpy.linalg import svd
from sklearn.decomposition import PCA
from sklearn.decomposition import TruncatedSVD

Make some fake data to do PCA on:

X = np.random.random((5, 8))
X.round(2)

Out:

array([[0.9 , 0.06, 0.33, 0.6 , 0.21, 0.11, 0.92, 0.76],
       [0.2 , 0.82, 0.26, 0.88, 0.92, 0.85, 0.34, 0.08],
       [0.54, 0.74, 0.11, 0.03, 0.02, 0.55, 0.72, 0.87],
       [0.68, 0.76, 0.71, 0.57, 0.45, 0.36, 0.31, 0.78],
       [0.92, 0.81, 0.34, 0.47, 0.1 , 0.4 , 0.05, 0.7 ]])

Do PCA with 3 components:

k = 3

pca = PCA(n_components=k)
pca.fit(X)

Out:

PCA(copy=True, iterated_power='auto', n_components=3, random_state=None, svd_solver='auto', tol=0.0, whiten=False)

Column means of the data:

X.mean(0).round(2)

Out:

array([0.65, 0.64, 0.35, 0.51, 0.34, 0.45, 0.47, 0.64])

Remove the column means from the data (PCA centers the data internally, so to reproduce it with SVD we center manually):

X_0mean = X - X.mean(0)
X_0mean.round(2)

Out:

array([[ 0.26, -0.57, -0.02,  0.09, -0.13, -0.34,  0.45,  0.12],
       [-0.45,  0.18, -0.09,  0.37,  0.58,  0.39, -0.12, -0.56],
       [-0.11,  0.1 , -0.24, -0.48, -0.32,  0.09,  0.25,  0.23],
       [ 0.03,  0.12,  0.36,  0.06,  0.11, -0.09, -0.16,  0.14],
       [ 0.27,  0.17, -0.01, -0.04, -0.24, -0.05, -0.42,  0.06]])
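
As a sanity check (a small sketch using the pca object fitted above), scikit-learn's PCA stores the column means it subtracts in its mean_ attribute, which should match our manual centering:

np.allclose(pca.mean_, X.mean(0))  # expect: True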

Do SVD on the mean-centered data:

U, s, Vh = svd(X_0mean, full_matrices=False)
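
A quick sketch to confirm the factorization: the three factors should reconstruct the centered data.

# U * s scales each column of U by the corresponding singular value,
# so (U * s) @ Vh is the same as U @ np.diag(s) @ Vh.
np.allclose((U * s) @ Vh, X_0mean)  # expect: True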

Compare SVD and PCA

singular values

pca.singular_values_.round(2)

Out:

array([1.29, 0.84, 0.73])

Run:

s.round(2)

Out:

array([1.29, 0.84, 0.73, 0.38, 0.  ])
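
The full SVD returns min(5, 8) = 5 singular values (the last is 0 because centering reduces the rank of the 5-row matrix to at most 4), while PCA keeps only the top k. As a small check, the leading values should agree, and PCA's explained variances should be the squared singular values divided by n_samples - 1:

np.allclose(pca.singular_values_, s[:k])                        # expect: True
np.allclose(pca.explained_variance_, s[:k]**2 / (X.shape[0] - 1))  # expect: True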

components

Run:

pca.components_.round(2)

Out:

array([[-0.37,  0.29,  0.02,  0.32,  0.52,  0.35, -0.28, -0.45],
       [-0.05, -0.61, -0.02,  0.38,  0.31, -0.17,  0.56, -0.21],
       [-0.44,  0.09, -0.5 , -0.44, -0.1 ,  0.35,  0.48, -0.06]])

Run:

Vh[:k].round(2)

Out:

array([[ 0.37, -0.29, -0.02, -0.32, -0.52, -0.35,  0.28,  0.45],
       [-0.05, -0.61, -0.02,  0.38,  0.31, -0.17,  0.56, -0.21],
       [-0.44,  0.09, -0.5 , -0.44, -0.1 ,  0.35,  0.48, -0.06]])
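
Note that the first row (the first principal direction) is negated while the other two match; singular vectors are only defined up to sign. A quick sketch checking agreement modulo sign:

np.allclose(np.abs(pca.components_), np.abs(Vh[:k]))  # expect: True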

transformed data

Run:

pca.transform(X).round(2)

Out:

array([[-0.6 ,  0.62, -0.09],
       [ 1.06,  0.21,  0.15],
       [-0.4 , -0.25,  0.56],
       [ 0.05, -0.13, -0.34],
       [-0.11, -0.45, -0.27]])

Run:

(U[:, :k]*s[:k]).round(2)

Out:

array([[ 0.6 ,  0.62, -0.09],
       [-1.06,  0.21,  0.15],
       [ 0.4 , -0.25,  0.56],
       [-0.05, -0.13, -0.34],
       [ 0.11, -0.45, -0.27]])
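
The flip shows up again in the first column of the scores, since negating a component negates its projections. A minimal sketch (the signs variable is illustrative, and it assumes each component's first coordinate is nonzero) that aligns the signs and then compares exactly:

# Per-component sign of Vh relative to pca.components_, read off the
# first coordinate of each component.
signs = np.sign(pca.components_[:, 0] * Vh[:k, 0])
np.allclose(pca.components_, signs[:, None] * Vh[:k])      # expect: True
np.allclose(pca.transform(X), (U[:, :k] * s[:k]) * signs)  # expect: True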

Truncated SVD

Note how some signs are flipped between SVD and PCA. In the run below, scikit-learn's TruncatedSVD produces the same signs as PCA, though this is not guaranteed in general; its documentation explains:

SVD suffers from a problem called “sign indeterminacy”, which means the sign of the components_ and the output from transform depend on the algorithm and random state. To work around this, fit instances of this class to data once, then keep the instance around to do transformations.

Run:

svd_tr = TruncatedSVD(n_components=k)
svd_tr.fit(X_0mean)

Out:

TruncatedSVD(algorithm='randomized', n_components=3, n_iter=5,
             random_state=None, tol=0.0)

Compare Truncated SVD and PCA

singular values

Run:

pca.singular_values_.round(2)

Out:

array([1.29, 0.84, 0.73])

Run:

svd_tr.singular_values_.round(2)

Out:

array([1.29, 0.84, 0.73])

components

(Notice the same signs)

Run:

pca.components_.round(2)

Out:

array([[-0.37,  0.29,  0.02,  0.32,  0.52,  0.35, -0.28, -0.45],
       [-0.05, -0.61, -0.02,  0.38,  0.31, -0.17,  0.56, -0.21],
       [-0.44,  0.09, -0.5 , -0.44, -0.1 ,  0.35,  0.48, -0.06]])

Run:

svd_tr.components_.round(2)

Out:

array([[-0.37,  0.29,  0.02,  0.32,  0.52,  0.35, -0.28, -0.45],
       [-0.05, -0.61, -0.02,  0.38,  0.31, -0.17,  0.56, -0.21],
       [-0.44,  0.09, -0.5 , -0.44, -0.1 ,  0.35,  0.48, -0.06]])

transformed data

(Notice the same signs)

Run:

pca.transform(X).round(2)

Out:

array([[-0.6 ,  0.62, -0.09],
       [ 1.06,  0.21,  0.15],
       [-0.4 , -0.25,  0.56],
       [ 0.05, -0.13, -0.34],
       [-0.11, -0.45, -0.27]])

Run:

svd_tr.transform(X_0mean).round(2)

Out:

array([[-0.6 ,  0.62, -0.09],
       [ 1.06,  0.21,  0.15],
       [-0.4 , -0.25,  0.56],
       [ 0.05, -0.13, -0.34],
       [-0.11, -0.45, -0.27]])
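
Putting it together, a final sketch confirming that in this run PCA on X and TruncatedSVD on the centered data agree (again, the randomized solver does not guarantee signs across runs):

np.allclose(pca.singular_values_, svd_tr.singular_values_)  # expect: True
np.allclose(pca.components_, svd_tr.components_)            # True in this run
np.allclose(pca.transform(X), svd_tr.transform(X_0mean))    # True in this run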

Code is here: https://github.com/yang-zhang/yang-zhang.github.io/blob/master/ds_math/pca_vs_svd.ipynb
