# Whitening transformation

A whitening transformation or sphering transformation is a linear transformation that transforms a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix, meaning that they are uncorrelated and each have variance 1.[1] The transformation is called "whitening" because it changes the input vector into a white noise vector.

Several other transformations are closely related to whitening:

1. the decorrelation transform removes only the correlations but leaves variances intact,
2. the standardization transform sets variances to 1 but leaves correlations intact,
3. a coloring transformation transforms a vector of white random variables into a random vector with a specified covariance matrix.[2]
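The related transformations listed above can be compared numerically. The following is a minimal sketch in Python with NumPy, not a reference implementation: the covariance matrix `M` is a hypothetical example, and the eigenvector rotation is only one possible decorrelating transform (it produces a diagonal covariance without rescaling the variances to 1).

```python
import numpy as np

# Hypothetical 2x2 covariance matrix, chosen only for illustration.
M = np.array([[4.0, 1.2],
              [1.2, 1.0]])

# Standardization: divide each variable by its standard deviation.
# The resulting covariance is the correlation matrix (unit diagonal).
V_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(M)))
cov_standardized = V_inv_sqrt @ M @ V_inv_sqrt

# Decorrelation (one possible choice): rotate onto the eigenvectors of M.
# The resulting covariance is diagonal, with the eigenvalues as variances.
eigvals, U = np.linalg.eigh(M)
cov_decorrelated = U.T @ M @ U

# Coloring: map a white vector (identity covariance) to covariance M
# using the Cholesky factor L, where M = L @ L.T.
L = np.linalg.cholesky(M)
cov_colored = L @ np.eye(2) @ L.T
```

Standardization alone leaves the off-diagonal correlation in place, decorrelation alone leaves non-unit variances, and whitening does both at once.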

## Definition

Suppose ${\displaystyle X}$ is a random (column) vector with non-singular covariance matrix ${\displaystyle M}$ and mean ${\displaystyle 0}$. Then the transformation ${\displaystyle Y=WX}$ with a whitening matrix ${\displaystyle W}$ satisfying the condition ${\displaystyle W^{\mathrm {T} }W=M^{-1}}$ yields the whitened random vector ${\displaystyle Y}$ with unit diagonal covariance.
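The condition on ${\displaystyle W}$ can be checked directly. Assuming ${\displaystyle W}$ is square, it is invertible because ${\displaystyle W^{\mathrm {T} }W=M^{-1}}$ is non-singular, and since ${\displaystyle X}$ has mean ${\displaystyle 0}$:

$${\displaystyle \operatorname {Cov} (Y)=\operatorname {E} [YY^{\mathrm {T} }]=W\operatorname {E} [XX^{\mathrm {T} }]W^{\mathrm {T} }=WMW^{\mathrm {T} }=W(W^{\mathrm {T} }W)^{-1}W^{\mathrm {T} }=WW^{-1}(W^{\mathrm {T} })^{-1}W^{\mathrm {T} }=I}$$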

There are infinitely many possible whitening matrices ${\displaystyle W}$ that all satisfy the above condition. Commonly used choices are ${\displaystyle W=M^{-1/2}}$ (Mahalanobis or ZCA whitening), the Cholesky decomposition of ${\displaystyle M^{-1}}$ (Cholesky whitening), or a matrix built from the eigendecomposition of ${\displaystyle M}$ (PCA whitening).[3]
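As an illustration, the three common choices can be constructed with NumPy and checked against the defining condition. This is a sketch under stated assumptions: `M` is a hypothetical covariance matrix, and the variable names are chosen here for illustration.

```python
import numpy as np

# Hypothetical covariance matrix, chosen only for illustration.
M = np.array([[4.0, 1.2],
              [1.2, 1.0]])
M_inv = np.linalg.inv(M)

# Eigendecomposition of the symmetric matrix M: M = U @ diag(eigvals) @ U.T
eigvals, U = np.linalg.eigh(M)
scale = np.diag(eigvals ** -0.5)

# ZCA (Mahalanobis) whitening: the symmetric inverse square root of M.
W_zca = U @ scale @ U.T

# PCA whitening: rotate onto the eigenvectors, then rescale.
W_pca = scale @ U.T

# Cholesky whitening: M_inv = L @ L.T, and W is the transposed factor.
L = np.linalg.cholesky(M_inv)
W_chol = L.T

# Each matrix should satisfy W.T @ W = M^{-1}, so W @ M @ W.T is the identity.
residuals = {
    name: np.abs(W.T @ W - M_inv).max()
    for name, W in [("ZCA", W_zca), ("PCA", W_pca), ("Cholesky", W_chol)]
}
```

The three matrices differ from one another only by an orthogonal factor, which is why they all satisfy the same condition.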

Optimal whitening transforms can be singled out by investigating the cross-covariance and cross-correlation of ${\displaystyle X}$ and ${\displaystyle Y}$.[4] For example, the unique optimal whitening transformation achieving maximal component-wise correlation between original ${\displaystyle X}$ and whitened ${\displaystyle Y}$ is produced by the whitening matrix ${\displaystyle W=P^{-1/2}V^{-1/2}}$, where ${\displaystyle P}$ is the correlation matrix and ${\displaystyle V}$ is the diagonal matrix of variances.
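A minimal sketch of this correlation-based whitening matrix in NumPy follows. The covariance matrix `M` and the helper `inv_sqrt_sym` are hypothetical, introduced here only for illustration.

```python
import numpy as np

def inv_sqrt_sym(A):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, Q = np.linalg.eigh(A)
    return Q @ np.diag(w ** -0.5) @ Q.T

# Hypothetical covariance matrix, chosen only for illustration.
M = np.array([[4.0, 1.2, 0.5],
              [1.2, 1.0, 0.3],
              [0.5, 0.3, 2.0]])

# V^{-1/2}: diagonal matrix of inverse standard deviations.
V_inv_sqrt = np.diag(np.diag(M) ** -0.5)

# Correlation matrix P = V^{-1/2} M V^{-1/2}.
P = V_inv_sqrt @ M @ V_inv_sqrt

# Whitening matrix W = P^{-1/2} V^{-1/2}.
W = inv_sqrt_sym(P) @ V_inv_sqrt

# Covariance of the whitened vector Y = W X.
cov_Y = W @ M @ W.T

# Cross-correlation between Y and X: entry (i, j) is Cor(Y_i, X_j).
cross_corr = W @ M @ V_inv_sqrt
```

With this choice the cross-correlation matrix equals ${\displaystyle P^{1/2}}$, which is symmetric.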

## Whitening a data matrix

Whitening a data matrix follows the same transformation as for random variables. An empirical whitening transform is obtained by estimating the covariance (e.g. by maximum likelihood) and subsequently constructing a corresponding estimated whitening matrix (e.g. by Cholesky decomposition).
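The two steps can be sketched as follows for a data matrix with observations in rows. The simulated data, the mixing matrix `A`, and the variable names are assumptions made for illustration, and Cholesky whitening is used only as one of the possible choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 observations (rows) of 3 correlated variables (columns).
A = np.array([[2.0, 0.0, 0.0],
              [0.6, 1.0, 0.0],
              [0.3, 0.2, 0.5]])
X = rng.standard_normal((500, 3)) @ A.T

# Center the data, then estimate the covariance.
# bias=True divides by n, the maximum likelihood estimate under normality.
Xc = X - X.mean(axis=0)
M_hat = np.cov(Xc, rowvar=False, bias=True)

# Estimated Cholesky whitening matrix: inv(M_hat) = L @ L.T, W = L.T
L = np.linalg.cholesky(np.linalg.inv(M_hat))
W = L.T

# Apply y = W x to every observation (row) at once.
Z = Xc @ W.T

# The empirical covariance of the whitened data is the identity.
cov_Z = np.cov(Z, rowvar=False, bias=True)
```

Because the whitening matrix is built from the same estimate that is used to check the result, `cov_Z` equals the identity up to floating-point error. On new data drawn from the same distribution it would only be approximately the identity.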

## R implementation

Several whitening procedures, including ZCA whitening, PCA whitening, and CCA whitening, are implemented in the R package "whitening",[5] which is published on CRAN.

## Example of ZCA whitening in Python

An example Python implementation of ZCA whitening:[6]

    import numpy as np

    def zca_whitening_matrix(X):
        """
        Compute the ZCA whitening matrix (aka Mahalanobis whitening).
        INPUT:  X: [M x N] matrix.
            Rows: Variables
            Columns: Observations
        OUTPUT: ZCAMatrix: [M x M] matrix
        """
        # Sample covariance matrix [row-wise variables]: sigma = (X-mu) * (X-mu)' / (N-1)
        sigma = np.cov(X, rowvar=True) # [M x M]
        # Singular Value Decomposition: sigma = U * np.diag(S) * V
        U, S, V = np.linalg.svd(sigma)
        # U: [M x M] eigenvectors of sigma.
        # S: [M] eigenvalues of sigma.
        # V: [M x M] transpose of U
        # Whitening constant: prevents division by zero
        epsilon = 1e-5
        # ZCA whitening matrix: U * diag(1/sqrt(S + epsilon)) * U'
        ZCAMatrix = np.dot(U, np.dot(np.diag(1.0/np.sqrt(S + epsilon)), U.T)) # [M x M]
        return ZCAMatrix