Miscellaneous (linear algebra)
Singular value decomposition (SVD)
$A = U \Sigma V^T$, where $A$ can be an arbitrary matrix (let's say $n \times m$). Then $U$ is an $n \times n$ matrix, $\Sigma$ is an $n \times m$ matrix, and $V$ is an $m \times m$ matrix.
The column vectors in $U$ are mutually orthonormal, and the row vectors in $V^T$ are mutually orthonormal.
If we keep only the top-$k$ largest singular values, then we have the rank-$k$ approximation $A \approx U_k \Sigma_k V_k^T = \sum_{i=1}^{k} \sigma_i u_i v_i^T$.
Here is an SVD example using the Eigen library.
```cpp
#include <iostream>
#include <algorithm>
#include <Eigen/Eigen>
using namespace Eigen;

int main()
{
    typedef Matrix<double, 2, 4> Matrix2x4;
    Matrix2x4 A;
    A <<  1,  1, 0,  2,
         -1, -1, 0, -2;

    // A = U * S * V^T. A dynamic-size JacobiSVD is used because Eigen's
    // thin U/V options require a dynamic number of columns.
    JacobiSVD<MatrixXd> svd(A, ComputeThinU | ComputeThinV);
    MatrixXd U = svd.matrixU();        // left singular vectors, 2x2
    MatrixXd V = svd.matrixV();        // right singular vectors (thin), 4x2
    VectorXd S = svd.singularValues(); // singular values as a column vector

    // Assemble the (rectangular) diagonal matrix of singular values.
    MatrixXd S_mat = MatrixXd::Zero(U.cols(), V.cols());
    for (Eigen::Index i = 0; i < std::min(S_mat.rows(), S_mat.cols()); ++i)
        S_mat(i, i) = S(i);

    std::cout << "U:\n" << U << "\n";
    std::cout << "sigma:\n" << S_mat << "\n";
    std::cout << "V:\n" << V << "\n";
    std::cout << "original matrix:\n" << U * S_mat * V.transpose() << "\n";
    return 0;
}
```
Principal component analysis (PCA)
$A = Q \Lambda Q^T$, where $A$ usually is a covariance matrix, which is a symmetric matrix. $Q$ is composed of a series of eigenvectors which are mutually orthonormal, and $\Lambda$ is a diagonal matrix of the corresponding eigenvalues.
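As a sketch of how this eigendecomposition can be computed with Eigen, `SelfAdjointEigenSolver` handles the symmetric case; the covariance-like values below are made up for illustration:

```cpp
#include <iostream>
#include <Eigen/Eigen>
using namespace Eigen;

int main()
{
    // A symmetric (covariance-like) matrix; values are made up for illustration.
    Matrix3d C;
    C << 4.0, 2.0, 0.5,
         2.0, 3.0, 1.0,
         0.5, 1.0, 2.0;

    // C = Q * Lambda * Q^T for symmetric C.
    SelfAdjointEigenSolver<Matrix3d> es(C);
    std::cout << "eigenvalues (diagonal of Lambda):\n" << es.eigenvalues() << "\n";
    std::cout << "eigenvectors (columns of Q):\n" << es.eigenvectors() << "\n";

    // Reconstruct C to verify the decomposition.
    Matrix3d recon = es.eigenvectors() * es.eigenvalues().asDiagonal()
                   * es.eigenvectors().transpose();
    std::cout << "Q * Lambda * Q^T:\n" << recon << "\n";
    return 0;
}
```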
Covariance matrix
Variance measures how spread out (concentrated) a single series of data is.
Covariance measures the correlation between two series of data.
The covariance matrix measures the variances and covariances among $n$ series/dimensions of data. For example, suppose we have a sample of data as below:
| | Height | Weight | Grade |
|---|---|---|---|
| Boy1 | 170cm | 50kg | 80 |
| Boy2 | … | … | … |
| Boy3 | … | … | … |
We can use a matrix $X \in \mathbb{R}^{3 \times 3}$ (rows are samples, columns are the height/weight/grade dimensions) to represent this data. Then the covariance matrix is the $3 \times 3$ matrix $C = \frac{1}{n-1}(X - \bar{X})^T (X - \bar{X})$, where $\bar{X}$ holds the column means and $n = 3$ is the number of samples.
$C_{11}$, $C_{22}$, $C_{33}$ are the variances in the height, weight, and grade dimensions, respectively.
$C_{12}$ is the covariance between the height and weight dimensions. The other elements in $C$ can be interpreted in the same way.
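Here is a minimal Eigen sketch of this covariance computation; only Boy1's row comes from the table above, the other sample values are made up, and the unbiased $1/(n-1)$ normalization is assumed:

```cpp
#include <iostream>
#include <Eigen/Eigen>
using namespace Eigen;

int main()
{
    // Rows are samples (boys), columns are dimensions (height, weight, grade).
    // Boy1 comes from the table; the other rows are made-up values for illustration.
    MatrixXd X(3, 3);
    X << 170, 50, 80,
         175, 60, 90,
         160, 45, 70;

    // Subtract the column means, then C = X_c^T * X_c / (n - 1).
    MatrixXd centered = X.rowwise() - X.colwise().mean();
    MatrixXd C = (centered.transpose() * centered) / double(X.rows() - 1);

    std::cout << "covariance matrix:\n" << C << "\n";
    // C(0,0), C(1,1), C(2,2) are the variances of height, weight, grade;
    // C(0,1) is the covariance between height and weight, etc.
    return 0;
}
```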
The solutions of linear equations
Suppose we have linear equations $Ax = b$, where $A$ has $m$ rows and $n$ columns.
We need to analyze the rank of the matrix $A$ and of its augmented matrix $[A \mid b]$ to figure out whether a solution exists.
First, we can expand $Ax = b$ as $x_1 a_1 + x_2 a_2 + \cdots + x_n a_n = b$, where $a_i$ is the $i$-th column of $A$. It means the linear equations can be seen as a linear combination of the column vectors of matrix $A$.
* Non-homogeneous linear equations
It is intuitive to see that if $R(A) < R([A \mid b])$, then $b$ is linearly independent of the column vectors of $A$: it is impossible to find a linear combination of the columns that represents $b$. In other words, there is no solution. $R(A)$ means the rank of matrix $A$.
When $R(A) = R([A \mid b])$, we know $b$ is linearly dependent on the column vectors of $A$. So if $R(A) = R([A \mid b]) < n$, $Ax = b$ has infinitely many solutions; if $R(A) = R([A \mid b]) = n$, the equations have exactly one solution.
* Homogeneous linear equations
The homogeneous case ($Ax = 0$) is similar to the non-homogeneous one. Since $R(A) = R([A \mid 0])$ always holds, if $R(A) = n$ (full column rank), there is only the zero solution. If $A$ is not full rank, we can find a linear combination of column vectors to represent the linearly dependent ones; therefore, there are infinitely many solutions.
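Here is a minimal Eigen sketch of this rank analysis; the system below is made up for illustration, and `FullPivLU::rank()` provides the ranks:

```cpp
#include <iostream>
#include <Eigen/Eigen>
using namespace Eigen;

int main()
{
    // A made-up 3x3 system with rank(A) = 2 (the second row is twice the first).
    MatrixXd A(3, 3);
    A << 1, 2, 3,
         2, 4, 6,
         1, 0, 1;
    VectorXd b(3);
    b << 1, 2, 1;

    // Augmented matrix [A | b].
    MatrixXd Ab(A.rows(), A.cols() + 1);
    Ab << A, b;

    Eigen::Index rankA  = FullPivLU<MatrixXd>(A).rank();
    Eigen::Index rankAb = FullPivLU<MatrixXd>(Ab).rank();

    if (rankA < rankAb)
        std::cout << "no solution\n";
    else if (rankA < A.cols())
        std::cout << "infinitely many solutions\n";
    else
        std::cout << "unique solution\n";
    return 0;
}
```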
Positive Definite and Positive Semi-definite Matrices
There are a lot of properties of positive definite and positive semi-definite matrices.
Check here for them.
Matrix decomposition
* LU decomposition
When matrix $A$ is invertible, we can decompose $A = LU$. We usually have two different conventions:
- $L$ is a lower unit triangular matrix (diagonal elements are 1) and $U$ is an upper triangular matrix;
- $L$ is a lower triangular matrix and $U$ is an upper unit triangular matrix.
* Cholesky (LLT) decomposition
When matrix $A$ is a Hermitian positive-definite matrix, we can decompose $A = LL^T$ (or $LL^*$ in the complex case), where $L$ is a lower triangular matrix. The main application of matrix decomposition is to solve the $Ax = b$ linear system; decomposition is significantly faster than directly calculating the inverse matrix. A solving sketch is given after this list.
* Cholesky variant (LDLT) decomposition
It is the same idea as Cholesky, but we don't want to calculate the square roots, which slow down the calculation and lose accuracy. So we can decompose $A = LDL^T$ instead, where $L$ is a lower unit triangular matrix and $D$ is a diagonal matrix.
(LLT and LDLT have been introduced HERE.)
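As a sketch of how these factorizations are used to solve $Ax = b$ in Eigen (the matrix below is a made-up symmetric positive-definite example so that LLT/LDLT apply; for a general invertible matrix, the LU-based solver is the relevant one):

```cpp
#include <iostream>
#include <Eigen/Eigen>
using namespace Eigen;

int main()
{
    // A made-up symmetric positive-definite matrix, so LLT/LDLT are applicable.
    Matrix3d A;
    A << 4, 1, 0,
         1, 3, 1,
         0, 1, 2;
    Vector3d b(1, 2, 3);

    Vector3d x_llt  = A.llt().solve(b);          // Cholesky (A = L L^T)
    Vector3d x_ldlt = A.ldlt().solve(b);         // LDLT (A = L D L^T), no square roots
    Vector3d x_lu   = A.partialPivLu().solve(b); // LU with partial pivoting (A invertible)

    std::cout << "x (LLT):  " << x_llt.transpose()  << "\n";
    std::cout << "x (LDLT): " << x_ldlt.transpose() << "\n";
    std::cout << "x (LU):   " << x_lu.transpose()   << "\n";
    return 0;
}
```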
Solving linear least squares systems
There are three typical ways to solve a linear least squares system: SVD, QR factorization, and Cholesky decomposition (applied to the normal equations).
Check here for Eigen library code.
* An example
I’d like to use a simple example to explain the Cholesky approach.
Suppose we have 12 sample points lying (with some noise) on the 3D plane $z = ax + by$. We can stack these samples into a matrix $A$, where row $i$ is $[x_i, y_i]$, and a vector $b$ with $b_i = z_i$.
We want to calculate the coefficient vector $x = [a, b]^T$ such that $\|Ax - b\|^2$ is minimized.
If we calculate the partial derivatives w.r.t. $a, b$ and set the partial derivatives equal to 0, we will have the linear equation below:
$$A^T A\, x = A^T b$$
The solution of the above equation is equivalent to the optimal estimation of $a, b$ (it is easy to derive and prove).
To solve $A^T A\, x = A^T b$, we can apply the Cholesky decomposition, but if $A^T A$ is ill-conditioned (e.g. $A^T A$ is not invertible), it will lead to problems. Check here for more details about ill-conditioning.
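Here is a minimal Eigen sketch of this comparison, with made-up noisy samples of the plane $z = 2x - y$; the normal-equation/LDLT route is shown next to the more robust QR and SVD solvers:

```cpp
#include <iostream>
#include <Eigen/Eigen>
using namespace Eigen;

int main()
{
    // 12 made-up noisy samples of the plane z = a*x + b*y with a = 2, b = -1.
    MatrixXd A = MatrixXd::Random(12, 2);     // each row is [x_i, y_i]
    VectorXd b = A * Vector2d(2.0, -1.0)
               + 0.01 * VectorXd::Random(12); // z_i with a bit of noise

    // Normal equations + Cholesky variant: solve (A^T A) x = A^T b.
    VectorXd x_normal = (A.transpose() * A).ldlt().solve(A.transpose() * b);

    // More robust alternatives when A^T A is ill-conditioned:
    VectorXd x_qr  = A.colPivHouseholderQr().solve(b);               // QR factorization
    VectorXd x_svd = A.bdcSvd(ComputeThinU | ComputeThinV).solve(b); // SVD

    std::cout << "normal equations: " << x_normal.transpose() << "\n";
    std::cout << "QR:               " << x_qr.transpose() << "\n";
    std::cout << "SVD:              " << x_svd.transpose() << "\n";
    return 0;
}
```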
Pseudoinverse matrix
The pseudoinverse matrix has two types:
- Left pseudoinverse matrix: $A^+ = (A^T A)^{-1} A^T$, which requires $A$ to have full column rank; then $A^+ A = I$.
- Right pseudoinverse matrix: $A^+ = A^T (A A^T)^{-1}$, which requires $A$ to have full row rank; then $A A^+ = I$.
For more details, check HERE.
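A small Eigen sketch of both formulas (the matrix is made up; the left form assumes full column rank and the right form full row rank):

```cpp
#include <iostream>
#include <Eigen/Eigen>
using namespace Eigen;

int main()
{
    // A made-up 4x2 matrix with full column rank.
    MatrixXd A(4, 2);
    A << 1, 0,
         0, 1,
         1, 1,
         2, 1;

    // Left pseudoinverse: A_left * A = I (requires full column rank).
    MatrixXd A_left = (A.transpose() * A).inverse() * A.transpose();
    std::cout << "A_left * A:\n" << A_left * A << "\n";

    // Right pseudoinverse of B = A^T (a 2x4 matrix with full row rank): B * B_right = I.
    MatrixXd B = A.transpose();
    MatrixXd B_right = B.transpose() * (B * B.transpose()).inverse();
    std::cout << "B * B_right:\n" << B * B_right << "\n";

    // Recent Eigen versions also provide a general pseudoinverse via
    // A.completeOrthogonalDecomposition().pseudoInverse().
    return 0;
}
```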
Derivatives of Scalar, Vector, Matrix
Many times, we need to calculate the derivatives of a (scalar, vector, matrix) w.r.t. a (scalar, vector, matrix). I'll give a comprehensive summary of those calculations. You can see here for an intuitive introduction.
Case 1: The derivative of scalar w.r.t. scalar
This is the simplest case: you can apply the standard derivative rules.
Case 2: The derivative of scalar w.r.t. vector and matrix
The results are related to the transpose of the corresponding coefficients. Check here for the calculation rules.
For example, $y = a^T x$, where $y$ is a scalar and $a, x$ are vectors. Then $\frac{\partial y}{\partial x} = a$.
For the derivative of a scalar w.r.t. a matrix, e.g. $y = a^T X b$ where $X$ is a matrix, you can consider rewriting the scalar with the trace of a matrix as below:
$$a^T X b = \operatorname{tr}(b\,a^T X), \qquad \frac{\partial\,\operatorname{tr}(b\,a^T X)}{\partial X} = a\,b^T.$$
Note that the second-order derivative of a scalar w.r.t. a vector is the standard Hessian matrix. (The first-order derivative gives a vector result; the second order then calculates the derivative of that vector w.r.t. the vector, which formally gives a Jacobian-type result, but people call it the Hessian.) See wiki Jacobian and Hessian.
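As a small worked example of these rules (this particular quadratic form is my own illustration):
$$
f(x) = x^T A x + b^T x, \qquad
\frac{\partial f}{\partial x} = (A + A^T)\,x + b, \qquad
\frac{\partial^2 f}{\partial x\,\partial x^T} = A + A^T.
$$
When $A$ is symmetric, these reduce to $2Ax + b$ and $2A$.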
Case 3: The derivative of vector w.r.t. vector
From this case on, the coefficient-transpose rule above cannot be applied. You need to explicitly expand the vectors into their scalar components and then calculate the derivatives.
For example, $y = Ax$, where $x, y$ are vectors and $A$ is a matrix. Then $\frac{\partial y}{\partial x} = A$. $\frac{\partial y}{\partial x}$ is also called the Jacobian.
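Expanding it component-wise, as described above:
$$
y_i = \sum_{j} A_{ij} x_j
\quad\Rightarrow\quad
\frac{\partial y_i}{\partial x_j} = A_{ij}
\quad\Rightarrow\quad
\frac{\partial y}{\partial x} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix} = A.
$$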
Case 4: The derivative of vector w.r.t. matrix
This case is a little bit complicated: the result will be a hyper-matrix. Let's take $\frac{\partial y}{\partial A}$ as an example; it can be arranged with the same shape as $y$, where the $i$-th element is $\frac{\partial y_i}{\partial A}$.
Each such element is a matrix too (with the same dimensions as $A$).
Let's look at a simple example in the picture below.
![](/Notes/assets/linear_algebra/marix_derivative.jpg)
From this example, we can draw another conclusion:
If a function maps an input in $\mathbb{R}^n$ to an output in $\mathbb{R}^m$, then the derivative of the output w.r.t. the input should also be able to map an increment in $\mathbb{R}^n$ to an increment in $\mathbb{R}^m$ (based on the Taylor expansion).
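In symbols, this is just the first-order Taylor expansion (added here for illustration):
$$
f(x + \Delta x) \approx f(x) + \frac{\partial f}{\partial x}\,\Delta x,
$$
so $\frac{\partial f}{\partial x}$ must be an object that maps an input increment $\Delta x$ to an output increment $f(x + \Delta x) - f(x)$.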
Case 5: The derivative of matrix w.r.t. matrix
This case is similar to case 4, but the dimension has increased. Again, let's look at an example:
![](/Notes/assets/linear_algebra/marix_derivative2.jpg)
In short, when we face case 4 or 5, the derivative calculation becomes complicated and there isn't a simple representation.