Matrix multiplication is distributive and associative: A(B + C) = AB + AC and A(BC) = (AB)C.
The dot product of 2 vectors is commutative: x^T y = y^T x.
A singular matrix is a matrix without an inverse, i.e. its determinant is 0. One or more of its rows (columns) is a linear combination of other rows (columns).
In practice, we rarely solve Ax = b by computing A^-1 explicitly. We lose precision in computing A^-1. In machine learning, A can be sparse but A^-1 is dense, which requires too much memory to perform the computation.
An orthogonal matrix is a square matrix whose rows (columns) are mutually orthonormal, i.e. the dot product of any 2 distinct row vectors (column vectors) is 0 and each has unit length.
For an orthogonal matrix A, there is one important property: A^T A = A A^T = I.
Orthogonal matrices are particularly interesting because the inverse is easy to find: A^-1 = A^T.
Also, orthogonal matrices do not amplify errors, which is very desirable: ||Qx||_2 = ||x||_2.
So if we multiply the input x with orthogonal matrices, the errors present in x will not be amplified by the multiplication. We can decompose matrices into orthogonal matrices (as in SVD) when solving linear algebra problems. Also, a symmetric matrix can be decomposed into orthogonal matrices: A = QΛQ^T.
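Both properties can be checked numerically; a minimal NumPy sketch using a 2-D rotation matrix (a standard example of an orthogonal matrix chosen for illustration):

```python
import numpy as np

# A rotation matrix is orthogonal: its columns are orthonormal.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Property 1: the inverse is just the transpose (Q^T Q = I).
assert np.allclose(Q.T @ Q, np.eye(2))

# Property 2: multiplying by Q preserves the L2 norm,
# so errors in x are not amplified.
x = np.array([3.0, 4.0])
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))
```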
A quadratic form equation contains terms of x_i^2 and x_i x_j.
In matrix form: f(x) = x^T A x.
With 3 variables:
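As a sanity check that the matrix form matches the expanded polynomial, here is a small NumPy sketch with a 2-variable quadratic form (the coefficients are made up for the example; the cross-term coefficient is split evenly between the two off-diagonal entries to keep A symmetric):

```python
import numpy as np

# f(x, y) = 2x^2 + 6xy + 4y^2 written as v^T A v with symmetric A:
# the cross-term coefficient 6 is split as A[0,1] = A[1,0] = 3.
A = np.array([[2.0, 3.0],
              [3.0, 4.0]])

x, y = 1.5, -2.0
v = np.array([x, y])

expanded = 2*x**2 + 6*x*y + 4*y**2
matrix_form = v @ A @ v
assert np.isclose(expanded, matrix_form)
```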
Eigenvectors & eigenvalues
An eigenvector v of a matrix A satisfies Av = λv, where λ (the eigenvalue) is a scalar and v is a vector.
Find the eigenvalues and eigenvectors for
- A matrix is singular iff any of its eigenvalues is 0.
- To optimize a quadratic form f(x) = x^T A x subject to ||x||_2 = 1:
- If x is an eigenvector of A, x^T A x equals the corresponding eigenvalue
- The max (min) of x^T A x is the max (min) of the eigenvalues
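A quick NumPy experiment illustrating the bound: for any unit vector x, x^T A x stays between the smallest and largest eigenvalue of a symmetric A (the random matrix here is chosen just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random symmetric matrix.
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2
eigvals = np.linalg.eigvalsh(A)   # sorted ascending

# For many random unit vectors, x^T A x always lies between
# the smallest and the largest eigenvalue.
for _ in range(1000):
    x = rng.standard_normal(4)
    x /= np.linalg.norm(x)
    q = x @ A @ x
    assert eigvals[0] - 1e-9 <= q <= eigvals[-1] + 1e-9
```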
Finding the eigenvalues
Consider the possible factors of 16 as candidate roots of the characteristic polynomial: ±1, ±2, ±4, ±8, ±16
The other eigenvalues are -2, -2.
Finding the eigenvectors
Do row reduction to solve the linear equation (A − λI)v = 0:
Perform row subtraction/multiplication:
After many more reductions:
So for this eigenvalue, the eigenvector is:
For a matrix V with one eigenvector per column and a vector λ holding the corresponding eigenvalues, the eigendecomposition of A is A = V diag(λ) V^-1.
If A is real and symmetric, A = QΛQ^T
where Q is an orthogonal matrix composed of eigenvectors of A. The eigenvalue Λ_ii is associated with the eigenvector in column i of Q. This is important because we often deal with symmetric matrices.
Eigendecomposition requires A to be a square matrix, and not every square matrix has an eigendecomposition.
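For the real symmetric case, NumPy's `eigh` returns exactly this decomposition; a short sketch verifying A = QΛQ^T on a random symmetric matrix (chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
A = (M + M.T) / 2          # real symmetric matrix

# eigh is specialized for symmetric matrices and returns
# orthonormal eigenvectors in the columns of Q.
lam, Q = np.linalg.eigh(A)

assert np.allclose(Q.T @ Q, np.eye(3))            # Q is orthogonal
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)     # A = Q Λ Q^T
```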
Poorly conditioned matrices amplify pre-existing errors. Consider the ratio:

κ(A) = |λ_max| / |λ_min|

of the largest and smallest eigenvalues of A in absolute value. If it is high, inverting A will multiply the errors in the input.
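A small NumPy demonstration of conditioning (both matrices are made up for illustration): in a nearly singular system, a tiny perturbation of b produces a large change in the solution:

```python
import numpy as np

# Two systems: one well-conditioned, one ill-conditioned
# (its rows are nearly linearly dependent).
well = np.array([[2.0, 0.0], [0.0, 1.0]])
ill  = np.array([[1.0, 1.0], [1.0, 1.0001]])

assert np.isclose(np.linalg.cond(well), 2.0)
assert np.linalg.cond(ill) > 10_000

# A perturbation of b of size 1e-4 changes the solution of the
# ill-conditioned system by more than 1.
b  = np.array([2.0, 2.0])
db = np.array([0.0, 1e-4])
x1 = np.linalg.solve(ill, b)
x2 = np.linalg.solve(ill, b + db)
assert np.linalg.norm(x2 - x1) > 1.0
```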
Positive definite/negative definite
If all eigenvalues of A are:
- positive: the matrix is positive definite
- positive or zero: positive semi-definite
- negative: the matrix is negative definite
Properties of positive definite:
- x^T A x ≥ 0 for all x if A is positive semi-definite
- x^T A x > 0 for any x ≠ 0 if A is positive definite
Positive definiteness or negative definiteness helps us solve optimization problems. Quadratic forms on positive definite matrices are always positive for non-zero x and are convex. This guarantees the existence of a global minimum, and allows us to use the Hessian matrix to optimize multivariate functions. Similar arguments hold for negative definite matrices.
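A minimal sketch of checking definiteness via the eigenvalues, with made-up example matrices (the helper name `is_positive_definite` is ours, not a library function):

```python
import numpy as np

def is_positive_definite(A):
    """Symmetric A is positive definite iff all its eigenvalues are > 0."""
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

pd  = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3
psd = np.array([[1.0, 1.0], [1.0, 1.0]])     # eigenvalues 0 and 2

assert is_positive_definite(pd)
assert not is_positive_definite(psd)

# The quadratic form of a positive definite matrix is positive
# for every non-zero x.
rng = np.random.default_rng(2)
for _ in range(100):
    x = rng.standard_normal(2)
    assert x @ pd @ x > 0
```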
Taylor series in 2nd order
f(x) ≈ f(x0) + (x − x0)^T g + ½ (x − x0)^T H (x − x0)

where g is the gradient and H is the Hessian matrix at x0.
For a gradient step of size ε, the change in f depends on g^T H g. If g^T H g is negative or 0, f decreases as ε increases. However, we cannot step too far, as the accuracy of the Taylor approximation drops when ε increases. If g^T H g is positive, the optimal step size is ε* = g^T g / (g^T H g).
The critical point of the 2nd-order Taylor approximation is at x = x0 − H^-1 g.
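This is the Newton step. A short NumPy sketch on a made-up quadratic function, where a single step x − H^-1 g lands exactly on the critical point (exact because the 2nd-order Taylor expansion of a quadratic is the function itself):

```python
import numpy as np

# Minimize f(x) = 1/2 x^T H x - b^T x with positive definite H.
# Its gradient is g(x) = H x - b.
H = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

x0 = np.array([10.0, -7.0])            # arbitrary starting point
g = H @ x0 - b
x1 = x0 - np.linalg.solve(H, g)        # Newton step: x0 - H^-1 g

assert np.allclose(H @ x1, b)          # gradient is zero at x1
```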
Singular value decomposition (SVD)
SVD factorizes a matrix into singular vectors and singular values. Every real matrix has an SVD, but the same is not true for eigendecomposition. (E.g. eigendecomposition requires a square matrix.)
A = U D V^T

- A is an m×n matrix
- Left-singular vectors: U is an m×m orthogonal matrix (the eigenvectors of A A^T)
- Singular values: D is an m×n diagonal matrix (square roots of the eigenvalues of A^T A and A A^T)
- Right-singular vectors: V is an n×n orthogonal matrix (the eigenvectors of A^T A)
SVD is a powerful but expensive matrix factorization method. In numerical linear algebra, many problems can be solved by representing A in this form.
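A short NumPy sketch of the factorization and the eigenvalue relationships listed above (random rectangular matrix for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))        # rectangular: no eigendecomposition

U, s, Vt = np.linalg.svd(A, full_matrices=True)

# U (4x4) and V (3x3) are orthogonal; s holds the singular values.
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(Vt @ Vt.T, np.eye(3))

# Rebuild the m x n diagonal matrix D and reconstruct A = U D V^T.
D = np.zeros((4, 3))
D[:3, :3] = np.diag(s)
assert np.allclose(U @ D @ Vt, A)

# Singular values are the square roots of the eigenvalues of A^T A.
assert np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A)))
```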
Solving linear equation with SVD
x = A⁺b with A⁺ = V D⁺ U^T, where U, D, V come from the SVD and D⁺ is formed by taking the reciprocal of the non-zero elements of D and then taking the transpose.
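A sketch of building the pseudoinverse A⁺ = V D⁺ U^T from the SVD and checking it against NumPy's built-in `pinv` (the overdetermined example system is made up):

```python
import numpy as np

# Overdetermined system: more equations than unknowns.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 2.9])

# Build the pseudoinverse from the SVD: reciprocal of the
# non-zero singular values, placed in an n x m (transposed) matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
D_plus = np.zeros((A.shape[1], A.shape[0]))
D_plus[:len(s), :len(s)] = np.diag(1.0 / s)
A_plus = Vt.T @ D_plus @ U.T

# Matches NumPy's built-in pinv ...
assert np.allclose(A_plus, np.linalg.pinv(A))

# ... and gives the least-squares solution of Ax = b
# (the normal equations A^T A x = A^T b hold).
x = A_plus @ b
assert np.allclose(A.T @ (A @ x - b), 0)
```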
L0-norm (counts the non-zero elements: each element contributes 0 if it is 0, otherwise 1)
L1-norm (Manhattan distance)
L2-norm (Euclidean distance)
The Frobenius norm measures the size of a matrix: the square root of the sum of its squared elements.
The determinant is the product of all eigenvalues. If its absolute value is greater than 1, the transformation expands the space. If it is between 0 and 1, it shrinks the space.
Trace is the sum of all diagonal elements: Tr(A) = Σ_i A_ii.
We can rewrite some operations using the trace to get rid of the summation, e.g. ||A||_F = sqrt(Tr(A A^T)).
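A quick numerical check of this trace identity, plus the cyclic-permutation property Tr(AB) = Tr(BA) (random matrices for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 4))

# Frobenius norm via the trace identity ||A||_F = sqrt(Tr(A A^T)).
frob_direct = np.sqrt(np.sum(A**2))
frob_trace = np.sqrt(np.trace(A @ A.T))
assert np.isclose(frob_direct, frob_trace)

# Trace is invariant under cyclic permutation: Tr(AB) = Tr(BA).
B = rng.standard_normal((4, 3))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```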
Using a matrix for transformation:
PCA constrains the columns of the decoding matrix to be orthogonal to each other with magnitude 1.
PCA minimizes the distance between the input and its reconstruction:
To optimize it:
So, the optimal encoding and decoding scheme is
Let's assume the constraint d^T d = 1; since the relevant term is a scalar, we can just take its transpose, and we assume the code is 1-dimensional.
By induction, we can expand this to higher dimensions.
The optimal d is given by the eigenvector of X^T X corresponding to the largest eigenvalue.
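A minimal NumPy sketch of this result on synthetic 2-D data: the dominant eigenvector of X^T X recovers the direction along which the data varies most (the generating direction (1, 0.5) and noise level are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(5)

# Correlated 2-D data: most variance lies along direction (1, 0.5).
z = rng.standard_normal(500)
X = np.column_stack([z, 0.5 * z + 0.05 * rng.standard_normal(500)])
X -= X.mean(axis=0)                  # center the data

# The first principal direction d is the eigenvector of X^T X
# with the largest eigenvalue (eigh sorts eigenvalues ascending).
lam, vecs = np.linalg.eigh(X.T @ X)
d = vecs[:, -1]

# d lines up with the generating direction (up to sign).
expected = np.array([1.0, 0.5]) / np.linalg.norm([1.0, 0.5])
assert np.isclose(abs(d @ expected), 1.0, atol=1e-2)
```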
The vector is added to each row of the matrix. The implicit replication of elements for an operation with a higher-dimensional tensor is called broadcasting.
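For example, with NumPy (array names are made up):

```python
import numpy as np

C = np.zeros((3, 2))
b = np.array([10.0, 20.0])

# b (shape (2,)) is implicitly replicated across the 3 rows of C.
result = C + b
assert result.shape == (3, 2)
assert np.array_equal(result, np.tile(b, (3, 1)))
```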
This section is incomplete.
which are scalars.