Element-wise product:

$C_{ij} = A_{ij} B_{ij}$

Distributive and associative:

$A(B + C) = AB + AC, \quad A(BC) = (AB)C$



For matrices, multiplication is not commutative in general: $AB \neq BA$.

The dot product of 2 vectors is commutative: $x^\top y = y^\top x$.
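These properties are easy to check numerically; a quick NumPy sketch:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

# Element-wise (Hadamard) product: multiply entry by entry.
H = A * B

# Matrix multiplication is generally NOT commutative...
not_commutative = not np.allclose(A @ B, B @ A)

# ...but the dot product of two vectors is.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
dot_commutes = np.isclose(x @ y, y @ x)
```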


Matrix inverse

$A^{-1} A = A A^{-1} = I$

Singular matrix

A matrix without an inverse, i.e. its determinant is 0. One or more of its rows (columns) is a linear combination of some other rows (columns).

Linear equation

In practice, we rarely solve $Ax = b$ by finding $A^{-1}$. We lose precision in computing $A^{-1}$. In machine learning, $A$ can be sparse but $A^{-1}$ is dense, which requires too much memory to perform the computation.
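A short NumPy illustration: `np.linalg.solve` solves the system directly (via LU factorization) rather than forming the inverse, which is both faster and more accurate:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Preferred: solve Ax = b directly without forming A^{-1}.
x = np.linalg.solve(A, b)

# Discouraged in practice: form the inverse explicitly, then multiply.
x_via_inv = np.linalg.inv(A) @ b
```

Both agree on this tiny well-conditioned example; the difference shows up in precision and memory on large or poorly conditioned systems.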

Symmetric matrix

$A = A^\top$

Orthogonal matrix

An orthogonal matrix is a square matrix whose rows (columns) are mutually orthonormal, i.e. the dot product of any two distinct row vectors (column vectors) is 0 and every row (column) has unit length.

For an orthogonal matrix $Q$, there is one important property:

$Q^\top Q = Q Q^\top = I$

Orthogonal matrices are particularly interesting because the inverse is easy to find: $Q^{-1} = Q^\top$.

Also, orthogonal matrices do not amplify errors, which is very desirable:

$\|Qx\|_2^2 = x^\top Q^\top Q x = x^\top x = \|x\|_2^2$

So if we multiply the input $x$ with orthogonal matrices, the errors present in $x$ will not be amplified by the multiplication. We can decompose matrices into orthogonal matrices (as in SVD) when solving linear algebra problems. Also, a symmetric matrix can be decomposed into orthogonal matrices: $A = Q \Lambda Q^\top$.
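A quick check of these properties with a 2×2 rotation matrix (a standard example of an orthogonal matrix):

```python
import numpy as np

theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation matrix = orthogonal

# The inverse is just the transpose: Q^T Q = I.
inverse_is_transpose = np.allclose(Q.T @ Q, np.eye(2))

# Multiplying by Q preserves the Euclidean norm, so errors are not amplified.
x = np.array([3.0, 4.0])
norm_preserved = np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))
```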

Quadratic form

A quadratic form equation contains terms of $x_i^2$ and $x_i x_j$.

In matrix form:

$f(x) = x^\top A x$

With 3 variables:

$f(x) = \sum_{i=1}^{3} \sum_{j=1}^{3} A_{ij} x_i x_j$

Eigenvector & eigenvalue

$Av = \lambda v$

where $\lambda$ is a scalar (the eigenvalue) and $v$ is a vector (the eigenvector).

Find the eigenvalues and eigenvectors for

  • A matrix is singular iff any of its eigenvalues is 0.
  • To optimize a quadratic form $f(x) = x^\top A x$ subject to $\|x\|_2 = 1$:
    • If $x$ is a unit eigenvector of $A$, $f(x)$ equals the corresponding eigenvalue.
    • The max (min) of $f(x)$ is the max (min) of the eigenvalues.
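A NumPy sketch verifying both bullet points on a small symmetric matrix (eigenvalues 1 and 3):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # symmetric
eigvals, eigvecs = np.linalg.eigh(A)    # eigh: for symmetric matrices, ascending order

# For a unit eigenvector v with eigenvalue lam, v^T A v == lam.
v = eigvecs[:, -1]
lam = eigvals[-1]
quad_equals_eig = np.isclose(v @ A @ v, lam)

# Sample many random unit vectors: x^T A x stays within [min, max] eigenvalue.
rng = np.random.default_rng(0)
xs = rng.normal(size=(1000, 2))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
vals = np.einsum('ij,jk,ik->i', xs, A, xs)   # x^T A x per row
within_bounds = (vals >= eigvals[0] - 1e-9).all() and (vals <= eigvals[-1] + 1e-9).all()
```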

Finding the eigenvalues

Consider the possible factors of 16: 1, 2, 4, 8, 16.

Substituting these candidates into the characteristic polynomial, one of them makes the polynomial 0 and is therefore an eigenvalue.


The other eigenvalues are -2, -2.

Finding the eigenvectors

Perform row reduction to solve the linear system $(A - \lambda I)v = 0$:


Perform row subtraction/multiplication:

After many more reductions:

So for this eigenvalue, the eigenvector is:


For a matrix $V$ with one eigenvector of $A$ per column and a vector $\lambda$ holding the corresponding eigenvalues, the eigendecomposition of $A$ is

$A = V \operatorname{diag}(\lambda) V^{-1}$

If $A$ is real and symmetric,

$A = Q \Lambda Q^\top$

where $Q$ is an orthogonal matrix composed of eigenvectors of $A$. The eigenvalue $\Lambda_{ii}$ is associated with the eigenvector in column $i$ of $Q$. This is important because we often deal with symmetric matrices.
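A quick check of this decomposition with NumPy's `eigh` (the routine for symmetric matrices):

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # real symmetric
w, Q = np.linalg.eigh(A)  # w: eigenvalues, Q: orthonormal eigenvectors (columns)

# A = Q diag(w) Q^T, and Q is orthogonal.
reconstructs = np.allclose(Q @ np.diag(w) @ Q.T, A)
q_orthogonal = np.allclose(Q.T @ Q, np.eye(2))
```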

Eigendecomposition requires $A$ to be a square matrix, and not every square matrix has an eigendecomposition.

Poor Conditioning

Poorly conditioned matrices amplify pre-existing errors. Consider the condition number:

$\kappa(A) = \left|\frac{\lambda_{\max}}{\lambda_{\min}}\right|$

the ratio of the largest and smallest eigenvalues of $A$.

If it is high, inverting $A$ will multiply the errors in the input.
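NumPy exposes this directly as `np.linalg.cond`; for a diagonal matrix it is simply the ratio of the largest to the smallest diagonal entry:

```python
import numpy as np

well = np.array([[2.0, 0.0], [0.0, 1.0]])   # condition number 2: well conditioned
ill  = np.array([[1.0, 0.0], [0.0, 1e-8]])  # condition number 1e8: poorly conditioned

kappa_well = np.linalg.cond(well)
kappa_ill = np.linalg.cond(ill)
```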

Jacobian matrix

$J_{ij} = \frac{\partial f_i}{\partial x_j}$

Hessian matrix

$H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$

Positive definite/negative definite

If all eigenvalues of $A$ are:

  • positive: the matrix is positive definite
  • positive or zero: positive semi-definite
  • negative: the matrix is negative definite

Properties of positive (semi-)definite matrices:

  • If $A$ is positive semi-definite, $x^\top A x \ge 0$ for all $x$.
  • If $A$ is positive definite, $x^\top A x > 0$ for all $x \neq 0$.

Positive or negative definiteness helps us solve optimization problems. Quadratic forms on positive definite matrices are always positive for non-zero $x$ and are convex, which guarantees the existence of a global minimum and allows us to use the Hessian matrix to optimize multivariate functions. Similar arguments hold for negative definite matrices.
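Checking definiteness in NumPy amounts to inspecting the eigenvalues; the quadratic-form criterion can be spot-checked on random vectors:

```python
import numpy as np

A = np.array([[2.0, -1.0], [-1.0, 2.0]])

# Positive definite iff all eigenvalues are positive (eigenvalues here: 1 and 3).
eigvals = np.linalg.eigvalsh(A)
is_pd = (eigvals > 0).all()

# Equivalently, the quadratic form is positive for any non-zero x.
rng = np.random.default_rng(1)
x = rng.normal(size=2)
quad_positive = x @ A @ x > 0
```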

Taylor series in 2nd order

$f(x) \approx f(x^{(0)}) + (x - x^{(0)})^\top g + \frac{1}{2} (x - x^{(0)})^\top H (x - x^{(0)})$

where $g$ is the gradient and $H$ is the Hessian matrix at $x^{(0)}$.

Taking a gradient descent step $x = x^{(0)} - \epsilon g$:

$f(x^{(0)} - \epsilon g) \approx f(x^{(0)}) - \epsilon g^\top g + \frac{1}{2} \epsilon^2 g^\top H g$

If $g^\top H g$ is negative or 0, $f$ decreases as $\epsilon$ increases. However, we cannot increase $\epsilon$ too far, as the accuracy of the Taylor approximation drops as $\epsilon$ increases. If $g^\top H g$ is positive, the optimal step is

$\epsilon^* = \frac{g^\top g}{g^\top H g}$
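For a quadratic function the 2nd-order Taylor expansion is exact, so the optimal step formula can be verified against a brute-force line search. A sketch on a made-up quadratic $f(x) = \frac{1}{2}x^\top H x - b^\top x$:

```python
import numpy as np

H = np.array([[3.0, 1.0], [1.0, 2.0]])  # positive definite Hessian
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ H @ x - b @ x   # gradient: H x - b

x0 = np.zeros(2)
g = H @ x0 - b                          # gradient at x0
eps_star = (g @ g) / (g @ H @ g)        # optimal step along -g (g^T H g > 0)
x1 = x0 - eps_star * g

# Brute-force line search along -g should land (almost) on eps_star.
eps_grid = np.linspace(0.0, 1.0, 1001)
vals = [f(x0 - e * g) for e in eps_grid]
best_eps = eps_grid[int(np.argmin(vals))]
```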

Newton's method

The critical point of the 2nd-order Taylor approximation is

$x^* = x^{(0)} - H^{-1} g$
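On a quadratic function, a single Newton step jumps straight to the minimum; a sketch on a made-up quadratic:

```python
import numpy as np

# f(x) = 1/2 x^T H x - b^T x has gradient g(x) = H x - b and constant Hessian H.
H = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

x0 = np.zeros(2)
g = H @ x0 - b
# Newton step: x* = x0 - H^{-1} g (solve instead of forming the inverse).
x_star = x0 - np.linalg.solve(H, g)

# At the minimum of a quadratic, the gradient vanishes after one step.
grad_at_min = H @ x_star - b
```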

Singular value decomposition (SVD)

SVD factorizes a matrix into singular vectors and singular values:

$A = U D V^\top$

Every real matrix has an SVD, but the same is not true for eigendecomposition (e.g. eigendecomposition requires a square matrix).

  • $A$ is an m×n matrix
  • Left-singular vectors: $U$ is an m×m orthogonal matrix (the eigenvectors of $AA^\top$)
  • Singular values: $D$ is an m×n diagonal matrix (its diagonal entries are the square roots of the eigenvalues of $A^\top A$ and $AA^\top$)
  • Right-singular vectors: $V$ is an n×n orthogonal matrix (the eigenvectors of $A^\top A$)

SVD is a powerful but expensive matrix factorization method. In numerical linear algebra, many problems can be solved by representing them in this form.
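A NumPy sketch on a non-square matrix, checking both the factorization and the relation between singular values and eigenvalues:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # 2x3, not square: SVD still exists

U, s, Vt = np.linalg.svd(A, full_matrices=True)  # A = U @ D @ Vt

# Rebuild the m x n diagonal matrix D from the singular values.
D = np.zeros_like(A)
D[:len(s), :len(s)] = np.diag(s)

reconstructs = np.allclose(U @ D @ Vt, A)
# Singular values are the square roots of the eigenvalues of A A^T.
sv_relation = np.allclose(s**2, np.sort(np.linalg.eigvalsh(A @ A.T))[::-1])
```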

Solving linear equations with SVD

$x = A^{+} b$

which is

$A^{+} = V D^{+} U^\top$

$U$, $D$ and $V$ are from the SVD, and $D^{+}$ is computed by taking the reciprocal of the non-zero elements of $D$ and then taking the transpose.
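Building the pseudoinverse by hand from the SVD and comparing against NumPy's built-in `np.linalg.pinv` (a made-up tall, full-column-rank matrix, so $A^{+}$ is a left inverse):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # tall: more equations than unknowns
b = np.array([1.0, 2.0, 3.0])

# Build the pseudoinverse A+ = V D+ U^T from the (thin) SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# Matches NumPy's built-in pinv, and acts as a left inverse here.
matches_builtin = np.allclose(A_pinv, np.linalg.pinv(A))
left_inverse = np.allclose(A_pinv @ A, np.eye(2))

x = A_pinv @ b  # least-squares solution of Ax ≈ b
```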


L0-norm: the number of non-zero elements (each element contributes 0 if it is 0, otherwise 1).

L1-norm (Manhattan distance):

$\|x\|_1 = \sum_i |x_i|$

L2-norm (Euclidean distance):

$\|x\|_2 = \sqrt{\sum_i x_i^2}$

Frobenius norm

$\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}$

It measures the size of a matrix.
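All of these are one-liners in NumPy (the "L0-norm" is just a non-zero count; it is not a true norm):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l0 = np.count_nonzero(x)          # "L0": number of non-zero entries -> 2
l1 = np.linalg.norm(x, 1)         # |3| + |-4| + |0| = 7
l2 = np.linalg.norm(x)            # sqrt(9 + 16) = 5

A = np.array([[1.0, 2.0], [3.0, 4.0]])
fro = np.linalg.norm(A, 'fro')    # sqrt(1 + 4 + 9 + 16) = sqrt(30)
```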

Other properties:

$\|x\|_2^2 = x^\top x$


The determinant is the product of all eigenvalues. If its absolute value is greater than 1, the matrix expands the space; if it is between 0 and 1, it shrinks the space.


Trace is the sum of all diagonal elements:

$\operatorname{Tr}(A) = \sum_i A_{ii}$

We can rewrite some operations using the trace to get rid of the summation:

$\|A\|_F = \sqrt{\operatorname{Tr}(A A^\top)}$
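A quick check of the trace identities, including the cyclic-permutation property $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

# Trace = sum of diagonal elements.
tr = np.trace(A)                 # 1 + 4 = 5

# Frobenius norm via trace: ||A||_F = sqrt(Tr(A A^T)).
fro_via_trace = np.isclose(np.linalg.norm(A, 'fro'),
                           np.sqrt(np.trace(A @ A.T)))

# Trace is invariant under cyclic permutation: Tr(AB) = Tr(BA).
cyclic = np.isclose(np.trace(A @ B), np.trace(B @ A))
```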


Using a matrix $D$ to transform a code $c$ back to the input space (decoding):

$g(c) = Dc$

PCA constrains the columns of $D$ to be orthogonal to each other with magnitude 1.

PCA minimizes the reconstruction error $\|x - g(c)\|_2$.

To optimize it, set the gradient with respect to $c$ to zero, which gives $c = D^\top x$.

So the optimal encode and decode scheme is $f(x) = D^\top x$ and $r(x) = D D^\top x$.

Let’s assume the constraint $d^\top d = 1$, and since $d^\top x$ is a scalar, we can simply take its transpose; we also assume the code is 1-D.

By induction, we can extend the result to higher dimensions.

The optimal $d$ is given by the eigenvector of $X^\top X$ corresponding to the largest eigenvalue.
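A sketch of 1-D PCA on made-up centered data: take the top eigenvector of $X^\top X$ as $d$, encode with $d^\top x$, decode with $d$, and check that the squared reconstruction error equals the sum of the discarded eigenvalues (here, the smaller one):

```python
import numpy as np

# Toy data: 200 points, stretched mostly along the first axis.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
X -= X.mean(axis=0)   # PCA assumes centered data

# Principal direction = eigenvector of X^T X with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # ascending order
d = eigvecs[:, -1]                           # unit-norm principal component

# Encode (c = d^T x), then decode (r = c d): optimal rank-1 reconstruction.
codes = X @ d
X_rec = np.outer(codes, d)
err = np.linalg.norm(X - X_rec) ** 2         # equals the discarded eigenvalue
```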


The vector $b$ is added to each row of the matrix $A$, as in $C = A + b$. The implicit replication of elements for an operation with a higher-dimensional tensor is called broadcasting.
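In NumPy this happens automatically; the row vector is stretched to match the matrix shape:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([10.0, 20.0])

# b is implicitly replicated across each row of A.
C = A + b
```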


This section is incomplete.



which are scalars.


The Matrix Cookbook by Kaare Petersen and Michael Pedersen. Linear Algebra by Georgi Shilov.