# Machine learning - Linear algebra

### Operation

#### Element-wise product:

$$
C = A \odot B, \qquad C_{ij} = A_{ij} B_{ij}
$$

#### Distributive and associative:

$$
A(B + C) = AB + AC
$$

$$
A(BC) = (AB)C
$$

#### Transpose

$$
(A^T)_{ij} = A_{ji}, \qquad (AB)^T = B^T A^T
$$

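#### Commutative:

Matrix multiplication is not commutative in general:

$$
AB \neq BA
$$

The dot product of 2 vectors is commutative:

$$
x^T y = y^T x
$$

Proof: $x^T y$ is a scalar, so it equals its own transpose:

$$
x^T y = (x^T y)^T = y^T x
$$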

#### Matrix inverse

$$
A^{-1} A = I
$$
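The identities in this section (element-wise product, transpose, commutativity, inverse) can be checked numerically. The following is a minimal NumPy sketch; the matrices and vectors are arbitrary values chosen for illustration, not taken from the original notes.

```python
import numpy as np

# Arbitrary example values (assumptions for illustration only).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 1.0]])
x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

hadamard = A * B                                    # element-wise product
transpose_ok = np.allclose((A @ B).T, B.T @ A.T)    # (AB)^T = B^T A^T
commutes = np.allclose(A @ B, B @ A)                # False for this pair
dot_commutes = np.isclose(x @ y, y @ x)             # x^T y = y^T x
A_inv = np.linalg.inv(A)
inverse_ok = np.allclose(A_inv @ A, np.eye(2))      # A^{-1} A = I
```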

#### Singular matrix

A singular matrix is a square matrix without an inverse, i.e. its determinant is 0. One or more of its rows (columns) is a linear combination of the other rows (columns).

#### Linear equation

$$
Ax = b \quad \Rightarrow \quad x = A^{-1} b
$$

In practice, we rarely solve $Ax = b$ by finding $A^{-1}$. We lose precision in computing $A^{-1}$. In machine learning, $A$ can be sparse while $A^{-1}$ is dense, so computing it requires too much memory.
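As a hedged illustration of solving a linear system without forming the inverse, the sketch below uses `np.linalg.solve`, which factorizes the matrix instead of inverting it. The system itself is an arbitrary small example; the explicit inverse is computed only for comparison.

```python
import numpy as np

# Arbitrary non-singular system (an assumption for illustration).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x_solve = np.linalg.solve(A, b)    # factorization-based; never forms A^{-1}
x_inv = np.linalg.inv(A) @ b       # explicit inverse, shown only for comparison
residual = np.linalg.norm(A @ x_solve - b)
```

For large sparse systems a sparse or iterative solver would be the usual choice; that is outside the scope of this sketch.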

#### Symmetric matrix

$$
A = A^T
$$

#### Orthogonal matrix

An orthogonal matrix is a *square* matrix whose rows (columns) are mutually orthonormal, i.e. the dot product of any 2 distinct row vectors (column vectors) is 0 and every row (column) has unit length.

For an orthogonal matrix, there is one important property:

$$
A^T A = A A^T = I
$$

Orthogonal matrices are particularly interesting because the inverse is easy to find:

$$
A^{-1} = A^T
$$

Also, orthogonal matrices do not amplify errors, which is very desirable:

$$
\|Ax\|_2^2 = (Ax)^T Ax = x^T A^T A x = x^T x = \|x\|_2^2
$$

So if we multiply the input $x$ by orthogonal matrices, the errors present in $x$ will not be amplified by the multiplication. We can decompose matrices into orthogonal matrices (as in SVD) when solving linear algebra problems. Also, a symmetric matrix can be decomposed using orthogonal matrices: $A = Q \Lambda Q^T$.
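The norm-preserving behaviour can be checked numerically. In this sketch the orthogonal matrix comes from a QR factorization of a random matrix, which is just one convenient way to obtain one; the input and the error vector are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Q from a QR factorization is orthogonal (construction chosen for illustration).
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
x = rng.normal(size=4)
err = 1e-3 * rng.normal(size=4)    # hypothetical error present in the input

is_orthogonal = np.allclose(Q.T @ Q, np.eye(4))
inverse_is_transpose = np.allclose(np.linalg.inv(Q), Q.T)

err_in = np.linalg.norm(err)                     # error before multiplication
err_out = np.linalg.norm(Q @ (x + err) - Q @ x)  # error after multiplication
```

Here `err_out` matches `err_in` up to floating-point rounding, so the multiplication neither amplifies nor shrinks the error.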

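#### Quadratic form

A quadratic form equation contains terms of $x^2$, $y^2$ and $xy$.

$$
ax^2 + 2bxy + cy^2
$$

In matrix form:

$$
\begin{bmatrix} x & y \end{bmatrix}
\begin{bmatrix} a & b \\ b & c \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
$$

With 3 variables:

$$
\begin{bmatrix} x & y & z \end{bmatrix}
\begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= ax^2 + dy^2 + fz^2 + 2bxy + 2cxz + 2eyz
$$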

### Eigenvector & eigenvalue

$$
Av = \lambda v
$$

where $\lambda$ is a scalar (the eigenvalue) and $v$ is a vector (the eigenvector).

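To find the eigenvalues and eigenvectors of $A$, solve $(A - \lambda I)x = 0$. Non-zero solutions exist only when

$$
\det(A - \lambda I) = 0
$$

- A matrix is singular iff any of its eigenvalues is 0.
- To optimize the quadratic form $f(x) = x^T A x$ for a symmetric $A$, given $\|x\|_2 = 1$:
- If $x$ is a unit eigenvector of $A$, $f(x)$ equals the corresponding eigenvalue
- The max (min) of $f(x)$ is the max (min) of the eigenvalues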
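The eigenvalue bounds on the quadratic form can be checked by sampling unit vectors. The sketch below assumes a small symmetric matrix picked for illustration and uses `np.linalg.eigh`, which returns eigenvalues in ascending order with orthonormal eigenvectors.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])    # arbitrary symmetric example
eigvals, eigvecs = np.linalg.eigh(A)       # ascending eigenvalues

def f(x):
    """Quadratic form x^T A x."""
    return x @ A @ x

rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 2))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)   # project to unit length
values = np.array([f(x) for x in samples])

# At a unit eigenvector, f equals the corresponding eigenvalue.
f_at_min_vec = f(eigvecs[:, 0])
f_at_max_vec = f(eigvecs[:, -1])
```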

#### Finding the eigenvalues

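The characteristic equation $\det(A - \lambda I) = 0$ of the example reduces to a cubic with constant term 16:

$$
\lambda^3 - 12\lambda - 16 = 0
$$

Consider the possible factors of 16 as candidate roots: 1, 2, 4, 8, 16 (and their negatives).

When $\lambda = 4$,

$$
4^3 - 12 \cdot 4 - 16 = 0
$$

So

$$
(\lambda - 4)(\lambda^2 + 4\lambda + 4) = 0
$$

The other eigenvalues are -2, -2.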
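The factoring can be cross-checked numerically. This sketch assumes the characteristic polynomial is $\lambda^3 - 12\lambda - 16$, the monic cubic whose roots are the eigenvalues 4, -2, -2 quoted above; the example matrix itself is not needed for the check.

```python
import numpy as np

# Coefficients of lambda^3 + 0*lambda^2 - 12*lambda - 16 (assumed polynomial).
coeffs = [1.0, 0.0, -12.0, -16.0]

roots = np.roots(coeffs)
# The repeated root may come back with tiny numerical noise, so keep real parts.
eigenvalues = np.sort(roots.real)

value_at_4 = np.polyval(coeffs, 4.0)   # the polynomial vanishes at a root
```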

#### Finding the eigenvectors

Use row reduction to solve the linear equation $(A - \lambda I)x = 0$ for each eigenvalue $\lambda$.

Perform

Perform row subtraction/multiplication:

After many more reductions:

So for , the eigenvector is:

#### Eigendecomposition

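For a matrix $V$ with one eigenvector of $A$ per column, $V = [v^{(1)}, \dots, v^{(n)}]$, and a vector of the corresponding eigenvalues $\lambda = [\lambda_1, \dots, \lambda_n]^T$, the eigendecomposition of $A$ is

$$
A = V \operatorname{diag}(\lambda) V^{-1}
$$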

If $A$ is real and **symmetric**,

$$
A = Q \Lambda Q^T
$$

where $Q$ is an orthogonal matrix composed of eigenvectors of $A$ and $\Lambda$ is a diagonal matrix. The eigenvalue $\Lambda_{ii}$ is associated with the eigenvector in column $i$ of $Q$. This is important because we often deal with symmetric matrices.

Eigendecomposition requires $A$ to be a square matrix. Not every square matrix has an eigendecomposition.
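Both forms of the decomposition can be reproduced with NumPy. In this sketch `np.linalg.eig` handles a general square matrix and `np.linalg.eigh` handles the symmetric case; the two matrices are arbitrary diagonalizable examples.

```python
import numpy as np

# General (non-symmetric) diagonalizable matrix: A = V diag(lambda) V^{-1}
A = np.array([[2.0, 0.0], [1.0, 3.0]])
lam, V = np.linalg.eig(A)
A_rebuilt = V @ np.diag(lam) @ np.linalg.inv(V)

# Real symmetric matrix: S = Q Lambda Q^T with orthogonal Q
S = np.array([[2.0, 1.0], [1.0, 3.0]])
w, Q = np.linalg.eigh(S)
S_rebuilt = Q @ np.diag(w) @ Q.T
Q_is_orthogonal = np.allclose(Q.T @ Q, np.eye(2))
```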

### Poor Conditioning

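Poorly conditioned matrices amplify pre-existing errors. Consider the ratio:

$$
\max_{i,j} \left| \frac{\lambda_i}{\lambda_j} \right|
$$

which is the ratio of the magnitudes of the largest and smallest eigenvalues of $A$.

If it is high, inverting $A$ will multiply the errors in the input $x$.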
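A small numerical experiment shows the amplification. The diagonal matrix below is a deliberately extreme example (eigenvalues 1 and 1e-6), and the perturbation is chosen along the direction of the small eigenvalue; both are assumptions made for illustration.

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1e-6]])          # eigenvalues 1 and 1e-6
lam = np.linalg.eigvals(A)
cond = np.abs(lam).max() / np.abs(lam).min()      # ratio of extreme eigenvalue magnitudes

b = np.array([1.0, 0.0])
db = np.array([0.0, 1e-6])                        # tiny error in the input

x = np.linalg.solve(A, b)                         # applies A^{-1} to b
x_pert = np.linalg.solve(A, b + db)               # applies A^{-1} to the perturbed input

rel_in = np.linalg.norm(db) / np.linalg.norm(b)           # relative input error
rel_out = np.linalg.norm(x_pert - x) / np.linalg.norm(x)  # relative output error
amplification = rel_out / rel_in
```

In this worst-case direction the relative error grows by roughly the condition number.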

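### Jacobian matrix

For a function $f: \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian matrix $J \in \mathbb{R}^{m \times n}$ collects all first-order partial derivatives:

$$
J_{ij} = \frac{\partial f_i}{\partial x_j}
$$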

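### Hessian matrix

For a scalar function $f: \mathbb{R}^n \to \mathbb{R}$, the Hessian matrix collects all second-order partial derivatives:

$$
H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}
$$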

### Positive definite/negative definite

If all eigenvalues of $A$ are:

- positive: the matrix is positive definite
- positive or zero: positive semi-definite
- negative: the matrix is negative definite

Properties of positive definite:

- $x^T A x \geq 0$ for all $x$ if $A$ is positive semi-definite, and
- $x^T A x = 0 \Rightarrow x = 0$ for positive definite

Positive or negative definiteness helps us solve optimization problems. Quadratic forms on positive definite matrices, $x^T A x$, are always positive for non-zero $x$ and are convex. This guarantees the existence of a global minimum. It allows us to use the Hessian matrix to optimize multivariate functions. Similar arguments hold true for negative definite matrices.
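The eigenvalue test above translates directly into code. The helper below, `classify_definiteness`, is a hypothetical name introduced for this sketch; it assumes a real symmetric input and uses a small tolerance to absorb rounding error.

```python
import numpy as np

def classify_definiteness(A, tol=1e-12):
    """Classify a real symmetric matrix by the signs of its eigenvalues."""
    w = np.linalg.eigvalsh(A)          # eigenvalues of a symmetric matrix
    if np.all(w > tol):
        return "positive definite"
    if np.all(w >= -tol):
        return "positive semi-definite"
    if np.all(w < -tol):
        return "negative definite"
    if np.all(w <= tol):
        return "negative semi-definite"
    return "indefinite"

# Quadratic forms on a positive definite matrix are positive for non-zero x.
P = np.array([[2.0, 0.0], [0.0, 1.0]])
rng = np.random.default_rng(0)
quadratic_values = [x @ P @ x for x in rng.normal(size=(100, 2))]
```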

### Taylor series in 2nd order

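$$
f(x) \approx f(x^{(0)}) + (x - x^{(0)})^T g + \frac{1}{2} (x - x^{(0)})^T H (x - x^{(0)})
$$

where $g$ is the gradient and $H$ is the Hessian matrix, both evaluated at $x^{(0)}$.

Taking a gradient descent step $x = x^{(0)} - \epsilon g$ with learning rate $\epsilon$:

$$
f(x^{(0)} - \epsilon g) \approx f(x^{(0)}) - \epsilon g^T g + \frac{1}{2} \epsilon^2 g^T H g
$$

If $g^T H g$ is negative or 0, $f$ decreases as $\epsilon$ increases. However, we cannot step too far, because the accuracy of the Taylor approximation drops as $\epsilon$ increases. If $g^T H g$ is positive, the optimal step is

$$
\epsilon^{*} = \frac{g^T g}{g^T H g}
$$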
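For a quadratic function the second-order Taylor expansion is exact, which makes the optimal step easy to verify. The sketch below assumes $f(x) = \frac{1}{2} x^T H x - b^T x$ with an arbitrary positive definite $H$, and uses the step $\epsilon^{*} = g^T g / (g^T H g)$ along the negative gradient.

```python
import numpy as np

# Arbitrary positive definite quadratic (an assumption for illustration).
H = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def f(x):
    return 0.5 * x @ H @ x - b @ x

x0 = np.array([2.0, -1.0])
g = H @ x0 - b                        # gradient of f at x0

def taylor_value(eps):
    """Second-order Taylor prediction of f(x0 - eps * g)."""
    return f(x0) - eps * (g @ g) + 0.5 * eps**2 * (g @ H @ g)

eps_star = (g @ g) / (g @ H @ g)      # optimal step size when g^T H g > 0
f_after_step = f(x0 - eps_star * g)
```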

### Newton method

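The critical point of the 2nd order Taylor expansion is

$$
x^{*} = x^{(0)} - H(f)(x^{(0)})^{-1} \nabla_x f(x^{(0)})
$$

where $\nabla_x f(x^{(0)})$ is the gradient and $H(f)(x^{(0)})$ is the Hessian matrix at $x^{(0)}$.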
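As a minimal sketch of a single Newton step, the example below assumes a quadratic $f(x) = \frac{1}{2} x^T H x - b^T x$ with a positive definite $H$, for which one step lands exactly on the minimum. The linear system is solved rather than inverting $H$ explicitly.

```python
import numpy as np

# Arbitrary positive definite quadratic (an assumption for illustration).
H = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    """Gradient of f(x) = 0.5 x^T H x - b^T x."""
    return H @ x - b

x0 = np.array([2.0, -1.0])
# Newton update: x* = x0 - H^{-1} grad(x0), computed via a linear solve.
x_star = x0 - np.linalg.solve(H, grad(x0))
```

For non-quadratic functions the step is only approximate and is usually iterated.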

### Singular value decomposition (SVD)

SVD factorizes a matrix into singular vectors and singular values. Every real matrix has an SVD, but the same is not true of eigendecomposition (e.g. eigendecomposition requires a square matrix).

$$
A = U D V^T
$$

- $A$ is an m×n matrix
- Left-singular vectors: $U$ is an m×m orthogonal matrix (its columns are the eigenvectors of $AA^T$)
- Singular values: $D$ is an m×n diagonal matrix (its diagonal holds the square roots of the non-zero eigenvalues of $AA^T$ and $A^TA$)
- Right-singular vectors: $V$ is an n×n orthogonal matrix (its columns are the eigenvectors of $A^TA$)

SVD is a powerful but expensive matrix factorization method. In numerical linear algebra, many problems can be solved by representing a matrix in this form.
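The shapes and the link to the eigenvalues of $AA^T$ can be checked with `np.linalg.svd`. The 2×3 matrix below is an arbitrary example; note that NumPy returns the singular values as a 1-D array and returns $V^T$ rather than $V$.

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])          # arbitrary 2x3 example

U, s, Vt = np.linalg.svd(A)               # U: 2x2, s: length 2, Vt: 3x3

# Place the singular values on the diagonal of an m x n matrix D.
D = np.zeros(A.shape)
D[:len(s), :len(s)] = np.diag(s)
A_rebuilt = U @ D @ Vt

# Squared singular values equal the eigenvalues of A A^T (sorted descending).
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]
```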

### Solving linear equation with SVD

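To solve $Ax = b$, use the pseudoinverse $A^{+}$:

$$
x = A^{+} b
$$

which is

$$
A^{+} = V D^{+} U^T
$$

$U$, $D$ and $V$ are from the SVD of $A$, and $D^{+}$ is obtained by taking the reciprocal of the non-zero elements of $D$ and then taking the transpose.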
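The pseudoinverse construction can be sketched step by step and compared against `np.linalg.pinv`. The overdetermined system and the `1e-12` cutoff for "non-zero" singular values are assumptions made for this example.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])                 # arbitrary 3x2 example
b = np.array([1.0, 2.0, 2.0])

U, s, Vt = np.linalg.svd(A)                # full SVD: U is 3x3, Vt is 2x2

# D^+ has the transposed shape of D, with reciprocals of the non-zero singular values.
D_plus = np.zeros(A.T.shape)
for i, sigma in enumerate(s):
    if sigma > 1e-12:                      # treat tiny values as zero (assumed cutoff)
        D_plus[i, i] = 1.0 / sigma

A_plus = Vt.T @ D_plus @ U.T               # A^+ = V D^+ U^T
x = A_plus @ b                             # least-squares solution of Ax = b
```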

### Norms

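L0-norm (counts the non-zero elements: each $x_i$ contributes 0 if it is 0, otherwise 1)

$$
\|x\|_0 = \sum_i \mathbb{1}[x_i \neq 0]
$$

L1-norm (Manhattan distance)

$$
\|x\|_1 = \sum_i |x_i|
$$

L2-norm (Euclidean distance)

$$
\|x\|_2 = \sqrt{\sum_i x_i^2}
$$

Lp-norm

$$
\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}
$$

$L_\infty$-norm (max norm)

$$
\|x\|_\infty = \max_i |x_i|
$$

Frobenius norm

$$
\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}
$$

It measures the size of a matrix.

Other properties:

$$
x^T x = \|x\|_2^2, \qquad x^T y = \|x\|_2 \|y\|_2 \cos\theta
$$

where $\theta$ is the angle between $x$ and $y$.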
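The norms listed in this section map onto `np.linalg.norm` and `np.count_nonzero`. The vector and matrix below are arbitrary values chosen so the results are easy to verify by hand.

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0])             # arbitrary example vector
A = np.array([[1.0, 2.0], [3.0, 4.0]])     # arbitrary example matrix

l0 = np.count_nonzero(x)                   # number of non-zero elements
l1 = np.linalg.norm(x, 1)                  # sum of absolute values
l2 = np.linalg.norm(x)                     # Euclidean length (default ord for vectors)
l3 = np.linalg.norm(x, 3)                  # general Lp-norm with p = 3
linf = np.linalg.norm(x, np.inf)           # largest absolute value
fro = np.linalg.norm(A, 'fro')             # Frobenius norm of a matrix
```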

### Determinant

The determinant is the product of all eigenvalues. If its absolute value is greater than 1, the transformation expands the space. If it is between 0 and 1, it shrinks the space.

### Trace

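Trace is the sum of all diagonal elements:

$$
\operatorname{Tr}(A) = \sum_i A_{ii}
$$

We can rewrite some operations using the trace to get rid of the summation, e.g. the Frobenius norm:

$$
\|A\|_F = \sqrt{\operatorname{Tr}(A A^T)}
$$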
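The determinant and trace statements can be checked together on a small symmetric matrix (an arbitrary example). The sketch also uses the standard fact that the trace equals the sum of the eigenvalues, which the text above does not state.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])     # arbitrary symmetric example
lam = np.linalg.eigvalsh(A)

det_A = np.linalg.det(A)                   # equals the product of the eigenvalues
tr_A = np.trace(A)                         # sum of diagonal elements

# Frobenius norm written with a trace instead of an explicit double sum.
fro_via_trace = np.sqrt(np.trace(A @ A.T))
fro_direct = np.sqrt(np.sum(A**2))
```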

### PCA

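Using a matrix $D$ to decode a code $c$ back into an approximation of the input $x$:

$$
g(c) = Dc
$$

PCA constrains the columns of $D$ to be orthogonal to each other with magnitude 1.

PCA minimizes the reconstruction error:

$$
c^{*} = \arg\min_c \|x - g(c)\|_2^2
$$

To optimize it:

$$
\begin{aligned}
\|x - g(c)\|_2^2 &= (x - Dc)^T (x - Dc) \\
&= x^T x - 2 x^T D c + c^T D^T D c \\
&= x^T x - 2 x^T D c + c^T c \\
\nabla_c \left( -2 x^T D c + c^T c \right) &= -2 D^T x + 2c = 0 \\
c &= D^T x
\end{aligned}
$$

So, the optimal encoding and decoding scheme is

$$
f(x) = D^T x, \qquad r(x) = g(f(x)) = D D^T x
$$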

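Let’s assume $D$ is 1-D (a single column $d$) with the constraint $d^T d = 1$. Since $d^T x^{(i)}$ is a scalar, it equals its own transpose $x^{(i)T} d$. Stacking the inputs $x^{(i)T}$ as the rows of a matrix $X$:

$$
d^{*} = \arg\min_d \sum_i \|x^{(i)} - d d^T x^{(i)}\|_2^2 = \arg\min_d \|X - X d d^T\|_F^2 = \arg\max_d \operatorname{Tr}(d^T X^T X d) \quad \text{subject to } d^T d = 1
$$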

By induction, we can expand to higher dimension.

The optimal $d$ is given by the eigenvector of $X^T X$ corresponding to the largest eigenvalue.
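The 1-D case can be checked numerically: the top eigenvector of $X^T X$ should give a reconstruction error no larger than any other unit direction. The data below is synthetic, and `encode`, `decode` and `reconstruction_error` are hypothetical helper names introduced for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 samples, 3 features with very different spreads.
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])
X = X - X.mean(axis=0)                      # centering, as is conventional for PCA

eigvals, eigvecs = np.linalg.eigh(X.T @ X)  # ascending eigenvalues
d = eigvecs[:, -1]                          # eigenvector of the largest eigenvalue

def encode(x):
    return d @ x                            # c = d^T x

def decode(c):
    return c * d                            # r = d c

def reconstruction_error(direction):
    """Sum of squared errors when projecting every row of X onto `direction`."""
    recon = np.outer(X @ direction, direction)
    return np.sum((X - recon) ** 2)

best_error = reconstruction_error(d)
```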

### Broadcasting

$$
C = A + b, \qquad C_{i,j} = A_{i,j} + b_j
$$

The vector $b$ is added to each row of $A$. The implicit replication of elements for an operation with a higher dimensional tensor is called broadcasting.
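NumPy applies this rule automatically. The sketch below compares the broadcast sum with an explicit replication of the vector via `np.tile`; the values are arbitrary.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
b = np.array([10.0, 20.0, 30.0])

C = A + b                                        # b is implicitly added to each row of A
C_explicit = A + np.tile(b, (A.shape[0], 1))     # same result with explicit replication
```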

### Karush–Kuhn–Tucker

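This section is incomplete.

Optimize

$$
\min_x f(x) \quad \text{subject to} \quad g^{(i)}(x) = 0, \quad h^{(j)}(x) \leq 0
$$

Lagrangian:

$$
L(x, \lambda, \alpha) = f(x) + \sum_i \lambda_i g^{(i)}(x) + \sum_j \alpha_j h^{(j)}(x)
$$

where $\lambda_i$ and $\alpha_j$ are scalars.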

### Reference

- The Matrix Cookbook by Kaare Petersen, Michael Pedersen.
- Linear Algebra by Georgi Shilov.