### Operation

#### Commutative

Matrix multiplication is in general not commutative:

$$AB \neq BA$$

The dot product of two vectors, however, is commutative:

$$x^Ty = y^Tx$$

Proof: $x^Ty$ is a scalar, and a scalar equals its own transpose, so $x^Ty = (x^Ty)^T = y^Tx$.

#### Singular matrix

A singular matrix is a matrix without an inverse, i.e. its determinant is 0. One or more of its rows (columns) is a linear combination of the other rows (columns).
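As a quick numerical sketch with NumPy (the matrix here is an arbitrary example whose rows are linearly dependent):

```python
import numpy as np

# Second row is 2x the first, so the rows are linearly dependent.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

det = np.linalg.det(A)
print(det)  # ~0: A is singular

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError:
    print("A has no inverse")
```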

#### Linear equation

In practice, we rarely solve $Ax = b$ by computing $A^{-1}$. We lose precision when computing $A^{-1}$. Also, in machine learning, $A$ is often sparse while $A^{-1}$ is dense, which requires too much memory.
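A minimal NumPy sketch of the preferred approach (the system here is made up for illustration):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Preferred: solve Ax = b via a factorization, never forming A^{-1}.
x = np.linalg.solve(A, b)

# Works, but less accurate in general, and densifies a sparse A.
x_via_inv = np.linalg.inv(A) @ b

print(x)  # ~[2, 3]
```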

#### Orthogonal matrix

An orthogonal matrix is a square matrix whose rows (columns) are mutually orthonormal, i.e. the dot product of any two distinct row vectors (column vectors) is 0 and every row (column) has unit length.

For an orthogonal matrix, one important property holds:

$$Q^TQ = QQ^T = I$$

Orthogonal matrices are particularly interesting because the inverse is easy to find:

$$Q^{-1} = Q^T$$

An orthogonal matrix $Q$ also does not amplify errors, which is very desirable:

$$\| Qx \|^2_2 = (Qx)^T(Qx) = x^TQ^TQx = x^Tx = \| x \|^2_2$$

So if we multiply the input by orthogonal matrices, the errors present in $x$ will not be amplified by the multiplication. We can decompose matrices into orthogonal matrices (as in SVD) when solving linear algebra problems. Also, a symmetric matrix can be decomposed with orthogonal matrices: $A=Q \Lambda Q^T$.
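A small NumPy demonstration with a 2-D rotation (an arbitrary example of an orthogonal matrix):

```python
import numpy as np

# A rotation matrix is orthogonal: Q^T Q = I and Q^{-1} = Q^T.
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, 4.0])

assert np.allclose(Q.T @ Q, np.eye(2))     # Q^T Q = I
assert np.allclose(np.linalg.inv(Q), Q.T)  # inverse is just the transpose

# Both norms are 5.0: multiplying by Q does not amplify errors in x.
print(np.linalg.norm(Q @ x), np.linalg.norm(x))
```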

A quadratic form equation contains the terms $x^2$, $y^2$ and $xy$:

$$f(x, y) = ax^2 + bxy + cy^2$$

In matrix form:

$$f(x, y) = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

With 3 variables, the same pattern applies with a symmetric $3 \times 3$ matrix:

$$f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$$

### Eigenvector & eigenvalue

An eigenvector $v$ of a matrix $A$ satisfies

$$Av = \lambda v$$

where $\lambda$ is a scalar (the eigenvalue) and $v$ is a non-zero vector.

Find the eigenvalues and eigenvectors for

• A matrix is singular iff any eigenvalue is 0.
• To optimize a quadratic form equation $f(x) = x^TAx$ given $\| x\| = 1$:
• If $x$ is an eigenvector of $A$, $f(x)$ equals the corresponding eigenvalue
• The max (min) of $f(x)$ is the max (min) of the eigenvalues
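These bullet points can be checked numerically; a sketch with an arbitrary symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric; eigenvalues are 1 and 3

eigvals, eigvecs = np.linalg.eigh(A)  # ascending order

# Evaluate f(x) = x^T A x over many random unit vectors:
xs = rng.normal(size=(1000, 2))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
f = np.einsum('ij,jk,ik->i', xs, A, xs)

# f stays within [min eigenvalue, max eigenvalue]
print(eigvals, f.min(), f.max())
```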

#### Finding the eigenvalues

Solve the characteristic equation $\det(A - \lambda I) = 0$, which here gives $λ^3 - 12 λ - 16 = 0$. To find a root, consider the possible integer factors of 16: ±1, ±2, ±4, ±8, ±16.

When $λ=4$: $λ^3 - 12 λ - 16 = 64 - 48 - 16 = 0$, so $λ=4$ is a root.

So $λ^3 - 12 λ - 16 = (λ − 4)(λ^2 + 4λ + 4) = (λ − 4)(λ + 2)^2 = 0$

The other eigenvalues are $-2, -2$.
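The same eigenvalues can be recovered numerically from the characteristic polynomial's coefficients (a NumPy sketch):

```python
import numpy as np

# Coefficients of λ^3 + 0λ^2 - 12λ - 16
roots = np.roots([1, 0, -12, -16])
print(np.sort(roots.real))  # approximately [-2, -2, 4]
```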

#### Finding the eigenvectors

Perform row reduction to solve the linear equation $(A - \lambda I) v = 0$.

Perform $R_1 = - \frac{1}{3} R_1$

Perform row subtraction/multiplication:

After many more reductions:

So for $\lambda=4$, the eigenvector is:

#### Eigendecomposition

Form a matrix $V$ with one eigenvector per column, $V= [v^{(1)}, \dots ,v^{(n)}]$, and a vector $\lambda= [\lambda_1, \dots ,\lambda_n]^T$. The eigendecomposition of $A$ is

$$A = V \operatorname{diag}(\lambda) V^{-1}$$

If $A$ is real and symmetric,

$$A = Q \Lambda Q^T$$

where $Q$ is an orthogonal matrix composed of eigenvectors of $A$. The eigenvalue $\Lambda_{ii}$ is associated with the eigenvector in column $Q_{:,i}$. This is important because we often deal with symmetric matrices.

Eigendecomposition requires $A$ to be a square matrix, and not every square matrix has an eigendecomposition.
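A NumPy sketch of the symmetric case (the matrix is chosen arbitrarily):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])  # real and symmetric

eigvals, Q = np.linalg.eigh(A)  # Q has orthonormal eigenvector columns
Lam = np.diag(eigvals)

A_rebuilt = Q @ Lam @ Q.T       # A = Q Λ Q^T
print(np.allclose(A, A_rebuilt))
```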

### Poor Conditioning

Poorly conditioned matrices amplify pre-existing errors. Consider the ratio

$$\max_{i,j} \left\vert \frac{\lambda_i}{\lambda_j} \right\vert$$

where $\lambda_i$ and $\lambda_j$ are the largest and smallest eigenvalues of $A$. If this ratio (the condition number) is high, inverting $A$ will amplify the errors in $x$.
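A NumPy sketch of this effect (both matrices are made up; the second is nearly singular):

```python
import numpy as np

well = np.array([[2.0, 0.0],
                 [0.0, 1.0]])   # eigenvalue ratio 2
ill = np.array([[1.0, 0.0],
                [0.0, 1e-8]])   # eigenvalue ratio 1e8

print(np.linalg.cond(well), np.linalg.cond(ill))

# A tiny perturbation of b moves the solution of ill @ x = b a lot.
b = np.array([1.0, 1.0])
db = np.array([0.0, 1e-6])
x1 = np.linalg.solve(ill, b)
x2 = np.linalg.solve(ill, b + db)
print(np.linalg.norm(x2 - x1))  # huge compared with the 1e-6 input change
```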

### Positive definite/negative definite

If all eigenvalues of $A$ are:

• positive: the matrix is positive definite
• positive or zero: positive semi-definite
• negative: the matrix is negative definite

Properties of positive definite:

• $x^TAx \geq 0$ for positive semi-definite $A$, and
• $x^TAx = 0 \implies x = 0$ for positive definite $A$

Positive definiteness or negative definiteness helps us solve optimization problems. The quadratic form $x^TAx$ with a positive definite $A$ is always positive for non-zero $x$ and is convex. This guarantees the existence of a global minimum, which allows us to use the Hessian matrix to optimize multivariate functions. Similar arguments hold for negative definite matrices.
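Two equivalent NumPy checks of positive definiteness (the matrix below is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])  # symmetric

eigvals = np.linalg.eigvalsh(A)
is_pd = bool(np.all(eigvals > 0))
print(eigvals, is_pd)  # eigenvalues 1 and 3, so positive definite

# Equivalent check: Cholesky succeeds only for positive definite matrices.
L = np.linalg.cholesky(A)
assert np.allclose(L @ L.T, A)
```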

### Taylor series in 2nd order

$$f(x - \epsilon g) \approx f(x) - \epsilon g^Tg + \frac{1}{2} \epsilon^2 g^THg$$

where $g$ is the gradient and $H$ is the Hessian matrix.

If $g^THg$ is negative or 0, $f(x)$ decreases as $\epsilon$ increases. However, we cannot increase $\epsilon$ too much, because the accuracy of the Taylor approximation drops as $\epsilon$ grows. If $g^THg$ is positive, the optimal step size will be

$$\epsilon^* = \frac{g^Tg}{g^THg}$$

### Newton method

The critical point of the 2nd-order Taylor approximation is

$$x^* = x^{(0)} - H^{-1} \nabla_x f(x^{(0)})$$
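As a NumPy sketch: for a quadratic objective the 2nd-order Taylor expansion is exact, so a single Newton step lands on the minimizer (the matrix, vector, and starting point below are arbitrary):

```python
import numpy as np

# For f(x) = 1/2 x^T A x - b^T x, the gradient is Ax - b and the Hessian is A.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])  # positive definite Hessian
b = np.array([1.0, 1.0])

x0 = np.array([10.0, -7.0])         # arbitrary starting point
grad = A @ x0 - b
x1 = x0 - np.linalg.solve(A, grad)  # Newton step: x0 - H^{-1} grad

print(x1, np.linalg.solve(A, b))    # both equal the minimizer A^{-1} b
```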

### Singular value decomposition (SVD)

SVD factorizes a matrix into singular vectors and singular values:

$$A = UDV^T$$

Every real matrix has an SVD, which is not true for eigendecomposition (e.g. eigendecomposition requires a square matrix).

• $A$ is an m×n matrix
• Left-singular vectors: $U$ is an m×m orthogonal matrix (the eigenvectors of $A A^T$)
• Singular values: $D$ is an m×n diagonal matrix (square roots of the eigenvalues of $A A^T$ and $A^T A$)
• Right-singular vectors: $V$ is an n×n orthogonal matrix (the eigenvectors of $A^T A$)

SVD is a powerful but expensive matrix factorization method. In numerical linear algebra, many problems can be solved by representing $A$ in this form.
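A NumPy sketch on a non-square example matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # 2x3, not square

U, s, Vt = np.linalg.svd(A)      # s holds the singular values
D = np.zeros_like(A)             # rebuild the 2x3 diagonal matrix D
D[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ D @ Vt))  # A = U D V^T
```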

### Solving linear equation with SVD

$$x = A^{+}b$$

where $A^{+}$ is the pseudoinverse

$$A^{+} = VD^{+}U^T$$

$V$ and $U$ come from the SVD, and $D^{+}$ is obtained by taking the reciprocal of the non-zero elements of $D$ and then taking the transpose.
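A NumPy sketch that builds $A^{+}$ by hand from the SVD and compares it with `np.linalg.pinv` (the overdetermined system below is made up; all singular values here are non-zero, so every element of the diagonal is reciprocated):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])   # 3x2: more equations than unknowns
b = np.array([1.0, 4.0, 5.0])

U, s, Vt = np.linalg.svd(A)
D_plus = np.zeros((A.shape[1], A.shape[0]))   # transposed shape
D_plus[:len(s), :len(s)] = np.diag(1.0 / s)   # reciprocal of non-zero values
A_plus = Vt.T @ D_plus @ U.T                  # A+ = V D+ U^T

x = A_plus @ b                                # least-squares solution
print(x, np.allclose(A_plus, np.linalg.pinv(A)))
```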

### Norms

L0-norm: the number of non-zero elements in $x$ (each element contributes 0 if it is 0, otherwise 1). It is not a true norm.

L1-norm (Manhattan distance):

$$\| x \|_1 = \sum_i \vert x_i \vert$$

L2-norm (Euclidean distance):

$$\| x \|_2 = \sqrt{\sum_i x_i^2}$$

Lp-norm:

$$\| x \|_p = \left( \sum_i \vert x_i \vert^p \right)^{1/p}$$

$\text{L}_\infty$-norm:

$$\| x \|_\infty = \max_i \vert x_i \vert$$

Frobenius norm:

$$\| A \|_F = \sqrt{\sum_{i,j} A_{ij}^2}$$

It measures the size of a matrix.
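These norms can all be computed with NumPy (the vector and matrix are arbitrary examples):

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

l0 = np.count_nonzero(x)          # 2 non-zero elements
l1 = np.linalg.norm(x, 1)         # |3| + |-4| + |0| = 7
l2 = np.linalg.norm(x)            # sqrt(9 + 16) = 5
linf = np.linalg.norm(x, np.inf)  # max |x_i| = 4

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
fro = np.linalg.norm(A, 'fro')    # sqrt(1 + 4 + 9 + 16)

print(l0, l1, l2, linf, fro)
```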

Other properties:

### Determinant

The determinant is the product of all eigenvalues. If its absolute value is greater than 1, the matrix expands the space. If it is between 0 and 1, it shrinks the space.

### Trace

Trace is the sum of all diagonal elements:

$$\operatorname{Tr}(A) = \sum_i A_{ii}$$

We can rewrite some operations using the trace to get rid of the summation, e.g. the Frobenius norm:

$$\| A \|_F = \sqrt{\operatorname{Tr}(AA^T)}$$
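As a NumPy sketch, computing the Frobenius norm via the trace (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

tr = np.trace(A)  # 1 + 4 = 5
print(tr)

# Frobenius norm without an explicit double summation:
fro_via_trace = np.sqrt(np.trace(A @ A.T))
print(np.isclose(fro_via_trace, np.linalg.norm(A, 'fro')))
```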

### PCA

Using a matrix for transformation: $g(c) = Dc$

PCA constrains the columns of $D$ to be orthogonal to each other with magnitude 1.

PCA minimizes the reconstruction error:

$$c^* = \arg\min_c \| x - g(c) \|_2$$

To optimize it, set the gradient with respect to $c$ to zero (using $D^TD = I$):

$$\nabla_c \| x - Dc \|_2^2 = -2D^Tx + 2c = 0$$

So the optimal encode and decode scheme is

$$c = D^Tx, \qquad r(x) = g(c) = DD^Tx$$

To find the optimal $D$, we use the constraint $D^TD=I$; since $x^{(i)T}DD^Tx^{(i)}$ is a scalar, we can freely take its transpose when manipulating $XDD^T$, and we first assume $c$ is 1-D.

By induction, we can expand $c$ to higher dimension.

Optimal $D$ is given by the eigenvector of $X^TX$ corresponding to the largest eigenvalue.
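A small NumPy sketch of this result on synthetic data (the data matrix and its dimensions are made up; rows of `X` are samples, and the 1-D code direction is the top eigenvector of $X^TX$):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic centered data, stretched along the first axis.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0],
                                          [0.0, 0.5]])
X -= X.mean(axis=0)

# Principal direction = eigenvector of X^T X with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
d = eigvecs[:, -1]      # eigh sorts eigenvalues in ascending order

c = X @ d               # encode: c = D^T x (1-D code)
X_rec = np.outer(c, d)  # decode: x ~ D c

print(d, np.linalg.norm(X - X_rec))
```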

The vector $b$ is added to each row of $C$. The implicit replication of elements for an operation with a higher dimensional tensor is called broadcasting.
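In NumPy this is exactly what happens when a vector is added to a matrix (the values are arbitrary):

```python
import numpy as np

C = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
b = np.array([10.0, 20.0])

# b is implicitly replicated across the rows of C: broadcasting.
result = C + b
print(result)
```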

### Karush–Kuhn–Tucker

This section is incomplete.

Optimize

$$\min_x f(x) \quad \text{subject to} \quad g_i(x) = 0, \; h_j(x) \leq 0$$

Lagrangian:

$$L(x, u, v) = f(x) + \sum_i u_i g_i(x) + \sum_j v_j h_j(x)$$

where $u_i, v_j$ are scalars.

### Reference

• *The Matrix Cookbook* by Kaare Petersen and Michael Pedersen.
• *Linear Algebra* by Georgi Shilov.