Data Mining - Gram-Schmidt Orthogonalization

Least Squares and Projection

"最小二乘法的本质是希尔伯特空间的投影。"

Suppose we want to project a higher-dimensional vector onto a lower-dimensional space (say, a plane).

The projection is \(\vec{p}\), the original vector is \(\vec{b}\), and the error vector perpendicular (orthogonal) to the space is \[\vec{e} = \vec{b} - \vec{p}\]

Since the vector \(\vec{p}\) lies within the plane (say, an n-dimensional subspace), \[\vec{p} = Ax \] where \(A\) is the matrix whose columns \(a_i\) form a basis of that subspace.

Since \(e\perp a_i\) for every column, \[a_i^T(b-Ax) = 0\] \[A^T(b-Ax) = 0\] \[A^TAx = A^Tb\]

If \(A^TA\) is invertible \[x = (A^TA)^{-1}A^Tb\] \[p = Ax = A(A^TA)^{-1}A^Tb\]

The best scenario is when \(A\) has orthonormal columns, i.e. \(A\) is an orthogonal matrix, since then \(A^TA = I\).
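A minimal numpy sketch of the projection above, with made-up numbers (in practice np.linalg.lstsq is preferred over forming the normal equations explicitly):

import numpy as np

# made-up example: project b onto the column space of A
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                 # linearly independent columns
b = np.array([6.0, 0.0, 0.0])

x = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations A^T A x = A^T b
p = A @ x                                  # projection p = Ax
e = b - p                                  # error vector
print(np.allclose(A.T @ e, 0.0))           # True: e is orthogonal to every column of A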


Question: when is \(A^TA\) invertible?

Answer: when the column vectors of \(A \in \mathbb{R}^{m\times n}\) are linearly independent.

Proof (by contradiction):

Suppose the column vectors are linearly independent but \(A^TA\) is not invertible.

Then \(A^TAx=0\) has a non-zero solution \(x\) (because \(\det(A^TA) = 0\))

\[x^TA^TAx = 0\]

\[(Ax)^T(Ax) = \|Ax\|_2^2 = 0\]

\[\therefore Ax = 0\]

\(\because\) \(A\)'s columns are linearly independent, \[\therefore x=0\] which contradicts \(x\) being non-zero. So when the column vectors are linearly independent, \(A^TA\) is invertible.
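A quick numerical check of this fact (a sketch with made-up matrices):

import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 3.0]])                 # columns are linearly independent
print(np.linalg.det(A.T @ A))              # ≈ 3, nonzero: A^T A is invertible

A_dep = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [3.0, 6.0]])             # second column = 2 * first column
print(np.linalg.det(A_dep.T @ A_dep))      # ≈ 0: A^T A is singular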


Gram-Schmidt Orthogonalization

Zhihu answer: Gram-Schmidt Orthogonalization -- the origin of the QR decomposition

An orthogonal matrix \(Q\) satisfies:

\[ Q^TQ = I\] \[ Q^T = Q^{-1}\]

The projection matrix \(P\) satisfies: \[ P = A(A^TA)^{-1}A^T \]

When \(A\) is an orthogonal matrix \(Q\), \[P = Q(Q^TQ)^{-1}Q^T = QQ^T\]

Suppose we have a matrix \[q = [a, b, c]\] which after orthogonalization becomes \[Q = [A, B, C]\] where \(A, B, C\) are mutually orthogonal. Let \(A=a\) (the first column vector is kept unchanged; the other vectors are adjusted to be orthogonal to it).

Consider the projection \(p\) of \(b\) onto \(a\) (i.e. \(A\)); by the vector projection formula \[p = \frac{A^Tb}{A^TA}A\]

\(B\) is then the component of \(b\) perpendicular to \(A\): \[B = b - p = b - \frac{A^Tb}{A^TA}A\]

With the first two directions in hand, \(C\) is \(c\) minus its projection onto the \(A,B\) plane, which is the same as subtracting its projections onto the two directions separately (the projection onto the plane is the sum of the projections onto the two directions): \[C = c - \frac{A^Tc}{A^TA}A - \frac{B^Tc}{B^TB}B\]

Normalizing \(u_i = A, B, C\) by their norms gives \(e_i\), and these columns form the orthogonal matrix \(Q\).
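A minimal numpy sketch of the three steps above, using made-up vectors a, b, c:

import numpy as np

a = np.array([1.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 1.0])

A = a                                           # first direction kept as-is
B = b - (A @ b) / (A @ A) * A                   # subtract projection of b onto A
C = c - (A @ c) / (A @ A) * A - (B @ c) / (B @ B) * B

Q = np.column_stack([v / np.linalg.norm(v) for v in (A, B, C)])
print(np.allclose(Q.T @ Q, np.eye(3)))          # True: columns are orthonormal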

QR Decomposition

Zhihu answer: [Numerical Computation] QR Decomposition

For \(Ax = b\) with \(A\in \mathbb{R}^{m\times n}\), the possible cases are: 1. \(m=n\) 2. \(m>n\), over-determined 3. \(m<n\), under-determined

Definition

Any \(A\in \mathbb{R}^{m\times n}\) with \(m\geq n\) can be decomposed as \(A=QR\), where:

  • \(Q\in \mathbb{R}^{m\times m}\) is an orthogonal matrix
  • \(R = \begin{bmatrix}\hat{R}\\0\end{bmatrix}\in \mathbb{R}^{m\times n}\)
  • \(\hat{R}\in \mathbb{R}^{n\times n}\) is upper triangular

Properties of orthogonal matrices

  • \(Q^TQ = QQ^T = I\)
  • Left-multiplying by an orthogonal matrix does not change the Euclidean norm

Methods for computing the QR decomposition

  • Gram-Schmidt orthogonalization
  • Householder Triangularization
  • Givens Rotations

Gram-Schmidt Orthogonalization

Reduced QR decomposition

GSO builds a set of mutually orthogonal basis vectors starting from the first column \(A_{:,0}\) of the matrix \(A\in \mathbb{R}^{m\times n}\). For the original matrix \(A\): \[A_{:,0} = r_{0,0}q_0\] \[A_{:,1} = r_{0,1}q_0 + r_{1,1}q_1\] \[...\] \[A_{:,n-1} = r_{0,n-1}q_0 + r_{1,n-1}q_1 + \dots + r_{n-1,n-1}q_{n-1}\]

\[ A = \hat{Q} \hat{R} \] \[ [A_{:,0}, \dots, A_{:,n-1}] = [q_0, \dots, q_{n-1}] \begin{bmatrix} r_{0,0} & r_{0,1} & \cdots & r_{0,n-1}\\ & r_{1,1} & \cdots & r_{1,n-1}\\ & & \ddots & \vdots\\ & & & r_{n-1,n-1}\end{bmatrix} \]

\(\hat{Q} \in \mathbb{R}^{m\times n}, \hat{R} \in \mathbb{R}^{n\times n}\) is called the reduced QR decomposition; \(\hat{Q}\) is not square. \(\hat{Q}^T\hat{Q} = I\), but because \(m>n\), \(\hat{Q}\hat{Q}^T \neq I\) in general (only a full-rank square matrix has a two-sided inverse).

Full QR decomposition

\(Q \in \mathbb{R}^{m\times m}, R \in \mathbb{R}^{m\times n}\)

When \(m>n\), the \(n\) basis vectors of \(\hat{Q}\) must be extended to \(m\) basis vectors. The extra \(m-n\) basis vectors correspond to the zero block of \(R = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix}\).

Python numpy code

import numpy as np
A = np.random.rand(5, 3)                   # example tall matrix, m > n
q, r = np.linalg.qr(A)                     # Reduced-QR: q is (5, 3), r is (3, 3)
q, r = np.linalg.qr(A, mode="complete")    # Full-QR: q is (5, 5), r is (5, 3)

Classic GSO (CGSO)

\[ [A_{:,0}, \dots, A_{:,n-1}] = [q_0, \dots, q_{n-1}] \begin{bmatrix} r_{0,0} & r_{0,1} & \cdots & r_{0,n-1}\\ & r_{1,1} & \cdots & r_{1,n-1}\\ & & \ddots & \vdots\\ & & & r_{n-1,n-1}\end{bmatrix} \]

At each iteration, \(q_0, ..., q_{j-1}\) are already known and \(r_{j,j}\) is unknown; the entries \(r_{0,j}, ..., r_{j-1,j}\) satisfy \[r_{i,j} = q_i^TA_{:,j}\]

\[r_{j,j}q_j = v_j = A_{:,j} - \sum_{i=0}^{j-1}{r_{i,j}q_i} = A_{:,j} - \sum_{i=0}^{j-1}{(q_i^TA_{:,j})q_i}\]

Normalizing \(v_j\) by its norm, \[q_j = \frac{v_j}{||v_j||_2}\] \[r_{j,j} = ||v_j||_2\]
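A minimal Python sketch of this recurrence as a reduced-QR routine (the function name cgs_qr and the random test matrix are my own choices):

import numpy as np

def cgs_qr(A):
    # Classical Gram-Schmidt reduced QR, following the recurrence above
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]      # r_{i,j} = q_i^T A_{:,j}
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)          # r_{j,j} = ||v_j||_2
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.random.rand(5, 3)
Q, R = cgs_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))   # True True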

Modified GSO (MGSO)

...

Givens Rotations

Matrix form

G(i,j,θ) Equations
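The matrix itself appears to have been an image in the original; one standard form, consistent with the \(2\times 2\) block used below, is the identity matrix with four entries replaced:

\[
G(i,j,\theta) = \begin{bmatrix}
1 & & & & \\
& c & \cdots & s & \\
& \vdots & \ddots & \vdots & \\
& -s & \cdots & c & \\
& & & & 1
\end{bmatrix},
\qquad c = \cos\theta,\ s = \sin\theta
\]

i.e. \(G_{ii}=G_{jj}=c\), \(G_{ij}=s\), \(G_{ji}=-s\), and all other entries equal to those of the identity.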

Action of the matrix

For \(A\in \mathbb{R}^{m\times n}\), \(m \geq n\), and entries \(A_{i,j}\), \(A_{i,k}\) with \(j < k\): \[\begin{bmatrix}c & s\\-s & c \end{bmatrix} \begin{bmatrix}A_{i,j}\\A_{i,k}\end{bmatrix} = \begin{bmatrix} \alpha \\ 0 \end{bmatrix}\] \[\alpha = \sqrt{A_{i,j}^2 + A_{i,k}^2}\] \[c=\frac{A_{i,j}}{\alpha}\] \[s=\frac{A_{i,k}}{\alpha}\]
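A small numpy sketch of this computation (the helper name givens is my own):

import numpy as np

def givens(a, b):
    # returns (c, s, alpha) such that [[c, s], [-s, c]] @ [a, b] = [alpha, 0]
    alpha = np.hypot(a, b)
    if alpha == 0.0:
        return 1.0, 0.0, 0.0
    return a / alpha, b / alpha, alpha

c, s, alpha = givens(3.0, 4.0)
G = np.array([[c, s], [-s, c]])
print(G @ np.array([3.0, 4.0]))   # ≈ [5., 0.]: the second entry is zeroed out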