1 Markov decision process (MDP)

  • MDP is a tuple of \((S, A, \{P_{sa} \}, \gamma, R)\)

    • \(S\): state space
    • \(A\): action space
    • \(P_{sa}\): state transition probabilities. For each state \(s\) and action \(a\), \(P_{sa}\) is the distribution of the next state \(s'\) when action \(a\) is taken in state \(s\)
    • \(\gamma\): discount factor, which discounts future rewards
    • \(R\): reward function, \(S\times A \to \mathbb{R}\). Sometimes it is a function of the state only, i.e. \(S \to \mathbb{R}\)
  • Goal of RL: maximize the expected sum of discounted rewards \[E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots]\]

  • Policy function: \(\pi : S \to A\), action given certain state. \[a = \pi(s)\]

  • Value function for a policy \(\pi\): \[ V^\pi(s) = E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots \mid s_0=s, \pi] \]

  • Bellman Equation: Discrete value case

    • immediate reward plus the discounted expected value of the next state \[\begin{aligned} V^\pi(s) &= E[R(s_0) + \gamma V^\pi(s_1) \mid s_0=s, \pi] \\ &= R(s) + \gamma E[V^\pi(s_1) \mid s_0=s, \pi] \\ &= R(s) + \gamma E_{s'\sim P_{s\pi(s)}}[V^\pi(s')] \\ &= R(s) + \gamma\sum_{s'} P_{s\pi(s)}(s')V^\pi(s') \end{aligned}\]
    • Finite-state MDP (\(|S| < \infty\)): the Bellman equations give \(|S|\) linear equations in the \(|S|\) unknowns \(V^\pi(s)\), so \(V^\pi\) can be solved for exactly (see the sketch after this list).
  • Optimal value function \[V^*(s) = \max_\pi V^\pi(s) \]

    \[V^*(s) = R(s) + \max_{a\in A} \gamma\sum_{s'} P_{sa}(s')V^*(s')\]

    • equivalently, \(V^*(s)\) is the maximum expected discounted future reward achievable starting from state \(s\)

    • define the optimal policy \(\pi^*\) \[\pi^*(s) = \arg\max_{a\in A}\sum_{s'} P_{sa}(s')V^*(s')\]

    • \(V^*(s) = V^{\pi^*}(s)\), the optimal value function is the value function under the optimal policy. Note that the optimal policy does not depend on the initial state.
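To make the "linear solution" point above concrete, here is a minimal sketch (my own illustration, not from the original notes) that solves the Bellman equation for a fixed policy on a small, made-up 3-state MDP; the transition matrix `P_pi`, rewards `R`, and discount `gamma` are all hypothetical.

```python
import numpy as np

# Hypothetical 3-state MDP with a fixed policy pi.
# P_pi[s, s'] = P_{s, pi(s)}(s'): transition probabilities under pi.
P_pi = np.array([[0.8, 0.2, 0.0],
                 [0.1, 0.6, 0.3],
                 [0.0, 0.3, 0.7]])
R = np.array([1.0, 0.0, 2.0])   # reward R(s), a function of state only
gamma = 0.9                     # discount factor

# Bellman equation in matrix form: V = R + gamma * P_pi @ V
# => (I - gamma * P_pi) V = R, a linear system in |S| unknowns.
V_pi = np.linalg.solve(np.eye(3) - gamma * P_pi, R)
print(V_pi)
```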

2 Value Iteration and Policy Iteration

Two methods for finding the optimal policy and the corresponding optimal value function.

Value Iteration
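As a hedged sketch of the standard synchronous value iteration update (my own illustration; the 3-state, 2-action MDP below is made up and not from the original post): repeatedly apply the Bellman optimality backup until \(V\) stops changing, then read off the greedy policy.

```python
import numpy as np

# Hypothetical finite MDP: 3 states, 2 actions.
# P[a][s, s'] = P_{sa}(s'), R[s] = reward for state s.
P = np.array([[[0.8, 0.2, 0.0],
               [0.1, 0.6, 0.3],
               [0.0, 0.3, 0.7]],
              [[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.2, 0.0, 0.8]]])
R = np.array([1.0, 0.0, 2.0])
gamma, tol = 0.9, 1e-8

V = np.zeros(3)
while True:
    # Bellman optimality backup: V(s) <- R(s) + gamma * max_a sum_s' P_sa(s') V(s')
    V_new = R + gamma * (P @ V).max(axis=0)
    if np.max(np.abs(V_new - V)) < tol:
        break
    V = V_new

pi_star = (P @ V).argmax(axis=0)   # greedy policy with respect to V*
print(V, pi_star)
```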


Reference

https://towardsdatascience.com/handwritten-digit-mnist-pytorch-977b5338e627

Import

import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim
from tqdm.notebook import tqdm
%cd /content/drive/My Drive/20FA/DataMining/DigitRecog

Vanishing/Exploding gradients

The loss is a function of the last layer's activation:

\[\ell = \ell(\alpha_L)\]

Backpropagation computes the gradient of the loss with respect to each layer's parameters:

\[\nabla_{w_j, b_j} \ell\]

Parameter update (gradient descent with learning rate \(\eta\)):

\[\theta_{i+1} := \theta_i - \eta \nabla_{w_j, b_j} \ell\]
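To make the vanishing/exploding behaviour concrete, here is a small sketch of my own (the network is a hypothetical deep linear chain, not the model from the post): the gradient reaching early layers is a product of per-layer Jacobian factors, so its norm shrinks or grows geometrically with depth depending on the weight scale.

```python
import numpy as np

def grad_norm_through_layers(weight_scale, depth=50, width=10, seed=0):
    """Backpropagate a unit gradient through `depth` linear layers and
    return its norm -- a proxy for the gradient reaching the early layers."""
    rng = np.random.default_rng(seed)
    g = np.ones(width)
    for _ in range(depth):
        W = weight_scale * rng.standard_normal((width, width)) / np.sqrt(width)
        g = W.T @ g          # chain rule: multiply by each layer's Jacobian
    return np.linalg.norm(g)

print(grad_norm_through_layers(0.5))   # tends toward 0: vanishing gradient
print(grad_norm_through_layers(1.5))   # grows rapidly: exploding gradient
```

With a weight scale below 1 the printed norm is essentially zero (vanishing); above 1 it blows up (exploding).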


Import packages, MyPCA, MyKMeans, and the Iris dataset.

import os
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from tqdm.notebook import tqdm
from sklearn import datasets
import scipy.stats as spst
from sklearn.datasets import make_spd_matrix
from scipy.special import softmax
os.getcwd()
'/content/drive/My Drive/CS229/Notes7b-GMM'
%cd /content/drive/My Drive/CS229/Notes7b-GMM

Implement k-means

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from tqdm.notebook import tqdm
from sklearn import datasets
iris = datasets.load_iris()
iris.target_names
X = iris.data
y = iris.target
# K clusters
k = 3
# PCA
def myPCA(X: np.ndarray, n_dimensions: int):
    # N, d = X.shape
    # Centering
    X_centered = X - X.mean(axis=0)

    # Covariance matrix of d*d
    Sigma = np.dot(X_centered.T, X_centered)

    # SVD
    U, Lambda, V = np.linalg.svd(Sigma)

    X_centered_PC = np.dot(U[:, :n_dimensions].T, X_centered.T).T
    X_PC = X_centered_PC + X.mean(axis=0)[:n_dimensions]

    # Purposely rescale and add negative sign to mimic Sklearn's PCA
    return -(X_PC - X_PC.mean(axis=0)) / X_PC.std(axis=0)
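A quick sanity check for myPCA, as a sketch of my own (it assumes sklearn.decomposition.PCA is available, which the original post does not import): project the Iris data with both implementations and compare them componentwise; up to sign and scaling the two should agree.

```python
from sklearn.decomposition import PCA

X_pc_mine = myPCA(X, n_dimensions=2)
X_pc_sklearn = PCA(n_components=2).fit_transform(X)

# Correlation magnitude close to 1 means the projections agree up to sign/scale.
for j in range(2):
    print(abs(np.corrcoef(X_pc_mine[:, j], X_pc_sklearn[:, j])[0, 1]))
```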
def myKMeans(X: np.ndarray, k: int, iterations=100, tol=0.001):
    N, d = X.shape
    # Initialize centroids by sampling k data points
    mu = X[np.random.choice(range(N), size=k)]
    c = np.zeros(N)

    for step in tqdm(range(iterations)):
        prev_mu = mu.copy()

        # Assignment step: each point goes to its closest centroid
        for i in range(N):
            c[i] = np.argmin(np.linalg.norm(X[i] - mu, axis=1)**2)
        # Update step: move each centroid to the mean of its assigned points
        for j in range(k):
            mu[j] = np.mean(X[np.arange(N)[c == j]], axis=0)

        # Stop once the centroids have (almost) stopped moving
        if np.sum(np.linalg.norm(prev_mu - mu, axis=1)) < tol:
            break

    distortion = np.sum([np.sum(np.linalg.norm(X[np.arange(N)[c == i]] - mu[i], axis=1)**2) for i in range(k)])
    print(f"distortion: {distortion}")

    return c, mu, distortion
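A minimal usage sketch for the function above (using the Iris variables X and k defined earlier; the plotting choices are my own, not from the original post):

```python
c, mu, distortion = myKMeans(X, k)

# Visualize the cluster assignments in the first two principal components.
X_pc = myPCA(X, n_dimensions=2)
plt.scatter(X_pc[:, 0], X_pc[:, 1], c=c)
plt.title(f"k-means on Iris, distortion = {distortion:.1f}")
plt.show()
```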

Lagrange Multiplier

Show that the critical points of the Lagrangian are also critical points of the original constrained problem.

Consider maximizing a function subject to linear equality constraints: maximize \(f(x)\) s.t. \(Ax=b\).

\(A: \mathbb{R}^{n\times d}, x: \mathbb{R}^d, b: \mathbb{R}^n, f: \mathbb{R}^d \to \mathbb{R}\)
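As a short sketch of the standard construction behind that claim (my own summary, not a full derivation): introduce a multiplier \(\lambda \in \mathbb{R}^n\) and define the Lagrangian

\[\mathcal{L}(x, \lambda) = f(x) + \lambda^T (Ax - b)\]

Setting both gradients to zero,

\[\nabla_x \mathcal{L} = \nabla f(x) + A^T\lambda = 0, \qquad \nabla_\lambda \mathcal{L} = Ax - b = 0,\]

so a stationary point of \(\mathcal{L}\) is feasible and has \(\nabla f(x) = -A^T\lambda\), i.e. \(\nabla f(x)\) lies in the row space of \(A\) and has no component along the feasible directions (the null space of \(A\)), which is exactly the condition for a constrained critical point of \(f\).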

Finding the space of solutions


Remark

(11/3) What is the purpose of centering in PCA?

  • Note that we compute the covariance matrix as \(\Sigma = X^TX\) for the data matrix \(X\in \mathbb{R}^{N\times d}\)
  • The actual covariance of a random vector is \(\mathrm{Cov}(X) = E[XX^T] - E[X] (E[X])^T\)
  • So each component must be centered first, so that the \(E[X](E[X])^T\) term vanishes and \(X^TX\) really is (proportional to) the covariance

(11/3) Why do the eigenvalues in PCA correspond to the variance explained?

  • Since \(\Sigma\) is a symmetric positive semi-definite matrix, it decomposes as \(\Sigma = U\Lambda U^T\), i.e. U and V in the SVD coincide. The columns of U form an orthonormal basis (norm = 1), and \(\Sigma\) is the sum of rank-one components \(\lambda_i u_i u_i^T\) weighted by the eigenvalues, so the eigenvalues reflect the lengths of the orthogonal axes of the high-dimensional ellipsoid. And because each term \(\lambda uu^T\) is quadratic in the data, \(\lambda\) measures the variance contributed along that direction.
  • Recall the classic SVD illustration: an orthogonal matrix is essentially a change of basis (e.g. a rotation), while the diagonal eigenvalue matrix \(\Lambda\) stretches each component by a different amount.
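A small numerical check of the second point, as a sketch of my own (the data below are synthetic, not from the notes): the eigenvalues of the sample covariance matrix equal the variances of the data projected onto the corresponding eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((500, 3)) @ np.diag([3.0, 1.0, 0.3])   # synthetic data
Zc = Z - Z.mean(axis=0)                                         # centering

Sigma = Zc.T @ Zc / len(Zc)          # sample covariance matrix (d x d)
lam, U = np.linalg.eigh(Sigma)       # Sigma = U diag(lam) U^T

proj = Zc @ U                        # coordinates in the eigenvector basis
print(lam)                           # eigenvalues ...
print(proj.var(axis=0))              # ... match the variance along each direction
```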

The goal of Independent Components Analysis (ICA) is to find a new basis in which to represent the data.

Cocktail party problem:

  • sources \(s\in \mathbb{R}^d\)
  • microphone recordings (receivers): \(x\in \mathbb{R}^d\)
  • mixing matrix \(A\) (square): \(x = As\)
  • unmixing matrix \(W = A^{-1}\): \(s^{(i)} = Wx^{(i)}\), where the superscript \((i)\) indexes time
  • j-th source at time \(i\): \(s_j^{(i)} = w_j^Tx^{(i)}\), with \(W = [w_1, \dots, w_d]^T\), i.e. each row of \(W\) is a \(w_j^T\)
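A tiny numerical sketch of this setup (my own toy example; the sources, mixing matrix, and sample count are made up): mix two hypothetical sources with a square matrix \(A\) and recover them with \(W = A^{-1}\). In real ICA the mixing matrix is unknown and \(W\) must be estimated; this only illustrates the notation.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
s = np.vstack([np.sign(np.sin(2 * np.pi * 5 * t)),   # source 1: square wave
               rng.uniform(-1, 1, size=t.size)])     # source 2: uniform noise

A = np.array([[1.0, 0.5],
              [0.3, 1.0]])     # mixing matrix (square, invertible)

x = A @ s                      # microphone recordings: x^(i) = A s^(i)
W = np.linalg.inv(A)           # unmixing matrix
s_hat = W @ x                  # recovered sources: s^(i) = W x^(i)

print(np.allclose(s, s_hat))   # True -- exact recovery when A is known
```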

ICA ambiguities

scale of A:
