Problem
(0908) Best Time to Buy and Sell Stock
Topics: arrays, dynamic programming
Difficulty: Easy
Problem description:
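The description itself is not reproduced in this excerpt. Assuming the classic single-transaction version of the problem (buy once, sell once on a later day, profit 0 if no trade helps), a minimal sketch of the one-pass dynamic-programming idea could look like this. The name `max_profit` is hypothetical.

```python
from typing import List


def max_profit(prices: List[int]) -> int:
    """Best profit from one buy followed by one later sell (0 if no trade helps)."""
    best = 0
    lowest = float("inf")  # cheapest price seen so far
    for p in prices:
        lowest = min(lowest, p)       # best day to have bought, up to today
        best = max(best, p - lowest)  # profit if we sell today
    return best
```

The state carried between days is just the lowest price so far, so the scan runs in O(n) time and O(1) extra space.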
An MDP is a tuple \((S, A, \{P_{sa}\}, \gamma, R)\): states, actions, transition probabilities, discount factor, and reward function.
Goal of RL: maximize the expected discounted future reward \[E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots]\]
Policy function: \(\pi : S \to A\) maps each state to the action taken in that state. \[a = \pi(s)\]
Value function for a policy \(\pi\): \[ V^\pi(s) = E[R(s_0) + \gamma R(s_1) + \gamma^2 R(s_2) + \cdots \mid s_0=s, \pi] \]
Bellman Equation: Discrete value case
Optimal value function \[V^*(s) = \max_\pi V^\pi(s) \]
\[V^*(s) = R(s) + \max_{a\in A} \gamma\sum_{s'} P_{sa}(s')V^*(s')\]
Equivalently, this optimizes the expected discounted future reward.
Define the optimal policy \(\pi^*\) as \[\pi^*(s) = \operatorname*{arg\,max}_{a\in A}\sum_{s'} P_{sa}(s')V^*(s')\]
\(V^*(s) = V^{\pi^*}(s)\), the optimal value function is the value function under the optimal policy. Note that the optimal policy does not depend on the initial state.
For finding the optimal policy and the corresponding value.
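One common way to find both is value iteration, which repeatedly applies the Bellman equation above as an update rule. The sketch below is only illustrative and assumes a finite MDP stored as NumPy arrays. The function name, the array layout `P[a, s, s']`, and the stopping tolerance are assumptions, not taken from the original notes.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for a finite MDP.

    P: array of shape (A, S, S), P[a, s, s2] = P_{sa}(s2)   (assumed layout)
    R: array of shape (S,), state rewards R(s)
    Returns the estimated optimal values V and a greedy policy.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Q[a, s] = sum_{s'} P_{sa}(s') V(s')
        Q = P @ V
        V_new = R + gamma * Q.max(axis=0)  # Bellman backup
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    policy = (P @ V).argmax(axis=0)  # pi*(s) = argmax_a sum_{s'} P_{sa}(s') V(s')
    return V, policy


# Tiny two-state example: action 0 stays put, action 1 always moves to state 1.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
])
R = np.array([0.0, 1.0])
V, policy = value_iteration(P, R, gamma=0.5)
```

In this toy example only state 1 is rewarded, so the greedy policy in state 0 is to move to state 1.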
https://towardsdatascience.com/handwritten-digit-mnist-pytorch-977b5338e627
import numpy as np
%cd /content/drive/My Drive/20FA/DataMining/DigitRecog
\[\ell = \ell(\alpha_L)\]
\[\nabla_{w_j, b_j} \ell\]
Update parameter:
\[\theta_{i+1} := \theta_i - \eta \nabla_{w_j, b_j} \ell\]
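The update rule above can be sketched on a toy loss. This is a minimal illustration, not the network from the original notes: the helper `gradient_descent`, the quadratic loss, and the step count are all assumptions chosen so the result is easy to check.

```python
import numpy as np


def gradient_descent(grad_fn, theta0, eta=0.1, steps=100):
    """Repeat theta <- theta - eta * grad(theta) for a fixed number of steps."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta = theta - eta * grad_fn(theta)
    return theta


# Toy loss: l(theta) = 0.5 * ||theta - target||^2, so grad l = theta - target.
target = np.array([1.0, -2.0])
theta_hat = gradient_descent(lambda th: th - target, np.zeros(2))
```

For this loss each step shrinks the distance to `target` by a factor of \(1 - \eta\), so `theta_hat` ends up close to `target`.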
import os
os.getcwd()
'/content/drive/My Drive/CS229/Notes7b-GMM'
%cd /content/drive/My Drive/CS229/Notes7b-GMM
Simple implementation with basic data cleaning, one-hot encoding and LightGBM classifier.
%cd /content/drive/My Drive/Kaggle/titanic
import numpy as np
iris = datasets.load_iris()
# PCA
def myKMeans(X: np.ndarray, k: int, iterations=100, tol=0.001):
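The body of this function is not shown in the excerpt. As a rough sketch of what a k-means routine with these parameters might do (Lloyd's algorithm), the following is one possibility. The name `kmeans_sketch`, the `seed` argument, the random-point initialisation, and the empty-cluster handling are assumptions and not the author's code.

```python
import numpy as np


def kmeans_sketch(X: np.ndarray, k: int, iterations: int = 100,
                  tol: float = 0.001, seed: int = 0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct rows of X (assumed strategy).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    def assign(c):
        # Squared Euclidean distance from every point to every centroid: (n, k)
        d = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    for _ in range(iterations):
        labels = assign(centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:  # centroids stopped moving
            break
    return centroids, assign(centroids)
```

A call such as `centroids, labels = kmeans_sketch(X, 3)` returns the final centroids and the cluster index of each row of `X`.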
Show that a critical point of the Lagrangian function is also a critical point of the original function.
Consider maximizing a function \(f(x)\) subject to the constraint \(Ax=b\).
\(A \in \mathbb{R}^{n\times d}, x \in \mathbb{R}^d, b \in \mathbb{R}^n, f: \mathbb{R}^d \to \mathbb{R}\)
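A short sketch of why this holds, assuming \(f\) is differentiable and writing \(\lambda \in \mathbb{R}^n\) for the multipliers (notation introduced here, not in the original):

\[\mathcal{L}(x, \lambda) = f(x) + \lambda^\top (Ax - b)\]
\[\nabla_\lambda \mathcal{L} = Ax - b = 0, \qquad \nabla_x \mathcal{L} = \nabla f(x) + A^\top \lambda = 0\]

The first condition says \(x\) is feasible. The second says \(\nabla f(x) = -A^\top \lambda\), so for any feasible direction \(v\) with \(Av = 0\):

\[\nabla f(x)^\top v = -\lambda^\top A v = 0\]

Hence \(f\) has no first-order change along the constraint set, which is what it means for \(x\) to be a critical point of \(f\) restricted to \(\{x : Ax = b\}\).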
(11/3) What is the purpose of centering in PCA?
(11/3) Why do the eigenvalues in PCA correspond to each component's contribution to the variance?
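Both questions can be checked numerically. The sketch below uses synthetic data (the scales, offset, and seed are arbitrary assumptions) to illustrate two points: the variance of the data projected onto each eigenvector of the covariance matrix equals the matching eigenvalue, and without centering the leading direction mostly tracks the data mean rather than the spread.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: different spread per axis, shifted away from the origin.
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.3]) + np.array([10.0, -5.0, 2.0])

Xc = X - X.mean(axis=0)                 # centering
cov = Xc.T @ Xc / (len(Xc) - 1)         # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

scores = Xc @ eigvecs                   # projections onto each eigenvector
proj_var = scores.var(axis=0, ddof=1)   # variance along each component
explained_ratio = eigvals / eigvals.sum()

# Without centering, the top direction of X^T X is dominated by the mean.
_, vecs_raw = np.linalg.eigh(X.T @ X / len(X))
mean_dir = X.mean(axis=0) / np.linalg.norm(X.mean(axis=0))
alignment = abs(vecs_raw[:, -1] @ mean_dir)  # near 1 means "points at the mean"
```

Here `proj_var` matches `eigvals`, which is why `explained_ratio` can be read as each component's share of the total variance.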
Link: Separating mixed signals with ICA
import numpy as np
The goal of Components Analysis is to find a new basis to represent the data.