While working through cs229-notes11, I ran into the identity
\[\nabla_W|W| = |W|(W^{-1})^T \]
which led me to this UCI "matrix cookbook" collecting matrix-related formulas; I am uploading it here for future reference.
Credit to: Max Welling's CS 273B: Kernel-Based Learning, 2005
(evidently a fairly early course on kernel methods)
Gaussian discriminant analysis — article by lemon on Zhihu: https://zhuanlan.zhihu.com/p/22940577
How does the Normal Equation solve for the optimal parameters in one step, and how does it compare with gradient descent? — answer by 深度碎片 on Zhihu: https://www.zhihu.com/question/273799498/answer/370173526
We want to fit the hypothesis function \(h\).
Letting \(x_0=1\) absorbs the intercept term: \[h(x) = \sum_{i=0}^d{\theta_ix_i}=\theta^T x\]
Cost function w.r.t. the parameter \(\theta\):
\[J(\theta) = \frac12\sum_{i=1}^{n}{(h_\theta(x^{(i)}) - y^{(i)})^2}\]
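As a minimal sketch (my own, with synthetic data), \(J(\theta)\) can be minimized either in one shot via the normal equation \(\theta = (X^TX)^{-1}X^Ty\) or iteratively by batch gradient descent:

```python
import numpy as np

# Synthetic regression problem; X has a leading column of ones (x_0 = 1)
# so that theta_0 plays the role of the intercept.
rng = np.random.default_rng(0)
n, d = 100, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d))])
true_theta = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ true_theta + 0.01 * rng.normal(size=n)

# Normal equation: solve (X^T X) theta = X^T y directly.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on J(theta) = (1/2) sum (h(x) - y)^2;
# the gradient is X^T (X theta - y).
theta_gd = np.zeros(d + 1)
lr = 0.01
for _ in range(5000):
    grad = X.T @ (X @ theta_gd - y)
    theta_gd -= lr * grad / n
```

Both routes should agree; the normal equation costs a \(d\times d\) solve, while gradient descent trades that for many cheap iterations (the tradeoff discussed in the Zhihu answer linked above).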
Comment: some simulation methods:
1. Inverse-CDF technique: \(X = F^{-1}(U),\ U\sim \mathrm{Unif}(0,1)\)
2. Box-Muller method for generating Gaussian samples
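Both methods above can be sketched in a few lines (my own illustration; the exponential distribution is just a convenient example whose CDF inverts in closed form):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# 1. Inverse-CDF technique: for Exp(1), F(x) = 1 - e^{-x},
#    so F^{-1}(u) = -ln(1 - u) turns Unif(0,1) into Exp(1).
u = rng.uniform(size=n)
exp_samples = -np.log(1.0 - u)

# 2. Box-Muller: two independent Unif(0,1) draws yield two
#    independent N(0,1) samples via a polar transformation.
u1, u2 = rng.uniform(size=n), rng.uniform(size=n)
r = np.sqrt(-2.0 * np.log(u1))
z0 = r * np.cos(2.0 * np.pi * u2)
z1 = r * np.sin(2.0 * np.pi * u2)
```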
Problem 1: PSD — prove positive semi-definiteness via a random feature decomposition.
Problem 2: Find a random feature map for the product of two kernels; note the angular kernel may take negative values.
Hua Yao (UNI: hy2632)
Given: \[K(x,y) = (2\pi)^{-\frac{d}{2}}(\det(\Sigma))^{-\frac{1}{2}}\exp(-\frac{1}{2}(x-y)^{\top}\Sigma^{-1}(x-y)), \] \(\Sigma \in \mathbb{R}^{d\times d}\) is positive definite symmetric.
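One way to approximate this kernel (a sketch under my own assumptions, with an example diagonal \(\Sigma\)): sample \(\omega \sim \mathcal{N}(0, \Sigma^{-1})\), so that \(\mathbb{E}[\cos(\omega^\top(x-y))] = \exp(-\frac12 (x-y)^\top\Sigma^{-1}(x-y))\), and fold the constant prefactor \(c = (2\pi)^{-d/2}\det(\Sigma)^{-1/2}\) into the feature map:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 20_000                      # input dim, number of random features
Sigma = np.diag([1.0, 2.0, 0.5])      # example positive-definite Sigma
Sigma_inv = np.linalg.inv(Sigma)
c = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5)

# Frequencies drawn from N(0, Sigma^{-1}); rows are omega_i.
W = rng.multivariate_normal(np.zeros(d), Sigma_inv, size=m)

def phi(x):
    """Trigonometric feature map: phi(x) @ phi(y) approximates K(x, y)."""
    proj = W @ x
    return np.sqrt(c / m) * np.concatenate([np.cos(proj), np.sin(proj)])

x, y = rng.normal(size=d), rng.normal(size=d)
delta = x - y
K_exact = c * np.exp(-0.5 * delta @ Sigma_inv @ delta)
K_approx = phi(x) @ phi(y)
```

The error decays like \(O(1/\sqrt{m})\), which is the standard random-Fourier-features guarantee.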
Performers: a variant of the Transformer.
Random features for different kernels — softmax: trigonometric vs. positive features. Orthogonal feature construction: different methods (Givens rotations, Hadamard, regular Gaussian matrices (GM), and more), with different renormalizations.
Concentration: computing the variance of a given feature map, Chebyshev's inequality, concentration results.
attention: ..., Transformer, MLP, ResNet, ...
Policy optimization can be done through gradient ascent:
\[\nabla_\theta \mathbb{E}_{\epsilon\sim\mathcal{N}(0, I)} F(\theta + \sigma \epsilon) = \frac{1}{\sigma}\mathbb{E}_{\epsilon\sim\mathcal{N}(0, I)} [ \epsilon F(\theta + \sigma\epsilon) ]\]
Proof:
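The identity follows by a standard score-function argument: write the expectation as an integral over \(z = \theta + \sigma\epsilon\) and differentiate through the Gaussian density.

\[\begin{aligned}
\nabla_\theta \,\mathbb{E}_{\epsilon\sim\mathcal{N}(0,I)} F(\theta+\sigma\epsilon)
&= \nabla_\theta \int F(z)\,\mathcal{N}(z;\,\theta,\sigma^2 I)\,dz \\
&= \int F(z)\,\frac{z-\theta}{\sigma^2}\,\mathcal{N}(z;\,\theta,\sigma^2 I)\,dz \\
&= \frac{1}{\sigma}\,\mathbb{E}_{\epsilon\sim\mathcal{N}(0,I)}\left[\epsilon\, F(\theta + \sigma\epsilon)\right],
\end{aligned}\]

using \(\nabla_\theta \mathcal{N}(z;\theta,\sigma^2 I) = \frac{z-\theta}{\sigma^2}\mathcal{N}(z;\theta,\sigma^2 I)\) and substituting back \(z = \theta + \sigma\epsilon\).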
The Week 5 case in Managerial Negotiation emphasized identifying the parties' common interests before negotiating and maximizing value creation. In this case, additional negotiation rounds raise both sides' variable costs, and an increase in \(w\), the quantity at the core of the negotiation, lowers the parties' total welfare. The simulation tool was therefore designed around two scenarios:
For the war-of-attrition scenario, a function find_Case was added
to take the currently known information (round number, our bid, the opponent's bid) and locate a matching simulated case, giving some read on how the situation will develop.
def find_Case(round, w_K, w_A):
"Attention is all you need"
A sequence of feature vectors \(X = [x_1, \ldots, x_L]^T \in \mathbb{R}^{L\times d}\) is fed into an attention block,
with learned projection matrices \(W_Q, W_K, W_V\).
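A minimal sketch of scaled dot-product attention, assuming the projections \(W_Q, W_K, W_V\) are learned \(d\times d\) matrices (here random, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 5, 8                      # sequence length, feature dimension
X = rng.normal(size=(L, d))      # X = [x_1, ..., x_L]^T
W_Q, W_K, W_V = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = X @ W_Q, X @ W_K, X @ W_V
scores = Q @ K.T / np.sqrt(d)                       # (L, L) pairwise similarities
A = np.exp(scores - scores.max(axis=-1, keepdims=True))
A /= A.sum(axis=-1, keepdims=True)                  # row-wise softmax
out = A @ V                                         # (L, d) attended outputs
```

Each output row is a convex combination of the value vectors, with weights given by the softmax of the query-key similarities; this is the \(O(L^2)\) bottleneck that Performer-style random feature maps aim to remove.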