Description

Problem 1: Softmax-kernel approximators [50 points]

The goal of this task is to test different random-feature-map-based estimators of the softmax-kernel. Choose two feature vectors \(x, y \in \mathbb{R}^{20}\) of unit \(L_2\)-norm with angle \(\frac{\pi}{3}\) between them. Consider three estimators of the value of the softmax-kernel at these points: the regular one using independent sin/cos random features (leveraging random feature maps for the Gaussian kernel) and its two modifications: one using Givens random rotations and one applying random Hadamard matrices (with three HD-blocks). Describe each estimator in detail. Then, for each estimator and each number of random projections \(m = 5, 10, 20, 30, 50, 100\), compute its value as an average over \(s = 10\) independent trials. Propose how to use those trials to compute the empirical mean squared error of each estimator. Prepare plots with the number of random projections \(m\) on the \(x\)-axis and the average value of the estimator on the \(y\)-axis. Show the computed mean squared error in the form of error bars. Remark: if \(m > d\), the orthogonal trick can still be applied by constructing independent ensembles of samples such that within each ensemble samples are exactly orthogonal.
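The baseline estimator can be sketched as follows (a minimal illustration, not the assignment's reference solution; the class name, \(m\), and seed are made up for the example). It uses the identity \(\mathrm{SM}(x,y) = e^{\|x\|^2/2}\, e^{\|y\|^2/2}\, e^{-\|x-y\|^2/2}\) and approximates the Gaussian factor with i.i.d. sin/cos random features:

```java
import java.util.Random;

/**
 * Sketch of the regular softmax-kernel estimator from Problem 1:
 * SM(x, y) = exp(x^T y) is approximated via Gaussian-kernel sin/cos
 * random features, using
 * SM(x, y) = exp(||x||^2/2) * exp(||y||^2/2) * exp(-||x - y||^2/2).
 */
public class SoftmaxKernelSketch {

    // Unbiased estimate of exp(x^T y) from m i.i.d. Gaussian projections.
    public static double estimate(double[] x, double[] y, int m, Random rng) {
        int d = x.length;
        double acc = 0.0;
        for (int i = 0; i < m; i++) {
            double wx = 0.0, wy = 0.0;
            for (int j = 0; j < d; j++) {
                double w = rng.nextGaussian();   // w_i ~ N(0, I_d)
                wx += w * x[j];
                wy += w * y[j];
            }
            // f_1 = cos, f_2 = sin give unbiased features for the Gaussian kernel:
            // E[cos(w^T x)cos(w^T y) + sin(w^T x)sin(w^T y)] = exp(-||x - y||^2/2)
            acc += Math.cos(wx) * Math.cos(wy) + Math.sin(wx) * Math.sin(wy);
        }
        double gauss = acc / m;
        double hx = Math.exp(sqNorm(x) / 2), hy = Math.exp(sqNorm(y) / 2);
        return hx * hy * gauss;                  // ≈ exp(x^T y)
    }

    private static double sqNorm(double[] v) {
        double s = 0.0;
        for (double t : v) s += t * t;
        return s;
    }

    public static void main(String[] args) {
        // Unit vectors in R^20 at angle pi/3: x = e1, y = cos(pi/3) e1 + sin(pi/3) e2,
        // so the exact value is exp(x^T y) = exp(1/2).
        int d = 20;
        double[] x = new double[d], y = new double[d];
        x[0] = 1.0;
        y[0] = Math.cos(Math.PI / 3);
        y[1] = Math.sin(Math.PI / 3);
        double est = estimate(x, y, 2000, new Random(0));
        System.out.println("estimate: " + est + ", exact: " + Math.exp(0.5));
    }
}
```

The two structured variants in the problem replace the i.i.d. Gaussian vectors \(w_i\) with rows of a product of Givens rotations or of HD-blocks, leaving the rest of the pipeline unchanged.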

Problem 2: Softmax-kernel is positive semidefinite [30 points]

\(K : \mathbb{R}^{d} \times \mathbb{R}^{d} \to \mathbb{R}\) is positive semidefinite (PSD) if:
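For reference, the standard PSD condition for a kernel \(K : \mathbb{R}^{d} \times \mathbb{R}^{d} \to \mathbb{R}\) is: for every \(n \in \mathbb{N}\), all points \(x_1, \dots, x_n \in \mathbb{R}^d\), and all coefficients \(c_1, \dots, c_n \in \mathbb{R}\),

\[\sum_{i=1}^{n}\sum_{j=1}^{n} c_i c_j K(x_i, x_j) \ge 0.\]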

Read more »

Least Squares and Projection

"The essence of the least squares method is projection in Hilbert space."

Suppose we want to project a higher-dimensional vector onto a lower-dimensional subspace (say, a plane).

The projection is \(\vec{p}\), the original vector is \(\vec{b}\), and the error vector perpendicular (i.e., orthogonal) to the subspace is \[\vec{e} = \vec{b} - \vec{p}\]

Since \(\vec{p}\) lies within the subspace (say, an \(n\)-dimensional space), \[\vec{p} = A\hat{x}\] where \(A\) is the matrix whose columns \(a_i\) form a basis of the subspace and \(\hat{x}\) collects the coefficients.
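The coefficient vector solves the normal equations \(A^TA\,x = A^T b\) (which follow from \(A^T\vec{e} = 0\)). A minimal sketch for a plane in \(\mathbb{R}^3\); the class name and the example matrix and vector are illustrative, not from the post:

```java
/**
 * Least squares as projection: solve the normal equations A^T A x = A^T b
 * for a 3x2 matrix A, then p = A x is the projection of b onto the column
 * space of A, and e = b - p is orthogonal to every column of A.
 */
public class ProjectionSketch {

    // Projection of b onto the column space of the 3x2 matrix A.
    public static double[] project(double[][] A, double[] b) {
        // Gram matrix entries of A^T A (a 2x2 system) and right-hand side A^T b.
        double g11 = dotCol(A, 0, 0), g12 = dotCol(A, 0, 1), g22 = dotCol(A, 1, 1);
        double r1 = dotColVec(A, 0, b), r2 = dotColVec(A, 1, b);
        double det = g11 * g22 - g12 * g12;  // nonzero iff columns are independent
        double x1 = (g22 * r1 - g12 * r2) / det;
        double x2 = (-g12 * r1 + g11 * r2) / det;
        double[] p = new double[b.length];
        for (int i = 0; i < b.length; i++) p[i] = A[i][0] * x1 + A[i][1] * x2;
        return p;
    }

    static double dotCol(double[][] A, int i, int j) {
        double s = 0;
        for (double[] row : A) s += row[i] * row[j];
        return s;
    }

    static double dotColVec(double[][] A, int j, double[] b) {
        double s = 0;
        for (int i = 0; i < b.length; i++) s += A[i][j] * b[i];
        return s;
    }

    public static void main(String[] args) {
        double[][] A = {{1, 0}, {1, 1}, {0, 1}};  // columns a1, a2 span a plane in R^3
        double[] b = {1, 2, 3};
        double[] p = project(A, b);
        // e = b - p is orthogonal to both columns of A
        System.out.printf("p = (%.3f, %.3f, %.3f)%n", p[0], p[1], p[2]);
    }
}
```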

Read more »

Attention Mechanism

Attention block: takes a sequence of vectors \(x_1, \dots, x_L \in \mathbb{R}^d\) and outputs another sequence of \(L\) vectors in \(\mathbb{R}^d\).

Within the block are three learned weight matrices \(W_Q, W_K, W_V \in \mathbb{R}^{d\times d}\). Stacking the inputs as rows, \(X = \begin{bmatrix}x_1\\x_2\\\vdots\\x_L\end{bmatrix} \in \mathbb{R}^{L\times d}\): \[ \mathbb{R}^{L\times d}\begin{cases} Q = XW_Q\\ K = XW_K\\ V = XW_V\end{cases} \] \[ Q = \begin{bmatrix}q_1\\q_2\\\vdots\\q_L\end{bmatrix} \]

The rows \(q_1, \dots, q_L\) of \(Q\) and \(k_1, \dots, k_L\) of \(K\) define the attention matrix \(A\): \[ A_{ij} = K(q_i, k_j) \]
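A toy illustration, assuming the usual softmax-attention choice \(K(q,k) = \exp(q^Tk/\sqrt{d})\) with rows of \(A\) normalized to sum to 1 (the class name and example values are hypothetical):

```java
/**
 * Toy attention matrix: A[i][j] proportional to exp(q_i . k_j / sqrt(d)),
 * with each row normalized to sum to 1 (softmax over j).
 */
public class AttentionSketch {

    public static double[][] attentionMatrix(double[][] Q, double[][] K) {
        int L = Q.length, d = Q[0].length;
        double[][] A = new double[L][L];
        for (int i = 0; i < L; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < L; j++) {
                double dot = 0.0;
                for (int t = 0; t < d; t++) dot += Q[i][t] * K[j][t];
                A[i][j] = Math.exp(dot / Math.sqrt(d));
                rowSum += A[i][j];
            }
            for (int j = 0; j < L; j++) A[i][j] /= rowSum;  // softmax over j
        }
        return A;
    }

    public static void main(String[] args) {
        // L = 2 queries/keys in R^2; each query attends most to the matching key.
        double[][] Q = {{1, 0}, {0, 1}};
        double[][] K = {{1, 0}, {0, 1}};
        double[][] A = attentionMatrix(Q, K);
        System.out.println(A[0][0] + " " + A[0][1]);
    }
}
```

The block's output is then \(AV\), a weighted average of the value rows.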

...

Read more »

Find the Length of the Longest Substring Without Repeats

  • Problem
    • Given an array arr, return the length of the longest subarray of arr with no repeats (no repeats means all numbers are distinct).
    • Example 1: input [2,3,4,5], output 4
  • Topics: hashing, two pointers, strings
import java.util.*;

public class NowCoder_MaxLength {

    /**
     * Sliding window over arr: the map records each value's last index;
     * when a repeat appears, move start just past its previous occurrence.
     *
     * @param arr the array
     * @return length of the longest subarray with all-distinct values
     */
    public static int maxLength(int[] arr) {
        HashMap<Integer, Integer> map = new HashMap<>();
        int max = 0; // 0 for an empty array; first element makes it at least 1
        for (int start = 0, end = 0; end < arr.length; end++) {
            if (map.containsKey(arr[end])) {
                // Key point: never move start backward.
                start = Math.max(start, map.get(arr[end]) + 1);
            }
            max = Math.max(max, end - start + 1);
            map.put(arr[end], end);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] a = { 2, 2, 3, 2, 4, 3, 4, 3, 4 };
        System.out.println(maxLength(a)); // 3, e.g. the subarray {2, 4, 3}
    }

}

NC61 TwoSum

  • Problem

    Given an integer array, find two numbers in it that add up to the target value. Your function twoSum should return the indices of these two numbers (index1, index2), with index1 less than index2.

    Note: indices start from 1. Assume the given array contains exactly one solution.

    Example: for the array {20, 70, 110, 150} and target value 90,

    output index1=1, index2=2

  • Code
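A standard one-pass HashMap sketch of this problem (illustrative class name; it returns the 1-based indices the problem requires):

```java
import java.util.HashMap;

/**
 * Sketch of NC61 TwoSum: one pass with a HashMap from value to index.
 * Returns 1-based indices with index1 < index2, per the problem statement.
 */
public class TwoSumSketch {

    public static int[] twoSum(int[] numbers, int target) {
        HashMap<Integer, Integer> seen = new HashMap<>();  // value -> 0-based index
        for (int i = 0; i < numbers.length; i++) {
            Integer j = seen.get(target - numbers[i]);
            if (j != null) {
                return new int[] { j + 1, i + 1 };  // earlier index first
            }
            seen.put(numbers[i], i);
        }
        return new int[0];  // unreachable if a unique solution is guaranteed
    }

    public static void main(String[] args) {
        int[] r = twoSum(new int[] { 20, 70, 110, 150 }, 90);
        System.out.println("index1=" + r[0] + ", index2=" + r[1]);
    }
}
```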

Read more »

General scheme

Approximating the kernel \(K(x,y) : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}\)

\(\phi: \mathbb{R}^d \to \mathbb{R}^m\) randomized

\(K(x,y) = E[\phi(x)^T\phi(y)]\) \[\phi(x) = \frac{h(x)}{\sqrt{m}}\begin{bmatrix}f_1(w_1^Tx), \dots, f_1(w_m^Tx)\\f_2(w_1^Tx), \dots, f_2(w_m^Tx)\\\vdots\\f_l(w_1^Tx), \dots, f_l(w_m^Tx)\end{bmatrix}\] for some \(h: \mathbb{R}^d \to \mathbb{R}\)

Example:

Read more »

(0918) All Subsets of a Set

Problem

Given an integer set S with no duplicate elements, find all subsets of S.

Note: - the elements in each subset you output must be in ascending order - the solution set must not contain duplicate subsets - for example, if S=[1,2,3], the solution set should be: [ [3], [1], [2], [1,2,3], [1,3], [2,3], [1,2], []]

Solution: - recursion: each time the set gains a new element, the answer consists of all previous non-empty subsets + the new subsets formed by adding the new element to those non-empty subsets + the set containing the new element itself - boundary condition: the empty set must contain one empty subset - note: during recursion, the combination step skips the empty subset (continue), but an empty subset is added back on each return.
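The recursion above can be sketched as follows (class name illustrative; here the "singleton" case arises from appending the new element to the empty subset, which is equivalent to handling it separately as described):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the subsets recursion: the subsets of the first n elements are
 * the subsets of the first n-1 elements, plus each of those subsets with
 * the n-th element appended. Assumes the input array is sorted ascending,
 * so every emitted subset is in ascending order.
 */
public class SubsetsSketch {

    public static List<List<Integer>> subsets(int[] s, int n) {
        List<List<Integer>> result = new ArrayList<>();
        if (n == 0) {                       // boundary: empty set has one empty subset
            result.add(new ArrayList<>());
            return result;
        }
        for (List<Integer> sub : subsets(s, n - 1)) {
            result.add(sub);                // keep the old subset
            List<Integer> withNew = new ArrayList<>(sub);
            withNew.add(s[n - 1]);          // and extend it with the new element
            result.add(withNew);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] s = { 1, 2, 3 };
        System.out.println(subsets(s, s.length));  // 8 subsets of {1,2,3}
    }
}
```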

Read more »

Course Information: IEORE4540

Lecturer: Krzysztof Choromanski

  • Krzysztof Choromanski works on several aspects of machine learning and robotics. His current research interests include reinforcement learning and randomized methods such as nonlinear embeddings based on structured random feature maps and quasi-Monte-Carlo methods. He has also worked on online nonparametric clustering for massive high-dimensional streams. Krzysztof is the author of several nonlinear embedding mechanisms based on structured matrices that can be used to speed up neural network computations, kernel methods applying random feature maps, convex optimization solvers, quantization and soft clustering methods, as well as several LSH-based algorithms. With his background in structural graph theory, he is also interested in applying graph theory and other combinatorial methods in machine learning.

Contact: KMC2178@columbia.edu|CHOROMANSKI1@gmail.com|KCHORO@google.com

Overview

Read more »