[cs231n] assignment1: SVM Classifier Part

Main contents:

  1. Implement a vectorized SVM loss function;
  2. Implement a vectorized analytic gradient;
  3. Check the gradient numerically;
  4. Tune the learning rate and regularization strength on a validation set;
  5. Optimize the loss function with stochastic gradient descent;
  6. Visualize the final learned weights.

Code links:

Jupyter on GCloud
GCloud backup (token added manually)
GitHub repository

The first thing to say is that this "SVM" only uses the SVM loss function; it is not the full support vector machine training procedure.

The SVM classifier's loss function is computed as follows:

$$L = \frac{1}{N}\sum_{i}\sum_{j\neq y_i}\left[\max(0,\, f_j(x_i; W)-f_{y_i}(x_i;W)+\Delta)\right] + \lambda \sum_k \sum_l W_{k,l}^2$$

This definition is explained in the course videos. In short, for every incorrect class $j$, if its score plus the margin $\Delta$ exceeds the score of the correct class $y_i$ (meaning the classifier is at or near a misclassification, since no class should score higher than the true label), that difference is added to the loss; otherwise the term is ignored, because a correct class that already scores comfortably above the rest is exactly what we want and should not be penalized.
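As a tiny worked example (the scores here are made up purely for illustration), this is the hinge loss of a single sample in numpy:

```python
import numpy as np

# Hypothetical scores f(x_i; W) for one sample over 3 classes, with Δ = 1
scores = np.array([3.2, 5.1, -1.7])
y_i = 0           # index of the correct class
delta = 1.0

margins = np.maximum(0, scores - scores[y_i] + delta)
margins[y_i] = 0  # the correct class never contributes to the loss
loss_i = margins.sum()
print(loss_i)     # max(0, 5.1-3.2+1) + max(0, -1.7-3.2+1) = 2.9 + 0 = 2.9
```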

Numerical computation of the gradient:

$$ \frac{df(x)}{dx} = \lim\limits_{h\to0} \frac{f(x+h)-f(x)}{h}$$
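A minimal numpy sketch of this finite-difference estimate (the function name is illustrative, and it uses the one-sided formula above; the assignment's own checker follows the same idea):

```python
import numpy as np

def numerical_gradient(f, W, h=1e-5):
    """Estimate df/dW one entry at a time with (f(W+h) - f(W)) / h."""
    fx = f(W)                       # value at the original point
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old_value = W[ix]
        W[ix] = old_value + h       # bump one coordinate
        grad[ix] = (f(W) - fx) / h  # one-sided difference
        W[ix] = old_value           # restore
        it.iternext()
    return grad
```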

Analytic computation of the gradient via differentiation; for the row of the correct class $w_{y_i}$:

$$\nabla_{w_{y_i}}{L_i} = -\left(\sum_{j\neq{y_i}}1(w_j^Tx_i-w_{y_i}^Tx_i + \Delta > 0)\right) x_i$$

Here $1(\cdot)$ is the indicator function: it equals 1 when its argument is true and 0 otherwise. For each incorrect row $j \neq y_i$, the corresponding gradient is $\nabla_{w_j}{L_i} = 1(w_j^Tx_i-w_{y_i}^Tx_i + \Delta > 0)\, x_i$.
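Putting both rules together, one way to fill in svm_loss_naive looks like this (a sketch following the assignment's convention that W is D×C and X is N×D):

```python
import numpy as np

def svm_loss_naive(W, X, y, reg):
    """SVM loss and gradient with explicit loops. W: (D, C), X: (N, D)."""
    dW = np.zeros(W.shape)
    num_train = X.shape[0]
    num_classes = W.shape[1]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # Δ = 1
            if margin > 0:                # the indicator 1(·) fires
                loss += margin
                dW[:, j] += X[i]          # gradient w.r.t. the wrong row
                dW[:, y[i]] -= X[i]       # gradient w.r.t. the correct row
    loss = loss / num_train + reg * np.sum(W * W)
    dW = dW / num_train + 2 * reg * W
    return loss, dW
```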

For the SVM classifier, besides the usual training / validation / test splits, we also randomly sample 500 examples from the training set as a development set, and use this small set for everything before the final training and prediction. Using the full training set would work too, it just takes considerably longer. (The validation set's role in our code is to find the best combination of learning rate and regularization strength, which is then evaluated on the test set.) After flattening, each split becomes a 2D array; the training set, for example, has shape 49000×3073. The development split can be carved out as sketched below:
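A minimal sketch of that split; X_train and y_train here are small stand-in arrays so the snippet runs cheaply, standing in for the real flattened data:

```python
import numpy as np

# Stand-ins for the flattened CIFAR-10 training data
# (in the assignment X_train is 49000 x 3073)
X_train = np.random.randn(1000, 3073)
y_train = np.random.randint(10, size=1000)

num_dev = 500
mask = np.random.choice(X_train.shape[0], num_dev, replace=False)
X_dev, y_dev = X_train[mask], y_train[mask]
print(X_dev.shape, y_dev.shape)  # (500, 3073) (500,)
```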

For the linear classifier we use stochastic gradient descent (SGD), one of two gradient descent variants usually contrasted with batch gradient descent. Batch gradient descent updates the parameters (say θ) along the negative gradient computed over the entire training set, so every single update touches all the training data and iteration is slow. Stochastic gradient descent instead computes the gradient on a small sample of the data; each step is not exactly in the globally optimal direction, but the overall trend is, and the result usually lands near the global optimum. In our case the training set is 49000×3073, so computing the loss and gradient over the full set for every update of the 10×3073 parameter matrix is very expensive. We therefore randomly sample 200 training examples into a 200×3073 batch for each SGD step, which is much faster: in my experiments, training with batch_size = 200 takes under 4 seconds, versus 15.56 seconds with batch_size = 1000.
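A stripped-down version of that SGD loop (function and argument names are illustrative; the assignment implements this inside LinearClassifier's train method):

```python
import numpy as np

def sgd_train(W, X, y, loss_fn, learning_rate=1e-7, reg=2.5e4,
              num_iters=1500, batch_size=200):
    """Repeatedly sample a minibatch and step along the negative gradient."""
    num_train = X.shape[0]
    for _ in range(num_iters):
        # Sampling with replacement is faster and works fine in practice
        idx = np.random.choice(num_train, batch_size)
        loss, grad = loss_fn(W, X[idx], y[idx], reg)
        W -= learning_rate * grad    # parameter update
    return W
```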

The code follows:

svm.ipynb

Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the assignments page on the course website.

In this exercise you will:

  • implement a fully-vectorized loss function for the SVM
  • implement the fully-vectorized expression for its analytic gradient
  • check your implementation using numerical gradient
  • use a validation set to tune the learning rate and regularization strength
  • optimize the loss function with SGD
  • visualize the final learned weights

CIFAR-10 Data Loading and Preprocessing


SVM Classifier

Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

As you can see, we have prefilled the function compute_loss_naive which uses for loops to evaluate the multiclass SVM loss function.

The grad returned from the function above is right now all zero. Derive and implement the gradient for the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code inside the existing function.

To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numeric estimate to the gradient that you computed. We have provided code that does this for you:

Inline Question 1:

It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? How would changing the margin affect the frequency of this happening? Hint: the SVM loss function is not strictly speaking differentiable

Your Answer: fill this in.

Stochastic Gradient Descent

We now have vectorized and efficient expressions for the loss and the gradient, and our gradient matches the numerical gradient. We are therefore ready to do SGD to minimize the loss.

Inline question 2:

Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way that they do.

Your answer: fill this in

LinearClassifier.py (SGD part)

Linear_svm.py

These are the two ways of computing the gradient; the naive version is the one without vectorization.
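As a sketch of what the vectorized version can look like (the same indicator logic as the naive loops, computed for all samples at once):

```python
import numpy as np

def svm_loss_vectorized(W, X, y, reg):
    """Fully vectorized SVM loss and gradient. W: (D, C), X: (N, D)."""
    num_train = X.shape[0]
    scores = X.dot(W)                                    # (N, C)
    correct = scores[np.arange(num_train), y][:, None]   # (N, 1)
    margins = np.maximum(0, scores - correct + 1)        # Δ = 1
    margins[np.arange(num_train), y] = 0
    loss = margins.sum() / num_train + reg * np.sum(W * W)

    # Each positive margin adds x_i to column j and subtracts it from
    # column y_i; count how many times the indicator fired per sample.
    binary = (margins > 0).astype(X.dtype)               # (N, C)
    binary[np.arange(num_train), y] = -binary.sum(axis=1)
    dW = X.T.dot(binary) / num_train + 2 * reg * W
    return loss, dW
```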
