Sebastian Kirsch: Blog

Tuesday, 11 January 2005

Struggling with the voted perceptron

Filed under: — Sebastian Kirsch @ 21:51

I’m currently struggling with the voted perceptron learning algorithm by Yoav Freund and Robert Schapire: I’ve now read three different versions of this paper, and some things about the algorithm strike me as “not quite right".

The first thing is a minor issue: The voted perceptron algorithm is specified as starting with setting k := 0 and v1 := 0, but then immediately starts to make predictions with vk without ever initializing v0. That can’t be quite right. My guess is that they wanted to start with v0 := 0 (because a prediction vector of 0 is pretty pointless.)

The next problem is a little more involved. As a decision function, they use the standard f(x) := sign(wx) – but that’s not right, the classical perceptron algorithm uses f(x) := sign(wx - b), where b is the threshold of the decision function. It’s just that the threshold is usually hidden in this equation, because the “zeroth element” of vector x is taken to be x0 ≡ 1. This saves you the trouble of having to update the threshold by hand.

The voted perceptron does not have this implicit assumption, and being a kernel algorithm, it cannot have it – how would you specify that a component is equivalent to 1 in feature space?

According to everything I learnt in my lectures on neural networks and AI, a perceptron cannot work with a threshold of zero, because then the origin is part of all decision hyperplanes. A threshold of anything but zero is OK, because you can scale the vector w to get the right decision hyperplane. But the perceptron is supposed to be able to solve all linearly separable problems, not only those that are separable with a hyperplane containing the origin.

The only implementation of the voted perceptron I could find (in the Weka package) neatly sidesteps this problem by always using the polynomial kernel k(x, y) = (1 + xy)d; if no exponent is specified, they assume d = 1; and thus get their threshold. But this is not how the voted perceptron is specified. One solution would be to specify the algorithm with xx’ + 1 in the non-kernelized form, and k(x,x’) + 1 in the kernelized form everywhere the scalar product/kernel is used.

It’s a strange thing that no publication makes note of this problem; I will have to talk it through with someone.

1 Comment

  1. The problem with origin is solved in some implementation adding a dimension at the feature vector, as done in the Novikoff (1962) [perceptron termination] theorem proof.

    xx = [x, R] , ww = [w, b/R]

    Comment by Matteo Bertini — Wednesday, 16 February 2005 @ 16:36

RSS feed for comments on this post.

Leave a comment

Sorry, the comment form is closed at this time.

Copyright © 1999--2004 Sebastian Marius Kirsch , all rights reserved.