5 | 5 | It is not obvious to me whether we should change the documentation or the implementation. Originally we used w*x + bias, which is the standard SVM output. However, that did not work well in the context of ensembles: SVMs whose training went poorly tend to have very large |w|, so the average vote ends up dominated by these poor SVMs. We therefore chose to penalize SVMs with large |w|. The question is whether we should divide by |w|, so that the prediction output corresponds to the signed distance from the data point to the hyperplane, or penalize the poor SVMs even harder by dividing by |w|^2. |
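
To make the three options concrete, here is a minimal sketch in plain Python. It is not the project's implementation: the function names `svm_votes` and `ensemble_average`, the linear-kernel assumption, and the two example SVMs are all hypothetical and chosen only for illustration.

```python
import math

def svm_votes(w, b, x):
    """Return the three candidate outputs of one linear SVM (hypothetical helper)."""
    raw = sum(wi * xi for wi, xi in zip(w, x)) + b  # w*x + bias
    norm = math.sqrt(sum(wi * wi for wi in w))      # |w|
    return {
        "raw": raw,                  # standard SVM output
        "distance": raw / norm,      # signed distance to the hyperplane
        "squared": raw / norm ** 2,  # harder penalty on large |w|
    }

def ensemble_average(svms, x, kind):
    """Average the chosen output over all (w, b) pairs in the ensemble."""
    return sum(svm_votes(w, b, x)[kind] for w, b in svms) / len(svms)

# Made-up ensemble: one SVM with moderate |w| = 5 and one with large |w| = 50.
svms = [([3.0, 4.0], 1.0), ([-30.0, -40.0], 0.0)]
x = [1.0, 1.0]

for kind in ("raw", "distance", "squared"):
    print(kind, ensemble_average(svms, x, kind))
```

With these made-up numbers the raw average is negative because the large-|w| SVM dominates, while both normalized averages come out positive. Dividing by |w|^2 shrinks the large-|w| vote further than dividing by |w| does.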