Index: /trunk/yat/classifier/SVM.h
===================================================================
--- /trunk/yat/classifier/SVM.h (revision 691)
+++ /trunk/yat/classifier/SVM.h (revision 692)
@@ -53,11 +53,11 @@
public:
- ///
- /// Constructor taking the kernel and the target vector as
- /// input.
+ ///
+ /// Constructor taking the kernel and the target vector as
+ /// input.
///
/// @note if the @a target or @a kernel
- /// is destroyed the behaviour is undefined.
- ///
+ /// is destroyed the behaviour is undefined.
+ ///
SVM(const KernelLookup& kernel, const Target& target);
@@ -73,10 +73,10 @@
make_classifier(const DataLookup2D&, const Target&) const;
- ///
- /// @return \f$ \alpha \f$
- ///
+ ///
+ /// @return \f$ \alpha \f$
+ ///
inline const utility::vector& alpha(void) const { return alpha_; }
- ///
+ ///
/// The C-parameter is the balance term (see train()). A very
/// large C means the training will be focused on getting samples
@@ -91,8 +91,8 @@
///
/// @returns mean of vector \f$ C_i \f$
- ///
+ ///
inline double C(void) const { return 1/C_inverse_; }
- ///
+ ///
/// Default is max_epochs set to 10,000,000.
///
@@ -101,27 +101,27 @@
inline long int max_epochs(void) const {return max_epochs_;}
- ///
- /// The output is calculated as \f$ o_i = \sum \alpha_j t_j K_{ij}
- /// + bias \f$, where \f$ t \f$ is the target.
- ///
- /// @return output
- ///
+ /**
+ The output is calculated as \f$ o_i = \sum \alpha_j t_j K_{ij}
+ + bias \f$, where \f$ t \f$ is the target.
+
+ @return output
+ */
inline const theplu::yat::utility::vector&
output(void) const { return output_; }
- ///
- /// Generate prediction @a predict from @a input. The prediction
- /// is calculated as the output times the margin, i.e., geometric
- /// distance from decision hyperplane: \f$ \frac{ \sum \alpha_j
- /// t_j K_{ij} + bias}{w} \f$ The output has 2 rows. The first row
- /// is for binary target true, and the second is for binary target
- /// false. The second row is superfluous as it is the first row
- /// negated. It exist just to be aligned with multi-class
- /// SupervisedClassifiers. Each column in @a input and @a output
- /// corresponds to a sample to predict. Each row in @a input
- /// corresponds to a training sample, and more exactly row i in @a
- /// input should correspond to row i in KernelLookup that was used
- /// for training.
- ///
+ /**
+ Generate prediction @a predict from @a input. The prediction
+ is calculated as the output times the margin, i.e., geometric
+ distance from decision hyperplane: \f$ \frac{ \sum \alpha_j
+ t_j K_{ij} + bias}{w} \f$ The output has 2 rows. The first row
+ is for binary target true, and the second is for binary target
+ false. The second row is superfluous as it is the first row
+ negated. It exists just to be aligned with multi-class
+ SupervisedClassifiers. Each column in @a input and @a output
+ corresponds to a sample to predict. Each row in @a input
+ corresponds to a training sample, and more exactly row i in @a
+ input should correspond to row i in KernelLookup that was used
+ for training.
+ */
void predict(const DataLookup2D& input, utility::matrix& predict) const;
@@ -152,7 +152,29 @@
Training the SVM following Platt's SMO, with Keerthi's
modification. Minimizing \f$ \frac{1}{2}\sum
- y_iy_j\alpha_i\alpha_j(K_{ij}+\frac{1}{C_i}\delta_{ij}) \f$ ,
- which corresponds to minimizing \f$ \sum w_i^2+\sum C_i\xi_i^2
- \f$.
+ y_iy_j\alpha_i\alpha_j(K_{ij}+\frac{1}{C_i}\delta_{ij}) - \sum
+ \alpha_i \f$, which corresponds to minimizing \f$ \sum
+ w_i^2+\sum C_i\xi_i^2 \f$.
+
+ @note If the training problem is not linearly separable and C
+ is set to infinity, the minimum will be located at infinity,
+ and thus the minimum will not be reached within the maximal
+ number of epochs. More exactly, when the problem is not
+ linearly separable, there exists an eigenvector of \f$
+ H_{ij}=y_iy_jK_{ij} \f$ within the space defined by the
+ conditions \f$ \alpha_i>0 \f$ and \f$ \sum \alpha_i y_i = 0
+ \f$. As the eigenvalue is zero in this direction, the quadratic
+ term does not contribute to the objective; the objective
+ consists only of the linear term and hence has no
+ minimum. This problem occurs only when \f$ C \f$ is set to
+ infinity, because for a finite \f$ C \f$ all eigenvalues are
+ finite. However, for a large \f$ C \f$ (and a training problem
+ that is not linearly separable) there exists an eigenvector
+ corresponding to a small eigenvalue, which means the minimum has
+ moved from infinity to "very far away". In practice this also
+ means the minimum is not reached within the maximal number of
+ epochs, and the value of \f$ C \f$ should be decreased.
+
+ @return true if successful
*/
bool train();
@@ -161,6 +183,6 @@
private:
- ///
- /// Copy constructor. (not implemented)
+ ///
+ /// Copy constructor. (not implemented)
///
SVM(const SVM&);
@@ -184,13 +206,13 @@
///
- /// Private function choosing which two elements that should be
- /// updated. First checking for the biggest violation (output - target =
- /// 0) among support vectors (alpha!=0). If no violation was found check
- /// sequentially among the other samples. If no violation there as
- /// well training is completed
+ /// Private function choosing which two elements should be
+ /// updated. It first checks for the largest violation of the
+ /// condition output = target among support vectors (alpha != 0). If
+ /// no violation is found, it checks sequentially among the other
+ /// samples. If no violation is found there either, training is
+ /// complete.
///
/// @return true if a pair of samples that violate the conditions
/// can be found
- ///
+ ///
bool choose(const theplu::yat::utility::vector&);