Timestamp: Oct 17, 2006, 2:13:32 PM
Files: 1 edited

trunk/yat/classifier/SVM.h
(r680 → r692; resulting text of the modified sections)

    public:
      ///
      /// Constructor taking the kernel and the target vector as
      /// input.
      ///
      /// @note if the @a target or @a kernel is destroyed the
      /// behaviour is undefined.
      ///
      SVM(const KernelLookup& kernel, const Target& target);
      …
      make_classifier(const DataLookup2D&, const Target&) const;

      ///
      /// @return \f$ \alpha \f$
      ///
      inline const utility::vector& alpha(void) const { return alpha_; }

      ///
      /// The C parameter is the balance term (see train()). A very
      /// large C means the training will be focused on getting samples
      …
      /// @returns mean of vector \f$ C_i \f$
      ///
      inline double C(void) const { return 1/C_inverse_; }

      ///
      /// Default is max_epochs set to 10,000,000.
      ///
      …
      inline long int max_epochs(void) const { return max_epochs_; }

      /**
         The output is calculated as \f$ o_i = \sum \alpha_j t_j K_{ij}
         + bias \f$, where \f$ t \f$ is the target.

         @return output
      */
      inline const theplu::yat::utility::vector&
      output(void) const { return output_; }
      /**
         Generate prediction @a predict from @a input. The prediction
         is calculated as the output times the margin, i.e., the
         geometric distance from the decision hyperplane: \f$
         \frac{\sum \alpha_j t_j K_{ij} + bias}{w} \f$. The output has
         2 rows. The first row is for binary target true, and the
         second is for binary target false. The second row is
         superfluous as it is the first row negated; it exists only to
         be aligned with multiclass SupervisedClassifiers. Each column
         in @a input and @a predict corresponds to a sample to
         predict. Each row in @a input corresponds to a training
         sample; more exactly, row i in @a input should correspond to
         row i in the KernelLookup that was used for training.
      */
      void predict(const DataLookup2D& input, utility::matrix& predict) const;
      …
      /**
         Training the SVM following Platt's SMO, with Keerthi's
         modification. Minimizing \f$ \frac{1}{2}\sum
         y_iy_j\alpha_i\alpha_j(K_{ij}+\frac{1}{C_i}\delta_{ij}) -
         \sum \alpha_i \f$, which corresponds to minimizing \f$ \sum
         w_i^2+\sum C_i\xi_i^2 \f$.

         @note If the training problem is not linearly separable and C
         is set to infinity, the minimum is located at infinity, and
         thus it will not be reached within the maximal number of
         epochs. More exactly, when the problem is not linearly
         separable, there exists an eigenvector of \f$
         H_{ij}=y_iy_jK_{ij} \f$ within the space defined by the
         conditions \f$ \alpha_i>0 \f$ and \f$ \sum \alpha_i y_i = 0
         \f$.
         As the eigenvalue is zero in this direction, the quadratic
         term does not contribute to the objective; the objective then
         consists of the linear term only and hence has no minimum.
         This problem only occurs when \f$ C \f$ is set to infinity,
         because for a finite \f$ C \f$ all eigenvalues are finite.
         However, for a large \f$ C \f$ (when the training problem is
         not linearly separable) there exists an eigenvector
         corresponding to a small eigenvalue, which means the minimum
         has moved from infinity to "very far away". In practice this
         also results in the minimum not being reached within the
         maximal number of epochs, and the value of \f$ C \f$ should
         be decreased.

         @return true if successful
      */
      bool train();
      …
    private:
      ///
      /// Copy constructor. (not implemented)
      ///
      SVM(const SVM&);
      …
      ///
      /// Private function choosing which two elements should be
      /// updated. First check for the biggest violation (output -
      /// target = 0) among support vectors (alpha != 0). If no
      /// violation is found, check sequentially among the other
      /// samples. If there is no violation there either, training is
      /// completed.
      ///
      /// @return true if a pair of samples that violate the
      /// conditions can be found
      ///
      bool choose(const theplu::yat::utility::vector&);