Changeset 1182 for trunk/yat/classifier


Timestamp:
Feb 28, 2008, 1:27:37 PM
Author:
Peter
Message:

Working on #335. Fixed the weighted test data case. Still to fix is the case of missing features in training, in other words, what should happen when complete training cannot be done because of a lack of data. The current behavior is probably not optimal, but I have to look into it in more detail.

Location:
trunk/yat/classifier
Files:
2 edited

  • trunk/yat/classifier/NBC.cc

    --- trunk/yat/classifier/NBC.cc (r1162)
    +++ trunk/yat/classifier/NBC.cc (r1182)
    @@ -152,17 +152,16 @@
         prediction.resize(centroids_.columns(), mlw.columns(), 0);
     
    -    // first calculate -lnP = sum sigma_i + (x_i-m_i)^2/2sigma_i^2
    +    // first calculate -lnP = sum (sigma_i) +
    +    // N sum w_i(x_i-m_i)^2/2sigma_i^2 / sum w_i
         for (size_t label=0; label<centroids_.columns(); ++label) {
           double sum_log_sigma = sum_logsigma(label);
           for (size_t sample=0; sample<prediction.rows(); ++sample) {
    -        prediction(label,sample) = sum_log_sigma;
    +        statistics::AveragerWeighted aw;
             for (size_t i=0; i<mlw.rows(); ++i)
    -          // taking care of NaN and missing training features
    -          if (mlw.weight(i, label) && !std::isnan(sigma2_(i, label))) {
    -            prediction(label, sample) += mlw.weight(i, label)*
    -              std::pow(mlw.data(i, label)-centroids_(i, label),2)/
    -              sigma2_(i, label);
    -          }
    -
    +          // missing training features
    +          if (!std::isnan(sigma2_(i, label)))
    +            aw.add(std::pow(mlw.data(i, label)-centroids_(i, label),2)/
    +                   sigma2_(i, label), mlw.weight(i, label));
    +        prediction(label,sample) = sum_log_sigma + mlw.rows()*aw.mean()/2;
           }
         }
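
The r1182 side of the NBC.cc change replaces the running weighted sum with a weighted mean of the squared, variance-scaled deviations, scaled by the number of features. The following is a minimal standalone sketch of that calculation for one sample and one class, not the yat implementation. `WeightedSample`, `ClassModel`, and `neg_log_likelihood` are hypothetical names, plain `std::vector` stands in for the yat matrix types, and the sketch assumes that `sum_logsigma` sums ln(sigma_i) over the features whose variance was estimated. It indexes the test data by sample, which is an assumption about the intent, and the zero-total-weight fallback is a choice made for the sketch only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one test sample: feature values and weights.
struct WeightedSample {
  std::vector<double> x;  // feature values
  std::vector<double> w;  // feature weights (0 marks a missing value)
};

// Hypothetical stand-in for one trained class. A NaN variance marks a
// feature that could not be estimated during training.
struct ClassModel {
  std::vector<double> mean;
  std::vector<double> var;
};

// Computes sum_i ln(sigma_i) + N/2 * weighted mean of (x_i-m_i)^2/sigma_i^2,
// where N is the total number of features and the weighted mean runs over
// the features that have an estimated variance.
double neg_log_likelihood(const WeightedSample& s, const ClassModel& c)
{
  const std::size_t n = s.x.size();
  double sum_log_sigma = 0.0;
  double sum_w = 0.0;
  double sum_wz = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    if (std::isnan(c.var[i]))
      continue;  // feature missing in training: ignored for this class
    sum_log_sigma += 0.5 * std::log(c.var[i]);  // ln(sigma) = ln(sigma^2)/2
    const double d = s.x[i] - c.mean[i];
    sum_wz += s.w[i] * d * d / c.var[i];
    sum_w += s.w[i];
  }
  if (sum_w == 0.0)
    return sum_log_sigma;  // assumption of this sketch: no usable feature
  return sum_log_sigma + static_cast<double>(n) * (sum_wz / sum_w) / 2.0;
}
```

With unit weights and no missing features, the second term reduces to the unweighted sum of (x_i-m_i)^2/(2 sigma_i^2), so the weighted form agrees with the ordinary Gaussian naive Bayes expression in that limit.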
  • trunk/yat/classifier/NBC.h

    --- trunk/yat/classifier/NBC.h (r1169)
    +++ trunk/yat/classifier/NBC.h (r1182)
    @@ -104,4 +104,14 @@
            row in \a res corresponds to a class. The prediction is the
            estimated probability that sample belong to class \f$ j \f$
    +
    +       \f$ P_j = \frac{1}{Z}\prod_i\({\frac{1}{\sqrt{2\pi\sigma_i^2}}}\)
    +       \exp(\frac{\sum{w_i(x_i-\mu_i)^2}{\sigma_i^2}}{\sum w_i})\f$,
    +       where \f$ \mu_i
    +       \f$ and \f$ \sigma_i^2 \f$ are the estimated mean and variance,
    +       respectively. If a \f$ \sigma_i \f$ could not be estimated
    +       during training, corresponding factor is set to unity, in other
    +       words, that feature is ignored for the prediction of that
    +       particular class. Z is chosen such that total probability, \f$
    +       \sum P_j \f$, equals unity.
          */
         void predict(const MatrixLookupWeighted& data, utility::Matrix& res) const;
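
The formula in the added NBC.h comment has unbalanced braces and does not carry the minus sign or the N/2 factor that appear in the NBC.cc comment. One consistent reading, written out as a sketch, is given here. It assumes that N is the number of features (`mlw.rows()` in NBC.cc), that sums and products run over the features whose variance was estimated, and that the constant factor of 2π is absorbed into Z. This is an interpretation of the two comments, not a statement of what the library documents.

```latex
% Negative log-likelihood for class j, following the NBC.cc comment
-\ln P_j = \sum_i \ln \sigma_i
  + \frac{N}{2}\,
    \frac{\sum_i w_i \,(x_i-\mu_i)^2 / \sigma_i^2}{\sum_i w_i}
  + \ln Z'

% Equivalent probability form, with Z' chosen so that \sum_j P_j = 1
P_j = \frac{1}{Z'} \prod_i \frac{1}{\sigma_i}\,
  \exp\!\left( -\frac{N}{2}\,
    \frac{\sum_i w_i \,(x_i-\mu_i)^2 / \sigma_i^2}{\sum_i w_i} \right)
```

For unit weights, the exponent becomes the familiar sum of (x_i-μ_i)^2/(2σ_i^2) with a minus sign, the unweighted Gaussian case.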