source: trunk/src/NNI.h @ 241

Last change on this file since 241 was 241, checked in by Jari Häkkinen, 18 years ago

Cleaned up size_t/u_int confusion.

// $Id: NNI.h 241 2005-02-22 11:30:25Z jari $

#ifndef _theplu_cpptools_nni_
#define _theplu_cpptools_nni_

#include <iostream>
#include <utility>
#include <vector>

#include "matrix.h"

namespace theplu {
namespace cpptools {

  using namespace std;

  ///
  /// NNI is an abstract base class defining the interface for nearest
  /// neighbour imputation (NNI) algorithms.
  ///
  /// The NNI algorithms implemented here are discussed in a document
  /// created in the WeNNI project. This document will be released for
  /// public access, and the information needed to retrieve it will be
  /// provided here.
  ///
  /// In short, NNI is used to improve (correct) uncertain data. The
  /// data to be imputed is stored in a matrix, and rows similar to
  /// each other are used to adjust uncertain data. The data matrix is
  /// accompanied by a weight (uncertainty) matrix defining which data
  /// are considered 'certain' and which are uncertain. The weight
  /// matrix can be binary, where a 1 indicates that the data element
  /// needs no correction and a 0 means that it should be replaced by
  /// an imputed value. The weight matrix can also be continuous, where
  /// values between 0 and 1 define how certain a data element is.
  ///
  /// The imputation depends on how similarity between rows of data is
  /// defined and on the number of closest neighbours (here, rows)
  /// used in the imputation.
  ///
  /// Implementation issues
  ///
  /// The current implementation treats rows where all data are tagged
  /// as completely uncertain, i.e. all weights are zero, by ignoring
  /// these rows in nearest neighbour calculations. Importantly, such
  /// rows are not changed (imputed) either, since no close
  /// neighbourhood is defined for them.
  ///
  /// Rows that are identical in the imputation algorithm sense cause
  /// problems since the distance between them will usually become
  /// zero. This is solved by setting a zero distance to a small
  /// number. Rows are identical in this context when all elements
  /// with non-zero uncertainty weights are equal. Zero weight elements
  /// are not used in the comparison since they are considered
  /// nonsense values.
  ///
  class NNI
  {
  public:

    ///
    /// Base constructor for the nearest neighbour imputation
    /// algorithms.
    ///
    NNI(const gslapi::matrix& matrix,const gslapi::matrix& weight,
        const u_int neighbours);

    virtual ~NNI(void) {};

    ///
    /// Function doing the imputation.
    ///
    /// @return number of rows not imputed
    ///
    virtual u_int estimate(void)=0;

    ///
    /// @return A const reference to the modified data.
    ///
    const gslapi::matrix& imputed_data(void) const { return imputed_data_; }

    ///
    /// @return indices of rows in data matrix not imputed
    ///
    inline vector<size_t> not_imputed(void) const { return not_imputed_; }

  protected:
    vector<pair<u_int,double> > calculate_distances(const u_int) const;
    vector<u_int> nearest_neighbours(const u_int,
                                     const vector<pair<u_int,double> >&) const;

    const gslapi::matrix& data_;
    gslapi::matrix imputed_data_;
    u_int neighbours_;
    vector<size_t> not_imputed_;
    const gslapi::matrix& weight_;
  };

}} // of namespace cpptools and namespace theplu

#endif
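
The behaviour described in the class comment can be illustrated with a small, self-contained sketch. It is not the implementation behind NNI.h, which only declares the interface. The names `Matrix`, `row_distance` and `impute` are hypothetical, the matrix is a plain vector of rows rather than gslapi::matrix, and both the distance measure (a weighted root mean square difference) and the averaging rule (neighbour values weighted by weight over distance) are assumptions chosen for illustration. Only three points are taken from the comment: rows with all-zero weights are ignored and left unchanged, a zero distance is replaced by a small number, and the rows that could not be imputed are reported.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for gslapi::matrix: a vector of rows.
typedef std::vector<std::vector<double> > Matrix;
typedef std::pair<std::size_t, double> IndexDistance;

// Assumed similarity measure: weighted root mean square difference over the
// columns where both rows have non-zero weight. Returns a negative value
// when the two rows share no usable column.
double row_distance(const Matrix& data, const Matrix& weight,
                    std::size_t a, std::size_t b)
{
  double sum = 0.0, norm = 0.0;
  for (std::size_t j = 0; j < data[a].size(); ++j) {
    const double w = weight[a][j] * weight[b][j];
    const double d = data[a][j] - data[b][j];
    sum += w * d * d;
    norm += w;
  }
  if (norm == 0.0)
    return -1.0;
  const double dist = std::sqrt(sum / norm);
  // Identical rows give zero distance; replace it with a small number
  // (the value 1e-10 is an assumption).
  return dist > 0.0 ? dist : 1e-10;
}

// Replaces every zero-weight element by a weighted average taken from at
// most k nearest rows. Returns the indices of rows with at least one
// element that could not be imputed.
std::vector<std::size_t> impute(const Matrix& data, const Matrix& weight,
                                std::size_t k, Matrix& imputed)
{
  assert(data.size() == weight.size());
  imputed = data;
  std::vector<std::size_t> not_imputed;
  for (std::size_t i = 0; i < data.size(); ++i) {
    // Collect (row index, distance) pairs for all comparable rows.
    std::vector<IndexDistance> dist;
    for (std::size_t r = 0; r < data.size(); ++r) {
      if (r == i)
        continue;
      const double d = row_distance(data, weight, i, r);
      if (d >= 0.0)
        dist.push_back(IndexDistance(r, d));
    }
    std::sort(dist.begin(), dist.end(),
              [](const IndexDistance& x, const IndexDistance& y) {
                return x.second < y.second;
              });
    bool failed = false;
    for (std::size_t j = 0; j < data[i].size(); ++j) {
      if (weight[i][j] != 0.0)
        continue; // element is trusted, keep it
      double num = 0.0, den = 0.0;
      std::size_t used = 0;
      for (std::size_t n = 0; n < dist.size() && used < k; ++n) {
        const double w = weight[dist[n].first][j];
        if (w == 0.0)
          continue; // neighbour has no usable value in this column
        num += w * data[dist[n].first][j] / dist[n].second;
        den += w / dist[n].second;
        ++used;
      }
      if (den > 0.0)
        imputed[i][j] = num / den;
      else
        failed = true;
    }
    if (failed)
      not_imputed.push_back(i);
  }
  return not_imputed;
}
```

A caller would pass a data matrix and a weight matrix of the same shape together with the number of neighbours, then read the result from `imputed`. A row whose weights are all zero has no comparable neighbours in this sketch, so it comes back unchanged and its index appears in the returned vector.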