source: trunk/src/NNI.h @ 228

Last change on this file since 228 was 228, checked in by Peter, 18 years ago

Moved estimation out of the constructor; added a function telling which rows were not imputed (due to too many missing values).

// $Id: NNI.h 228 2005-02-01 14:06:51Z peter $

#ifndef _theplu_cpptools_nni_
#define _theplu_cpptools_nni_

#include <iostream>
#include <utility>
#include <vector>

#include "matrix.h"

namespace theplu {
namespace cpptools {

  using namespace std;

  ///
  /// NNI is an abstract base class defining the interface for nearest
  /// neighbour imputation (NNI) algorithms.
  ///
  /// The NNI algorithms implemented here are discussed in a document
  /// created in the WeNNI project. This document will be released for
  /// public access, and the information needed to retrieve it will be
  /// provided here.
  ///
  /// In short, NNI is used to improve (correct) uncertain data. Here,
  /// the data to be imputed is stored in a matrix where rows similar
  /// to each other are used to adjust uncertain data. The data matrix
  /// is accompanied by a weight (uncertainty) matrix defining which
  /// data is to be considered 'certain' and which is uncertain. The
  /// weight matrix can be binary, with a 1 indicating that the data
  /// element needs no correction and a 0 indicating that it should be
  /// replaced by an imputed value. The weight matrix can also be
  /// continuous, where values between 0 and 1 define how certain a
  /// data element is.
  ///
  /// The imputation depends on how similarity between rows of data is
  /// defined and on the number of closest neighbours (here, rows)
  /// used in the imputation.
  ///
  /// Implementation issues
  ///
  /// The current implementation handles rows where all data are
  /// tagged as completely uncertain, i.e. all weights are zero, by
  /// ignoring these rows in nearest neighbourhood calculations.
  /// Importantly, such rows are not changed (imputed) either, since
  /// no close neighbourhood is defined for them.
  ///
  /// Rows that are identical in an imputation algorithm sense cause
  /// problems, since the distance between them will usually become
  /// zero. This is solved by setting a zero distance to a small
  /// number. Rows are identical in this context when all elements
  /// with non-zero uncertainty weights are equal. Zero weight
  /// elements are not used in the comparison since they are
  /// considered nonsense values.
  ///
  class NNI
  {
  public:

    ///
    /// Base constructor for the nearest neighbour imputation
    /// algorithms.
    ///
    NNI(const gslapi::matrix& matrix,const gslapi::matrix& weight,
        const u_int neighbours);

    virtual ~NNI(void) {}

    ///
    /// Function doing the imputation. @return number of rows not imputed
    ///
    virtual size_t estimate(void)=0;

    ///
    /// @return A const reference to the modified data.
    ///
    const gslapi::matrix& imputed_data(void) const { return imputed_data_; }

    ///
    /// @return indices of rows in data matrix not imputed
    ///
    inline vector<size_t> not_imputed(void) const { return not_imputed_; }

  protected:
    vector<pair<u_int,double> > calculate_distances(const u_int) const;
    vector<u_int> nearest_neighbours(const u_int,
                                     const vector<pair<u_int,double> >&) const;

    const gslapi::matrix& data_;
    gslapi::matrix imputed_data_;
    u_int neighbours_;
    vector<size_t> not_imputed_;
    const gslapi::matrix& weight_;
  };

}} // of namespace cpptools and namespace theplu

#endif
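
The class comment in NNI.h describes the idea only in words, so here is a minimal, self-contained sketch of one way a binary-weight nearest neighbour imputation could work. It is not the WeNNI implementation and does not use gslapi::matrix. The names Table, row_distance and impute are hypothetical, and the weighted root-mean-square distance and the inverse-distance averaging are assumptions made for illustration. Only three behaviours are taken from the header comments: all-zero-weight rows are skipped, zero-weight elements are left out of comparisons, and a zero distance is replaced by a small number.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for gslapi::matrix: row-major nested vectors.
typedef std::vector<std::vector<double> > Table;

// Assumed distance: weighted root-mean-square difference over the columns
// where both rows have non-zero weight. Returns -1 when the two rows share
// no usable column (for example when one row has only zero weights).
double row_distance(const Table& data, const Table& weight,
                    std::size_t i, std::size_t j)
{
  double sum = 0.0, norm = 0.0;
  for (std::size_t c = 0; c < data[i].size(); ++c) {
    const double w = weight[i][c] * weight[j][c];
    const double d = data[i][c] - data[j][c];
    sum += w * d * d;
    norm += w;
  }
  if (norm == 0.0)
    return -1.0;
  const double dist = std::sqrt(sum / norm);
  // Identical rows give zero distance; use a small number instead so that
  // 1/distance stays finite.
  return dist > 0.0 ? dist : 1e-10;
}

// Replaces every zero-weight element by the inverse-distance weighted mean
// of that column over the k nearest rows that have a non-zero weight there.
// An element stays unchanged if no such neighbour exists.
// Returns the indices of rows skipped because all their weights are zero.
std::vector<std::size_t> impute(const Table& data, const Table& weight,
                                std::size_t k, Table& imputed)
{
  assert(data.size() == weight.size());
  imputed = data;
  std::vector<std::size_t> not_imputed;
  for (std::size_t i = 0; i < data.size(); ++i) {
    // Rows without any certain element have no defined neighbourhood.
    bool any_weight = false;
    for (std::size_t c = 0; c < weight[i].size(); ++c)
      if (weight[i][c] != 0.0)
        any_weight = true;
    if (!any_weight) {
      not_imputed.push_back(i);
      continue;
    }
    // Collect (distance, row) pairs and sort by increasing distance.
    std::vector<std::pair<double, std::size_t> > dist;
    for (std::size_t j = 0; j < data.size(); ++j) {
      if (j == i)
        continue;
      const double d = row_distance(data, weight, i, j);
      if (d >= 0.0)
        dist.push_back(std::make_pair(d, j));
    }
    std::sort(dist.begin(), dist.end());
    for (std::size_t c = 0; c < data[i].size(); ++c) {
      if (weight[i][c] != 0.0)
        continue;  // certain value: keep as is
      double sum = 0.0, norm = 0.0;
      std::size_t used = 0;
      for (std::size_t n = 0; n < dist.size() && used < k; ++n) {
        const std::size_t j = dist[n].second;
        if (weight[j][c] == 0.0)
          continue;  // neighbour is uncertain in this column
        sum += weight[j][c] * data[j][c] / dist[n].first;
        norm += weight[j][c] / dist[n].first;
        ++used;
      }
      if (norm > 0.0)
        imputed[i][c] = sum / norm;
    }
  }
  return not_imputed;
}
```

A caller would pass a data table and a weight table of the same shape and receive the imputed copy through the last argument. The returned index list plays the role that not_imputed() plays in the class above. Returning the skipped rows rather than failing mirrors the header's statement that rows with only zero weights are neither used as neighbours nor changed.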