Opened 3 years ago

Closed 3 years ago

#881 closed task (fixed)

Implement INCA XML to CSV converter

Reported by: Nicklas Nordborg Owned by:
Priority: blocker Milestone: INCA XML to CSV converter 1.0
Component: net.sf.basedb.inca Keywords:
Cc:

Description

We should implement a simple standalone Java program that converts an XML file exported from INCA to a tab-separated CSV file.

The XML file is expected to include data that we are not allowed to access, so it must be filtered.

The filter is defined by the INCA export file from Reggie. This file includes the personal numbers that we are allowed to access.

Entries in the XML file that have a matching entry in the SCANB file should be written in full to the CSV file. XML entries that don't match should still be written to the CSV file, except that certain fields should be "blanked" out. The fields to blank are defined by a "blacklist" file.
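A minimal sketch of the filtering rule, assuming each row is held as an ordered map from column name to value, the allowed personal numbers from the SCANB file are loaded into a set, and the blacklist is a set of column names. The class and method names (`RowFilter`, `filterRow`) are hypothetical and not taken from the actual converter.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper illustrating the accept/mask rule; not the converter's real code.
public class RowFilter {
    /**
     * Returns the row unchanged if its personal number is in the allowed set.
     * Otherwise returns a copy in which every blacklisted column is blanked.
     * LinkedHashMap keeps the original column order.
     */
    public static Map<String, String> filterRow(Map<String, String> row, String personalNo,
            Set<String> allowed, Set<String> blacklist) {
        if (allowed.contains(personalNo)) {
            return row; // matching SCANB entry: write all fields
        }
        Map<String, String> masked = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            // Keep every column, but blank the value if the column is blacklisted
            masked.put(e.getKey(), blacklist.contains(e.getKey()) ? "" : e.getValue());
        }
        return masked;
    }
}
```

Keeping blanked columns (instead of dropping them) means every row in the CSV file has the same set of columns, which keeps the output easy to parse.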

Change History (15)

comment:1 Changed 3 years ago by Nicklas Nordborg

Milestone: INCA XML to CSV converter 1.0

comment:2 Changed 3 years ago by Nicklas Nordborg

(In [3881]) References #881: Implement INCA XML to CSV converter

Added

  • Folder structure for source code
  • JAR file manifest
  • build.xml
  • License and readme files
  • Eclipse project files

comment:3 Changed 3 years ago by Nicklas Nordborg

(In [3883]) References #881: Implement INCA XML to CSV converter

Added parser for reading the file exported from SCANB.
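The layout of the SCANB export is not described in this ticket. The sketch below assumes a tab-separated file with a header line and a column holding the personal numbers, whose name is passed in as a parameter; `ScanBReader` and `readPersonalNumbers` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical reader for a tab-separated SCANB export with a header line.
public class ScanBReader {
    /** Collects all non-empty values from the named column into a set. */
    public static Set<String> readPersonalNumbers(Reader in, String columnName) throws IOException {
        Set<String> result = new HashSet<>();
        BufferedReader reader = new BufferedReader(in);
        String header = reader.readLine();
        if (header == null) return result; // empty file: nothing is allowed
        // Locate the personal number column in the header
        String[] columns = header.split("\t", -1);
        int index = -1;
        for (int i = 0; i < columns.length; i++) {
            if (columns[i].trim().equals(columnName)) {
                index = i;
                break;
            }
        }
        if (index < 0) throw new IOException("Column not found: " + columnName);
        String line;
        while ((line = reader.readLine()) != null) {
            String[] values = line.split("\t", -1); // -1 keeps trailing empty cells
            if (index < values.length && !values[index].trim().isEmpty()) {
                result.add(values[index].trim());
            }
        }
        return result;
    }
}
```

A `HashSet` gives constant-time lookups when the writer later checks each INCA row against the allowed personal numbers.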

comment:4 Changed 3 years ago by Nicklas Nordborg

(In [3884]) References #881: Implement INCA XML to CSV converter

Open a file selection dialog for selecting the SCANB file unless the path to it has been given on the command line.
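A minimal sketch of this behaviour using Swing's `JFileChooser`, assuming the SCANB path is the first command-line argument. The class and method names are hypothetical, and the dialog is only created when no argument is given.

```java
import java.io.File;
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;

// Hypothetical helper: command-line argument first, file dialog as fallback.
public class ScanBFileSelector {
    /**
     * Returns the SCANB file given as the first command-line argument,
     * or asks the user with an "Open" dialog. Returns null if the dialog is cancelled.
     */
    public static File selectScanBFile(String[] args) {
        if (args.length > 0) {
            return new File(args[0]); // path given on the command line: no dialog
        }
        JFileChooser chooser = new JFileChooser();
        chooser.setDialogTitle("Select SCANB file");
        chooser.setFileFilter(new FileNameExtensionFilter("CSV files", "csv"));
        int choice = chooser.showOpenDialog(null);
        return choice == JFileChooser.APPROVE_OPTION ? chooser.getSelectedFile() : null;
    }
}
```

Checking the arguments first lets the program run unattended (for example from a script) while still being usable by double-clicking the JAR file.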

comment:5 Changed 3 years ago by Nicklas Nordborg

(In [3885]) References #881: Implement INCA XML to CSV converter

Added a SAX parser implementation for parsing the INCA XML file. It currently only generates some debug output.
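The INCA XML element names are not given in this ticket. The sketch below assumes that each record is one element (its name is passed in, e.g. `row` as a placeholder) whose child elements hold the column values as text. It uses the standard `DefaultHandler` callbacks and hands each completed row to a callback; `IncaRowHandler` is a hypothetical name.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler: one record element, child elements are columns.
public class IncaRowHandler extends DefaultHandler {
    private final String rowElement;
    private final Consumer<Map<String, String>> onRow;
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> currentRow;

    public IncaRowHandler(String rowElement, Consumer<Map<String, String>> onRow) {
        this.rowElement = rowElement;
        this.onRow = onRow;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(rowElement)) currentRow = new LinkedHashMap<>();
        text.setLength(0); // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(rowElement)) {
            onRow.accept(currentRow); // a full row is complete
            currentRow = null;
        } else if (currentRow != null) {
            currentRow.put(qName, text.toString().trim());
        }
        text.setLength(0);
    }

    /** Parses an XML string and calls onRow once per record element. */
    public static void parse(String xml, String rowElement, Consumer<Map<String, String>> onRow)
            throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new IncaRowHandler(rowElement, onRow));
    }
}
```

SAX streams the file instead of building a DOM tree, so memory use stays low even for large INCA exports.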

comment:6 Changed 3 years ago by Nicklas Nordborg

(In [3887]) References #881: Implement INCA XML to CSV converter

Added a writer for the CSV file. The IncaXmlParser will simply call the writer every time a full row has been completed. The writer checks with the ScanBParser whether the row should be fully accepted or whether the blacklisted columns should be masked.

The writer will also encode newline, tab and backslash characters as \n, \t and \\. This is compatible with the TabCrLfEncoderDecoder implementation in BASE and should make it easy to parse the CSV file in Reggie.
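A minimal sketch of the escaping described above. It is not the BASE `TabCrLfEncoderDecoder` class and only covers the three characters named in this comment; `CsvValueEncoder` is a hypothetical name, and turning null into an empty cell is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical encoder: escapes backslash, tab and newline in one pass.
public class CsvValueEncoder {
    /** Encodes backslash, tab and newline so a value always fits in one tab-separated cell. */
    public static String encode(String value) {
        if (value == null) return ""; // assumption: missing values become empty cells
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\t': sb.append("\\t"); break;
                case '\n': sb.append("\\n"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Reverses encode(); an unknown escape sequence is kept unchanged. */
    public static String decode(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length()) {
                char next = value.charAt(++i);
                switch (next) {
                    case 't': sb.append('\t'); break;
                    case 'n': sb.append('\n'); break;
                    case '\\': sb.append('\\'); break;
                    default: sb.append(c).append(next);
                }
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Encodes each value and joins them with tabs into one line (without line terminator). */
    public static String encodeRow(List<String> values) {
        return values.stream().map(CsvValueEncoder::encode).collect(Collectors.joining("\t"));
    }
}
```

Escaping character by character avoids the ordering pitfall of chained `String.replace` calls, where escaping the backslash after the other characters would double-escape the backslashes that were just inserted.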

comment:7 Changed 3 years ago by Nicklas Nordborg

(In [3890]) References #881: Implement INCA XML to CSV converter

Added "Save as" dialog for setting the output CSV file. A default filename is generated by replacing the .xml extension of the INCA XML file name with .csv and placing the file in the same folder as the SCANB CSV file.
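A minimal sketch of the default filename rule, assuming both input files are available as `java.io.File` objects. `DefaultOutputName` and `defaultCsvFile` are hypothetical names, and the case-insensitive match on ".xml" is an assumption.

```java
import java.io.File;
import java.util.Locale;

// Hypothetical helper for the suggested output file.
public class DefaultOutputName {
    /**
     * Builds the default output file: the INCA XML file name with ".xml"
     * replaced by ".csv", placed in the folder of the SCANB CSV file.
     */
    public static File defaultCsvFile(File incaXml, File scanbCsv) {
        String name = incaXml.getName();
        String base = name.toLowerCase(Locale.ROOT).endsWith(".xml")
                ? name.substring(0, name.length() - 4) // strip the 4-character ".xml" suffix
                : name; // no .xml suffix: keep the full name and append .csv
        File folder = scanbCsv.getAbsoluteFile().getParentFile();
        return new File(folder, base + ".csv");
    }
}
```

In a GUI, the result could be passed to `JFileChooser.setSelectedFile(...)` before calling `showSaveDialog(...)`, so that the "Save as" dialog opens with the suggested name already filled in.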

Added some counters to the parsers and the writer, and display information about what happened on stdout and in a popup information dialog. The dialog is only shown if the user also selected the files using the GUI.

comment:8 Changed 3 years ago by Nicklas Nordborg

(In [3891]) References #881: Implement INCA XML to CSV converter

Personal numbers in INCA are stored with a '-' separator, but in SCANB they are not.

We remove the '-' and store the result as "PersonalNo". Both columns are included in the CSV file, and "PersonalNo" is always the first column.
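A minimal sketch of this normalization, assuming the row is an ordered map and that the name of the original INCA column (not given in this ticket) is passed in. `PersonalNoColumn` and `withPersonalNo` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: adds a "PersonalNo" column without '-' as the first column.
public class PersonalNoColumn {
    public static Map<String, String> withPersonalNo(Map<String, String> row, String incaColumn) {
        String original = row.get(incaColumn);
        String normalized = original == null ? "" : original.replace("-", "");
        Map<String, String> result = new LinkedHashMap<>();
        result.put("PersonalNo", normalized); // inserted first, so it is the first column
        for (Map.Entry<String, String> e : row.entrySet()) {
            // Copy all other columns, including the original INCA column with its '-'
            if (!e.getKey().equals("PersonalNo")) result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

The normalized value is what would be looked up among the SCANB personal numbers, while the original column is kept unchanged in the output.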

comment:9 Changed 3 years ago by Nicklas Nordborg

(In [3892]) References #881: Implement INCA XML to CSV converter

Renamed JAR/TAR file to start with IncaXml2Csv instead of inca-xml2csv.

comment:10 Changed 3 years ago by Nicklas Nordborg

(In [3893]) References #881: Implement INCA XML to CSV converter

Ignore the new file names.

comment:11 Changed 3 years ago by Nicklas Nordborg

(In [3894]) References #881: Implement INCA XML to CSV converter

Arrgggh... difficult to get the file name correct.

comment:12 Changed 3 years ago by Nicklas Nordborg

(In [3896]) References #881: Implement INCA XML to CSV converter

Updated README and once again changed the filename of the JAR file.

comment:13 Changed 3 years ago by Nicklas Nordborg

(In [3897]) References #881: Implement INCA XML to CSV converter

Changes to make the README more "tracified".

comment:14 Changed 3 years ago by Nicklas Nordborg

(In [3898]) References #881: Implement INCA XML to CSV converter

Changes to make the README more "tracified".

comment:15 Changed 3 years ago by Nicklas Nordborg

Resolution: fixed
Status: new → closed