Opened 11 years ago

Closed 11 years ago

#215 closed task (fixed)

BGX reporter importer hangs if the [Controls] section has less than 6 columns

Reported by: Jari Häkkinen Owned by: Nicklas Nordborg
Priority: major Milestone: Illumina package v1.4
Component: net.sf.basedb.illumina Keywords:
Cc:

Description

The BGX file for microRNA arrays is not exactly the same as for the expression (direct hyb) arrays we have imported previously. The BGX importer recognizes the file but fails to parse it during plug-in execution. The microRNA BGX has 5 sections, whereas the docs describe BGX files with 3 sections.

I have attached the original BGX file with 5 [sections], and also my truncated BGX file with only 3 [sections].

I have made several tests, but all plug-in jobs hang and cannot be aborted, in the sense that the 'Aborting' status never changes to an error as it should when the abort is finalized. Also, the tests seem to depend on each other, since the state in which a plug-in hangs depends on when the server was last restarted.

Import 1

Name  	Run plugin: Illumina BGX reporter importer
Description 	
Priority 	4 (1 = highest, 10 = lowest)
Status 	Aborting: Parsing data in file 'humanMI_V2_R0_XS0000124-MAP.bgx'... (1114 lines so far)
Percent complete 	 97%
Created 	2009-05-18 11:27:17
Started 	2009-05-18 11:27:47
Ended 	
Running time 	2 hours 55 minutes 2 seconds
Server 	base2.thep.lu.se
Job agent 	- none -
User 	Jari Häkkinen (admin)
Experiment 	- none -
Plugin 	Illumina BGX reporter importer (version 1.2)
Configuration 	Illumina MicroRNA default reporter import

with parameters (not showing unset parameters)

Character set 	ISO-8859-1
Decimal separator 	dot
Default error handling 	fail
File 	/home/jari/Illumina BGX/humanMI_V2_R0_XS0000124-MAP.bgx
Mode 	create
Reporter is used 	skip
Plugin configuration parameters
Parameter version 	2 (5)
Character set 	ISO-8859-1
Complex column mappings 	disallow
Data header 	SYMBOL.*Probe_Id.+Array_Address_Id.*
Data splitter 	\t
Decimal separator 	dot
Name 	\Probe_Id\
External ID 	\Probe_Id\
Remove quotes 	true

Import 2

The same result as above with an additional parameter set

Data footer  	 \[ControlGraphs\]

Import 3 with my truncated file. NOTE: I cannot get past the parsing of [Heading] ... 3%

View job -- Run plugin: Illumina BGX reporter importer
Name 	Run plugin: Illumina BGX reporter importer
Description 	
Priority 	4 (1 = highest, 10 = lowest)
Status 	Aborting: Parsing section 'Heading' in file 'humanMI_V2_R0_XS0000124-MAP_JHspecial.bgx'...
Percent complete 	 3%
Created 	2009-05-18 13:33:06
Started 	2009-05-18 13:33:26
Ended 	
Running time 	57 minutes 54 seconds
Server 	base2.thep.lu.se
Job agent 	- none -
User 	Jari Häkkinen (admin)
Experiment 	- none -
Plugin 	Illumina BGX reporter importer (version 1.2)
Configuration 	Illumina MicroRNA default reporter import

Import 4: trying to get back to the original 97% progress state, but I cannot reach it any longer until I restart the server. After a restart, the stop at 1114 parsed lines reappears.

View job -- Run plugin: Illumina BGX reporter importer
Name 	Run plugin: Illumina BGX reporter importer
Description 	
Priority 	4 (1 = highest, 10 = lowest)
Status 	Aborting: Parsing section 'Heading' in file 'humanMI_V2_R0_XS0000124-MAP.bgx'...
Percent complete 	 3%
Created 	2009-05-18 14:15:42
Started 	2009-05-18 14:15:57
Ended 	
Running time 	18 minutes 2 seconds
Server 	base2.thep.lu.se
Job agent 	- none -
User 	Jari Häkkinen (admin)
Experiment 	- none -
Plugin 	Illumina BGX reporter importer (version 1.2)
Configuration 	Illumina MicroRNA default reporter import
Job parameters
Character set 	ISO-8859-1
Decimal separator 	dot
Default error handling 	fail
File 	/home/jari/Illumina BGX/humanMI_V2_R0_XS0000124-MAP.bgx
Mode 	create
Reporter is used 	skip
Plugin configuration parameters
Parameter version 	5 (5)
Character set 	ISO-8859-1
Complex column mappings 	disallow
Data header 	SYMBOL.*Probe_Id.+Array_Address_Id.*
Data splitter 	\t
Decimal separator 	dot
Name 	\Probe_Id\
External ID 	\Probe_Id\
Remove quotes 	true

Attachments (2)

humanMI_V2_R0_XS0000124-MAP.bgx (63.7 KB) - added by Jari Häkkinen 11 years ago.
humanMI_V2_R0_XS0000124-MAP_JHspecial.bgx (63.3 KB) - added by Jari Häkkinen 11 years ago.


Change History (6)

Changed 11 years ago by Jari Häkkinen

Changed 11 years ago by Jari Häkkinen

comment:1 Changed 11 years ago by Jari Häkkinen

On a freshly rebooted server I reach another status message with the truncated file:

Executing: Parsing section 'Controls' in file 'humanMI_V2_R0_XS0000124-MAP_JHspecial.bgx'..

Percent completed 100%

but the plug-in does not finish.

comment:2 Changed 11 years ago by Nicklas Nordborg

Status: new → assigned

I know what is causing this. The problem is the [Controls] section, which has only 5 columns, combined with some bad code in BgxMergeControlsInputStream. Lines 224-245 contain a loop that reads data from the file:

String line = in.readLine();
while (line != null && !line.startsWith("["))
{
   String[] cols = line.split("\\t");
   if (cols.length < 6) continue; // Ignore lines that have too few column
   ...
   line = in.readLine();
}

The problem is that if a row contains fewer than six columns, the 'continue' restarts the loop without reading a new line, so the same line is tested forever. The plug-in cannot be aborted because the loop never checks for an abort request.

The fix should be rather simple: read a new line even if the current one has fewer than 6 columns. It might be a good idea to include a check for interruption in the loop as well.
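
A minimal sketch of what such a fix could look like, not the code that was actually committed. The class and method names (ControlsSectionSketch, readControls) and the MIN_COLUMNS constant are hypothetical, and it assumes that an abort reaches the plug-in as a thread interruption, which may not be how the abort is really signalled. Moving the read into the loop header means 'continue' can no longer skip it:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not the actual BgxMergeControlsInputStream code.
final class ControlsSectionSketch
{
   // Rows with fewer columns than this are ignored (value taken from the ticket).
   static final int MIN_COLUMNS = 6;

   // Reads rows until end of input or the next "[Section]" header line.
   static List<String[]> readControls(BufferedReader in) throws IOException
   {
      List<String[]> rows = new ArrayList<>();
      // The next line is read in the for-loop update clause, which also runs after 'continue'.
      for (String line = in.readLine(); line != null && !line.startsWith("["); line = in.readLine())
      {
         // Assumption: an abort request is delivered as a thread interruption.
         if (Thread.currentThread().isInterrupted())
         {
            throw new InterruptedIOException("Aborted while parsing [Controls]");
         }
         String[] cols = line.split("\\t");
         if (cols.length < MIN_COLUMNS) continue; // Skip short rows without stalling
         rows.add(cols);
      }
      return rows;
   }
}
```

With this shape, a five-column [Controls] section simply yields no rows instead of hanging.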

Also note that the current BGX plug-in can't read the data in the [Controls] section, since it is hardcoded to expect six specific columns in a fixed order (Probe_ID, Array_Address_Id, Reporter_Group_Name, Reporter_Group_Id, Reporter_Composite_Map, Probe_Sequence), which doesn't match the columns present in the microRNA file.

As a workaround the file can be truncated to not include the [Controls] section.

Also, a comment about the various % values that were reached in the different executions. The progress in the database is not updated every time the plug-in reports an update. To avoid overloading the database with progress updates, there must be a certain time interval between two updates. So what is seen in the web interface may differ between executions of a plug-in.
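
The throttling idea can be illustrated with a small sketch. This is not the server's actual progress reporter: the class name ThrottledProgressSketch, the injected clock, and the interval value are all assumptions made for the example. It shows why the stored percentage can lag behind what the plug-in last reported:

```java
import java.util.function.LongSupplier;

// Hypothetical illustration of time-based throttling; not the real progress reporter.
final class ThrottledProgressSketch
{
   private final long minIntervalMillis;
   private final LongSupplier clock; // Injected so the behaviour can be tested without sleeping
   private long lastWrite;
   private boolean written = false;
   private int storedPercent = -1; // Stands in for the value kept in the database

   ThrottledProgressSketch(long minIntervalMillis, LongSupplier clock)
   {
      this.minIntervalMillis = minIntervalMillis;
      this.clock = clock;
   }

   // Called on every progress report; only some calls reach the "database".
   void report(int percent)
   {
      long now = clock.getAsLong();
      if (!written || now - lastWrite >= minIntervalMillis)
      {
         storedPercent = percent;
         lastWrite = now;
         written = true;
      }
      // Otherwise the report is dropped, so the stored value may be stale.
   }

   int storedPercent()
   {
      return storedPercent;
   }
}
```

If the plug-in hangs shortly after a dropped report, the web interface keeps showing the older stored value, which would explain seeing 3% in one run and 97% in another.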

comment:3 Changed 11 years ago by Nicklas Nordborg

Summary: Cannot import reporters from a Illumina microRNA BGX file → BGX reporter importer hangs if the [Controls] section has less than 6 columns

comment:4 Changed 11 years ago by Nicklas Nordborg

Resolution: fixed
Status: assigned → closed

(In [1085]) Fixes #215: BGX reporter importer hangs if the [Controls] section has less than 6 columns
