Changeset 5770
- Timestamp: Sep 29, 2011, 1:27:33 PM
- Location: trunk
- Files: 7 edited
trunk/data/plugin_configfile.xml
--- r5764
+++ r5770
@@ -934,5 +934,5 @@
 <configuration pluginClassName="net.sf.basedb.plugins.gtf.GtfReporterImporter">
 <configname>gene_id (no prefix)</configname>
-<description>A configuration that uses the gene_id (no prefix)instead of the transcript_id as reporter id.</description>
+<description>A configuration that uses the gene_id instead of the transcript_id as reporter id.</description>
 <parameter>
 <name>trimQuotes</name>
@@ -1001,6 +1001,13 @@
 </configuration>
 <configuration pluginClassName="net.sf.basedb.plugins.gtf.GtfReporterImporter">
-<configname>transcript_id (no prefix)</configname>
-<description>A configuration that uses the transcript_id (no prefix) as reporter id.</description>
+<configname>transcript_id@chr (no prefix)</configname>
+<description>A configuration that uses the <transcript_id>@<seqname> as reporter id. <seqname> is usually the chromosome ID (eg. chr1).</description>
+<parameter>
+<name>dataHeaderRegexp</name>
+<label>Data header</label>
+<description>A regular expression that matches the header line just before the data begins. For example: Block\tRow\tColumn.*</description>
+<class>java.lang.String</class>
+<value><seqname>\t.*<transcript_id>.*</value>
+</parameter>
 <parameter>
 <name>trimQuotes</name>
@@ -1011,16 +1018,9 @@
 </parameter>
 <parameter>
-<name>dataHeaderRegexp</name>
-<label>Data header</label>
-<description>A regular expression that matches the header line just before the data begins. For example: Block\tRow\tColumn.*</description>
-<class>java.lang.String</class>
-<value><seqname>\t.*<transcript_id>.*</value>
-</parameter>
-<parameter>
 <name>reporterIdColumnMapping</name>
 <label>External ID</label>
 <description>Mapping that picks the reporter's external ID from the data columns. For example: \ID\</description>
 <class>java.lang.String</class>
-<value>\<transcript_id>\</value>
+<value>\<transcript_id>\@\<seqname>\</value>
 </parameter>
 <parameter>
@@ -1037,5 +1037,5 @@
 allow = Allow expression and complex mappings, for example, '\Row\, \Column\' or '=2*col('radius')'</description>
 <class>java.lang.String</class>
-<value>disallow</value>
+<value>allow</value>
 </parameter>
 <parameter>
@@ -1051,5 +1051,5 @@
 <description>Mapping that picks the reporter's name from the data columns. For example: \Name\</description>
 <class>java.lang.String</class>
-<value>\<transcript_id>\</value>
+<value>\<transcript_id>\@\<seqname>\</value>
 </parameter>
 <parameter>
@@ -1061,4 +1061,11 @@
 </parameter>
 <parameter>
+<name>extendedColumnMapping.chromosome</name>
+<label>Chromosome</label>
+<description>The chromosome from which the reporter is derived</description>
+<class>java.lang.String</class>
+<value>\<seqname>\</value>
+</parameter>
+<parameter>
 <name>decimalSeparator</name>
 <label>Decimal separator</label>
@@ -1066,4 +1073,11 @@
 <class>java.lang.String</class>
 <value>dot</value>
+</parameter>
+<parameter>
+<name>symbolColumnMapping</name>
+<label>Gene symbol</label>
+<description>Mapping that picks the reporter's gene symbol from the data columns. For example: \Gene symbol\</description>
+<class>java.lang.String</class>
+<value>\<gene_id>\</value>
 </parameter>
 </configuration>
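The new default mappings combine two generated columns with a literal "@", so \<transcript_id>\@\<seqname>\ should yield reporter ids of the form transcript_id@seqname (for example NM_001@chr1). Combining columns with literal text is presumably also why complexExpressions switches from disallow to allow in the same file. The sketch below is a hypothetical, simplified expander: the class ColumnMappingSketch, its expand method and the row values are made up for illustration and are not BASE's mapper API. It only treats text between a pair of backslashes as a column name and copies everything else literally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: NOT the BASE mapper implementation.
// Text between a pair of backslashes is looked up as a column name;
// all other text (here the '@') is copied to the output unchanged.
class ColumnMappingSketch
{
    // Regex: a backslash, one or more non-backslash characters (captured), a backslash
    private static final Pattern COLUMN_REF = Pattern.compile("\\\\([^\\\\]+)\\\\");

    static String expand(String mapping, Map<String, String> row)
    {
        Matcher m = COLUMN_REF.matcher(mapping);
        StringBuffer sb = new StringBuffer();
        while (m.find())
        {
            String value = row.get(m.group(1));
            if (value == null)
            {
                throw new IllegalArgumentException("Unknown column: " + m.group(1));
            }
            // quoteReplacement() keeps any '$' or '\' in the data value literal
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args)
    {
        // One made-up data row; the column names include the <> as in the generated header
        Map<String, String> row = new HashMap<>();
        row.put("<seqname>", "chr1");
        row.put("<transcript_id>", "NM_001");
        row.put("<gene_id>", "GENE1");

        System.out.println(expand("\\<transcript_id>\\@\\<seqname>\\", row)); // NM_001@chr1
        System.out.println(expand("\\<gene_id>\\", row));                    // GENE1
    }
}
```

Because the same transcript_id can now map to several reporters (one per seqname), the "@" separator keeps the two parts readable in the resulting external id.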
trunk/src/core/net/sf/basedb/util/gtf/GtfInputStream.java
--- r5764
+++ r5770
@@ -48,5 +48,5 @@
 attributes are lined up with the first line. Note that any attributes
 that are not present in the first line are skipped. The parser also has an
-option to skip lines with a <code>transcript_id</code> that is not unique.
+option to skip lines with a <code>transcript_id+seqname</code> that is not unique.
 Normally, a GTF file will contain multiple entries with the same id:s, but
 in most cases we are not interested in this when importing data to BASE.
@@ -91,5 +91,5 @@
 @param charset The character set used in the file
 @param skipRepeatedTranscriptIds TRUE to skip lines with non-unique
-values for transcript_id
+values for transcript_id+seqname
 @throws IOException
 */
@@ -142,5 +142,5 @@
 {
 // read next data line
-readMore();
+buffer = readMore();
 index = 0;
 if (buffer == null) return -1;
@@ -217,5 +217,5 @@
 // Generate header line
 StringBuffer sb = new StringBuffer();
-transcriptIds.add(attributes[transcriptIdIndex].value);
+transcriptIds.add(attributes[transcriptIdIndex].value + '@' + line[0]);
 if (skipRepeatedTranscriptIds)
 {
@@ -244,5 +244,5 @@
 @throws IOException
 */
-private void readMore()
+private byte[] readMore()
 throws IOException
 {
@@ -254,5 +254,5 @@
 {
 buffer = null;
-return;
+return null;
 }
 
@@ -260,5 +260,5 @@
 parseAttributes(line[8]);
 
-String id = attributes[transcriptIdIndex].value;
+String id = attributes[transcriptIdIndex].value + '@' + line[0];
 if (transcriptIds.add(id) && skipRepeatedTranscriptIds)
 {
@@ -268,5 +268,5 @@
 
 // Convert to byte[]
-buffer = appendLine(new StringBuffer(), line, attributes).toString().getBytes(charset);
+return appendLine(new StringBuffer(), line, attributes).toString().getBytes(charset);
 }
 
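With this change the uniqueness check keys each line on transcript_id + '@' + seqname (line[0]) instead of on transcript_id alone, so the same transcript_id on two different seqnames now produces two lines instead of one. A minimal sketch of that rule, assuming tab-separated GTF lines and a deliberately naive attribute parser; GtfDedupSketch and its methods are hypothetical and are not part of GtfInputStream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the transcript_id@seqname uniqueness rule.
// The attribute parsing is deliberately naive and is NOT GtfInputStream's parser.
class GtfDedupSketch
{
    // Returns the unquoted value of one attribute from GTF column 9, or null if absent
    static String attribute(String attributes, String name)
    {
        for (String part : attributes.split(";"))
        {
            String p = part.trim();
            int space = p.indexOf(' ');
            if (space > 0 && p.substring(0, space).equals(name))
            {
                return p.substring(space + 1).trim().replace("\"", "");
            }
        }
        return null;
    }

    // Keeps only the first line seen for each transcript_id@seqname key
    static List<String> uniqueLines(List<String> gtfLines)
    {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : gtfLines)
        {
            String[] cols = line.split("\t");
            if (cols.length < 9) continue; // skip incomplete lines
            // Same key shape as the new code: value + '@' + line[0] (seqname)
            String key = attribute(cols[8], "transcript_id") + '@' + cols[0];
            if (seen.add(key)) kept.add(line); // Set.add() returns false for a repeated key
        }
        return kept;
    }

    public static void main(String[] args)
    {
        List<String> lines = Arrays.asList(
            "chr1\tsrc\texon\t1\t10\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\";",
            "chr1\tsrc\texon\t20\t30\t.\t+\t.\tgene_id \"G1\"; transcript_id \"T1\";",
            "chr2\tsrc\texon\t5\t15\t.\t-\t.\tgene_id \"G1\"; transcript_id \"T1\";");
        // The second chr1 line is a repeat; the chr2 line has a different key and is kept
        System.out.println(uniqueLines(lines).size()); // 2
    }
}
```

Under the old transcript_id-only key, the same input would have been reduced to a single line.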
trunk/src/plugins/core/core-plugins.xml
--- r5764
+++ r5770
@@ -767,6 +767,6 @@
 <description>
 Creates reporters and reporter lists from GTF (Gene transfer format)
-files. The default configuration is to use the transcript_id value
-as the reporter id and name. No other fields are used, but this can
+files. The default configuration uses the transcript_id+seqname value
+as the reporter id and name, and gene_id as "symbol". This can
 be changed by user configurations. For example, to use the gene_id
 instead or to add prefixes to the id values. The importer
trunk/src/plugins/core/net/sf/basedb/plugins/gtf/DefaultConfigurationValues.java
--- r5764
+++ r5770
@@ -27,4 +27,5 @@
 
 import net.sf.basedb.core.BaseException;
+import net.sf.basedb.core.ExtendedProperties;
 import net.sf.basedb.core.InvalidDataException;
 import net.sf.basedb.core.Job;
@@ -94,7 +95,17 @@
 defaultValues.put("dataHeaderRegexp", "<seqname>\\t.*<transcript_id>.*");
 defaultValues.put("minDataColumns", 4);
-defaultValues.put("featureIdColumnMapping", "\\<transcript_id>\\");
-defaultValues.put("reporterIdColumnMapping", "\\<transcript_id>\\");
-defaultValues.put("nameColumnMapping", "\\<transcript_id>\\");
+defaultValues.put("complexExpressions", "allow");
+
+// Reporter importer mappings
+defaultValues.put("reporterIdColumnMapping", "\\<transcript_id>\\@\\<seqname>\\");
+defaultValues.put("nameColumnMapping", "\\<transcript_id>\\@\\<seqname>\\");
+defaultValues.put("symbolColumnMapping", "\\<gene_id>\\");
+if (ExtendedProperties.getProperty("ReporterData", "chromosome") != null)
+{
+defaultValues.put("extendedColumnMapping.chromosome", "\\<seqname>\\");
+}
+
+// Reporter map importer mappings (also use reporterIdColumnMapping)
+defaultValues.put("featureIdColumnMapping", "\\<transcript_id>\\@\\<seqname>\\");
 }
 value = defaultValues.get(name);
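The DefaultConfigurationValues hunk only registers the chromosome mapping when a "chromosome" extended property is defined for ReporterData, presumably so that installations without that property do not get a mapping to a column that does not exist. The sketch below shows the same conditional-default pattern. The Set of property names stands in for ExtendedProperties.getProperty(...) and, like the class name DefaultValuesSketch, is an assumption made to keep the example self-contained; the lazy initialisation follows the shape the hunk suggests but is not confirmed by it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the conditional-default pattern; the Set of property names
// stands in for ExtendedProperties.getProperty("ReporterData", ...) != null.
class DefaultValuesSketch
{
    private final Set<String> reporterExtendedProperties;
    private Map<String, Object> defaultValues; // built on first use

    DefaultValuesSketch(Set<String> reporterExtendedProperties)
    {
        this.reporterExtendedProperties = reporterExtendedProperties;
    }

    Object getDefault(String name)
    {
        if (defaultValues == null)
        {
            defaultValues = new HashMap<>();
            defaultValues.put("complexExpressions", "allow");
            defaultValues.put("reporterIdColumnMapping", "\\<transcript_id>\\@\\<seqname>\\");
            defaultValues.put("nameColumnMapping", "\\<transcript_id>\\@\\<seqname>\\");
            defaultValues.put("symbolColumnMapping", "\\<gene_id>\\");
            // Only map seqname to chromosome if that property exists on reporters
            if (reporterExtendedProperties.contains("chromosome"))
            {
                defaultValues.put("extendedColumnMapping.chromosome", "\\<seqname>\\");
            }
        }
        return defaultValues.get(name);
    }

    public static void main(String[] args)
    {
        DefaultValuesSketch withChr =
            new DefaultValuesSketch(new HashSet<>(Arrays.asList("chromosome")));
        DefaultValuesSketch withoutChr =
            new DefaultValuesSketch(Collections.<String>emptySet());
        System.out.println(withChr.getDefault("extendedColumnMapping.chromosome"));    // \<seqname>\
        System.out.println(withoutChr.getDefault("extendedColumnMapping.chromosome")); // null
    }
}
```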
trunk/src/plugins/core/net/sf/basedb/plugins/gtf/GtfReporterImporter.java
--- r5764
+++ r5770
@@ -41,11 +41,11 @@
 as a wrapper to generate a pure column-based output which can be used
 by the regular tools for file parsing. The importer will also skip
-lines with a non-unique transcript_id.
+lines with a non-unique transcript_id+seqname.
 <p>
 
-The default configuration is to use the transcript_id as the reporter id
-and name. No other information is extracted, but this can be changed by
-user configurations depending on what additional attributes that are
-present in the GTF file.
+The default configuration is to use the transcript_id+seqname as the reporter id
+and name. gene_id is stored as "gene symbol" and seqname as "chromosome".
+The default configuration can be changed by user configurations depending on
+what additional attributes that are present in the GTF file.
 
 @author Nicklas
trunk/src/test/TestGtfImporters.java
--- r5764
+++ r5770
@@ -53,5 +53,5 @@
 static boolean test_all()
 {
-write("++Testing GTF imports using plugin");
+write("++Testing GTF importers using plugin");
 // Upload GTF file
 int fileId = TestFile.test_create("data/test.gtf", false, false);
@@ -65,5 +65,5 @@
 TestReporter.test_list(35);
 
-
+/*
 // Test reporter map importer
 int arrayDesignId = TestArrayDesign.test_create(PlatformVariant.SEQUENCING_EXPRESSION, false);
@@ -75,7 +75,7 @@
 TestArrayDesign.write_feature_header();
 TestArrayDesign.test_list_features(arrayDesignId, 35);
-
+*/
 if (TestUtil.waitBeforeDelete()) TestUtil.waitForEnter();
-TestArrayDesign.test_delete(arrayDesignId);
+// TestArrayDesign.test_delete(arrayDesignId);
 // Delete reporters
 int deleteReporterJobId = test_create_reporter_job(gtfReporterImporterId, fileId, "delete");
@@ -84,6 +84,6 @@
 TestJob.test_delete(reporterJobId);
 TestJob.test_delete(deleteReporterJobId);
-TestJob.test_delete(featureJobId);
-TestPluginConfiguration.test_delete(featureConfigurationId);
+// TestJob.test_delete(featureJobId);
+// TestPluginConfiguration.test_delete(featureConfigurationId);
 TestFile.test_delete(fileId);
 
trunk/src/test/TestGtfInputStream.java
--- r5759
+++ r5770
@@ -44,5 +44,5 @@
 
 test_parse("data/test.gtf",
-"\\<seqname>\\", "\\<source>\\", "\\<gene_id>\\", "\\<transcript_id>\\", "\\<gene_name2>\\");
+"\\<transcript_id>\\@\\<seqname>\\", "\\<source>\\", "\\<gene_id>\\", "\\<gene_name2>\\");
 
 write("++Testing GTFInputStream "+(ok ? "OK" : "Failed")+"\n");