Opened 4 months ago

Last modified 5 weeks ago

#1299 new task

Import of FASTQ files from external lab

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: major
Milestone: Reggie v4.32
Component: net.sf.basedb.reggie
Keywords:
Cc:

Description

This is related to #1295 and is the second step after importing information about the specimen. The import in #1295 should create all items from the specimen down to MergedBioAssay. The MergedBioAssay will only be a placeholder until the FASTQ files have been imported and processed with Trimmomatic. It is probably a good idea to use the item-list pattern to control the flow of items between #1295 and this import.

It is expected that the FASTQ files have not been processed after the demux, except for adapter trimming that has replaced reads with N instead of removing them from the FASTQ files.

The import functionality needs to make those FASTQ files "compatible" with the FASTQ files we get from the regular demux wizard. This means we have to replicate the following steps:

  • Bowtie to estimate the fragment size and standard deviation
  • Trimmomatic step 1 to remove the N reads. This is an alternate version of step 1 in the regular demux and will allow us to get values for the PF_READS and ADAPTER_READS annotations.
  • Trimmomatic step 2 to remove low-quality reads. This is exactly the same as the second step in the regular demux wizard.
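
To illustrate what "N reads" means for the first Trimmomatic step, here is a minimal Python sketch that counts the reads in a FASTQ stream whose sequence consists only of N. It is not part of Reggie; the function name count_reads and the sample records are made up for illustration, and the ticket does not specify how such counts map onto the PF_READS and ADAPTER_READS annotations.

```python
import io


def count_reads(fastq_lines):
    """Return (total_reads, masked_reads) for an iterable of FASTQ lines.

    Hypothetical helper: a read counts as masked when its sequence is
    non-empty and consists only of 'N'. Assumes plain 4-line records.
    """
    total = 0
    masked = 0
    for index, line in enumerate(fastq_lines):
        # Line 2 of each 4-line record (index 1, 5, 9, ...) is the sequence.
        if index % 4 == 1:
            seq = line.strip()
            total += 1
            if seq and set(seq.upper()) == {"N"}:
                masked += 1
    return total, masked


# Made-up example data: one normal read, one fully masked, one partly masked.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nNNNNNNNN\n+\n########\n"
    "@read3\nACGTNNNN\n+\nIIII####\n"
)
total, masked = count_reads(example)
print(total, masked, total - masked)  # prints: 3 1 2
```

Only the fully masked read is counted; a partly masked read such as read3 is left for the trimming steps to handle.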

Auto-confirmation should be supported and continue with the regular secondary analysis workflow (e.g. the Legacy pipeline and Hisat alignment). The same checks (for individual items) that are used in the regular demux wizard should be made.

A manual confirmation wizard is also needed with options for accepting, retrying the import or flagging the RNA.

Change History (11)

comment:1 Changed 4 months ago by Nicklas Nordborg

In 6179:

References #1299: Import of FASTQ files from external lab

Started to implement a wizard for importing FASTQ files. The wizard follows the usual pattern for most secondary analysis wizards that start something. The first step includes a selection list of merged sequences items that are in the "FASTQ import pipeline" list. There is no "Select manually" functionality since it is typically not possible to run this wizard unless there are FASTQ files available. How to access the FASTQ files is not yet resolved, but it is probably the responsibility of #1295 to solve this.

comment:2 Changed 4 months ago by Nicklas Nordborg

In 6180:

References #1299: Import of FASTQ files from external lab

Started to implement a wizard for importing FASTQ files. The wizard follows the usual pattern for most secondary analysis wizards that start something. The first step includes a selection list of merged sequences items that are in the "FASTQ import pipeline" list. There is no "Select manually" functionality since it is typically not possible to run this wizard unless there are FASTQ files available. How to access the FASTQ files is not yet resolved, but it is probably the responsibility of #1295 to solve this.

comment:3 Changed 4 months ago by Nicklas Nordborg

In 6181:

References #1299: Import of FASTQ files from external lab

Implemented ImportFastqJobCreator, which generates a script and submits it to the analysis cluster. The script re-uses many settings from the <demux> section in reggie-config.xml since we are trying to reproduce the final steps of the regular demux script in a compatible way. The main difference is the first Trimmomatic step, which instead uses settings from the <step-1-import> tag with a default setting of TRAILING:3 MINLEN:2. This should give us compatible numbers for the ADAPTER_READS annotation, assuming that the FASTQ files we get have either trimmed or masked adapter sequences. There are still some unresolved issues with the script. The most important one is that we don't know exactly which FASTQ files are associated with which merged item. The current implementation only works if there is exactly one pair of FASTQ files available in the import directory.
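
As a rough illustration of how the default TRAILING:3 MINLEN:2 setting ends up on a Trimmomatic command line, the following Python sketch assembles a paired-end invocation. It is not the script generated by ImportFastqJobCreator; the function name, the jar path, the file names and the output naming scheme are all assumptions. Only the argument order (two inputs, four outputs, then the trimming steps) follows the documented Trimmomatic PE usage.

```python
def trimmomatic_pe_command(jar, r1, r2, out_prefix, steps, threads=1):
    """Build the argument list for a Trimmomatic paired-end run.

    Hypothetical helper: the output file names derived from out_prefix
    are an illustrative convention, not the one used by Reggie.
    """
    return [
        "java", "-jar", jar, "PE",
        "-threads", str(threads),
        r1, r2,                                  # forward and reverse input
        f"{out_prefix}_R1.paired.fastq.gz",      # forward, both mates kept
        f"{out_prefix}_R1.unpaired.fastq.gz",    # forward, mate dropped
        f"{out_prefix}_R2.paired.fastq.gz",      # reverse, both mates kept
        f"{out_prefix}_R2.unpaired.fastq.gz",    # reverse, mate dropped
        *steps,                                  # trimming steps, in order
    ]


# Default step settings quoted in this comment for <step-1-import>.
step_1_import = "TRAILING:3 MINLEN:2".split()

cmd = trimmomatic_pe_command(
    "trimmomatic.jar",          # assumed jar location
    "sample_R1.fastq.gz",       # assumed input names
    "sample_R2.fastq.gz",
    "step1/sample",
    step_1_import,
    threads=4,
)
print(" ".join(cmd))
```

Keeping the steps as a list taken from configuration makes it easy to swap the import-specific step 1 for the regular one without touching the rest of the command.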

comment:4 Changed 4 months ago by Nicklas Nordborg

In 6182:

References #1299: Import of FASTQ files from external lab

Implemented ImportFastqJobCreator, which generates a script and submits it to the analysis cluster. The script re-uses many settings from the <demux> section in reggie-config.xml since we are trying to reproduce the final steps of the regular demux script in a compatible way. The main difference is the first Trimmomatic step, which instead uses settings from the <step-1-import> tag with a default setting of TRAILING:3 MINLEN:2. This should give us compatible numbers for the ADAPTER_READS annotation, assuming that the FASTQ files we get have either trimmed or masked adapter sequences. There are still some unresolved issues with the script. The most important one is that we don't know exactly which FASTQ files are associated with which merged item. The current implementation only works if there is exactly one pair of FASTQ files available in the import directory.
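
The "exactly one pair" restriction could be checked up front along the following lines. This is only a sketch in Python, not the actual script logic: the function name find_single_pair, the *.fastq.gz extension and the _R1/_R2 naming convention are assumptions, since the ticket does not say how the files from the external lab are named.

```python
import tempfile
from pathlib import Path


def find_single_pair(import_dir):
    """Return (r1, r2) if the directory holds exactly one FASTQ pair.

    Hypothetical helper: assumes files end in .fastq.gz and carry
    _R1 / _R2 in their names. Raises ValueError otherwise.
    """
    files = sorted(Path(import_dir).glob("*.fastq.gz"))
    r1 = [f for f in files if "_R1" in f.name]
    r2 = [f for f in files if "_R2" in f.name]
    if len(r1) != 1 or len(r2) != 1:
        raise ValueError(
            f"expected exactly one FASTQ pair in {import_dir}, "
            f"found {len(r1)} R1 and {len(r2)} R2 files"
        )
    return r1[0], r2[0]


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sample_R1.fastq.gz", "sample_R2.fastq.gz"):
        (Path(tmp) / name).touch()
    first, second = find_single_pair(tmp)
    print(first.name, second.name)  # prints: sample_R1.fastq.gz sample_R2.fastq.gz
```

Failing early with a clear message avoids submitting a job to the cluster that would pick up the wrong files.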

comment:5 Changed 4 months ago by Nicklas Nordborg

In 6184:

References #1299: Import of FASTQ files from external lab

Implemented auto-confirmation for FASTQ import.

comment:6 Changed 4 months ago by Nicklas Nordborg

In 6185:

References #1299: Import of FASTQ files from external lab

Implemented auto-confirmation for FASTQ import.

comment:7 Changed 4 months ago by Nicklas Nordborg

In 6186:

References #1299: Import of FASTQ files from external lab

Unrelated minor fixes to speed up auto-confirmation after SSP and report creation.

comment:8 Changed 4 months ago by Nicklas Nordborg

In 6187:

References #1299: Import of FASTQ files from external lab

Started with a manual confirmation wizard for FASTQ import. It should work when just confirming a success, but failures and other cases need more testing and options.

comment:9 Changed 4 months ago by Nicklas Nordborg

In 6188:

References #1299: Import of FASTQ files from external lab

Started with a manual confirmation wizard for FASTQ import. It should work when just confirming a success, but failures and other cases need more testing and options.

comment:10 Changed 6 weeks ago by Nicklas Nordborg

In 6339:

References #1299: Import of FASTQ files from external lab

Progress report for both Trimmomatic steps.

comment:11 Changed 5 weeks ago by Nicklas Nordborg

In 6340:

References #1299: Import of FASTQ files from external lab

Progress reporting was inserted at incorrect locations in the script.
