Changeset 5769


Timestamp:
Dec 3, 2019, 9:38:37 AM
Author:
Nicklas Nordborg
Message:

References #1208: Implement wizard for building database of variant frequencies in SCAN-B samples

Added progress reporting to the Python script that collects variant statistics.

File:
1 edited

  • other/pipeline/trunk/mut_stats.py

    --- other/pipeline/trunk/mut_stats.py (r5767)
    +++ other/pipeline/trunk/mut_stats.py (r5769)
    @@ -4,9 +4,15 @@
     
     Script that calculates statistics for variants in a list of VCF files.
    -The script requries a tab-separated text file as input. Each line
    -should have two columns:
    +The script need two parameters:
    +
    +1: Path to text file with list of VCF files to process (se below)
    +2: Path to progress reporting file
    +
    +The text file given as the first parameter should be a tab-separated
    +text file. Each line should have three columns:
     
     1: Patient identifier
     2: Path to VCF file
    +3: Name of alignment (used for progress reporting)
     
     Counts and frequencies for all variants will calculated for the total and
    @@ -24,4 +30,5 @@
     import gzip
     import datetime
    +import time
     
     # Store all variants that we have seen in the VCF files we load
    @@ -78,4 +85,10 @@
             vcf.close()
     
    +# Report progress to the progressFile (1-90%)
    +def progressReport(progressFile, alignment, current, total):
    +    percent = 1+current * 89 /  total
    +    with open(progressFile, 'w') as p:
    +        p.write("{0} Reading VCF file for '{1}' ({2} of {3})".format(percent, alignment, current, total))
    +
     # Read the patient/VCF list
     # Each line should be tab-separated <PAT>\t<PATH-TO-VCF>
    @@ -88,7 +101,18 @@
     lines.sort()
     
    +progressFile = sys.argv[2]
    +
     # Load the VCF files and count the variants in them
    +vcfCount = 0
    +totalVcf = len(lines)
    +nextProgressReport = time.time()+15
    +
     for line in lines:
    -    cols = line.split('\t')
    +    cols = line.split('\t')   
    +    vcfCount += 1
    +    # Report progress every 15 seconds
    +    if time.time() > nextProgressReport:
    +        progressReport(progressFile, cols[2], vcfCount, totalVcf)
    +        nextProgressReport = time.time()+15
         loadVcf(cols[1], cols[0])
     
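Review note: the progress logic in this changeset can be sketched on its own as below. This is an illustration under stated assumptions, not the code in mut_stats.py. The names progress_percent, format_progress and ThrottledReporter are hypothetical, as is the injectable clock argument, which is there only to make the 15-second throttle testable. The sketch uses integer division because `/` is true division under Python 3, so the committed expression would write a value such as 45.5 there; which Python version the pipeline runs is not stated in the changeset.

```python
import time


def progress_percent(current, total):
    """Map current/total onto the 1-90% range using integer arithmetic."""
    if total <= 0:
        # Assumption: an empty list is reported as the lowest value.
        return 1
    return 1 + (current * 89) // total


def format_progress(alignment, current, total):
    """Build the same one-line message the changeset writes to the file."""
    return "{0} Reading VCF file for '{1}' ({2} of {3})".format(
        progress_percent(current, total), alignment, current, total)


class ThrottledReporter(object):
    """Overwrite the progress file at most once per `interval` seconds."""

    def __init__(self, path, interval=15, clock=time.time):
        self.path = path
        self.interval = interval
        self.clock = clock
        # As in the changeset, the first report is due one interval after start.
        self.next_report = clock() + interval

    def report(self, alignment, current, total):
        """Write a progress line if the interval has elapsed; return True if written."""
        now = self.clock()
        if now < self.next_report:
            return False
        with open(self.path, 'w') as p:
            p.write(format_progress(alignment, current, total))
        self.next_report = now + self.interval
        return True
```

In the loop over the VCF list, a single call such as reporter.report(cols[2], vcfCount, totalVcf) would stand in for the inline time check. Passing time.monotonic as the clock is an option where it is available, since it is not affected by system clock adjustments.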