Beruflich Dokumente
Kultur Dokumente
datasets
NCBI/NLM/NIH
07 March 2011
Version 3.101
Table of Contents
* [1]Quick start
+ [2]Steps done once per reference genome
+ [3]Commands for running the tagger
* [4]Synopsis
* [5]Options
* [6]Environment
* [7]Config files
* [8]Return codes
Quick start
Programs run from this script (bmfilter, srprism) require about 8.5Gb
memory and three times as much harddisk space for index data. Disk
space needed for temporary files depends on input, and is typically
the same size as that of the input for metagenomic datasets.
In all above scenarios, -b, -x, and -T specify the index for bmfilter,
the index for srprism, and the directory to use for temporary files,
respecitvely. If no temporary directory is specified, current working
directory is used. Flag -q of 0 and 1 specify fasta and fastq input
files, respectively. Output specified by -o is a file name if input is
fasta or fastq, and it is a directory if the input is a run. The
output for, say run SRR059480, when -o is myresults will be a file
myresults/SRR059480.blacklist that contains the SRA indexes of reads
found to be human rather than the full id. Output files with inputs as
fasta or fastq contain the ids of reads found to be human.
Synopsis
Options
-h
Prints help to stdout and exits with error code 0
-V
Prints version to stdout and exits with error code 0
-C config
Reads config file, overriding previously set values. Subsequent
options may override values set in current config file.
-q quality
Number of quality channels on input (0|1). Should be 0 with
FASTA and 1 with FASTQ input files.
-1 reads
Specifies input file name -- should be FASTA if -q is 0 or
FASTQ if -q is 1.
-2 mates
Specifies input file name for read mates, required to be in the
same format as the file specified with -1 option, and should
have all same read IDs and in the same order. Requires option
-1.
-A run-accession
Specifies SRA run accession. Requires bmfilter to be compiled
with SRA toolkit and data to be organized as required by SRA
specifications. Should not be used with option -1.
-b reference.bitmask
Specifies filename of the reference genome bitmask, proviously
generated by bmtool
-x reference.srprism
Specifies base of the filename for the previously generated
srprism index.
-d reference.seqdb
Specifies path to blastdb generated by formatdb. Is not used
unless options passed to bmfilter are modified to generate data
for blastn (see details below).
--ref=reference
Specifies basename for reference indices, literally
--ref=abc/def means "-b abc/def.wbm -x abc/def.srprism
-d abc/def".
-T tmpdir
Specifies directory to be used for temporary files (default is
to use value of $TMPDIR)
-o output
If -1 is used this option provides script with the name for
output file which will contain list of IDs to be tagged as
matching reference genome. Otherwise if -A is used, "output" is
considered to be name of directory, where output file(s) named
by SRA run accession(s) with suffix .blacklist will be created
by the script.
--debug
Will prevent script from deleting temporary files. Also run
times for srprism and bmfilter will be reported.
Environment
PATH
is used to find programs called from script
TMPDIR
if set is used to initialize temporary directory, otherwise
/tmp is used
SRPRISM
if set specifies name and optianally path to srprism
BMFILTER
if set specifies name and optionally path to bmfilter
EXTRACT_FA
if set specifies name and optionally path to extract_fullseq
BLASTN
if set specifies name and optionally path to blastn
Config files
Config file is regular shell script which may set any variables
specified in "Environment" section, plus any of following:
bmfiles
if set specifies bmfilter bitmap file
blastdb
if set specifies blastdb for blastn
srindex
if set specifies srprism index file
Return codes