
FUNPOPGEN

Altrans Manual
Version 1.1.02
Halit Ongen
12/9/2012

Contains the Altrans installation, method, usage examples, options, and files.

Contents
I. Installation
    Windows
    Linux and Mac OS
II. Introduction
III. Usage Examples
IV. Worked Example
V. Options
    Important Options:
        -a|--annotation
        -A|--anchor-length
        -b|--bam-file
        -c|--check-proper-pairing
        -e|--min-exon-length
        -E|--single-end
        -i|--max-intron-length
        -m|--mapping-quality
        -n|--no-clipping
        -o|--output-dir
        -p|--output-prefix
        -r|--read-length
        -s|--split-reads
        -u|--use-unpaired-split
        -x|--max-clip
    Other Options:
        -B|--bin-size
        -C|--use-inter-chr
        -D|--use-diff-strand
        -d|--distribution-file
        -f|--norm-file
        -F|--merge-files
        -g|--gene-types
        -h|-?|--help
        -H|--has-header
        -I|--ignore-prob-groups
        -j|--print-bed
        -J|--probability-file
        -k|--print-combined
        -l|--convert
        -M|--use-multi-level
        -N|--no-norm-file
        -P|--no-fractions
        -R|--no-raw-counts
        -S|--silent
        -t|--trim
        -v|--verbose
        -V|--version
        -U|--use-slots
        -w|--working-dir
        -W|--force-skip
        -X|--regex
        -y|--skip-fusion
        -Y|--normalize
        -z|--max-reads-dist
        -Z|--covar
VI. Output and other files
    Fragment size distribution file
    Master fragment sizes file
    Forward file
    Reverse file
    Combined file
    Raw forward file
    Raw reverse file
    2norm file
    Norm file
    Extended BED file
    Log file
    Converted file
    Probability file

I. Installation
Windows
You can use the binary at http://sourceforge.net/projects/altrans/files which has been
compiled under Cygwin. To compile it yourself, download the latest Cygwin at
http://www.cygwin.com/ and follow the instructions provided for Linux and Mac OS.
Note that after you have compiled the source under Cygwin, you need the relevant
Cygwin DLLs (for Cygwin 1.7.16-1: cyggcc_s-1.dll, cygstdc++-6.dll, cygwin1.dll,
cygz.dll) in your PATH or in the same directory as the Altrans binary in order to
run it as a native Windows executable.

Linux and Mac OS

This has been tested and works with gcc 4.2 or newer. For other compilers you will have
to edit the makefiles.
Unzip and untar:

tar -xzvf altrans.vX.X.XX.tar.gz


Compile:

cd altrans
make
This should compile, with warnings that can be ignored, and create the altrans binary
under the bin/ directory. Precompiled binary distributions will also be available at
http://sourceforge.net/projects/altrans/files for certain flavours of these OSs;
however, there is no guarantee that these will run everywhere.
The Mac binary was tested and runs on Mac OS X 10.6 and 10.7.

Both Linux binaries were tested and run on Ubuntu 8.04, 10.04, and 12.04, Fedora 14
and 17, SuSE 9.3, and CentOS 5.8.

The Linux binary should run on any modern Linux distribution that has the standard C
and C++ libraries and zlib.
To clean:

make clean

II. Introduction

Altrans is a method for the relative quantification of splicing events. It requires a BAM
alignment file from an RNA-seq experiment and an annotation file in GTF format
detailing the location of the exons in the genome. To count links between two exons, it
uses paired end reads where one mate maps to one exon and the other mate to a
different exon, and/or split reads spanning exon-exon junctions. Overlapping exons are
grouped into exon groups, and the unique portions of each exon in an exon group are
identified and used when assigning reads to an exon. The link counts ascertained from
unique regions are normalized by the probability of observing such a link given the
insert size distribution; the result is referred to as link coverage. Finally, the
quantitative metric produced is the fraction of one link's coverage over the sum of the
coverages of all the links that the initial (first) exon makes. The algorithm is as
follows:

1. Group overlapping exons from annotation into exon groups. Transcript level
information is ignored and exons with exactly the same coordinates
belonging to multiple transcripts are treated as one unique exon.
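[Figure: Transcript 1 (exons T1-E1, T1-E2, T1-E3) and Transcript 2 (exons T2-E1,
T2-E2, T2-E3) are collapsed into exon groups Group 1, Group 2, and Group 3, which
contain the unique exons E1, E2, E3, and E4.]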
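Step 1 can be illustrated with a minimal Python sketch. This is not the Altrans source code; the function name `group_overlapping_exons`, the `(start, end)` tuple representation, and the coordinates are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), both inclusive; hypothetical representation


def group_overlapping_exons(exons: List[Exon]) -> List[List[Exon]]:
    """Collapse exons with identical coordinates, then cluster overlapping exons."""
    unique_exons = sorted(set(exons))  # same coordinates in several transcripts count once
    groups: List[List[Exon]] = []
    group_end = None  # right-most position covered by the current group
    for start, end in unique_exons:
        if group_end is not None and start <= group_end:
            groups[-1].append((start, end))  # overlaps the current group
            group_end = max(group_end, end)
        else:
            groups.append([(start, end)])  # start a new group
            group_end = end
    return groups


# Made-up coordinates: two transcripts that share their first and last exons.
transcript_1 = [(100, 200), (300, 400), (600, 700)]
transcript_2 = [(100, 200), (350, 450), (600, 700)]
exon_groups = group_overlapping_exons(transcript_1 + transcript_2)
for number, group in enumerate(exon_groups, start=1):
    print("Group", number, group)
```

With these made-up coordinates the six transcript exons collapse into four unique exons in three groups, because transcript-level information is ignored.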

2. Identify unique portion(s) of each exon in an exon group. Exons with
immediate unique portions are called level 1 exons. In order to assign
reads to exons with no unique positions, remove the level 1 exons from the
exon group, determine pseudo-unique positions for the remaining exons,
and increment the level of these exons. Iterate through this process until all
exons in a group have unique or pseudo-unique portions. Use these portions
to assign mate pairs to a link. In the following figure the dark boxes are
constitutive parts of an exon, light boxes are unique portions of an exon
depicted with subscript u followed by the level of the exon, and the empty
boxes are pseudo-unique portions of an exon again depicted with subscript u
followed by the level of the exon.
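[Figure: exon groups Group 1, Group 2, and Group 3 with exons E1, E2, E3, and E4.
The unique portions E2u,1, E3u,1, and E4u,1 and the pseudo-unique portion E1u,2 are
marked. The reads shown are a mate pair linking E1 to E3, a mate pair linking E2 to
E3, and a split read linking E3 to E4.]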
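The iterative level assignment of step 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the Altrans implementation: exons are hypothetical `(start, end)` tuples with inclusive coordinates, and a pseudo-unique portion is taken to be the positions not covered by any exon still remaining in the group.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive; hypothetical representation


def assign_levels(group: List[Exon]) -> Tuple[Dict[Exon, Tuple[int, Set[int]]], Set[Exon]]:
    """Return {exon: (level, unique positions)} and the exons left unresolved."""
    remaining = set(group)
    levels: Dict[Exon, Tuple[int, Set[int]]] = {}
    level = 1
    while remaining:
        found: Dict[Exon, Set[int]] = {}
        for exon in remaining:
            own = set(range(exon[0], exon[1] + 1))
            for other in remaining - {exon}:
                own -= set(range(other[0], other[1] + 1))  # drop shared positions
            if own:
                found[exon] = own
        if not found:
            break  # no exon has a unique portion; step 3 would apply here
        for exon, positions in found.items():
            levels[exon] = (level, positions)
        remaining -= set(found)  # remove the exons resolved at this level
        level += 1
    return levels, remaining


# Made-up group: (5, 12) is completely covered by the other two exons.
levels, unresolved = assign_levels([(1, 10), (8, 20), (5, 12)])
for exon in sorted(levels):
    print(exon, "level", levels[exon][0], "positions", sorted(levels[exon][1]))
```

In this example the two outer exons are level 1, and the covered exon only gains a (pseudo-)unique portion, and therefore level 2, once the level 1 exons are removed.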

3. In exon groups where step 2 fails to identify unique or pseudo-unique
portions for all the exons, remove unifying exons from the analysis and
repeat step 2.
[Figure: exons E1, E2, and E3.] E3 shares its start position with E1 and its end
position with E2 and is therefore a unifying exon. Exons of this type are removed
from the exon groups and are thus not part of the analysis.

4. For exon groups where there are non-overlapping exons, use the insert size
distribution to assign pseudo-counts to certain links between
non-overlapping exons.
[Figure: exons E1, E2, and E3, with a mate pair shown in red.] A mate pair like the
one shown in red can be linking E1 to E2, or it can originate from E3 only. To
resolve this, a pseudo-count is assigned to the E1-E2 link, which is the probability
of observing the insert size when E1 and E2 are linked over the sum of this
probability and the probability of observing the insert size when the mate pair
originates from E3 only.
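The pseudo-count of step 4 is a simple ratio of two probabilities. The following Python sketch shows the arithmetic only; the function name `pseudo_count`, the insert sizes, and the probabilities are all made up for illustration and do not come from Altrans.

```python
def pseudo_count(p_linked: float, p_single_exon: float) -> float:
    """Weight given to the E1-E2 link for one ambiguous mate pair."""
    total = p_linked + p_single_exon
    if total == 0:
        return 0.0  # neither explanation is possible under the distribution
    return p_linked / total


# Hypothetical insert size distribution: insert size (bp) -> probability.
insert_size_prob = {50: 0.30, 400: 0.02}

# Assume the mate pair implies an insert size of 50 bp if E1 and E2 are spliced
# together, and 400 bp if both mates originate from E3 only.
weight = pseudo_count(insert_size_prob[50], insert_size_prob[400])
print(round(weight, 4))
```

Because the short insert size is far more likely under this made-up distribution, most of the mate pair's weight goes to the E1-E2 link.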

5. Normalize the link counts determined from the unique portions to calculate
a link coverage. There are two normalization types implemented. The
default is to divide the link counts by the probability of observing such a
link given the insert size distribution. The second method involves
calculating the number of slots an exon link has given the mode of the insert
size distribution, i.e. the most frequent insert size.

Default method:
[Figure: exons E2 and E3 with unique portions E2u and E3u. Maximum insert size
linking these exons = 20; minimum insert size linking these exons = 10; link
count = 15.]

$C_{E2-E3} = 15\ [\text{Link Count}] \,/\, 0.8\ [\text{Probability of observing insert sizes from 10 to 20}] = 18.75\ [\text{Link Coverage}]$

Slots method:
[Figure: two exons drawn on a position axis, with the linking slots shown in black.]
With a read length of 3 and an insert size of 4, there are 3 slots (shown in black)
that link the exons above.

$C_{E2-E3} = 15\ [\text{Link Count}] \,/\, 3\ [\text{Number of Slots}] = 5\ [\text{Link Coverage}]$
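Both normalizations of step 5 reduce to a single division. The Python sketch below reproduces the two worked numbers above. The function names and the insert size distribution are hypothetical (the distribution is constructed so that insert sizes 10 to 20 carry a total probability of 0.8), and the number of slots is taken as given rather than derived from exon geometry.

```python
from fractions import Fraction
from typing import Dict


def coverage_by_probability(link_count: int, insert_size_prob: Dict[int, Fraction],
                            min_insert: int, max_insert: int) -> Fraction:
    """Default method: link count over P(min_insert <= insert size <= max_insert)."""
    probability = sum(prob for size, prob in insert_size_prob.items()
                      if min_insert <= size <= max_insert)
    if probability == 0:
        raise ValueError("the link cannot be observed under this distribution")
    return link_count / probability


def coverage_by_slots(link_count: int, n_slots: int) -> Fraction:
    """Slots method: link count over the number of slots linking the two exons."""
    return Fraction(link_count, n_slots)


# Hypothetical distribution: sizes 0-9 share 0.2, sizes 10-20 share 0.8.
insert_size_prob = {size: Fraction(1, 50) for size in range(0, 10)}
insert_size_prob.update({size: Fraction(4, 55) for size in range(10, 21)})

default_coverage = coverage_by_probability(15, insert_size_prob, 10, 20)
slots_coverage = coverage_by_slots(15, 3)
print(float(default_coverage), float(slots_coverage))
```

Exact fractions are used here only to keep the illustration free of floating point rounding.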

6. In a given window size, consider all pairings of each exon in an exon group
with all other exon group exons. Links between level 1 exons can be
calculated directly whereas links between higher level exons are calculated
by subtracting coverage of all the other lower level links from the
pseudo-coverage of these exons.
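[Figure: Group 1 contains exons E1, E2, and E3, with unique portions E1u,1 and E3u,1
and pseudo-unique portion E2u,2. Group 2 contains exons E4 and E5, with unique
portions E4u,1 and E5u,1.]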

In this figure the darker boxes are constitutive parts of an exon, lighter boxes are unique
portions of an exon depicted with subscript u followed by the level of the exon, and the
empty boxes are pseudo-unique portions of an exon again depicted with subscript u
followed by the level of the exon. Given these two exon groups the link coverages are
calculated in the following way:
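$C_{E1 \to E4} = C_{E1_{u,1} \to E4_{u,1}}$
$C_{E1 \to E5} = C_{E1_{u,1} \to E5_{u,1}}$
$C_{E3 \to E4} = C_{E3_{u,1} \to E4_{u,1}}$
$C_{E3 \to E5} = C_{E3_{u,1} \to E5_{u,1}}$
$C_{E2 \to E4} = C_{E2_{u,2} \to E4_{u,1}} - C_{E1 \to E4} - C_{E3 \to E4}$
$C_{E2 \to E5} = C_{E2_{u,2} \to E5_{u,1}} - C_{E1 \to E5} - C_{E3 \to E5}$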
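The subtraction used for higher level exons in step 6 can be sketched in Python as follows. The coverage values are invented, and the dictionary layout and the function name `higher_level_coverage` are assumptions made for this illustration only.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (first exon, second exon)

# Made-up coverages of links between level 1 exons, computed directly.
level1_coverage: Dict[Link, float] = {
    ("E1", "E4"): 4.0, ("E3", "E4"): 3.0,
    ("E1", "E5"): 2.0, ("E3", "E5"): 1.0,
}

# Made-up pseudo-coverages measured on the pseudo-unique portion of the level 2 exon E2.
pseudo_coverage: Dict[Link, float] = {("E2", "E4"): 12.0, ("E2", "E5"): 7.0}


def higher_level_coverage(link: Link, lower_level_exons: List[str]) -> float:
    """Pseudo-coverage minus the coverage of the lower level links to the same target."""
    _, target = link
    lower = sum(level1_coverage[(exon, target)] for exon in lower_level_exons)
    return pseudo_coverage[link] - lower


coverage_e2_e4 = higher_level_coverage(("E2", "E4"), ["E1", "E3"])
coverage_e2_e5 = higher_level_coverage(("E2", "E5"), ["E1", "E3"])
print(coverage_e2_e4, coverage_e2_e5)
```

Reads falling on the pseudo-unique portion of E2 may also belong to E1 or E3, which is why their directly measured coverages are removed first.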

7. Calculate the fraction of one exon link as the coverage of the link over the
sum of the coverages of all the links that the first exon makes.
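$F_{E_iE_j} = \dfrac{C_{E_iE_j}}{\sum_{n} C_{E_iE_n}}$

where the sum runs over the exons $E_n$, starting in the exon group after that of
$E_i$ and ending at the last exon index $L$, and $C$ is the link coverage (written
$R$ in the example below).

[Figure: exon E1 linked to exons E2 and E3.]

$F_{E1-E2} = 5\ [R_{E1-E2}] \,/\, (5\ [R_{E1-E2}] + 3\ [R_{E1-E3}]) = 0.625$
$F_{E1-E3} = 3\ [R_{E1-E3}] \,/\, (5\ [R_{E1-E2}] + 3\ [R_{E1-E3}]) = 0.375$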
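The fraction calculation of step 7 can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the example above; the function name `link_fractions` and the dictionary keys are hypothetical.

```python
from typing import Dict


def link_fractions(coverages: Dict[str, float]) -> Dict[str, float]:
    """Divide each link coverage by the sum of all coverages of the same first exon."""
    total = sum(coverages.values())
    if total == 0:
        return {link: 0.0 for link in coverages}  # no coverage, no fractions
    return {link: coverage / total for link, coverage in coverages.items()}


# Coverages of the links that E1 makes, taken from the example above.
fractions_from_e1 = link_fractions({"E1-E2": 5.0, "E1-E3": 3.0})
print(fractions_from_e1)
```

The fractions of all links made by one exon always sum to 1, so the metric is relative and not affected by the overall expression level of the exon.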

8. Repeat step 7 in both 5'-to-3' (forward) and 3'-to-5' (reverse) directions to
capture splice acceptor and donor effects respectively.

III. Usage Examples


Before you run altrans, please check all the option defaults and
make sure they make sense for your specific needs. Options like
--single-end, --split-reads, and --read-length are not detected
automatically, so you need to specify them yourself.
The default options are for a paired end experiment with a read length of 49 bp that
contains no split read mapping. All mate pairs with a mapping quality of at least 10
that are correctly oriented on the same chromosome and separated by a maximum
distance of 1,000,000 bp are included. The aligner is assumed to soft clip reads, with
a minimum aligned length of 20 bp (read length 49 minus --max-clip 29). The fragment
length distribution of mate pairs is determined from exons at least 300 bp in length:
altrans --bam-file yourBamFile.bam --annotation gencode.v6.gtf.sorted \
  --mapping-quality 10 --max-intron-length 1000000 --read-length 49 \
  --min-exon-length 300 --max-clip 29

The most basic usage involves supplying just a BAM and an annotation file:

altrans --bam-file yourBamFile.bam --annotation gencode.v6.gtf.sorted

If you would like to only include genes that are protein coding or lincRNAs:

altrans --bam-file yourBamFile.bam --annotation gencode.v6.gtf.sorted \
  --gene-types protein_coding lincRNA

If you would like to use all the links, print all the files there are to print, and already have
a fragment size distribution file:
altrans --bam-file yourBamFile.bam --annotation gencode.v6.gtf.sorted \
  --use-diff-strand --use-inter-chr --print-bed --print-combined --verbose \
  --distribution-file fragmentSizes.fragment_sizes

If you want to write to local drives on each node and then copy the result files to shared
storage and would like to use a prefix for your output files:
altrans --bam-file yourBamFile.bam --annotation gencode.v6.gtf.sorted \
  --output-dir /sharedFolder/altrans --working-dir /scratch/local/weekly \
  --output-prefix myResults

If you have samples with mixed read length, for example 75 and 76, and you would like
to analyse everything with the shortest read length, and also if you have split read
alignments, e.g. GEM or TopHat, and would like to check for proper pairing:
altrans --bam-file yourBamFile.bam --annotation gencode.v6.gtf.sorted \
  --mapping-quality 150 --check-proper-pairing --read-length 75 --split-reads \
  --anchor-length 1 --trim

If you want to normalize read counts before calculating the fractions:

altrans --bam-file yourBamFile.bam --annotation gencode.v6.gtf.sorted \
  --no-fractions
## Merge all the 2norm files produced from the individual runs. For example,
## if you have file names like sample1_sorted.bam.2norm:
altrans --merge-files *.2norm --output-prefix allSamples --regex "(.+)_.+"
## At this point normalize with the method of your choice or the integrated
## normalization method, and produce a norm file with positive counts.
altrans --normalize allSamples.2norm --covar yourCovariatesFile.txt \
  --output-prefix yourNormFile
## Create a master fragment sizes file. For example, if you have file names
## like sample1_sorted.bam.fragment_sizes (the second column must hold the
## full path of each file, hence "$PWD"):
ls -1 "$PWD"/*.fragment_sizes | awk '{t=$1;sub(".*\\/", "", t);
sub("_.*","",t);print t,$1}' | tr ' ' '\t' > masterDistFile
## Run altrans again to calculate fractions from normalized counts
altrans --norm-file yourNormFile.norm --annotation gencode.v6.gtf.sorted \
  --distribution-file masterDistFile --probability-file \
  yourProbFile.probability
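For readers less familiar with awk, the step that creates the master fragment sizes file can also be sketched in Python. This is an illustrative equivalent, not part of Altrans: the function name `master_lines` is hypothetical, and the sample name is assumed to be everything before the first underscore of the file name, as in the awk command.

```python
import os
from typing import List


def master_lines(paths: List[str]) -> List[str]:
    """Build 'sample<TAB>full path' lines for a master fragment sizes file."""
    lines = []
    for path in paths:
        full_path = os.path.abspath(path)  # the master file expects FULL paths
        sample = os.path.basename(full_path).split("_", 1)[0]  # text before first "_"
        lines.append(f"{sample}\t{full_path}")
    return lines


example_lines = master_lines(["sample1_sorted.bam.fragment_sizes",
                              "sample2_sorted.bam.fragment_sizes"])
print("\n".join(example_lines))
```

Writing the returned lines to a file such as masterDistFile gives the same two-column layout as the shell pipeline.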

IV. Worked Example


##Assumes a unix like environment with gcc 4.2 or later.
#get the latest altrans
wget http://sourceforge.net/projects/altrans/files/altrans.vX.X.XX/altrans.vX.X.XX.tar.gz
tar zxvf altrans.vX.X.XX.tar.gz
#compile
cd altrans
make
#get the sample dataset and extract
wget http://sourceforge.net/projects/altrans/files/sampleDataset.tar.gz
tar zxvf sampleDataset.tar.gz
cd sampleDataset

#run altrans for both BAM files
../bin/altrans --bam-file Sample1Tissue1_chr22_sorted.bam --annotation \
  gencode.v10.annotation.gtf.chr22.sorted --read-length 75 --output-prefix \
  Sample1Tissue1_chr22 --mapping-quality 150 --check-proper-pairing \
  --split-reads
../bin/altrans --bam-file Sample1Tissue2_chr22_sorted.bam --annotation \
  gencode.v10.annotation.gtf.chr22.sorted --read-length 75 --output-prefix \
  Sample1Tissue2_chr22 --mapping-quality 150 --check-proper-pairing \
  --split-reads

#merge the forward and reverse files
../bin/altrans --merge-files *.forward --regex "(.+)_.+" --output-prefix \
  allSamples.forward
../bin/altrans --merge-files *.reverse --regex "(.+)_.+" --output-prefix \
  allSamples.reverse

V. Options

Defaults for all the options are given in parentheses.

Important Options:

-a|--annotation
Annotation file containing the exons in GTF format
(http://genome.ucsc.edu/FAQ/FAQformat.html#format4). This file MUST be
sorted first by chromosome then by start position. If the file is unsorted or you
are unsure, sort it by
sort -k1,1 -k4,4g filename > filename.sorted

in *nix systems. If you want to use the -g|--gene-types option to include a subset
of gene types then the "gene_type" and "transcript_type" attributes have to be
set for all the exons in the file. (Required unless -F|--merge-files or
-Y|--normalize)
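If you are unsure whether an annotation file is already sorted, a quick check along the following lines may help. This Python sketch is not part of Altrans, and the function name `is_sorted_gtf` is hypothetical. It only verifies that each chromosome appears as one contiguous block and that start positions (column 4) do not decrease within a chromosome, which is a weaker condition than the exact order produced by the sort command above.

```python
from typing import Iterable


def is_sorted_gtf(lines: Iterable[str]) -> bool:
    """Check chromosome blocks are contiguous and starts are non-decreasing."""
    seen_chromosomes = set()
    current_chromosome = None
    last_start = -1
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        chromosome, start = fields[0], int(fields[3])
        if chromosome != current_chromosome:
            if chromosome in seen_chromosomes:
                return False  # chromosome block is split up
            seen_chromosomes.add(chromosome)
            current_chromosome = chromosome
        elif start < last_start:
            return False  # start positions go backwards
        last_start = start
    return True


sorted_example = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "g1";',
    'chr2\tsrc\texon\t50\t150\t.\t-\t.\tgene_id "g2";',
]
print(is_sorted_gtf(sorted_example))
print(is_sorted_gtf(list(reversed(sorted_example))))
```

To check a real file, pass an open file handle, for example `is_sorted_gtf(open("filename.sorted"))`.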

-A|--anchor-length
Minimum number of bases required on either side of a splice junction for split
reads. Only used when the -s|--split-reads option is provided. (1)

-b|--bam-file
Alignments in BAM format (http://samtools.sourceforge.net/SAM1.pdf). This is
required unless you are inputting a normalized link counts file with the
-f|--norm-file option, in which case it is ignored. (Required unless -f|--norm-file
or -F|--merge-files or -Y|--normalize)

-c|--check-proper-pairing
Require the mate pairs to be properly paired according to the aligner as
determined from the bitwise flag of the BAM file. This may or may not be a good
idea depending on the aligner used. The default behaviour is to ignore this flag
and use pairs if both are mapped with mapping quality greater than
-m|--mapping-quality, are in the correct orientation, and are separated by less
than or equal to -i|--max-intron-length bases. (false)

-e|--min-exon-length
While determining fragment size distribution, only mate pairs where both mates
map inside the same exon with a size greater than or equal to this, are included.
In order not to bias the distribution in favour of small fragment sizes, a value at
least twice that of the expected fragment size is suggested. (300)

-E|--single-end
The alignment contains single end reads. These reads have to be split mapped
for altrans to work. If given then fragment length distribution calculation is
skipped and -s|--split-reads and -u|--use-unpaired-split options are
automatically set. (false)

-i|--max-intron-length
Maximum distance (bp) between the mate pairs. This is only used when the
-c|--check-proper-pairing option is not given; otherwise it is ignored. (1000000)

-m|--mapping-quality
Any read with a mapping quality below this threshold is not included in the
analysis. (10)

-n|--no-clipping
The aligner does not soft clip reads, hence only reads that are aligned over
their complete length are considered. (false)

-o|--output-dir
Output directory, MUST exist. (./)

-p|--output-prefix
Prefix to use for output files. If provided with the -f|--norm-file option then this
gets appended to the sample name. (-b|--bam-file or -f|--norm-file or the first file
in the merge list)

-r|--read-length
Read length. (49)

-s|--split-reads
Alignment contains split reads. (false)

-u|--use-unpaired-split
CURRENTLY NOT USED. Use valid mapped split reads where one mate is
mapped but the other is not. Although these would normally fail the pairing
criterion, they may still be used since they contain information even as an
unpaired read. (false)

-x|--max-clip
Maximum clipping length (bp), this is ignored if -n|--no-clipping is given. (29)

Other Options:

-B|--bin-size
The length (bp) of the bins that the genome is divided into for matching a
position to an exon. Higher numbers decrease memory usage but increase
running time. The memory gained from adjusting this is minimal, so do not modify
it unless memory is in really short supply. (1000)

-C|--use-inter-chr
Include links generated when both of the mates are properly mapped but align
to exons on different chromosomes. (false)

-D|--use-diff-strand
Include links generated when both of the mates are properly mapped but align
to exons on different strands. (false)

-d|--distribution-file
You can provide 2 types of files with this option; both must be tab separated.
(Required if -f|--norm-file)

If you are using the -b|--bam-file option then provide a Fragment size
distribution file. Since the fragment size distribution is required before reads can
be assigned to exons, if this file is not provided the BAM file is read twice, once to
determine the fragment size distribution and once to assign reads to exons.

If you are using the -f|--norm-file option then you MUST provide a Master
fragment sizes file. All the samples MUST be in this file.

-f|--norm-file
Provide a Norm file in which case instead of reading a BAM file and assigning
counts to links, the program will calculate link fractions using these normalized
counts. When this option is given you are required to provide a Master fragment
sizes file with the -d|--distribution-file option and a Probability file with
-J|--probability-file.

-F|--merge-files
Merges the provided Forward file, Reverse file, Combined file, Raw forward file,
Raw reverse file, 2norm file, Norm file, or Converted file. See also:
-H|--has-header and -X|--regex.

-g|--gene-types
A space separated list of gene types that are allowed in the analysis. In order to
use this option the "gene_type" and "transcript_type" attributes have to be set in
your annotation GTF. If given, then BOTH the "gene_type" and "transcript_type"
attributes for a particular exon must match the provided types in order for it to
be included in the analysis. (include all types)

-h|-?|--help
Print the help message and exit.

-H|--has-header
Only used when -F|--merge-files is given. The files to be merged have header
lines. Disables -X|--regex. (false)

-I|--ignore-prob-groups
Do not include groups from which certain exon(s) were removed since they
were unifying exon(s), i.e. an exon that overlaps at least 2 other exons and has
no unique portions. (false)

-j|--print-bed
Print an Extended BED file for the paired reads which pass the
-m|--mapping-quality threshold. (false)

-J|--probability-file
Probability file. (Required when -f|--norm-file)

-k|--print-combined
Print a Combined file where instead of dividing the links of a primary exon into
forward and reverse directions, the fractions are computed using all the links a
primary exon makes. (false)

-l|--convert
Convert a 2norm file or a Norm file, which contains exon IDs rather than exon
names, into the long format and exit. (false)

-M|--use-multi-level
Use reads that map to unique portions of multiple exons and default to the
longest covered exon. If you believe these reads disagree with the annotation
then they should be ignored; otherwise they are mapping errors and should be
included. Generally there are so few of these that they can be ignored. (false)

-N|--no-norm-file
Do not print out a 2norm file. (false)

-P|--no-fractions
Do not print out the Forward file, the Reverse file, and the Combined file. (false)

-R|--no-raw-counts
Do not print out the Raw forward file and the Raw reverse file. (false)

-S|--silent
Do not print file processing progress. (false)

-t|--trim
Auto trim reads longer than -r|--read-length to read length. This is useful if you
have sequenced samples with multiple read lengths and would like to treat them
as a different read length on the fly. Trimming is accomplished by editing the
CIGAR string. (false)

-v|--verbose
Print extra information about the exon groups and the corresponding exons to
the Log file. Each information line starts with a specific string where lines
starting with G describe a particular group and lines starting with E list the
member exon(s) details, and the format of the lines is as follows:
GB GroupChromosome GroupStart(0-based) GroupEnd(1-based) GroupID

GI NoExonsInGroup GroupLength LengthOfTheLongestExon
    N(normal)|WG(problematic group)
    N(normal)|NO(non-overlapping exons found)

GE SpaceSeparatedListOfExonIDs

E ExonChromosome ExonStart(0-based) ExonEnd(1-based) ExonID
    UsedExonName RealExonName(s) N(normal)|UE(unifying exon)
    N(normal)|DS(same coordinates but different strands) strand length
    UniqueRegionStart(relative to group start, 0-based):UniqueRegionEnd(relative
    to group start, 0-based)

-V|--version
Print the version and exit. (false)

-U|--use-slots
Use slots when calculating coverage for each exon, as opposed to using the
probability of observing the link. (false)

-w|--working-dir
First write the output files to this directory and move the files to the
-o|--output-dir directory when finished. This is useful if you are using a cluster
and would like to write to local storage in each node and move the files to shared
storage, potentially improving performance. This directory MUST exist.
(-o|--output-dir)

-W|--force-skip
Skip all reads whose length differs from -r|--read-length. Use carefully since you
may end up with no reads. (false)

-X|--regex
Used only when -F|--merge-files is in effect and the files do not have headers.
Regular expression used to extract sample names from file names. The sample
name is the part of the regular expression in the first (). For example, with the
default setting and a BAM file called UC93T_120311_7.sorted.bam; the sample
name extracted is UC93T. If there is no match for the regex then the whole file
name is used. When specifying this option please enclose it in double quotes.
((.+)_\\d{6}_\\d.+)
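The sample name extraction described for -X|--regex can be tried out with a short Python sketch. Note the assumptions: Python's `re` module stands in for whatever regular expression engine Altrans uses (the two may differ in detail), the pattern is written with single backslashes as in a plain regular expression, and the function name `sample_name` is hypothetical.

```python
import re


def sample_name(file_name: str, pattern: str = r"(.+)_\d{6}_\d.+") -> str:
    """Return the first captured group, or the whole file name if nothing matches."""
    match = re.search(pattern, file_name)
    return match.group(1) if match else file_name


print(sample_name("UC93T_120311_7.sorted.bam"))            # default pattern
print(sample_name("sample1_sorted.bam.2norm", r"(.+)_.+"))  # pattern from the usage examples
print(sample_name("noSampleName.bam"))                      # no match: whole name is kept
```

Testing a pattern this way before a large merge can prevent several samples from silently receiving the same name.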

-y|--skip-fusion
Fusion reads generated by TopHat are not currently supported. You can skip
these reads by giving this option. The TopHat version you are using needs to add
the XF tag for fusion reads. (false)

-Y|--normalize
Normalize a merged 2norm file (see -F|--merge-files) with all the covariates
given in -Z|--covar. The samples in this file and in -Z|--covar must match
perfectly. The method used is multiple linear regression in log space
($\log(y_i + 0.1) = \beta_0 + \beta_1 x_{i,1} + \dots + \beta_p x_{i,p} + \varepsilon_i$),
which guarantees positive counts. The residuals from this regression are
transformed into counts and added to a link's estimated mean to come up with the
final counts ($e^{\varepsilon_i + \beta_0}$). (Required when -Z|--covar)

-z|--max-reads-dist
When reading a BAM file to determine fragment size distribution, stop when this
many mate pairs are counted. (use all)

-Z|--covar
A tab delimited file containing the covariates to be used in normalization. This
file must contain a header. Each row is a sample and the first column is the
sample name, followed by the covariate(s). If a covariate column contains a
non-numeric value, the whole column is treated as a factor. (Required when
-Y|--normalize)
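
The layout expected for -Z|--covar can be checked with a small Python sketch like the one below. The function names are hypothetical, and the numeric test (anything Python's float() accepts) is an assumption; Altrans may apply a different rule.

```python
import csv
import io


def _is_number(text):
    """True if the value parses as a float (assumed numeric test)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def read_covariates(handle):
    """Read a tab-delimited covariate file with a header line.
    Returns (covariate names, {sample: values}, {covariate: kind})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    names = header[1:]              # first column holds the sample name
    rows = {}
    for fields in reader:
        if fields:
            rows[fields[0]] = fields[1:]
    kinds = {}
    for idx, name in enumerate(names):
        numeric = all(_is_number(values[idx]) for values in rows.values())
        kinds[name] = "numeric" if numeric else "factor"
    return names, rows, kinds


example = "sample\tage\tbatch\nS1\t34\tA\nS2\t41.5\tB\n"
names, rows, kinds = read_covariates(io.StringIO(example))
print(kinds)
```

In this example the age column is classified as numeric and the batch column as a factor, because batch contains non-numeric values.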


VI. Output and other files


Fragment size distribution file

The first line of this file contains the running options used. The rest of the
file is tab-separated text with two columns, where the first column is a
fragment size (fragment size = insert size + 2 * read length) and the second
column is the frequency of this fragment size. Each line is a different fragment
size; the sizes MUST start from 0 and MUST be sorted and continuous.
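
The constraints on this file (sizes start at 0, sorted, no gaps) can be verified before a run with a sketch such as the following. The function name is hypothetical, and parsing the frequency as a floating point number is an assumption, since the manual does not state whether it is a count or a proportion.

```python
def read_fragment_distribution(lines):
    """Parse a fragment size distribution file given as an iterable of lines.
    Returns (options line, list of frequencies indexed by fragment size)."""
    rows = iter(lines)
    options_line = next(rows).rstrip("\n")   # first line: running options
    frequencies = []
    for line in rows:
        line = line.rstrip("\n")
        if not line:
            continue
        size_text, frequency_text = line.split("\t")
        expected_size = len(frequencies)     # 0, 1, 2, ... with no gaps
        if int(size_text) != expected_size:
            raise ValueError(
                "expected fragment size %d but found %s"
                % (expected_size, size_text)
            )
        frequencies.append(float(frequency_text))
    return options_line, frequencies
```

A file that skips a size, starts above 0, or is out of order raises a ValueError that names the first offending line.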

Master fragment sizes file

A tab separated text file with two columns where the first column is a sample
name and the second column is the FULL path of the Fragment size distribution
file for that sample. All samples in the analysis, each as a separate line, MUST be
present in this file.
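
A quick consistency check of the master file might look like the sketch below. The function name and the expected_samples argument are hypothetical, and interpreting "FULL path" as an absolute path on the current platform is an assumption.

```python
import os


def read_master_file(lines, expected_samples):
    """Map each sample name to its fragment size distribution file and
    check that every expected sample is listed with an absolute path."""
    paths = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        sample, path = line.split("\t")
        if not os.path.isabs(path):
            raise ValueError("path for %s is not a full path: %s" % (sample, path))
        paths[sample] = path
    missing = sorted(set(expected_samples) - set(paths))
    if missing:
        raise ValueError("samples missing from master file: " + ", ".join(missing))
    return paths
```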

Forward file

A tab separated text file with five columns: link name, link gene, chromosome,
TSS, fraction of this link in the forward direction. Each line is a different link.
This is a main output file.

Reverse file

A tab separated text file with five columns: link name, link gene, chromosome,
TSS, fraction of this link in the reverse direction. Each line is a different link. This
is a main output file.

Combined file

A tab separated text file with five columns: link name, link gene, chromosome,
TSS, fraction of this link in both directions. Each line is a different link.

Raw forward file

A tab separated text file with five columns: link name, exon group ID,
chromosome, TSS, raw count of this link in the forward direction. Each line is a
different link. There may be more links in this file than the corresponding
forward file since this file lists all the links observed rather than the links with
positive normalized coverages.

Raw reverse file

A tab separated text file with five columns: link name, exon group ID,
chromosome, TSS, raw count of this link in the reverse direction. Each line is a
different link. There may be more links in this file than the corresponding
reverse file since this file lists all the links observed rather than the links with
positive normalized coverages.

2norm file

A tab separated file with the following columns: exon1 ID, exon2 ID,
chromosome, TSS, number of links between exon1 and exon2. Each line is a
different link. You can merge the individual 2norm files together and normalize
these raw counts with a method of your choice, producing a Norm file. This file
can also be used as a raw counts file for the Combined file after conversion.
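
Altrans can merge 2norm files itself with -F|--merge-files. Purely to illustrate how the per-sample 2norm columns relate to the per-sample columns of a merged table, here is a Python sketch. The function name and header labels are hypothetical, the input files are assumed to have no header, and filling absent links with 0 is an assumption.

```python
def merge_2norm(samples):
    """samples: {sample name: iterable of 2norm lines}.
    Returns (header, rows) with one count column per sample."""
    order = sorted(samples)
    merged = {}
    for col, name in enumerate(order):
        for line in samples[name]:
            line = line.rstrip("\n")
            if not line:
                continue
            exon1, exon2, chrom, tss, count = line.split("\t")
            key = (exon1, exon2, chrom, tss)
            # Links not seen in a sample keep a count of 0 (assumption).
            row = merged.setdefault(key, [0.0] * len(order))
            row[col] = float(count)
    header = ["exon1", "exon2", "chromosome", "TSS"] + order
    rows = [list(key) + counts for key, counts in sorted(merged.items())]
    return header, rows
```

The merged rows still hold raw counts. Producing a Norm file requires a normalization step of your choice on top of this table.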

Norm file

A tab separated file with the following columns: exon1 ID, exon2 ID,
chromosome, TSS, followed by normalized positive counts for each sample
where each sample is a different column. Each line is a different link. This file
must contain a header.

Extended BED file

A tab separated file with at least 15 columns:

Column 1: a 15 character long state string, where a 1 in each of the positions
below signifies:

1: A pair that links two exons
2: A pair that passes QC and aligns to known exons
3: A pair that aligns to non-exonic parts of the genome
4: A pair that partially aligns to known exons
5: A pair that does not agree with annotation although it is exonic
6: A pair that is in a single exon group but does not agree with any
of the group's exons
7: A pair that aligns to unique regions of multiple exons
8: An exon cannot be found for this pair although it is in an exon group
9: A pair that fails mapping
10: A pair linking exons on different chromosomes
11: A junction pair
12: An unknown junction
13: A split read within the same exon
14: A split within the same exon group
15: A pair which is not properly paired

Columns 2-13: Standard columns of a BED file.
Column 14: Insert start position (1-based)
Column 15: Insert end position (1-based)
Column 16 (optional): Exon ID of this pair's assignment.
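
The state string in column 1 can be decoded with a few lines of Python. This sketch assumes that position 1 is the leftmost character and that unset positions are written as 0; the function name is hypothetical.

```python
STATE_MEANINGS = [
    "links two exons",
    "passes QC and aligns to known exons",
    "aligns to non-exonic parts of the genome",
    "partially aligns to known exons",
    "exonic but does not agree with annotation",
    "in a single exon group but agrees with none of its exons",
    "aligns to unique regions of multiple exons",
    "no exon found although in an exon group",
    "fails mapping",
    "links exons on different chromosomes",
    "junction pair",
    "unknown junction",
    "split read within the same exon",
    "split within the same exon group",
    "not properly paired",
]


def decode_state(state):
    """Return the 1-based positions that are set in a 15 character state string."""
    if len(state) != len(STATE_MEANINGS) or set(state) - {"0", "1"}:
        raise ValueError("state string must be 15 characters of 0 or 1")
    return [i + 1 for i, flag in enumerate(state) if flag == "1"]


for position in decode_state("110000000010000"):
    print(position, STATE_MEANINGS[position - 1])
```

For the example string, positions 1, 2 and 11 are set, so the pair links two exons, passes QC, and is a junction pair.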

Log file

Contains the same information that is printed to the screen.

Converted file

A tab separated text file with the following columns: link name, comma
separated exon group IDs, comma separated exon strands, comma separated
exon chromosomes, followed by counts for each sample where each sample is a
different column. Each line is a different link.

Probability file

A tab separated file with the following columns: exon1 ID, exon2 ID,
chromosome, TSS, number of slots and probability of observing this link. Each
line is a different link.
