
SeqMonk Analysis

Open SeqMonk
Launch SeqMonk

The first thing you will see is the opening page. SeqMonk scans your installation and makes sure everything
is in order, as indicated by the green check marks.

SeqMonk Analysis Page 1



Create New Project


To use SeqMonk, you need to create a new project and choose a genome related to your experiment.

1. Under the top menu, go to File and select New project ...
2. When prompted to select a genome, choose GRCh37 under the Homo sapiens folder
3. Click OK to proceed

SeqMonk Analysis Page 2



SeqMonk First Look


The SeqMonk layout is divided into four panels: Quick Access Panel, List Panel, Chromosome Panel,
and Track Panel.

Quick Access Panel:
A series of buttons that allows quick access to various layout, navigation and search functions

List Panel:
A listing of all the imported and created files

Chromosome Panel:
A quick bird's eye view of data signal on the chromosomes

Track Panel:
A detailed view of annotation and data tracks

SeqMonk Analysis Page 3



Import BAM
Importing BAM files into SeqMonk

1. To import data into SeqMonk, go to File, choose Import Data, then BAM/SAM ...
2. Navigate to the location of the BAM files, highlight and select all BAM files (.bam), and click Open.

In the new Import Options window, set the following:


3. Min mapping quality: 20
4. Data Type: Single End
5. Extend reads by (bp): 50

6. Click Import to start importing the files
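
To make the import options concrete, here is a minimal Python sketch of what a minimum mapping quality of 20 and a 50 bp read extension could mean for a single read. It is an illustration, not SeqMonk's own code: the function name is hypothetical, and the sketch assumes that reads are extended in the direction of their strand and that coordinates are 1-based and inclusive.

```python
def keep_and_extend(start, end, strand, mapq, min_mapq=20, extend_by=50):
    """Return the extended (start, end) of a read, or None if it is filtered out.

    Assumptions (not taken from SeqMonk itself): coordinates are 1-based and
    inclusive, and a read is extended in the direction of its strand.
    """
    # Reads below the minimum mapping quality are discarded.
    if mapq < min_mapq:
        return None
    if strand == "+":
        # Forward reads grow to the right.
        return (start, end + extend_by)
    # Reverse reads grow to the left, but never past position 1.
    return (max(1, start - extend_by), end)


# A 36 bp forward read with MAPQ 30 becomes 86 bp long after extension.
extended = keep_and_extend(1000, 1035, "+", 30)
print(extended, extended[1] - extended[0] + 1)
```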

SeqMonk Analysis Page 4



Import In Progress
BAM files are large; please allow some time for the import to finish.

SeqMonk Analysis Page 5



Mitochondrial Genome Was Not Imported


At the end of the import, SeqMonk will report that it did not import the mitochondrial
chromosome data. That is OK; click Close to continue.

Note: Different software packages name the mitochondrial chromosome differently. In this case,
SeqMonk expects the mitochondrial chromosome to be named "M", but our BAM files name it "chrM".
As a result, SeqMonk is unable to import the mitochondrial reads.
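
The naming mismatch can be illustrated with a short Python sketch that lists the chromosome names in a data file that have no exact match in the genome. The function name and the example name lists are hypothetical; they only mirror the "M" versus "chrM" case described above.

```python
def unmatched_chromosomes(bam_names, genome_names):
    """Return the chromosome names from the data file that the genome does not know."""
    known = set(genome_names)
    return [name for name in bam_names if name not in known]


# Hypothetical name lists mirroring the situation in the note above.
genome_names = ["1", "2", "X", "M"]
bam_names = ["1", "2", "X", "chrM"]
print(unmatched_chromosomes(bam_names, genome_names))
```

Reads on any chromosome reported by such a check would be skipped during import, which is what happens to the mitochondrial reads here.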

SeqMonk Analysis Page 6



What is "Define Probe"?


Read quantitation is a two-step process: Define Probe and Quantitation.

Define Probe
1. A Probe is a predefined region on the genome. Probes can be defined in several ways, for example
by gene, mRNA, or CDS.

Quantitation
Quantitation is the process of counting the reads that fall within the Probe region.

2. Define Probe by gene/mRNA. Here a Probe is defined using the gene/mRNA region, and
the read quantitation is reported for this region.
Note: When mRNA is used to define a Probe, the algorithm only includes reads in exons, not introns. If
gene is used instead, reads in both exons and introns are included.

3. Define Probe by CDS. Here a Probe is defined using the CDS region, and the read
quantitation is reported for this region.
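
The difference between a gene Probe and an mRNA Probe can be sketched in a few lines of Python. This is an illustration under simple assumptions, not SeqMonk's algorithm: a read is counted if it overlaps a region by at least one base, and all coordinates and names below are made up.

```python
def overlaps(read, region):
    """True if two (start, end) intervals share at least one base."""
    return read[0] <= region[1] and region[0] <= read[1]


def count_reads(reads, regions):
    """Count the reads that overlap at least one of the given regions."""
    return sum(1 for read in reads if any(overlaps(read, r) for r in regions))


# Made-up example: one gene with two exons and three reads.
gene = [(100, 1000)]                # whole gene span, introns included
exons = [(100, 200), (800, 1000)]   # mRNA Probe: exons only
reads = [(150, 185), (400, 435), (900, 935)]

print(count_reads(reads, gene))   # gene Probe also counts the intronic read
print(count_reads(reads, exons))  # mRNA Probe ignores the intronic read
```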

SeqMonk Analysis Page 7



Define Probe RNA-seq Pipeline


To quantify the reads for an RNA-seq experiment, we will use a Quantitation Pipeline approach.

1. To start the quantitation pipeline, go to Data, then select Quantitation Pipeline

A new Define Quantitation window appears with more options. Please choose:
2. Select the RNA-Seq quantitation pipeline option
3. Transcript features: mRNA
4. Library type: Non-strand specific
5. Merge transcript isoforms: check
6. Log transform: check
7. Apply transcript length correction: check

8. Click Run Pipeline to continue

SeqMonk Analysis Page 8



Result of Probe Definition


After read quantitation, 31,017 Probes were defined.

They are shown in the List Panel, under Probe Lists.

SeqMonk Analysis Page 9



QC Inspection of Reads
We will do a visual inspection of the imported samples.

1. In the Chromosome Panel, use your mouse to highlight the leftmost region of Chromosome 4.
2. Careful examination reveals that sample ABC_Ly3.bam is particularly noisy, with reads
scattered all over the region.

Based on this assessment, we have decided to remove sample ABC_Ly3.bam.

SeqMonk Analysis Page 10



Remove Bad Sample


1. To remove a sample, go to Data, and select Edit Data Sets ...
2. On the new Edit DataSets ... window, select the bad sample ABC_Ly3.bam
3. Click Delete Dataset to remove the sample from the project

SeqMonk Analysis Page 11



Create Replicate Dataset: Step 1


Next, we will group the samples into two replicate sets: ABC and GCB.

1. To group samples into replicate sets, go to Data, and choose Edit Replicate Sets...


2. In the Edit Replicate Set... window, click Add New Replicate Set to add the first replicate set
3. We will name the first replicate set ABC

The next step shows how to assign samples to each replicate set.

SeqMonk Analysis Page 12



Create Replicate Dataset: Step 2


Assigning samples to each replicate set

1. Highlight the ABC replicate set to select it


2. Highlight all the ABC samples to select them (use the Shift key to select multiple samples)
3. Click Add to assign these samples to the ABC replicate set

4. Do the same for the GCB replicate set.

SeqMonk Analysis Page 13



Add Rep Track to Track Panel


We will add the newly created Replicate Sets to the Track Panel

1. To add a data track, go to View, and select Set Data Tracks ...
2. In the new Select Data Track window, highlight both the ABC and GCB Replicate Sets
3. Click Add to add these data to the Track Panel
4. The window now shows that the new data has been added

Note: Examine the Track Panel, where the replicate sets ABC and GCB have been added to the bottom of
the tracks.

SeqMonk Analysis Page 14



Quick Access Panel: Positive and Negative Scale


1. Positive and Negative Scale
Show both the positive and negative scale of the signal intensity
2. Positive Scale
Show only the positive scale of the signal intensity

FYI: Why are there negative values?


The quantitation was normalized by dividing the sum of reads by the length of the
Probe (or mRNA), which can produce a value less than 1. Taking the log (base 2) of a value
less than 1 gives a negative value.

Example:
Given a Probe 2000 base pairs in length with 20 reads mapped to it,
the intensity value would be:

intensity = log2(20/2000) = -6.64
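
The same arithmetic can be checked with a few lines of Python. This is only a sketch of the simplified formula used in the example above (reads divided by Probe length, then log base 2); the function name is hypothetical and SeqMonk's full normalization may include further terms.

```python
import math


def log2_intensity(read_count, probe_length_bp):
    """Log2 of reads per base pair, as in the worked example above."""
    return math.log2(read_count / probe_length_bp)


print(round(log2_intensity(20, 2000), 2))  # -6.64
```

Any Probe with fewer reads than base pairs gives a ratio below 1 and therefore a negative value.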

SeqMonk Analysis Page 15



Quick Access Panel: Dynamic vs. Static Data Colors


1. Dynamic Data Colors
When Dynamic Data Colors is used, the Probe Quantitation bars change color according to the
number of reads found within the Probes.

2. Static Data Colors


When Static Data Colors is used, the color of the Probe Quantitation bars remains constant.

SeqMonk Analysis Page 16



Quick Access Panel: Show Probe and Reads


1. Show Reads Only
Only show the read distribution

2. Show Probe Quantitation Only


Only show the Probe Quantitation bars

3. Show Both Reads and Probe Quantitation


Show both the read distribution and Probe Quantitation bars together.

SeqMonk Analysis Page 17



Quick Access Panel: Read Density Range


1. Low Read Density
Display read distribution in LOW density setting

2. Medium Read Density


Display read distribution in MEDIUM density setting

3. High Read Density


Display read distribution in HIGH density setting

SeqMonk Analysis Page 18



Quick Access Panel: Combine and Split Packed Reads


1. Combine Packed Reads
Display read distribution by mixing the forward and reverse strand reads

2. Split Packed Reads


Display read distribution for forward and reverse strand reads separately
(Forward on top [Red], and reverse on bottom [Blue])

SeqMonk Analysis Page 19



Quick Access Panel: Change Annotation and Data Tracks


1. Change Annotation Tracks
Activate to add, remove or organize the Annotation Tracks

2. Change Data Tracks


Activate to add, remove or organize the Data Tracks

SeqMonk Analysis Page 20



Plot Probe Value Histogram


Plot the histogram for the overall Probe quantitation value

1. Go to Plots, then select Probe Value Histogram

2. Adjust the Division level for a more granular view of the signal

Note: The Probe Value Histogram gives us a sense of the distribution of positive vs. negative Probe
(mRNA in this case) quantitation values. Here, we see that negative Probe values are slightly more
frequent than positive ones.

SeqMonk Analysis Page 21



Plot Read Length Histogram


Plot the histogram for the overall Read Length

1. Go to Plots, then select Read Length Histogram

2. Here, the plot shows that all reads have the same length: 86 nucleotides
Note: The original read length is 36. Recall that during the Import BAM step, we extended the reads
by 50 bp (see Page 4).

SeqMonk Analysis Page 22



Plot Probe Length Histogram


Plot the histogram of the Probe lengths

1. Go to Plots, then select Probe Length Histogram


2. Most Probes (mRNA in this case) are relatively short
3. The Probe length distribution is more apparent when the plot is set to Log scale

SeqMonk Analysis Page 23



Plot Correlation Matrix


Plot the Correlation Matrix for all data tracks

1. Go to Plots, then select Correlation Matrix...


2. The Correlation Matrix shows that samples in the same group (ABC or GCB) have a higher
correlation coefficient (>0.9), although the correlation between samples from different groups is not
much lower (~0.8). As in a microarray experiment, we do not expect a between-group difference for
most Probes.
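
For readers who want to see what a correlation matrix contains, here is a minimal Python sketch. It assumes the coefficient is the Pearson correlation of per-Probe quantitation values, which this tutorial does not state explicitly; the function names and the sample values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(samples):
    """Map every (sample, sample) pair to its Pearson correlation."""
    names = list(samples)
    return {(a, b): pearson(samples[a], samples[b]) for a in names for b in names}


# Made-up quantitation values for three samples over four Probes.
samples = {
    "ABC_1": [1.0, 2.0, 3.0, 4.0],
    "ABC_2": [1.1, 2.1, 2.9, 4.2],
    "GCB_1": [2.0, 1.5, 3.5, 3.0],
}
matrix = correlation_matrix(samples)
print(matrix[("ABC_1", "ABC_2")], matrix[("ABC_1", "GCB_1")])
```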

SeqMonk Analysis Page 24



Plot BoxWhisker Plot


Plot a BoxWhisker Plot to assess the overall distribution of each individual sample

1. Go to Plots, then select Box Whisker Plot, followed by Visible Data Stores...
2. The BoxWhisker Plot shows a very even distribution among the samples, which indicates that the
normalization process was appropriate.

SeqMonk Analysis Page 25



Plot Scatter Plot


Plot a Scatter Plot to assess the relationship between the two replicate sets

1. Go to Plots, then select Scatter Plot...


2. In the new window, plot ABC vs. GCB
3. Mouse over each point to see its gene symbol

SeqMonk Analysis Page 26



Plot MA Plot
Plot an MA Plot to show how well the normalization works

1. Go to Plots, then select MA Plot...


2. The MA Plot shows the data centered horizontally on the zero level, which indicates a successful
normalization.

Note: The MA Plot shows the difference vs. the average between ABC and GCB. The difference is plotted
on the Y-axis, and the average on the X-axis. What we want to see is that the same difference is
exhibited throughout the data range.
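
The two quantities plotted in an MA Plot can be computed as in the following Python sketch. It assumes the inputs are already log-transformed quantitation values for the same Probes in the same order; the function name and the example values are hypothetical.

```python
def ma_values(abc, gcb):
    """Return (M, A) per Probe: M is the difference, A is the average.

    Both inputs are assumed to be log2 quantitation values, listed in the
    same Probe order.
    """
    return [(a - g, (a + g) / 2) for a, g in zip(abc, gcb)]


# Made-up log2 values for three Probes.
for m, a in ma_values([4.0, 2.0, -1.0], [2.0, 2.0, -3.0]):
    print(f"M={m:+.1f}  A={a:+.1f}")
```

After a successful normalization, the M values should scatter around zero across the whole range of A.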

SeqMonk Analysis Page 27



Statistical Test & FDR


Perform a statistical test to identify genes that are significantly different between the two replicate
sets: ABC vs. GCB

1. Go to Filtering, then select Filter by Statistical Test, followed by Intensity Difference...

In the new window, do the following:


2. In From Data Store / Group, select ABC
3. In To Data Store / Group, select GCB
4. In P-value must be below, enter 0.05
5. In Apply Multiple Testing Correction: check
6. Click Run Filter

7. In the new Found XXX probes window, give the gene list a meaningful name

8. The gene list will show up on the List Panel, under Probe Lists. In this case, we found 748
statistically significant genes.
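
This tutorial does not say which multiple testing correction SeqMonk applies. As an illustration only, the Python sketch below assumes the widely used Benjamini-Hochberg procedure for controlling the False Discovery Rate; the function name and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for four Probes.
raw = [0.01, 0.04, 0.03, 0.005]
significant = [q < 0.05 for q in benjamini_hochberg(raw)]
print(significant)
```

A Probe passes the filter only if its corrected value, not its raw p-value, is below the chosen cutoff.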

SeqMonk Analysis Page 28



Annotate Significant Gene List


For the newly created gene list, we will next perform annotation to give biological meaning to the list

1. Go to Reports, then select Annotated Probe Report...

In the new Annotated Probe Report Options window:


2. In Annotate with, select overlapping and gene
3. Set Exclude for unannotated probes
4. Click OK to proceed

Note: We did not choose mRNA for annotation here, because it would return gene isoform
information. Instead, we have chosen gene, which collapses all the isoforms into a single,
easy-to-handle entry.

SeqMonk Analysis Page 29



Examine the Significant Table


The annotated table contains all the biological information about the gene list. The table can be
sorted using the Diff p-value column to further refine the list. Note that this column is the FDR (False
Discovery Rate) corrected p-value. The table can be exported as a text file and manipulated further in
Microsoft Excel.

SeqMonk Analysis Page 30



ChIP-seq Analysis Strategy


A ChIP-seq experiment is designed to identify protein binding sites on the genome. In this case, the
authors use ChIP-seq to locate the binding sites of the STAT3 protein, a Transcription Factor (TF). A TF
binds to the region upstream of a Transcription Start Site (TSS) and activates the expression of that
gene. Sometimes, a TF binds to other regions of the gene, such as inside or downstream of the gene
boundary. To identify potential STAT3-regulated genes, we have devised a strategy to Define
Probes around the TF binding sites and quantitate RNA-seq reads in the defined Probes. As shown
below, the strategy is an attempt to capture the gene expression signal surrounding the TF binding site.
We have arbitrarily picked 2000 base pairs upstream and downstream of the TF binding site to define our Probes
for read quantitation purposes.
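
The Probe window described above can be written down as a small Python sketch. It is an illustration rather than SeqMonk's implementation: the function name is hypothetical, and clamping the start at position 1 is an assumption for sites close to the chromosome start.

```python
def probe_window(site_start, site_end, flank=2000):
    """Extend a binding site by `flank` bp on both sides.

    The start is clamped at 1 so the window never leaves the chromosome
    (an assumption made for this sketch).
    """
    return (max(1, site_start - flank), site_end + flank)


# A made-up 201 bp binding site becomes a 4201 bp Probe.
print(probe_window(10000, 10200))
```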

SeqMonk Analysis Page 31



ChIP-seq Define Probe Caveat


There are some caveats to using our strategy to identify STAT3-regulated genes. First, the defined
Probe region might include more than one gene, which can introduce complications (top figure).
Second, the defined Probe region might not be large enough to capture the full extent of the gene
(bottom figure).

SeqMonk Analysis Page 32



Import ChIPSeq Coordinates


Before we can use the TF binding sites derived from ChIP-seq to define our Probes, we need to import
the coordinates of these binding sites.

1. Go to File, then select Import Annotation, followed by Text (Generic)...


2. Locate and select the TF binding site coordinates file: STAT3_ChIPSeq_Genes.txt
3. Click Open to import the file

SeqMonk Analysis Page 33



Set ChIPSeq Coordinates


To import a generic table, SeqMonk requires us to identify the columns explicitly.

1. In Start at Row, select 1, since the data starts at row one
2. In Chr Col (Chromosome Column), select 2 for the chromosome column
3. In Start Col (start of genomic region), select 3 for the beginning of the genomic region (or TF
binding site)
4. In End Col (end of genomic region), select 4 for the end of the genomic region.

That is all SeqMonk needs to import the table.
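
The column settings above amount to a simple table parse, sketched here in Python. The sketch assumes a tab-delimited file with 1-based column numbers as entered in the dialog; the function name, the content of the first column, and the example rows are made up, since the real layout of STAT3_ChIPSeq_Genes.txt is not shown in this tutorial.

```python
def parse_binding_sites(lines, chr_col=2, start_col=3, end_col=4, start_row=1):
    """Extract (chromosome, start, end) from the rows of a tab-delimited table.

    Column and row numbers are 1-based, matching the values entered above.
    """
    sites = []
    for line in lines[start_row - 1:]:
        fields = line.rstrip("\n").split("\t")
        sites.append(
            (fields[chr_col - 1], int(fields[start_col - 1]), int(fields[end_col - 1]))
        )
    return sites


# Made-up rows; the first column is assumed to hold a gene name.
rows = ["GENE1\t4\t1000\t1200\n", "GENE2\tX\t500\t650\n"]
print(parse_binding_sites(rows))
```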

SeqMonk Analysis Page 34



Define Probe Using STAT3 Peaks


Now that we have imported the TF binding site coordinates (see the List Panel, under Annotation Sets,
for STAT3_ChIPSeq_Genes.txt), we can use them to define our Probes

1. Go to Data, then select Define Probes...

In the new Define Probes... window


2. Select the Feature Probe Generator
3. In Feature to design around, choose STAT3_ChIPSeq_Genes.txt
4. In Remove exact duplicate: check
5. In Ignore feature strand information: check
6. Select Over feature, and select From -2000 to +2000
7. Click Create Probes

Warning

One of the major limitations of SeqMonk is that it can only store one set of Probes. Therefore, when a
new set of Probes is defined here, the old set will be removed.

8. Click Yes to acknowledge the removal of the old set of Probes

SeqMonk Analysis Page 35



Probe Quantitation
Once we have set up the Probe Definition, we are ready to quantify the reads within those
Probes.

1. Select Read Count Quantitation

In the new Define Quantitation window:


2. In Count reads in strand, select All Reads
3. In Correct for total read count: check
4. In Correct to what?, choose Largest DataStore
5. In Count total only within probes: check
6. In Correct for probe length: check
7. In Log Transform Count: check
8. Click Quantitate to proceed
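
To show how these options could combine, here is a rough Python sketch. It is not SeqMonk's actual formula: it assumes that the total read count correction scales each sample up to the largest data store, that the length correction divides by the Probe length in base pairs (as in the earlier log2 example), and that the log is base 2. All names and numbers are hypothetical.

```python
import math


def quantitate(count, total_reads, largest_total, probe_length_bp):
    """Return a log2, depth- and length-corrected value, or None for zero counts.

    Assumed steps (not confirmed against SeqMonk):
      1. scale the count to the largest data store,
      2. divide by the Probe length in bp,
      3. take log2.
    """
    if count <= 0:
        return None  # log of zero is undefined; left to the caller
    scaled = count * largest_total / total_reads
    per_bp = scaled / probe_length_bp
    return math.log2(per_bp)


# Made-up numbers: 40 reads in a 4000 bp Probe, in a sample half the size
# of the largest data store.
print(quantitate(40, 1_000_000, 2_000_000, 4000))
```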

SeqMonk Analysis Page 36



Statistical Test & FDR


Similar to the RNA-seq analysis, once we have defined the Probes we are able to perform a statistical test
to identify differentially expressed Probes

1. Go to Filtering, then select Filter by Statistical Test, followed by Intensity Difference...

In the new Intensity Difference Filter window:


2. In From Data Store / Group, select the replicate set ABC
3. In To Data Store / Group, select the replicate set GCB
4. In P-value must be below, enter 0.05
5. In Apply Multiple Testing Correction: check
6. Click Run Filter to begin the test

7. In the Found XXX probes window, give the list a meaningful name.

8. The newly created significant Probes list will show up on the List Panel under Probe Lists

SeqMonk Analysis Page 37



Annotate STAT3 Regulated Genes


The newly created list of potential STAT3-regulated genes is annotated to give biological meaning to the
list

1. Go to Reports, then select Annotated Probe Report...

In the new Annotated Probe Report Options window:


2. In Annotate with, select closest and gene
3. In Annotation distance cutoff, type 10,000 bp
4. Select Exclude, unannotated probes
5. Select Include, data for currently visible stores
6. Click OK to proceed
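
Annotating each Probe with the closest gene within a distance cutoff can be sketched as follows. This is an illustration under simple assumptions, not SeqMonk's code: distance is measured as the gap between the Probe and the gene (0 if they overlap), all features are on the same chromosome, and the names and coordinates are made up.

```python
def gap(a_start, a_end, b_start, b_end):
    """Distance in bp between two intervals; 0 if they overlap."""
    return max(0, b_start - a_end, a_start - b_end)


def closest_gene(probe, genes, cutoff=10_000):
    """Return (name, distance) of the nearest gene within the cutoff, else None.

    probe is (start, end); genes is a list of (name, start, end) on the
    same chromosome.
    """
    best = None
    for name, g_start, g_end in genes:
        d = gap(probe[0], probe[1], g_start, g_end)
        if d <= cutoff and (best is None or d < best[1]):
            best = (name, d)
    return best


# Made-up genes and Probes.
genes = [("GENE_A", 1000, 5000), ("GENE_B", 30000, 40000)]
print(closest_gene((8000, 12000), genes))   # annotated with the nearer gene
print(closest_gene((60000, 64000), genes))  # no gene within the cutoff
```

Probes for which the function returns None correspond to the unannotated probes that the Exclude option drops from the report.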

SeqMonk Analysis Page 38



Potential STAT3 Regulated Genes


The annotated table contains all the biological information about the gene list. The table can be
sorted using the Diff p-value column to further refine the list. Note that this column is the FDR (False
Discovery Rate) corrected p-value. The table can be exported as a text file and manipulated further in
Microsoft Excel.
Note: * (asterisk) marks genes reported in the paper that we also found in our analysis

SeqMonk Analysis Page 39
