SeqMonk Analysis V2

SeqMonk Analysis
Open SeqMonk
Launch SeqMonk
The first thing you will see is the opening page. SeqMonk checks your installation and confirms that
everything is in order, as indicated by the green check marks.

Create New Project

To use SeqMonk, you need to create a new project and choose a genome related to your experiment.
1. In the top menu, go to File and select New Project...
2. When prompted to select a genome, choose GRCh37 under the Homo sapiens folder
3. Click OK to proceed

SeqMonk First Look

The SeqMonk layout is divided into four panels: the Quick Access Panel, List Panel, Chromosome Panel,
and Track Panel.
Quick Access Panel:
A series of buttons that give quick access to various layout, navigation, and search functions
List Panel:
A listing of all imported and created files
Chromosome Panel:
A bird's-eye view of the data signal on the chromosomes
Track Panel:
A detailed view of the annotation and data tracks

Import BAM
Importing BAM files into SeqMonk
1. To import data into SeqMonk, go to File, choose Import Data, then BAM/SAM...
2. Navigate to the location of the BAM files, highlight all BAM files (.bam), and click Open.
In the new Import Options window, set the following:
3. Min mapping quality: 20
4. Data Type: Single End
5. Extend reads by (bp): 50
6. Click Import to start importing the files

Import In Progress
BAM files are large, so please allow some time for the import to finish.

Mitochondrial Genome Was Not Imported

At the end of the import, SeqMonk reports that it did not import the mitochondrial chromosome
data. That is OK; click Close to continue.
Note: Different software packages name the mitochondrial chromosome differently. In this case,
SeqMonk expects the mitochondrial chromosome to be named "M", but our BAM files name it "chrM".
As a result, SeqMonk is unable to import the mitochondrial reads.

What is "Define Probe"?

Read quantitation is a two-step process: Define Probe and Quantitation.
Define Probe
1. A Probe is a predefined region on the genome. Probes can be defined in several ways, for
example by gene, mRNA, or CDS.
Quantitation
Quantitation is the process of counting the reads that fall within the Probe region.
2. Define Probe by gene/mRNA. Here a Probe is defined using the gene/mRNA region, and
the read quantitation is reported for this region.
Note: When mRNA is used to define a Probe, the algorithm includes only reads in exons, not introns.
If gene is used instead, reads in both exons and introns are included.
3. Define Probe by CDS. Here a Probe is defined using the CDS region, and the read
quantitation is reported for this region.
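
The difference between a gene Probe and an mRNA Probe can be sketched in a few lines of Python. This is only an illustration: the function `count_reads`, the toy coordinates, and the "any overlap counts" rule are assumptions made for this example, and SeqMonk's own counting may differ in detail.

```python
def count_reads(reads, regions):
    """Count reads that overlap at least one (start, end) region."""
    return sum(
        any(r_start <= end and r_end >= start for start, end in regions)
        for r_start, r_end in reads
    )


# Toy gene with two exons separated by one intron (1-based, inclusive).
exons = [(100, 200), (400, 500)]
# A gene Probe spans from the first exon start to the last exon end.
gene_span = [(exons[0][0], exons[-1][1])]

# Made-up reads: two in exon 1, one in the intron, one in exon 2, one outside.
reads = [(110, 145), (190, 225), (300, 335), (450, 485), (600, 635)]

print("gene Probe:", count_reads(reads, gene_span))  # exons and introns
print("mRNA Probe:", count_reads(reads, exons))      # exons only
```

The intronic read is counted for the gene Probe but not for the mRNA Probe, which is the behavior described in the note above.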

Define Probe RNA-seq Pipeline

To quantify the reads for an RNA-seq experiment, we will use the Quantitation Pipeline.
1. To start the quantitation pipeline, go to Data, then select Quantitation Pipeline
A new Define Quantitation window appears with more options. Choose the following:
2. Select the RNA-Seq quantitation pipeline option
3. Transcript features: mRNA
4. Library type: Non-strand specific
5. Merge transcript isoforms: check
6. Log transform: check
7. Apply transcript length correction: check
8. Click Run Pipeline to continue

Result of Probe Definition

After read quantitation, 31,017 Probes were defined.
This is shown in the List Panel, under Probe Lists.

QC Inspection of Reads
We will visually inspect the imported samples.
1. In the Chromosome Panel, use the mouse to highlight the leftmost region of chromosome 4.
2. Careful examination reveals that sample ABC_Ly3.bam is particularly noisy, with reads
scattered all over the region.
Based on this assessment, we have decided to remove sample ABC_Ly3.bam.

Remove Bad Sample

1. To remove a sample, go to Data, and select Edit Data Sets...
2. In the new Edit DataSets... window, select the bad sample ABC_Ly3.bam
3. Click Delete Dataset to remove the sample from the project

Create Replicate Dataset: Step 1

Next, we will group the samples into two replicate sets: ABC and GCB.
1. To group samples into replicate sets, go to Data, and choose Edit Replicate Sets...
2. In the Edit Replicate Set... window, click Add New Replicate Set to add the first replicate set
3. Name the first replicate set ABC
The next step shows how to assign samples to each replicate set.

Create Replicate Dataset: Step 2

Assign samples to each replicate set
1. Highlight the ABC replicate set to select it
2. Highlight all the ABC samples (hold the Shift key to select multiple samples)
3. Click Add to assign these samples to the ABC replicate set
4. Do the same for the GCB replicate set.

Add Rep Track to Track Panel

We will add the newly created replicate sets to the Track Panel.
1. To add data tracks, go to View, and select Set Data Tracks...
2. In the new Select Data Track window, highlight both the ABC and GCB replicate sets
3. Click Add to add them to the Track Panel
4. The window now shows that the new data have been added
Note: Examine the Track Panel, where the replicate sets ABC and GCB have been added to the bottom of
the tracks.

Quick Access Panel: Positive and Negative Scale

1. Positive and Negative Scale
Show both the positive and negative scale of the signal intensity
2. Positive Scale
Show only the positive scale of the signal intensity
FYI: Why are there negative values?
Quantitation normalizes the read count by dividing the sum of reads by the length of the
Probe (or mRNA), which can produce a value less than 1. Taking the log (base 2) of a value
less than 1 gives a negative value.
Example:
Given a Probe 2000 base pairs long with 20 reads mapped to it, the intensity value would be:
intensity = log2(20/2000) = -6.64
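
The arithmetic in this example can be checked with a short Python snippet. The helper `log2_intensity` is a hypothetical name used only here; it follows the simplified formula of the example (reads divided by Probe length, then log base 2) and is not necessarily the exact scaling SeqMonk applies.

```python
import math


def log2_intensity(read_count, probe_length):
    """Log2 of reads per base of Probe, as in the worked example."""
    return math.log2(read_count / probe_length)


# 20 reads on a 2000 bp Probe: ratio below 1, so the log is negative.
print(round(log2_intensity(20, 2000), 2))    # -6.64

# 4000 reads on the same Probe: ratio above 1, so the log is positive.
print(round(log2_intensity(4000, 2000), 2))  # 1.0
```

Any Probe with fewer reads than bases ends up below zero under this formula, which is why the negative half of the scale is needed.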

Quick Access Panel: Dynamic vs. Static Data Colors

1. Dynamic Data Colors
When Dynamic Data Colors is used, the Probe Quantitation bars change color according to the
number of reads found within the Probes.
2. Static Data Colors
When Static Data Colors is used, the color of the Probe Quantitation bars remains constant.

Quick Access Panel: Show Probe and Reads

1. Show Reads Only
Show only the read distribution
2. Show Probe Quantitation Only
Show only the Probe Quantitation bars
3. Show Both Reads and Probe Quantitation
Show both the read distribution and the Probe Quantitation bars together.

Quick Access Panel: Read Density Range

1. Low Read Density
Display the read distribution at the LOW density setting
2. Medium Read Density
Display the read distribution at the MEDIUM density setting
3. High Read Density
Display the read distribution at the HIGH density setting

Quick Access Panel: Combine and Split Packed Reads

1. Combine Packed Reads
Display the read distribution with forward and reverse strand reads mixed together
2. Split Packed Reads
Display the read distribution for forward and reverse strand reads separately
(forward on top [red], reverse on the bottom [blue])

Quick Access Panel: Change Annotation and Data Tracks

1. Change Annotation Tracks
Activate to add, remove or organize the Annotation Tracks
2. Change Data Tracks

Activate to add, remove or organize the Data Tracks

Plot Probe Value Histogram

Plot the histogram of the overall Probe quantitation values
1. Go to Plots, then select Probe Value Histogram
2. Adjust the Division level for a more granular view of the signal
Note: The Probe Value Histogram gives a sense of the distribution of positive vs. negative probe
(mRNA in this case) quantitation values. Here, negative probe values are slightly more common than
positive ones.

Plot Read Length Histogram

Plot the histogram of the overall read length
1. Go to Plots, then select Read Length Histogram
2. The plot shows that all reads have the same length: 86 nucleotides
Note: The original read length is 36. Recall that during the Import BAM step we extended the reads
by 50 bp, giving 36 + 50 = 86 (see Import BAM).
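
A small Python sketch shows why every read ends up 86 nucleotides long. The helper names are hypothetical, and the direction of extension (toward the 3' end of the read, so reverse-strand reads grow toward lower coordinates) is an assumption made for illustration rather than a statement of how SeqMonk implements it.

```python
def extend_read(start, end, strand, extend_by=50):
    """Extend a read at its 3' end (1-based, inclusive coordinates)."""
    if strand == "+":
        return start, end + extend_by
    # Reverse-strand reads grow toward lower coordinates; stop at position 1.
    return max(1, start - extend_by), end


def read_length(start, end):
    """Length of an inclusive interval."""
    return end - start + 1


start, end = extend_read(1000, 1035, "+")  # a 36 bp forward-strand read
print(read_length(start, end))             # 86
```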

Plot Probe Length Histogram

Plot the histogram of Probe lengths
1. Go to Plots, then select Probe Length Histogram
2. Most Probes (mRNA in this case) are relatively short
3. The probe length distribution is easier to see when the plot is set to Log scale

Plot Correlation Matrix

Plot the Correlation Matrix for all data tracks
1. Go to Plots, then select Correlation Matrix...
2. The Correlation Matrix shows that samples in the same group (ABC or GCB) have a higher
correlation coefficient (>0.9), although the correlation between samples from different groups is
not much lower (~0.8). As in a microarray experiment, we do not expect a between-group difference
for most Probes.
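
For readers who want to see what a correlation matrix contains, here is a minimal Python sketch using the Pearson correlation coefficient. The sample names and values are made up for illustration, and the tutorial does not state which correlation measure SeqMonk uses, so Pearson is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical log2 Probe values for three samples (not real data).
samples = {
    "ABC_1": [1.0, 2.0, 3.0, 4.0],
    "ABC_2": [1.1, 2.1, 2.9, 4.2],
    "GCB_1": [2.0, 1.5, 3.5, 3.0],
}

# Print one row of the matrix per sample.
names = list(samples)
for a in names:
    row = [round(pearson(samples[a], samples[b]), 2) for b in names]
    print(a, row)
```

Each row compares one sample with all the others; the diagonal is always 1 because a sample correlates perfectly with itself.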

Plot BoxWhisker Plot

Plot a BoxWhisker Plot to assess the overall distribution of each individual sample
1. Go to Plots, then select Box Whisker Plot, followed by Visible Data Stores...
2. The BoxWhisker Plot shows a very even distribution among the samples, which indicates that the
normalization process was appropriate.

Plot Scatter Plot

Plot a Scatter Plot to assess the relationship between the two replicate sets
1. Go to Plots, then select Scatter Plot...
2. In the new window, plot ABC vs. GCB
3. Mouse over a point to see its gene symbol

Plot MA Plot
Plot an MA Plot to show how well the normalization works
1. Go to Plots, then MA Plot...
2. The MA Plot shows the data centered horizontally on the zero level, which indicates a successful
normalization.
Note: The MA Plot shows the difference vs. the average between ABC and GCB. The difference is plotted
on the Y-axis, and the average on the X-axis. What we want to see is the same difference
exhibited throughout the data range.
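
The two axes of an MA Plot are simple to compute from log-transformed values. The Python sketch below uses made-up numbers and a hypothetical helper `ma_values`; it is meant only to show what "difference" and "average" refer to.

```python
def ma_values(a, b):
    """Return (average, difference) pairs for two lists of log2 values."""
    return [((x + y) / 2, x - y) for x, y in zip(a, b)]


# Hypothetical log2 quantitation values for four Probes (not real data).
abc = [1.0, 3.5, -2.0, 6.0]
gcb = [1.5, 3.0, -2.5, 6.5]

points = ma_values(abc, gcb)
for average, difference in points:
    print(f"A = {average:5.2f}   M = {difference:5.2f}")

# For well-normalized data the differences scatter around zero.
mean_m = sum(m for _, m in points) / len(points)
print("mean difference:", mean_m)
```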

Statistical Test & FDR

Perform a statistical test to identify genes that are significantly different between the two replicate
sets: ABC vs. GCB
1. Go to Filtering, then select Filter by Statistical Test, followed by Intensity Difference...
In the new window, do the following:
2. In From Data Store / Group, select ABC
3. In To Data Store / Group, select GCB
4. In P-value must be below, enter 0.05
5. In Apply Multiple Testing Correction: check
6. Click Run Filter
7. In the new Found XXX probes window, give the gene list a meaningful name
8. The gene list will show up in the List Panel, under Probe Lists. In this case, we found 748
statistically significant genes.
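
To see why multiple testing correction matters, the following Python sketch applies the Benjamini-Hochberg procedure to a handful of made-up p-values. The tutorial does not say which correction SeqMonk applies; Benjamini-Hochberg is assumed here because it is a common way to control the False Discovery Rate, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five Probes (not real data).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)

print("raw p < 0.05:     ", sum(p < 0.05 for p in raw))
print("adjusted p < 0.05:", sum(p < 0.05 for p in adj))
```

In this toy case four Probes pass the raw cutoff but only two survive the correction, so checking the correction box gives a shorter and more reliable list.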

Annotate Significant Gene List

We will next annotate the newly created gene list to give it biological meaning.
1. Go to Reports, then select Annotated Probe Report...
In the new Annotated Probe Report Options window:
2. In Annotate with, select overlapping and gene
3. Set unannotated probes to Exclude
Note: We did not use mRNA as the annotation choice here because it would return gene isoform
information. Instead, we chose gene, which collapses all the isoforms into a single
easy-to-handle entry.

Examine the Significant Table

The annotated table contains all the biological information about the gene list. The table can be
sorted using the Diff p-value column to further refine the list. Note that this column holds the FDR
(False Discovery Rate) corrected p-value. The table can be exported as a text file and manipulated
further in Microsoft Excel.

ChIP-seq Analysis Strategy

A ChIP-seq experiment is designed to identify protein binding sites on the genome. In this case, the
authors use ChIP-seq to locate the binding sites of the STAT3 protein, a Transcription Factor (TF). A TF
binds to the region upstream of a Transcription Start Site (TSS) and activates the expression of that
gene. Sometimes a TF binds to other regions of the gene, such as inside or downstream of the gene
boundary. To identify potential STAT3-regulated genes, we have devised a strategy: Define
Probes around the TF binding sites and quantitate RNA-seq reads in the defined Probes. As shown
in the figure, the strategy is an attempt to capture the gene expression signal surrounding the TF
binding site. We have arbitrarily picked 2000 base pairs upstream and downstream of the TF binding
site to define our Probes for read quantitation.
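
The Probe window described above is just the binding site widened by 2000 bp on each side. A minimal Python sketch, with a hypothetical helper `probe_around` and made-up coordinates, could look like this. Clipping at position 1 is an assumption of the sketch, and the chromosome end is not checked.

```python
def probe_around(chrom, start, end, flank=2000):
    """Widen a binding site by `flank` bp on each side to form a Probe."""
    # Coordinates are 1-based, so the Probe cannot start before position 1.
    return chrom, max(1, start - flank), end + flank


# A binding site well inside the chromosome.
print(probe_around("7", 50000, 50200))  # ('7', 48000, 52200)

# A binding site near the chromosome start is clipped at position 1.
print(probe_around("7", 1500, 1700))    # ('7', 1, 3700)
```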

ChIP-seq Define Probe Caveat

There are some caveats to using our strategy to identify STAT3-regulated genes. First, the defined
Probe region might include more than one gene, which can introduce complications (top figure).
Second, the defined Probe region might not be large enough to capture the full extent of the gene
(bottom figure).

Import ChIPSeq Coordinates

Before we can use the TF binding sites derived from ChIP-seq to define our Probes, we need to import
the coordinates of these binding sites.
1. Go to File, then select Import Annotation, followed by Text (Generic)...
2. Locate and select the TF binding site coordinates file: STAT3_ChIPSeq_Genes.txt
3. Click Open to import the file

Set ChIPSeq Coordinates

To import a generic table, SeqMonk requires us to state explicitly which column holds which value.
1. In Start at Row, select 1, since the data start at row number one
2. In Chr Col (chromosome column), select 2 for the chromosome column
3. In Start Col (start of genomic region), select 3 for the beginning of the genomic region (or TF
binding site)
4. In End Col (end of genomic region), select 4 for the end of the genomic region.
That is all SeqMonk needs to import the table.
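
The column settings above map directly onto how such a table would be read in code. The Python sketch below assumes a tab-delimited file whose first column is a gene name; that layout, the example rows, and the function `read_sites` are all illustrative assumptions, since the tutorial does not show the file contents.

```python
import csv
import io


def read_sites(handle, chr_col=2, start_col=3, end_col=4, start_row=1):
    """Read (chromosome, start, end) tuples using 1-based column numbers."""
    sites = []
    reader = csv.reader(handle, delimiter="\t")
    for row_number, row in enumerate(reader, start=1):
        # Skip rows before the first data row, and blank lines.
        if row_number < start_row or not row:
            continue
        sites.append(
            (row[chr_col - 1], int(row[start_col - 1]), int(row[end_col - 1]))
        )
    return sites


# Two made-up rows standing in for the real coordinates file.
example = io.StringIO("GENE_A\t1\t1000\t1200\nGENE_B\tX\t5000\t5300\n")
print(read_sites(example))
# [('1', 1000, 1200), ('X', 5000, 5300)]
```

If the file had a header line, `start_row=2` would skip it, which corresponds to changing Start at Row in the import dialog.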

Define Probe Using STAT3 Peaks

Now that we have imported the TF binding site coordinates (see the List Panel, under Annotation Sets,
for STAT3_ChIPSeq_Genes.txt), we can use them to define our Probes.
1. Go to Data, then select Define Probes...
In the new Define Probes... window:
2. Select the Feature Probe Generator
3. In Feature to design around, choose STAT3_ChIPSeq_Genes.txt
4. In Remove exact duplicate: check
5. In Ignore feature strand information: check
6. Select Over feature, and select From -2000 to +2000
7. Click Create Probes
Warning:
One of the major limitations of SeqMonk is that it can store only one set of Probes. Therefore, when a
new set of Probes is defined here, the old set will be removed.
8. Click Yes to acknowledge the removal of the old set of Probes

Probe Quantitation
Once we have set up the Probe definition, we are ready to quantify the reads within those
Probes.
1. Select Read Count Quantitation
In the new Define Quantitation window:
2. In Count reads in strand, select All Reads
3. In Correct for total read count: check
4. In Correct to what? choose Largest DataStore
5. In Count total only within probes: check
6. In Correct for probe length: check
7. In Log Transform Count: check
8. Click Quantitate to proceed
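
The options selected above stack several corrections on top of a raw read count. The Python sketch below shows one plausible reading of how they combine. It is not SeqMonk's actual formula: the function name, the per-kilobase length scaling, and the order of the steps are assumptions made for illustration, and Probes with zero reads are not handled.

```python
import math


def quantitate(count, probe_length, total_reads, largest_total):
    """Illustrative corrected, log-transformed read count for one Probe."""
    # Correct for total read count: scale up to the largest data store.
    scaled = count * largest_total / total_reads
    # Correct for probe length: express the count per 1000 bp (assumed unit).
    per_kb = scaled / (probe_length / 1000)
    # Log transform: base 2, as elsewhere in this tutorial.
    return math.log2(per_kb)


# Made-up numbers: 40 reads on a 4000 bp Probe, in a sample with 5 million
# reads, when the largest sample in the project has 10 million reads.
value = quantitate(40, 4000, 5_000_000, 10_000_000)
print(round(value, 2))
```

Scaling to the largest data store makes samples of different sequencing depth comparable, and the length correction keeps long Probes from looking more active simply because they are long.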

Statistical Test & FDR

As in the RNA-seq analysis, once we have defined the Probes we can perform a statistical test
to identify differentially expressed Probes.
1. Go to Filtering, then select Filter by Statistical Test, followed by Intensity Difference...
In the new Intensity Difference Filter window:
2. In From Data Store / Group, select the replicate set ABC
3. In To Data Store / Group, select the replicate set GCB
4. In P-value must be below, enter 0.05
5. In Apply Multiple Testing Correction: check
6. Click Run Filter to begin the test
7. In the Found XXX probes window, give the list a meaningful name.
8. The newly created list of significant Probes will show up in the List Panel under Probe Lists

Annotate STAT3 Regulated Genes

The newly created list of potential STAT3-regulated genes is annotated to give it biological
meaning.
1. Go to Reports, then select Annotated Probe Report...
In the new Annotated Probe Report Options window:
2. In Annotate with, select closest and gene
3. In Annotation distance cutoff, type 10,000 bp
4. Select Exclude for unannotated probes
5. Select Include for data for currently visible stores
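
The "closest gene within a distance cutoff" rule can be pictured with a short Python sketch. The gene names and coordinates are made up, the function `closest_gene` is hypothetical, and everything is assumed to lie on one chromosome; the way SeqMonk measures distance may differ.

```python
def closest_gene(probe, genes, cutoff=10_000):
    """Return the name of the nearest gene within `cutoff` bp, else None."""
    p_start, p_end = probe
    best_name, best_dist = None, None
    for name, g_start, g_end in genes:
        # Gap between the two intervals; 0 when they overlap.
        dist = max(0, g_start - p_end, p_start - g_end)
        if dist <= cutoff and (best_dist is None or dist < best_dist):
            best_name, best_dist = name, dist
    return best_name


# Hypothetical genes on a single chromosome (not real annotation).
genes = [("GENE_A", 1_000, 5_000), ("GENE_B", 30_000, 40_000)]

print(closest_gene((12_000, 16_000), genes))    # GENE_A (7,000 bp away)
print(closest_gene((100_000, 104_000), genes))  # None (nothing within 10 kb)
```

Probes that return no gene correspond to the unannotated probes that the report is set to exclude.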

Potential STAT3 Regulated Genes

The annotated table contains all the biological information about the gene list. The table can be
sorted using the Diff p-value column to further refine the list. Note that this column holds the FDR
(False Discovery Rate) corrected p-value. The table can be exported as a text file and manipulated
further in Microsoft Excel.
Note: An asterisk (*) marks genes reported in the paper that were also found in our analysis.
