Gavin Schnitzler
Asst. Prof. Medicine TUSM, Investigator at MCRI, TMC
gschnitzler@tuftsmedicalcenter.org
617-636-0615
ChIP-seq COURSE OUTLINE
• Day 1: ChIP techniques, library production,
UCSC browser tracks
• Day 2: QC on reads, Mapping binding site
peaks, examining read density maps.
• Day 3: Analyzing peaks in relation to
genomic features, etc.
• Day 4: Analyzing peaks for transcription
factor binding site consensus sequences.
• Day 5: Variants & advanced
approaches.
Day 5 Outline
• Introduction to variations on ChIP-seq
methods
• Extensions & variations on TFBS analysis
• Analyzing published data & across
platforms
• Downloading & installing programs
• Writing your own programs
Next-Generation Sequencing Analysis
DNase-Seq: Treatment of nuclei with a nuclease such as DNase I results in cleavage of DNA at
accessible regions. Isolating these regions and detecting them by sequencing allows the creation of
DNase hypersensitivity maps, which show which regulatory elements are accessible in the
genome. (FAIRE-seq is a related technique.)
MNase-Seq: Micrococcal nuclease (MNase) is a nuclease that degrades genomic DNA not
wrapped around histones. The remaining DNA represents nucleosomal DNA, and can be sequenced to
reveal nucleosome positions along the genome. This method can also be combined with ChIP to map
nucleosomes that contain specific histone modifications.
RNA-Seq: Extraction, fragmentation, and sequencing of RNA populations within a sample. The replacement
for gene expression measurements by microarray. There are many variants on this, such as Ribo-Seq
(isolation of ribosomes translating RNA), small RNA-Seq (to identify miRNAs), etc.
GRO-Seq: RNA-Seq of nascent RNA. Transcription is halted, nuclei are isolated, labeled nucleotides are
added back, and transcription briefly restarted resulting in labeled RNA molecules. These newly created,
nascent RNAs are isolated and sequenced to reveal "rates of transcription" as opposed to the total number of
stable transcripts measured by normal RNA-seq.
Hi-C: Genomic interaction assay for understanding genome 3D structure. This assay is much more
specialized; for information on using HOMER to analyze Hi-C data, see the Hi-C analysis section
of the HOMER documentation.
Examining long-range interactions by ChIP-seq
Two DNA fragments associated with the same IP’d protein are ligated together.
CTCF binds better to the A variant
Mapping CpG DNA
methylation patterns
Approaches:
Bormann Chung CA, Boyd VL, McKernan KJ, Fu Y, et al. (2010) Whole Methylome Analysis by Ultra-Deep Sequencing Using Two-Base Encoding. PLoS ONE 5(2): e9320.
doi:10.1371/journal.pone.0009320 http://www.plosone.org/article/info:doi/10.1371/journal.pone.0009320
Mapping nucleosome
positions
Approaches:
• 1) Fragmentation to mononucleosome size by sonication or micrococcal nuclease (MNase)
• ChIP w/ antibody against histone modification (H3K4me1): can map positions of nucleosomes with this mark.
• Whole genome sequencing.
[Figure: plot against BP from TSSes of gene group (-2000 to +2000). Series: LiERBS_v_LiU, LiERBS_v_LiNon-regl., LiERBS_v_LiNon-expr., AoERBS_v_AoD, AoERBS_v_AoU, AoERBS_v_AoNon-regl.]
Using input chromatin read density to measure
nucleosome densities
Hypothesis: Sonication mostly cuts in nucleosome free regions or inter-
nucleosomal spacers. Thus, read positions give information about
nucleosome positions.
[Figure: plot against BP from Li Down promoter TSSes (-2000 to +2000). Series (norm'd): LiINPUT_v_LiD_pros, LiINPUT_v_LiU_pros, LiINPUT_v_AoD_pros, LiINPUT_v_AoU_pros]
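A minimal sketch of how a read density profile around TSSes could be computed. It assumes read positions and TSS coordinates are already loaded into plain Python structures; the function name `tss_density`, the input layout, and the bin size are illustrative assumptions, not part of the course materials.

```python
from collections import Counter

def tss_density(read_positions, tss_sites, window=2000, bin_size=100):
    """Mean number of reads per TSS in each bin relative to the TSS.

    read_positions: dict mapping chromosome -> list of read start positions
    tss_sites: list of (chromosome, position, strand) tuples
    Returns a dict mapping bin start offset (bp from TSS) -> mean read count.
    """
    counts = Counter()
    for chrom, tss, strand in tss_sites:
        for pos in read_positions.get(chrom, []):
            offset = pos - tss
            if strand == "-":
                offset = -offset  # flip so positive offsets are downstream
            if -window <= offset < window:
                # floor division puts e.g. -150 into the bin starting at -200
                counts[(offset // bin_size) * bin_size] += 1
    n = max(len(tss_sites), 1)
    return {b: counts.get(b, 0) / n for b in range(-window, window, bin_size)}

# Toy example: four reads near one plus-strand TSS at position 1000
reads = {"chr1": [900, 1000, 1050, 1250]}
profile = tss_density(reads, [("chr1", 1000, "+")], window=200, bin_size=100)
print(profile)
```

This loops over every read for every TSS, which is fine for a toy data set but too slow for a real genome; sorting the reads and using a binary search would be the usual next step.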
Many approaches to TFBS analysis
[Figure: a motif of length k in n unrelated sequences]
The program will need to run once for each k:
e.g. 6 bp, 7 bp, 8 bp sequences, etc. (either
automatically, or by hand).
From: Lawrence, C. et al. (1993) Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment. Science 262, 208-
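To illustrate the "run once for each k" idea, here is a minimal sketch that repeats a simple search for each motif width. Note that it uses plain k-mer counting as a stand-in, not the Gibbs sampler from the paper above; the function names `top_kmers` and `scan_widths` and the toy sequences are assumptions made for illustration.

```python
from collections import Counter

def top_kmers(sequences, k, top=3):
    """Rank k-mers by the number of sequences that contain them."""
    seen = Counter()
    for seq in sequences:
        seq = seq.upper()
        # use a set so each k-mer counts at most once per sequence
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        seen.update(kmers)
    return seen.most_common(top)

def scan_widths(sequences, widths=(6, 7, 8)):
    """Run the search once for each width k, as a motif finder would."""
    return {k: top_kmers(sequences, k) for k in widths}

# Toy sequences that all share one 6 bp word
seqs = ["TTAGGTCACC", "GCAGGTCATG", "CAAGGTCAGT"]
for k, hits in scan_widths(seqs).items():
    print(k, hits)
```

Real motif finders allow mismatches and score against a background model, so exact k-mer counting like this only finds perfectly conserved words.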
Regulatory Modules:
De novo cis-regulatory module elicitation for eukaryotic genomes. Proc Nat’l Acad Sci USA, 102, 7079-84
[Figure labels: Start, M1, M2, M3, Stop; p12, p21; Gene A, Gene B]
ChIP-on-chip - 1-2 kb information on protein/DNA interaction:
An Algorithm for Finding Protein-DNA Interaction Sites with Applications to Chromatin Immunoprecipitation Microarray Experiments. Nature Biotechnology, 20, 835-39
[Figure labels: Protein binding in neighborhood; Coding regions]
Click on the name to the left of the smaller file (1.9M) & then on the
downloads tab.
Right click on the ftp link for the run & copy the link location.
This gives you the same .fastq format you’re familiar with.
Use head to confirm the format, but then you might as well delete the
file with rm so as not to clutter up the cluster.
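If you prefer to check the format in a script rather than by eye with head, something like the following sketch could work. It assumes the common 4-line FASTQ layout (header starting with @, sequence, a + line, qualities of the same length as the sequence); the function name `check_fastq` is an illustrative assumption.

```python
import io
from itertools import islice

def check_fastq(handle, n_records=4):
    """Read up to n_records 4-line FASTQ records; raise ValueError if malformed."""
    records = []
    while len(records) < n_records:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            break  # end of file
        if len(lines) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = lines
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("bad header or separator line: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch: " + header)
        records.append((header[1:], seq, qual))
    return records

# Demo on an in-memory file; for a real download, pass an open file handle instead
demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGG\n+\nII\n")
for name, seq, qual in check_fastq(demo):
    print(name, seq, qual)
```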
After this week you are now ready to do any analysis you want
on this data, from mapping reads to the genome (w/ bowtie) to
peak calling (w/ MACS), to TFBS analysis.
“Liftover” programs to convert
between genomes & builds
Several useful tools for this in Cistrome/Galaxy (Liftover/Others):
• Convert between RefSeq, Gene Symbols to Entrez IDs using Bioconductor.
• Liftover Wig Files: liftover wig files.
• [Galaxy] Convert genome coordinates between assemblies and genomes.
• Extract data from Wiggle: extract data for a certain chromosome from a wiggle file.
• Extract data from Bed: extract data for a certain chromosome from a BED file.
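The "extract data for a certain chromosome" step is also easy to do yourself. Below is a minimal sketch, assuming a whitespace-delimited BED file whose first column is the chromosome name; the function name `extract_chromosome` is an illustrative assumption and this is not how the Cistrome/Galaxy tool itself is implemented.

```python
def extract_chromosome(bed_lines, chrom):
    """Yield BED lines whose first column equals chrom, skipping header lines."""
    for line in bed_lines:
        stripped = line.strip()
        # skip blank lines and the optional track/browser/comment headers
        if not stripped or stripped.startswith(("track", "browser", "#")):
            continue
        if stripped.split()[0] == chrom:
            yield stripped

# Toy example; with a real file, pass an open file handle as bed_lines
bed = [
    "track name=peaks",
    "chr1\t100\t200\tpeak1",
    "chr10\t300\t400\tpeak2",
    "chr1\t500\t600\tpeak3",
]
for row in extract_chromosome(bed, "chr1"):
    print(row)
```

Comparing the whole first column (rather than testing whether the line starts with "chr1") keeps chr10, chr11, etc. from being picked up by mistake.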
http://biowhat.ucsd.edu/homer/ngs/index.html
Mapping to the genome (NOT performed by HOMER, but important to understand)
Creation of tag directories, quality control, and normalization. (makeTagDirectory)
UCSC visualization (makeUCSCfile, makeBigWig.pl)
Peak finding / Transcript detection / Feature identification (findPeaks)
Motif analysis (findMotifsGenome.pl)
Annotation of Peaks (annotatePeaks.pl)
Quantification of Data at Peaks/Regions in the Genome/Histograms and Heatmaps
(annotatePeaks.pl)
Quantification of Transcripts (analyzeRNA.pl)
Could be very useful… & with (only a bit of) luck, you’ll be
able to install & run them yourself.
Installing a program in R
Check out the Key R Commands link at
http://sites.tufts.edu/cbi/resources/chip-seq/