Protein Sequencing and Identification by Mass Spectrometry

Masses of Amino Acid Residues
Protein Backbone
H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-OH
R_{i-1} R_i R_{i+1}
N-terminus C-terminus
AA residue_{i-1} AA residue_i AA residue_{i+1}

Peptide Fragmentation
Collision Induced Dissociation
H+
H...-HN-CH-CO . . . NH-CH-CO-NH-CH-CO-OH
R_{i-1} R_i R_{i+1}
Prefix Fragment Suffix Fragment
Peptides tend to fragment along the backbone.
Fragments can also lose neutral chemical groups
like NH3 and H2O.
Breaking Protein into Peptides and
Peptides into Fragment Ions
Proteases, e.g. trypsin, break a protein into
peptides.
A tandem mass spectrometer further breaks the
peptides down into fragment ions and measures
the mass of each piece.
The mass spectrometer accelerates the fragment
ions; heavier ions accelerate more slowly than lighter
ones.
The mass spectrometer measures the mass/charge
ratio of an ion.
N- and C-terminal Peptides
[Figure: a peptide's N-terminal peptides and C-terminal peptides]
Terminal peptides and ion types
Peptide
Mass (D) 57 + 97 + 147 + 114 = 415
Peptide without H2O
Mass (D) 57 + 97 + 147 + 114 - 18 = 397
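The arithmetic on this slide can be checked with a short sketch. It assumes integer (nominal) residue masses; the one-letter codes G, P, F and N are inferred from the masses 57, 97, 147 and 114 and are not stated on the slide, and the function name is illustrative only.

```python
# Integer (nominal) residue masses in daltons for the residues used here.
# The letters are inferred from the masses on the slide (an assumption).
RESIDUE_MASS = {"G": 57, "P": 97, "F": 147, "N": 114}
WATER = 18  # mass of H2O, the neutral loss used on the slide


def residue_sum(peptide):
    """Sum the residue masses of a peptide given as one-letter codes."""
    return sum(RESIDUE_MASS[aa] for aa in peptide)


mass = residue_sum("GPFN")          # 57 + 97 + 147 + 114
mass_without_water = mass - WATER   # same fragment after losing H2O
print(mass, mass_without_water)
```

The same fragment therefore shows up at two different masses, which is why the ion type has to be known before a peak can be interpreted.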

[Figure: a peptide of total mass 486 with two complementary ladders of terminal peptide masses: 57, 154, 301, 415 and 71, 185, 332, 429. Each pair (57, 429), (154, 332), (301, 185), (415, 71) sums to 486.]
Reconstruct peptide from the set of masses of fragment ions (mass-spectrum):
57, 71, 154, 185, 301, 332, 415, 429, 486
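As a rough illustration of where these masses come from, the sketch below builds both ladders from a list of residue masses. The residue order 57, 97, 147, 114, 71 is an assumption inferred from the mass differences in the figure; which ladder is N-terminal and which is C-terminal is not recoverable from the slide, so the names "prefix" and "suffix" simply follow that assumed order.

```python
from itertools import accumulate

# Residue masses in an assumed order, inferred from the ladder differences.
residues = [57, 97, 147, 114, 71]


def prefix_masses(residue_masses):
    """Cumulative masses read from the start of the list."""
    return list(accumulate(residue_masses))


def suffix_masses(residue_masses):
    """Cumulative masses read from the end of the list."""
    return list(accumulate(reversed(residue_masses)))


prefixes = prefix_masses(residues)
suffixes = suffix_masses(residues)
mass_spectrum = sorted(set(prefixes) | set(suffixes))
print(prefixes)
print(suffixes)
print(mass_spectrum)
```

De novo sequencing is the inverse task: given only the merged, unlabeled list of masses, recover the residue order.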
Peptide Fragmentation
[Figure: backbone of a four-residue peptide (side chains R1 to R4) with its cleavage sites labeled. N-terminal ions: a2, b2, a3, b3, with neutral-loss variants b2-H2O and b3-NH3. C-terminal ions: y3, y2, y1, with neutral-loss variants y3-H2O and y2-NH3.]
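Mass Spectra
[Figure: mass spectrum of the peptide G V D L K along a mass axis starting at 0. A peak spacing of 57 Da corresponds to G and 99 Da to V; a loss of H2O is marked.]
The peaks in the mass spectrum:
Prefix and Suffix Fragments.
Fragments with neutral losses (-H2O, -NH3)
Noise and missing peaks.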
Protein Identification with MS/MS
[Figure: MS/MS identification. A spectrum (intensity vs. mass) is identified as the peptide G V D L K.]
Tandem Mass-Spectrometry
Breaking Proteins into
Peptides
[Figure: the protein MPSERGTDIMRPAKID...... is broken into the peptides MPSER, GTDIMR and PAKID, which pass through HPLC to MS/MS.]
Mass Spectrometry
Matrix-Assisted Laser Desorption/Ionization (MALDI)
From lectures by Vineet Bafna (UCSD)

Tandem Mass Spectrometry
[Figure: LC base-peak chromatogram (relative abundance vs. time, 0 to 80 min, NL 1.52E8); full MS scan 1707 (RT 54.44 min, m/z 300.00 to 2000.00, NL 2.41E7); MS/MS scan 1708 (RT 54.47 min, NL 5.27E6, full ms2 of precursor 638.00, m/z 165.00 to 1925.00). Instrument schematic: ion source, MS-1, collision cell, MS-2.]
Labeled peaks in scan 1708 (m/z): 226.9, 326.0, 397.1, 425.0, 489.1, 524.9, 588.1, 589.2, 629.0, 687.3, 850.3, 851.4, 949.4, 1048.6, 1049.6
Protein Identification by Tandem
Mass Spectrometry
[Figure: a sequence is passed through an MS/MS instrument to produce a tandem mass spectrum (scan 1708), which is interpreted either by database search (Sequest) or by de novo interpretation (Sherenga).]
Tandem Mass Spectrum
Tandem Mass Spectrometry (MS/MS): mainly
generates partial N- and C-terminal peptides
Spectrum consists of different ion types
because peptides can be broken in several
places.
Chemical noise often complicates the
spectrum.
Represented in 2-D: mass/charge axis vs.
intensity axis
De Novo vs. Database Search
[Figure: the same MS/MS spectrum (scan 1708) is interpreted in two ways (labels: Mass, Score).
Database search, over the database of known peptides: MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN..
De novo, over the database of all peptides (20^n), drawn as a graph with edges labeled by amino acids: AAAAAAAA, AAAAAAAC, AAAAAAAD, AAAAAAAE, AAAAAAAG, AAAAAAAF, AAAAAAAH, AAAAAAAI, ..., AVGELTI, AVGELTK, AVGELTL, AVGELTM, ..., YYYYYYYS, YYYYYYYT, YYYYYYYV, YYYYYYYY
Both arrive at AVGELTK.]
De Novo vs. Database Search: A
Paradox
The database of all peptides is huge: O(20^n).
The database of all known peptides is much smaller: O(10^8).
However, de novo algorithms can be much faster, even
though their search space is much larger!
A database search scans every peptide in the database of
all known peptides to find the best one.
De novo eliminates the need to scan the database of all
peptides by modeling the problem as a graph search.
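To get a feel for the two search-space sizes, the following sketch counts all peptides of a given length over a 20-letter alphabet and finds the shortest length at which that count exceeds 10^8. The function names are illustrative only, and 10^8 is used as the order of magnitude quoted on the slide rather than an exact database size.

```python
KNOWN_PEPTIDES = 10 ** 8  # order of magnitude quoted on the slide


def all_peptides_count(length, alphabet_size=20):
    """Number of distinct peptides of exactly this length."""
    return alphabet_size ** length


def shortest_length_exceeding(limit, alphabet_size=20):
    """Smallest peptide length whose count of all peptides exceeds limit."""
    length = 1
    while all_peptides_count(length, alphabet_size) <= limit:
        length += 1
    return length


print(all_peptides_count(10))
print(shortest_length_exceeding(KNOWN_PEPTIDES))
```

Already at a handful of residues, exhaustive enumeration of all peptides is out of reach, which is why the de novo approach has to avoid scanning them one by one.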
De novo Peptide Sequencing
[Figure: the MS/MS spectrum (scan 1708) is interpreted directly as a sequence.]
Theoretical Spectrum
[Figures: theoretical spectrum, three slides]
Building Spectrum Graph
How to create vertices (from masses)
How to create edges (from mass differences)
How to score paths
How to find best path

[Figure: building the spectrum of the peptide SEQUENCE on a mass/charge (M/Z) axis, panel by panel: b ions (S E Q U E N C E); a ions, where a is an ion type shift in b; y ions, which read in reverse (E C N E U Q E S); peak intensities; noise. The result is the MS/MS spectrum (intensity vs. mass/charge).]
Some Mass Differences between
Peaks Correspond to Amino Acids
[Figure: MS/MS spectrum in which mass differences between peaks are labeled with the letters s, e, q, u, e, n, c, e.]
Ion Types
Some masses correspond to fragment
ions, others are just random noise
Knowing ion types $\Delta = \{\delta_1, \delta_2, \ldots, \delta_k\}$ lets us
distinguish fragment ions from noise
We can learn ion types $\delta_i$ and their
probabilities $q_i$ by analyzing a large test
sample of annotated spectra.
Example of Ion Type
$\Delta = \{\delta_1, \delta_2, \ldots, \delta_k\}$
Ion types
{b, b-NH3, b-H2O}
correspond to
$\Delta = \{0, 17, 18\}$
*Note: In reality the value of ion type b is -1 but we will hide it for the sake of simplicity
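A small sketch of how these offsets are used: an N-terminal peptide of mass m is expected to produce one peak per ion type, at m minus the offset. The offsets 0, 17 and 18 are the simplified values from the slide (the true b-ion value is hidden there, as the note says); the dictionary and function names are illustrative only.

```python
# Simplified ion-type offsets from the slide, in daltons.
ION_OFFSETS = {"b": 0, "b-NH3": 17, "b-H2O": 18}


def ion_masses(partial_peptide_mass, offsets=ION_OFFSETS):
    """Expected peak mass for each ion type of one N-terminal peptide."""
    return {ion: partial_peptide_mass - delta for ion, delta in offsets.items()}


print(ion_masses(415))
```

For the partial peptide of mass 415 this gives peaks at 415, 398 and 397, so one fragment can account for up to three peaks in the spectrum.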
Match between Spectra and the
Shared Peak Count
The match between two spectra is the number of masses
(peaks) they share (Shared Peak Count or SPC)
In practice mass-spectrometrists use the weighted SPC
that reflects intensities of the peaks
Match between experimental and theoretical spectra is
defined similarly
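A minimal sketch of the unweighted shared peak count follows. The slides do not say how close two masses must be to count as shared, so the `tolerance` parameter is an assumption; with the default of 0 only identical masses match. The intensity-weighted variant mentioned above is not implemented here.

```python
def shared_peak_count(spectrum_a, spectrum_b, tolerance=0.0):
    """Count distinct masses in spectrum_a that have a match in spectrum_b.

    tolerance is a hypothetical matching window in daltons.
    """
    count = 0
    for mass_a in set(spectrum_a):
        if any(abs(mass_a - mass_b) <= tolerance for mass_b in spectrum_b):
            count += 1
    return count


experimental = [57, 154, 300, 415, 500]
theoretical = [57, 154, 301, 415, 486]
print(shared_peak_count(experimental, theoretical))
print(shared_peak_count(experimental, theoretical, tolerance=1.0))
```

With exact matching the two spectra share 57, 154 and 415; allowing a 1 Da window also pairs 300 with 301.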
Peptide Sequencing Problem
Goal: Find a peptide with maximal match between
an experimental and theoretical spectrum.
Input:
S: experimental spectrum
$\Delta$: set of possible ion types
m: parent mass
Output:
P: peptide with mass m, whose theoretical
spectrum matches the experimental spectrum S
the best
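Vertices of Spectrum Graph
Masses of potential N-terminal peptides
Vertices are generated by reverse shifts corresponding to ion types
$\Delta = \{\delta_1, \delta_2, \ldots, \delta_k\}$
Every N-terminal peptide of mass m can generate up to k ions
$m - \delta_1, m - \delta_2, \ldots, m - \delta_k$
Every mass s in an MS/MS spectrum generates k vertices
$V(s) = \{s + \delta_1, s + \delta_2, \ldots, s + \delta_k\}$
corresponding to potential N-terminal peptides
Vertices of the spectrum graph:
$\{\text{initial vertex}\} \cup V(s_1) \cup V(s_2) \cup \ldots \cup V(s_m) \cup \{\text{terminal vertex}\}$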
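The vertex construction can be sketched as follows. Two choices here are assumptions rather than part of the slides: the initial vertex is placed at mass 0 and the terminal vertex at the parent mass, and shifted masses outside that open interval are discarded. The function name and the example spectrum are hypothetical.

```python
def spectrum_graph_vertices(spectrum, deltas, parent_mass):
    """Apply every reverse shift s + delta to every peak mass s.

    Assumes the initial vertex is 0 and the terminal vertex is parent_mass.
    """
    vertices = {0, parent_mass}
    for s in spectrum:
        for delta in deltas:
            candidate = s + delta
            if 0 < candidate < parent_mass:  # filtering is an assumption
                vertices.add(candidate)
    return sorted(vertices)


vertices = spectrum_graph_vertices([57, 137, 154], (0, 17, 18), parent_mass=301)
print(vertices)
```

Note that the peak at 137 shifted by 17 lands on 154, the same vertex produced by the peak at 154 with shift 0: two peaks can support one candidate N-terminal peptide.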
Reverse Shifts
Shift in H2O
Shift in H2O+NH3
Edges of Spectrum Graph
Two vertices with mass difference
corresponding to an amino acid A:
Connect with an edge labeled by A
Gap edges for di- and tri-peptides
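A sketch of the single-residue edges is given below. It uses the standard integer (nominal) residue masses of the 20 amino acids, which are not listed in the slides; because I and L share mass 113 and K and Q share mass 128, one mass difference can yield more than one labeled edge. Gap edges for di- and tri-peptides are left out to keep the example short.

```python
from collections import defaultdict

# Standard integer (nominal) residue masses in daltons.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "K": 128, "Q": 128, "E": 129,
    "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}

# Invert the table: mass difference -> all residues with that mass.
MASS_TO_RESIDUES = defaultdict(list)
for residue, mass in RESIDUE_MASS.items():
    MASS_TO_RESIDUES[mass].append(residue)


def spectrum_graph_edges(vertices):
    """Return (lighter, heavier, residue) for every matching vertex pair."""
    ordered = sorted(vertices)
    edges = []
    for i, u in enumerate(ordered):
        for v in ordered[i + 1:]:
            for residue in MASS_TO_RESIDUES.get(v - u, []):
                edges.append((u, v, residue))
    return edges


edges = spectrum_graph_edges([0, 57, 154, 301, 415, 486])
print(edges)
```

Because every edge points from a lighter to a heavier vertex, the graph is acyclic, which is what makes the later path search tractable.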

Paths
Paths in the labeled graph spell out amino acid
sequences
There are many paths; how do we find the correct
one?
We need scoring to evaluate paths

Path Score
p(P,S) = probability that peptide P produces
spectrum $S = \{s_1, s_2, \ldots, s_q\}$
p(P, s) = the probability that peptide P
generates a peak s
Scoring = computing probabilities
$p(P,S) = \prod_{s \in S} p(P, s)$
Peak Score
For a position t that represents ion type $\delta_j$:
$$p(P, s_t) = \begin{cases} q_j, & \text{if peak is generated at } t \\ 1 - q_j, & \text{otherwise} \end{cases}$$
Peak Score (contd)
For a position t that is not associated with an
ion type:
$$p_R(P, s_t) = \begin{cases} q_R, & \text{if peak is generated at } t \\ 1 - q_R, & \text{otherwise} \end{cases}$$
$q_R$ = the probability of a noisy peak that does
not correspond to any ion type
Finding Optimal Paths in the Spectrum Graph
For a given MS/MS spectrum S, find a
peptide P' maximizing p(P,S) over all possible
peptides P:
$p(P', S) = \max_P p(P, S)$
Peptides = paths in the spectrum graph
P' = the optimal path in the spectrum graph
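One common way to find such a path, sketched here under stated assumptions, is dynamic programming over the vertices in increasing mass order, which is a valid topological order because edges only go from lighter to heavier vertices. The sketch assumes each vertex carries an additive score (for example a log of a probability ratio) and that the path runs from the lightest to the heaviest vertex; the slides do not prescribe this exact procedure, and the names are illustrative.

```python
def best_path(vertices, edges, vertex_score):
    """Highest-scoring path from the lightest to the heaviest vertex.

    edges: iterable of (u, v, label) with u < v.
    vertex_score: dict of additive scores; missing vertices score 0.
    Returns (peptide_string, score), or None if no path exists.
    """
    order = sorted(vertices)
    source, sink = order[0], order[-1]
    incoming = {}
    for u, v, label in edges:
        incoming.setdefault(v, []).append((u, label))

    best = {v: float("-inf") for v in order}
    best[source] = vertex_score.get(source, 0.0)
    back = {}
    for v in order[1:]:
        for u, label in incoming.get(v, []):
            candidate = best[u] + vertex_score.get(v, 0.0)
            if candidate > best[v]:
                best[v] = candidate
                back[v] = (u, label)

    if best[sink] == float("-inf"):
        return None
    labels = []
    v = sink
    while v != source:
        v, label = back[v]
        labels.append(label)
    return "".join(reversed(labels)), best[sink]


toy_edges = [(0, 57, "G"), (57, 128, "A"), (0, 71, "A"), (71, 128, "G")]
result = best_path([0, 57, 71, 128], toy_edges, {57: 2.0, 71: 0.5})
print(result)
```

In the toy graph the two candidate peptides GA and AG have the same total mass; the better-supported intermediate vertex at 57 decides in favor of GA. The running time is linear in the number of edges, independent of how many peptides the paths encode.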

Ions and Probabilities
Tandem mass spectrometry is characterized
by a set of ion types $\{\delta_1, \delta_2, \ldots, \delta_k\}$ and their
probabilities $\{q_1, \ldots, q_k\}$
$\delta_i$-ions of a partial peptide are produced
independently with probabilities $q_i$
A peptide has all k peaks with probability
$\prod_{i=1}^{k} q_i$
and no peaks with probability
$\prod_{i=1}^{k} (1 - q_i)$
A peptide also produces a "random noise"
with uniform probability $q_R$ in any position.
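The two products follow directly from the independence assumption and can be evaluated as below. The probabilities 0.8 and 0.5 are made-up illustration values, not learned ion probabilities.

```python
from math import prod


def prob_all_peaks(q):
    """Probability that every one of the k ion types produces a peak."""
    return prod(q)


def prob_no_peaks(q):
    """Probability that none of the k ion types produces a peak."""
    return prod(1 - q_i for q_i in q)


q = [0.8, 0.5]  # hypothetical ion probabilities q_1, q_2
print(prob_all_peaks(q), prob_no_peaks(q))
```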
Ratio Test Scoring for Partial Peptides
Incorporates premiums for observed ions
and penalties for missing ions.
Example: for k=4, assume that for a partial
peptide P we only see ions $\delta_1, \delta_2, \delta_4$.
The score is calculated as:
$$\frac{q_1}{q_R} \cdot \frac{q_2}{q_R} \cdot \frac{1 - q_3}{1 - q_R} \cdot \frac{q_4}{q_R}$$
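The k=4 example can be reproduced with a short sketch. Each observed ion contributes a factor q_i / q_R and each missing ion a factor (1 - q_i) / (1 - q_R). The numeric probabilities below are invented for illustration, and the function name is hypothetical.

```python
def ratio_test_score(observed, q, q_noise):
    """Ratio-test score of one partial peptide.

    observed: one boolean per ion type, True if its peak was seen.
    q: ion probabilities q_1..q_k; q_noise: noise probability q_R.
    """
    score = 1.0
    for seen, q_i in zip(observed, q):
        if seen:
            score *= q_i / q_noise              # premium for an observed ion
        else:
            score *= (1 - q_i) / (1 - q_noise)  # penalty for a missing ion
    return score


q = [0.7, 0.4, 0.3, 0.2]   # hypothetical q_1..q_4
q_noise = 0.1              # hypothetical q_R
score = ratio_test_score([True, True, False, True], q, q_noise)
print(score)
```

A factor above 1 means the peak pattern is better explained by the partial peptide than by noise; a missing ion with q_i larger than q_R pulls the score down.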
Scoring Peptides
T - set of all positions.
$T_i = \{t_{\delta_1}, t_{\delta_2}, \ldots, t_{\delta_k}\}$ - set of positions that
represent ions of partial peptide $P_i$.
A peak at position $t_{\delta_j}$ is generated with
probability $q_j$.
$R = T - \bigcup_i T_i$ - set of positions that are not
associated with any partial peptides (noise).
Probabilistic Model
For a position $t_{\delta_j} \in T_i$ the probability p(t, P, S) that
peptide P produces a peak at position t:
$$p(t, P, S) = \begin{cases} q_j, & \text{if a peak is generated at position } t_{\delta_j} \\ 1 - q_j, & \text{otherwise} \end{cases}$$
Similarly, for $t \in R$, the probability that P produces a
random noise peak at t is:
$$p_R(t) = \begin{cases} q_R, & \text{if a peak is generated at position } t \\ 1 - q_R, & \text{otherwise} \end{cases}$$
Probabilistic Score
For a peptide P with n amino acids, the score
for the whole peptide is expressed by the
following ratio test:
$$\frac{p(P, S)}{p_R(S)} = \prod_{i=1}^{n} \prod_{j=1}^{k} \frac{p(t_{i\delta_j}, P, S)}{p_R(t_{i\delta_j})}$$
