
Protein Sequencing and Identification by Mass Spectrometry
Masses of Amino Acid Residues
Protein Backbone

H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-OH
        |        |        |
     R(i-1)    R(i)    R(i+1)
N-terminus                C-terminus

AA residue i-1, AA residue i, AA residue i+1


Peptide Fragmentation
Collision Induced Dissociation
H+
H...-HN-CH-CO . . . NH-CH-CO-NH-CH-CO-OH
Ri-1 Ri Ri+1
Prefix Fragment Suffix Fragment

Peptides tend to fragment along the backbone.


Fragments can also lose neutral chemical groups
like NH3 and H2O.
Breaking Protein into Peptides and
Peptides into Fragment Ions
Proteases, e.g. trypsin, break a protein into
peptides.
A tandem mass spectrometer further breaks the
peptides down into fragment ions and measures
the mass of each piece.
The mass spectrometer accelerates the fragment
ions; heavier ions accelerate more slowly than
lighter ones.
The mass spectrometer measures the mass/charge
ratio of an ion.
N- and C-terminal Peptides
[Figure: a peptide split into its N-terminal peptides and C-terminal peptides]
Terminal peptides and ion types
Peptide
Mass (Da): 57 + 97 + 147 + 114 = 415

Peptide without H2O
Mass (Da): 57 + 97 + 147 + 114 - 18 = 397
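The arithmetic on this slide can be checked in a few lines of Python. This is a minimal sketch using nominal (integer) residue masses. The peptide letters G, P, F, N are an assumption: they are residues whose nominal masses are 57, 97, 147 and 114, but the text above does not spell out the peptide itself.

```python
# Nominal (integer) residue masses in daltons for the residues used here.
RESIDUE_MASS = {"G": 57, "P": 97, "F": 147, "N": 114}
WATER = 18  # nominal mass of H2O


def residue_sum(peptide):
    """Sum of the residue masses, as written on the slide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide)


peptide = "GPFN"  # hypothetical peptide matching the masses on the slide
mass = residue_sum(peptide)
print(mass)          # 415
print(mass - WATER)  # 397, the same sum after losing one water
```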


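N- and C-terminal Peptides
[Figure: a peptide of mass 486 cut at each backbone position into an N-terminal peptide and a C-terminal peptide; the complementary fragment masses are 57/429, 71/415, 154/332 and 185/301]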
N- and C-terminal Peptides
Reconstruct the peptide from the set of masses of fragment ions (the mass spectrum):
57, 71, 154, 185, 301, 332, 415, 429 (total peptide mass 486)
Peptide Fragmentation
b2-H2O b3- NH3
a2 b2 a3 b3

HO NH3+
| |
R1 O R2 O R3 O R4
| || | || | || |
H -- N --- C --- C --- N --- C --- C --- N --- C --- C --- N --- C -- COOH
| | | | | | |
H H H H H H H

y3 y2 y1
y3 -H2O y2 - NH3
Mass Spectra
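[Figure: mass spectrum of the peptide GVDLK, with peaks plotted along the mass axis starting at 0; the gaps between peaks spell out residues, e.g. 57 Da = 'G' and 99 Da = 'V', and the spectrum also includes an H2O label]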

The peaks in the mass spectrum:
Prefix and suffix fragments.
Fragments with neutral losses (-H2O, -NH3).
Noise and missing peaks.
Protein Identification with MS/MS

[Figure: the peptide GVDLK is fragmented by MS/MS, and the resulting spectrum (intensity vs. mass) is used for identification]
Tandem Mass-Spectrometry
Breaking Proteins into Peptides
[Figure: the protein MPSERGTDIMRPAKID...... is broken into the peptides MPSER, GTDIMR and PAKID, which pass through HPLC to MS/MS]
Mass Spectrometry
Matrix-Assisted Laser Desorption/Ionization (MALDI)

From lectures by Vineet Bafna (UCSD)


Tandem Mass Spectrometry
[Figure: LC base-peak chromatogram (relative abundance vs. time, 0 to 80 min); MS scan 1707 (RT 54.44, full MS, m/z 300 to 2000); MS/MS scan 1708 (RT 54.47, full ms2 of precursor 638.00, m/z 165 to 1925); instrument schematic: ion source, MS-1, collision cell, MS-2]
Protein Identification by Tandem
Mass Spectrometry
[Figure: a sequence enters the MS/MS instrument, which produces a tandem mass spectrum (scan 1708, relative abundance vs. m/z); the spectrum is interpreted either by database search (Sequest) or by de novo interpretation (Sherenga)]
Tandem Mass Spectrum
Tandem Mass Spectrometry (MS/MS): mainly
generates partial N- and C-terminal peptides
Spectrum consists of different ion types
because peptides can be broken in several
places.
Chemical noise often complicates the
spectrum.
Represented in 2-D: mass/charge axis vs.
intensity axis
De Novo vs. Database Search
[Figure: the MS/MS spectrum (scan 1708) is interpreted in two ways, with candidates compared by mass and score. Database search uses the database of known peptides (MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN, ...) and returns AVGELTK. De novo sequencing works over the database of all peptides, of size $20^n$ (AAAAAAAA, AAAAAAAC, AAAAAAAD, ..., AVGELTI, AVGELTK, AVGELTL, AVGELTM, ..., YYYYYYYY), drawn as a graph with edges labeled by amino acids, and also returns AVGELTK.]
De Novo vs. Database Search: A Paradox
The database of all peptides is huge: $O(20^n)$.
The database of all known peptides is much smaller: $O(10^8)$.
However, de novo algorithms can be much faster, even though their search space is much larger!
A database search scans every peptide in the database of all known peptides to find the best one.
De novo eliminates the need to scan the database of all peptides by modeling the problem as a graph search.
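To get a feel for the two search-space sizes, the short sketch below compares $20^n$ with the $10^8$ figure quoted on the slide. The function names are illustrative only.

```python
KNOWN_PEPTIDES = 10**8  # order of magnitude quoted on the slide


def all_peptides(n, alphabet=20):
    """Number of distinct peptides of length n over a 20-letter alphabet."""
    return alphabet ** n


def first_length_exceeding(limit):
    """Smallest peptide length n for which 20**n is larger than limit."""
    n = 1
    while all_peptides(n) <= limit:
        n += 1
    return n


for n in (5, 10, 15):
    print(n, all_peptides(n))

# Peptide length at which "all peptides" already outnumbers the known ones.
print(first_length_exceeding(KNOWN_PEPTIDES))  # 7
```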
De novo Peptide Sequencing
[Figure: the MS/MS spectrum (scan 1708, relative abundance vs. m/z) is interpreted directly as a sequence]
Theoretical Spectrum
Theoretical Spectrum (contd)
Theoretical Spectrum (contd)
Building Spectrum Graph
How to create vertices (from masses)

How to create edges (from mass differences)

How to score paths

How to find best path


[Figure series, each panel plotted against mass/charge (m/z): the b-ions of the peptide SEQUENCE; the a-ions, where a is an ion type shift in b; the y-ions, which read the peptide in reverse (ECNEUQES); the peaks drawn with intensities; added noise; and the resulting MS/MS spectrum (intensity vs. m/z)]
Some Mass Differences between Peaks Correspond to Amino Acids
[Figure: a spectrum in which gaps between peaks are labeled with the letters s, e, q, u, e, n, c, e in several overlapping ladders]
Ion Types
Some masses correspond to fragment ions, others are just random noise.
Knowing the ion types $\Delta = \{\delta_1, \delta_2, \ldots, \delta_k\}$ lets us distinguish fragment ions from noise.
We can learn the ion types $\delta_i$ and their probabilities $q_i$ by analyzing a large test sample of annotated spectra.
Example of Ion Type
$\Delta = \{\delta_1, \delta_2, \ldots, \delta_k\}$
Ion types {b, b-NH3, b-H2O} correspond to $\Delta = \{0, 17, 18\}$

*Note: In reality the value of ion type b is -1, but we will hide it for the sake of simplicity.
Match between Spectra and the Shared Peak Count
The match between two spectra is the number of masses (peaks) they share (Shared Peak Count, or SPC).
In practice, mass spectrometrists use the weighted SPC, which reflects the intensities of the peaks.
The match between experimental and theoretical spectra is defined similarly.
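A shared peak count is easy to sketch in code. The version below is an illustration only: it treats two peaks as shared when their masses differ by at most a tolerance `tol`, and pairs peaks greedily in mass order. Both choices are assumptions, since the slide only says that shared masses are counted. The weighted SPC is not shown.

```python
def shared_peak_count(spec_a, spec_b, tol=0.5):
    """Count peaks that occur in both spectra, within a mass tolerance.

    Each peak is matched at most once; peaks are paired greedily in
    increasing mass order with a two-pointer sweep.
    """
    a, b = sorted(spec_a), sorted(spec_b)
    i = j = count = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


theoretical = [57, 154, 301, 415]            # hypothetical prefix masses
experimental = [57.0, 154.2, 300.0, 415.3]   # hypothetical measured peaks
print(shared_peak_count(theoretical, experimental))  # 3
```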
Peptide Sequencing Problem
Goal: Find a peptide with maximal match between an experimental and theoretical spectrum.
Input:
S: experimental spectrum
$\Delta$: set of possible ion types
m: parent mass
Output:
P: peptide with mass m whose theoretical spectrum best matches the experimental spectrum S
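Vertices of Spectrum Graph
Masses of potential N-terminal peptides.
Vertices are generated by reverse shifts corresponding to ion types $\Delta = \{\delta_1, \delta_2, \ldots, \delta_k\}$.
Every N-terminal peptide can generate up to k ions: $m - \delta_1, m - \delta_2, \ldots, m - \delta_k$.
Every mass s in an MS/MS spectrum generates k vertices $V(s) = \{s + \delta_1, s + \delta_2, \ldots, s + \delta_k\}$ corresponding to potential N-terminal peptides.
Vertices of the spectrum graph: $\{\text{initial vertex}\} \cup V(s_1) \cup V(s_2) \cup \ldots \cup V(s_m) \cup \{\text{terminal vertex}\}$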
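The vertex construction can be sketched directly from the definition of V(s). The sketch below uses the example ion types {0, 17, 18} from the earlier slide and nominal integer masses. Placing the initial vertex at mass 0 and the terminal vertex at the parent mass, and discarding shifted masses outside that range, are assumptions of this illustration.

```python
DELTAS = (0, 17, 18)  # b, b-NH3, b-H2O, as in the ion type example


def spectrum_graph_vertices(spectrum, parent_mass, deltas=DELTAS):
    """Apply every reverse shift s + delta to every peak s.

    Returns the sorted vertex masses, including the initial vertex (0)
    and the terminal vertex (the parent mass).
    """
    vertices = {0, parent_mass}
    for s in spectrum:
        for d in deltas:
            v = s + d
            if 0 < v < parent_mass:  # keep only masses inside the peptide
                vertices.add(v)
    return sorted(vertices)


# Hypothetical peaks: 136 is the b-H2O partner of the b-ion at 154,
# so both peaks generate the vertex 154.
print(spectrum_graph_vertices([57, 136, 154], parent_mass=486))
# [0, 57, 74, 75, 136, 153, 154, 171, 172, 486]
```

Most of the generated vertices are wrong guesses about the ion type of a peak; scoring has to separate them from the true N-terminal masses.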
Reverse Shifts

Shift in H2O

Shift in H2O+NH3
Edges of Spectrum Graph
Two vertices with a mass difference corresponding to an amino acid A: connect them with an edge labeled by A.
Gap edges for di- and tri-peptides.
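The edge rule can be sketched as a lookup of each mass difference in a residue table. The sketch assumes nominal integer masses and exact matching, and it omits gap edges for di- and tri-peptides. Because leucine/isoleucine and lysine/glutamine share nominal masses, each pair appears under a single label.

```python
# Nominal residue masses (Da). "L" also stands for I, and "K" also stands for Q.
RESIDUES = {57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
            113: "L", 114: "N", 115: "D", 128: "K", 129: "E", 131: "M",
            137: "H", 147: "F", 156: "R", 163: "Y", 186: "W"}


def spectrum_graph_edges(vertices, residues=RESIDUES):
    """Connect u -> v with label A whenever v - u is the mass of residue A."""
    vs = sorted(vertices)
    edges = []
    for i, u in enumerate(vs):
        for v in vs[i + 1:]:
            aa = residues.get(v - u)
            if aa is not None:
                edges.append((u, v, aa))
    return edges


for edge in spectrum_graph_edges([0, 57, 154, 301, 415, 486]):
    print(edge)
# (0, 57, 'G'), (57, 154, 'P'), (154, 301, 'F'), (301, 415, 'N'), (415, 486, 'A')
```

Every edge points from a lighter vertex to a heavier one, so the graph is acyclic.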


Paths
Paths in the labeled graph spell out amino acid sequences.
There are many paths; how do we find the correct one?
We need scoring to evaluate paths.


Path Score
$p(P,S)$ = probability that peptide P produces spectrum $S = \{s_1, s_2, \ldots, s_q\}$
$p(P,s)$ = the probability that peptide P generates a peak s
Scoring = computing probabilities
$p(P,S) = \prod_{s \in S} p(P,s)$
Peak Score
For a position t that represents ion type $\delta_j$:
$$p(P, s_t) = \begin{cases} q_j, & \text{if a peak is generated at } t \\ 1 - q_j, & \text{otherwise} \end{cases}$$
Peak Score (contd)
For a position t that is not associated with an ion type:
$$p_R(P, s_t) = \begin{cases} q_R, & \text{if a peak is generated at } t \\ 1 - q_R, & \text{otherwise} \end{cases}$$
$q_R$ = the probability of a noisy peak that does not correspond to any ion type
Finding Optimal Paths in the Spectrum Graph
For a given MS/MS spectrum S, find a peptide P' maximizing $p(P,S)$ over all possible peptides P:
$$p(P', S) = \max_P p(P, S)$$
Peptides = paths in the spectrum graph
P' = the optimal path in the spectrum graph
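Because every edge of the spectrum graph leads from a lighter vertex to a heavier one, an optimal path can be found by dynamic programming over the vertices in increasing mass order. The sketch below assumes additive per-vertex scores (for example, log-probability ratios); the function name `best_path`, the example graph and the score values are all hypothetical.

```python
import math


def best_path(vertices, edges, vertex_score):
    """Highest-scoring path from the lightest to the heaviest vertex.

    edges: (u, v, label) triples with u < v.
    vertex_score: additive score per vertex; missing vertices score 0.
    Returns (sequence, score), or (None, -inf) if no path exists.
    """
    order = sorted(vertices)  # increasing mass is a topological order
    start, end = order[0], order[-1]
    best = {v: -math.inf for v in order}
    back = {}
    best[start] = vertex_score.get(start, 0.0)
    outgoing = {}
    for u, v, label in edges:
        outgoing.setdefault(u, []).append((v, label))
    for u in order:
        if best[u] == -math.inf:
            continue  # u is not reachable from the start vertex
        for v, label in outgoing.get(u, []):
            cand = best[u] + vertex_score.get(v, 0.0)
            if cand > best[v]:
                best[v] = cand
                back[v] = (u, label)
    if best[end] == -math.inf:
        return None, -math.inf
    letters = []
    v = end
    while v != start:  # walk the back-pointers from the terminal vertex
        v, label = back[v]
        letters.append(label)
    return "".join(reversed(letters)), best[end]


vertices = [0, 57, 154, 185, 301, 358, 415, 486]
edges = [(0, 57, "G"), (57, 154, "P"), (57, 185, "K"), (154, 301, "F"),
         (301, 358, "G"), (301, 415, "N"), (358, 486, "K"), (415, 486, "A")]
scores = {57: 1.0, 154: 2.0, 185: 0.5, 301: 1.5, 358: 0.2, 415: 1.0}
print(best_path(vertices, edges, scores))  # ('GPFNA', 5.5)
```

The running time is linear in the number of edges, which is why the size of the space of all peptides never has to be enumerated.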


Ions and Probabilities
Tandem mass spectrometry is characterized by a set of ion types $\{\delta_1, \delta_2, \ldots, \delta_k\}$ and their probabilities $\{q_1, \ldots, q_k\}$.
$\delta_i$-ions of a partial peptide are produced independently with probabilities $q_i$.
Ions and Probabilities
A peptide has all k peaks with probability
$$\prod_{i=1}^{k} q_i$$
and no peaks with probability
$$\prod_{i=1}^{k} (1 - q_i)$$
A peptide also produces "random noise" with uniform probability $q_R$ in any position.
Ratio Test Scoring for Partial Peptides
Incorporates premiums for observed ions and penalties for missing ions.
Example: for k=4, assume that for a partial peptide P we only see ions $\delta_1, \delta_2, \delta_4$.
The score is calculated as:
$$\frac{q_1}{q_R} \cdot \frac{q_2}{q_R} \cdot \frac{(1 - q_3)}{(1 - q_R)} \cdot \frac{q_4}{q_R}$$
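The k=4 example translates into a short function: each observed ion contributes a premium $q_j / q_R$, and each missing ion contributes a penalty $(1 - q_j) / (1 - q_R)$. The probability values below are made up for illustration; in practice they would be learned from annotated spectra.

```python
import math


def ratio_score(q, observed, q_noise):
    """Ratio test score of one partial peptide.

    q: ion type probabilities q_1..q_k.
    observed: booleans, True where the corresponding ion was seen.
    q_noise: probability q_R of a random noise peak at a position.
    """
    score = 1.0
    for q_j, seen in zip(q, observed):
        if seen:
            score *= q_j / q_noise                  # premium for an observed ion
        else:
            score *= (1.0 - q_j) / (1.0 - q_noise)  # penalty for a missing ion
    return score


q = [0.8, 0.5, 0.4, 0.3]               # hypothetical q_1..q_4
observed = [True, True, False, True]   # ions 1, 2 and 4 seen, ion 3 missing
score = ratio_score(q, observed, q_noise=0.1)
print(round(score, 6))   # 80.0
print(math.log(score))   # log form, convenient for summing along a path
```

Taking logarithms turns the product into a sum, so per-vertex log scores can be added along a path in the spectrum graph.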
Scoring Peptides
$T$: set of all positions.
$T_i = \{t_{\delta_1}, t_{\delta_2}, \ldots, t_{\delta_k}\}$: set of positions that represent ions of partial peptide $P_i$.
A peak at position $t_{\delta_j}$ is generated with probability $q_j$.
$R = T - \bigcup_i T_i$: set of positions that are not associated with any partial peptides (noise).
Probabilistic Model
For a position $t_{\delta_j} \in T_i$, the probability $p(t, P, S)$ that peptide P produces a peak at position t is:
$$p(t, P, S) = \begin{cases} q_j & \text{if a peak is generated at position } t_{\delta_j} \\ 1 - q_j & \text{otherwise} \end{cases}$$
Similarly, for $t \in R$, the probability that P produces a random noise peak at t is:
$$p_R(t) = \begin{cases} q_R & \text{if a peak is generated at position } t \\ 1 - q_R & \text{otherwise} \end{cases}$$
Probabilistic Score
For a peptide P with n amino acids, the score for the whole peptide is expressed by the following ratio test:
$$\frac{p(P, S)}{p_R(S)} = \prod_{i=1}^{n} \prod_{j=1}^{k} \frac{p(t_{i\delta_j}, P, S)}{p_R(t_{i\delta_j})}$$
De Novo vs. Database Search
[Figure: the MS/MS spectrum (scan 1708) is interpreted both by de novo sequencing, drawn as a graph with edges labeled by amino acids, and by searching the database of known peptides (MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN, ...); the result is AVGELTK]
