FASTA Format

FASTA format is a compact and simple method of storing DNA and protein
sequences as text files that can be read by virtually all molecular
biology programs.

A sequence in FASTA format begins with a single-line description (or
header), followed by lines of sequence data.

The description line is distinguished from the sequence data by a
greater-than (">") symbol in the first column. It is recommended that
all lines of text be shorter than 80 characters in length.
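
The layout rules above can be sketched in a few lines of Python. This is
only an illustration, not part of any particular program: the function
name format_fasta and the default width of 60 characters per line are
assumptions chosen for the example (any width below 80 would follow the
recommendation).

```python
def format_fasta(description, sequence, width=60):
    """Return one FASTA record as text.

    The record is a header line starting with '>' followed by the
    sequence split into lines of at most `width` characters.
    `width` is a hypothetical parameter for this sketch.
    """
    if not 0 < width < 80:
        raise ValueError("width must be between 1 and 79")
    header = ">" + description
    # Cut the sequence into consecutive chunks of `width` characters.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + lines) + "\n"


print(format_fasta("example protein", "ELRLRYCAPAGFALLKCNDADYDG", width=12))
```

Only the sequence lines are wrapped here; a long description would still
produce a long header line, because the header must stay on a single line.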

An example sequence in FASTA format follows. The ">" symbol is essential;
the text after it on the same line is the name of the sequence:

>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVVQSQHLLAGILQQQKNLLAAVEAQQQMLKLTIWGVK
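
A record like the one above can be read back with a small parser. The
following is a minimal sketch, assuming the whole file fits in memory as
one string; the name read_fasta is hypothetical, and real programs may
differ in how they treat blank lines or text before the first header.

```python
def read_fasta(text):
    """Yield (description, sequence) pairs from FASTA-formatted text."""
    description = None
    chunks = []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue  # assumption: blank lines are simply skipped
        if line.startswith(">"):  # '>' in the first column marks a header
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:].strip()
            chunks = []
        elif description is not None:
            chunks.append(line)  # sequence data for the current record
        # Lines that appear before the first header are ignored.
    if description is not None:
        yield description, "".join(chunks)


example = """>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
"""
for description, sequence in read_fasta(example):
    print(description, len(sequence))
```

The sequence lines are joined into one string, so the line breaks in the
file have no effect on the sequence itself.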

Sequences are expected to be represented in the standard IUB/IUPAC amino
acid and nucleic acid codes. Amino acid codes:
A  Ala        L  Leu
B  Asx        M  Met
C  Cys        N  Asn
D  Asp        P  Pro
E  Glu        Q  Gln
F  Phe        R  Arg
G  Gly        S  Ser
H  His        T  Thr
I  Ile        V  Val
K  Lys        W  Trp
              Y  Tyr

Lower-case letters are equivalent to upper-case letters.

Some, but not all, programs that accept "FASTA format" recognize
- a hyphen or dash (-) to represent a gap of indeterminate length and
- an asterisk (*) to represent an unknown or ambiguous character.
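
The character rules above can be checked with a short function. This is
a sketch under stated assumptions: the allowed set contains only the
amino acid codes listed in the table of this handout (the full IUPAC set
has further codes, such as X and Z), and the names invalid_characters,
allow_gap and allow_asterisk are hypothetical. The two flags model the
fact that only some programs accept "-" and "*".

```python
# Amino acid codes taken from the table in this handout only.
AMINO_ACID_CODES = set("ABCDEFGHIKLMNPQRSTVWY")


def invalid_characters(sequence, allow_gap=False, allow_asterisk=False):
    """Return a sorted list of characters that are not accepted.

    Lower-case letters are treated as equivalent to upper-case.
    """
    allowed = set(AMINO_ACID_CODES)
    if allow_gap:
        allowed.add("-")  # gap of indeterminate length
    if allow_asterisk:
        allowed.add("*")  # unknown or ambiguous character
    return sorted(set(sequence.upper()) - allowed)


print(invalid_characters("elrlrycapagf"))
print(invalid_characters("ELR-LRY*", allow_gap=True))
```

An empty list means that every character in the sequence was accepted
under the chosen settings.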

Introduction to Bioinformatics
Matthias Sipiczki, Department of Genetics, University of Debrecen
Comments to: lipovy@tigris.unideb.hu