
Protein domains and motifs - Pamela Stanley lab wiki

http://stanxterm.aecom.yu.edu/wiki/index.php?page=Protein_domains_a...

Last modified: December 28, 2005

Protein Domains And Motifs


S Patnaik, Mar 2005

Sections
Consensus sites, domains, folds, motifs, patterns, profiles and repeats
Databases and links
Detecting known motifs and domains in your sequence of interest
Detecting unknown or specific patterns in your sequence of interest

Consensus sites, domains, folds, motifs, patterns, profiles and repeats

A consensus site usually refers to a position (usually conserved among homologous and orthologous sequences) that can potentially be modified, for example, by phosphorylation or glycosylation. An asparagine followed by any amino acid except proline, followed by a serine or threonine, for example, is a consensus site for N-linked glycosylation.

A domain is a discrete structural unit that is assumed to fold independently of the rest of the protein and to have its own function. It can range from about 20 amino acid residues to several hundred. Domains are made up of multiple secondary structure units (alpha helices, beta sheets, etc.). Most proteins are multi-domain.

Folds are the core 3-D structures of domains. It is believed that only a few thousand folds exist. A beta-barrel is an example of a fold.

Motifs are short, conserved regions and frequently are the most conserved regions of domains. Motifs are critical for the domain to function; in enzymes, for example, they may contain the active sites. Nuclear localization sequences are another example of motifs.

A pattern describes a short, contiguous stretch of protein using regular expressions. E.g., DX[DE]X is a pattern composed of amino acid D, followed by any amino acid, followed by either D or E, followed by any amino acid.

A profile is built from a multiple sequence alignment, and is a matrix or table that describes the probability of finding a particular amino acid at a certain position. Mathematical means such as hidden Markov models are used to generate profiles.

A repeat is a stretch of amino acid sequence that gets repeated a number of times along the length of the sequence. There usually is some sequence variation between the repeated segments. Many domains are constituted from repeats.

Databases

Hundreds of thousands of protein sequences have been manually or automatically analyzed to generate databases of patterns, profiles, domains, etc. The PROSITE database contains patterns as well as profiles. The Fingerprints or PRINTS-S database contains clusters of patterns that define protein families. The BLOCKS and ProDom databases are made of sequence fragments (like patterns) that are generated from sequence alignment and clustering. The Pfam and SMART databases contain hidden Markov model profiles. Integrated databases such as InterPro and CDD seek to integrate some or all of the above databases into a single resource.

Detecting KNOWN motifs and domains in your sequence of interest

To search (by keywords or sequence similarity) a particular database mentioned above, use these links:
BLOCKS - http://blocks.fhcrc.org/
Pfam - http://www.sanger.ac.uk/Software/Pfam/
ProDom - http://prodes.toulouse.inra.fr/prodom/doc/prodom.html
PRINTS-S - http://bioinf.man.ac.uk/dbbrowser/sprint/printss_lis.html
PROSITE - http://www.expasy.ch/prosite/
SMART - http://smart.embl-heidelberg.de/

To search an integrated database, use one of these links:
CDD - http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
InterPro - http://www.ebi.ac.uk/interpro/
UniProt - http://www.pir.uniprot.org/search/SearchTools.shtml - best, as it provides almost all the known information on the protein on one page

To search for consensus sites (phosphorylation, glycosylation, etc.), use:
ELM server - http://elm.eu.org/ (multiple)
CBS server - http://www.cbs.dtu.dk/services/ - kinase-specific phosphorylation site prediction, glycosylation prediction, sorting motifs, etc.

Identifying UNKNOWN or specific repeats, motifs and domains in your sequence of interest

This involves, first, collecting a group of sequences that are similar to yours. The set of sequences (or their alignment) is then analyzed for patterns. Some online servers are:
PRATT - for patterns - http://www.ebi.ac.uk/pratt/
RADAR - for repeats (using single sequence) - http://www.ebi.ac.uk/Radar/

To look for a pattern designated by you in a sequence, use the protein pattern find tool at http://bioinformatics.org/sms2/protein_pattern.html
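
For a quick local check, a pattern written in the notation used on this page (such as DX[DE]X) can also be scanned for with an ordinary regular expression. The following is a minimal sketch in Python, not a replacement for the tools above. The function names `pattern_to_regex` and `find_pattern` and the example sequence are invented for illustration, and the sketch assumes that X is the only wildcard symbol in the pattern. The N-glycosylation example uses the regular-expression class `[^P]` to exclude proline from the middle position, which is a common refinement of the consensus site.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Translate a pattern such as DX[DE]X into a regular expression.

    Assumption: X is the only wildcard symbol and means "any residue".
    Bracketed classes such as [DE] or [^P] are already valid regex syntax.
    """
    return pattern.upper().replace("X", ".")


def find_pattern(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched text) for every occurrence.

    The pattern is wrapped in a lookahead so that overlapping
    occurrences are reported as well.
    """
    regex = re.compile("(?=(" + pattern_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


# Made-up example sequence, used only for illustration.
seq = "MDADEKNGTLNPSA"

# The DX[DE]X pattern from the definitions section.
print(find_pattern(seq, "DX[DE]X"))    # [(2, 'DADE')]

# N-glycosylation consensus site: N, any residue except P, then S or T.
# NPS at position 11 is skipped because of the proline.
print(find_pattern(seq, "N[^P][ST]"))  # [(7, 'NGT')]
```

Positions are reported 1-based, as is usual for protein sequences. A match only shows that the pattern occurs; whether the site is actually modified has to be established experimentally or with the dedicated prediction servers listed above.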
