PLASMAPPER
SOURCE: wishart.biology.ualberta.ca/PlasMapper
METHOD:
1. Collect the sequence for which plasmapper has to design, in Fasta format from
3. Paste the sequence in FASTA format into the input box on the home page of the
website.
4. Keep the default settings and click ‘Graphic Map’ to get the result.
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
PlasMapper uses sequence pattern matching and BLAST alignment to automatically
identify and label common promoters, terminators, cloning sites, restriction sites,
reporter genes, affinity tags, selectable marker genes, replication origins and open
reading frames.
AIM: To perform restriction mapping of the given sequence using the BioEdit tool.
DESCRIPTION:
BioEdit is a biological sequence alignment editor written for Windows
95/98/NT. A rich, intuitive multiple document interface with many convenient
features makes alignment, manipulation and viewing of sequences relatively quick
and easy on the desktop. Several sequence manipulation and analysis options and fully
automated links to local and WWW-based analysis programs provide an integrated
working environment in which sequences can be viewed, aligned and analyzed from a
single application with simple point-and-click operations.
SOURCE:
http://www.mbio.ncsu.edu/bio edit/page2.Html
METHOD:
1) Collect the sequence to be restriction-mapped in FASTA format from NCBI.
2) Open the source website www.mbio.ncsu.edu/bio edit/page2.Html
3) Download the BioEdit tool from the source website.
4) Open the query sequence in the tool.
5) Select the sequence, then run the restriction mapping option.
6) Save the result page in which the sequence has been mapped.
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
Restriction mapping of the given sequence was performed. The result lists the number
of cuts made by each restriction enzyme (e.g. BsmI, XcaI) and the positions of their
recognition sites. Such maps are used in recombinant DNA technology to find the
cutting sites of restriction enzymes present in a particular sequence.
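The idea behind a restriction map can be illustrated with a minimal sketch that reports where recognition sites occur in a sequence. This is not how BioEdit is implemented; the function name and the small enzyme table are assumptions made for illustration. The three sites used here are palindromic, so scanning one strand is enough; non-palindromic sites would also require scanning the reverse complement.

```python
# Illustrative enzyme table (standard recognition sequences); not BioEdit's enzyme list.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_restriction_sites(sequence, enzymes=RECOGNITION_SITES):
    """Return {enzyme: [1-based start positions of its recognition site]}."""
    seq = sequence.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)        # report 1-based coordinates
            start = seq.find(site, start + 1)  # continue after this hit
        hits[name] = positions
    return hits


if __name__ == "__main__":
    demo = "ttgaattcaaggatccttgaattc"  # toy sequence, not NM_006272.2
    for enzyme, positions in find_restriction_sites(demo).items():
        print(enzyme, "cuts:", len(positions), "at", positions)
```

The number of entries per enzyme corresponds to the "cutting number" in the BioEdit report, and the positions correspond to the site locations.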
3. PRIMER DESIGNING
AIM: To design primers for the given query sequence using the ‘Primer3’
primer design tool.
DESCRIPTION:
Primer3 is a tool used to choose primers for PCR reactions. Primer3’s design
is heavily based on earlier implementations of similar programs: Primer 0.5 and
Primer v2. Primer3 can also design hybridization probes and sequencing primers.
SOURCE:
http://biotools.umassmed.edu/bioapps/primer3_www.cgi
METHOD:
1) Collect the sequence for which primers are to be designed, in FASTA format, from
the NCBI home page.
2) Open the source website: biotools.umassmed.edu/bioapps/primer3_www.cgi
3) Paste the sequence in FASTA format into the input box on the home page of the
website.
4) Keep the default settings and click ‘Pick Primers’ to get the result.
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
Primers were designed using the tool. A left primer and a right primer were
designed, and the tool also reported some additional candidate oligos.
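Two quantities commonly checked when evaluating a primer pair are GC content and melting temperature. The sketch below uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rule of thumb for short oligos; Primer3 itself uses more elaborate thermodynamic models. The function names and the example primer are illustrative assumptions, not Primer3 output.

```python
def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Approximate melting temperature in deg C using the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement, e.g. to express a right primer on the template strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer
    print(gc_content(primer), wallace_tm(primer), reverse_complement(primer))
```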
4. SEQUENCE RETRIEVAL
NCBI
AIM: To retrieve the nucleotide sequence for the given accession number from the NCBI
nucleotide sequence database.
DESCRIPTION:
Methods for determining DNA sequences were first described in 1972. Since then, a wealth of
sequence information has been obtained and deposited in several essential centralized
locations. These generalized databases include:
Genbank
EMBL
DDBJ
Databases and database analysis tools allow a researcher to probe for a desired
sequence. The National Center for Biotechnology Information (NCBI) is part of
the United States National Library of Medicine (NLM), a branch of the National Institutes
of Health. The NCBI has had responsibility for making available the GenBank DNA
sequence database since 1992. GenBank coordinates with individual laboratories and
other sequence databases such as those of the European Molecular Biology Laboratory
(EMBL) and the DNA Database of Japan (DDBJ).
SOURCE : http://www.ncbi.nlm.nih.gov/
METHOD:
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
The nucleotide and protein sequences have been retrieved from the NCBI sequence
database.
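The same retrieval can also be scripted instead of done through the web page. The sketch below builds a request for the NCBI E-utilities `efetch` service and downloads the record as FASTA text. The helper names are illustrative assumptions, and the download step needs network access, so it is kept in a separate function that is not called automatically.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build the efetch URL for one accession number."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_sequence(accession, db="nuccore", rettype="fasta"):
    """Download the record as text (requires network access)."""
    with urlopen(efetch_url(accession, db, rettype)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the URL; call fetch_sequence("NM_006272.2") to download.
    print(efetch_url("NM_006272.2"))
```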
EMBL
AIM: To retrieve the nucleotide for the given accession number from the EMBL
nucleotide sequence database.
DESCRIPTION:
The European Molecular Biology Laboratory (EMBL) is a molecular biology
research institution supported by 20 countries comprising nearly all of western
Europe and Israel. The cornerstones of EMBL's mission are: to perform basic research
in molecular biology, to train scientists, students and visitors at all levels, to offer vital
services to scientists in the member states, to develop new instruments and methods in
the life sciences, and to actively engage in technology transfer.
SOURCE: http://www.ebi.ac.uk/embl/
METHOD:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
Swissprot
AIM: To retrieve the protein sequence for the given accession number from the Swiss-Prot
protein sequence database.
DESCRIPTION :
Swiss-Prot is a manually curated biological database of protein sequences. Swiss-Prot was
created in 1986 by Amos Bairoch during his PhD and developed by the Swiss Institute of
Bioinformatics and the European Bioinformatics Institute. Swiss-Prot and its
automatically curated supplement TrEMBL have joined with the Protein Information
Resource protein database to produce the UniProt Knowledgebase, the world's most
comprehensive catalogue of information on proteins. As of 3 April 2007, UniProtKB/Swiss-Prot
release 52.2 contains 263,525 entries and UniProtKB/TrEMBL
release 35.2 contains 4,232,122 entries.
SOURCE: http://www.ebi.ac.uk/swissprot/
METHOD:
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
The sequence for the given accession number has been retrieved from the Swiss-Prot
protein database.
5. SEQUENCE FORMAT CONVERSION
SQUIZZ
AIM : To convert the given sequence in NCBI format to EMBL format using SQUIZZ as
format conversion tool.
DESCRIPTION:
The tools available for the analysis of biological data (sequences) require data in different
formats. To make the same data acceptable to different sequence analysis tools, we need
sequence format conversion tools. Several such tools are available on the web.
SQUIZZ allows the verification of a sequence or sequence alignment format and its conversion
into the following formats:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP
SOURCE: http://bioweb.pasteur.fr/sequenal/interface/squizz.html
METHOD:
1. The home page of the sequence conversion tool was opened by typing “sequence
conversion tool” in the Google search bar.
2. The sequence format conversion hyperlink was clicked on the page that opened.
3. The SQUIZZ hyperlink was clicked to open its page.
4. A nucleotide sequence in NCBI format was pasted into the “Actual data here” field.
5. SQUIZZ was run.
6. The sequence was converted into the target format using the “Convert into format” option.
7. The results in the new format were obtained and saved in Notepad.
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
The given nucleotide sequence was converted from GenBank to EMBL format using the SQUIZZ
sequence format conversion tool.
READSEQ
AIM : To convert the given sequence in EMBL format to FASTA format using the READSEQ
format conversion tool.
DESCRIPTION:
READSEQ takes a DNA or amino acid sequence as input, detects the input format
automatically, and converts the sequence into any of the
following formats:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP
In the present exercise we have converted EMBL format to FASTA using the READSEQ
conversion tool.
SOURCE:
http://bioweb.pasteur.fr/sequenal/interface/readseq.cgi
METHOD:
1. The home page of the sequence conversion tool was opened by typing “sequence
conversion tool” in the Google search bar.
2. The sequence format conversion hyperlink was clicked on the page that opened.
3. The READSEQ hyperlink was clicked to open its page.
4. A protein sequence in EMBL format was pasted into the “Actual data here” field.
5. READSEQ was run.
6. The sequence was converted into FASTA format using the “Convert into format” option.
7. The results in the new format were obtained and saved in Notepad.
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
The given protein sequence was converted from EMBL to FASTA format using the READSEQ
sequence format conversion tool.
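What an EMBL-to-FASTA conversion involves can be shown with a minimal sketch. It assumes a single-record EMBL flat file in which the `ID` line carries the identifier, `DE` lines carry the description, and the sequence lies between the `SQ` line and the `//` terminator. The function name is hypothetical; this is not READSEQ's implementation and it ignores all other EMBL line types.

```python
def embl_to_fasta(embl_text, width=60):
    """Convert one EMBL flat-file record to FASTA text."""
    identifier, description, residues = "", [], []
    in_sequence = False
    for line in embl_text.splitlines():
        if line.startswith("ID"):
            # First token after the 'ID' tag, up to the first semicolon.
            identifier = line[2:].split(";")[0].strip().split()[0]
        elif line.startswith("DE"):
            description.append(line[2:].strip())
        elif line.startswith("SQ"):
            in_sequence = True
        elif line.startswith("//"):
            in_sequence = False
        elif in_sequence:
            # Sequence lines contain blanks and a trailing position count.
            residues.append("".join(ch for ch in line if ch.isalpha()))
    sequence = "".join(residues)
    header = ">" + " ".join([identifier] + description).strip()
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + body) + "\n"


if __name__ == "__main__":
    record = (
        "ID   TEST01; SV 1; linear; mRNA; STD; HUM; 12 BP.\n"
        "DE   toy record\n"
        "SQ   Sequence 12 BP;\n"
        "     acgtacgtac gt        12\n"
        "//\n"
    )
    print(embl_to_fasta(record), end="")
```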
FMTSEQ
AIM : To convert the given sequence in EMBL format to CLUSTAL format using the FMTSEQ
format conversion tool.
DESCRIPTION:
FMTSEQ is a format conversion tool that converts sequences between 22 sequence format
types, including:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP
SOURCE:
http://evol.biology.mcmaster.ca/seqanal/tmp/fmt.seq/A27358120711907/fmtseq.out
METHOD:
1. The home page of the sequence conversion tool was opened by typing “sequence
conversion tool” in the Google search bar.
2. The sequence format conversion hyperlink was clicked on the page that opened.
3. The FMTSEQ hyperlink was clicked to open its page.
4. A nucleotide sequence in EMBL format was pasted into the “Actual data here” field.
5. FMTSEQ was run.
6. The sequence was converted into CLUSTAL format using the “Convert into format” option.
7. The results in the new format were obtained and saved in Notepad.
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
The given nucleotide sequence was converted from EMBL to CLUSTAL format using the FMTSEQ
sequence format conversion tool.
SREFORMAT
AIM : To convert the given sequence in NCBI format to PIR format using SREFORMAT as
format conversion tool.
DESCRIPTION:
SreFormat allows the user to convert a sequence from one format to another.
It accepts sequences in the following formats:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP
SOURCE: http://bioweb.pasteur.fr/sequenal/interface/SreFormat.html
METHOD:
1. The home page of sequence conversion tool was opened by typing “sequence
conversion tool” in the google search tool bar.
2. Then the sequence format conversion hyperlink was clicked on open page.
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
The given protein sequence was converted from NCBI to PIR format using the SreFormat sequence
format conversion tool.
SMS
AIM: To convert the given sequence in GenBank format to the FASTA format.
DESCRIPTION:
Using this tool, sequences in any of the supported input formats can be converted into
the output formats listed below. The MVIEW tool is used to convert the given
sequence into the required format.
It provides an input option and an output option.
INPUT OPTION:
• Pearson/FASTA
• MSF(GCG)
• CLUSTALW
• Max Hom/ HSSP
• Plain
• Multa: MULTAS/MULTAL
• Mips: MIPS-ALN
OUTPUT OPTION:
• HTML
• GCG/MSF
• Pearson/FASTA
• PIR
• RDB table for storage/manipulation in relational database form
METHODOLOGY:
A. The given sequence in FASTA format is pasted into the box provided.
B. PIR format is selected from the options provided.
C. An e-mail ID is provided when it is required.
D. The tool is run and the result obtained is saved.
INPUT:
Seq name: Rattus norvegicus
Accession number: NM_053814
WEB PAGE
OUTPUT:
INTERPRETATION:
The GenBank-formatted query sequence was converted into FASTA format
using the sequence conversion tool SMS.
6. ORF FINDER
AIM: To find the open reading frame for the direct and the reverse strand
DESCRIPTION:
ORF Finder searches for open reading frames (ORFs) in the DNA sequence you
enter. The program returns the range of each ORF, along with its protein translation. Use
ORF Finder to search newly sequenced DNA for potential protein encoding segments. ORF
Finder supports the entire IUPAC alphabet and several genetic codes.
SOURCE:
www.bioinformatics.org/sms2/
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
Using the ORF Finder tool, we found the open reading frames for the Oryza sativa gene with
accession no. EF 183474.
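The search that an ORF finder performs on the direct and reverse strands can be sketched as follows. This is a simplified illustration, not the SMS ORF Finder code: it assumes the standard genetic code, requires an ATG start, and ignores IUPAC ambiguity codes. The function names and the `min_codons` parameter are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=2):
    """Return (strand, frame, start, end) tuples.

    Coordinates are 1-based positions on the scanned strand; 'end'
    includes the stop codon. Only ATG...stop ORFs are reported.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                       # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start + 1, i + 3))
                    start = None                    # close it at the stop
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))  # toy sequence
```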
HOMOLOGY SEARCH
The term sequence analysis in biology implies subjecting a DNA or peptide sequence to
sequence alignment, sequence database, repeated sequence searches or other bioinformatics
methods on a computer.
The BLAST program can either be downloaded and run as a command-line utility
"blastall" or accessed for free over the web. The BLAST web server, hosted by the
NCBI, allows anyone with a web browser to perform similarity searches against
constantly updated databases of proteins and DNA that include most of the newly
sequenced organisms. BLAST is actually a family of programs (all included in the blastall
executable). The following are some of the programs, ranked mostly in order of importance:
Nucleotide-nucleotide BLAST (blastn) :This program, given a DNA query, returns the most
similar DNA sequences from the DNA database that the user specifies.
Protein-protein BLAST (blastp) :This program, given a protein query, returns the most
similar protein sequences from the protein database that the user specifies.
Nucleotide 6-frame translation-protein (blastx) :
This program compares the six-frame conceptual translation products of a nucleotide query
sequence (both strands) against a protein sequence database.
Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx) :
This program is the slowest of the BLAST family. It translates the query nucleotide sequence
in all six possible frames and compares it against the six-frame translations of a nucleotide
sequence database. The purpose of tblastx is to find very distant relationships between
nucleotide sequences.
Protein-nucleotide 6-frame translation (tblastn) :
This program compares a protein query against the six-frame translations of a nucleotide
sequence database.
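The five programs described above differ only in the type of query, the type of database, and which side is translated. A small lookup table makes that relationship explicit. The function name and the argument spellings are illustrative assumptions, not part of any BLAST interface.

```python
# (query type, database type, translate both sides?) -> program name
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide", False): "blastn",
    ("protein", "protein", False): "blastp",
    ("nucleotide", "protein", False): "blastx",    # query is translated
    ("protein", "nucleotide", False): "tblastn",   # database is translated
    ("nucleotide", "nucleotide", True): "tblastx",  # both are translated
}


def choose_blast_program(query_type, database_type, translate_both=False):
    """Return the BLAST program matching the query and database types."""
    key = (query_type, database_type, translate_both)
    if key not in BLAST_PROGRAMS:
        raise ValueError("no BLAST program for %r" % (key,))
    return BLAST_PROGRAMS[key]


if __name__ == "__main__":
    print(choose_blast_program("nucleotide", "protein"))
```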
NUCLEOTIDE BLAST
DESCRIPTION :
BLAST is one of the most widely used bioinformatics programs, because it addresses a
fundamental problem and the algorithm emphasizes speed over sensitivity.
To run, BLAST requires two sequences as input: a query sequence (also called the target
sequence) and a sequence database. BLAST will find subsequences in the query that are
similar to subsequences in the database.
Nucleotide-nucleotide BLAST (blastn) :
This program, given a DNA query, returns the most similar DNA sequences from the DNA
database that the user specifies.
METHOD:
SOURCE:
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
By using nucleotide blast we are able to get nucleotide sequence with maximum similarity.
The accession number for homologous sequence is: AY532754
PROTEIN BLAST
Search a Protein database using a protein query
DESCRIPTION :
BLAST is one of the most widely used bioinformatics programs, because it addresses a
fundamental problem and the algorithm emphasizes speed over sensitivity.
To run, BLAST requires two sequences as input: a query sequence (also called the target
sequence) and a sequence database. BLAST will find subsequences in the query that are
similar to subsequences in the database.
Protein-protein BLAST (blastp):
This program, given a protein query, returns the most similar protein sequences from the
protein database that the user specifies
METHOD:
SOURCE:
http://www.ncbi.nlm.nih.gov/blast.cgi#24657901
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
By using protein BLAST we are able to get the protein sequence with maximum similarity.
The accession number for homologous sequence is: NP_563915
BLASTX
DESCRIPTION :
BLAST is one of the most widely used bioinformatics programs, because it addresses a
fundamental problem and the algorithm emphasizes speed over sensitivity.
To run, BLAST requires two sequences as input: a query sequence (also called the target
sequence) and a sequence database. BLAST will find subsequences in the query that are
similar to subsequences in the database.
Nucleotide 6-frame translation-protein (blastx)
This program compares the six-frame conceptual translation products of a nucleotide query
sequence (both strands) against a protein sequence database.
METHOD:
SOURCE:
http://www.ncbi.nih.gov/blast/Blast.cgi
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
By using blastx we are able to get protein sequence with maximum similarity.
The accession number for homologous sequence is: AAT40013
tBLASTN
DESCRIPTION :
BLAST is one of the most widely used bioinformatics programs, because it
addresses a fundamental problem and the algorithm emphasizes speed over
sensitivity.
Protein-nucleotide 6-frame translation (tblastn)
This program compares a protein query against the six-frame translations of a
nucleotide sequence database.
METHOD:
SOURCE:
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
By using tblastN we are able to get translated nucleotide sequence with maximum similarity.
The accession number for homologous sequence is: NM_1135932
tBLASTX
DESCRIPTION :
BLAST is one of the most widely used bioinformatics programs, because it addresses a
fundamental problem and the algorithm emphasizes speed over sensitivity.
To run, BLAST requires two sequences as input: a query sequence (also called the target
sequence) and a sequence database. BLAST will find subsequences in the query that are
similar to subsequences in the database.
METHOD:
SOURCE:
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
By using tblastX we are able to get translated nucleotide sequence with maximum similarity.
The accession number for homologous sequence is: NM_001080098
FASTA
FASTA stands for FAST-All, reflecting the fact that it can be used for a fast
protein comparison or a fast nucleotide comparison. It was described by David J. Lipman and
William R. Pearson in 1985. This program achieves a high level of sensitivity for
similarity searching at high speed by performing optimised searches
for local alignments using a substitution matrix. The high speed is achieved by using
the observed pattern of word hits to identify potential matches before attempting the
more time-consuming optimized search. The trade-off between speed and sensitivity
is controlled by the ktup parameter, which specifies the size of the word. Increasing
the ktup decreases the number of background hits. Not every word hit is investigated;
instead, the program initially looks for segments containing several nearby hits.
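The role of ktup can be illustrated with a minimal sketch of the word-hit stage only. It indexes every ktup-length word of the query, looks the words up in a subject sequence, and groups the hits by diagonal (subject offset minus query offset). A diagonal with many hits marks a segment containing several nearby hits. This is an illustration under those assumptions, not the FASTA program's code, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def word_hits_by_diagonal(query, subject, ktup=2):
    """Count identical ktup-length words shared by query and subject,
    grouped by diagonal (subject offset minus query offset)."""
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            diagonals[j - i] += 1
    return diagonals


if __name__ == "__main__":
    query, subject = "ACGTAC", "TTACGTAC"  # toy sequences
    for k in (2, 3):
        hits = word_hits_by_diagonal(query, subject, ktup=k)
        print("ktup", k, "total hits", sum(hits.values()), dict(hits))
```

In this toy case the larger ktup gives fewer total hits while the best diagonal stays the same, which is the background-hit reduction described above.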
Tool Description
FASTA – PROTEIN
AIM: To find similarity in the protein sequences for the given query protein sequence.
DESCRIPTION:
This tool provides sequence similarity searching against protein
databases using the FASTA programs. FASTA can be very specific when identifying
long regions of low similarity, especially for highly diverged sequences. We can also
conduct sequence similarity searching against proteome or genome databases using the
FASTA program.
SOURCE: http://www.ebi.ac.uk/fasta33/
METHOD:
4. Click on FASTA.
submission box.
INPUT-
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
FASTA- NUCLEOTIDE
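AIM: To find similarity in the nucleotide sequences for the given query nucleotide sequence.
DESCRIPTION:
This tool provides sequence similarity searching against nucleotide
databases using the FASTA programs. FASTA can be very specific when identifying
long regions of low similarity, especially for highly diverged sequences. We can also
conduct sequence similarity searching against proteome or genome databases using the
FASTA program.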
SOURCE:
http://www.ebi.ac.uk/fasta33/
METHOD:
4. Click on FASTA.
6. Paste or browse a protein sequence in any format in the sequence
submission box.
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
Biologists often find proteins with approximately the same sequence in different
species, suggesting that the proteins have closely related biological functions and that the
genes encoding these proteins derive from a common genetic source. If we align these genes,
we find that some are alike and some are almost identical.
As with aligning a pair of sequences, the difficulty of aligning a group of sequences varies
considerably and becomes much greater as the degree of sequence similarity decreases. When the
amount of sequence variation is great, it is difficult to find an optimal alignment of sequences
because so many combinations of substitutions, insertions and deletions are possible, each
predicting a different alignment.
MULTIPLE SEQUENCE ALIGNMENT
USING CLUSTAL W
DESCRIPTION:
METHOD:
1. Select more than two protein or nucleotide sequences from NCBI in FASTA format.
2. Copy the sequences and save in format.
3. Type Clustal W in the Google search bar.
4. Click on multiple sequence alignment - Clustal W.
5. Submit the sequences in the enter sequence box.
6. Click on execute multiple alignment.
7. Copy and save the result in Notepad.
SOURCE: http://www.ebi.ac.uk/Tools/clustalw2/index.html
INPUT:
ACCESSION NO: NM_008083
NM_001081278
NM_012519
OUTPUT:
RESULTS:
Multiple sequence alignment of three insulin protein sequences was performed using Clustal W.
DESCRIPTION:
T-Coffee is an advanced multiple sequence alignment program that uses a system of sequence
position weights to generate a multiple sequence alignment that is the most consistent with
pairwise alignments of all the component sequences (T-Coffee stands for Tree-based
Consistency Objective Function for alignment Evaluation). T-Coffee is better than
Clustal W at reproducing known alignments of related proteins but is much slower.
METHOD:
1. Select more than two protein or nucleotide sequences from NCBI in FASTA format.
2. Copy the sequences and save in format.
3. Type T-Coffee in the Google search bar.
4. Click on multiple sequence alignment - T-Coffee.
5. Submit the sequences in the enter sequence box.
6. Run the program.
7. Copy and save the result in Notepad.
SOURCE: http://www.ch.embnet.org/software/TCoffee.html
INPUT:
ACCESSION NO: NM_008083
NM_001081278
NM_012519
OUTPUT:
RESULTS:
Multiple sequence alignment of three insulin protein sequences was performed using T-Coffee.
AIM: To align three sequences using Multalign.
DESCRIPTION:
Multalign performs a simultaneous alignment of two or more DNA or protein sequences.
It introduces a certain number of gaps into pairwise-aligned sequences to find the minimal
global distance. The program is based on a generalization of the algorithm of Waterman,
Smith and Beyer by Krüger and Osterburg.
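The notion of a minimal global distance can be made concrete for the pairwise case with a short dynamic-programming sketch. It computes the smallest total cost of mismatches and gaps needed to align two sequences end to end. This is only the pairwise building block under unit costs chosen for illustration; it is not the multiple-alignment algorithm used by Multalign, and the function name and cost parameters are assumptions.

```python
def global_distance(a, b, mismatch=1, gap=1):
    """Minimal global alignment distance between sequences a and b."""
    # prev[j] = distance between the processed prefix of a and b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(min(substitution, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_distance("GATTACA", "GATACA"))
```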
METHOD:
1. Select more than two protein or nucleotide sequences from NCBI in FASTA format.
2. Copy the sequences and save in format.
3. Type Multalign in the Google search bar.
4. Click on multiple sequence alignment - Multalign.
5. Submit the sequences in the enter sequence box.
6. Run the program.
7. Copy and save the result in Notepad.
SOURCE: http://bioinfo.genopole-toulouse.prd.fr/multalin/multalin.html
INPUT:
ACCESSION NO - NM 14646
NM 001122899
NM 010704
OUTPUT:
INTERPRETATION:
Multiple sequence alignment of three insulin protein sequences was performed using
Multalign.
GENE PREDICTION
With the advent of whole genome sequencing projects, it has become routine to scan
genomic DNA sequences to find genes, particularly those that encode proteins. Computational
methods for gene prediction work by searching through sequences to locate the regions most
likely to encode proteins. Predicting protein-encoding genes is generally easier in
prokaryotes than in eukaryotic organisms because prokaryotic genes generally lack introns and
because several quite highly conserved sequences are found in the promoter region and
around the start sites of transcription and translation.
DESCRIPTION:-
SOURCE:-
http://www.itb.cnr.it/sun/webgene/
METHOD:-
1. The sequence of human insulin was retrieved from NCBI and saved in Notepad.
2. WebGene was typed into the Google search bar.
3. The WebGene home page was opened.
I. Gene builder:-
Sequence was pasted in the given box.
Analysis was run
Results were saved.
V. HC polyA
VI. Hctata
VIII. AUG_evaluator
Analysis was run
Results were saved.
OUTPUT: Repeat-View
OUTPUT: Splice View
OUTPUT: HC polyA
OUTPUT: Hctata
OUTPUT: Gen view2
OUTPUT: AUG_evaluator
RESULTS AND INTERPRETATION:
Eight programs of Webgene were run for human insulin gene to predict:
Gene builder- protein coding gene.
Repeat view- repeated element mapping.
CpG island- CpG island.
Splice view- Splicing signal.
HcpolyA- for PolyA.
Hctata- for TATA signal prediction.
Genview- protein coding gene.
AUG_evaluator- start codon.
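Of the signals listed above, the CpG island is the easiest to quantify by hand. A commonly cited rule of thumb flags a region of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where the ratio is (number of CpG × length) / (number of C × number of G). The sketch below computes those two statistics for a sequence. The source does not state which criteria WebGene applies, so the thresholds and the function name are assumptions.

```python
def cpg_stats(dna):
    """Return (GC percent, observed/expected CpG ratio) for a sequence."""
    seq = dna.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CG dinucleotides cannot overlap each other
    gc_percent = 100.0 * (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_percent, obs_exp


def looks_like_cpg_island(dna, min_length=200, min_gc=50.0, min_ratio=0.6):
    """Apply the rule-of-thumb thresholds (assumed, not WebGene's)."""
    gc_percent, obs_exp = cpg_stats(dna)
    return len(dna) >= min_length and gc_percent > min_gc and obs_exp > min_ratio


if __name__ == "__main__":
    print(cpg_stats("CGCGCGCG"))
```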
DESCRIPTION:-
The GeneMark.hmm algorithm was designed to improve
gene prediction quality in terms of finding exact gene boundaries. High gene
finding accuracy has been reported for GeneMark. The program also uses a specially
derived ribosome binding site pattern to refine predictions of translation initiation
codons.
SOURCE:-
http://exon.gatech.edu/Genmark/genmark_prok_gms_plus.cgi
METHOD:-
1. The sequence of a prokaryotic gene was retrieved from NCBI and saved in Notepad.
2. GeneMark was typed into the Google search bar.
3. The GeneMark home page was opened.
4. The sequence was pasted into the box.
5. The analysis was run.
6. The results were saved.
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
DESCRIPTION:-
Genscan is an example of an approach to gene prediction that integrates
multiple types of information, including splice signal sensors, compositional properties of
coding and non-coding DNA and, in some cases, database homology searching, in order to
predict entire gene structures (sets of spliceable exons) in genomic sequences. Genscan uses
distinct, explicit, empirically derived sets of model parameters to capture differences in gene
structure and composition between distinct C+G compositional regions (isochores) of the
human genome. It also has the capacity to predict multiple genes in a sequence, to deal with
partial as well as complete genes, and to predict consistent sets of genes occurring on either or
both DNA strands.
SOURCE :-
http://genome.dkfz-heidelberg.de/cgi-bin/GENSCAN/genscan.call.cgi
METHOD:-
1. Sequence of human insulin was retrieved from NCBI and saved in note pad.
2. Genscan was typed into the Google search bar.
3. The Genscan home page was opened.
4. Sequence was pasted in box.
5. Analysis was done.
6. Results were saved.
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
Genscan program was run to predict gene of eukaryotes.
Result was saved.
AIM: To search for patterns and profiles in the given protein sequences using various ExPASy
tools.
DESCRIPTION:
ExPASy (Expert Protein Analysis System) is a proteomics server of the Swiss
Institute of Bioinformatics (SIB) which analyzes protein sequences and structures and
two-dimensional electrophoresis. The server functions in collaboration with the EBI. ExPASy also
produces the protein sequence knowledge bases UniProt and Swiss-Prot.
For the prediction of patterns and profiles of proteins, ExPASy provides tools such as
1. ELM
2. FingerPRINTScan
3. Motif Scan
4. Proscan
5. PRATT
Profiles are numerical representations of a multiple sequence alignment. Profiles help find the
similarities between these sequences and help in the identification and analysis of distantly
related proteins.
Patterns also represent the common characteristics of a protein family but do not contain
any weighting information. The user can specify what kind of patterns should be
searched for and how many sequences should match a pattern for it to be reported; there are
options for pattern conservation, restrictions, number of pattern symbols, flexible spacers etc.
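A pattern search of this kind can be sketched by translating a PROSITE-style pattern into a regular expression and scanning a protein sequence with it. The translation below covers the common syntax elements (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for sequence ends). It is a simplified illustration, not the code used by any ExPASy tool; the function names and the toy protein are assumptions. The example pattern is the PROSITE N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                      # any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence ends
        parts.append(element)
    return "".join(parts)


def scan(pattern, protein):
    """Return (1-based position, matched segment) pairs, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(scan("N-{P}-[ST]-{P}.", "MKNGSAANPTQ"))  # toy sequence
```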
Prosite
AIM: To perform profile and pattern search using Prosite tool.
SOURCE: http://www.expasy.ch/prosite/
METHOD:
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
ELM
DESCRIPTION:
ELM stands for Eukaryotic Linear Motif search and is a resource for finding functional
sites in proteins. It can find Pfam domain, signal peptide, coiled coil prediction,
transmembrane helix as well as loop, helix and strand prediction.
SOURCE:
http://elm.eu.org/
METHOD:
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
Using the ELM tool we are able to find the number of helices, strands and loops present in
the secondary structure of chitinase, Accession No. AAA32461.
FingerPRINTScan
DESCRIPTION:
The FingerPRINTScan tool scans a protein sequence against the PRINTS protein
fingerprint database. It reports the number of motifs matched to the query sequence, and
their lengths and positions.
SOURCE:
http://www.bioinf.man.ac.uk/fingerPRINTScan/
METHOD:
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INFERENCE:
The FingerPRINTScan tool was used to find the number of motifs and their
positions in the sequence.
Motif Scan
AIM: To perform profile and pattern search using Motif Scan tool.
DESCRIPTION:
Motif or family comparisons are more sensitive because motifs represent a higher-level
generalization of the features that are important for a given structural or functional
feature. This tool scans a sequence against protein profile databases (including PROSITE).
SOURCE:
http://mybits.icb.sib.ch/cgi-bin/motif-scan
METHODOLOGY:
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
Using Motif Scan we are able to find the number of helices, strands and loops present in the
secondary structure of chitinase, Accession No. AAA32461.
PROSCAN
AIM: To perform profile and pattern search using PROSCAN tool.
DESCRIPTION:
This tool, developed and run by PBIL at the University of Lyon, France, scans a sequence
against PROSITE and allows mismatches as well. It can give information regarding
phosphorylation, amidation or any other specific characteristic of the given sequence.
SOURCE:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_prosite.html
METHOD:
2. The retrieved protein sequence was pasted into the PROSCAN submission form.
3. The e-mail ID was entered.
4. The PROSCAN tool was run and the results were viewed and saved.
INPUT:
ACCESSION NO –IH4P B
OUTPUT:
INTERPRETATION:
Using the PROSCAN tool, the functional sites of a protein sequence can be
found. The results were viewed and saved.
VISUALIZATION OF PROTEIN STRUCTURE BY USING
RASMOL
AIM: To visualize the structure of protein sequence by using visualization tool RasMol.
DESCRIPTION:
RasMol 2 is a molecular graphics program intended for the visualization of proteins,
nucleic acids and small molecules. The program is aimed at display, teaching and generation
of publication-quality images. RasMol runs on Microsoft Windows, Apple Macintosh, UNIX
and VMS systems. The UNIX and VMS systems require an 8, 24 or 32 bit colour X Windows
display (X11R4 or later). The program reads in a molecule co-ordinate file and interactively
displays the molecule on the screen in a variety of colour schemes and molecular
representations. Currently available representations include depth-cued wireframes, ‘Dreiding’
sticks, spacefilling (CPK) spheres, ball and stick, solid and strand biomolecular ribbons, atom
labels and dot surfaces.
SOURCES:
1. http://wbiomed.curtin.edu.au/teach/biochem/help/download.html
2. http://mc2.cchem.berkeley.edu/rasmol/v2.6/
protein structure (.pdb) http://www.pdb.org/pdb/home/home.do
METHOD:
8. This .pdb file was opened using RasMol.
9. The structure is viewed with the different Display options available in RasMol, such as
Wireframe, Backbone, Sticks, Spacefill, Ball & Stick, Ribbons, Strands and Cartoons.
10. In the RasMol command line, commands such as “select helix” and
“colour yellow” are used to highlight the helices in the molecule.
11. Several other commands can also be used, such as “set picking distance”, “set picking
angle” and “set picking torsion”.
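The commands in steps 10 and 11 can also be collected in a RasMol script file instead of being typed one by one. The sketch below builds such a script as text from Python. The helper name build_rasmol_script and the file name structure.pdb are hypothetical; apart from the commands quoted above, the script relies only on the RasMol commands load, wireframe off, ribbons and select all.

```python
def build_rasmol_script(pdb_file, selection="helix", colour="yellow"):
    """Return the text of a small RasMol script that highlights one selection."""
    lines = [
        f"load {pdb_file}",       # read the co-ordinate file
        "wireframe off",          # hide the default wireframe display
        f"select {selection}",    # e.g. 'select helix'
        "ribbons",                # draw the selected residues as ribbons
        f"colour {colour}",       # e.g. 'colour yellow'
        "select all",             # restore the full selection afterwards
    ]
    return "\n".join(lines) + "\n"

# Placeholder file name; substitute the .pdb file downloaded for the practical.
print(build_rasmol_script("structure.pdb"))
```

Saved to a file, the text can be executed from the RasMol command line with the script command, for example “script helix.rsm”, where helix.rsm is again just an example name.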
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
SECONDARY STRUCTURE PREDICTION
AIM: Secondary structure prediction of the given protein sequences using Expasy tools.
DESCRIPTION:
Expasy (Expert Protein Analysis System) is a proteomics server of the Swiss
Institute of Bioinformatics (SIB) which analyzes protein sequences and structures as well as
two-dimensional electrophoresis data. The server functions in collaboration with the EBI. Expasy
also produces the protein sequence knowledge bases UniProt and Swiss-Prot.
For the prediction of the secondary structure of proteins, Expasy provides tools such as
1. GOR
2. HNN
3. SOPMA
4. JPred
GOR
AIM: To predict secondary structure of a given protein using GOR tool from Expasy.
DESCRIPTION:
GOR predicts the secondary-structure state of a given amino acid residue by looking at a
window of 8 residues before and 8 residues after the position of interest (17 residues in
total). This program (named after Garnier, Osguthorpe and Robson) is in its fourth version.
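The window described above can be made concrete with a short sketch. It only extracts the 17-residue window (8 before, 8 after) around each position, padding the ends of the sequence with a placeholder character; it does not implement the statistical scoring that GOR applies to each window. The function name residue_windows, the pad character "-" and the toy sequence are assumptions made for illustration.

```python
def residue_windows(sequence, flank=8, pad="-"):
    """Return one window per residue: `flank` residues either side of it.

    Positions that fall off the ends of the sequence are filled with `pad`,
    so every window has length 2 * flank + 1 and the residue of interest
    always sits at index `flank`.
    """
    padded = pad * flank + sequence + pad * flank
    width = 2 * flank + 1
    return [padded[i:i + width] for i in range(len(sequence))]

toy_sequence = "MSELEKAMVA"   # made-up example input
for position, window in enumerate(residue_windows(toy_sequence), start=1):
    print(position, window[8], window)
```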
SOURCE:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html
METHOD:
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
Using the GOR tool we are able to predict the secondary structure of chitinase (Accession No.
1H4P B). The tool shows the number of helices, alpha helices and beta bridges, along with
other details of the protein structure.
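Counts of this kind can be derived from a per-residue prediction string. The sketch below assumes the prediction is a string of one-letter states (here h for helix, e for extended strand and c for coil, which is an assumed convention) and reports the percentage of each state and the number of contiguous segments. The example string is made up and is not output from the GOR server.

```python
from collections import Counter
from itertools import groupby

def summarise_states(prediction):
    """Return (percentage per state, number of contiguous segments per state)."""
    total = len(prediction)
    counts = Counter(prediction)
    percent = {state: 100.0 * n / total for state, n in counts.items()}
    # groupby collapses each run of identical letters into one segment.
    segments = Counter(state for state, _run in groupby(prediction))
    return percent, segments

toy_prediction = "cchhhhhhcceeeecchhhc"   # made-up 20-residue prediction
percent, segments = summarise_states(toy_prediction)
for state in sorted(percent):
    print(state, f"{percent[state]:.1f}%", "segments:", segments[state])
```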
HNN
AIM: To predict secondary structure of a given protein using HNN tool from Expasy.
DESCRIPTION:
Hierarchical Neural Networks can be used to predict protein structure. The protein
sequence is translated into patterns by shifting a window of n adjacent residues (typical
values of n are 13 to 21) through the protein.
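To illustrate how a sequence becomes a set of input patterns for a network, the sketch below slides a window of n residues along the sequence and one-hot encodes each window. The encoding (20 amino-acid channels plus one padding channel), the function name encode_windows and the default n = 13 are assumptions for illustration; the encoding actually used by HNN may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
ALPHABET = AMINO_ACIDS + "-"           # '-' marks padding beyond the ends

def encode_windows(sequence, n=13):
    """Return one flat one-hot vector (length n * 21) per residue.

    Non-standard residue codes (for example 'X') are not handled here
    and would raise a KeyError.
    """
    if n % 2 == 0:
        raise ValueError("window size n must be odd")
    half = n // 2
    index = {residue: i for i, residue in enumerate(ALPHABET)}
    padded = "-" * half + sequence + "-" * half
    patterns = []
    for i in range(len(sequence)):
        vector = []
        for residue in padded[i:i + n]:
            one_hot = [0] * len(ALPHABET)
            one_hot[index[residue]] = 1
            vector.extend(one_hot)
        patterns.append(vector)
    return patterns

patterns = encode_windows("MSELEKAMVA", n=13)   # made-up example input
print(len(patterns), "patterns of length", len(patterns[0]))
```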
SOURCE:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_nn.html
METHOD:
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
Using the HNN tool we are able to predict the secondary structure of chitinase (Accession
No. 1H4P B).
The tool predicts the number of helices, alpha helices and beta bridges, along with other
details of the protein structure.
SOPMA
AIM: To predict secondary structure of a given protein using SOPMA tool from Expasy.
DESCRIPTION:
SOURCE:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html
METHOD:
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
Using the SOPMA tool we are able to predict the secondary structure of chain B (Accession
No. 1H4P B).
The tool predicts the number of helices, alpha helices and beta bridges, along with other
details of the protein structure.
JPred
AIM: To predict secondary structure of a given protein using JPred tool from Expasy.
DESCRIPTION:
SOURCE:
http://www.compbio.dundee.ac.uk/~www-jpred/
METHOD:
1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted on the JPred submission form.
3. The e-mail id was entered.
4. The tool JPred was run and the results viewed and saved.
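Step 1 can also be scripted. The sketch below builds a request URL for the NCBI E-utilities efetch service and parses FASTA text into header and sequence pairs. The helper names are hypothetical, the db value must match the type of accession (for example protein or nuccore), and the FASTA record in the example is a made-up placeholder rather than a real database entry. No network request is made here.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_fasta_url(accession, db="protein"):
    """Build an efetch URL that returns the record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)

def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

print(efetch_fasta_url("NM_006272.2", db="nuccore"))
print(parse_fasta(">demo toy record\nMSEL\nEKAM\n"))   # placeholder record
```

The parsed sequence is what would be pasted into the submission form in step 2.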
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
Using the JPred tool we are able to predict the secondary structure of chitinase (Accession
No. AAA32461).
AIM: To draw the phylogenetic tree of the given sequences using the software Phylodraw.
DESCRIPTION:
The sequences whose phylogenetic relationship is to be determined are retrieved from NCBI by
keyword search or by accession number. The tool Phylodraw, available online, is used
for drawing the phylogenetic tree. Its input is in Dialign format, which is obtained by
performing a multiple sequence alignment with the Dialign tool.
Phylodraw and Dialign are therefore the two tools used for drawing the phylogenetic tree.
SOURCES:
Dialign: http://bibiserv.techfak.uni-bielefeld.de/dialign/submission.html
Phylodraw: http://pearl.cs.pusan.ac.kr/phylodraw/
NCBI: www.ncbi.nlm.nih.gov
METHOD:
1. The sequences with the accession numbers given under INPUT are retrieved from the NCBI
biological database.
2. The sequences are used as the input to the Dialign tool for multiple sequence alignment.
3. The output of Dialign is used as the input to the Phylodraw tool.
4. Phylodraw is the tool used to draw phylogenetic trees. It supports the following types of trees:
a. Unrooted tree
b. Rooted tree
c. Radial tree
d. Slanted cladogram
e. Rectangular cladogram
f. Phylogram
5. The results are displayed in Radial tree, Slanted cladogram, Rectangular cladogram and
Phylogram formats.
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
PHYLODRAW INPUT:
OUTPUT:
The Phylodraw tool is used to draw the phylogenetic tree of genetically related species. It
can display the tree in various formats.
Rectangular cladogram
Phylogram
INTERPRETATION:
The phylogenetic tree for the sequences has been drawn using the Phylodraw tool with
the result of Dialign as the input.