( (Abbas Bio Info Soft Copy... ) )

1. PLASMAPPER
AIM: To generate and annotate high-quality circular plasmid maps.
DESCRIPTION: A particular feature of PlasMapper is its capacity to automatically
identify and label the plasmid control sequences found in both eukaryotic and
prokaryotic vectors using its own database of common plasmid sequences and
common plasmid subsequences. PlasMapper is also able to generate plasmid maps of
sufficient quality and resolution that they may be used directly in publications or
presentations. The underlying concept behind PlasMapper is to make plasmid
annotation trivially simple and to make the sharing of plasmid images and plasmid
data as easy as possible for as many computer platforms as possible.
SOURCE: wishart.biology.ualberta.ca/PlasMapper
METHOD:
1. Collect the sequence for which the plasmid map is to be generated, in FASTA
format, from the NCBI home page.
2. Open the source website: wishart.biology.ualberta.ca/PlasMapper
3. Paste the sequence in FASTA format into the input box on the home page of the
website.
4. Keep the default settings and click ‘graphic map’ to get the result.
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:
INTERPRETATION:
PlasMapper uses sequence pattern matching and BLAST alignment to automatically
identify and label common promoters, terminators, cloning sites, restriction sites,
reporter genes, affinity tags, selectable marker genes, replication origins and open
reading frames.
2. RESTRICTION MAPPING USING THE BIOEDIT TOOL
AIM: To perform restriction mapping of the given sequence using the BioEdit tool.
DESCRIPTION:
BioEdit is a biological sequence alignment editor written for Windows
95/98/NT. An intuitive multiple document interface with many convenient
features makes alignment, manipulation and viewing of sequences relatively quick
and easy on a desktop computer. Several sequence manipulation and analysis options, and
automated links to local and web-based analysis programs, provide an integrated
working environment in which sequences can be viewed, aligned and analysed from a
single application with simple point-and-click operations.
SOURCE:
http://www.mbio.ncsu.edu/BioEdit/page2.html
METHOD:
1) Collect the sequence for which restriction mapping has to be done, in FASTA
format, from NCBI.
2) Open the source website www.mbio.ncsu.edu/BioEdit/page2.html
3) Download the BioEdit tool from the source website.
4) Open the query sequence in the tool.
5) Select the sequence and generate the restriction map by
clicking the restriction mapping option.
6) Save the result page in which the sequence has been mapped.
INPUT:
OUTPUT:
INTERPRETATION:
Restriction mapping of the given sequence has been done. The map gives the number
of cuts made by various restriction enzymes, such as BsmI and XcaI, and shows the location
of each restriction site. In recombinant DNA
technology this tool is used to find the cutting sites of restriction enzymes present in a particular
sequence.
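The idea behind a restriction map can be sketched in a few lines of Python. This is only a minimal illustration and not the method BioEdit itself uses: the three enzymes and the demo sequence are assumptions chosen for the example, whereas a real tool ships a much larger enzyme table and also reports the exact cut position within each site.

```python
# Illustrative recognition sequences only (assumed for this sketch).
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}

def find_sites(sequence, enzymes=ENZYMES):
    """Return {enzyme: [1-based start positions]} of each recognition site."""
    seq = sequence.upper()
    hits = {}
    for name, site in enzymes.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)          # report 1-based coordinates
            start = seq.find(site, start + 1)    # continue after this hit
        hits[name] = positions
    return hits

demo = "ttgaattcaaggatccttgaattc"   # made-up sequence, not the lab input
site_map = find_sites(demo)
for enzyme, positions in site_map.items():
    print(enzyme, "cuts:", len(positions), "at", positions)
```

Because these three recognition sites are palindromic, searching one strand is enough; a non-palindromic site would also need a search on the reverse complement.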
3. PRIMER DESIGNING
AIM: To design primers for the given query sequence using the ‘PRIMER 3’
primer design tool.
DESCRIPTION:
Primer 3 is a tool used to choose primers for PCR reactions. Primer 3’s design
is heavily based on earlier implementations of similar programs: Primer 0.5 and
Primer v2. Primer 3 can also design hybridization probes and sequencing primers.
SOURCE:
http://biotools.umassmed.edu/bioapps/primer3_www.cgi
METHOD:
1) Collect the sequence for which primers have to be designed, in FASTA format, from the
NCBI home page.
2) Open the source website: biotools.umassmed.edu/bioapps/primer3_www.cgi
3) Paste the sequence in FASTA format into the input box on the home page of the
website.
4) Keep the default settings and click ‘pick primers’ to get the result.
INPUT:
OUTPUT:
INTERPRETATION:
Primers were designed using the Primer 3 tool. A left primer and a right primer were
designed, and some additional oligos were also reported in the output.
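Two properties that are commonly checked for a candidate primer, GC content and melting temperature, can be estimated with a short Python sketch. The Wallace rule used here (2 °C per A/T and 4 °C per G/C) is only a rule of thumb for short oligos and is not the calculation Primer 3 itself performs; the primer sequence is a made-up example.

```python
def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)

def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc

def reverse_complement(seq):
    """Reverse complement, e.g. to write a right primer 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

example_primer = "ATGCGTACGTTAGC"   # hypothetical primer for illustration
print(gc_content(example_primer), wallace_tm(example_primer))
print(reverse_complement(example_primer))
```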
4. SEQUENCE RETRIEVAL
NCBI
AIM: To retrieve the nucleotide sequence for the given accession number from the NCBI
nucleotide sequence database.
DESCRIPTION:
Methods for determining DNA sequences were first described in 1972. Since then, a wealth of
sequence information has been obtained and deposited in several centralized
databases. These general databases include:
Genbank
EMBL
DDBJ
Databases and database analysis tools allow a researcher to search for a desired
sequence. The National Center for Biotechnology Information (NCBI) is part of
the United States National Library of Medicine (NLM), a branch of the National Institutes
of Health. The NCBI has had responsibility for making available the GenBank DNA
sequence database since 1992. GenBank coordinates with individual laboratories and
other sequence databases such as those of the European Molecular Biology Laboratory
(EMBL) and the DNA Database of Japan (DDBJ).
SOURCE : http://www.ncbi.nlm.nih.gov/
METHOD:
1. The NCBI home page was opened using the website address.
2. On the home page, the nucleotide option was selected to retrieve a nucleotide sequence.
3. The accession number or the name of the gene of interest was entered in the search
box.
4. The ‘Go’ button next to the search bar was clicked.
5. The page containing the results matching the query was displayed.
6. The required record was opened by clicking on the link provided in the result page.
7. The sequence of interest was selected, copied to Notepad and saved.
INPUT:
OUTPUT:
INTERPRETATION:
Nucleotide and protein sequences have been retrieved using the NCBI sequence
database.
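The same retrieval can also be done without the web form. The sketch below builds a request for the NCBI E-utilities `efetch` service using only the Python standard library; it is an illustration under the assumption that the record is fetched from the `nuccore` database in FASTA format. The helper names are hypothetical, and `fetch_sequence` needs network access, so it is defined but not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build the efetch URL for one accession number."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)

def fetch_sequence(accession, db="nuccore", rettype="fasta"):
    """Download the record as text (requires an internet connection)."""
    with urlopen(efetch_url(accession, db, rettype)) as response:
        return response.read().decode("utf-8")

url = efetch_url("NM_006272.2")   # accession used in the PlasMapper exercise
print(url)
```

Calling `fetch_sequence("NM_006272.2")` would return the FASTA text that is otherwise copied by hand into Notepad.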
EMBL
AIM: To retrieve the nucleotide sequence for the given accession number from the EMBL
nucleotide sequence database.
DESCRIPTION:
The European Molecular Biology Laboratory (EMBL) is a molecular biology
research institution supported by 20 countries comprising nearly all of western
Europe and Israel. The cornerstones of EMBL's mission are: to perform basic research
in molecular biology, to train scientists, students and visitors at all levels, to offer vital
services to scientists in the member states, to develop new instruments and methods in
the life sciences, and to actively engage in technology transfer.
SOURCE: http://www.ebi.ac.uk/embl/
METHOD:
1. The EMBL home page was opened using the website address.
2. The accession number or the name of the gene of interest was entered in the search
box.
3. The ‘Go’ button next to the search bar was clicked.
4. The page containing the results matching the query was displayed.
5. The required record was opened by clicking on the link provided in the result page.
6. The sequence of interest was selected, copied to Notepad and saved.
INPUT:
OUTPUT:
INTERPRETATION:
The nucleotide sequence has been retrieved using the EMBL nucleotide sequence database.
SWISSPROT
AIM: To retrieve the protein sequence for the given accession number from the Swiss-Prot
protein sequence database.
DESCRIPTION:
Swiss-Prot is a manually curated biological database of protein sequences. Swiss-Prot was
created in 1986 by Amos Bairoch during his PhD and developed by the Swiss Institute of
Bioinformatics and the European Bioinformatics Institute. Swiss-Prot and its
automatically curated supplement TrEMBL have joined with the Protein Information
Resource protein database to produce the UniProt Knowledgebase, the world's most
comprehensive catalogue of information on proteins.[2] As of 3 April 2007, UniProtKB/Swiss-Prot
release 52.2 contains 263,525 entries and UniProtKB/TrEMBL
release 35.2 contains 4,232,122 entries.
SOURCE: http://www.ebi.ac.uk/swissprot/
METHOD:
1. The Swiss-Prot home page was opened using the website address.
2. The accession number or the name of the gene of interest was entered in the search
box.
3. The ‘Go’ button next to the search bar was clicked.
4. The page containing the results matching the query was displayed.
5. The required record was opened by clicking on the link provided in the result page.
6. The sequence of interest was selected, copied to Notepad and saved.
INPUT:
OUTPUT:
INTERPRETATION:
The sequence for the given accession number has been retrieved from the swissprot
protein database.
5. SEQUENCE FORMAT CONVERSION
SQUIZZ
AIM: To convert the given sequence from GenBank (NCBI) format to EMBL format using SQUIZZ as
the format conversion tool.
DESCRIPTION:
The tools available for the analysis of biological sequence data require their input in different
formats. To make the same data acceptable to different
sequence analysis tools, we need sequence format conversion tools. Several such
tools are available on the web.
SQUIZZ allows the verification of a sequence or sequence alignment format and its conversion
into the following formats:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP
SOURCE: http://bioweb.pasteur.fr/sequenal/interface/squizz.html
METHOD:
1. The home page of the sequence conversion tools was found by typing “sequence
conversion tool” in the Google search bar.
2. The sequence format conversion hyperlink was clicked on the page that opened.
3. The SQUIZZ hyperlink was clicked to open the SQUIZZ page.
4. A nucleotide sequence in GenBank (NCBI) format was pasted into the ‘Actual data
here’ box.
5. SQUIZZ was run.
6. The sequence was converted into the new format using the ‘Convert into format’ option.
7. The results in the new format were obtained and saved to Notepad.
INPUT:
OUTPUT:
INTERPRETATION:
The given nucleotide sequence was converted from GenBank to EMBL format using the SQUIZZ
sequence format conversion tool.
READSEQ
AIM: To convert the given sequence from EMBL format to FASTA format using READSEQ.
DESCRIPTION:
READSEQ takes a DNA or amino acid sequence as input, detects the input
format automatically and converts the sequence into any of the
following formats:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP
In the present exercise we have converted EMBL format to FASTA using READseq
conversion tool.
SOURCE:
http://bioweb.pasteur.fr/sequenal/interface/readseq.cgi
METHOD:
3. The READSEQ hyperlink was clicked to open the READSEQ page.
4. A protein sequence in EMBL format was pasted into the ‘Actual data
here’ box.
5. READSEQ was run.
6. The sequence was converted into FASTA format using the ‘Convert into format’ option.
INPUT:
OUTPUT:
INTERPRETATION:
The given protein sequence was converted from EMBL to FASTA format using READSEQ.
FMTSEQ
AIM: To convert the given sequence from EMBL format to CLUSTAL format using FMTSEQ.
DESCRIPTION:
FMTSEQ is a format conversion tool that converts sequences between 22 sequence format types,
including:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP
SOURCE:
http://evol.biology.mcmaster.ca/seqanal/tmp/fmt.seq/A27358120711907/fmtseq.out
METHOD:
3. The FMTSEQ hyperlink was clicked to open the FMTSEQ page.
4. A nucleotide sequence in EMBL format was pasted into the ‘Actual data
here’ box.
5. FMTSEQ was run.
6. The sequence was converted into CLUSTAL format using the ‘Convert into format’ option.
INPUT:
OUTPUT:
INTERPRETATION:
The given nucleotide sequence was converted from EMBL to CLUSTAL format using FMTSEQ.
SREFORMAT
AIM: To convert the given sequence from NCBI format to PIR format using SREFORMAT as
the format conversion tool.
DESCRIPTION:
SreFormat allows the user to convert a sequence from one format to another.
It can accept sequences in the following formats:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP
SOURCE: http://bioweb.pasteur.fr/sequenal/interface/SreFormat.html
METHOD:
3. The SreFormat hyperlink was clicked to open the SreFormat page.
4. A protein sequence in NCBI format was pasted into the ‘Actual data
here’ box.
5. SreFormat was run.
6. The sequence was converted into PIR format using the ‘Convert into format’ option.
INPUT:
OUTPUT:
INTERPRETATION:
The given protein sequence was converted from NCBI to PIR format using the SreFormat sequence
format conversion tool.
SMS
AIM: To convert the given sequence in GenBank format to the FASTA format.
DESCRIPTION:
Using this tool, sequences in any supported format can be converted into the
formats listed below. The MVIEW tool is used to convert the given
sequence into the required format.
The tool has an input option and an output option.
INPUT OPTION:
• Pearson/FASTA
• MSF(GCG)
• CLUSTALW
• Max Hom/ HSSP
• Plain
• Multa: MULTAS/MULTAL
• Mips: MIPS-ALN
OUTPUT OPTION:
• HTML
• GCG/MSF
• Pearson/FASTA
• PIR
• RDB table for storage/manipulation in relational database form
METHODOLOGY:
A. The given sequence in FASTA format is pasted into the box provided.
B. PIR format is selected from the options provided.
C. An e-mail ID is provided when it is required.
D. The tool is run and the result obtained is saved.
INPUT:
Seq name: Rattus norvegicus
Accession number: NM_053814
OUTPUT:
INTERPRETATION:
The GenBank-formatted query sequence was converted into FASTA format
using the sequence conversion tool SMS.
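What a GenBank-to-FASTA conversion involves can be sketched in Python. This is a minimal illustration and not the code of SMS or of any of the tools above: it assumes a single well-formed GenBank flat-file record and keeps only the ACCESSION, DEFINITION and ORIGIN parts. The toy record is invented for the example.

```python
import re

def genbank_to_fasta(record, width=60):
    """Convert one GenBank flat-file record (as text) into a FASTA string."""
    accession, definition, bases = "", [], []
    section = None
    for line in record.splitlines():
        if not line.strip():
            continue
        if line.startswith("//"):          # end-of-record marker
            break
        if not line[0].isspace():          # keywords start in the first column
            section, _, rest = line.partition(" ")
            rest = rest.strip()
            if section == "ACCESSION" and rest:
                accession = rest.split()[0]
            elif section == "DEFINITION":
                definition.append(rest)
        elif section == "DEFINITION":      # continuation of a wrapped definition
            definition.append(line.strip())
        elif section == "ORIGIN":          # sequence lines: drop numbers and spaces
            bases.append(re.sub(r"[^A-Za-z]", "", line))
    sequence = "".join(bases).upper()
    header = ">" + " ".join([accession] + definition).strip()
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + chunks) + "\n"

# Invented toy record, not a real database entry.
toy_record = "\n".join([
    "LOCUS       DEMO0001      12 bp    DNA",
    "DEFINITION  Toy record for",
    "            illustration.",
    "ACCESSION   DEMO0001",
    "ORIGIN",
    "        1 atgcatgcat gc",
    "//",
])
fasta = genbank_to_fasta(toy_record)
print(fasta)
```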
6. ORF FINDER
AIM: To find the open reading frames on the direct and the reverse strands of the given sequence.
DESCRIPTION:
ORF Finder searches for open reading frames (ORFs) in the DNA sequence you
enter. The program returns the range of each ORF, along with its protein translation. Use
ORF Finder to search newly sequenced DNA for potential protein encoding segments. ORF
Finder supports the entire IUPAC alphabet and several genetic codes.
SOURCE:
www.bioinformatics.org/sms2/
INPUT:
OUTPUT:
INTERPRETATION:
By using the ORF Finder tool we have found the open reading frames for the Oryza sativa gene with
accession no. EF183474.
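The search that an ORF finder performs on both strands can be sketched as follows. This is a simplified illustration, not the SMS implementation: it assumes the standard genetic code, treats ATG as the only start codon, and reports coordinates on the strand being scanned. The demo sequence is made up.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(dna, min_codons=2):
    """Return (strand, frame, start, end, protein) for ATG...stop ORFs.

    start and end are 1-based positions on the scanned strand;
    end includes the stop codon.
    """
    dna = dna.upper()
    results = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    protein = "".join(
                        CODON_TABLE.get(seq[j:j + 3], "X")   # X for ambiguous codons
                        for j in range(start, i, 3)
                    )
                    if len(protein) >= min_codons:
                        results.append((strand, frame + 1, start + 1, i + 3, protein))
                    start = None
    return results

orfs = find_orfs("CCATGAAATTTTAGGC")   # made-up demo sequence
print(orfs)
```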
HOMOLOGY SEARCH
The term sequence analysis in biology implies subjecting a DNA or peptide sequence to
sequence alignment, sequence database searches, repeated sequence searches or other bioinformatics
methods on a computer.
In bioinformatics, the Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences
of different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences, and
identify library sequences that resemble the query sequence above a certain threshold.
The BLAST program can either be downloaded and run as a command-line utility
"blastall" or accessed for free over the web. The BLAST web server, hosted by the
NCBI, allows anyone with a web browser to perform similarity searches against
constantly updated databases of proteins and DNA that include most of the newly
sequenced organisms. BLAST is actually a family of programs (all included in the blastall
executable). The following are some of the programs, ranked mostly in order of importance:
Nucleotide-nucleotide BLAST (blastn) :This program, given a DNA query, returns the most
similar DNA sequences from the DNA database that the user specifies.
Protein-protein BLAST (blastp) :This program, given a protein query, returns the most
similar protein sequences from the protein database that the user specifies.
Nucleotide 6-frame translation-protein (blastx) :
This program compares the six-frame conceptual translation products of a nucleotide query
sequence (both strands) against a protein sequence database.
Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx) :
This program is the slowest of the BLAST family. It translates the query nucleotide sequence
in all six possible frames and compares it against the six-frame translations of a nucleotide
sequence database. The purpose of tblastx is to find very distant relationships between
nucleotide sequences.
Protein-nucleotide 6-frame translation (tblastn) :
This program compares a protein query against the six-frame translations of a nucleotide
sequence database.
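The five programs differ only in the type of query, the type of database, and whether the comparison is made at the nucleotide or the protein level. The small Python lookup below restates the descriptions above; the function and variable names are hypothetical and exist only for this illustration.

```python
# program: (query type, database type, level at which sequences are compared)
PROGRAMS = {
    "blastn":  ("nucleotide", "nucleotide", "nucleotide"),
    "blastp":  ("protein",    "protein",    "protein"),
    "blastx":  ("nucleotide", "protein",    "protein"),
    "tblastn": ("protein",    "nucleotide", "protein"),
    "tblastx": ("nucleotide", "nucleotide", "protein"),
}

def choose_program(query, database, compare_as):
    """Pick the BLAST program matching the query/database combination."""
    for name, signature in PROGRAMS.items():
        if signature == (query, database, compare_as):
            return name
    raise ValueError("no BLAST program for this combination")

print(choose_program("nucleotide", "protein", "protein"))
```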
NUCLEOTIDE BLAST
Search a nucleotide database using a nucleotide query
AIM: To search a nucleotide database for sequences similar to a nucleotide query sequence.
DESCRIPTION :
BLAST is one of the most widely used bioinformatics programs, because it addresses a
fundamental problem and the algorithm emphasizes speed over sensitivity.
To run, BLAST requires two sequences as input: a query sequence (also called the target
sequence) and a sequence database. BLAST will find subsequences in the query that are
similar to subsequences in the database.
Nucleotide-nucleotide BLAST (blastn) :
This program, given a DNA query, returns the most similar DNA sequences from the DNA
database that the user specifies.
METHOD:
1. Go to the NCBI home page.
2. Click on BLAST.
3. Click on nucleotide blast.
4. Paste a query sequence in FASTA format.
5. Choose nucleotide collection (nr/nt) as the database.
6. Run BLAST.
7. Select the most similar sequence, which has the maximum identity percentage and the lowest
E-value.
SOURCE:
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
INPUT:
OUTPUT:
INTERPRETATION:
By using nucleotide blast we are able to get nucleotide sequence with maximum similarity.
The accession number for homologous sequence is: AY532754
PROTEIN BLAST
Search a Protein database using a protein query
AIM: To search a protein database for sequences similar to a protein query sequence.
DESCRIPTION :
Protein-protein BLAST (blastp):
This program, given a protein query, returns the most similar protein sequences from the
protein database that the user specifies
METHOD:

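1. Go to the NCBI home page.
2. Click on BLAST.
3. Click on protein blast.
4. Paste a query protein sequence in FASTA format.
5. Choose protein collection (nr) as the database.
6. Run BLAST.
7. Select the most similar sequence, which has the maximum identity percentage and the lowest
E-value.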
SOURCE:
http://www.ncbi.nlm.nih.gov/blast.cgi#24657901
INPUT:
OUTPUT:
INTERPRETATION:
By using protein blast we are able to get the protein sequence with maximum similarity.
The accession number for homologous sequence is: NP_563915
BLASTX
Search a protein database using a translated nucleotide query
AIM: To search a protein database for sequences similar to a translated nucleotide query sequence.
DESCRIPTION :
Nucleotide 6-frame translation-protein (blastx)
This program compares the six-frame conceptual translation products of a nucleotide query
sequence (both strands) against a protein sequence database.
METHOD:

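1. Go to the NCBI home page.
2. Click on BLAST.
3. Click on blastx.
4. Paste a query EST sequence in FASTA format.
5. Choose non-redundant protein sequences (nr) as the database.
6. Run BLAST.
7. Select the most similar sequence, which has the maximum identity percentage and the lowest
E-value.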
SOURCE:
http://www.ncbi.nih.gov/blast/Blast.cgi
INPUT:
OUTPUT:
INTERPRETATION:
By using blastx we are able to get protein sequence with maximum similarity.
The accession number for homologous sequence is: AAT40013
tBLASTN
Search a translated nucleotide database using a protein query
AIM: To search a translated nucleotide database for sequences similar to a protein query sequence.
DESCRIPTION :
BLAST is one of the most widely used bioinformatics programs[2], because it
addresses a fundamental problem and the algorithm emphasizes speed over
sensitivity.
Protein-nucleotide 6-frame translation (tblastn)
This program compares a protein query against the six-frame translations of a nucleotide
sequence database.
METHOD:

2. Click on BLAST.
3. Click on tblastn.
6. Run BLAST.
7. Select the most similar sequence, which has the maximum identity percentage and
the lowest E-value.
SOURCE:
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
INPUT:
OUTPUT:
INTERPRETATION:
By using tblastN we are able to get translated nucleotide sequence with maximum similarity.
The accession number for homologous sequence is: NM_1135932
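tBLASTX
Search a translated nucleotide database using a translated nucleotide query
AIM: To search a translated nucleotide database for sequences similar to a translated nucleotide
query sequence.
DESCRIPTION :
BLAST is one of the most widely used bioinformatics programs[2], because it addresses a
fundamental problem and the algorithm emphasizes speed over sensitivity.
Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx)
This program is the slowest of the BLAST family. It translates the query nucleotide
sequence in all six possible frames and compares it against the six-frame translations
of a nucleotide sequence database. The purpose of tblastx is to find very distant
relationships between nucleotide sequences.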
METHOD:

2. Click on BLAST.
3. Click on tblastx.
6. Run BLAST.
7. Select the most similar sequence, which has the maximum identity percentage and
the lowest E-value.
SOURCE:
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
INPUT:
OUTPUT:
INTERPRETATION:
By using tblastX we are able to get translated nucleotide sequence with maximum similarity.
The accession number for homologous sequence is: NM_001080098
FASTA
FASTA stands for FAST-All, reflecting the fact that it can be used for a fast
protein comparison or a fast nucleotide comparison. It is a DNA and protein sequence
alignment software package first described (as FASTP) by David J. Lipman and
William R. Pearson in 1985. The program achieves a high level of sensitivity for
similarity searching at high speed. The sensitivity comes from performing optimized searches
for local alignments using a substitution matrix. The high speed is achieved by using
the observed pattern of word hits to identify potential matches before attempting the
more time-consuming optimized search. The trade-off between speed and sensitivity
is controlled by the ktup parameter, which specifies the size of the word. Increasing
the ktup decreases the number of background hits. Not every word hit is investigated;
instead, the program initially looks for segments containing several nearby hits.
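The word-hit stage described above can be sketched in Python. This is a simplified illustration of the idea and not the FASTA program itself: it indexes every word of length ktup in the query, looks up the words of the database sequence, and counts hits per diagonal (database position minus query position). The two sequences are made up.

```python
from collections import Counter, defaultdict

def word_hits(query, subject, ktup=2):
    """Count exact ktup-word matches on each diagonal (subject pos - query pos)."""
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            diagonals[j - i] += 1
    return diagonals

query, subject = "ACGTAC", "TTACGTAC"   # made-up sequences
hits_k2 = word_hits(query, subject, ktup=2)
hits_k3 = word_hits(query, subject, ktup=3)
print(hits_k2.most_common(1), sum(hits_k2.values()), sum(hits_k3.values()))
```

The diagonal with the most hits marks the region worth passing to the slower optimized alignment, and raising ktup from 2 to 3 lowers the total number of hits, which is the reduction in background hits mentioned above.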
General FASTA Programs:
Tool                Description
FASTA-protein       Sequence similarity searching against protein
                    databases using FASTA.
FASTA-nucleotide    Sequence similarity searching against nucleotide
                    databases using FASTA.
FASTA – PROTEIN
AIM: To find protein sequences similar to the given query protein sequence
in any format using the FASTA-Protein tool.
DESCRIPTION:
This tool performs sequence similarity searching against protein databases using
FASTA. The FASTA programs provide sequence similarity searching against nucleotide and protein
databases. FASTA can be very specific when identifying
long regions of low similarity, especially for highly diverged sequences. We can also
conduct sequence similarity searching against complete proteome or genome databases using the
FASTA program.
SOURCE: http://www.ebi.ac.uk/fasta33/
METHOD:
1. Type EBI in Google search (www.ebi.ac.uk/Fasta33).
2. Click on European Bioinformatics Institute.
3. Click on sequence similarity and analysis.
4. Click on FASTA.
5. Click on FASTA Protein.
6. Paste or browse a protein sequence in any format in the sequence
submission box.
7. Click on Run FASTA3.
INPUT:
OUTPUT:
INTERPRETATION:
Most similar sequence to the query protein sequence was obtained.
FASTA- NUCLEOTIDE
AIM: To find nucleotide sequences similar to the given query nucleotide
sequence in any format using the FASTA-Nucleotide tool.
DESCRIPTION:
This tool performs sequence similarity searching against nucleotide databases using
FASTA. The FASTA programs provide sequence similarity searching against nucleotide and protein
databases. FASTA can be very specific when identifying
long regions of low similarity, especially for highly diverged sequences. We can also
conduct sequence similarity searching against complete proteome or genome
databases using the FASTA program.
SOURCE:
http://www.ebi.ac.uk/fasta33/
METHOD:
1. Type EBI in Google search (www.ebi.ac.uk/Fasta33).
2. Click on European Bioinformatics Institute.
3. Click on sequence similarity and analysis.
4. Click on FASTA.
5. Click on FASTA Nucleotide.
6. Paste or browse a nucleotide sequence in any format in the sequence
submission box.
7. Click on Run FASTA3.
INPUT:
OUTPUT:
INTERPRETATION:
Most similar sequence to the query nucleotide sequence was obtained.
MULTIPLE SEQUENCE ALIGNMENT
Biologists often find a protein with approximately the same sequence in different
species, suggesting that the proteins have closely related biological functions and that the
genes encoding these proteins have come from a common genetic source. If we align these genes,
we find that some are alike and some are almost identical.
As with aligning a pair of sequences, the difficulty in aligning a group of sequences varies
considerably, being much greater as the degree of sequence similarity decreases. When the
amount of sequence variation is great, it is difficult to find an optimal alignment of the sequences
because so many combinations of substitutions, insertions and deletions, each predicting a
different alignment, are possible.
Three commonly used programs for multiple sequence alignment are:
• Clustal W
• T-Coffee
• Multalin
MULTIPLE SEQUENCE ALIGNMENT USING CLUSTAL W
AIM: To align three sequences using Clustal W.
DESCRIPTION:
Clustal W is a more recent version of Clustal, with the W standing for “weighting”, which
represents the ability of the program to provide weights to the sequences and to the program
parameters. The program is designed to provide an adequate alignment of a large number of
closely related sequences and a reliable indication of the domain structure of the sequences.
Once an alignment has been made, a phylogenetic tree can be made by the neighbour-joining
method.
METHOD:
1. Select more than two protein or nucleotide sequences from NCBI in FASTA format.
2. Copy the sequences and save them in FASTA format.
3. Type Clustal W in the Google search bar.
4. Click on multiple sequence alignment - Clustal W.
5. Submit the sequences in the enter sequence box.
6. Click on execute multiple alignment.
7. Copy and save the result in Notepad.
SOURCE: http://www.ebi.ac.uk/Tools/clustalw2/index.html
INPUT:
ACCESSION NO: NM_008083
NM_001081278
NM_012519
OUTPUT:
RESULTS:
Multiple sequence alignment for three insulin sequences was performed using Clustal W.
MULTIPLE SEQUENCE ALIGNMENT USING T-COFFEE
AIM: To align three sequences using T-Coffee.
DESCRIPTION:
T-Coffee is an advanced multiple sequence alignment program that uses a system of sequence
position weights to generate a multiple sequence alignment that is the most consistent with
the pairwise alignments of all the component sequences (T-Coffee stands for Tree-based
Consistency Objective Function for alignment Evaluation). T-Coffee is better than
Clustal W at reproducing known alignments of related proteins but is much slower.
METHOD:
3. Type T-Coffee in the Google search bar.
4. Click on multiple sequence alignment - T-Coffee.
6. Run the program.
SOURCE: http://www.ch.embnet.org/software/TCoffee.html
INPUT:
ACCESSION NO: NM_008083
NM_001081278
NM_012519
OUTPUT:
RESULTS:
Multiple sequence alignment for three insulin sequences was performed using T-Coffee.
MULTIPLE SEQUENCE ALIGNMENT USING MULTALIGN
AIM: To align three sequences using Multalign.
DESCRIPTION:
Multalign performs a simultaneous alignment of two or more DNA or protein sequences.
It introduces a certain number of gaps into the pairwise aligned sequences to find the minimal
global distance. The program is based on a generalization of the algorithm of Waterman,
Smith and Beyer by Krüger and Osterburg.
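The notion of a minimal global distance between two sequences can be illustrated with a short dynamic-programming sketch in Python. It is not the Multalign algorithm: it assumes a unit cost for every mismatch and every gap position, and the three sequences are made up. Computing this distance for every pair gives the kind of distance table from which a multiple alignment program can decide which sequences to align first.

```python
from itertools import combinations

def global_distance(a, b, mismatch=1, gap=1):
    """Minimal cost of globally aligning a and b (0 means identical)."""
    prev = [j * gap for j in range(len(b) + 1)]    # aligning "" against b[:j]
    for i, ca in enumerate(a, 1):
        cur = [i * gap]                            # aligning a[:i] against ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j - 1] + (0 if ca == cb else mismatch),  # match or mismatch
                prev[j] + gap,                                # gap in b
                cur[j - 1] + gap,                             # gap in a
            ))
        prev = cur
    return prev[-1]

sequences = {"s1": "GATTACA", "s2": "GATACA", "s3": "GCTTACA"}   # made-up examples
distances = {
    (x, y): global_distance(sequences[x], sequences[y])
    for x, y in combinations(sequences, 2)
}
print(distances)
```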
METHOD:
3. Type Multalign in the Google search bar.
4. Click on multiple sequence alignment - Multalign.
6. Run the program.
SOURCE: http://bioinfo.genopole-toulouse.prd.fr/multalin/multalin.html
INPUT:
ACCESSION NO - NM 14646
NM 001122899
NM 010704
OUTPUT:
INTERPRETATION:
Multiple sequence alignment for three insulin sequences was performed using
Multalign.
GENE PREDICTION
With the advent of whole genome sequencing projects, it has become routine to scan
genomic DNA sequences to find genes, particularly those that encode proteins. Computational
methods for gene prediction work by searching through sequences to locate the regions most likely
to encode proteins. Predicting protein-encoding genes is generally easier in
prokaryotes than in eukaryotes because prokaryotic genes generally lack introns and
because several quite highly conserved sequences are found in the promoter region and
around the start sites of transcription and translation.
Three commonly used programs for gene prediction are:
• WebGene
• GeneMark
• GENSCAN
GENE PREDICTION USING WEBGENE
AIM: To predict the features of a eukaryotic gene using WebGene.
DESCRIPTION:
WebGene is a web-based collection of tools, available at the source website below, for
predicting and analysing gene structure in DNA sequences. The programs used in
this exercise are Gene builder (protein-coding gene prediction), Repeat view (mapping
of repeated elements), CpG island prediction, Splice view (splicing signals),
HC polyA (poly-A signals), HCtata (TATA signal prediction), Gene view2 (protein-coding
gene prediction) and AUG_evaluator (start codon prediction).
SOURCE:-
http://www.itb.cnr.it/sun/webgene/
METHOD:-
1. The sequence of human insulin was retrieved from NCBI and saved in Notepad.
2. ‘webgene’ was typed in the Google search bar.
3. The WebGene home page was opened.
I. Gene builder
• Gene builder bar was clicked.
• Shown parameters on webpage were set.
• Sequence was pasted in the given box.
• Analysis was run.
• Results were saved.
II. Repeat view
• Repeat View bar was clicked.
• Results were saved.
III. CpG island
• CpG bar was clicked.
IV. Splice View
• Splice view bar was clicked.
V. HC polyA
• HC polyA bar was clicked.
VI. HCtata
• HCtata bar was clicked.
• Results were saved.
VII. Gene view2
• Gene view2 bar was clicked.
VIII. AUG_evaluator
• AUG_evaluator bar was clicked.
INPUT: Gene Builder
OUTPUT: Gene Builder
OUTPUT: Repeat View
OUTPUT: CpG island
OUTPUT: Splice View
OUTPUT: HC polyA
OUTPUT: HCtata
OUTPUT: Gene view2
OUTPUT: AUG_evaluator
RESULTS AND INTERPRETATION:
Eight programs of WebGene were run on the human insulin gene to predict:
Gene builder - protein-coding gene.
Repeat view - repeated element mapping.
CpG island - CpG islands.
Splice view - splicing signals.
HC polyA - poly-A signals.
HCtata - TATA signals.
Gene view2 - protein-coding gene.
AUG_evaluator - start codon.
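Of the eight analyses, CpG island detection is the easiest to illustrate with a calculation. The Python sketch below computes the GC percentage and the observed/expected CpG ratio of a sequence. The thresholds (length of at least 200 bp, GC above 50 %, ratio above 0.6) are commonly cited criteria assumed here for illustration; they are not necessarily the settings used by the WebGene CpG program.

```python
def cpg_stats(seq):
    """Return (GC percentage, observed/expected CpG ratio) for a sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")                       # observed CpG dinucleotides
    gc_percent = 100.0 * (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_percent, obs_exp

def looks_like_cpg_island(seq, min_length=200, min_gc=50.0, min_ratio=0.6):
    """Apply the assumed length, GC and ratio thresholds."""
    gc_percent, obs_exp = cpg_stats(seq)
    return len(seq) >= min_length and gc_percent > min_gc and obs_exp > min_ratio

print(cpg_stats("CGCGCGATCG"))               # short made-up sequence
print(looks_like_cpg_island("CG" * 100))     # artificial 200 bp CpG-rich sequence
```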
GENE PREDICTION USING GENMARK
AIM: To predict the features of a prokaryotic gene using GeneMark.
DESCRIPTION:
The GeneMark.hmm algorithm was designed to improve
gene prediction quality in terms of finding exact gene boundaries. GeneMark has been
found to give high gene-finding accuracy. The program also uses a specially
derived ribosome binding site pattern to refine predictions of translation initiation
codons.
SOURCE:-
http://exon.gatech.edu/Genmark/genmark_prok_gms_plus.cgi
METHOD:-
1. The sequence of a prokaryotic gene was retrieved from NCBI and saved in Notepad.
2. ‘genmark’ was typed in the Google search bar.
3. The GeneMark home page was opened.
4. The sequence was pasted in the box.
5. The analysis was run.
6. The results were saved.
INPUT:
OUTPUT:
INTERPRETATION:
The GeneMark program was run to predict a prokaryotic gene. The result was saved.

GENE PREDICTION USING GENSCAN
AIM: To predict the features of a eukaryotic gene using GENSCAN.
DESCRIPTION:
GENSCAN is an example of an approach to gene prediction which integrates
multiple types of information, including splice signal sensors, compositional properties of
coding and non-coding DNA and, in some cases, database homology searching, in order to
predict entire gene structures (sets of spliceable exons) in genomic sequences. GENSCAN uses
distinct, explicit, empirically derived sets of model parameters to capture differences in gene
structure and composition between distinct C+G compositional regions (isochores) of the
human genome. It also has the capacity to predict multiple genes in a sequence, to deal with
partial as well as complete genes, and to predict consistent sets of genes occurring on either or
both DNA strands.
SOURCE :-
http://genome.dkfz-heidelberg.de/cgi-bin/GENSCAN/genscan.call.cgi
METHOD:-
1. The sequence of human insulin was retrieved from NCBI and saved in Notepad.
2. ‘genscan’ was typed in the Google search bar.
3. The GENSCAN home page was opened.
4. The sequence was pasted in the box.
5. The analysis was run.
6. The results were saved.
INPUT:
OUTPUT:
INTERPRETATION:
Genscan program was run to predict gene of eukaryotes.
Result was saved.
PATTERNS AND PROFILE SEARCH OF PROTEINS
AIM: To search for patterns and profiles in the given protein sequences using various ExPASy
tools.
DESCRIPTION:
ExPASy (Expert Protein Analysis System) is a proteomics server of the Swiss
Institute of Bioinformatics (SIB) which analyzes protein sequences and structures and two-
dimensional electrophoresis. The server functions in collaboration with the EBI. ExPASy also
produces the protein sequence knowledge base UniProt and Swiss-Prot.
For the prediction of patterns and profiles of proteins, ExPASy provides tools such as:
1. ELM
2. FingerPRINTScan
3. Motif Scan
4. Proscan
5. PRATT
Profiles are numerical representations of a multiple sequence alignment. Profiles help find the
similarities between these sequences and help in the identification and analysis of distantly related
proteins.
Patterns also represent the common characteristics of a protein family, but they do not contain
any weighting information. The user can specify what kind of patterns should be
searched for and how many sequences should match a pattern for it to be reported. There are options
for pattern conservation, restrictions, number of pattern symbols, flexible spacers, etc.
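To make the idea of a profile concrete, the sketch below turns a small alignment into per-column residue frequencies and scores a segment against them. It is a simplified illustration under stated assumptions: real profiles use weighted log-odds scores, and the alignment, the function names and the additive scoring are made up for this example.

```python
from collections import Counter


def build_profile(alignment):
    """Return one dict per alignment column mapping residue -> frequency."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({res: n / len(column) for res, n in counts.items()})
    return profile


def score(profile, segment):
    """Sum the column frequencies of the residues in segment."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, segment))


aligned = ["ACDE", "ACDF", "GCDE", "ACEE"]  # made-up alignment
profile = build_profile(aligned)
print(profile[0])
print(score(profile, "ACDE"))
```

A pattern, by contrast, would only record which residues are allowed in each column, without the frequencies.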
Prosite
AIM: To perform profile and pattern search using Prosite tool.
DESCRIPTION: PROSITE consists of documentation entries describing protein
domains, families and functional sites as well as associated patterns and profiles to
identify them.
SOURCE: http://www.expasy.ch/prosite/
METHOD:
1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the Prosite submission form.
3. The Scan button is clicked.
4. The Prosite tool is run, and the results are viewed by clicking on Rich view and saved.
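The scan performed in the steps above can be imitated locally for a single pattern. The sketch below translates PROSITE pattern syntax into a Python regular expression and reports every match. The pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site pattern; the query sequence and the function names are assumptions made for illustration, and this is not the code the Prosite server runs.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                      # x -> any residue
        element = element.replace("<", "^").replace(">", "$")    # terminus anchors
        parts.append(element)
    return "".join(parts)


def scan(pattern, sequence):
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


query = "MNGTAANPSKNVSW"  # made-up sequence, not real data
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(scan("N-{P}-[ST]-{P}.", query))
```

The asparagine at position 7 is not reported because it is followed by proline, which the pattern excludes.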
INPUT:
OUTPUT:
INTERPRETATION:
Using the Prosite tool, cellulase (accession no. AAA23226B) was scanned for patterns and
profiles. The tool showed the number of disulphide bridges, active sites and
other details of the protein structure.
ELM
AIM: To perform profile and pattern search using ELM tool.
DESCRIPTION:
ELM stands for Eukaryotic Linear Motif and is a resource for finding functional
sites in proteins. It can find Pfam domains, signal peptides, coiled-coil predictions and
transmembrane helices, as well as loop, helix and strand predictions.
SOURCE:
http://elm.eu.org/
METHOD:
1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the ELM submission form.
3. The e-mail ID is entered.
4. The ELM tool is run, and the results are viewed and saved.
INPUT:
OUTPUT:
INTERPRETATION:
Using the ELM tool, we were able to find the number of helices, strands and loops present in
the secondary structure of chitinase (accession no. AAA32461).
FingerPRINTScan
AIM: To perform profile and pattern search using FingerPRINTScan tool.
DESCRIPTION:
The FingerPRINTScan tool scans a protein sequence against the PRINTS protein
fingerprint database. It reports the number of motifs matched to the query sequence, and their length
and position.
SOURCE:
http://www.bioinf.man.ac.uk/fingerPRINTScan/
METHOD:
1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the FingerPRINTScan submission
form.
4. The FingerPRINTScan tool is run, and the results are viewed and saved.
INPUT:
OUTPUT:
INFERENCE:
The FingerPRINTScan tool was used to find the number of motifs and their
positions in the sequence.
Motif Scan
AIM: To perform profile and pattern search using Motif Scan tool.
DESCRIPTION:
Motif or family comparisons are more sensitive because motifs represent a higher
level generalization of the features that are important for a given structural or functional
feature. This tool scans a sequence against protein profile databases (including PROSITE).
SOURCE:
http://mybits.icb.sib.ch/cgi-bin/motif-scan
METHOD:
1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the Motif Scan submission form.
4. The Motif Scan tool is run, and the results are viewed and saved.
INPUT:
OUTPUT:
INTERPRETATION:
Using Motif Scan, we were able to find the number of helices, strands and loops present in the secondary
structure of chitinase (accession no. AAA32461).
PROSCAN
AIM: To perform profile and pattern search using PROSCAN tool.
DESCRIPTION:
This tool, developed and run by PBIL at the University of Lyon, France, scans a sequence
against PROSITE and allows mismatches as well. It can give information regarding
phosphorylation, amidation or any other specific identifying characteristic of the given sequence.
SOURCE:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_prosite.html
METHOD:
1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the PROSCAN submission form.
4. The PROSCAN tool is run, and the results are viewed and saved.
INPUT:
ACCESSION NO –IH4P B
CHAIN B , CRYSTAL PROTEIN
OUTPUT:
INTERPRETATION:
Using the PROSCAN tool, the functional sites of a protein sequence can be
found. The results were viewed and saved.
VISUALIZATION OF PROTEIN STRUCTURE BY USING
RASMOL
AIM: To visualize the structure of protein sequence by using visualization tool RasMol.
DESCRIPTION:
RasMol 2 is a molecular graphics program intended for the visualization of proteins,
nucleic acids and small molecules. The program is aimed at display, teaching and generation
of publication-quality images. RasMol runs on Microsoft Windows, Apple Macintosh, UNIX
and VMS systems. The UNIX and VMS systems require an 8, 24 or 32 bit colour X Windows
display (X11R4 or later). The program reads in a molecule co-ordinate file and interactively
displays the molecule on the screen in a variety of colour schemes and molecular
representations. Currently available representations include depth-cued wireframes, ‘Dreiding’
sticks, spacefilling (CPK) spheres, ball and stick, solid and strand biomolecular ribbons, atom
labels and dot surfaces.
SOURCES:
1. http://wbiomed.curtin.edu.au/teach/biochem/help/download.html
2. http://mc2.cchem.berkeley.edu/rasmol/v2.6/
protein structure (.pdb) http://www.pdb.org/pdb/home/home.do
METHOD:
1. The NCBI website is opened.
2. The given accession number is entered and searched. The nucleotide sequence is
obtained from the CoreNucleotide database.
3. The PDB ID for the given sequence is collected from the CDS section of the
sequence record, e.g. 2MM1.
4. The PDB website is opened.
5. The PDB ID is entered and searched.
6. The .pdb.gz file is downloaded from the options on the left of the page.
7. The .pdb file is extracted from the .pdb.gz file.
8. The .pdb file is opened in RasMol.
9. The structure is viewed with the different Display options available in RasMol, such as
Wireframe, Backbone, Sticks, Spacefill, Ball & Stick, Ribbons, Strands and Cartoons.
10. In the RasMol command line, commands such as “select helix” and
“colour yellow” are used to highlight the helix structure in the molecule.
11. Several other commands can also be used, such as “set picking distance”, “set picking
angle” and “set picking torsion”.
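The counts reported under INTERPRETATION (atoms, helices, sheets, HOH molecules) can be cross-checked outside RasMol by reading the records of the downloaded file. The following is a minimal Python sketch based on the fixed-column PDB format. The function names and the four demo lines are assumptions made for illustration, and the script is independent of RasMol.

```python
import gzip


def summarize_pdb(lines):
    """Count atoms, water molecules and HELIX/SHEET records in PDB text lines."""
    summary = {"atoms": 0, "waters": 0, "helices": 0, "sheet_strands": 0}
    for line in lines:
        record = line[:6].strip()          # record name: columns 1-6
        if record in ("ATOM", "HETATM"):
            if line[17:20].strip() == "HOH":   # residue name: columns 18-20
                summary["waters"] += 1
            else:
                summary["atoms"] += 1
        elif record == "HELIX":
            summary["helices"] += 1
        elif record == "SHEET":
            summary["sheet_strands"] += 1
    return summary


def summarize_pdb_gz(path):
    """Read a gzip-compressed PDB file (step 6) without extracting it first."""
    with gzip.open(path, "rt") as handle:
        return summarize_pdb(handle)


# Made-up, truncated records that only illustrate the column layout.
demo = [
    "HELIX    1   A SER A    3  GLU A   18  1",
    "ATOM      1  N   VAL A   1",
    "ATOM      2  CA  VAL A   1",
    "HETATM    3  O   HOH A 201",
]
print(summarize_pdb(demo))
```

Each SHEET record describes one strand, so the last count is the number of strands and not the number of sheets.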
INPUT:
OUTPUT:
INTERPRETATION:
1. This protein has a total of … atoms.
2. This protein has … helix structures with … atoms.
3. This protein has no sheets or loops…
4. This protein has … HOH molecules.
Picture with ligands
SECONDARY STRUCTURE PREDICTION
AIM: To predict the secondary structure of the given protein sequences using ExPASy tools.
DESCRIPTION:
ExPASy (Expert Protein Analysis System) is a proteomics server of the Swiss
Institute of Bioinformatics (SIB) that analyzes protein sequences and structures and
two-dimensional electrophoresis data. The server functions in collaboration with the EBI. ExPASy also
produces the protein sequence knowledge bases UniProt and Swiss-Prot.
For the prediction of the secondary structure of proteins, ExPASy provides tools such as
1. GOR
2. HNN
3. SOPMA
4. JPred
GOR
AIM: To predict secondary structure of a given protein using GOR tool from Expasy.
DESCRIPTION:
GOR predicts the secondary structure of a given amino acid by looking at a window
of 8 amino acids before and 8 after the position of interest. This program (named after
Garnier, Osguthorpe and Robson) is in its fourth version.
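The window described above (8 residues before and 8 after, 17 in total) can be sketched as follows. Only the window extraction is shown; GOR's statistical parameters are not reproduced. Padding the sequence ends with "-" and the function name `residue_windows` are assumptions made for illustration.

```python
def residue_windows(sequence, flank=8, pad="-"):
    """Yield (position, window) with `flank` residues on each side of every residue."""
    padded = pad * flank + sequence + pad * flank
    for i in range(len(sequence)):
        # The residue of interest sits in the middle of the window.
        yield i + 1, padded[i:i + 2 * flank + 1]


demo = "ACDEFGHIKL"  # made-up sequence, not real data
for position, window in residue_windows(demo, flank=2):
    print(position, window)
```

With the default `flank=8`, every window is 17 characters long, and its central character is the residue whose state is being predicted.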
SOURCE:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html
METHOD:
1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the GOR4 submission form.
4. The GOR4 tool is run, and the results are viewed and saved.
INPUT:
OUTPUT:
INTERPRETATION:
Using the GOR tool, we were able to predict the secondary structure of chitinase (accession no.
IH4P B). The tool showed the number of helices, alpha helices and beta bridges and other
details of the protein structure.
HNN
AIM: To predict secondary structure of a given protein using HNN tool from Expasy.
DESCRIPTION:
Hierarchical Neural Networks can be used to predict protein structure. The protein
sequence is translated into patterns by shifting a window of n adjacent residues (typical value
of n = 13-21) through the protein.
SOURCE:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_nn.html
METHOD:
1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the HNN submission form.
4. The HNN tool is run, and the results are viewed and saved.
INPUT:
OUTPUT:
INTERPRETATION:
Using the HNN tool, we were able to predict the secondary structure of chitinase (accession
no. JH4P B).
The tool predicted the number of helices, alpha helices and beta bridges and other details of
the protein structure.
SOPMA
AIM: To predict secondary structure of a given protein using SOPMA tool from Expasy.
DESCRIPTION:
SOPMA (Self-Optimized Prediction Method) is a secondary structure prediction program
that uses multiple alignments. SOPMA correctly predicts 69.57% of amino acids for
a three-state description of secondary structure (alpha helix, beta sheet and coil).
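An accuracy figure of this kind is the percentage of residues whose predicted state matches the observed one. The sketch below computes it for two state strings. The strings are made up and the function name `q3_accuracy` is an assumption; the code does not reproduce how the SOPMA figure itself was measured.

```python
def q3_accuracy(predicted, observed):
    """Return the percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)


# Made-up state strings: H = alpha helix, E = beta sheet, C = coil.
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHHCCCEEECCC"
print(q3_accuracy(predicted, observed))
```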
SOURCE:
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html
METHOD:
1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the SOPMA submission form.
4. The SOPMA tool is run, and the results are viewed and saved.
INPUT:
OUTPUT:
INTERPRETATION:
Using the SOPMA tool, we were able to predict the secondary structure of chain B (accession
no. JH4P B).
The tool predicted the number of helices, alpha helices and beta bridges and other details of
the protein structure.
JPred
AIM: To predict secondary structure of a given protein using JPred tool from Expasy.
DESCRIPTION:
JPred is a consensus method for predicting the secondary structure of a protein, put forth by
the University of Dundee.
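The idea of a consensus can be illustrated with a plain per-residue majority vote over several predictions. This is an assumption made for illustration and is not JPred's actual method; the three state strings and the function name `consensus` are made up.

```python
from collections import Counter


def consensus(predictions):
    """Return the per-residue majority state across equally long predictions."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        # most_common(1) picks the top state; ties go to the first one seen.
        result.append(Counter(column).most_common(1)[0][0])
    return "".join(result)


# Made-up predictions: H = helix, E = strand, C = coil.
votes = ["HHHCC", "HHCCC", "HECCE"]
print(consensus(votes))
```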
SOURCE:
http://www.compbio.dundee.ac.uk/~www-jpred/
METHOD:
1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the JPred submission form.
4. The JPred tool is run, and the results are viewed and saved.
INPUT:
OUTPUT:
INTERPRETATION:
Using the JPred tool, we were able to predict the secondary structure of chitinase (accession
no. AAA32461).
TO CONSTRUCT THE PHYLOGENETIC RELATIONSHIP
BETWEEN DIFFERENT ORGANISMS
AIM: To draw the phylogenetic tree of the given sequences using the software phylodraw.
DESCRIPTION:
The sequences whose phylogenetic relationship is to be determined are retrieved from NCBI by
keyword search or by accession number. The Phylodraw tool, available on the net, is used
for drawing the phylogenetic tree. The input is in Dialign format, which is obtained by doing a
multiple sequence alignment using the Dialign tool.
Phylodraw and Dialign are the tools used for drawing this phylogenetic tree.
SOURCES:
Dialign: http://bibiserv.techfak.uni-bielefeld.de/dialign/submission.html
Phylodraw: http://pearl.cs.pusan.ac.kr/phylodraw/
NCBI: www.ncbi.nlm.nih.gov
METHOD :
1. The sequences with the following accession numbers are retrieved from the NCBI
biological database.
2. The sequences are used as the input to the Dialign tool for multiple sequence alignment.
3. The output of Dialign is used as the input to the Phylodraw tool.
4. Phylodraw is the tool used to draw phylogenetic trees. It offers the following types of trees.
a. Unrooted tree
b. Rooted tree
c. Radial tree
d. Slanted cladogram
e. Rectangular cladogram
f. Phylogram
5. The results are displayed in Radial tree, Slanted cladogram, Rectangular cladogram and
Phylogram formats.
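Tree drawing rests on how different the aligned sequences are from one another. The sketch below computes the proportion of differing sites (p-distance) for every pair in an alignment. It does not reproduce the Dialign or Phylodraw file formats or their algorithms; the aligned sequences, their names and the function names are assumptions made for illustration.

```python
def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(alignment):
    """Return {name: {name: distance}} for a dict of aligned sequences."""
    return {n1: {n2: p_distance(s1, s2) for n2, s2 in alignment.items()}
            for n1, s1 in alignment.items()}


# Made-up alignment, not real data.
aligned = {"seqA": "ACGTACGT", "seqB": "ACGTACGA", "seqC": "AC-TTCGA"}
matrix = distance_matrix(aligned)
for name, row in matrix.items():
    print(name, {other: round(d, 3) for other, d in row.items()})
```

Sequences with smaller pairwise distances would be expected to sit closer together in the drawn tree.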
INPUT:
PHYLODRAW INPUT:
OUTPUT:
The Phylodraw tool is used to draw the phylogenetic tree of genetically related species. It
can display the trees in various formats.
The tree formats that are displayed are:
a. Radial tree
b. Slanted cladogram
c. Rectangular cladogram
d. Phylogram
INTERPRETATION:
The phylogenetic tree for the sequences has been drawn using the Phylodraw tool with
the result of Dialign as the input.