
1. PLASMAPPER

AIM: To generate and annotate high-quality circular plasmid maps.

DESCRIPTION: A particular feature of PlasMapper is its capacity to automatically
identify and label the plasmid control sequences found in both eukaryotic and
prokaryotic vectors using its own database of common plasmid sequences and
common plasmid subsequences. PlasMapper is also able to generate plasmid maps of
sufficient quality and resolution that they may be used directly in publications or
presentations. The underlying concept behind PlasMapper is to make plasmid
annotation trivially simple and to make the sharing of plasmid images and plasmid
data as easy as possible for as many computer platforms as possible.

SOURCE: wishart.biology.ualberta.ca/PlasMapper

METHOD:

1. Retrieve the sequence to be mapped, in FASTA format, from the NCBI home page.

2. Open the source website: wishart.biology.ualberta.ca/PlasMapper

3. Paste the FASTA sequence into the input box on the home page of the website.

4. Keep the default settings and click ‘Graphic Map’ to get the result.

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:
PlasMapper uses sequence pattern matching and BLAST alignment to automatically
identify and label common promoters, terminators, cloning sites, restriction sites,
reporter genes, affinity tags, selectable marker genes, replication origins and open
reading frames.

2. RESTRICTION MAPPING USING THE BIOEDIT TOOL

AIM: To perform restriction mapping of the given sequence using the BioEdit tool.

DESCRIPTION:
BioEdit is a biological sequence alignment editor written for Windows 95/98/NT.
A rich, intuitive multiple-document interface with many convenient
features makes alignment, manipulation and viewing of sequences relatively quick
and easy on the desktop. Several sequence manipulation and analysis options and fully
automated links to local and web-based analysis programs provide an integrated
working environment in which sequences can be viewed, aligned and analysed from a
single application with simple point-and-click operations.

SOURCE:

http://www.mbio.ncsu.edu/bioedit/page2.html

METHOD:

1) Retrieve the sequence to be mapped, in FASTA format, from NCBI.
2) Open the source website www.mbio.ncsu.edu/bioedit/page2.html
3) Download and install BioEdit from the source website.
4) Open the query sequence in the tool.
5) Select the sequence and run the restriction mapping by clicking the restriction
mapping option.
6) Save the result page showing the mapped sequence.

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:
Restriction mapping of the given sequence was performed. The map reports how many times
each restriction enzyme (for example BsmI and XcaI) cuts the sequence and shows the
location of each enzyme's restriction sites. In recombinant DNA technology, this is used to
find the cutting sites of the restriction enzymes present in a particular sequence.
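The core of what BioEdit's restriction map reports (how many times each enzyme cuts, and where) can be sketched in a few lines of Python. This is a minimal sketch, not BioEdit's implementation: the two-enzyme table is an illustrative assumption, matching is exact, and only palindromic recognition sites are handled, so scanning one strand is enough.

```python
# Illustrative enzyme table (assumption): name -> (recognition site, cut offset).
# The offset is the number of bases of the site lying 5' of the cut on the top strand.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC
    "BamHI": ("GGATCC", 1),  # G^GATCC
}


def find_cut_sites(seq, enzymes=ENZYMES):
    """Return {enzyme: [cut positions]} for exact matches on the top strand.

    Each position is the 1-based index of the last base before the cut.
    Only palindromic sites are assumed, so one strand covers every site.
    """
    seq = seq.upper()
    sites = {}
    for name, (site, offset) in enzymes.items():
        cuts = []
        start = seq.find(site)
        while start != -1:
            cuts.append(start + offset)          # 0-based start + offset = 1-based base before cut
            start = seq.find(site, start + 1)    # continue searching, allowing overlaps
        sites[name] = cuts
    return sites


if __name__ == "__main__":
    for enzyme, cuts in find_cut_sites("ttgaattcaaggatcc").items():
        print(enzyme, "cuts", len(cuts), "time(s), after position(s)", cuts)
```

Non-palindromic enzymes such as BsmI would additionally need a scan of the reverse-complement strand.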

3. PRIMER DESIGNING

AIM: To design primers for the given query sequence using the Primer3
primer design tool.

DESCRIPTION:
Primer3 is a tool used to choose primers for PCR reactions. Primer3's design
is heavily based on earlier implementations of similar programs: Primer 0.5 and
Primer v2. Primer3 can also design hybridization probes and sequencing primers.

SOURCE:
http://biotools.umassmed.edu/bioapps/primer3_www.cgi

METHOD:
1) Retrieve the sequence for which primers are to be designed, in FASTA format, from the
NCBI home page.
2) Open the source website: biotools.umassmed.edu/bioapps/primer3_www.cgi
3) Paste the FASTA sequence into the input box on the home page of the website.
4) Keep the default settings and click ‘Pick Primers’ to get the result.

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:
Primers were designed using Primer3. The output gives a left (forward) primer and a right
(reverse) primer, along with additional candidate oligos.
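The left primer is read directly from the template, while the right primer is the reverse complement of the template's 3' end. The sketch below shows that relationship plus two quick checks often applied to primers. It is a hypothetical helper, not Primer3's algorithm: the Wallace-rule melting temperature is a rough classroom approximation, whereas Primer3 uses nearest-neighbour thermodynamics and many more constraints.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only), upper-cased."""
    return seq.translate(COMPLEMENT)[::-1].upper()


def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Rough Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Classroom approximation only; Primer3's Tm values will differ.
    """
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def naive_primer_pair(template, length=20):
    """Hypothetical helper: take the two ends of the template as a primer pair.

    Left (forward) primer: first `length` bases of the top strand.
    Right (reverse) primer: reverse complement of the last `length` bases.
    """
    left = template[:length].upper()
    right = reverse_complement(template[-length:])
    return left, right


if __name__ == "__main__":
    left, right = naive_primer_pair("ATGCGTACGTTAGCCGATCGATCGGCTAACGTACGCAT", length=10)
    for name, primer in (("left", left), ("right", right)):
        print(name, primer, f"GC={gc_content(primer):.1f}%", f"Tm~{wallace_tm(primer)}C")
```

Real primer selection also screens for hairpins, self-dimers and primer-dimers, which this sketch does not attempt.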

4. SEQUENCE RETRIEVAL
NCBI

AIM: To retrieve the nucleotide sequence for the given accession number from the NCBI
nucleotide sequence database.

DESCRIPTION:
Methods for determining DNA sequences were first described in 1972. Since then, a wealth of
sequence information has been obtained and deposited in several essential centralized
locations. These general databases include:
• GenBank
• EMBL
• DDBJ
Databases and database analysis tools allow a researcher to search for a desired
sequence. The National Center for Biotechnology Information (NCBI) is part of
the United States National Library of Medicine (NLM), a branch of the National Institutes
of Health. NCBI has been responsible for making the GenBank DNA sequence database
available since 1992. GenBank coordinates with individual laboratories and
other sequence databases such as those of the European Molecular Biology Laboratory
(EMBL) and the DNA Data Bank of Japan (DDBJ).

SOURCE: http://www.ncbi.nlm.nih.gov/

METHOD:

1. The NCBI home page was opened.
2. On the home page, the Nucleotide option was selected to retrieve nucleotide sequences.
3. The accession number or gene name of the query sequence was entered in the search box.
4. The ‘Go’ button next to the search bar was clicked.
5. A page listing the records matching the query was displayed.
6. The required record was opened by clicking its link on the results page.
7. The sequence of interest was selected, copied into Notepad and saved.
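The same retrieval can be scripted through NCBI's E-utilities `efetch` service. A minimal sketch, assuming network access: `db=nuccore`, `rettype=fasta` and `retmode=text` are standard efetch parameters, while the helper names `efetch_url` and `fetch_record` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an efetch URL that returns the record as plain text."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_record(accession, db="nuccore", rettype="fasta"):
    """Download the record (requires network access)."""
    with urlopen(efetch_url(accession, db, rettype)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Print the URL only; call fetch_record(...) to actually download.
    print(efetch_url("NM_006272.2"))
```

Passing `rettype="gb"` instead returns the GenBank flat file. NCBI asks scripted users to identify themselves (for example with `tool` and `email` parameters) and to limit their request rate.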

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B
OUTPUT:

INTERPRETATION:

The nucleotide and protein sequences were retrieved from the NCBI sequence
database.

EMBL

AIM: To retrieve the nucleotide sequence for the given accession number from the EMBL
nucleotide sequence database.

DESCRIPTION:
The European Molecular Biology Laboratory (EMBL) is a molecular biology
research institution supported by 20 countries comprising nearly all of western
Europe and Israel. The cornerstones of EMBL's mission are: to perform basic research
in molecular biology, to train scientists, students and visitors at all levels, to offer vital
services to scientists in the member states, to develop new instruments and methods in
the life sciences, and to actively engage in technology transfer.

SOURCE: http://www.ebi.ac.uk/embl/

METHOD:

1. The EMBL home page was opened.
2. The accession number or gene name of the query sequence was entered in the search box.
3. The ‘Go’ button next to the search bar was clicked.
4. A page listing the records matching the query was displayed.
5. The required record was opened by clicking its link on the results page.
6. The sequence of interest was selected, copied into Notepad and saved.
INPUT:

ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

The nucleotide sequence was retrieved from the EMBL nucleotide sequence database.

Swiss-Prot

AIM: To retrieve the protein sequence for the given accession number from the Swiss-Prot
protein sequence database.

DESCRIPTION :
Swiss-Prot is a manually curated biological database of protein sequences. Swiss-Prot was
created in 1986 by Amos Bairoch during his PhD. Swiss-Prot and its automatically curated
supplement, TrEMBL, have joined with the Protein Information Resource protein database to
produce the UniProt Knowledgebase, the world's most comprehensive catalogue of information
on proteins.[2] As of 3 April 2007, UniProtKB/Swiss-Prot release 52.2 contained 263,525
entries and UniProtKB/TrEMBL release 35.2 contained 4,232,122 entries.

SOURCE: http://www.ebi.ac.uk/swissprot/

METHOD:

1. The Swiss-Prot home page was opened.
2. The accession number or gene name of the query sequence was entered in the search box.
3. The ‘Go’ button next to the search bar was clicked.
4. A page listing the records matching the query was displayed.
5. The required record was opened by clicking its link on the results page.
6. The sequence of interest was selected, copied into Notepad and saved.
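Swiss-Prot entries are now distributed as the reviewed part of UniProtKB. Assuming UniProt's current REST endpoint (`https://rest.uniprot.org/uniprotkb/<accession>.fasta`), the retrieval above can be scripted as below; `parse_fasta` is a small hypothetical helper that splits the returned text into header/sequence pairs. UniProt expects a protein accession, so a nucleotide accession such as NM_006272.2 must first be mapped to its UniProt entry.

```python
from urllib.request import urlopen

# Assumed UniProt REST endpoint for a single entry in FASTA format.
UNIPROT_FASTA = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def uniprot_fasta_url(accession):
    return UNIPROT_FASTA.format(accession=accession.strip())


def fetch_uniprot_fasta(accession):
    """Download one UniProtKB entry as FASTA text (requires network access)."""
    with urlopen(uniprot_fasta_url(accession)) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Hypothetical helper: return a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:          # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)             # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```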

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:


INTERPRETATION:

The protein sequence for the given accession number was retrieved from the Swiss-Prot
protein database.

5. SEQUENCE FORMAT CONVERSION

SQUIZZ

AIM: To convert the given sequence from NCBI (GenBank) format to EMBL format using SQUIZZ as
the format conversion tool.

DESCRIPTION:
Different tools for the analysis of biological data (sequences) require their input in
different formats. To make the same data acceptable to different sequence analysis tools,
it must be converted between formats using sequence format conversion tools, several of
which are available on the web.

SQUIZZ checks the format of a sequence or sequence alignment and converts it
into the following formats:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP

SOURCE: http://bioweb.pasteur.fr/sequenal/interface/squizz.html

METHOD:

1. The home page of the sequence conversion tool was found by searching Google for
“sequence conversion tool”.
2. The sequence format conversion link was clicked on the page that opened.
3. The SQUIZZ link was clicked to open its page.
4. A nucleotide sequence in NCBI (GenBank) format was pasted into the “Actual data here” box.
5. SQUIZZ was run.
6. The sequence was converted into the format selected under “Convert into format” (EMBL).
7. The converted result was obtained and saved to Notepad.

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:
The given nucleotide sequence was converted from GenBank to EMBL format using the SQUIZZ
sequence format conversion tool.

READSEQ

AIM: To convert the given sequence from EMBL format to FASTA format using the READSEQ
format conversion tool.

DESCRIPTION:

READSEQ reads a DNA or amino acid sequence, detects its input format automatically and
converts it into the following formats:

• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP

In this exercise, a sequence was converted from EMBL to FASTA format using the READSEQ
conversion tool.
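READSEQ detects the input format automatically. A much simplified version of that idea is to inspect the first non-blank line of the file, as sketched below. The rules are assumptions covering only a few formats, not READSEQ's actual detection logic; `guess_format` is a hypothetical helper.

```python
def guess_format(text):
    """Guess a sequence file format from its first non-blank line (simplified)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    first = lines[0]
    # PIR/NBRF headers look like '>P1;ID' (protein) or '>DL;ID' (linear DNA).
    if first.startswith(">") and len(first) > 3 and first[3] == ";":
        return "PIR/NBRF"
    if first.startswith(">"):
        return "FASTA"
    if first.startswith("LOCUS"):
        return "GenBank"
    if first.startswith("ID "):
        return "EMBL"
    if first.upper().startswith("CLUSTAL"):
        return "CLUSTAL"
    return "unknown"
```

Real converters use more robust rules; PHYLIP or GCG input, for example, would be reported as unknown here.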

SOURCE:

http://bioweb.pasteur.fr/sequenal/interface/readseq.cgi

METHOD:

1. The home page of the sequence conversion tool was found by searching Google for
“sequence conversion tool”.
2. The sequence format conversion link was clicked on the page that opened.
3. The READSEQ link was clicked to open its page.
4. A protein sequence in EMBL format was pasted into the “Actual data here” box.
5. READSEQ was run.
6. The sequence was converted into FASTA format, selected under “Convert into format”.
7. The converted result was obtained and saved to Notepad.

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B


OUTPUT:

INTERPRETATION:

The given protein sequence was converted from EMBL to FASTA format using the READSEQ
sequence format conversion tool.

FMTSEQ

AIM: To convert the given sequence from EMBL format to CLUSTAL format using the FMTSEQ
format conversion tool.

DESCRIPTION:

FMTSEQ converts sequences between 22 sequence format types, including:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP

SOURCE:

http://evol.biology.mcmaster.ca/seqanal/tmp/fmt.seq/A27358120711907/fmtseq.out

METHOD:

1. The home page of the sequence conversion tool was found by searching Google for
“sequence conversion tool”.
2. The sequence format conversion link was clicked on the page that opened.
3. The FMTSEQ link was clicked to open its page.
4. A nucleotide sequence in EMBL format was pasted into the “Actual data here” box.
5. FMTSEQ was run.
6. The sequence was converted into CLUSTAL format, selected under “Convert into format”.
7. The converted result was obtained and saved to Notepad.

INPUT:

ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:
The given nucleotide sequence was converted from EMBL to CLUSTAL format using the FMTSEQ
sequence format conversion tool.

SREFORMAT

AIM: To convert the given sequence from NCBI format to PIR format using SREFORMAT as the
format conversion tool.

DESCRIPTION:
SreFormat converts a sequence from one format to another. It accepts sequences in the
following formats:
• CLUSTAL
• EMBL
• FASTA
• GCG
• GDE
• GENBANK
• NBRF
• MSF
• PHYLIP

SOURCE: http://bioweb.pasteur.fr/sequenal/interface/SreFormat.html

METHOD:

1. The home page of the sequence conversion tool was found by searching Google for
“sequence conversion tool”.
2. The sequence format conversion link was clicked on the page that opened.
3. The SreFormat link was clicked to open its page.
4. A protein sequence in NCBI format was pasted into the “Actual data here” box.
5. SreFormat was run.
6. The sequence was converted into PIR format, selected under “Convert into format”.
7. The converted result was obtained and saved to Notepad.
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

The given protein sequence was converted from NCBI to PIR format using the SreFormat
sequence format conversion tool.

SMS

AIM: To convert the given sequence from GenBank format to FASTA format.

DESCRIPTION:
The MVIEW tool is used to convert the given sequence into the required format. Using this
tool, sequences in any of the input formats listed below can be converted into any of the
listed output formats. The tool has separate input and output options.

INPUT OPTION:
• Pearson/FASTA
• MSF(GCG)
• CLUSTALW
• MaxHom/HSSP
• Plain
• Multa: MULTAS/MULTAL
• Mips: MIPS-ALN

OUTPUT OPTION:
• HTML
• GCG/MSF
• Pearson/FASTA
• PIR
• RDB table for storage/manipulation in relational database form

METHODOLOGY:
A. The given sequence in FASTA format is pasted into the box provided.
B. PIR format is selected from the options provided.
C. An email ID is provided when required.
D. The tool is run and the result obtained is saved.

INPUT:
Seq name: Rattus norvegicus
Accession number: NM_053814

WEB PAGE

OUTPUT:

INTERPRETATION:
The GenBank-formatted query sequence was converted into FASTA format using the
sequence conversion tool SMS.
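A GenBank flat file stores the sequence in its ORIGIN section as numbered lines of lower-case bases terminated by `//`, so a minimal GenBank-to-FASTA conversion like the one performed here can be sketched as below. This is an illustrative sketch, not SMS's code: `genbank_to_fasta` is a hypothetical helper that handles a single record and ignores multi-line DEFINITION fields and the feature table.

```python
import re
import textwrap


def genbank_to_fasta(record, width=60):
    """Convert one GenBank flat-file record to FASTA (simplified sketch)."""
    accession, definition, seq_parts = "unknown", "", []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("ACCESSION"):
            accession = line.split()[1]
        elif line.startswith("DEFINITION"):
            definition = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True                 # sequence lines follow
        elif line.startswith("//"):
            in_origin = False                # end of record
        elif in_origin:
            # Drop position numbers and spaces, keep only sequence letters.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
    seq = "".join(seq_parts).upper()
    header = f">{accession} {definition}".rstrip()
    return header + "\n" + "\n".join(textwrap.wrap(seq, width))
```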

6. ORF FINDER

AIM: To find the open reading frames on the direct and reverse strands.

DESCRIPTION:
ORF Finder searches for open reading frames (ORFs) in the DNA sequence you
enter. The program returns the range of each ORF, along with its protein translation. Use
ORF Finder to search newly sequenced DNA for potential protein encoding segments. ORF
Finder supports the entire IUPAC alphabet and several genetic codes.
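The search that ORF Finder performs can be sketched as follows: translate each of the six reading frames (three per strand) and report stretches that begin with ATG and end at the first in-frame stop codon. This sketch assumes the standard genetic code only and an arbitrary default minimum length of 30 codons; unlike ORF Finder it does not support alternative genetic codes, and ambiguous IUPAC bases simply translate to X.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    # Codons containing ambiguous bases (e.g. N) become 'X'.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_codons=30):
    """Return (strand, frame, start, end, protein) for ATG...stop ORFs.

    start/end are 1-based positions on the forward sequence and include the
    stop codon. Frames on the '-' strand are counted from the 5' end of the
    reverse complement. ORFs lacking a stop codon before the end are ignored.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    end = i + 3
                    protein = translate(s[start:i])   # stop codon not included
                    if len(protein) >= min_codons:
                        if strand == "+":
                            orfs.append((strand, frame + 1, start + 1, end, protein))
                        else:  # map reverse-strand indices back to forward coordinates
                            orfs.append((strand, frame + 1, n - end + 1, n - start, protein))
                    start = None
    return orfs
```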

SOURCE:

www.bioinformatics.org/sms2/

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using the ORF Finder tool, the open reading frames were found for the Oryza sativa gene
with accession number EF183474.

HOMOLOGY SEARCH

The term sequence analysis in biology refers to subjecting a DNA or peptide sequence to
sequence alignment, sequence database searches, repeated-sequence searches or other
bioinformatics methods on a computer.

In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for
comparing primary biological sequence information, such as the amino-acid sequences
of different proteins or the nucleotides of DNA sequences. A BLAST search enables a
researcher to compare a query sequence with a library or database of sequences, and
identify library sequences that resemble the query sequence above a certain threshold.

The BLAST program can either be downloaded and run as a command-line utility
"blastall" or accessed for free over the web. The BLAST web server, hosted by the
NCBI, allows anyone with a web browser to perform similarity searches against
constantly updated databases of proteins and DNA that include most of the newly
sequenced organisms. BLAST is actually a family of programs (all included in the blastall
executable). The following are some of the programs, ranked mostly in order of importance:
Nucleotide-nucleotide BLAST (blastn) :This program, given a DNA query, returns the most
similar DNA sequences from the DNA database that the user specifies.
Protein-protein BLAST (blastp) :This program, given a protein query, returns the most
similar protein sequences from the protein database that the user specifies.
Nucleotide 6-frame translation-protein (blastx) :
This program compares the six-frame conceptual translation products of a nucleotide query
sequence (both strands) against a protein sequence database.
Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx) :
This program is the slowest of the BLAST family. It translates the query nucleotide sequence
in all six possible frames and compares it against the six-frame translations of a nucleotide
sequence database. The purpose of tblastx is to find very distant relationships between
nucleotide sequences.
Protein-nucleotide 6-frame translation (tblastn) :
This program compares a protein query against the six-frame translations of a nucleotide
sequence database.
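The choice among the five programs listed above depends only on the query type, the database type and whether both sides should be translated. A small decision function makes this explicit; `blast_program` is a hypothetical helper, not part of BLAST.

```python
def blast_program(query_type, db_type, translate_both=False):
    """Choose a BLAST program from the query and database types.

    query_type, db_type: "nucleotide" or "protein".
    translate_both: for nucleotide-vs-nucleotide searches, return tblastx
    (both sides translated, for distant relationships) instead of blastn.
    """
    if query_type == "protein" and db_type == "protein":
        return "blastp"
    if query_type == "nucleotide" and db_type == "protein":
        return "blastx"   # query translated in six frames
    if query_type == "protein" and db_type == "nucleotide":
        return "tblastn"  # database translated in six frames
    if query_type == "nucleotide" and db_type == "nucleotide":
        return "tblastx" if translate_both else "blastn"
    raise ValueError("types must be 'nucleotide' or 'protein'")
```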

NUCLEOTIDE BLAST

Search a nucleotide database using a nucleotide query

AIM: To search a nucleotide database for sequences similar to the query nucleotide sequence.

DESCRIPTION :

BLAST is one of the most widely used bioinformatics programs, because it addresses a
fundamental problem and the algorithm emphasizes speed over sensitivity.
To run, BLAST requires two inputs: a query sequence (also called the target
sequence) and a sequence database. BLAST will find subsequences in the query that are
similar to subsequences in the database.
Nucleotide-nucleotide BLAST (blastn) :
This program, given a DNA query, returns the most similar DNA sequences from the DNA
database that the user specifies.

METHOD:

1. Go to the NCBI home page.
2. Click on BLAST.
3. Click on Nucleotide BLAST.
4. Paste a query sequence in FASTA format.
5. Choose the nucleotide collection (nr/nt) as the database.
6. Run BLAST.
7. Select the most similar sequence, i.e. the hit with the highest percent identity and the
lowest E-value.
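Step 7 can also be applied to saved results. A minimal sketch, assuming the hits were saved in BLAST+ tabular format (`-outfmt 6` with its default twelve columns); it ranks hits by E-value first and uses percent identity only to break ties. `best_hit` is a hypothetical helper.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit(tabular_text):
    """Return the hit with the lowest E-value, breaking ties by higher % identity."""
    reader = csv.DictReader(io.StringIO(tabular_text), fieldnames=OUTFMT6_FIELDS,
                            delimiter="\t")
    # Skip comment lines (as written by -outfmt 7) and empty rows.
    hits = [row for row in reader if row["qseqid"] and not row["qseqid"].startswith("#")]
    if not hits:
        return None
    return min(hits, key=lambda row: (float(row["evalue"]), -float(row["pident"])))
```

Ranking by identity alone can mislead: a short hit at 100% identity may have a much worse E-value than a long hit at 99%, because the E-value accounts for alignment length and score.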

SOURCE:
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi

INPUT:

ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Nucleotide BLAST returned the database nucleotide sequence most similar to the query.
The accession number of the homologous sequence is: AY532754
PROTEIN BLAST
Search a Protein database using a protein query

AIM: To search a protein database for sequences similar to the query protein sequence.

DESCRIPTION :

BLAST is one of the most widely used bioinformatics programs, because it addresses a
fundamental problem and the algorithm emphasizes speed over sensitivity.
To run, BLAST requires two inputs: a query sequence (also called the target
sequence) and a sequence database. BLAST will find subsequences in the query that are
similar to subsequences in the database.
Protein-protein BLAST (blastp):
This program, given a protein query, returns the most similar protein sequences from the
protein database that the user specifies.

METHOD:

1. Go to the NCBI home page.
2. Click on BLAST.
3. Click on Protein BLAST.
4. Paste a query sequence in FASTA format.
5. Choose the protein collection (nr) as the database.
6. Run BLAST.
7. Select the most similar sequence, i.e. the hit with the highest percent identity and the
lowest E-value.

SOURCE:

http://www.ncbi.nlm.nih.gov/blast.cgi#24657901

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Protein BLAST returned the database protein sequence most similar to the query.
The accession number of the homologous sequence is: NP_563915
BLASTX

Search a protein database using a translated nucleotide query

AIM: To search a protein database for sequences similar to the translated nucleotide query.

DESCRIPTION :

BLAST is one of the most widely used bioinformatics programs, because it addresses a
fundamental problem and the algorithm emphasizes speed over sensitivity.
To run, BLAST requires two inputs: a query sequence (also called the target
sequence) and a sequence database. BLAST will find subsequences in the query that are
similar to subsequences in the database.
Nucleotide 6-frame translation-protein (blastx)
This program compares the six-frame conceptual translation products of a nucleotide query
sequence (both strands) against a protein sequence database.

METHOD:

1. Go to the NCBI home page.
2. Click on BLAST.
3. Click on blastx.
4. Paste a query EST sequence in FASTA format.
5. Choose the non-redundant protein sequences (nr) as the database.
6. Run BLAST.
7. Select the most similar sequence, i.e. the hit with the highest percent identity and the
lowest E-value.

SOURCE:

http://www.ncbi.nih.gov/blast/Blast.cgi

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:
blastx returned the database protein sequence most similar to the translated query.
The accession number of the homologous sequence is: AAT40013

tBLASTN

Search a translated nucleotide database using a protein query

AIM: To search a translated nucleotide database for sequences similar to the protein query.

DESCRIPTION :
BLAST is one of the most widely used bioinformatics programs[2], because it
addresses a fundamental problem and the algorithm emphasizes speed over
sensitivity.
Protein-nucleotide 6-frame translation (tblastn)
This program compares a protein query against the six-frame translations of a
nucleotide sequence database.

METHOD:

1. Go to the NCBI home page.
2. Click on BLAST.
3. Click on tblastn.
4. Paste a query sequence in FASTA format.
5. Choose the nucleotide collection (nr/nt) as the database.
6. Run BLAST.
7. Select the most similar sequence, i.e. the hit with the highest percent identity and the
lowest E-value.

SOURCE:

http://www.ncbi.nlm.nih.gov/blast/Blast.cgi

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

tblastn returned the translated nucleotide sequence most similar to the protein query.
The accession number of the homologous sequence is: NM_1135932

tBLASTX

Search a translated nucleotide database using a translated nucleotide query

AIM: To search a translated nucleotide database for sequences similar to the translated
nucleotide query.

DESCRIPTION :

BLAST is one of the most widely used bioinformatics programs[2], because it addresses a
fundamental problem and the algorithm emphasizes speed over sensitivity.
To run, BLAST requires two inputs: a query sequence (also called the target
sequence) and a sequence database. BLAST will find subsequences in the query that are
similar to subsequences in the database.

Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx)
This program is the slowest of the BLAST family. It translates the query nucleotide
sequence in all six possible frames and compares it against the six-frame translations
of a nucleotide sequence database. The purpose of tblastx is to find very distant
relationships between nucleotide sequences.

METHOD:

1. Go to the NCBI home page.
2. Click on BLAST.
3. Click on tblastx.
4. Paste a query sequence in FASTA format.
5. Choose the nucleotide collection (nr/nt) as the database.
6. Run BLAST.
7. Select the most similar sequence, i.e. the hit with the highest percent identity and the
lowest E-value.

SOURCE:

http://www.ncbi.nlm.nih.gov/blast/Blast.cgi

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

tblastx returned the translated nucleotide sequence most similar to the translated query.
The accession number of the homologous sequence is: NM_001080098

FASTA

FASTA stands for FAST-All, reflecting the fact that it can be used for fast protein
comparison or fast nucleotide comparison. It is a DNA and protein sequence alignment
software package first described (as FASTP) by David J. Lipman and William R. Pearson
in 1985. The program achieves a high level of sensitivity for similarity searching at
high speed. Sensitivity comes from performing optimized searches for local alignments
using a substitution matrix; speed comes from using the observed pattern of word hits
to identify potential matches before attempting the more time-consuming optimized
search. The trade-off between speed and sensitivity is controlled by the ktup
parameter, which specifies the word size: increasing ktup decreases the number of
background hits. Not every word hit is investigated; instead, the program first looks
for segments containing several nearby hits.
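The word-hit idea behind FASTA's speed can be illustrated by indexing every k-tuple (word of length ktup) in the query and counting, for each diagonal, how many words the subject shares with it. This is only a simplified version of FASTA's first step (the later rescoring, joining of regions and optimized local alignment are not shown), and `ktup_diagonals` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def ktup_diagonals(query, subject, ktup=2):
    """Count exact k-tuple word hits on each diagonal.

    A hit between query position i and subject position j lies on diagonal
    j - i; diagonals with many hits mark candidate regions of similarity.
    """
    words = defaultdict(list)  # word -> query positions where it occurs
    for i in range(len(query) - ktup + 1):
        words[query[i:i + ktup]].append(i)
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in words.get(subject[j:j + ktup], ()):
            diagonals[j - i] += 1
    return diagonals


if __name__ == "__main__":
    hits = ktup_diagonals("ACGTAC", "TTACGTAC", ktup=3)
    print(hits.most_common(1))  # [(2, 4)]: four word hits on diagonal 2
```

Lowering ktup to 1 for the same pair produces many more scattered single-letter hits, which is the background noise that a larger ktup suppresses.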

General FASTA programs:

Tool                Description
FASTA-Protein       Sequence similarity searching against protein databases using FASTA.
FASTA-Nucleotide    Sequence similarity searching against nucleotide databases using FASTA.

FASTA – PROTEIN

AIM: To find protein sequences similar to the given query protein sequence (in any
format) using the FASTA-Protein tool.

DESCRIPTION:

The EBI FASTA service provides sequence similarity searching against nucleotide and
protein databases using the FASTA programs; FASTA-Protein searches protein databases.
FASTA can be very specific when identifying long regions of low similarity, especially
for highly diverged sequences. Similarity searches can also be run against proteome or
genome databases using the FASTA program.

SOURCE: http://www.ebi.ac.uk/fasta33/

METHOD:

1. Search Google for EBI (www.ebi.ac.uk/Fasta33).
2. Click on European Bioinformatics Institute.
3. Click on Sequence Similarity and Analysis.
4. Click on FASTA.
5. Click on FASTA Protein.
6. Paste a protein sequence in any format into the sequence submission box, or browse to
upload it from a file.
7. Click on Run FASTA3.

INPUT:

ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Most similar sequence to the query protein sequence was obtained.

FASTA- NUCLEOTIDE

AIM: To find nucleotide sequences similar to the given query nucleotide sequence (in any
format) using the FASTA-Nucleotide tool.

DESCRIPTION:

The EBI FASTA service provides sequence similarity searching against nucleotide and
protein databases using the FASTA programs; FASTA-Nucleotide searches nucleotide databases.
FASTA can be very specific when identifying long regions of low similarity, especially
for highly diverged sequences. Similarity searches can also be run against complete
proteome or genome databases using the FASTA program.

SOURCE:

http://www.ebi.ac.uk/fasta33/

METHOD:

1. Search Google for EBI (www.ebi.ac.uk/Fasta33).
2. Click on European Bioinformatics Institute.
3. Click on Sequence Similarity and Analysis.
4. Click on FASTA.
5. Click on FASTA Nucleotide.
6. Paste a nucleotide sequence in any format into the sequence submission box, or browse
to upload it from a file.
7. Click on Run FASTA3.

INPUT:

ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Most similar sequence to the query nucleotide sequence was obtained.

MULTIPLE SEQUENCE ALIGNMENT

Biologists often find proteins with approximately the same sequence in different
species, suggesting that the proteins have closely related biological functions and that the
genes encoding them have come from a common genetic source. If these genes are aligned,
some are found to be similar and some almost identical.
As with aligning a pair of sequences, the difficulty of aligning a group of sequences varies
considerably and becomes much greater as the degree of sequence similarity decreases. When the
amount of sequence variation is great, it is difficult to find an optimal alignment of the
sequences, because so many combinations of substitutions, insertions and deletions, each
predicting a different alignment, are possible.
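Trying every combination of substitutions, insertions and deletions is infeasible, which is why alignment programs rely on dynamic programming. As an illustration only (not the actual scoring used by Clustal W, T-Coffee or MultAlin), the sketch below computes a Needleman-Wunsch global alignment score for two sequences with arbitrary match, mismatch and linear gap values; real tools use substitution matrices and affine gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with a linear gap penalty (score only, no traceback).

    Keeps one previous row of the dynamic-programming matrix at a time.
    """
    prev = [j * gap for j in range(len(b) + 1)]       # row for an empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag,                # align a[i-1] with b[j-1]
                          prev[j] + gap,       # gap in b
                          curr[j - 1] + gap)   # gap in a
        prev = curr
    return prev[-1]
```

With these values, `global_alignment_score("ACGT", "AGT")` is 1: three matches and one gap opposite the C.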

Three commonly used programs for multiple sequence alignment are:

• Clustal W
• T-Coffee
• MultAlin

MULTIPLE SEQUENCE ALIGNMENT
USING CLUSTAL W

AIM: To align three sequences using Clustal W.

DESCRIPTION:

Clustal W is a more recent version of Clustal, with the W standing for “weighting” to
represent the ability of the program to assign weights to the sequences and to the program
parameters. The program is designed to provide an adequate alignment of a large number of
more closely related sequences and a reliable indication of the domain structure of the
sequences. Once an alignment has been made, a phylogenetic tree can be made by the
neighbour-joining method.

METHOD:

1. Select more than two protein or nucleotide sequences from NCBI in FASTA format.
2. Copy the sequences and save them in FASTA format.
3. Search Google for Clustal W.
4. Click on Multiple Sequence Alignment - Clustal W.
5. Submit the sequences in the Enter Sequence box.
6. Click on Execute Multiple Alignment.
7. Copy the result and save it in Notepad.

SOURCE: http://www.ebi.ac.uk/Tools/clustalw2/index.html

INPUT:
ACCESSION NO: NM_008083
NM_001081278
NM_012519

OUTPUT:

RESULTS:

A multiple sequence alignment of three insulin protein sequences was performed using Clustal W.

MULTIPLE SEQUENCE ALIGNMENT
USING T-COFFEE

AIM: To align three sequences using T-Coffee.

DESCRIPTION:
T-Coffee is a multiple sequence alignment program that uses a system of sequence
position weights to generate the multiple sequence alignment that is most consistent with
the pairwise alignments of all the component sequences (T-Coffee stands for Tree-based
Consistency Objective Function for alignment Evaluation). T-Coffee is better than
Clustal W at reproducing known alignments of related proteins, but it is much slower.

METHOD:

1. Select more than two protein or nucleotide sequences from NCBI in FASTA format.
2. Copy the sequences and save them in FASTA format.
3. Search Google for T-Coffee.
4. Click on Multiple Sequence Alignment - T-Coffee.
5. Submit the sequences in the Enter Sequence box.
6. Run the program.
7. Copy the result and save it in Notepad.

SOURCE: http://www.ch.embnet.org/software/TCoffee.html

INPUT:
ACCESSION NO: NM_008083
NM_001081278
NM_012519

OUTPUT:

RESULTS:

A multiple sequence alignment of three insulin protein sequences was performed using T-Coffee.

MULTIPLE SEQUENCE ALIGNMENT
USING MULTALIN

AIM: To align three sequences using MultAlin.

DESCRIPTION:
MultAlin performs simultaneous alignment of two or more DNA or protein sequences.
It introduces a certain number of gaps into pairwise-aligned sequences to find the minimal
global distance. The program is based on a generalization of the algorithm of Waterman,
Smith and Beyer by Kreger and Osterburg.

METHOD:

1. Select more than two protein or nucleotide sequences from NCBI in FASTA format.
2. Copy the sequences and save them in FASTA format.
3. Search Google for MultAlin.
4. Click on Multiple Sequence Alignment - MultAlin.
5. Submit the sequences in the Enter Sequence box.
6. Run the program.
7. Copy the result and save it in Notepad.

SOURCE: http://bioinfo.genopole-toulouse.prd.fr/multalin/multalin.html

INPUT:
ACCESSION NO - NM 14646
NM 001122899
NM 010704

OUTPUT:

INTERPRETATION:

A multiple sequence alignment of three insulin protein sequences was performed using
MultAlin.

GENE PREDICTION

With the advent of whole-genome sequencing projects, it has become routine to scan
genomic DNA sequences to find genes, particularly those that encode proteins. Computational
methods for gene prediction search through sequences to locate the regions most likely to
encode proteins. Predicting protein-encoding genes is generally easier in prokaryotes than
in eukaryotes, because prokaryotic genes generally lack introns and because several highly
conserved sequences are found in the promoter region and around the start sites of
transcription and translation.

Three commonly used programs for gene prediction are:

• WebGene
• GeneMark
• GENSCAN

GENE PREDICTION USING WEBGENE

AIM:- To predict the features of a eukaryotic gene using WebGene.

DESCRIPTION:-

WebGene is a web-based suite of tools for predicting the structure of eukaryotic
genes in genomic DNA. The programs used in this exercise are GeneBuilder (protein-coding
gene prediction), RepeatView (repeated elements), CpG island (CpG islands), SpliceView
(splicing signals), HCpolyA (polyA signals), HCtata (TATA signals), GenView2
(protein-coding gene prediction) and AUG_evaluator (start codons).

SOURCE:-

http://www.itb.cnr.it/sun/webgene/

METHOD:-

1. The sequence of human insulin was retrieved from NCBI and saved in Notepad.
2. ‘WebGene’ was typed into the Google search bar.
3. The WebGene home page was opened.

I. GeneBuilder

• The GeneBuilder bar was clicked.
• The parameters shown on the web page were set.
• The sequence was pasted into the given box.
• The analysis was run.
• The results were saved.

II. RepeatView

• The RepeatView bar was clicked.
• The parameters shown on the web page were set.
• The sequence was pasted into the given box.
• The analysis was run.
• The results were saved.

III. CpG island

• The CpG bar was clicked.
• The parameters shown on the web page were set.
• The sequence was pasted into the given box.
• The analysis was run.
• The results were saved.

IV. SpliceView

• The SpliceView bar was clicked.
• The parameters shown on the web page were set.
• The sequence was pasted into the given box.
• The analysis was run.
• The results were saved.

V. HCpolyA

• The HCpolyA bar was clicked.
• The parameters shown on the web page were set.
• The sequence was pasted into the given box.
• The analysis was run.
• The results were saved.

VI. HCtata

• The HCtata bar was clicked.
• The parameters shown on the web page were set.
• The sequence was pasted into the given box.
• The analysis was run.
• The results were saved.

VII. GenView2

• The GenView2 bar was clicked.
• The parameters shown on the web page were set.
• The sequence was pasted into the given box.
• The analysis was run.
• The results were saved.

VIII. AUG_evaluator

• The AUG_evaluator bar was clicked.
• The parameters shown on the web page were set.
• The sequence was pasted into the given box.
• The analysis was run.
• The results were saved.

INPUT: GeneBuilder

OUTPUT: GeneBuilder

OUTPUT: RepeatView

OUTPUT: CpG island

OUTPUT: SpliceView

OUTPUT: HCpolyA

OUTPUT: HCtata

OUTPUT: GenView2

OUTPUT: AUG_evaluator
RESULTS AND INTERPRETATION:

Eight WebGene programs were run on the human insulin gene to predict:
• GeneBuilder: protein-coding genes.
• RepeatView: repeated elements.
• CpG island: CpG islands.
• SpliceView: splicing signals.
• HCpolyA: polyA signals.
• HCtata: TATA signals.
• GenView2: protein-coding genes.
• AUG_evaluator: start codons.
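What the CpG island program looks for can be illustrated with the commonly cited Gardiner-Garden and Frommer criteria: a stretch of at least 200 bp with G+C content above 50% and an observed/expected CpG ratio above 0.6. These thresholds are an assumption here; WebGene's CpG island tool may use its own parameters.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (G+C fraction, observed/expected CpG ratio) for a DNA window."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")          # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # expected CpG count = (C count * G count) / length
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def is_cpg_island(seq: str, min_len: int = 200, min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the assumed length, G+C and observed/expected CpG thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_ratio

print(cpg_stats("CGCG"))  # (1.0, 2.0)
```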

GENE PREDICTION USING GENEMARK

AIM:- To predict the features of a prokaryotic gene using GeneMark.

DESCRIPTION:-
The GeneMark.hmm algorithm was designed to improve gene prediction quality in
terms of finding exact gene boundaries, and GeneMark has shown high gene-finding
accuracy. The program also uses a specially derived ribosome binding site (RBS) pattern
to refine the prediction of translation initiation
codons.
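How an RBS pattern can help choose among candidate start codons is sketched below: an ATG is kept only if a Shine-Dalgarno-like motif lies a few bases upstream of it. The motif (AGGAGG), the 4 to 15 nt spacing and the exact-match rule are all assumptions for illustration; GeneMark uses its own derived RBS model rather than a fixed string.

```python
def starts_with_rbs(seq: str, motif: str = "AGGAGG", min_gap: int = 4, max_gap: int = 15) -> list[int]:
    """Return 0-based positions of ATG codons preceded by `motif`.

    The gap is the number of bases between the end of the motif and the A of ATG.
    """
    seq, motif = seq.upper(), motif.upper()
    hits = []
    pos = seq.find("ATG")
    while pos != -1:
        for gap in range(min_gap, max_gap + 1):
            motif_start = pos - gap - len(motif)
            if motif_start >= 0 and seq[motif_start:motif_start + len(motif)] == motif:
                hits.append(pos)       # this ATG has an RBS-like motif upstream
                break
        pos = seq.find("ATG", pos + 1)
    return hits

print(starts_with_rbs("AGGAGGACGTACATGAAA" + "C" * 20 + "ATG"))  # [12]
```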

SOURCE:-

http://exon.gatech.edu/Genmark/genmark_prok_gms_plus.cgi

METHOD:-
1. A prokaryotic gene sequence was retrieved from NCBI and saved in Notepad.
2. ‘GeneMark’ was typed into the Google search bar.
3. The GeneMark home page was opened.
4. The sequence was pasted into the box.
5. The analysis was run.
6. The results were saved.

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

GeneMark was run to predict prokaryotic genes in the sequence. The result was saved.


GENE PREDICTION USING GENSCAN

AIM:- To predict the features of a eukaryotic gene using GENSCAN.

DESCRIPTION:-
GENSCAN is an example of a gene prediction approach that integrates multiple types
of information, including splice-signal sensors, the compositional properties of coding and
non-coding DNA and, in some cases, database homology searches, in order to predict entire
gene structures (sets of spliceable exons) in genomic sequences. GENSCAN uses distinct,
explicit, empirically derived sets of model parameters to capture differences in gene
structure and composition between the distinct C + G compositional regions (isochores) of the
human genome. It can also predict multiple genes in a sequence, deal with partial as well
as complete genes, and predict consistent sets of genes occurring on either or
both DNA strands.
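The C + G compositional differences mentioned above can be made visible by computing G+C content in sliding windows along a sequence. This is only an illustration of the quantity involved, not GENSCAN's own procedure; the window and step sizes are arbitrary assumptions.

```python
def gc_windows(seq: str, window: int = 100, step: int = 50) -> list[tuple[int, float]]:
    """Return (window start, G+C fraction) for each full window along the sequence."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = (chunk.count("G") + chunk.count("C")) / window
        result.append((start, gc))
    return result

print(gc_windows("AT" * 50 + "GC" * 50))  # [(0, 0.0), (50, 0.5), (100, 1.0)]
```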

SOURCE :-

http://genome.dkfz-heidelberg.de/cgi-bin/GENSCAN/genscan.call.cgi

METHOD:-

1. The sequence of human insulin was retrieved from NCBI and saved in Notepad.
2. ‘GENSCAN’ was typed into the Google search bar.
3. The GENSCAN home page was opened.
4. The sequence was pasted into the box.
5. The analysis was run.
6. The results were saved.
INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:
GENSCAN was run to predict eukaryotic genes in the sequence.
The result was saved.

PATTERNS AND PROFILE SEARCH OF PROTEINS

AIM: To search for patterns and profiles in the given protein sequences using various ExPASy
tools.

DESCRIPTION:

ExPASy (Expert Protein Analysis System) is the proteomics server of the Swiss
Institute of Bioinformatics (SIB). It provides tools for analysing protein sequences and structures
and two-dimensional gel electrophoresis data. The server works in collaboration with the EBI, and
ExPASy also provides access to the protein sequence knowledgebases UniProt and Swiss-Prot.

For the prediction of protein patterns and profiles, ExPASy lists tools such as:

1. ELM
2. FingerPRINTScan
3. Motif Scan
4. Proscan
5. PRATT

Profiles are numerical representations of a multiple sequence alignment. They capture the
similarities among the aligned sequences and help in identifying and analysing distantly related
proteins.
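To make ‘numerical representation of a multiple sequence alignment’ concrete, the sketch below turns a toy alignment into a simple profile: per-column residue frequencies with a pseudocount, converted to log-odds scores against a uniform background. The pseudocount, the uniform background and the toy alignment are assumptions; real profile databases such as the PROSITE profiles use more elaborate weighting.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Return one {residue: log2-odds score} dict per alignment column (gaps ignored)."""
    background = 1.0 / len(AMINO_ACIDS)              # assumed uniform background
    profile = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment if seq[col] != "-"]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        profile.append(scores)
    return profile

def score(profile: list[dict[str, float]], segment: str) -> float:
    """Score an ungapped segment that has the same length as the profile."""
    return sum(col[aa] for col, aa in zip(profile, segment))

toy_alignment = ["ACDE", "ACDE", "ACNE", "SCDE"]      # hypothetical toy alignment
profile = build_profile(toy_alignment)
print(score(profile, "ACDE") > score(profile, "WWWW"))  # True
```

Because every residue gets a graded score in every column, a distantly related sequence that matches only some columns well can still score above background, which is what makes profiles useful for remote homologues.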

Patterns also represent the common characteristics of a protein family, but they do not contain
any weighting information. When searching for patterns, the user can specify what kind of patterns
should be searched for and how many sequences must match a pattern for it to be reported; there are
options for pattern conservation, restrictions, the number of pattern symbols, flexible spacers, etc.
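Unlike a profile, a pattern is a yes/no rule, so it maps naturally onto a regular expression. The sketch below converts a basic PROSITE-style pattern into a Python regex. It handles only common syntax elements ([..] allowed residues, {..} forbidden residues, x for any residue, (n) or (n,m) repeats, < and > terminal anchors), not every PROSITE construct. The example is the N-glycosylation pattern N-{P}-[ST]-{P} as we recall it from PROSITE entry PS00001; check it against PROSITE before relying on it.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern (e.g. "N-{P}-[ST]-{P}.") into a regular expression."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):            # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):              # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate an optional repeat count such as x(2) or x(2,4)
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":
            part = "."                          # any residue
        elif core.startswith("{"):
            part = "[^" + core[1:-1] + "]"     # any residue except those listed
        else:
            part = core                         # single residue or [..] class
        if high:
            part += "{" + low + "," + high + "}"
        elif low:
            part += "{" + low + "}"
        regex += prefix + part + suffix
    return regex

site = prosite_to_regex("N-{P}-[ST]-{P}.")
print(site)                                                  # N[^P][ST][^P]
print([m.start() for m in re.finditer(site, "MNGTAANPSA")])  # [1]
```

Note that `re.finditer` reports non-overlapping matches only; a scanner that must report overlapping sites would need a lookahead-based search.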

PROSITE
AIM: To perform a profile and pattern search using the PROSITE tool.

DESCRIPTION: PROSITE consists of documentation entries describing protein domains,
families and functional sites, as well as the associated patterns and profiles used to
identify them.

SOURCE: http://www.expasy.ch/prosite/

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the PROSITE submission form.
3. The Scan button is clicked.
4. PROSITE is run, and the results are viewed by clicking on Rich view and saved.

INPUT:

ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using PROSITE, we were able to identify the domains and functional sites of cellulase
(accession no. AAA23226B). The tool showed the number of disulphide bridges, the active sites and
other features of the protein.

ELM

AIM: To perform profile and pattern search using ELM tool.

DESCRIPTION:

ELM (Eukaryotic Linear Motif) is a resource for finding functional sites in proteins.
Besides linear motifs, it can report Pfam domains, signal peptides, coiled coils and
transmembrane helices, as well as loop, helix and strand predictions.

SOURCE:

http://elm.eu.org/

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the ELM submission form.
3. The e-mail ID is entered.
4. ELM is run, and the results are viewed and saved.

INPUT:

ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using ELM, we were able to find the number of helices, strands and loops present in
the secondary structure of chitinase (accession no. AAA32461).

FingerPRINTScan

AIM: To perform profile and pattern search using FingerPRINTScan tool.

DESCRIPTION:

FingerPRINTScan scans a protein sequence against the PRINTS protein
fingerprint database. It reports the number of motifs matched by the query sequence,
together with their lengths and positions.

SOURCE:

http://www.bioinf.man.ac.uk/fingerPRINTScan/

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the FingerPRINTScan submission form.
3. The e-mail ID is entered.
4. FingerPRINTScan is run, and the results are viewed and saved.

INPUT:

ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INFERENCE:

FingerPRINTScan was used to find the number of matched motifs and their
positions in the sequence.

Motif Scan

AIM: To perform profile and pattern search using Motif Scan tool.

DESCRIPTION:

Comparisons against motifs or families are more sensitive than comparisons against individual
sequences, because motifs represent a higher-level generalization of the features that are
important for a given structure or function. This tool scans a sequence against protein profile
databases (including PROSITE).

SOURCE:

http://mybits.icb.sib.ch/cgi-bin/motif-scan

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the Motif Scan submission form.
3. The e-mail ID is entered.
4. Motif Scan is run, and the results are viewed and saved.

INPUT:

ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using Motif Scan, we were able to identify the motifs (profile and pattern matches) present in
chitinase (accession no. AAA32461).

PROSCAN
AIM: To perform profile and pattern search using PROSCAN tool.

DESCRIPTION:

This tool, developed and run by PBIL at the University of Lyon, France, scans a sequence
against PROSITE and also allows mismatches. It can give information about phosphorylation
and amidation sites, or any other specific characteristic site in the given sequence.
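Allowing mismatches means a site is still reported when a few positions violate the pattern. The sketch below assumes the pattern has already been expanded into one set of allowed residues per position (so it has a fixed length); PROSCAN's own scoring and its handling of variable-length patterns are not reproduced. The example pattern [ST]-x-[RK] is used here as a protein kinase C-type phosphorylation motif.

```python
def scan_with_mismatches(seq: str, allowed: list[set[str]], max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Return (start, mismatches) for every window with at most max_mismatches violations."""
    k = len(allowed)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # count positions whose residue is not in the allowed set for that position
        mismatches = sum(1 for residue, ok in zip(window, allowed) if residue not in ok)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits

ANY = set("ACDEFGHIKLMNPQRSTVWY")
pattern = [set("ST"), ANY, set("RK")]        # fixed-length form of [ST]-x-[RK]
print(scan_with_mismatches("MSAKTAE", pattern, max_mismatches=0))  # [(1, 0)]
print(scan_with_mismatches("MSAKTAE", pattern, max_mismatches=1))  # [(1, 0), (4, 1)]
```

Raising `max_mismatches` increases sensitivity at the cost of more false positives, which is the trade-off a mismatch-tolerant scan makes.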

SOURCE:

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_prosite.html

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the PROSCAN submission form.
3. The e-mail ID is entered.
4. PROSCAN is run, and the results are viewed and saved.

INPUT:

ACCESSION NO –1H4P B

CHAIN B, CRYSTAL PROTEIN

OUTPUT:

INTERPRETATION:

Using PROSCAN, the functional sites of the protein sequence were found. The results were
viewed and saved.

VISUALIZATION OF PROTEIN STRUCTURE BY USING RASMOL
AIM: To visualize the structure of a protein using the visualization tool RasMol.

DESCRIPTION:
RasMol 2 is a molecular graphics program intended for the visualization of proteins,
nucleic acids and small molecules. The program is aimed at display, teaching and the generation
of publication-quality images. RasMol runs on Microsoft Windows, Apple Macintosh, UNIX
and VMS systems; the UNIX and VMS versions require an 8-, 24- or 32-bit colour X Windows
display (X11R4 or later). The program reads in a molecule coordinate file and interactively
displays the molecule on the screen in a variety of colour schemes and molecular
representations. Currently available representations include depth-cued wireframes, ‘Dreiding’
sticks, spacefilling (CPK) spheres, ball and stick, solid and strand biomolecular ribbons, atom
labels and dot surfaces.

SOURCES:

1. http://wbiomed.curtin.edu.au/teach/biochem/help/download.html
2. http://mc2.cchem.berkeley.edu/rasmol/v2.6/
3. Protein structure (.pdb): http://www.pdb.org/pdb/home/home.do

METHOD:

1. The NCBI website is opened.
2. The given accession number is entered and searched. The nucleotide sequence is
obtained from the CoreNucleotide database.
3. The PDB ID for the given sequence is collected from the CDS section of the
record (for example, 2MM1).
4. The PDB website is opened.
5. The PDB ID is entered and searched.
6. The .pdb.gz file is downloaded from the options on the left of the page.
7. The .pdb file is extracted from the .pdb.gz file.
8. The .pdb file is opened in RasMol.
9. The structure is viewed with the different display options available in RasMol, such as
Wireframe, Backbone, Sticks, Spacefill, Ball & Stick, Ribbons, Strands and Cartoons.
10. In the RasMol command line, commands such as “select helix” followed by
“colour yellow” are used to highlight the helices in the molecule.
11. Several other commands can also be used, such as “set picking distance”, “set picking
angle” and “set picking torsion”.
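Steps 6 to 8, and the atom and HOH counts asked for in the interpretation, can also be checked with a short script. The sketch below reads a PDB-format file (plain or .gz), counts ATOM and HETATM records and counts distinct water (HOH) residues, using the fixed PDB column layout (record name in columns 1-6, residue name in 18-20, chain in 22, residue number and insertion code in 23-27). It stops at the first ENDMDL so that NMR ensembles are counted for one model only; that choice, and the file name in the usage note, are assumptions.

```python
import gzip
from typing import Iterable

def summarize_pdb_lines(lines: Iterable[str]) -> dict[str, int]:
    """Count ATOM/HETATM records and distinct water (HOH) residues in PDB-format lines."""
    atoms = hetatms = 0
    waters = set()
    for line in lines:
        record = line[0:6].strip()
        if record == "ENDMDL":          # NMR entries: stop after the first model
            break
        if record == "ATOM":
            atoms += 1
        elif record == "HETATM":
            hetatms += 1
            if line[17:20].strip() == "HOH":
                # chain ID (col 22) + residue number and insertion code (cols 23-27)
                waters.add((line[21:22], line[22:27]))
    return {"ATOM": atoms, "HETATM": hetatms, "HOH": len(waters)}

def summarize_pdb(path: str) -> dict[str, int]:
    """Open a .pdb or .pdb.gz file (decompressing on the fly) and summarise it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize_pdb_lines(handle)
```

For a downloaded file, `summarize_pdb("2MM1.pdb.gz")` (hypothetical local path) would return the counts without first extracting the archive by hand.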

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

1. This protein has a total of …atoms.
2. This protein has ..helix structure with …atoms.
3. This protein has no sheets or loops…
4. This protein has ..HOH molecules.

Picture with ligands

SECONDARY STRUCTURE PREDICTION

AIM: Secondary structure prediction of the given protein sequences using Expasy tools.

DESCRIPTION:

ExPASy (Expert Protein Analysis System) is the proteomics server of the Swiss
Institute of Bioinformatics (SIB). It provides tools for analysing protein sequences and structures
and two-dimensional gel electrophoresis data. The server works in collaboration with the EBI, and
ExPASy also provides access to the protein sequence knowledgebases UniProt and Swiss-Prot.

For the prediction of protein secondary structure, ExPASy lists tools such as:

1. GOR
2. HNN
3. SOPMA
4. JPred

GOR
AIM: To predict secondary structure of a given protein using GOR tool from Expasy.

DESCRIPTION:

GOR predicts the secondary structure of each amino acid by looking at a window
of 8 amino acids before and 8 after the position of interest. The program is named after
Garnier, Osguthorpe and Robson; the version used here is the fourth (GOR IV).
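The ±8-residue window can be illustrated with a heavily simplified, GOR-style calculation: for each position, sum per-residue information values for each state (H, E, C) over the 17-residue window and choose the highest-scoring state. The information table is a made-up placeholder, not GOR parameters; unlike real GOR, the toy values do not depend on the offset within the window, and GOR IV additionally uses residue-pair information.

```python
# Hypothetical information values I(state; residue); NOT real GOR parameters.
# Coil (C) is listed first so that ties, including all-zero windows, resolve to coil.
TOY_INFO = {
    "C": {"G": 0.5, "P": 0.5, "N": 0.3, "S": 0.3},
    "H": {"A": 0.5, "E": 0.4, "L": 0.4, "M": 0.3},
    "E": {"V": 0.5, "I": 0.5, "Y": 0.3, "F": 0.3},
}

def gor_like_predict(seq: str, half_window: int = 8) -> str:
    """Assign H, E or C to each residue from summed toy scores in a +/-half_window window."""
    seq = seq.upper()
    prediction = []
    for i in range(len(seq)):
        window = seq[max(0, i - half_window): i + half_window + 1]   # 17 residues away from the ends
        totals = {state: sum(table.get(aa, 0.0) for aa in window)
                  for state, table in TOY_INFO.items()}
        prediction.append(max(totals, key=totals.get))  # first maximal state wins ties
    return "".join(prediction)

print(gor_like_predict("AAAAAAAAAA"))  # HHHHHHHHHH
```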

SOURCE:

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the GOR4 submission form.
3. The e-mail ID is entered.
4. GOR4 is run, and the results are viewed and saved.
INPUT:

ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using GOR, we were able to predict the secondary structure of chitinase (accession no.
1H4P B). The tool showed the numbers of alpha helices, beta bridges and other secondary-structure
elements of the protein.

HNN

AIM: To predict secondary structure of a given protein using HNN tool from Expasy.

DESCRIPTION:

Hierarchical neural networks can be used to predict protein secondary structure. The protein
sequence is translated into patterns by shifting a window of n adjacent residues (typically
n = 13-21) through the protein.
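The windowing step can be sketched as follows: each residue is represented by a one-hot encoding of the n residues centred on it, with padding at the sequence ends. n = 13 (from the range above), the padding symbol and the extra ‘padding or unknown’ slot are assumptions; the neural networks that consume these vectors are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def window_encodings(seq: str, n: int = 13) -> list[list[int]]:
    """Return one flat one-hot vector of length n * 21 per residue.

    Slot 21 of each position block marks padding or an unknown residue.
    """
    if n % 2 == 0:
        raise ValueError("window size n should be odd so it can be centred")
    half = n // 2
    padded = "-" * half + seq.upper() + "-" * half
    vectors = []
    for i in range(len(seq)):
        vector = []
        for residue in padded[i:i + n]:           # the n residues centred on position i
            slot = AMINO_ACIDS.index(residue) if residue in AMINO_ACIDS else len(AMINO_ACIDS)
            one_hot = [0] * (len(AMINO_ACIDS) + 1)
            one_hot[slot] = 1
            vector.extend(one_hot)
        vectors.append(vector)
    return vectors

vectors = window_encodings("ACD", n=3)
print(len(vectors), len(vectors[0]))  # 3 63
```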

SOURCE:

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_nn.html

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the HNN submission form.
3. The e-mail ID is entered.
4. HNN is run, and the results are viewed and saved.

INPUT:

ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using HNN, we were able to predict the secondary structure of chitinase (accession no.
1H4P B).

The tool predicted the numbers of alpha helices, beta bridges and other secondary-structure
elements of the protein.

SOPMA
AIM: To predict secondary structure of a given protein using SOPMA tool from Expasy.

DESCRIPTION:

SOPMA (Self-Optimized Prediction Method with Alignment) is a secondary structure
prediction program that uses multiple alignments. SOPMA correctly predicts 69.57% of amino
acids in a three-state description of secondary structure (alpha helix, beta sheet and coil).
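An accuracy figure such as the one quoted for SOPMA is a three-state per-residue score (often called Q3): the percentage of residues whose predicted state (helix, strand or coil) matches the observed one. A minimal sketch, with made-up example strings:

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of positions where predicted and observed states (H/E/C) agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed must be non-empty and equal in length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

print(q3("HHHHCCEEEC", "HHHCCCEEEE"))  # 80.0
```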

SOURCE:

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted into the SOPMA submission form.
3. The e-mail ID is entered.
4. SOPMA is run, and the results are viewed and saved.

INPUT:

ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using SOPMA, we were able to predict the secondary structure of chain B (accession no.
1H4P B).

The tool predicted the numbers of alpha helices, beta bridges and other secondary-structure
elements of the protein.

JPred

AIM: To predict secondary structure of a given protein using JPred tool from Expasy.

DESCRIPTION:

JPred is a consensus method for predicting protein secondary structure, developed at the
University of Dundee.

SOURCE:

http://www.compbio.dundee.ac.uk/~www-jpred/

METHOD:

1. A protein query sequence is retrieved from NCBI in FASTA format.
2. The retrieved protein sequence is pasted on the JPred submission form.
3. The e-mail id was entered.
4. The tool JPred was run and the results viewed and saved.

INPUT:

ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

OUTPUT:

INTERPRETATION:

Using JPred, we were able to predict the secondary structure of chitinase (accession no.
AAA32461).

TO CONSTRUCT THE PHYLOGENETIC RELATIONSHIP BETWEEN DIFFERENT ORGANISMS

AIM: To draw the phylogenetic tree of the given sequences using the Phylodraw software.

DESCRIPTION:

The sequences whose phylogenetic relationship is to be determined are retrieved from NCBI
by keyword search or by accession number. The online tool Phylodraw is used to draw the
phylogenetic tree. Its input is a Dialign alignment, obtained by performing a multiple
sequence alignment with the Dialign tool. Phylodraw and Dialign are therefore the two tools
used to draw this phylogenetic tree.
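To show what happens between the alignment and the drawn tree, the sketch below computes pairwise p-distances from aligned sequences (skipping gapped positions) and joins clusters with UPGMA, returning the tree in Newick format. UPGMA is used here only because it is short; it is not necessarily the method Phylodraw applies, branch lengths are omitted, and the sequences in the usage line are made-up examples.

```python
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, ignoring positions where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 0.0

def upgma(names: list[str], seqs: list[str]) -> str:
    """Build a UPGMA tree from aligned sequences and return its Newick topology."""
    clusters = {name: 1 for name in names}              # cluster label -> number of leaves
    dist = {frozenset((a, b)): p_distance(seqs[i], seqs[j])
            for (i, a), (j, b) in combinations(enumerate(names), 2)}
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)                   # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        del dist[pair]
        for other in clusters:
            # UPGMA: leaf-count-weighted average of the distances to the two merged clusters
            d = (dist.pop(frozenset((a, other))) * size_a +
                 dist.pop(frozenset((b, other))) * size_b) / (size_a + size_b)
            dist[frozenset((merged, other))] = d
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"

aligned = ["AAAAAAAAAA", "AAAAAAAAAT", "TTTTTAAAAA", "TTTTTAAAAT"]   # hypothetical alignment
print(upgma(["A", "B", "C", "D"], aligned))  # ((A,B),(C,D));
```

The same topology can then be drawn in any of the styles Phylodraw offers (radial tree, cladograms, phylogram); the drawing style changes the picture, not the inferred relationships.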

SOURCES:

Dialign: http://bibiserve.techfak.uni-bielefeld.de/dialign/submission.html
Phylodraw: http://pearl.cs.pusan.ac.kr/phylodraw/
NCBI: www.ncbi.nlm.nih.gov

METHOD :

1. The sequences with the following accession numbers are retrieved from the NCBI
biological database.
2. The sequences are used as input to the Dialign tool for multiple sequence alignment.
3. The output of Dialign is used as input to the Phylodraw tool.
4. Phylodraw is the tool used to draw phylogenetic trees. It offers the following tree types:
a. Unrooted tree
b. Rooted tree
c. Radial tree
d. Slanted cladogram
e. Rectangular cladogram
f. Phylogram
5. The results are displayed in radial tree, slanted cladogram, rectangular cladogram and
phylogram formats.

INPUT:
ACCESSION NO –NM_006272.2
Homo sapiens S100 calcium binding protein B

PHYLODRAW INPUT:

OUTPUT:

The Phylodraw tool is used to draw the phylogenetic tree of genetically related species. It
can display the trees in various formats.

The tree formats that are displayed are:

a. Radial tree
b. Slanted cladogram
c. Rectangular cladogram
d. Phylogram

INTERPRETATION:

The phylogenetic tree for the sequences has been drawn using the Phylodraw tool with
the result of Dialign as the input.
