
BIOL450 Introduction to Bioinformatics

Computer Lab #4
NCBI Advanced BLAST searching

Name:_____________________

Part 1. For these exercises, you will be a botanist working on specific gene sequences in
plant genomes. Work through the following problem sets using the tools and information
you've learned in this class.

Problem 1
A. Molecular biologists in your advisor's lab have isolated the following sequence. Use
NCBI BLAST tools to analyze it.
>unknown 1
CACACGAGCCAAGGAGCTGGATATTAACTGCGTTGCGGGACGTTTAAGAAGAGATCGGGGACTGCGGGCT
GTGTTGCCAAAATTGAGGCTGTCCCGCGAAAAGTGCTACATTTGGGCGGGTTGAAATTAGAAATGCTGCT
GTGTCACAGAAACTTGAGCTGAAGCTTTCTTCTAATAGACCATCCTGCAACCAATTAATTTCCTGGGCTA
AAGTGCTTGTAAGATACATCTACAAATGTCTCCTACGAGGCTCAAGTGTGACCTTTTCTTCTCCTTTCCA
TAAGCATACATAAGATGAGCTCCACTGCAGACAGAGATGACGGTAGTATATTTGATGGACTGGTAGAAGA
AGATGACAAGGATAAAGCAAAGAGGGTGTCTCGGAACAAATCGGAGAAACGGAGAAGAGATCAGTTCAAT
ATTCTCATTAAGGAGCTTGGGTCTATGTTGCCAGGCAATGCCAGAAGAATAGACAAATCCACAGTGCTGC
AGAAAAGCATTGACTTCCTACAGAAGCATAAAGAAATCAGTGCACAGTCGGATGCTAGCGAAATACGACA
AGACTGGAAACCTACTTTCCTTAGTAATGAAGAATTTACTCAACTCATGCTGGAGGCGCTGGATGGATTT
TTCCTGGCAGTTATGACAGATGGAAACATAATATATGTATCTGAAAGCGTGACCTCCTTACTTGAACACT
TACCATCTGATCTTGTTGATCAGAGTATATTTAATTTTGTCCCCGAGGGGGAACATTCAGAAGTGTATAA
AATTCTGTCTACGCGGATGTTGGAAAGTGGTTCTTTGTCATCGGAGTATCTTAAAACAAAAAATGAACTA
GAATTCTGTTGCCACATGCTAAGAGGCACAGCCGATCCTAAAGAGCCATCAACGTACGAATTTGTAAAGT
TTATTGGAAACTTCAAGTCTTTAAATAATGTGCCCAACTCTACACATAATGGATTTGATGGTGCATTACA
GAGGTCTCTGCGGCCACCCTATGAAGAGAGAGTGTGCTTTGTAGCCACTGTAAGGCTAGCTACTCCACAG
TTCATTAAGGAAATGTGCACTGTAGAGGAATCCAATGAAGAGTTCACATCAAGGCACAGTCTGGAATGGA
AGTTTCTTTTCTTGGACCACAGGGCCCCTCCAATCATTGGATATTTGCCTTTTGAGGTGCTAGGAACTTC
TGGCTATGATTATTATCACGTGGATGACCTGGAAAACCTTGCAAAATGCCATGAACACTTAATGCAGTAT
GGAAAAGGCAAGTCATGCTATTACAGGTTCCTGACAAAAGGACAGCAATGGATATGGCTGCAGACGCGCT
ACTATATCACCTACCATCAGTGGAATTCCAGGCCAGAGTTTATAGTTTGTACGCATACTGTAGTAAGCTA
TGCAGAGGTTGGAGCAGAAAGAAGACGTGAGCGGGGCAATGAAGATTCCCCTCCTGCCATAACTGCAGAA
AAAAATCAGGACTCTGTCTCAGACAATCACATGAACACAGTCAGTCTGAAGGAAGCTTTGGAAAGATTTG
ATGACAGCCGAACGCCTTCACCGTCATCCAAAAGCTCAATTAAATCATCCTCTCACACAGCAGTTTCCGA
CCCATCATCAACGCCAACAAAGATCCCAACAGACACAAGCACACCACCTAGACAAGCTTTAACTGGCCTT
GACAAGAGGAGGTCATCAATCAGCAGCCAGTCTATGAGCTCTCAGTCAGTCAGTCAGCCTCTGTCGCAGT
CAGTGATGAAGCAAACAGCATCTATTCAGCTCCAGCAAGGAATGACACAGCCCATGTTTCAGTTCACGGC
GCAGTTTGGAGCTATGAAGCACCTGAAGGACCAGCTGGAGCAGAGAACTCGGATCATTGAAGAAAACATC
CAGCGGCAGCAGGAGGAGCTGCGTAAGATTCAGGACCAGTTGCAAATGGTCCATGGGCAGGGAATCCAGA
TGTATTTACAGCAACCAGCACCAGGGCTAAATTTTGGGCCAGTACAGTACTCTTCTGGAAATAGCCCAAG
CATTCAGCAGTTACCGCAGTTCACCATGCAAGGCCAAGTGGTCCAGACTAACCAACTTCAAGGTGTCATG
AACACTGGGCATGTGGGAGCTCAGCACATAATGCAGCAGCAACAGTTACACAATACAAGCCAGCAAGGCC
AGCAGAACATTCATGGAGGACACAACCAGCAGACTTCACTGTCTAGCCAGACATCGGGAACTCTCACTAG
TCCTTTGTACAACACAATGGTGATCTCCCAGCCACCTTCAGGTAGTATGGTACAGATGCCTTCCAACATA
CAGCAGAGCAACCAGGGAGCCTCTGTTACTACATTTCCTCAGGACAGGCAGATTAGGTTTTCTCAAGCTC
AGCAAATTGTCACTAAGCTGGTGACCACTCCTATGGCTTGTGGAACAGTTATGGTCCCAAGCACAATGTT
TATGGGCCCAGTGGTTACTGCCTACCCTACATTTACTACACAGCAGCAGCAACCTCAAACCTTATCATTC
ACCCAGCATCAACAAAATCAGCAAGACCAACAGACAGTGCCTACAATACAGCAGCCTGCTCACGCACAAC
TAGCCCAGCAGCCACAACAGTTTTTGCAGACCTCGAGGTTGGTTCATGGAAACCAGTCCACACAGCTGAT
TCTTTCCCCTTTCCCTGTGCAACAGAACACTTTTGCCCCATCCCACCAGCAGCAGTTATCCCATCACAGG
ACTGACACTATGAGTGATCCTTCCAAGGTGCAGCAACAGTAGTACCACTATGAGTCCCTTTAAACTGCTG
TCTGAAAGAGATGACAGAATTTGGCGGAATTATGGCGTTTATAATCACTATGGATACAGCGCAATGAATG
TTTGGTACGAGGCAGCAGAGCTCGCTGAGATCAGGCACCAACTGAAACTCGGCATGTGCGATGCTCTGCT
GCACTACAGACTAGTCAGGCGTAGAACATAATAGACGTGCAATGCTCTAGACAACTCTGCTACTGGAGAT
GCCTTCAACAACCACTAGGACAGGCCCCCGCTATATATTATATTCTTCTATTTCTGCCCTTATGCTGAGG
GAATATATTGCTCTCTCTCGGCACGAGC

1. What organism is the gene likely from?

2. Which BLAST tool did you use?

3. Is there a Reference Sequence in the list? If so, what is it?

4. Is there a Reference Sequence for a human gene in the list? If so, what is it?

5. Your PhD advisor asks you to find out information on the protein the gene encodes.
What BLAST tool could you use to do this?

6. What is the Reference Sequence for the protein with the highest score?

Problem 2
You have isolated the following clone from a cDNA library.
>unknown 2
ATGAGCTCTTATTTTCTGAAATCGACTACATGCAGAAAAGAGTAAGAAATAAATCTTAACTTAATTTCTA
TNCTTAATATCTTCTTATAATAATTTNGCTCATTTGCATTGCACCGATTAAGCAGGAAGTTGATTTGCAT
AACGATAACCAGATTCTTCGTGCAAAGGTAAAAGTACACATTATATATCAATTTCTATAATTTTCCCTAT
TGTTTCAATAAATATTATATGAAAACGATTAAATGGTGAAAGTACTTGTTAGAATAAATAGTTTATGAAC

7. Which organism does it come from?

8. Is there a Reference Sequence in the list? If so, what is it?

9. What is the Reference Sequence for the protein with the highest score?
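Before pasting a query into a BLAST form, it can help to check what was actually copied: the sequence length, whether it reads as nucleotide or protein, and how many ambiguous bases (N) it contains. The following is a minimal Python sketch of such a check, not part of the lab itself. The function names (parse_fasta, looks_like_nucleotide, count_ambiguous) and the 0.9 threshold are assumptions made for illustration.

```python
def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines picked up when copying
        if line.startswith(">"):
            header = line[1:].strip()
        else:
            chunks.append(line.upper())
    return header, "".join(chunks)


def looks_like_nucleotide(seq, threshold=0.9):
    """Heuristic (assumed threshold): mostly A/C/G/T/U/N means nucleotide."""
    if not seq:
        return False
    nucleotide_chars = sum(seq.count(c) for c in "ACGTUN")
    return nucleotide_chars / len(seq) >= threshold


def count_ambiguous(seq):
    """Count N characters, which mark uncalled bases in a DNA read."""
    return seq.count("N")


if __name__ == "__main__":
    # A shortened stand-in for a query; paste the full record here instead.
    record = ">unknown 2\nATGAGCTCTTATTTTCTG\n\nTNCTTAATATCTTCTTAT\n"
    header, seq = parse_fasta(record)
    print(header, len(seq), looks_like_nucleotide(seq), count_ambiguous(seq))
```

The nucleotide test is only a heuristic: a short protein rich in A, C, G, and T residues could fool it, so treat the result as a sanity check rather than a classification.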

Problem 3
You are interested in the repair of broken DNA ends. A protein called Ku80 is known to
bind broken dsDNA ends. Ku80 has been studied extensively in fruit flies, but never
before in plants. Your advisor asks you to identify a gene from Arabidopsis thaliana that
might be similar to the Drosophila Ku80 gene.
>gi|7546991|gb|AAF63744.1|AF237722_1 Ku80 [Drosophila melanogaster]
MASNKECLIIVLDVRTCAAEEVKLKSAKCVAEILKDKIVCDRKDYVSFVLVGCDTDEIKTEDASHPNVVP
FGEPRLCSWQLLLEFFQFVNKTACEDGEWLNGLQAALKLQSVAATLRVARRRILLLFDFNDFPQDYEKFN
EITDELLGENIDLIVGTHNIAYIDNAITSQPQAIFNFSRKCGPDELNNQKYALSLVPRCNATLCSFKEAL
HTVFKVTNRRPWVWNAKLNIGSKISISLQGIIAMKNQTPVKLVKVWAEKDEIVIRETRHYIKGTEITPLP
ENLITGYMLGGTPVPYDEAVLEPKEPHPPGLHFFGFIKRNAVPDEYFCGESLYLLVHQKHNQSAAVKLDA
LVRALVSSDRAILCWKIYSTKFNRPQMVVLLPRLTDDTHPATLYMLEVSYTSQHHFWDFPALRTTKTECS
EEQLNAIDQLIDSTDLECTLRDTQQPRPWAQNDLLPFDALPSIFEQNVMDILERKVIYDNDKEDKMLKDK
NFADVFWRVPDPLEEKSKRAAAIVKKLFPLRYSRAWQEKLLAKEQAENGVAVKSEPAEKEIPMPSDGVGL
IDPVSDFRRVLASVHTISNATERDARFQTLAADTRVVIITLLQRRKQNIGQLGELITLYRQSCIDFNTFL
EYDKFAEELKKIALAKNRSEFWQDVMVDKQLGPLVLGEPTLDDELALKAYYTIENWAESGANDMEDVEM

10. How many Arabidopsis sequences produce significant alignments?

11. How many Arabidopsis orthologs have >50% coverage of the query sequence?

12. What is the accession number for the protein that is most likely the best ortholog?

13. Which BLAST program can you use to look at the DNA sequences that could
possibly code for this amino acid sequence?

14. What is the accession number for the nucleotide sequence from Arabidopsis thaliana
that is most closely associated with this protein?

15. Based on the BLAST scores, do you think this gene is the likely source for the
protein you've been researching?

16. Why or why not?
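Question 11 depends on query coverage. The sketch below shows one way to compute it by hand, assuming coverage means the percentage of query positions that fall inside at least one aligned segment for a given hit, with overlapping segments counted once. The function name query_coverage and the example coordinates are hypothetical; they are not taken from any real BLAST result.

```python
def query_coverage(query_length, segments):
    """Percent of query positions covered by at least one aligned segment.

    segments: list of (start, end) query coordinates, 1-based and inclusive.
    Overlapping segments are merged so no position is counted twice.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    covered = 0
    last_end = 0  # highest query position counted so far
    for start, end in sorted(segments):
        start = max(start, last_end + 1)  # skip positions already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100.0 * covered / query_length


if __name__ == "__main__":
    # Hypothetical hit: two overlapping segments on a 200-residue query.
    print(query_coverage(200, [(1, 100), (50, 150)]))
```

With these made-up numbers, positions 1 to 150 are covered once each, so the function reports 75 percent. Compare a value computed this way against the coverage column in your own results before relying on it.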
