Final Examination
VERSION #1
INSTRUCTIONS
EXAM: CLOSED BOOK X OPEN BOOK
SINGLE-SIDED PRINTED ON BOTH SIDES OF THE PAGE X
MULTIPLE CHOICE X
Note: The Examination Security Monitor Program detects pairs of students with unusually similar
answer patterns on multiple-choice exams. Data generated by this program can be used as admissible
evidence, either to initiate or corroborate an investigation or a charge of cheating under Section 16 of
the Code of Student Conduct and Disciplinary Procedures.
ANSWER IN BOOKLET EXTRA BOOKLETS PERMITTED: YES NO
ANSWER ON EXAM X
SHOULD THE EXAM BE: RETURNED X KEPT BY STUDENT
CRIB SHEETS:
Specifications: Single double-side page, 8.5 inches x
11 inches
DICTIONARIES: TRANSLATION ONLY REGULAR X NONE
CALCULATORS: NOT PERMITTED X PERMITTED (Non-Programmable)
ANY SPECIAL INSTRUCTIONS:
• For each multiple choice question, fill ALL bubbles corresponding to correct
answer(s). There is always at least one correct answer. If you think no
answer is correct, indicate the answer you feel is the closest to being correct.
Course: COMP-204 Intro. to Comp. Program. for Life Sciences Page number: 1 / 28
Multiple choice questions (27 points). ANSWER ON SCANTRON SHEET.
1. (3 points) Which of the following is a valid key for a Dictionary object? Circle ALL valid answers.
A. String
B. List
C. Tuple
D. Dictionary
E. NumPy ndarray
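One way to reason about the options above is to try each kind of object as a key: a dictionary key must be hashable. The sketch below is only an illustration; the helper name is_valid_key is hypothetical, and only built-in types are tested. A NumPy ndarray is unhashable in the same way as a list, so it would also be rejected.

```python
def is_valid_key(obj):
    """Return True if obj can be used as a dictionary key (i.e., it is hashable)."""
    try:
        {obj: "value"}  # building a dict hashes the key
        return True
    except TypeError:   # raised for unhashable types such as list and dict
        return False

results = {
    "string": is_valid_key("gene"),
    "list": is_valid_key([1, 2]),
    "tuple": is_valid_key((1, 2)),
    "dictionary": is_valid_key({"a": 1}),
}
print(results)
```

Note that a tuple is only hashable when all of its elements are hashable; a tuple that contains a list would be rejected as well.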
2. (3 points) Consider the following class definition.
class Bus():
    def __init__(self, busID, stationID):
        self.bus_id = busID
        self.station_id = stationID
        passengers = []
        self.nb_passengers = len(passengers)
What are the attributes of the class Bus? Circle ALL valid answers.
A. bus_id
B. station_id
C. passengers
D. nb_passengers
E. self
3. (3 points) Which of the following are correct boolean conditions to test whether a string s starts with "cat" and ends with "dog"?
Circle ALL valid answers. You may assume the module re has already been imported.
A. s[0:3]=="cat" and s[len(s)-3:len(s)]=="dog"
B. s[0:3]=="cat" and s[-3:]=="dog"
C. re.search("^cat.*dog$", s)
D. re.search("cat.*dog", s)
E. s == "^cat.*dog$"
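The five options can be compared by evaluating them on one string that satisfies the requirement and one that does not. This is a minimal sketch; the helper evaluate_options and the two test strings are illustrative assumptions, not part of the exam.

```python
import re

def evaluate_options(s):
    """Evaluate each candidate condition from the question on the string s."""
    return {
        "A": s[0:3] == "cat" and s[len(s)-3:len(s)] == "dog",
        "B": s[0:3] == "cat" and s[-3:] == "dog",
        "C": bool(re.search("^cat.*dog$", s)),  # anchored at both ends
        "D": bool(re.search("cat.*dog", s)),    # not anchored: matches anywhere
        "E": s == "^cat.*dog$",                 # compares against the literal pattern text
    }

good = evaluate_options("cat chases dog")     # starts with cat, ends with dog
bad = evaluate_options("a cat chases dogs")   # does neither
print(good)
print(bad)
```

On the second string, option D still evaluates to true because the unanchored pattern matches in the middle of the string, which is why it is not an equivalent test.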
4. (3 points) What will be printed when this program is executed? Circle ONE of the 5 choices.
A. 1
B. 2
C. 3
D. 4
E. None of the above
5. (3 points) What will be printed when this program is executed? Circle ONE of the 5 choices.
A. 1
B. 2
C. [2]
D. 4
E. [4]
6. (3 points) Given match score 3, mismatch score -2, and gap score -1, what is the alignment score between
sequences ATC and AAC based on the Needleman-Wunsch global sequence alignment we learned in Assignment
2? Choose ONE correct answer.
A. 1
B. 2
C. 3
D. 4
E. 5
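The score can be checked with a small dynamic-programming table. The sketch below assumes the standard Needleman-Wunsch recurrence with a linear gap penalty; the Assignment 2 implementation is not shown in this document, so the function name and structure here are assumptions.

```python
def needleman_wunsch_score(x, y, match=3, mismatch=-2, gap=-1):
    """Return the optimal global alignment score of x and y (linear gap penalty)."""
    n, m = len(x), len(y)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # First column and first row: align a prefix against gaps only.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in y
            left = score[i][j - 1] + gap    # gap in x
            score[i][j] = max(diagonal, up, left)
    return score[n][m]

result = needleman_wunsch_score("ATC", "AAC")
print(result)  # 4
```

The gapless alignment gives 3 - 2 + 3 = 4, and an alignment with two gaps (AT-C against A-AC) gives 3 - 1 - 1 + 3 = 4 as well, so 4 is the optimum under these scores.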
7. (3 points) What will be printed when this program is executed? Circle ONE of the 5 choices.
class Animal():
    def __init__(self):
        self.legs = 3

class Predator(Animal):
    def __init__(self):
        Animal.__init__(self)
        self.claws = 4
    def grow_claws(self):
        claws = self.claws + 1

pred = Predator()
pred.grow_claws()
print(pred.legs, pred.claws)
A. "legs" "claws"
B. 3 4
C. 3 5
D. AttributeError: 'Predator' object has no attribute 'legs'
E. AttributeError: 'Predator' object has no attribute 'claws'
8. (3 points) Which one of the following descriptions is the most closely related to supervised learning? Choose
ONE correct choice.
A. (1) an objective function; (2) an algorithm that optimizes that objective function.
B. (1) a fixed set of rules; (2) an efficient program that implements those rules.
C. (1) representing the problem as an input data matrix X and a label vector Y; (2) find the best
function to predict Y given X.
D. (1) representing the problem as an input data matrix X; (2) find the best way to cluster X into K clusters.
E. (1) a reward function; (2) a set of actions for an autonomous agent to react on its own in an unknown
environment in order to maximize the reward function.
9. (3 points) In machine learning, one special case of K-fold cross validation is called leave-one-out cross
validation (LOOCV). Suppose we have N data points in our dataset. In the LOOCV setting, the data set
is divided into N folds, i.e., the number of folds is the same as the number of data points (K=N). After
completing the LOOCV, how many times has each data point been used for training, and how many times
has each data point been used for validation? Choose ONE correct answer.
A. One time for training; one time for validation
B. N times for training example; N times for validation
C. N - 1 times for training; one time for validation
D. N - 1 times for training example; N - 1 times for validation
E. N times for training example; N - 1 times for validation
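The counting argument can be simulated directly: in each of the N rounds one point is held out for validation and the rest are used for training. This is a minimal sketch with a hypothetical N = 5; no model is trained, only the usage of each index is tallied.

```python
N = 5  # hypothetical dataset size
train_count = [0] * N
validation_count = [0] * N

for held_out in range(N):          # one fold per data point
    for index in range(N):
        if index == held_out:
            validation_count[index] += 1
        else:
            train_count[index] += 1

print(train_count)       # each point is trained on N - 1 times
print(validation_count)  # each point is validated on once
```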
Short answer questions (33 points + 5 bonus points)
10. (4 points) Describe how to use the linear search and binary search algorithms that we have learned in class to
find the number 7 in the list [0,2,3,7,10]. You can describe which number is picked up and compared
with the target number (i.e., 7) at each iteration of the linear and binary search algorithms.
a) [2 points] linear search:
Solution: 3 at index 2 is picked up and compared with 7, and then 7 is found at index 3
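Both searches can be traced by recording every value that is compared with the target. The sketch below assumes the textbook versions of the two algorithms (the in-class versions are not shown here), and the function names are hypothetical. The comparison sequence 3, then 7, in the solution line above matches the binary search trace; linear search compares 0, 2, 3, and 7 in turn.

```python
def linear_search_trace(values, target):
    """Scan left to right; return (index, list of values compared)."""
    compared = []
    for index, value in enumerate(values):
        compared.append(value)
        if value == target:
            return index, compared
    return -1, compared

def binary_search_trace(values, target):
    """Halve the sorted search range each step; return (index, values compared)."""
    compared = []
    low, high = 0, len(values) - 1
    while low <= high:
        mid = (low + high) // 2
        compared.append(values[mid])
        if values[mid] == target:
            return mid, compared
        if values[mid] < target:
            low = mid + 1    # target can only be in the right half
        else:
            high = mid - 1   # target can only be in the left half
    return -1, compared

numbers = [0, 2, 3, 7, 10]
linear_result = linear_search_trace(numbers, 7)
binary_result = binary_search_trace(numbers, 7)
print(linear_result)  # (3, [0, 2, 3, 7])
print(binary_result)  # (3, [3, 7])
```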
11. (4 points) Describe how to use selection and insertion sort algorithms that we have learned in class to sort
the list [3,10,2,7,8]. You can write down the intermediately sorted list at each iteration for each sorting
algorithm.
a) [2 points] selection sort:
Solution: starting from [3,10,2,7,8], then [3,10,2,7,8] (nothing changes because 10>3), [2,3,10,7,8],
[2,3,10,7,8], [2,3,7,10,8], [2,3,7,8,10]
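The intermediate lists can be generated by recording the list after each pass of the outer loop. This sketch assumes the textbook in-place versions of selection sort (swap the minimum of the unsorted part into place) and insertion sort (insert the next element into the sorted prefix); the in-class versions may differ in detail, and the function names are hypothetical.

```python
def selection_sort_trace(values):
    """Return the list state after each pass of selection sort."""
    a = list(values)
    states = []
    for i in range(len(a) - 1):
        smallest = min(range(i, len(a)), key=lambda k: a[k])  # index of the minimum in a[i:]
        a[i], a[smallest] = a[smallest], a[i]
        states.append(list(a))
    return states

def insertion_sort_trace(values):
    """Return the list state after each pass of insertion sort."""
    a = list(values)
    states = []
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:   # shift larger elements one slot to the right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
        states.append(list(a))
    return states

selection_states = selection_sort_trace([3, 10, 2, 7, 8])
insertion_states = insertion_sort_trace([3, 10, 2, 7, 8])
print(selection_states)
print(insertion_states)
```

Under these assumptions the selection sort states are [2,10,3,7,8], [2,3,10,7,8], [2,3,7,10,8], [2,3,7,8,10], and the insertion sort states are [3,10,2,7,8], [2,3,10,7,8], [2,3,7,10,8], [2,3,7,8,10].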
12. (4 points) The following function is supposed to flip the image upside down. However, it has one or more
errors. Describe how to fix these errors by directly correcting the code.
import skimage.io as io
image = io.imread("myimage.jpg")
def upsidedown_fixme(image):
    n_row, n_col = image.shape
    for i in range(0,n_row):
        for j in range(0,n_col):
            t = image[i,j]
            image[i,j] = image[n_row-i-1,j]
            image[n_row-i-1, j] = t
    return image
Solution:
import skimage.io as io
image = io.imread("myimage.jpg")
def upsidedown_fixme(image):
    n_row, n_col = image.shape
    for i in range(0,n_row//2): # correction: swap only the top half (integer division)
        for j in range(0,n_col):
            t = image[i,j].copy() # correction: copy the value before overwriting it
            image[i,j] = image[n_row-i-1,j]
            image[n_row-i-1, j] = t
    return image
13. (4 points) What will be printed when this program is executed? If executing the program causes an error/ex-
ception, show everything that gets printed up to that point, and the error/exception message that would be
generated.
class Animal:

    def __init__(self, animalID):
        self.id = animalID
        self.age = 0
        self.age_max = 2

    def grow(self):
        if self.age < self.age_max:
            self.age += 1

    def __str__(self):
        return "animal" + str(self.id) + ": age: " + str(self.age)

class Terrain:

    def __init__(self, nb_animals=0):
        self.animals = {}
        for i in range(nb_animals):
            a = Animal(i)
            self.add_animal(a)

    def add_animal(self, animal):
        self.animals[animal.id] = animal

    def __str__(self):
        s = ""
        for animalID,animalObj in self.animals.items():
            s += animalObj.__str__() + "\n"
        return s

terrain = Terrain(2)

for animalID in terrain.animals.keys():
    if animalID > 0:
        terrain.animals[animalID].grow()
    else:
        for i in range(10):
            terrain.animals[animalID].grow()

print(terrain)
Solution:
animal0: age: 2
animal1: age: 1
14. (4 points) What will be printed when the following program is executed? Recall that pop() returns the
last element in the list and removes it from the list, the operator // is integer division (e.g., 1//2 is equal
to 0, 3//2 is equal to 1), and the operator % returns the remainder of the integer division (e.g., 1 % 2 is
equal to 1, 3 % 2 is equal to 1, 4 % 2 is equal to 0).
def binaryConvertor(decNumber):
    remstack = list()
    while decNumber > 0:
        rem = decNumber % 2
        remstack.append(rem)
        decNumber = decNumber // 2
    binString = ""
    while len(remstack) > 0:
        binString = binString + str(remstack.pop())
    return binString

print(binaryConvertor(4))
Solution: 100
15. (4 points) Write a list comprehension that does exactly the same as the following nested for loops but in
only one line of code:
mylist = []
for i in range(3):
    for j in range(5):
        mylist.append(j + 5*i)
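One possible answer keeps the two for clauses in the same order as the nested loops (outer loop first). The sketch below builds the list both ways so the two results can be compared; the name one_liner is hypothetical.

```python
# Nested-loop version from the question.
mylist = []
for i in range(3):
    for j in range(5):
        mylist.append(j + 5 * i)

# Equivalent list comprehension: the for clauses appear in the same order as the loops.
one_liner = [j + 5 * i for i in range(3) for j in range(5)]

print(one_liner == mylist)
```

Both versions produce the integers 0 through 14 in increasing order.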
16. (4 points) The function factorial(n) shown below is supposed to return the factorial of n. In
mathematics, the factorial of a positive integer n, denoted by n!, is the product of all positive integers less than
or equal to n. For example,
5! = 5 × 4 × 3 × 2 × 1 = 120
However, there are bugs in this function.
def factorial(n):
    y = 1
    for i in range(0,n):
        y *= i
    return y
Solution: 0
Solution:
def factorial(n):
    y = 1
    for i in range(1,n+1):
        y *= i
    return y

print(factorial(5))
17. (5 points) Write a function called is_protein_altering_mutation that determines whether a
mutation in a given DNA sequence is a protein-altering mutation. A mutation substitutes one letter for another
letter in a protein-coding DNA sequence. A protein-altering mutation affects the translation of the DNA
sequence and leads to at least one difference in the protein sequence. To determine whether a mutation is a
protein-altering mutation, you will need to use the Seq class and its method translate from BioPython
and nothing else from BioPython. Recall that the method translate will translate the DNA sequence into the
amino acid sequence, e.g.,
>>> print(Seq("ATGGCCTCA").translate())
MAS
To simplify the task, you may consider only the forward translation based on the current DNA string. There is no
need to consider the translation of the reverse complement of the sequence.
Complete the code on the following page.
from Bio.Seq import Seq

# examples
dna = "ATGGCCTCAATTGTAATGGGCCGCTGAAAGGGTGCCCGATAGCAT"
print(is_protein_altering_mutation(dna, 0, 'C')) # True
print(is_protein_altering_mutation(dna, 5, 'T')) # False
Solution:
dna = "ATGGCCTCAATTGTAATGGGCCGCTGAAAGGGTGCCCGATAGCAT"
print(is_protein_altering_mutation(dna, 0, 'C')) # True
print(is_protein_altering_mutation(dna, 5, 'T')) # False
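The body of the solution is not preserved above, so the following is only a sketch of one possible approach: build the mutated sequence, translate both sequences, and compare the proteins. To keep the sketch runnable without BioPython, a hypothetical helper translate based on the standard genetic code stands in for Seq(dna).translate(); in the exam setting the two calls to translate would be replaced by str(Seq(...).translate()). The parameter names position and new_base are also assumptions.

```python
# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

def translate(dna):
    """Stand-in for Seq(dna).translate(): forward frame, '*' for stop codons."""
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[k:k + 3]] for k in range(0, usable, 3))

def is_protein_altering_mutation(dna, position, new_base):
    """Return True if substituting new_base at position changes the protein."""
    mutated = dna[:position] + new_base + dna[position + 1:]
    return translate(dna) != translate(mutated)

dna = "ATGGCCTCAATTGTAATGGGCCGCTGAAAGGGTGCCCGATAGCAT"
print(is_protein_altering_mutation(dna, 0, 'C'))  # True: ATG (M) becomes CTG (L)
print(is_protein_altering_mutation(dna, 5, 'T'))  # False: GCC and GCT both encode A
```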
18. (5 points (bonus)) The following decision tree is used to classify a test subject into prostate cancer patient
or healthy control using the subject’s age, prostate specific antigen (PSA) level, and sex.
Based on the true labels of the test data (prostate cancer patient=1, healthy control=0) and the probability
predictions from the decision tree obtained by running each test data point through the tree, construct the Receiver
Operating Characteristic (ROC) curve.
For the ease of manual calculation, use only the thresholds 1.0, 0.5, and 0 to compute the TPRs and FPRs.
First complete the table and then draw the ROC curve directly on the plot provided to you below.
[Figure: decision tree with Yes/No splits on "Age <= 60" and "PSA <= 1.8"; leaf probabilities shown include "Normal: 1.0, Cancer: 0.0" and "Normal: 0.9, Cancer: 0.1". Next to it, an empty ROC plot with axes False positive rate (FPR) and True positive rate (TPR).]
Solution:
[Figure: ROC curve running from (0, 0) to (1, 1); x-axis: False positive rate (FPR), y-axis: True positive rate (TPR).]
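The table and test data for this question are not preserved here, so the following sketch uses entirely hypothetical labels and predicted probabilities to show how the TPR and FPR at the thresholds 1.0, 0.5, and 0 could be computed. It assumes a subject is predicted positive when the predicted cancer probability is greater than or equal to the threshold.

```python
def roc_points(y_true, y_prob, thresholds):
    """Return a list of (FPR, TPR) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for threshold in thresholds:
        y_pred = [1 if p >= threshold else 0 for p in y_prob]
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        points.append((false_pos / negatives, true_pos / positives))
    return points

# Hypothetical test data (not from the exam): 4 patients and 4 controls.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_prob = [1.0, 0.9, 0.9, 0.5, 0.1, 0.0, 0.1, 0.0]

points = roc_points(y_true, y_prob, thresholds=[1.0, 0.5, 0.0])
print(points)  # [(0.0, 0.25), (0.25, 0.75), (1.0, 1.0)]
```

Plotting these points together with the origin (0, 0) and connecting them in order of increasing FPR gives the ROC curve for this hypothetical data.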
Long answer questions (40 points)
19. (10 points) Write a function called average_fruits_per_basket based on the docstring below. Also
see the test examples below. To simplify the task, you do not need to sort the fruits by their decreasing
average number per basket. Note that this question is similar to question 3 in Assignment 3, where you were
asked to calculate the average ICD-9 code (i.e., ‘fruit’) per patient (i.e., ‘basket’).
def average_fruits_per_basket(fruit_baskets):
    """
    Inputs:
        a dictionary with key as basket ID and values as a list of fruit names
    Returns:
        a dictionary with key as fruit name and value as
        the average number of the fruit per basket
    """
    fruit_average={}
    # YOUR CODE HERE

    return fruit_average
# test
x = average_fruits_per_basket({'basket1':['apple','apple','orange','banana'],
                               'basket2':['pineapple','apple'],
                               'basket3':['strawberry','pear','orange'],
                               'basket4':['apple','apple','strawberry']})
print(x)
# {'apple': 1.25, 'orange': 0.5, 'banana': 0.25, 'pineapple': 0.25, 'strawberry': 0.5, 'pear': 0.25}
Solution:
def average_fruits_per_basket(fruit_baskets):

    fruit_average={}

    # The counting loop is missing from the source; it is reconstructed here
    # from the surviving line "fruit_average[f] += 1".
    for basket in fruit_baskets:
        for f in fruit_baskets[basket]:
            if f not in fruit_average:
                fruit_average[f] = 0
            fruit_average[f] += 1

    # divide each total count by the number of baskets
    for f in fruit_average:
        fruit_average[f] = fruit_average[f]/len(fruit_baskets)

    return fruit_average

x = average_fruits_per_basket({'basket1':['apple','apple','orange','banana'],
                               'basket2':['pineapple','apple'],
                               'basket3':['strawberry','pear','orange'],
                               'basket4':['apple','apple','strawberry']})

print(x)
# {'apple': 1.25, 'orange': 0.5, 'banana': 0.25, 'pineapple': 0.25, 'strawberry': 0.5, 'pear': 0.25}
20. (10 points) Recall in Assignment 4, when a predator inspects the terrain within its visual range, it gets back
a list of preys. The predator will pick the target prey that is the closest to its current position. Write a
method under the Predator class called get_target_prey(self, preys).
The method takes a list of Prey objects and calculates for each prey its distance from the predator. You
may assume that the distance function is provided to you as distance(animal1, animal2), which
returns a numerical value indicating the distance between animal1 and animal2.
class Predator(Animal):
    def get_target_prey(self, preys):
        """
        Inputs: a list containing one or more Prey objects
        Returns: the closest Prey object
        """
        # YOUR CODE HERE
Solution:
class Predator(Animal):
    def get_target_prey(self, preys):
        """
        Inputs: list containing one or more Prey objects
        Returns: the closest Prey object
        """
        # start from infinity so that any real distance is smaller
        min_dist = float("inf")
        closest_prey = None
        for prey in preys:
            dist = distance(self, prey)
            if dist < min_dist:
                min_dist = dist
                closest_prey = prey
        return closest_prey
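An alternative to the explicit loop is Python's built-in min with a key function. The sketch below is self-contained, so it includes hypothetical stand-ins that are not part of the exam: minimal Animal and Prey classes with x and y coordinates, and a Euclidean distance function in place of the one provided in Assignment 4.

```python
import math

class Animal:
    """Hypothetical minimal base class with a position."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

class Prey(Animal):
    pass

def distance(animal1, animal2):
    """Hypothetical stand-in for the provided distance function (Euclidean)."""
    return math.hypot(animal1.x - animal2.x, animal1.y - animal2.y)

class Predator(Animal):
    def get_target_prey(self, preys):
        """Return the Prey object closest to this predator."""
        return min(preys, key=lambda prey: distance(self, prey))

predator = Predator(0, 0)
preys = [Prey(5, 5), Prey(1, 2), Prey(3, 0)]
target = predator.get_target_prey(preys)
print(target.x, target.y)  # 1 2
```

Like the loop version, min returns the first prey with the smallest distance when there is a tie; it raises ValueError on an empty list, which the question rules out by guaranteeing one or more Prey objects.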
21. (10 points) An RNA molecule in a cell tends to form a stem-loop structure. As shown in the picture below,
the stem is made of two reverse-complementary sequences, separated by a loop of unpaired nucleotides.
Write a function to find the longest stem for a given RNA sequence by following the guide below.
a) [3 points] First, write a function called is_reverse_complement(x, y) that takes two valid RNA
sequences x and y as two input strings and returns True if the two sequences are reverse complementary to
each other, otherwise False. Recall that in RNA (e.g., Q5 microRNA in Assignment 1), the ribonucleotides
A and U are complementary, and C and G are complementary. Then RNA sequences x and y are reverse-
complementary if the complement of the reverse of y is equal to x. See the test examples below.
# test is_reverse_complement
print(is_reverse_complement("GUGCCACG", "CGUGGCAC")) # True
print(is_reverse_complement("GUGCCACC", "CGUGGCAC")) # False
Solution:
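The solution body for part a) is not preserved here. The following is a minimal sketch of one way to implement the definition given in the question (x equals the complement of the reverse of y); the dictionary name COMPLEMENT is an assumption.

```python
# Watson-Crick pairing for RNA: A with U, C with G.
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}

def is_reverse_complement(x, y):
    """Return True if x equals the complement of the reverse of y."""
    reverse_complement_of_y = "".join(COMPLEMENT[base] for base in reversed(y))
    return x == reverse_complement_of_y

# test is_reverse_complement
print(is_reverse_complement("GUGCCACG", "CGUGGCAC"))  # True
print(is_reverse_complement("GUGCCACC", "CGUGGCAC"))  # False
```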
b) [7 points] Now, write a function called find_longest_stem(rna, loopsize) that takes an RNA
sequence as the first argument and loopsize as the second argument. The function returns a tuple with
the first element as the sequence of the left arm of the stem and the second element as the sequence for
the right arm of the stem. See the illustration below. See examples in the test case below.
To simplify the task, assume we know the loop size (which is the second function argument). Here loopsize
indicates the number of letters between the left arm of the stem and the right arm of the stem. Presumably, you
will need to use the function is_reverse_complement(x,y) to complete this function.
stem left-arm loop stem right-arm
ACGUGCCACGAUUCAACGUGGCACAG
Idea: You can think about sliding the loop window with the fixed window size (i.e., 6 in the above example)
along the RNA sequence.
At each window position,
• Try different stem start positions from the first letter up to the loop start position in the RNA string;
• At the fixed stem start position, extract the stem left-arm sequence and the stem right-arm sequence
to see whether they are reverse complementary to each other;
• If they are, check whether the stem is the longest stem found so far. We can consider the length of
the stem as the length of either the left or right arm of the stem;
• record the longest stem sequences accordingly.
See the expected output from a test example on the following page.
def find_longest_stem(rna, loopsize):
    # YOUR CODE HERE

# test find_longest_stem
find_longest_stem("ACGUGCCACGAUUCAACGUGGCACAG", loopsize=6)
# ('GUGCCACG', 'CGUGGCAC')
Solution:
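The solution body for part b) is not preserved here. The sketch below follows the sliding-window idea from the question under stated assumptions: the loop window is moved along the sequence, every stem start position to its left is tried, and the longest reverse-complementary pair of arms is kept (the first one found in case of a tie). The helper is repeated so that the block runs on its own, and returning a pair of empty strings when no stem exists is an assumption.

```python
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}

def is_reverse_complement(x, y):
    """Return True if x equals the complement of the reverse of y."""
    return x == "".join(COMPLEMENT[base] for base in reversed(y))

def find_longest_stem(rna, loopsize):
    """Return (left arm, right arm) of the longest stem around a loop of loopsize letters."""
    best = ("", "")
    # loop_start is the index of the first letter of the loop window
    for loop_start in range(1, len(rna) - loopsize):
        right_start = loop_start + loopsize
        for stem_start in range(loop_start):
            left = rna[stem_start:loop_start]
            right = rna[right_start:right_start + len(left)]
            if len(right) < len(left):
                continue  # not enough letters remain for a right arm of equal length
            if len(left) > len(best[0]) and is_reverse_complement(left, right):
                best = (left, right)
    return best

# test find_longest_stem
stem = find_longest_stem("ACGUGCCACGAUUCAACGUGGCACAG", loopsize=6)
print(stem)  # ('GUGCCACG', 'CGUGGCAC')
```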
22. (10 points) We obtained a greyscale 10 × 10 pixel image showing the cell colonies in a square petri dish as
illustrated below. Recall that in greyscale, pixel value 0 is a perfect black, pixel value 1 is a perfect white,
and any value between 0 and 1 is a tone of grey. In our image, the background color is equal to 1
(i.e., white), and the pixel value in each cell colony is between 0 and 1 (but not equal to 1; i.e., grey).
Each cell colony forms a perfect rectangle. Two adjacent cell colonies are well separated by the white
background pixels (value = 1.0).
The goal of this question is to write a function that counts the total number of cell colonies in an input
image. For instance, there are 12 colonies in the image shown below.
Running the completed function on this image will give the following outputs:
Suggestion (not required to strictly follow): Although we learned edge detection and seed filling algorithms
in class, there is actually a much simpler way to answer this question. The key is to take advantage of the
facts that (1) all of the cell colonies are perfectly rectangular and (2) each cell colony is well separated from
other cell colonies by the white background pixels.
Therefore, one way to solve this problem is the following.
1. Go from left to right and top to bottom, pixel by pixel, through the image in search of a colony;
2. If the pixel is a non-background pixel (i.e., a new colony is found):
• increase the colony number
• for this colony,
– determine its height;
– determine its width;
• set the pixel values within the colony to the background value (i.e., 1) to avoid counting the rectangle
again.
3. Continue the search until we reach the bottom right corner of the image.
Once again, this is only a guide to save you time to think algorithmically – you do not need to strictly follow
it. There are several other ways to solve this problem for full points.
a) [(4 points)] If you do follow the above suggestion, you can first implement the determine_colony_height
function to get 4 points; otherwise, skip this sub-question and proceed to the following page.
    return colony_height
Solution:
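# Function headers are missing from the source; they are reconstructed from
# the calls determine_colony_height(img, i, j) and determine_colony_width(img, i, j).
def determine_colony_height(img, i, j):
    background=1
    n_rows = img.shape[0]
    colony_height = 1

    while True:
        # try to extend the colony downward by one more row
        colony_height_new = colony_height + 1
        # bound check uses i so that a colony touching the bottom edge is measured fully
        if i + colony_height_new <= n_rows:
            # all pixels in the extended column segment must be non-background
            if (img[i:i+colony_height_new,j] != background).sum() == colony_height_new:
                colony_height = colony_height_new
            else:
                break
        else:
            break
    return colony_height

def determine_colony_width(img, i, j):
    background=1
    n_cols = img.shape[1]
    colony_width = 1

    while True:
        # try to extend the colony to the right by one more column
        colony_width_new = colony_width + 1
        if j + colony_width_new <= n_cols:
            if (img[i, j:j+colony_width_new] != background).sum() == colony_width_new:
                colony_width = colony_width_new
            else:
                break
        else:
            break
    return colony_width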
b) [6 points or 10 points] Complete the function count_cell_colonies. You may complete the function
without following the suggestion to get the full 10 points.
If you have been following the suggestion and completed the above determine_colony_height
function, you may assume that the function determine_colony_width is completed for you (because it is
very similar to determine_colony_height). Just as a reference, the function header is shown below.
Therefore, even if you don’t know how to complete determine_colony_height, you may still get 6
points if you implement count_cell_colonies that correctly uses determine_colony_height
and determine_colony_width to count the number of cell colonies.
def count_cell_colonies(img):
    """
    Inputs:
        img: image of the cell colonies
    Returns:
        an integer number of the distinct cell colonies
    """
    nb_colonies = 0
    background=1
    n_rows,n_cols = img.shape
    # YOUR CODE HERE
Solution:
def count_cell_colonies(img):

    background=1
    nb_colonies = 0
    n_rows,n_cols = img.shape

    for i in range(n_rows):
        for j in range(n_cols):

            if img[i,j] != background:

                nb_colonies += 1

                colony_height = determine_colony_height(img, i, j)
                colony_width = determine_colony_width(img, i, j)

                # erase the colony so that it is not counted again
                img[i:i+colony_height,j:j+colony_width] = background

    # plt.imshow(img1, cmap="gray")
    # plt.show()
    return nb_colonies

import skimage.io as io
#import matplotlib.pyplot as plt
from skimage.color import rgb2gray

#plt.imshow(rgb2gray(io.imread("cells.eps")), cmap="gray")
#plt.show()

print(count_cell_colonies(rgb2gray(io.imread("cells.eps"))))
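Since the file cells.eps is not available here, the same scan-measure-erase idea can be tried on a small hand-made grid. The sketch below is an illustration under stated assumptions: it uses nested Python lists in place of a NumPy image, the function name count_colonies and the demo grid are hypothetical, and the input is copied so that the caller's data is not modified.

```python
def count_colonies(grid, background=1.0):
    """Count rectangular non-background colonies in a 2D list of pixel values."""
    grid = [row[:] for row in grid]  # work on a copy
    n_rows, n_cols = len(grid), len(grid[0])
    nb_colonies = 0
    for i in range(n_rows):
        for j in range(n_cols):
            if grid[i][j] != background:
                nb_colonies += 1
                # (i, j) is the top-left corner: measure height downward, width rightward
                height = 1
                while i + height < n_rows and grid[i + height][j] != background:
                    height += 1
                width = 1
                while j + width < n_cols and grid[i][j + width] != background:
                    width += 1
                # erase the rectangle so that it is not counted again
                for r in range(i, i + height):
                    for c in range(j, j + width):
                        grid[r][c] = background
    return nb_colonies

# Hypothetical 5 x 6 image with four well-separated rectangular colonies.
demo = [
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 0.2, 0.2, 1.0, 0.5, 1.0],
    [1.0, 0.2, 0.2, 1.0, 0.5, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [0.7, 0.7, 0.7, 1.0, 1.0, 0.3],
]
n_found = count_colonies(demo)
print(n_found)  # 4
```

Because the scan runs top to bottom and left to right, the first non-background pixel met in any colony is its top-left corner, which is what makes measuring the height and width from that pixel sufficient.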