
Week 1

Chapter 1
 Variables, values, input()
Chapter 2
 Data types, type(), casting types, mathematical operators
Chapter 3
 Quotes, \n, repr(), \t, slicing, index numbers, len, immutability, comparison
operators, concatenation, f-strings, lower(), upper(), dir(), help()
Chapter 4
 Boolean expressions, and/or/not, all and any, if/elif/else statements, indentation,
nesting

Week 2
Chapter 5
 Containers, args and kwargs, mutability
Chapter 6
 Lists, .append() basic list operations, list methods
Chapter 7
 Sets, .add(), union, intersection, set operations,
Chapter 8
 Comparison of lists and sets
Chapter 9
 For loops, .reverse(), tuples, unpacking variables, continue, break, .split()
Chapter 10
 Dictionaries, .keys(), .values(), collections, nesting dictionaries,

Week 1
Chapter 1 Getting started with variables and values

Print()
 Outputs its argument to the screen
 Example 1: print("Hello, world") shows Hello, world on the screen.

Comment #
 Comments are not executed in Python. They are used to make the code easy to understand.

Variables
 A variable is a stored value that is given a name using the = symbol.
 Variable names can be re-used, but a variable always contains the value that you last
assigned to it.
 Example 2: name = "Patrick Bateman", x = 22

Variable names
 Variable names are only valid if they:
o Start with a letter or underscore (_)
o Only contain letters, numbers and underscores
 Use clear, meaningful, descriptive variable names so that your code will remain
understandable
 Use underscores and lowercase characters for separating words
 Do not use built-in names, such as print or sum

Input()
 This takes user input and returns it as a string
 Example 3: text = input("What is your name?")
print(text)
Result: whatever the user typed, for example Color

Chapter 2 Basic Data types

Basic data types


 String: for representing text
 Integer: for representing whole numbers
 Float: for representing numbers with decimals
 Tuple: for representing immutable combinations of values
 List: for representing ordered sequences of objects
 Set: for representing unordered sets of objects
 Dictionary: to represent mappings between objects
 Booleans: to represent the truth values True or False
 Functions: to manipulate objects, or to produce new objects given some input

Type()
 We can use the type() function to check object types.
 Example 4: type("hoi")
Result: str

+ sign
 The + sign (and other symbols) can act like different operators depending on the type which
they are used with.
 Example 5: var1 = 1 + 5
print(var1)
Result: 6
 Example 6: var2 = "hello " + "there"
print(var2)
Result: hello there

Casting types
 Converting types into other types
 Example 7: x = "5"
y = 2
print(int(x) + y)
Result: 7

Mathematical operators
x + y     sum of x and y
x - y     difference of x and y
x * y     product of x and y
x / y     quotient of x and y
x // y    floored quotient of x and y
x % y     remainder of x / y
-x        x negated
+x        x unchanged
x ** y    x to the power y
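
The operators in the table can be tried out in a short sketch. The values 7 and 2 are arbitrary choices for illustration, not taken from the course material.

```python
# Illustrative values; any two numbers would do.
x = 7
y = 2

total = x + y        # sum
difference = x - y   # difference
product = x * y      # product
quotient = x / y     # quotient, always a float
floored = x // y     # floored quotient
remainder = x % y    # remainder of x / y
power = x ** y       # x to the power y

print(total, difference, product, quotient, floored, remainder, power)
```

Note that / gives a float even when both operands are integers, while // rounds the quotient down.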

Updates of variables (+= -= *= /= operators)


 Example 8: number_of_books = 100
number_of_books += 5
print(number_of_books)
Result: 105

Chapter 3 Strings

Defining and representing strings


 A string is a type of variable for which the value is enclosed by single or double quotes.
 If the string itself contains a quote, as in 'Wendy's', you can put the escape character \ in
front of that quote, which tells Python not to treat this specific quote as the end of the
string (or enclose the string in double quotes instead).
 Example 9: restaurant = 'Wendy\'s'
 Example 10: restaurant = "Wendy's"

Multi-line strings (hidden character \n)
 A string that spans across multiple lines can be achieved in two ways:
1. With single or double quotes, where we use \n to indicate that the rest of the string
continues on the next line.
o Example 11: long_string1 = "A very long string\n\
can be split into multiple\n\
sentences by appending a newline symbol\n\
to the end of the line."
o Example 12: multiline_text_2 = "This is a multiline text, so it is enclosed by triple
quotes.\nPretty cool stuff!\nI always wanted to type more than one line, so today is
my lucky day!"
o Note: \n\ is used when the string is already typed across multiple lines (the trailing
backslash continues the line), and \n alone when the string is typed as one piece.
2. With three single or double quotes
o Example 13: long_string2 = """A very long string
can also be split into multiple
sentences by enclosing the string
with three double or single quotes."""

Internal representation using repr()


 Internally, the strings of example 11 and 13 are represented in the same way: each line break
is stored as \n. Two strings with the same text are therefore equal when compared with the
double equals sign ==, whichever of the two notations was used (outcome: True).
 By using the repr() function you can show that example 11 and 13 contain the same hidden
character, namely \n.
 Example 14: print(repr(long_string2))
Result: 'A very long string\ncan also be split into multiple\nsentences by enclosing the
string\nwith three double or single quotes.'

Hidden character \t
 \t represents tabs
 Example 15: colors = "yellow\tgreen\tblue\tred"
print(colors)
Result: yellow    green    blue    red (the words are separated by tabs)

String indices
 Each character in a string has a position, which can be referred to by the index number of
the position.

 You can access the characters of a string using […] with the index number in between the
brackets. Index numbers start at 0.
 Example 16: my_string = "Sandwiches are yummy"
print(my_string[1])
Result: a

Len()
 len() returns the length of a sequence
 Spaces count as characters too

String slices
 We can also extract a range from a string
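 It works as follows: my_string[start:stop:step]. The character at index start is included,
the character at index stop is not, and step is optional.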

 You can also have negative step size. my_string[::-1] is the idiomatic way to reverse a string.
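
A small sketch of slicing, reusing the string from Example 16. The variable names are chosen for illustration only.

```python
my_string = "Sandwiches are yummy"

first_word = my_string[0:10]        # index 0 up to, but not including, 10
last_word = my_string[-5:]          # the last five characters
every_second = my_string[0:10:2]    # step size 2 within the first word
reversed_string = my_string[::-1]   # negative step size reverses the string

print(first_word)       # Sandwiches
print(last_word)        # yummy
print(every_second)     # Snwce
print(reversed_string)  # ymmuy era sehciwdnaS
```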

Immutability
 Strings are immutable (they cannot be changed).
 It is possible to create a new string based on the old one.
 Example 17: fruit = 'guanabana'
island = fruit[:5]
print(island, 'Island')
Result: guana Island

Comparing strings
 Comparison operators are: ==, !=, <, <=, >, and >=
 Comparing strings can be useful for putting words in lexicographical order (alphabet)
(uppercase letters come before all lowercase letters).

In operator (to compare strings)


 To check whether a string is part of another string.
 It returns True if the string contains the substring, and False if it doesn’t.
 print(5 in 10) will not work (TypeError), because integers are not iterable.
 in and not in can also be used to check whether an object is a member of a list.

Difference between print("Hello", "World") and print("Hello" + "World")


 1. Values separated by commas are printed with a single space between them.
You can mix types when using the comma.
o ("Hello", "World") on its own is a tuple
 2. String concatenation using + can only be used with one type. You cannot mix types.
Also, no spaces are inserted.
o "Hello" + "World" is a single string
 The * sign can also be used to repeat strings.

f-strings (formatted strings)
 An alternative to the + sign for building strings: expressions inside {} are evaluated and
inserted into the string.
 Example 18: introduction = f"Hello. My name is {name}. I'm {age} years old."
 Example 19: text = f"Next year, I'm turning {age+1} years old."
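
The two examples above only run once name and age exist. A minimal runnable sketch, assuming the illustrative values below (the age is made up):

```python
name = "Patrick Bateman"
age = 27  # assumed value, for illustration only

introduction = f"Hello. My name is {name}. I'm {age} years old."
text = f"Next year, I'm turning {age + 1} years old."

# The same sentence with +: the integer must be cast to a string first.
concatenated = "Hello. My name is " + name + ". I'm " + str(age) + " years old."

print(introduction)
print(text)
print(introduction == concatenated)
```

Unlike concatenation with +, the f-string accepts the integer age without casting it with str().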

String methods – lower(), upper()


 lower() turns a string into all lowercase characters
 upper() makes strings uppercase
 Example 20: string_1 = "Hello, world!"
print(string_1.lower())
Result: hello, world!
 String methods only return the result. They do not change the string itself.

Learning about methods – dir(), help()


 The dir() function returns a list of the attribute and method names of an object, for
example dir(str).
 The help() function shows what a specific method does.
 Example 21: help(str.strip)

List of string methods


 .strip()
o Returns a copy of the string with leading and trailing whitespace removed.
 .upper()
 .lower()
 .count(sub)
o Returns how often the substring sub occurs in the string.
 .find(sub) or .find(sub, start, end)
o Returns the lowest index in string where substring is found, such that the substring
is contained within [start:end]. Optional arguments start and end are interpreted as
in slice notation.
o Return -1 on failure.
o Example 22: s = "Humpty Dumpty sat on the wall"
print(s.find("sat"))
Result: 14
o Example 23: print(s.find("t", 12))
Result: 16
o Example 24: print(s.find("1", 12))
Result: -1
 .replace(old, new)
 .split()
o Returns a list of the words in the string, using sep as the delimiter string (any
whitespace if sep is not given)
 .capitalize()
o Returns a capitalized version of the string (the first character becomes upper case
and the rest lower case).
 sep.join(strings)
o Concatenates any number of strings. The string whose method is called is inserted
between each given string.
o Example 25: words = ["Humpty", "Dumpty"]
print("-".join(words))
Result: Humpty-Dumpty
 .lstrip() and .rstrip() remove whitespace on the left and on the right
 .startswith() and .endswith() return True when the string starts or ends with the given
substring
Chapter 4 Boolean expressions and conditions

Boolean expression
 A Boolean expression results in True or False

And, or and not

 Example 26: letters = ["a", "b", "c", "d"]
numbers = [1, 2, 3, 4, 5]
print("f" in letters or 2 in numbers)
Result: True
 Example 27: a_string = "hello"
print(not a_string.startswith("o"))
Result: True
 Example 28: print(not (4 == 4 and "4" == 4))
Result: True
 Alternative ways of writing
o x not in y and not x in y are identical
o not x == y and x != y are identical

All() and any()
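
The notes leave this heading empty. As a sketch of the standard built-ins: all() returns True only if every element of an iterable is true, and any() returns True if at least one element is true. The list below is reused from Example 26; the variable names are illustrative.

```python
letters = ["a", "b", "c", "d"]

# all(): every Boolean expression in the list must be True.
both_present = all(["a" in letters, "b" in letters])

# any(): one True value is enough.
at_least_one = any(["f" in letters, "a" in letters])

# all() fails as soon as one expression is False.
f_and_a = all(["f" in letters, "a" in letters])

print(both_present, at_least_one, f_and_a)
```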

If statements
 Example 29:
    number = 5
    if number == 5:
        print("number equals 5")
    if number > 4:
        print("number is greater than 4")
Results: number equals 5 / number is greater than 4

Two-way decisions – if, else


 Example 30:
    number = 10
    if number <= 5:
        print(number)
    else:
        print("number is higher than 5")
Result: number is higher than 5

Multi-way decisions – if, elif, else
 For every if block, you can have one if statement, multiple elif statements and one else
statement.
 First the if statement will be evaluated. Only if that statement turns out to be False the
computer will proceed to evaluate the elif statements.
 Example 31:
    age = 21
    if age < 12:
        print("You're still a child!")
    elif age < 18:
        print("You are a teenager!")
    elif age < 30:
        print("You're pretty young!")
    else:
        print("Wow, you're old!")
Result: You're pretty young!
 NOTE! The order of the conditions matters. If you started with age < 30, that branch would
also be True for children and teenagers, and Python would never reach the later elif branches.

Remember if-if, if-elif


 If-if: your code will check all the if statements
 If-elif: as soon as one condition evaluates to True, the remaining conditions are not checked

Indentation
 Indentation (4 spaces) tells Python which lines belong to a block of code.
 When a Boolean expression is True, for example, Python executes the lines below it that
start four spaces or one tab to the right.
 Indentation must be consistent throughout your code.
 All statements with the same distance to the right belong to the same block of code

Nesting
 Blocks can contain blocks as well
 There may be a situation when you want to check for another condition after a condition
resolves to True. In such a situation, you can use the nested if construct.
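
A minimal sketch of a nested if construct. The number and the messages are invented for illustration.

```python
number = 8  # illustrative value

if number > 0:
    # This inner block is only reached when the outer condition is True.
    if number % 2 == 0:
        message = "positive and even"
    else:
        message = "positive and odd"
else:
    message = "not positive"

print(message)
```

The inner if/else is indented one level further, so it belongs to the block of the outer if.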

Week 2
Chapter 5 – Core concepts of containers

Containers
 Lists
 Tuples
 Sets
 Dictionaries

Positional arguments (args) and keyword arguments (kwargs)


 Positional arguments (args) are compulsory in order to call a method
 Keyword arguments (kwargs) are optional. They can be optional because they usually have a
default value. By using the keyword argument, you simply change the default value to
another value.
 Examples:
o .upper() = no args no kwargs
o .count() = one arg no kwargs
o .replace() = two args no kwargs
o .split() = no args no kwargs
o .split(sep="0") = no args one kwarg

Mutability

Chapter 6 – Lists

Lists
 Lists are surrounded by square brackets and the elements in the list are separated by
commas.
 A list element can be any Python object – even another list.
 A list can be empty.
 Lists are mutable.
 Two ways of creating an empty list:
o one_way = []
o another_way = list()

Adding items to a list


 Use .append()
 Example 32: a_list = [1, 3, 4]
a_list.append(5)
print(a_list)
Result: [1, 3, 4, 5]

Extracting/inspecting items in a list
 Indexing and slicing in lists work the same way as with strings. Every item in the list
has its own index number.
 Example 33: friend_list = ["John", "Bob", "Marry"]
print(friend_list[0])
Result: John

Count and index in lists


 Count returns how often the value occurs in the list.
 Index is similar to the count method, but it returns the first index of the value instead of the
count.
 Example 34: friend_list = ["John", "Bob", "John", "Marry", "Bob"]
first_index_with_bob = friend_list.index("Bob")
print(first_index_with_bob)
Result: 1

Basic list operation


 Concatenation
o The + sign concatenates two lists
o Example 35: one_list = ["Where", "is"]
another_list = ["the", "rest", "?"]
print(one_list + another_list)
Result: ['Where', 'is', 'the', 'rest', '?']
 Repetition
o The * sign makes it possible to repeat a list
 Membership
o You can use lists in membership boolean expressions
o For example “in”
 Comparison
o You can use lists in comparison boolean expression
o For example “==”

Built-in functions in lists


 len(a_list)
o Number of items in a list
 max(a_list)
o Highest value in a list
 min(a_list)
o Lowest value in a list
 sum(a_list)
o Sum of all values in a list

List methods
 a.append(b)
o Add item b to the end of a
 a.extend(c)
o Add the elements of list c at the end of a
 a.insert(i, b)
o Insert item b at position i
 a.pop(i)
o Remove from a the i’th element and return it. If i is not specified, remove the last
element
 a.index(x)
o Return the index of the first element of a with value x. error if it does not exist
 a.count(x)
o Return how often value x is found in a
 a.remove(x)
o Remove from a the first element with value x. Error if it does not exist
 a.sort()
o Sort the elements of list a
 a.reverse()
o Reverses list a (no return value)
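
The methods above can be followed step by step in a short sketch. The starting list is an arbitrary example; the comments show the list after each call.

```python
a = [3, 1, 2]

a.append(5)          # [3, 1, 2, 5]
a.extend([4, 1])     # [3, 1, 2, 5, 4, 1]
a.insert(0, 9)       # [9, 3, 1, 2, 5, 4, 1]
last = a.pop()       # removes and returns the last element: 1
a.remove(9)          # [3, 1, 2, 5, 4]
position = a.index(5)  # first index with value 5
ones = a.count(1)      # how often 1 occurs
a.sort()             # [1, 2, 3, 4, 5], sorted in place
a.reverse()          # [5, 4, 3, 2, 1], reversed in place

print(a, last, position, ones)
```

Most of these methods change the list itself and return None; pop(), index() and count() are the ones that return a useful value.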

Chapter 7 – Sets

Sets
 Sets are surrounded by curly brackets and the elements in the set are separated by commas.
 A set can be empty.
 Sets do not allow duplicates.
 Sets are unordered:
o You can check if two sets are the same even if you don’t know the order in which
items were put in.
o You cannot use an index to extract an element from a set.
 A set can only contain immutable objects (for now: strings and integers).
 A set cannot contain mutable objects, hence no lists or sets.
 Creating a set:
o a_set = set() (note that {} creates an empty dictionary, not an empty set)

Adding items to a set


 Use .add()
 Example 36: a_set = set()
a_set.add(1)
print(a_set)
Result: {1}

Extracting/inspecting items in a set
 When you use sets, you usually want to compare the elements of different sets.
 You can use union, intersection, difference and symmetric difference.
 Union:
o Return the union of sets as a new set (i.e. all elements that are in either set)
o Example 37: set1.union(set2)
Result: {1, 2, 3, 4, 5, 6, 7, 8}
 Intersection:
o Return the intersection of two sets as a new set (i.e. all elements that are in both
sets)
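
A sketch of the four comparisons. The notes do not show how set1 and set2 are defined, so the contents below are assumptions, chosen so that the union matches the result of Example 37.

```python
# Assumed contents; the original definitions are not given in the notes.
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

union = set1.union(set2)                       # in either set
intersection = set1.intersection(set2)         # in both sets
difference = set1.difference(set2)             # in set1 but not in set2
symmetric = set1.symmetric_difference(set2)    # in exactly one of the sets

print(union)
print(intersection)
print(difference)
print(symmetric)
```

Each method returns a new set and leaves set1 and set2 unchanged.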

Built-in functions in sets


 The same range of functions that operate on lists work with sets.

Set operations
 set_a.add(an_element)
o Add an_element to set_a
 set_a.update(set_b)
o Add the elements of set_b to set_a
 set_a.pop()
o Remove and return an arbitrary set element.
 set_a.remove(an_element)
o Remove an_element from set_a

Chapter 8 – Comparison of lists and sets

 You can create a set from a list. Attention: duplicates will be removed

Finding elements
 It’s quicker to check if an element is in a set than to check if it is in a list.

When to choose what?


 Lists if you need:
o Duplicates
o The order in which items are added
o Mutable objects

 In all other scenarios: sets

Chapter 9 – Looping over containers

Looping over a list


 Example 38:
    for number in [1, 2, 3]:
        print(number)
Result: 1
2
3
 The Python interpreter starts by checking whether there’s anything to iterate over. If the list
is empty, it just passes over the for-loop and does nothing.
 Then, the first value in the iterable (in this case a list) gets assigned to the variable number.
 Following this, we enter a for-loop context, indicated by the indentation.
 Then, Python carries out all the operations in the for-loop context. In this case, this is just
print(number). Because number refers to the first element of the list, it prints 1.
 Once all operations in the for-loop context have been carried out, the interpreter checks if
there are any more elements in the list. If so, the next value (in this case 2) gets assigned to
the variable number and all operations are carried out.

Reversing order of a list


 You can reverse the order of a list by using a_list.reverse() (changes the list itself) or
a_list[::-1] (returns a new, reversed list)

Looping over a set


 Looping over a set is the same as looping over a list.
 Remember: it is unordered

Looping over a string


 When you loop over a string, you get one character per iteration, so printing inside the
loop puts every letter on a separate line.
 Example 39:
    word = "hippo"
    for letter in word:
        print(letter)
Result: h
i
p
p
o

Tuples
 A tuple is defined in the same way as a list, except that the whole set of elements is
enclosed in parentheses.
 The elements of a tuple have a defined order.
 Tuples can contain immutable and mutable objects.
 Items cannot be added or removed.
 A tuple can be empty.

 Tuples have two methods: index and count
 You can unpack tuples.
 Two ways of creating a tuple:
o an_empty_tuple = ()
o another_empty_tuple = tuple()

Unpacking variables (tuples)
 Example 40: a_tuple = (1, "a", 2)
first_el, second_el, third_el = a_tuple
print(first_el)
Result: 1
 Unpacking can also be used with other containers

Using tuples in a for loop


 Example 41:
    language_data = [("the", "determiner"),
                     ("house", "noun")]
    for item in language_data:
        print(item[0], item[1])
Result: the determiner
house noun
 You can unpack the tuple within the for-loop to make it more readable:
 Example 42:
    for word, part_of_speech in language_data:
        print(word, part_of_speech)
Result: the determiner
house noun

Continue and break


 The break statement lets us escape a loop
 Example 43:
    word = "hippopotamus"
    for letter in word:
        print(letter)
        if letter == "o":
            break
Result: h
i
p
p
o
 The continue statement ends the current iteration and jumps to the top of the loop and
starts the next iteration
 Example 44:
    word = "hippop"
    for letter in word:
        if letter == "o":
            continue
        print(letter)
Result: h
i
p
p
p

Split to print each word in a string in a new line


 By using .split() you create a list of the string
 After creating a list, you can use a for-loop to print each word in a new line
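
The two steps above can be sketched as follows. The sentence is borrowed from the string-method examples; the variable names are illustrative.

```python
sentence = "Humpty Dumpty sat on the wall"

# .split() without arguments splits on whitespace and returns a list of words.
words = sentence.split()

# Print each word on its own line.
for word in words:
    print(word)
```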
Chapter 10 – Dictionaries

Dictionaries
 A dictionary is surrounded by curly brackets and the key/value pairs are separated by
commas.
 A dictionary consists of one or more key:value pairs, the key is the identifier or name that is
used to describe the value.
 The keys in a dictionary are unique.
o Key:value pairs get overwritten if you assign a different value to an existing key.
 The syntax for a key/value pair is: KEY : VALUE.
 The keys in a dictionary have to be immutable.
 The values in a dictionary can be any python object.
 A dictionary can be empty.
 Two ways of creating an empty dictionary:
o Use dict() or {}

Adding items to a dictionary


 Example 45: a_dict = dict()
a_dict["Frank"] = 8
print(a_dict)
Result: {'Frank': 8}

Accessing data in a dictionary


 Example 46: student_grades = {"Frank": 8, "Susan": 7, "Guido": 10}
print(student_grades["Frank"])
Result: 8
 If the key is not in the dictionary, Python raises a KeyError.
 In order to avoid getting an error, you can use an if-statement.
 Example 47:
    key = "Piet"
    if key in student_grades:
        print(student_grades[key])
    else:
        print(key, "not in dictionary")
Result: Piet not in dictionary
 The keys method returns the keys in a dictionary.
 Example 48: the_keys = student_grades.keys()
print(the_keys)
Result: dict_keys(['Frank', 'Susan', 'Guido'])
 The values method returns the values in a dictionary
 Example 49: the_values = student_grades.values()
print(the_values)
Result: dict_values([8, 7, 10])

Using built-in function to inspect the keys and values
 Len, max, min and sum can be used to inspect the keys and values.

The items method


 The items method returns the (key, value) pairs of a dictionary as tuples. It is for example
useful if you want to know which students got an 8 or higher.
 Example 50: print(student_grades.items())
Result: dict_items([('Frank', 8), ('Susan', 7), ('Guido', 10)])
 Example 51:
    for key, value in student_grades.items():
        print(key, value)
Result: Frank 8
Susan 7
Guido 10
 Example 52:
    for student, grade in student_grades.items():
        if grade > 7:
            print(student, grade)
Result: Frank 8
Guido 10

Counting with a dictionary


 Dictionaries are very useful to derive statistics. For example, we can easily determine the
frequency of each letter in a word.
 Example 53:
    letter2freq = dict()
    word = "hippo"
    for letter in word:
        if letter in letter2freq:
            letter2freq[letter] += 1
        else:
            letter2freq[letter] = 1
        print(letter, letter2freq)
    print()
    print(letter2freq)
Result: h {'h': 1}
i {'h': 1, 'i': 1}
p {'h': 1, 'i': 1, 'p': 1}
p {'h': 1, 'i': 1, 'p': 2}
o {'h': 1, 'i': 1, 'p': 2, 'o': 1}

{'h': 1, 'i': 1, 'p': 2, 'o': 1}
 You can do this as well with lists
 Python has a module, collections, which is very useful for counting.
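
The notes do not say which part of collections is meant; a likely candidate is the Counter class, which does the counting of Example 53 in one call. A minimal sketch:

```python
from collections import Counter

word = "hippo"

# Counter counts how often each element of an iterable occurs.
letter2freq = Counter(word)

print(letter2freq["p"])   # frequency of the letter p
print(letter2freq["z"])   # missing keys count as 0 instead of raising a KeyError
print(dict(letter2freq))  # the same mapping as a plain dictionary
```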

Nesting dictionaries
 Since dictionaries consist of key:value pairs, we can make another dictionary the
value of a dictionary.
 Example 54:
    a_nested_dictionary = {"a_key":
                               {"nested_key1": 1,
                                "nested_key2": 2,
                                "nested_key3": 3}
                           }
    print(a_nested_dictionary)
Result: {'a_key': {'nested_key1': 1, 'nested_key2': 2, 'nested_key3': 3}}
 In order to access the nested value, we must do a look up for each key on each nested level
 Example 55: the_nested_value = a_nested_dictionary["a_key"]["nested_key1"]
print(the_nested_value)
Result: 1

Week 4
Chapter 11: Functions and scope

Writing a function
 A function is an isolated chunk of code, that has a name, gets zero or more parameters, and
returns a value. In general, a function will do something for you, based on a number of input
parameters you pass it, and it will typically return a result.

Defining a function
 Write def
 The name you would like to call your function
 A set of parentheses containing the argument(s) of your function
 A colon
 A docstring describing what your function does
 The function definition
 Ending with a return statement
 Example 56:
    def happy_birthday_to_Emily():
        """Print a birthday song to Emily."""
        print("Happy birthday to you!")

Calling a function
 When calling a function, you should always use parentheses
 Example 57: happy_birthday_to_Emily()

Calling a function from within another function


 We can also define functions that call other functions, which is very helpful if we want to
split our task in smaller, more manageable subtasks.
 Example 58:
    def new_line():
        """Print a new line."""
        print()

    def two_new_lines():
        """Print two new lines."""
        new_line()
        new_line()

Parameters and arguments
 Parameter: variables used in function definitions are called parameters.
 Argument: the values passed in function calls are called arguments.
 Functions can have multiple parameters.
 Example 59:
    def multiply(x, y):
        """Multiply two numeric values."""
        result = x * y
        print(result)

Positional vs keyword parameters and arguments

 Example 60:
    def multiply(x, y, third_number=1):
        """Multiply two or three numbers and print the result."""
        result = x * y * third_number
        print(result)
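
A sketch of the different ways to call such a function. This variant of multiply returns its result instead of printing it, so that the outcomes can be stored and compared; that change is an adaptation for illustration, not part of Example 60.

```python
def multiply(x, y, third_number=1):
    """Multiply two or three numbers and return the result."""
    return x * y * third_number


two_positional = multiply(2, 3)                 # third_number keeps its default 1
three_positional = multiply(2, 3, 4)            # default overridden by position
with_keyword = multiply(2, 3, third_number=4)   # default overridden by keyword
named_order = multiply(y=3, x=2)                # keywords may come in any order

print(two_positional, three_positional, with_keyword, named_order)
```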

The return statement


 The return statement returns a value back to the caller and always ends the execution of the
function
 Example 61:
    def multiply(x, y):
        """Multiply two numbers and return the result."""
        result = x * y
        return result

    result = multiply(2, 5)
    print(result)
Result: 10
 If we assign the result of a function call to a variable, but the function has no return
statement, there is nothing to return. Instead, the function returns None.

Returning multiple values


 A function can also return multiple values as an output. We call such a collection of values a
tuple.
 Example 62:
    def calculate(x, y):
        """Calculate product and sum of two numbers."""
        product = x * y
        summed = x + y
        return product, summed

    var1, var2 = calculate(10, 5)
    print(var1, var2)
Result: 50 15
 Make sure you actually save your 2 values into 2 variables (so use var1 and var2)
 Saving the resulting values in different variables can be useful when you want to use them in
different places in your code.

Docstring
 Docstring is a string that occurs as the first statement in a function definition.
 Always use triple double quotes around docstrings.
 There is no blank line either before or after the docstring.
 The docstring is a phrase ending in a period. It prescribes the function or method’s effect as
a command (do this, return that), not as a description; e.g. don’t write: returns the
pathname.

Conditioning the function output


 It is possible to condition the return value of the function by using if-else statements.
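
A minimal sketch of a function whose return value depends on a condition. The function name age_group and its labels are hypothetical; the thresholds mirror Example 31.

```python
def age_group(age):
    """Return a label for the given age."""
    if age < 12:
        return "child"
    elif age < 18:
        return "teenager"
    elif age < 30:
        return "young adult"
    else:
        return "adult"


print(age_group(21))
print(age_group(8))
```

Because return ends the execution of the function, only the first matching branch produces the result.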

Variable scope
 Any variables you declare in a function, as well as the parameters that are passed to a
function will only exist within the scope of that function, i.e. inside the function itself.
 Example 63:
    def setx():
        """Set the value of a variable to 1."""
        x = 1
        return x

    setx()
    print(x)
Result: NameError (with or without the line return x, because x only exists inside the function)
 Example 64:
    x = 0
    def setx():
        """Set the value of a variable to 1."""
        x = 1

    setx()
    print(x)
Result: 0
 Example 65:
    x = 1
    def getx():
        """Print the value of a variable x."""
        print(x)

    getx()
Result: 1
 The function locals() returns a dictionary of all local variables.
 The function globals() returns a dictionary of all global variables.
 It is best to check for membership with the in operator, for example "x" in globals().
 Local context stays local to the function and is not shared even with other functions called
within a function.

Chapter 12: Importing external modules

Importing Python modules

 A Python module is a Python file which contains function definitions and statements.
 To use an external module in your code you need to explicitly import it.
 Example 66: import random
print(random.randint(0, 100))
Result: a random integer from 0 to 100, for example 48
 The dot indicates that the machine should look for the randint() function inside the random
module that is imported.
 You can import an entire module or import only a specific function in the module.
 Example 67: from random import randint
print(randint(0, 100))
Result: a random integer from 0 to 100, for example 83
 You can also (temporarily) change the names of the functions you import
 Example 68: from random import randint as random_number
print(random_number(0, 100))

Useful modules
 datetime
o Example 69: import datetime
print(datetime.datetime.now())
Result: 2019-09-20 15:17:28.590338
 requests (a third-party package, not part of the standard library)
o Example 70: import requests
a = requests.get("http://piasommerauersite.wordpress.com/")
print(a.content)

Chapter 13: Working with python files

Editors
 Option 1: Jupyter notebook > New > Text file
 Option 2: install editor

Starting the terminal


 To run a .py file we wrote in an editor, we need to start the terminal.
o Open Anaconda Prompt

Terminal
 The windows command line is not case sensitive.
 When working with a file or directory with a space, surround it in quotes.
 Type dir at the prompt to get a list of files in the current directory.
 Type dir /p to list the files one page at a time.
 Type dir a* if you want to list files that begin with the letter “A”.
 Type cd to change location in your terminal. The cd command can move into a directory.
 The cd command allows you to go back a directory by typing cd..
 The cd command allows you to go back to the root directory by typing cd\
 To make a directory (folder) in the current directory type mkdir followed by the folder name.
 To open a file type start notepad followed by the file name + extension (.txt, .docx).
 To make a new file type start notepad followed by a file name that doesn't exist yet +
extension. After you press enter a new file will be opened and the system will ask if you
want to save it.

 The move command lets you move a file into another directory. First go to the directory
the file is in and type move followed by the file name and the target directory.
 To rename a file use the rename command. Type rename followed by the old file name and the
new file name.
 To delete a file use the del command. Type del file name.
 To remove a directory use the rmdir command.
 To run a program (.exe) type the file name.exe and press enter.
 Use exit to close the command line window.

Running the program (hello_world.py) on Windows


 Use the terminal to navigate to the folder in which the notebook is placed by copying the
output of the following cell into your terminal:
    import os
    cwd = os.getcwd()
    cwd_escaped_spaces = cwd.replace(' ', '^ ')
    print('cd', cwd_escaped_spaces)
Output: cd C:\Users\Color\Documents\CIW^ jaar^ 2\Programmeren\Week-4\Week-4\Notebooks
 Then copy the output of the following cell into your terminal to run the program:
    import sys
    print(sys.executable + " hello_world.py")
Output: C:\Users\Color\Anaconda3\python.exe hello_world.py

Chapter 14: Reading and writing text files

File paths
 To open a file, we need to associate the file on the disk with a variable in Python.
 First we tell Python where the file is stored on your disk (the file path).
 Python will start looking in the 'working' or 'current' directory. If it's in the working
directory, you only have to tell Python the name of the file.
 If it’s not in the working directory, you have to tell Python the exact path to your file.
 / means the root of the current drive
 ./ means the current directory
 ../ means the parent of the current directory

 If you want to go from your current working directory (cwd) to the one directly above
(dir3), your path is ../
 If you want to go to dir1, your path is ../../
 If you want to go to dir5, your path is ../dir5/
 If you want to go to dir2, your path is ../../dir2/
 * Windows uses backslashes instead of forward slashes.

Opening a file
 We can use the file path to tell Python which file to open by using the built-in function
open().
 The open() function does not return the actual text that is saved in the text file, but returns
a file object from which we can read the content using the .read() method.
 We pass 3 arguments to the open() function:
o The path to the file that you wish to open

o The mode, a combination of characters explaining the purpose of the file opening
and type of content stored in the file
o Encoding (a keyword argument), specifies the encoding of the text file. This is
basically useful when reading non-English characters.

 Most important mode arguments the open() function can take:


o r = opens a file for reading only. The file pointer is placed at the beginning of the file.
o w = opens a file for writing only. Overwrites the file if the file exists. If the file does
not exist, creates a new file for writing.
o a = opens a file for appending. The file pointer is at the end of the file if the file
exists. If the file does not exist, it creates a new file for writing. Use it if you would
like to add something to the end of a file.
o Example 71: filepath = "../Data/Charlie/Charlie.txt"
infile = open(filepath, "r")
o Example 72: print(infile)
infile.close()
Result: <_io.TextIOWrapper name='../Data/Charlie/Charlie.txt' mode='r'
encoding='cp1252'>
o TextIOWrapper is Python's way of saying it has opened a connection to the file
Charlie.txt. To actually see its content, we need to tell Python to read the file.

Reading a file
 read()
o The read() method is used to access the entire text in a file, which we can assign to a
variable.
o Example 73: infile = open(“../Data/Charlie/Charlie.txt”, “r”)
content = infile.read()
print(content)
Result: The entire text of Charlie.txt
o The variable content holds the entire content of the file Charlie.txt as a single string
and we can access and manipulate it just like any other string.
 readlines()
o The readlines() method allows you to access the content of a file as a list of lines.
This means it splits the text in a file at the newline characters (\n) for you.
o You can use a for-loop to print each line in the file.
o Example 74:
    lines = infile.readlines()
    for line in lines:
        print("Line:", line)

 After a read operation has consumed a file, a second read operation on the same file object
returns nothing. If we want to read the content again, we have to open the file again.
 readline()
o The readline() method returns the next line of the file, that is, the text up to and including
the next newline character. If you call it again, it returns the line after that.
 For small files that you want to load entirely, you can use one of these three methods.
 For larger files, or when we are only interested in a small portion of a file, it is
recommended to iterate over the file with a for-loop.
 Example 75:
    infile = open(filename, "r")
    for line in infile:
        print(line)
    infile.close()
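
The difference between the three read operations can be checked on a small throwaway file. The sketch below is only an illustration: the file name sample.txt and its three lines are made up for the example, and the file is created in a temporary directory so nothing on your computer is overwritten.

```python
import os
import tempfile

# Create a small throwaway file (hypothetical name and content).
tmp_dir = tempfile.mkdtemp()
filepath = os.path.join(tmp_dir, "sample.txt")
outfile = open(filepath, "w", encoding="utf-8")
outfile.write("first line\nsecond line\nthird line\n")
outfile.close()

# read(): the whole file as one string.
infile = open(filepath, "r", encoding="utf-8")
whole_text = infile.read()
second_read = infile.read()   # the file pointer is at the end now
infile.close()

# readlines(): a list of lines, each still ending in "\n".
infile = open(filepath, "r", encoding="utf-8")
lines = infile.readlines()
infile.close()

# readline(): one line per call.
infile = open(filepath, "r", encoding="utf-8")
first = infile.readline()
second = infile.readline()
infile.close()

print(repr(whole_text))
print(repr(second_read))
print(lines)
print(repr(first), repr(second))
```

The second call to read() returns an empty string, which is why the file has to be opened again before each new read operation.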

Closing the file


 After reading the contents of a file, the TextIOWrapper no longer needs to stay open, since we
have stored the content in a variable.

close()
 Example 76:
    filepath = "../Data/Charlie/Charlie.txt"
    infile = open(filepath, "r")
    content = infile.read()
    infile.close()

Using a context manager


 Using a context manager is an easier way to make sure the file is closed as soon as you
don’t need it anymore.
 The main advantage of the with-statement is that it automatically closes the file once
you leave the block defined by the indentation level.
 Example 77:
    filepath = "../Data/Charlie/Charlie.txt"
    with open(filepath, "r") as infile:
        content = infile.read()
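
That the with-statement really closes the file can be checked with the closed attribute of the file object. A minimal sketch, assuming a made-up file sample.txt in a temporary directory:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
filepath = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name

with open(filepath, "w", encoding="utf-8") as outfile:
    outfile.write("some text\n")
    closed_inside = outfile.closed   # still inside the with-block

closed_after = outfile.closed        # the with-block has been left

print(closed_inside, closed_after)
```

Inside the block the file is open (closed is False); after the block it is closed (closed is True), without any call to close().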

Manipulating file content


 Once the file content is loaded into a Python variable, you can manipulate it like
any other variable: you can edit, add or remove lines, count word occurrences,
etc.
 Example 78:
    filepath = "../Data/Charlie/Charlie.txt"
    with open(filepath, "r") as infile:
        lines = infile.readlines()
    # We can preserve only the first 2 lines of the file in a new variable:
    first_two_lines = lines[:2]
    # We can count the lines that are longer than 15 characters:
    counter = 0
    for line in lines:
        if len(line) > 15:
            counter += 1

Writing files
 To write content to a file, we open a new file and write the text to it with the
write() method. Again, we can do this with a context manager. Remember that we
have to specify the mode "w".
 Example 79:
    filepath = "../Data/Charlie/Charlie.txt"
    with open(filepath, "r") as infile:
        content = infile.read()
    your_name = "x y"
    friends_name = "a b"
    new_content = content.replace("Charlie Bucket", your_name)
    new_new_content = new_content.replace("Mr Wonka", friends_name)
    # We can now save the manipulated content to a new file
    filename = "../Data/Charlie/Charlie_new.txt"
    with open(filename, "w") as outfile:
        outfile.write(new_new_content)
 The third mode for opening a file is "a". If the file Charlie_new.txt does not exist, then
append and write act the same: they create this new file and fill it with content. The
difference between write and append shows when the file already exists. In that case, the
write mode overwrites its content, while the append mode adds the new content at the end
of the existing one.
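
The difference between the modes "w" and "a" can be seen by writing to the same file three times. This is a small sketch with a made-up file name (notes.txt) in a temporary directory:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
filename = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

with open(filename, "w", encoding="utf-8") as outfile:
    outfile.write("first\n")       # the file does not exist yet: it is created
with open(filename, "w", encoding="utf-8") as outfile:
    outfile.write("second\n")      # "w" overwrites the existing content
with open(filename, "a", encoding="utf-8") as outfile:
    outfile.write("third\n")       # "a" adds to the end of the existing content

with open(filename, "r", encoding="utf-8") as infile:
    content = infile.read()

print(repr(content))
```

Only the text of the second and third write is left in the file, because the second open() in write mode discarded the first line.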

Reading and writing multiple files


 To process multiple files, we often want to iterate over a list of files. These files are often
stored in one or more directories on your computer.
 Instead of writing out every single file path, it is much more convenient to iterate over all the
files in a directory. So we need a way to tell Python: I want to do something with all the
files at this location.

The glob module


 The glob module finds all the pathnames matching a specific pattern
according to the rules used by the Unix shell. You can use wildcards: the asterisk * and the
question mark ?. An asterisk matches zero or more characters in a segment of a name, while
the question mark matches a single character in a segment of a name.
 Example 80:
    import glob
    for filename in glob.glob("../Data/Dreams/*"):
        print(filename)
 Example 81:
    for filename in glob.glob("../Data/Dreams/*.txt"):
        print(filename)
 You can also find filenames recursively by using the pattern ** (the keyword argument
recursive must be set to True), which matches any files and zero or more directories and
subdirectories. The following code prints all files with the extension .txt in the directory
../Data/ and in all its subdirectories.
 Example 82:
    for filename in glob.glob("../Data/**/*.txt", recursive=True):
        print(filename)
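
The wildcards can be tried out on a small directory tree. The sketch below builds such a tree in a temporary directory; the file names (a1.txt, b1.csv, and so on) and the helper function names() are made up for the illustration.

```python
import glob
import os
import tempfile

# Build a small, hypothetical directory tree to search in.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "Dreams", "old"))
for relative_path in ["Dreams/a1.txt", "Dreams/a2.txt",
                      "Dreams/b1.csv", "Dreams/old/a3.txt"]:
    with open(os.path.join(root, *relative_path.split("/")), "w") as outfile:
        outfile.write("placeholder\n")

def names(pattern, recursive=False):
    """Return the sorted base names of all paths matching the pattern."""
    matches = glob.glob(os.path.join(root, pattern), recursive=recursive)
    return sorted(os.path.basename(path) for path in matches)

everything = names("Dreams/*")                       # files and directories
txt_only = names("Dreams/*.txt")                     # only .txt files
one_char = names("Dreams/a?.txt")                    # "a" plus one character
recursive_txt = names("**/*.txt", recursive=True)    # also in subdirectories

print(everything)
print(txt_only)
print(one_char)
print(recursive_txt)
```

Note that the plain asterisk also matches the subdirectory old, and that only the recursive pattern finds a3.txt.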

The os module

 The os module has many useful features that the glob module does not offer.
 Some of the things you can do:
o Create a single directory or a whole chain of nested directories: os.mkdir(), os.makedirs()
o Remove a single empty directory or a chain of empty directories: os.rmdir(), os.removedirs()
o Check whether something is a file or a directory: os.path.isfile(), os.path.isdir()
o Split a path and return a tuple containing the directory and the filename: os.path.split()
o Construct a pathname out of one or more partial pathnames: os.path.join()
o Split a filename and return a tuple containing the name without extension and the file extension:
os.path.splitext()
o Get only the basename or the directory path: os.path.basename(), os.path.dirname()
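
A few of these functions in action. This is a sketch under the assumption of a made-up path (Data/Charlie/Charlie.txt, which does not need to exist for the path functions) and a temporary directory for the part that really creates directories.

```python
import os
import tempfile

# Path manipulation works on strings; the path does not have to exist.
filepath = os.path.join("Data", "Charlie", "Charlie.txt")
directory, filename = os.path.split(filepath)     # (directory, filename)
stem, extension = os.path.splitext(filename)      # (name, extension)
same_filename = os.path.basename(filepath)
same_directory = os.path.dirname(filepath)

# Creating and checking directories (inside a temporary directory).
base = tempfile.mkdtemp()
nested = os.path.join(base, "level1", "level2")   # hypothetical names
os.makedirs(nested)                   # creates level1 and level2 in one call
is_dir = os.path.isdir(nested)
is_file = os.path.isfile(nested)
os.rmdir(nested)                      # removes only the empty level2
parent_still_there = os.path.isdir(os.path.dirname(nested))

print(directory, filename, stem, extension)
print(is_dir, is_file, parent_still_there)
```

Using os.path.join() instead of typing the slashes yourself keeps the code working on both Windows and Unix-like systems.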

Chapter 15: Off to analyzing text

Intro to text processing


 There are many aspects of text we can (try to) analyze. Analyses commonly conducted
in Natural Language Processing (NLP) include:
o Determining the part of speech of words in a text (verb, noun, etc.)
o Analyzing the syntactic relations between words and phrases in a sentence (i.e.
syntactic parsing)
o Analyzing which entities (people, organizations, locations) are mentioned in a text
 The NLP pipeline
o Usually, these tasks are carried out sequentially, because they depend on each
other. For instance, we need to first tokenize the text (split it into words) in order to
be able to assign part-of-speech tags to each word. This sequence is often called an
NLP pipeline.

The NLTK package


 NLTK (Natural Language Toolkit) is a package we can use for most fundamental
aspects of natural language processing.

word_tokenize()
 First open and read the file and assign the file contents to the variable content.
 Example 83:
    import nltk
    with open("../Data/Charlie/Charlie.txt") as infile:
        content = infile.read()
    tokens = nltk.word_tokenize(content)
    print(tokens)
  Result: ['Charlie', 'Bucket', 'stared', 'around', 'the', 'gigantic', 'room', 'in', 'which', 'he',
  'now', 'found', 'himself'…]
 It returns a list of all words in the text. The punctuation marks are also in the list, but as
separate tokens.

sent_tokenize()
 NLTK can also split a text into sentences by using the sent_tokenize() function.

Part-of-speech (POS) tagging


 Using the function pos_tag() we can label each word in the text with its part of speech.
 To do POS tagging you first need to tokenize the text.

pos_tag()
 The pos_tag() function takes the tokenized text as input and returns a list of tuples in which the first
element is the token and the second the assigned POS tag:
  [('Charlie', 'NNP'), ('Bucket', 'NNP'), ('stared', 'VBD'), ('around', 'IN'), ('the', 'DT'),
  ('gigantic', 'JJ'), …]
 Example 84:
    with open("../Data/Charlie/Charlie.txt") as infile:
        content = infile.read()
    tokens = nltk.word_tokenize(content)
    tagged_tokens = nltk.pos_tag(tokens)
    verb_tags = ["VBD", "VBG", "VBN", "VBP", "VBZ"]
    verbs = []
    for token, tag in tagged_tokens:
        if tag in verb_tags:
            verbs.append(token)
    print(verbs)
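
The list of (token, tag) tuples can be processed like any other list of tuples. The sketch below does not call NLTK; it starts from the six tagged tokens shown above (typed in by hand) and groups the tokens by their POS tag in a dictionary.

```python
# The first six tagged tokens from the example output above, typed in by hand.
tagged_tokens = [("Charlie", "NNP"), ("Bucket", "NNP"), ("stared", "VBD"),
                 ("around", "IN"), ("the", "DT"), ("gigantic", "JJ")]

tokens_by_tag = {}
for token, tag in tagged_tokens:      # unpack each (token, tag) tuple
    if tag not in tokens_by_tag:
        tokens_by_tag[tag] = []       # first time we see this tag
    tokens_by_tag[tag].append(token)

print(tokens_by_tag)
```

With such a dictionary you can look up all tokens for one tag directly, for instance tokens_by_tag["NNP"] for the proper nouns.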

Lemmatization
 The lemma of a word is the form that is usually used for its dictionary entry (for example,
"stare" for "stared"). Lemmas are useful for many NLP tasks, because they let us generalize over
the different forms of a word.

The WordNet lemmatizer


 Example 85 (extends the previous example):
    lmtzr = nltk.stem.wordnet.WordNetLemmatizer()  # instantiate a lemmatizer object
    verb_lemmas = []
    for participle in verbs:
        # for this lemmatizer, we need to indicate the POS of the word (in this case v = verb)
        lemma = lmtzr.lemmatize(participle, "v")
        verb_lemmas.append(lemma)
 We need to pass the POS tag to the WordNet lemmatizer in WordNet format (n for
noun, v for verb, a for adjective). If we do not indicate the POS tag, the WordNet lemmatizer
assumes the word is a noun.

Combining NLTK POS tags with the WordNet lemmatizer


 The WordNet lemmatizer assumes every word is a noun unless specified differently. Luckily,
we found that we can also automatically infer the POS tags for each word. We can use these
automatic POS tags as input to our lemmatizer to improve its accuracy for non-nouns. As an
intermediate step, we need to translate the POS tags that we get from our POS tagger to
WordNet POS tags. Example 86 shows how to do that.
 Example 86:
    from nltk.corpus import wordnet as wn

    def penn_to_wn(penn_tag):
        """Returns the corresponding WordNet POS tag for a Penn Treebank
        POS tag."""
        if penn_tag in ["NN", "NNS", "NNP", "NNPS"]:
            wn_tag = wn.NOUN
        elif penn_tag in ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"]:
            wn_tag = wn.VERB
        elif penn_tag in ["RB", "RBR", "RBS"]:
            wn_tag = wn.ADV
        elif penn_tag in ["JJ", "JJR", "JJS"]:
            wn_tag = wn.ADJ
        else:
            wn_tag = None
        return wn_tag

    lmtzr = nltk.stem.wordnet.WordNetLemmatizer()
    lemmas = list()
    for token, pos in tagged_tokens:
        wn_tag = penn_to_wn(pos)
        if wn_tag is not None:
            lemma = lmtzr.lemmatize(token, wn_tag)
        else:
            lemma = lmtzr.lemmatize(token)
        lemmas.append(lemma)
    print(lemmas)
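
The translation step can also be written as a dictionary lookup. The following is a sketch of an alternative, not the version used in the course: it assumes that the NLTK constants wn.NOUN, wn.VERB, wn.ADV and wn.ADJ are the strings "n", "v", "r" and "a", and it uses the fact that all Penn Treebank tags in the four lists of Example 86 start with NN, VB, RB or JJ. The names PENN_PREFIX_TO_WN and penn_to_wn_compact are made up.

```python
# Assumption: in NLTK, wn.NOUN == "n", wn.VERB == "v", wn.ADV == "r", wn.ADJ == "a".
PENN_PREFIX_TO_WN = {"NN": "n", "VB": "v", "RB": "r", "JJ": "a"}

def penn_to_wn_compact(penn_tag):
    """Return the WordNet POS tag for a Penn Treebank tag, or None."""
    # Only the first two characters decide the category (NNS -> NN -> "n").
    return PENN_PREFIX_TO_WN.get(penn_tag[:2])

for penn_tag in ["NNPS", "VBD", "RBR", "JJ", "DT", "RP"]:
    print(penn_tag, penn_to_wn_compact(penn_tag))
```

The get() method returns None for every tag that has no WordNet counterpart (such as DT or RP), which matches the else-branch of Example 86.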

Nesting
 In Python, one can nest multiple loops and other blocks (such as a with-block that opens a file)
in one another. For instance, we can use one (external) for-loop to iterate through files, and
then for each file iterate through all its sentences (internal for-loop).
 Nesting too deeply can make the code slow, because the innermost code runs once for every
combination of outer and inner items. Whether this becomes a problem depends on the
size of your data.
 Example 87:
    import glob
    import nltk
    for filename in glob.glob("../Data/dreams/*.txt"):
        with open(filename, "r") as infile:
            content = infile.read()
        sentences = nltk.sent_tokenize(content)
        print(f"INFO: File {filename} has {len(sentences)} sentences")
        counter = 0
        for sentence in sentences:
            counter += 1
            tokens = nltk.word_tokenize(sentence)
            print("Sentence %d has %d tokens" % (counter, len(tokens)))
        print()  # print an empty line after each file

Putting it all together


 Our goal is to collect all the nouns from Vickie’s dream reports.
 Important steps to remember:
o Create a list of all the files we want to process
o Open and read the files
o Tokenize the texts
o Perform pos-tagging
o Collect all the tokens analyzed as nouns
 Remember, we first need to import nltk to use it.
 Example 88.1:
    import nltk

    def tag_tokens_file(filepath):
        """Read the contents of the file found at the location specified in
        FILEPATH and return a list of its tokens with their POS tags."""
        with open(filepath, "r") as infile:
            content = infile.read()
        tokens = nltk.word_tokenize(content)
        tagged_tokens = nltk.pos_tag(tokens)
        return tagged_tokens
 Now, instead of having to open a file, read its contents and close the file, we can just call the
function tag_tokens_file() to do this.
 Example 88.2:
    import glob
    for filename in glob.glob("../Data/dreams/*.txt"):
        tagged_tokens = tag_tokens_file(filename)
        print(filename, "\n", tagged_tokens, "\n")
 Example 88.3:
    nouns_in_dreams = []
    for filename in glob.glob("../Data/dreams/*.txt"):
        tagged_tokens = tag_tokens_file(filename)
        for token, pos in tagged_tokens:
            if pos in ["NN", "NNP"]:
                nouns_in_dreams.append(token)
    print(set(nouns_in_dreams))

Chapter 16: Data formats I (CSV and TSV)

Intro to CSV and TSV


 Tabular data can be encoded as CSV (comma-separated values) or TSV (tab-separated
values). CSV and TSV files are simply plain text files in which each line represents a row and
the columns are separated by a comma or a tab character.

 Example 89: Example of CSV content
    AK,F,1910,Mary,14
    AK,F,1910,Annie,12
    AK,F,1910,Anna,10
    AK,F,1910,Margaret,8
    AK,F,1910,Helen,7
    AK,F,1910,Elsie,6
    AK,F,1910,Lucy,6
    AK,F,1910,Dorothy,5
    AK,F,1911,Mary,12
    AK,F,1911,Margaret,7
 Each line in the file has 5 elements:
o The state abbreviation
o Gender
o Year
o Name
o Frequency of that name in the given year and state

Representing the data in Python as a list of lists or a list of dictionaries


 The elements of the (outer) list represent the complete rows.
 The individual rows can be represented either as lists (without column names) or as
dictionaries (with column names).
 Example 90: the first three rows as a list of lists
    [['AK', 'F', '1910', 'Mary', '14'],
     ['AK', 'F', '1910', 'Annie', '12'],
     ['AK', 'F', '1910', 'Anna', '10'],
     ...]
  The first row as a list of dicts
    [{'frequency': '14',
      'gender': 'F',
      'name': 'Mary',
      'state': 'AK',
      'year': '1910'},
     ...]

Reading CSV files
 We can open and read CSV files in the same way as normal text files.
 Example 91:
    with open(filename, "r") as csvinfile:
        content = csvinfile.read()
    print(content)
 In the internal representation (shown by repr()), the columns are separated by commas and the
rows by \n.

Reading rows as lists


 You can do this by iterating over each line of the file and then splitting each row into columns
using the split() method.
 Example 92:
    with open(filename, "r") as csvfile:
        csv_data = []
        for row in csvfile:
            row = row.strip("\n")     # remove the trailing newline
            columns = row.split(",")  # split the line into columns
            csv_data.append(columns)

Reading rows as dicts


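 Example 93: the first five lines are the same as in Example 92; inside the for-loop, each row is
then stored as a dictionary instead of a list:
            dict_row = {"state": columns[0],
                        "gender": columns[1],
                        "year": int(columns[2]),
                        "name": columns[3],
                        "frequency": int(columns[4])}
            csv_data.append(dict_row)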
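
Both representations can be built in one loop. The sketch below works on a string instead of a file, so that it runs without any data on disk; it uses the first three rows of Example 89. The list column_names and the use of zip() to pair names with values are choices made for this illustration.

```python
# The first three rows of Example 89, as they would appear in the file.
csv_text = "AK,F,1910,Mary,14\nAK,F,1910,Annie,12\nAK,F,1910,Anna,10\n"
column_names = ["state", "gender", "year", "name", "frequency"]

rows_as_lists = []
rows_as_dicts = []
for row in csv_text.splitlines():      # splitlines() drops the "\n"
    columns = row.split(",")
    rows_as_lists.append(columns)
    # zip() pairs each column name with the value at the same position.
    dict_row = dict(zip(column_names, columns))
    dict_row["year"] = int(dict_row["year"])
    dict_row["frequency"] = int(dict_row["frequency"])
    rows_as_dicts.append(dict_row)

print(rows_as_lists[0])
print(rows_as_dicts[0])
```

In the list of lists every value stays a string; in the list of dicts the year and the frequency have been cast to integers, as in Example 93.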

Writing CSV files


 Let’s say now we have a table in Python stored as a list of lists or a list of dicts and we want
to store our result in a CSV file. This is basically the inverse process of reading a CSV file.

Writing rows as lists


 In order to write a list of lists as a CSV file, we need to iterate over the rows and turn
each row into a string.
 Example 94:
    a_list = ["John", "john@example.nl", "555-1234"]
    a_string = ",".join(a_list)
 Example 95:
    with open(outfilename, "w") as outfile:
        for row in address_book:
            line = ",".join(row) + "\n"
            outfile.write(line)

Writing rows as dicts
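 Example 96 (here the columns are separated by tabs, which gives a TSV file):
    with open(outfilename, "w") as outfile:
        for row in address_book:
            column_values = row.values()
            line = "\t".join(column_values) + "\n"
            outfile.write(line)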
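
A complete, runnable version of this idea could look as follows. It is a sketch under a few assumptions: the address book with its two rows is made up, a header line with the column names is written first (Example 96 does not do this), and every value is converted with str() because join() only accepts strings.

```python
import os
import tempfile

# Hypothetical address book: a list of dicts with the same keys.
address_book = [
    {"name": "John", "email": "john@example.nl", "phone": "555-1234"},
    {"name": "Jane", "email": "jane@example.nl", "phone": "555-5678"},
]

outfilename = os.path.join(tempfile.mkdtemp(), "address_book.tsv")
with open(outfilename, "w", encoding="utf-8") as outfile:
    header = list(address_book[0].keys())
    outfile.write("\t".join(header) + "\n")        # column names first
    for row in address_book:
        # Look the values up by key, so every row has the same column order.
        column_values = [str(row[key]) for key in header]
        outfile.write("\t".join(column_values) + "\n")

with open(outfilename, "r", encoding="utf-8") as infile:
    written = infile.read()

print(written)
```

Looking the values up by key instead of calling row.values() keeps the columns aligned even if one of the dictionaries was built with its keys in a different order.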

Chapter 17: Data formats (JSON)

Intro to JSON
 JSON is completely language independent.
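 What JSON looks like: the Python dictionary below has the same structure as a JSON object
(in a JSON file, True, False and None are written as true, false and null)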
dict_doe_family = {
"John": {
"first name": "John",
"last name": "Doe",
"gender": "male",
"age": 30,
"favorite_animal": "panda",
"married": True,
"children": ["James", "Jennifer"],
"hobbies": ["photography", "sky diving", "reading"]},
"Jane": {
"first name": "Jane",
"last name": "Doe",
"gender": "female",
"age": 27,
"favorite_animal": "zebra",
"married": False,
"children": None,
"hobbies": ["cooking", "gaming", "tennis"]}}

Working with JSON in Python


 In order to work with JSON in Python, we use the json library, which provides an easy
way to encode and decode data in JSON.
 Import it with: import json

Loading JSON from file or string


 The json.load() function loads a JSON-encoded file as a Python dictionary. The json.loads()
function does the same for a JSON formatted string.
 Example 97:
    with open("../Data/json_data/Doe.json", "r") as infile:
        dict_doe_family = json.load(infile)

Writing JSON to file or string


 The json.dump() function writes a Python dictionary to a JSON-encoded file.
 Example 98:
    with open("../Data/json_data/Doe.json", "w") as outfile:
        json.dump(dict_doe_family, outfile)
 The json.dumps() function converts a Python dictionary to a JSON formatted string.
 Example 99:
    str_doe_family = json.dumps(dict_doe_family)

 Two useful keyword arguments are indent (pretty-prints the output with the given number of
spaces) and sort_keys (sorts the dictionary keys alphabetically).
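
The effect of dumps(), loads() and the two keyword arguments can be seen on a small dictionary. The sketch below uses a shortened version of the entry for Jane from the example above; it works on strings only, so no file is needed.

```python
import json

# A shortened version of the "Jane" entry from the dictionary above.
person = {"first name": "Jane", "age": 27, "married": False,
          "children": None, "hobbies": ["cooking", "gaming", "tennis"]}

compact = json.dumps(person)                            # one line
pretty = json.dumps(person, indent=4, sort_keys=True)   # indented, keys sorted
round_trip = json.loads(pretty)                         # back to a dictionary

print(compact)
print(pretty)
print(round_trip == person)
```

In the JSON strings, False and None appear as false and null; loads() turns them back into the Python values, so the dictionary survives the round trip unchanged.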

Chapter 18: Data formats (XML)

Intro to XML
 If we only want to indicate that a specific word is a noun, using word_tokenize and pos_tag
is enough. However, if we also want to indicate that Tom Cruise is an entity we will get into
trouble because some annotations are for single words and some are for combinations of
words. In addition, sometimes we have more than one annotation per token. Data
structures such as CSV and TSV are not great at representing linguistic information. XML is a
better format.

Terminology
 1. <Course>
2. <person role="coordinator">Van der Vliet</person>
3. <person role="instructor">Van Miltenburg</person>
4. <person role="instructor">Van Son</person>
5. <person role="instructor">Postma</person>
6. <person role="instructor">Sommerauer</person>
7. <person role="student">Baloche</person>
8. <person role="student">De Boer</person>
9. <person/>
10. </Course>
 Each XML element consists of a start tag and an end tag.
 An element can contain:
o Text (Van der Vliet), attributes (role), elements (person elements in the Course element)
 The start tag and end tag on line 9 are combined because the element has no children
and no text.

Root element
 A root element (Course) is special because it is the sole parent element to all the other
elements.

Attributes
 Elements can contain attributes, which hold information about the element. All attributes are
located in the start tag of an XML element.

Working with XML in Python


 Example 100: from lxml import etree
 We will focus on the following methods/attributes:
o To parse the XML from file or string: the methods etree.parse() and
etree.fromstring()
o To access the root element: the method getroot()
o To access elements: the methods find(), findall(), and getchildren()
o To access attributes: the method get()
o To access element information: the attributes tag and text

Parsing XML from file or string


 The etree.fromstring() method is used to parse XML from a string.
o It returns an Element
 The etree.parse() method is used to load XML files on your computer.
o It returns an ElementTree

Accessing root element


 In order to access the root element of an ElementTree, we use the getroot()
method. Note that printing the result does not show the XML itself, but only a reference to the
element object.
 In order to show the element itself, we can use the etree.dump() function.
 Example 101: root = tree.getroot()

Accessing elements
 The find() method returns the first matching child.
 Example 102:
    first_person_el = root.find("person")
    etree.dump(first_person_el, pretty_print=True)
 The findall() method returns a list of all matching children (here, all person children).
 getchildren() simply returns all children.

Accessing element information


 The get() method is used to access the attribute of an element. If an attribute does not exist,
it will return None.
 Example 103: first_person_el = root.find('person')
role_first_person_el = first_person_el.get('role')
attribute_not_found = first_person_el.get('blabla')
print('role first person element:', role_first_person_el)
print('value if not found:', attribute_not_found)
 The text of an element is found in the attribute text:
 Example 104: print(first_person_el.text)
Result: Van der Vliet
 The tag of an element is found in the attribute tag.

XPATH
 Instead of using the find() and findall() methods, you can also use XPATH expressions.
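
The methods and attributes of this chapter can be tried out on a shortened version of the Course example. Note the swap: the sketch below uses xml.etree.ElementTree from the standard library instead of lxml, so that it runs without installing anything. For the calls shown here (fromstring(), find(), findall(), get(), tag, text) it behaves the same way, but it only understands a small subset of XPATH. lxml elements additionally have an xpath() method for full XPATH expressions, and the standard library module no longer has getchildren() in recent Python versions (list(root) gives the children instead).

```python
import xml.etree.ElementTree as ET  # standard-library stand-in for lxml.etree

# A shortened version of the Course example from the Terminology section.
xml_string = """<Course>
  <person role="coordinator">Van der Vliet</person>
  <person role="instructor">Van Miltenburg</person>
  <person role="student">Baloche</person>
  <person/>
</Course>"""

root = ET.fromstring(xml_string)           # fromstring() returns the root Element
first_person_el = root.find("person")      # first matching child
all_persons = root.findall("person")       # list of all matching children
all_children = list(root)                  # all children, whatever their tag

# A simple XPATH-style expression: only persons whose role is "student".
students = root.findall("person[@role='student']")

print(root.tag)
print(first_person_el.text, first_person_el.get("role"))
print(len(all_persons), len(all_children))
print(students[0].text)
print(all_persons[-1].text, all_persons[-1].get("role"))
```

The last person element is the combined start and end tag: it has neither text nor a role attribute, so both text and get("role") give None.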
