Chapter 1
Variables, values, input()
Chapter 2
Data types, type(), casting types, mathematical operators
Chapter 3
Quotes, \n, repr(), \t, slicing, index numbers, len, immutability, comparison
operators, concatenation, f-strings, lower(), upper(), dir(), help()
Chapter 4
Boolean expressions, and/or/not, all and any, if/elif/else statements, indentation,
nesting
Week 2
Chapter 5
Containers, args and kwargs, mutability
Chapter 6
Lists, .append(), basic list operations, list methods
Chapter 7
Sets, .add(), union, intersection, set operations
Chapter 8
Comparison of lists and sets
Chapter 9
For loops, .reverse(), tuples, unpacking variables, continue, break, .split()
Chapter 10
Dictionaries, .keys(), .values(), collections, nesting dictionaries
Week 1
Chapter 1 Getting started with variables and values
print()
Outputs its argument to the screen
Example 1: print("Hello, world") shows Hello, world on the screen.
Comment #
Comments are not executed in Python. They are used to make the code easy to understand.
Variables
A variable is a stored value with a name. You assign the value to the name using the = symbol.
Variable names can be re-used, but they will always contain the value that you last assigned
to them.
Example 2: name = "Patrick Bateman"
x = 22
Variable names
Variable names are only valid if they:
o Start with a letter or underscore (_)
o Only contain letters, numbers and underscores
Use clear, meaningful, descriptive variable names so that your code will remain
understandable
Use underscores and lowercase characters for separating words
Do not use built-in names, such as print or sum
input()
This takes user input and returns it as a string
Example 3: text = input("What is your name?")
print(text)
Result: the text that the user typed in
type()
We can use the type() function to check object types.
Example 4: type("hoi")
Result: <class 'str'>
+ sign
The + sign (and other symbols) can act like different operators depending on the type which
they are used with.
Example 5: var1 = 1 + 5
print(var1)
Result: 6
Example 6: var2 = "hello " + "there"
print(var2)
Result: hello there
Casting types
Converting types into other types
Example 7: x = "5"
y = 2
print(int(x) + y)
Result: 7
Mathematical operators
x+y sum of x and y
x-y difference of x and y
x*y product of x and y
x/y quotient of x and y
x // y floored quotient of x and y
x%y remainder of x / y
-x x negated
+x x unchanged
x ** y x to the power y
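The operators in the table above can be tried out directly. The following is a minimal sketch; the values x = 7 and y = 2 are hypothetical and chosen only for illustration.

```python
# Hypothetical values, chosen only to show each operator once.
x = 7
y = 2

total = x + y        # sum
difference = x - y   # difference
product = x * y      # product
quotient = x / y     # quotient: true division always gives a float
floored = x // y     # floored quotient
remainder = x % y    # remainder of x / y
negated = -x         # x negated
power = x ** y       # x to the power y

print(total, difference, product, quotient, floored, remainder, negated, power)
```

Note that / returns a float even when both operands are integers, while // rounds the result down.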
Chapter 3 Strings
Multi-line strings (hidden character \n)
A string that spans across multiple lines can be achieved in two ways:
1. With single or double quotes, where we use \n to indicate that the rest of the string
continues on the next line.
o Example 11: long_string1 = "A very long string\n\
can be split into multiple\n\
sentences by appending a newline symbol\n\
to the end of the line."
o Example 12: multiline_text_2 = "This is a multiline text, so it is enclosed by triple
quotes.\nPretty cool stuff!\nI always wanted to type more than one line, so today is
my lucky day!"
o Note: \n\ is used when the string is typed across multiple lines in the code, and \n alone when the
string is typed as one piece.
2. With three single or double quotes
o Example 13: long_string2 = """A very long string
can also be split into multiple
sentences by enclosing the string
with three double or single quotes."""
Hidden character \t
\t represents a tab
Example 15: colors = "yellow\tgreen\tblue\tred"
print(colors)
Result: yellow    green    blue    red (the words are separated by tabs)
String indices
Each character in a string has a position, which can be referred to by the index number of
the position.
You can access the characters of a string using […] with the index number in between the
brackets.
Example 16: my_string = "Sandwiches are yummy"
print(my_string[1])
Result: a (index numbers start at 0, so index 1 is the second character)
len()
len() lets you compute the length of a sequence
Spaces count as characters too
String slices
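We can also extract a range from a string
It works as follows: my_string[start:stop:step]. The character at index start is included, the character at index stop is not.
You can also use a negative step size. my_string[::-1] is the idiomatic way to reverse a string.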
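A short sketch of len(), indexing and slicing, reusing the string from Example 16. The variable names other than my_string are hypothetical.

```python
my_string = "Sandwiches are yummy"

length = len(my_string)             # spaces count as characters too
first_word = my_string[0:10]        # index 0 up to, but not including, index 10
last_word = my_string[-5:]          # negative indices count from the end
every_other = my_string[::2]        # step size 2 takes every second character
reversed_string = my_string[::-1]   # negative step size reverses the string

print(length)
print(first_word)
print(last_word)
print(reversed_string)
```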
Immutability
Strings are immutable (cannot change).
It is possible to create a new string based on the old one.
Example 17: fruit = "guanabana"
island = fruit[:5]
print(island, "island")
Result: guana island
Comparing strings
Comparison operators are: ==, !=, <, <=, >, and >=
Comparing strings can be useful for putting words in lexicographical order (alphabet)
(uppercase letters come before all lowercase letters).
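The comparison rules above can be checked with a few small examples. This is a minimal sketch; the words are hypothetical.

```python
# Each comparison results in True or False.
a_before_b = "apple" < "banana"        # "a" comes before "b"
upper_before_lower = "Zebra" < "apple" # uppercase letters come before lowercase ones
same_word = "apple" == "Apple"         # comparison is case-sensitive

# sorted() uses the same ordering to put words in lexicographical order.
words = ["pear", "Banana", "apple"]
sorted_words = sorted(words)

print(a_before_b, upper_before_lower, same_word)
print(sorted_words)
```

If you want an ordering that ignores case, one option is to compare the lowercased versions of the strings, for example "Zebra".lower() < "apple".lower().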
f-strings (formatted strings)
Used instead of the + sign when you want to combine strings with the values of variables or expressions.
Example 18: introduction = f"Hello. My name is {name}. I'm {age} years old."
Example 19: text = f"Next year, I'm turning {age+1} years old."
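Examples 18 and 19 only run when name and age already exist. A minimal runnable sketch, assuming the hypothetical values name = "Patrick Bateman" and age = 22:

```python
# Assumed values; the notes do not say what name and age contain.
name = "Patrick Bateman"
age = 22

# Anything between curly brackets is evaluated and inserted into the string.
introduction = f"Hello. My name is {name}. I'm {age} years old."
text = f"Next year, I'm turning {age + 1} years old."

print(introduction)
print(text)
```

With the + sign you would first have to cast age to a string with str(age); an f-string does that conversion for you.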
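.join() glues the strings in a list together, using the string it is called on as the separator
Example (assuming words = ["Humpty", "Dumpty"]): print("-".join(words))
Result: Humpty-Dumpty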
.lstrip() and .rstrip() remove whitespace on the left and on the right of a string
.startswith() and .endswith() return True when the string starts or ends with the given substring
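The string methods above, sketched with hypothetical example values:

```python
words = ["Humpty", "Dumpty"]
joined = "-".join(words)                 # the separator goes between the items

padded = "   hello   "
left_stripped = padded.lstrip()          # whitespace removed on the left only
right_stripped = padded.rstrip()         # whitespace removed on the right only

filename = "report.txt"                  # hypothetical file name
is_text_file = filename.endswith(".txt")
starts_with_rep = filename.startswith("rep")

print(joined)
print(repr(left_stripped), repr(right_stripped))
print(is_text_file, starts_with_rep)
```

repr() is used in the second print so that the remaining spaces are visible.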
Boolean expression
A Boolean expression results in True or False
If statements
Example 29:
    number = 5
    if number == 5:
        print("number equals 5")
    if number > 4:
        print("number is greater than 4")
Results: number equals 5 / number is greater than 4
Multi-way decisions – if, elif, else
For every if block, you can have one if statement, multiple elif statements and one else
statement.
First the if statement will be evaluated. Only if that statement turns out to be False the
computer will proceed to evaluate the elif statements.
Example 31:
    age = 21
    if age < 12:
        print("You're still a child!")
    elif age < 18:
        print("You are a teenager!")
    elif age < 30:
        print("You're pretty young!")
    else:
        print("Wow, you're old!")
Result: You're pretty young!
NOTE! The order of the conditions matters. If you started with age < 30, that branch would be True for every
age below 30, so Python would never proceed to the later options (the elif branches).
Indentation
Indentation (4 spaces) lets Python know which lines of code belong to a block.
When, for example, a Boolean expression is True, Python executes the code on the following
lines that are indented by four spaces (or one tab) to the right.
Indentation must be consistent throughout your code.
All statements with the same distance to the right belong to the same block of code
Nesting
Blocks can contain blocks as well
There may be a situation when you want to check for another condition after a condition
resolves to True. In such a situation, you can use the nested if construct.
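A minimal sketch of a nested if construct. The variables age and has_ticket and the messages are hypothetical.

```python
age = 21
has_ticket = True

if age >= 18:
    # This inner block is only reached when the outer condition is True.
    if has_ticket:
        message = "Welcome in!"
    else:
        message = "Please buy a ticket first."
else:
    message = "Sorry, adults only."

print(message)
```

Each nesting level is indented by another four spaces, which is how Python knows that the inner if/else belongs to the outer if block.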
Week 2
Chapter 5 – Core concepts of containers
Containers
Lists
Tuples
Set
Dictionaries
Mutability
Chapter 6 – Lists
Lists
Lists are surrounded by square brackets and the elements in the list are separated by
commas.
A list element can be any Python object – even another list.
A list can be empty.
Lists are mutable.
Two ways of creating an empty list:
o one_way = []
o another_way = list()
Extracting/inspecting items in a list
Indexing and slicing in lists work the same way as with strings. Every item in the list
hence has its own index number.
Example 33: friend_list = ["John", "Bob", "Marry"]
print(friend_list[0])
Result: John
List methods
a.append(b)
o Add item b to the end of a
a.extend(c)
o Add the elements of list c at the end of a
a.insert(i, b)
o Insert item b at position i
a.pop(i)
o Remove from a the i’th element and return it. If i is not specified, remove the last
element
a.index(x)
o Return the index of the first element of a with value x. Error if it does not exist
a.count(x)
o Return how often value x is found in a
a.remove(x)
o Remove from a the first element with value x. Error if it does not exist
a.sort()
o Sort the elements of list a
a.reverse()
o Reverses list a (no return value)
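The list methods above, applied one after another to a small list. This is a sketch with hypothetical names; the comments show the list after each step.

```python
a = ["John", "Bob"]

a.append("Marry")            # ["John", "Bob", "Marry"]
a.extend(["Ann", "Bob"])     # ["John", "Bob", "Marry", "Ann", "Bob"]
a.insert(1, "Zoe")           # ["John", "Zoe", "Bob", "Marry", "Ann", "Bob"]
last = a.pop()               # removes and returns the last element, "Bob"
bob_index = a.index("Bob")   # index of the first "Bob"
bob_count = a.count("Bob")   # how often "Bob" is found
a.remove("Zoe")              # ["John", "Bob", "Marry", "Ann"]
a.sort()                     # ["Ann", "Bob", "John", "Marry"]
a.reverse()                  # ["Marry", "John", "Bob", "Ann"]

print(a)
print(last, bob_index, bob_count)
```

Most of these methods change the list itself and return None, so write a.sort() on its own line rather than a = a.sort().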
Chapter 7 – Sets
Sets
Sets are surrounded by curly brackets and the elements in the set are separated by commas.
A set can be empty.
Sets do not allow duplicates.
Sets are unordered:
o You can check if two sets are the same even if you don’t know the order in which
items were put in.
o You cannot use an index to extract an element from a set.
A set can only contain immutable objects (for now: strings and integers).
A set cannot contain mutable objects, hence no lists or sets.
Creating a set:
o a_set = set()
Extracting/inspecting items in a set
When you use sets, you usually want to compare the elements of different sets.
You can use union, intersection, difference and symmetric difference.
Union:
o Return the union of sets as a new set (i.e. all elements that are in either set)
o Example 37: set1.union(set2)
Result: {1, 2, 3, 4, 5, 6, 7, 8}
Intersection:
o Return the intersection of two sets as a new set (i.e. all elements that are in both
sets)
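A sketch of the four set comparisons. The contents of set1 and set2 are an assumption; they were chosen so that the union matches the result shown in Example 37.

```python
# Assumed contents; the notes only show the result of the union.
set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7, 8}

union = set1.union(set2)                         # in either set
intersection = set1.intersection(set2)           # in both sets
difference = set1.difference(set2)               # in set1 but not in set2
symmetric = set1.symmetric_difference(set2)      # in exactly one of the two sets

print(union)
print(intersection)
print(difference)
print(symmetric)
```

All four methods return a new set and leave set1 and set2 unchanged.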
Set operations
set_a.add(an_element)
o Add an_element to set_a
set_a.update(set_b)
o Add the elements of set_b to set_a
set_a.pop()
o Remove and return an arbitrary set element.
set_a.remove(an_element)
o Remove an_element from set_a. Error if it does not exist
You can create a set from a list. Attention: duplicates will be removed
Finding elements
It’s quicker to check if an element is in a set than to check if it is in a list.
All other scenarios sets
Tuples
A tuple is defined in the same way as a list, except that the whole set of elements is
enclosed in parentheses.
The elements of a tuple have a defined order.
Tuples can contain immutable and mutable objects.
Items cannot be added or removed.
A tuple can be empty.
Tuples have two methods: index and count
You can unpack tuples.
Two ways of creating an empty tuple:
o an_empty_tuple = ()
o another_empty_tuple = tuple()
Unpacking variables (tuples)
Example 40: a_tuple = (1, "a", 2)
first_el, second_el, third_el = a_tuple
print(first_el)
Result: 1
Unpacking can also be used with other containers
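A minimal sketch of unpacking with containers other than a single tuple. The list contents and variable names are hypothetical.

```python
# Unpacking a list works the same way as unpacking a tuple.
a_list = ["John", "Bob"]
first_friend, second_friend = a_list

# Unpacking inside a for-loop: each item of the list is a tuple of two elements.
pairs = [("a", 1), ("b", 2)]
letters = []
numbers = []
for letter, number in pairs:
    letters.append(letter)
    numbers.append(number)

print(first_friend, second_friend)
print(letters, numbers)
```

The number of names on the left must match the number of elements in the container, otherwise Python raises a ValueError.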
Dictionaries
A dictionary is surrounded by curly brackets and the key/value pairs are separated by
commas.
A dictionary consists of one or more key:value pairs, the key is the identifier or name that is
used to describe the value.
The keys in a dictionary are unique.
o Key:value pairs get overwritten if you assign a different value to an existing key.
The syntax for a key/value pair is: KEY : VALUE.
The keys in a dictionary have to be immutable.
The values in a dictionary can be any python object.
A dictionary can be empty.
Two ways of creating an empty dictionary:
o Use dict() or {}
Using built-in functions to inspect the keys and values
len(), max(), min() and sum() can be used to inspect the keys and values.
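A sketch of the built-in functions applied to a dictionary. The dictionary scores and its contents are hypothetical.

```python
scores = {"Ann": 8, "Bob": 6, "Cas": 9}   # hypothetical data

number_of_pairs = len(scores)             # number of key:value pairs
highest = max(scores.values())            # largest value
lowest = min(scores.values())             # smallest value
total = sum(scores.values())              # sum of all values
last_key = max(scores.keys())             # keys are strings, so this is alphabetical

# Assigning to an existing key overwrites the old value; keys stay unique.
scores["Bob"] = 7

print(number_of_pairs, highest, lowest, total, last_key)
print(scores)
```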
Nesting dictionaries
Since dictionaries consist of key:value pairs, we can actually make another dictionary the
value of a dictionary.
Example 54:
    a_nested_dictionary = {"a_key":
                               {"nested_key1": 1,
                                "nested_key2": 2,
                                "nested_key3": 3}
                           }
    print(a_nested_dictionary)
Result: {'a_key': {'nested_key1': 1, 'nested_key2': 2, 'nested_key3': 3}}
In order to access the nested value, we must do a look up for each key on each nested level
Example 55: the_nested_value = a_nested_dictionary["a_key"]["nested_key1"]
print(the_nested_value)
Result: 1
Week 4
Chapter 11: Functions and scope
Writing a function
A function is an isolated chunk of code, that has a name, gets zero or more parameters, and
returns a value. In general, a function will do something for you, based on a number of input
parameters you pass it, and it will typically return a result.
Defining a function
Write def
The name you would like to call your function
A set of parentheses containing the argument(s) of your function
A colon
A docstring describing what your function does
The function definition
Ending with a return statement
Example 56:
    def happy_birthday_to_Emily():
        """Print a birthday song to Emily."""
        print("Happy birthday to you!")
Calling a function
When calling a function, you should always use parentheses
Example 57: happy_birthday_to_Emily()
Example:
    def new_line():
        """Print a new line."""
        print()

    def two_new_lines():
        """Print two new lines."""
        new_line()
        new_line()
Parameters and arguments
Parameter: variables used in function definitions are called parameters.
Argument: the values passed in function calls are called arguments.
Functions can have multiple parameters.
Example 59:
    def multiply(x, y):
        """Multiply two numeric values."""
        result = x * y
        print(result)
Make sure you actually save your 2 values into 2 variables (so use var1 and var2)
Saving the resulting values in different variables can be useful when you want to use them in
different places in your code.
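A minimal sketch of a function that returns two values, which are then saved into two variables. The function divide_with_remainder is hypothetical and only serves as an illustration; var1 and var2 are the names suggested above.

```python
def divide_with_remainder(x, y):
    """Return the floored quotient and the remainder of x divided by y."""
    return x // y, x % y


# The two returned values are unpacked into two separate variables.
var1, var2 = divide_with_remainder(17, 5)

print(var1)
print(var2)
```

If you assign the call to a single variable instead, that variable will hold a tuple containing both values.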
Docstring
Docstring is a string that occurs as the first statement in a function definition.
Always use triple double quotes around docstrings.
There is no blank line either before or after the docstring.
The docstring is a phrase ending in a period. It prescribes the function or method’s effect as
a command (do this, return that), not as a description; e.g. don’t write: returns the
pathname.
Variable scope
Any variables you declare in a function, as well as the parameters that are passed to a
function will only exist within the scope of that function, i.e. inside the function itself.
Example 63:
    def setx():
        """Set the value of a variable to 1."""
        x = 1
        return x

    setx()
    print(x)
Result: NameError (the error occurs both with and without the line return x)
Example 64:
    x = 0
    def setx():
        """Set the value of a variable to 1."""
        x = 1

    setx()
    print(x)
Result: 0
Example 65:
    x = 1
    def getx():
        """Print the value of a variable x."""
        print(x)

    getx()
Result: 1
The function locals() returns a dictionary of all local variables.
The function globals() returns a dictionary of all global variables.
It is best to check for membership with the in operator.
Local context stays local to the function and is not shared even with other functions called
within a function.
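A sketch of local versus global scope, using locals() and the in operator. The function name show_scope is hypothetical.

```python
x = 0  # global variable


def show_scope():
    """Create a local variable x and return the names of all local variables."""
    x = 1  # this x only exists inside the function
    return list(locals().keys())


local_names = show_scope()

print(local_names)        # names that exist inside show_scope()
print("x" in globals())   # membership check with the in operator
print(x)                  # the global x was not changed by the function
```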
A Python module is a Python file which contains function definitions and statements.
To use an external module in your code you need to explicitly import it.
Example 66: import random
print(random.randint(0, 100))
Result: a random integer from 0 to 100, for example 48
The dot indicates that Python should look for the randint() function inside the random
module that is imported.
You can import an entire module or import only a specific function in the module.
Example 67: from random import randint
print(randint(0, 100))
Result: a random integer from 0 to 100, for example 83
You can also (temporarily) change the names of the functions you import
Example 68: from random import randint as random_number
print(random_number(0, 100))
Useful modules
datetime
o Example 69: import datetime
print(datetime.datetime.now())
Result: the current date and time, for example 2019-09-20 15:17:28.590338
requests
o Example 70: import requests
a = requests.get("http://piasommerauersite.wordpress.com/")
print(a.content)
Editors
Option 1: Jupyter notebook: New > Text File
Option 2: install editor
Terminal
The windows command line is not case sensitive.
When working with a file or directory with a space, surround it in quotes.
Type dir at the prompt to get a list of files in the current directory.
Type dir /p to list the files one page at a time.
Type dir a* if you want to list files that begin with the letter “A”.
Type cd to change location in your terminal. The cd command can move into a directory.
The cd command allows you to go back a directory by typing cd ..
The cd command allows you to go back to the root directory by typing cd \
To make a directory (folder) in the current directory, type mkdir followed by the directory name.
To open a file, type start notepad followed by the file name and its extension (for example .txt or .docx).
To make a new file, type start notepad followed by a file name that doesn't exist yet, plus an extension. After you
press enter a new file will be opened and the system will ask if you want to save it.
The move command lets you move a file into another directory. First go to the directory
the file is in and type move followed by the file name and the target directory.
To rename a file use the rename command. Type rename followed by the old file name and the new file name.
To delete a file use the del command. Type del followed by the file name.
To remove a directory use the rmdir command.
To run a program (.exe) type the file name.exe and press enter.
Use exit to close the command line window.
File paths
To open a file, we need to associate the file on the disk with a variable in Python.
First we tell Python where the file is stored on your disk (the file path).
Python will start looking in the 'working' or 'current' directory. If the file is in the working directory,
you only have to tell Python the name of the file.
If it’s not in the working directory, you have to tell Python the exact path to your file.
/ means the root of the current drive
./ means the current directory
../ means the parent of the current directory
If you want to go from your current working directory (cwd) to the one directly above
(dir3), your path is ../
If you want to go to dir1, your path is ../../
If you want to go to dir5, your path is ../dir5/
If you want to go to dir2, your path is ../../dir2/
* Windows uses backslashes instead of forward slashes.
Opening a file
We can use the file path to tell Python which file to open by using the built-in function
open().
The open() function does not return the actual text that is saved in the text file, but returns a
file object from which we can read the content using the .read() method.
We pass 3 arguments to the open() function:
o The path to the file that you wish to open
o The mode, a combination of characters explaining the purpose of the file opening
and type of content stored in the file
o Encoding (a keyword argument), specifies the encoding of the text file. This is
especially useful when reading text that contains non-English characters.
Reading a file
read()
o The read() method is used to access the entire text in a file, which we can assign to a
variable.
o Example 73: infile = open("../Data/Charlie/Charlie.txt", "r")
content = infile.read()
print(content)
Result: The entire text of Charlie.txt
o The variable content holds the entire content of the file Charlie.txt as a single string
and we can access and manipulate it just like any other string.
readlines()
o The readlines() method allows you to access the content of a file as a list of lines.
This means it splits the text in a file at the newline characters (\n) for you.
o You can use a for-loop to print each line in the file.
o Example 74:
    lines = infile.readlines()
    for line in lines:
        print("Line:", line)
When we open a file, every read operation continues where the previous one stopped, so the content can only
be read once. If we want to read it again, we have to open the file again.
readline()
o The readline() method returns the next line of the file, returning the text up to and including
the next newline character. If you call the operation again, it will return the next line
in the file.
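The behaviour of the read operations can be sketched as follows. The example first writes a small file to a temporary directory so that it is self-contained; the file name lines.txt and its content are hypothetical.

```python
import os
import tempfile

# Create a small example file in a temporary directory (hypothetical content).
folder = tempfile.mkdtemp()
filepath = os.path.join(folder, "lines.txt")
with open(filepath, "w", encoding="utf-8") as outfile:
    outfile.write("one\ntwo\nthree\n")

infile = open(filepath, "r", encoding="utf-8")
first = infile.readline()       # the first line, including its newline character
second = infile.readline()      # calling it again returns the next line
rest = infile.read()            # everything that has not been read yet
nothing_left = infile.read()    # the end of the file was reached, so this is empty
infile.close()

print(repr(first), repr(second), repr(rest), repr(nothing_left))
```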
For small files that you want to load entirely, you can use one of these three methods.
For larger files and when we are only interested in a small portion of a file it is
recommended to use a for-loop.
Example 75:
    infile = open(filename, "r")
    for line in infile:
        print(line)
    infile.close()
close()
Example 76: filepath = "../Data/Charlie/Charlie.txt"
infile = open(filepath, "r")
content = infile.read()
infile.close()
Writing files
To write content to a file, we can open a new file and write the text to this file by using the
write() method. Again, we can do this by using the context manager. Remember that we
have to specify the mode using w.
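A minimal sketch of writing a file with the context manager and reading it back. The file is created in a temporary directory so that nothing on your disk is overwritten; the file name example.txt and the text are hypothetical.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory.
folder = tempfile.mkdtemp()
filepath = os.path.join(folder, "example.txt")

# Mode "w" creates the file, or empties it first if it already exists.
with open(filepath, "w", encoding="utf-8") as outfile:
    outfile.write("First line\n")
    outfile.write("Second line\n")

# The with-block has closed the file for us; open it again to read the text.
with open(filepath, "r", encoding="utf-8") as infile:
    lines = infile.readlines()

print(lines)
```

write() does not add a newline by itself, which is why each string ends with \n.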
The os module
The os module has many features that can be very useful and which are not supported by
the glob module.
List of some of the things you can do:
o Creating single or multiple directories: os.mkdir(), os.makedirs()
o Removing single or multiple directories: os.rmdir(), os.removedirs()
o Checking whether something is a file or a directory: os.path.isfile(), os.path.isdir()
o Split a path and return a tuple containing the directory and filename: os.path.split()
o Construct a pathname out of one or more partial pathnames: os.path.join()
o Split a filename and return a tuple containing the filename and the file extension:
os.path.splitext()
o Get only the basename or the directory path: os.path.basename(), os.path.dirname()
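A sketch of the os functions listed above. The path parts come from the Charlie.txt example; the nested directories are created in a temporary directory, and their names a and b are hypothetical.

```python
import os
import tempfile

# Build a path from partial pathnames, then take it apart again.
filepath = os.path.join("..", "Data", "Charlie", "Charlie.txt")
directory, filename = os.path.split(filepath)      # (directory, filename)
name, extension = os.path.splitext(filename)       # (filename, extension)
basename = os.path.basename(filepath)
dirname = os.path.dirname(filepath)

# Create nested directories in one call, inside a temporary directory.
base = tempfile.mkdtemp()
nested = os.path.join(base, "a", "b")
os.makedirs(nested)
is_dir = os.path.isdir(nested)
is_file = os.path.isfile(nested)

print(directory, filename, name, extension)
print(is_dir, is_file)
```

os.path.join() uses the separator of your operating system, so the same code works with forward slashes and with backslashes.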
word_tokenize()
First open and read the file and assign the file contents to the variable content.
Example 83:
    import nltk
    with open("../Data/Charlie/Charlie.txt") as infile:
        content = infile.read()
    tokens = nltk.word_tokenize(content)
    print(tokens)
Result: ['Charlie', 'Bucket', 'stared', 'around', 'the', 'gigantic', 'room', 'in', 'which', 'he',
'now', 'found', 'himself'…]
It returns a list of all words in the text. The punctuation marks are also in the list, but as
separate tokens.
sent_tokenize()
NLTK can also split a text into sentences by using the sent_tokenize() function.
To do pos-tagging you first need to tokenize the text.
pos_tag()
The pos_tag() function takes the tokenized text as input and returns a list of tuples in which the first
element corresponds to the token and the second to the assigned pos-tag.
[('Charlie', 'NNP'), ('Bucket', 'NNP'), ('stared', 'VBD'), ('around', 'IN'), ('the', 'DT'), ('gigantic',
'JJ'), …]
Example 84:
    import nltk
    with open("../Data/Charlie/Charlie.txt") as infile:
        content = infile.read()
    tokens = nltk.word_tokenize(content)
    tagged_tokens = nltk.pos_tag(tokens)
    verb_tags = ["VBD", "VBG", "VBN", "VBP", "VBZ"]
    verbs = []
    for token, tag in tagged_tokens:
        if tag in verb_tags:
            verbs.append(token)
    print(verbs)
Lemmatization
The lemma of a word is the form of the word which is usually used in dictionary entries. This
is useful for many NLP tasks, as it gives a better generalization.
            wn_tag = wn.VERB
        elif penn_tag in ["RB", "RBR", "RBS"]:
            wn_tag = wn.ADV
        elif penn_tag in ["JJ", "JJR", "JJS"]:
            wn_tag = wn.ADJ
        else:
            wn_tag = None
        return wn_tag

    lmtzr = nltk.stem.wordnet.WordNetLemmatizer()
    lemmas = list()
    for token, pos in tagged_tokens:
        wn_tag = penn_to_wn(pos)
        if wn_tag is not None:
            lemma = lmtzr.lemmatize(token, wn_tag)
        else:
            lemma = lmtzr.lemmatize(token)
        lemmas.append(lemma)
    print(lemmas)
Nesting
In Python, one can nest multiple loops or files in one another. For instance, we can use one
(external) for-loop to iterate through files, and then for each file iterate through all its
sentences (internal for-loop).
Nesting too much will eventually cause computational problems, but this depends on the
size of your data.
Example 87:
    import glob
    import nltk
    for filename in glob.glob("../Data/dreams/*.txt"):
        with open(filename, "r") as infile:
            content = infile.read()
        sentences = nltk.sent_tokenize(content)
        print(f"INFO: File {filename} has {len(sentences)} sentences")
        counter = 0
        for sentence in sentences:
            counter += 1
            tokens = nltk.word_tokenize(sentence)
            print("Sentence %d has %d tokens" % (counter, len(tokens)))
        print()  # print an empty line after each file
Example 88.1:
    import nltk
    def tag_tokens_file(filepath):
        """Read the contents of the file found at the location specified in
        FILEPATH and return a list of its tokens with their POS tags."""
        with open(filepath, "r") as infile:
            content = infile.read()
        tokens = nltk.word_tokenize(content)
        tagged_tokens = nltk.pos_tag(tokens)
        return tagged_tokens
Now instead of having to open a file, read the contents and close the file, we can just call the
function tag_tokens_file to do this.
Example 88.2:
    import glob
    for filename in glob.glob("../Data/dreams/*.txt"):
        tagged_tokens = tag_tokens_file(filename)
        print(filename, "\n", tagged_tokens, "\n")
Example 88.3:
    nouns_in_dreams = []
    for filename in glob.glob("../Data/dreams/*.txt"):
        tagged_tokens = tag_tokens_file(filename)
        for token, pos in tagged_tokens:
            if pos in ["NN", "NNP"]:
                nouns_in_dreams.append(token)
    print(set(nouns_in_dreams))
['AK', 'F', '1910', 'Annie', '12'],
['AK', 'F', '1910', 'Anna', '10'],
First row list of dicts
[{'frequency': '14',
'gender': 'F',
'name': 'Mary',
'state': 'AK',
'year': '1910'},
Reading CSV files
We can open and read CSV files in the same way as normal text files
Example 91:
    with open(filename, "r") as csvinfile:
        content = csvinfile.read()
    print(content)
The internal representation (repr()) = columns separated by commas and rows separated
by \n.
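A sketch of how the internal representation can be turned into the two structures shown earlier, a list of lists and a list of dicts. The CSV text is given as a string here; the header row and its column names are an assumption based on the keys of the dictionary shown above.

```python
# Assumed CSV content: a header row followed by three data rows.
content = (
    "state,gender,year,name,frequency\n"
    "AK,F,1910,Mary,14\n"
    "AK,F,1910,Annie,12\n"
    "AK,F,1910,Anna,10\n"
)
print(repr(content))  # shows the commas and the \n characters

# Rows are separated by \n, columns by commas.
rows = []
for line in content.strip().split("\n"):
    rows.append(line.split(","))

# Combine the header with each data row to get one dictionary per row.
header = rows[0]
dict_rows = []
for row in rows[1:]:
    dict_rows.append(dict(zip(header, row)))

print(rows[2])
print(dict_rows[0])
```

Note that every value is a string, including the year and the frequency; cast them with int() if you need numbers.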
Writing rows as dicts
Example 96:
    for row in address_book:
        column_values = row.values()
        line = "\t".join(column_values) + '\n'
        outfile.write(line)
Intro to JSON
JSON is completely language independent.
What JSON looks like
dict_doe_family = {
"John": {
"first name": "John",
"last name": "Doe",
"gender": "male",
"age": 30,
"favorite_animal": "panda",
"married": True,
"children": ["James", "Jennifer"],
"hobbies": ["photography", "sky diving", "reading"]},
"Jane": {
"first name": "Jane",
"last name": "Doe",
"gender": "female",
"age": 27,
"favorite_animal": "zebra",
"married": False,
"children": None,
"hobbies": ["cooking", "gaming", "tennis"]}}
json.dump(dict_doe_family, outfile)
The dumps() method is used to convert a Python dictionary to a JSON formatted string
Example 99: str_doe_family = json.dumps(dict_doe_family)
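A minimal sketch of dumps() and its counterpart loads(), using a small hypothetical dictionary instead of the full dict_doe_family.

```python
import json

# A shortened, hypothetical version of one entry of the family dictionary.
person = {"first name": "John", "age": 30, "married": True, "children": None}

json_string = json.dumps(person)     # Python dictionary -> JSON formatted string
restored = json.loads(json_string)   # JSON formatted string -> Python dictionary

print(json_string)
print(restored == person)
```

In the JSON string, Python's True and None are written as true and null; loads() converts them back.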
Intro to XML
If we only want to indicate that a specific word is a noun, using word_tokenize and pos_tag
is enough. However, if we also want to indicate that Tom Cruise is an entity we will get into
trouble because some annotations are for single words and some are for combinations of
words. In addition, sometimes we have more than one annotation per token. Data
structures such as CSV and TSV are not great at representing linguistic information. XML is a
better format.
Terminology
1. <Course>
2. <person role="coordinator">Van der Vliet</person>
3. <person role="instructor">Van Miltenburg</person>
4. <person role="instructor">Van Son</person>
5. <person role="instructor">Postma</person>
6. <person role="instructor">Sommerauer</person>
7. <person role="student">Baloche</person>
8. <person role="student">De Boer</person>
9. <person/>
10. </Course>
Each XML element contains a starting tag and an end tag.
An element can contain:
o Text (Van der Vliet), attributes (role), elements (person element in course element)
The starting tag and end tag on line 9 are combined because the element has no children
and/or text.
Root element
A root element (Course) is special because it is the sole parent element to all the other
elements.
Attributes
Elements can contain attributes, which contain info about the element. All attributes are
located in the start tag of an XML element.
o To access attributes: the method get()
o To access element information: the attributes tag and text
Accessing elements
The find() method returns the first matching child.
Example 102: first_person_el = root.find("person")
etree.dump(first_person_el, pretty_print=True)
The findall() method returns a list of all matching children (here: all person children).
getchildren() will simply return all children.
XPATH
Instead of using the find() and findall() methods, you can also use XPATH expressions.
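A sketch of find(), findall(), get(), tag and text on a shortened version of the Course example. It uses xml.etree.ElementTree from the standard library, which is an assumption: the notes may use a different library (such as lxml), and ElementTree only supports a limited subset of XPATH.

```python
import xml.etree.ElementTree as ET

# Shortened version of the Course example from the Terminology section.
xml_string = """<Course>
<person role="coordinator">Van der Vliet</person>
<person role="instructor">Van Miltenburg</person>
<person role="instructor">Van Son</person>
<person role="student">Baloche</person>
<person/>
</Course>"""

root = ET.fromstring(xml_string)

first_person_el = root.find("person")        # first matching child
all_person_els = root.findall("person")      # list of all matching children

root_tag = root.tag                          # name of the root element
first_role = first_person_el.get("role")     # attribute value
first_text = first_person_el.text            # text inside the element

# A simple XPATH-style expression: only persons whose role is "instructor".
instructors = root.findall("./person[@role='instructor']")
instructor_names = [el.text for el in instructors]

print(root_tag, first_role, first_text)
print(len(all_person_els), instructor_names)
```

For the empty element, get("role") and text both return None, because it has no attributes and no text.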