Threads Concepts

Chapter 4 Thread Concepts

Outline
4.1 Introduction
4.2 Definition of Thread
4.3 Motivation for Threads
4.4 Thread States: Life Cycle of a Thread
4.5 Thread Operations
4.6 Threading Models
    4.6.1 User-Level Threads
    4.6.2 Kernel-Level Threads
    4.6.3 Combining User- and Kernel-Level Threads
4.7 Thread Implementation Considerations
    4.7.1 Thread Signal Delivery
    4.7.2 Thread Termination
4.8 POSIX and Pthreads
4.9 Linux Threads
4.10 Windows XP Threads
4.11 Java Multithreading Case Study, Part 1: Introduction to Java Threads
2004 Deitel & Associates, Inc. All rights reserved.
Objectives
After reading this chapter, you should understand:
- the motivation for creating threads.
- the similarities and differences between processes and threads.
- the various levels of support for threads.
- the life cycle of a thread.
- thread signaling and cancellation.
- the basics of POSIX, Linux, Windows XP and Java threads.
Recent Developments in Processors
1. 32-bit processors are being replaced by 64-bit processors. Software lags behind hardware by 2-4 years.
2. Multi-core processors will be dominant. It is much easier to increase the number of cores than to increase processor clock frequencies.
3. Netflix accounts for about 30% of Internet traffic at night.
Double Data Rate 3
Each core can currently run two threads.

Architecture of Intel 486 processor
Intel 486 = Intel 32-bit 386 fixed-point processor + Intel 32-bit 387 floating-point processor + 8 KByte cache memory and controller

Intel Pentium (P5)
Two ALUs and one floating-point unit

Intel Core Architecture
[Figure labels: Instruction TLB; Data TLB; FADD = floating-point ADD]

"The Problem with Threads", Edward A. Lee, UC Berkeley, 2006 [2]
Although threads seem to be a small step from sequential computation, in fact, they represent a huge step. They discard the most essential and appealing properties of sequential computation: understandability, predictability, and determinism. Threads, as a model of computation, are wildly non-deterministic, and the job of the programmer becomes one of pruning that non-determinism.
Single CPU
Massive Parallel Computer
Google Data Center
Hardware vs. Software

Hardware: inherently parallel; application specific; higher speed
Software: mostly serial; flexible; lower speed

Observations:
1. Transmitting a 2-hour MPEG movie (about 2 Gbytes) requires software servers to send more than 1 million packets, each carrying at most 1,500 bytes of data.
2. Higher-definition movies require 40 Gbytes or more.
3. In late 2013, the Federal Health Exchange website limited the number of users logged on at any one time to 50,000.
4.1 Introduction
- General-purpose languages such as Java, C#, Visual C++ .NET, Visual Basic .NET and Python have made concurrency primitives available to application programmers
- Multithreading
    - The programmer specifies that an application contains threads of execution
    - Each thread designates a portion of a program that may execute concurrently with other threads
Three-thread Word Processor
[Figure labels: Word processor; mouse, keyboard, printer; Kernel]
The word processor must:
1. Accept inputs from a keyboard or mouse
2. Display text and graphics on the video monitor
3. Send outputs to a printer
Three-thread Word Processor

Thread 1 interacts with the user. Thread 2 handles document formatting in the background. Thread 3 handles interfacing with a printer.

As soon as the user deletes a sentence from page 1, Thread 1 tells Thread 2 to reformat the whole document. Meanwhile, Thread 1 continues to listen for input from the keyboard and mouse and to respond to the user's commands, while Thread 2 computes the reformatting in the background. By the time the user wants to display another page for editing, Thread 2 may already have completed the reformatting.

If the program were single-threaded, a printing task would cause commands from the keyboard and mouse to be ignored until the printing was done. (Assume that the printing does not use an interrupt-driven programming model.)
Three-thread Word Processor
- Thread 1 interacts with the user.
- Thread 2 reformats the document when commanded by Thread 1.
- Thread 3 outputs to a printer when commanded by Thread 1.

It should be clear that three processes would not work here, because they all need access to the same document in memory. Three threads, which share a common memory, all have access to the document being edited.
Two-thread Web Server Process
[Figure labels: Web server process; Dispatcher thread; Worker thread; Web page cache; Kernel; Network connection]
Requests for pages come in and the requested pages are sent back to the client. At most web sites, some pages are more commonly accessed than others. Web servers keep these heavily used pages in a cache in main memory to eliminate the need to go to a hard disk to get them.

The Dispatcher Thread reads incoming requests for work from the network. After examining a request, it chooses an idle Worker Thread to handle it by writing a pointer to the message into a special word associated with each thread. The Dispatcher Thread then wakes up the sleeping Worker Thread, moving it from the blocked state to the ready state.
When the Worker Thread wakes up, it checks whether the request can be satisfied from the Web page cache, to which all threads have access. If not, it starts a readDisk operation to get the page from the disk and blocks until the disk operation is complete.

This model allows the server to be written as a collection of sequential threads. If the web server were written as a single-threaded program, its main loop would get a request, examine it, and carry it out before getting the next one. While waiting for the disk, the program would be idle and would not process any other incoming requests.
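Two-thread Web Server Process

Dispatcher thread:

    while (TRUE) {
        get_next_request(&buf);    /* read the next request from the network */
        handoff_work(&buf);        /* pass it to an idle worker thread */
    }

Worker thread:

    while (TRUE) {
        wait_for_work(&buf);                     /* sleep until work is handed off */
        look_for_page_in_cache(&buf, &page);
        if (page_not_in_cache(&page))
            read_page_from_disk(&page);          /* blocks until the disk read completes */
        return_page(&page);
    }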
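The dispatcher/worker hand-off described above can be sketched in C++ with a mutex and a condition variable. This is only an illustrative sketch, not the slides' implementation: the `WorkQueue` class, the `serve` driver, the in-memory `cache` map and the stand-in `read_page_from_disk` are all hypothetical names, and a fixed list of request strings replaces the network.

```cpp
#include <cassert>
#include <condition_variable>
#include <map>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical hand-off channel between the dispatcher and the worker.
class WorkQueue {
public:
    // Dispatcher side: enqueue a request and wake a sleeping worker.
    void handoff_work(const std::string& url) {
        {
            std::lock_guard<std::mutex> lock(m_);
            pending_.push(url);
        }
        ready_.notify_one();
    }
    // Dispatcher side: announce that no more requests will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Worker side: block until work arrives; an empty optional means "shut down".
    std::optional<std::string> wait_for_work() {
        std::unique_lock<std::mutex> lock(m_);
        ready_.wait(lock, [this] { return !pending_.empty() || closed_; });
        if (pending_.empty()) return std::nullopt;
        std::string url = pending_.front();
        pending_.pop();
        return url;
    }
private:
    std::mutex m_;
    std::condition_variable ready_;
    std::queue<std::string> pending_;
    bool closed_ = false;
};

// Stand-in for the blocking disk read (assumption: returns the page text).
std::string read_page_from_disk(const std::string& url) {
    return "<disk:" + url + ">";
}

// Runs one dispatcher thread and one worker thread over a fixed request list
// and returns the pages in the order in which they were served.
std::vector<std::string> serve(const std::vector<std::string>& requests,
                               std::map<std::string, std::string> cache) {
    WorkQueue queue;
    std::vector<std::string> served;

    std::thread worker([&] {
        while (auto url = queue.wait_for_work()) {
            auto hit = cache.find(*url);
            if (hit != cache.end()) {
                served.push_back(hit->second);       // satisfied from the cache
            } else {
                std::string page = read_page_from_disk(*url);
                cache[*url] = page;                  // keep it for later requests
                served.push_back(page);
            }
        }
    });
    std::thread dispatcher([&] {
        for (const auto& url : requests) queue.handoff_work(url);
        queue.close();
    });

    dispatcher.join();
    worker.join();
    return served;
}
```

With a single worker the requests are served in arrival order; with several workers the cache and the result vector would also need locking.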
4.2 Definition of Thread
- Thread
    - Lightweight process (LWP)
    - Thread of instructions or thread of control
    - Shares address space and other global information with its process
    - Registers, stack, signal masks and other thread-specific data are local to each thread
- Threads may be managed by the operating system or by a user application
- Examples: Win32 threads, C-threads, Pthreads
4.2 Definition of Thread

Figure 4.1 Thread Relationship to Processes.
TSD = thread-specific data
4.3 Motivation for Threads
- Threads have become prominent due to trends in
    - Software design: more naturally expresses inherently parallel tasks
    - Performance: scales better to multiprocessor systems (each thread can be executed by a processor)
    - Cooperation: shared address space incurs less overhead than IPC
Benefits of Threads
1. Responsiveness to user input: a multithreaded process continues to run even if part of it (a thread) is blocked or is performing a lengthy operation.
2. Resource sharing: threads share the memory and the resources of their process.
3. Economy: it takes less processor time to create and manage threads than processes.
4. Scalability: threads can run concurrently on different processing cores, while a process with a single thread can run on only one core.
4.3 Motivation for Threads
- Each thread transitions among a series of discrete thread states
- Threads and processes have many operations in common (e.g., create, exit, resume, and suspend)
- Thread creation does not require the operating system to initialize resources that are shared between a parent process and its threads
    - Reduces the overhead of thread creation and termination compared to process creation and termination
4.4 Thread States: Life Cycle of a Thread
- Thread states
    - Born state
    - Ready state (runnable state)
    - Running state
    - Dead state
    - Blocked state
    - Waiting state
    - Sleeping state
        - Sleep interval specifies for how long a thread will sleep
4.4 Thread States: Life Cycle of a Thread
Example 1: Multi-threaded sorting application

Original list:  7 12 19 3 18 4 2 6 15 8
Sort thread 0:  7 12 19 3 18  ->  3 7 12 18 19
Sort thread 1:  4 2 6 15 8    ->  2 4 6 8 15
Merge thread:   2 3 4 6 7 8 12 15 18 19
Example 1: Multi-threaded sorting application
Assume that an array a[n] with n entries is to be sorted. It is stored in a global array so that all threads can access it.
- Sort Thread 0 sorts the first half of the array, a[0] to a[n/2 - 1].
- Sort Thread 1 sorts the second half of the array, a[n/2] to a[n - 1].
- The merge thread combines the two sorted sub-arrays into one array, b[n], which is another global array.
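    void merge(int a[], int n, int b[])
    {
        // Purpose: merge two sorted sub-arrays, a[0 to n/2-1]
        // and a[n/2 to n-1], into array b.
        int i0 = 0;      // next unread element of the first half
        int i1 = n / 2;  // next unread element of the second half
        for (int i = 0; i < n; i++) {
            // Take from the first half if the second half is used up,
            // or if the first half holds the smaller element.
            if (i0 < n / 2 && (i1 >= n || a[i0] <= a[i1])) {
                b[i] = a[i0]; i0++;
            } else {
                b[i] = a[i1]; i1++;
            }
        }
    }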
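As a sketch of how the two sort threads and the merge thread might fit together, the following C++ uses `std::thread` and `std::sort`. The names `merge_halves` and `threaded_sort` are hypothetical, and local vectors stand in for the global arrays a and b; the merge includes the bounds checks needed once one half is exhausted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Merge the sorted halves a[0..n/2-1] and a[n/2..n-1] into b[0..n-1].
void merge_halves(const std::vector<int>& a, std::vector<int>& b) {
    assert(b.size() == a.size());
    const std::size_t n = a.size();
    const std::size_t mid = n / 2;
    std::size_t i0 = 0, i1 = mid;
    for (std::size_t i = 0; i < n; i++) {
        // Take from the first half while it still has elements and either the
        // second half is used up or the first half holds the smaller value.
        if (i0 < mid && (i1 >= n || a[i0] <= a[i1])) b[i] = a[i0++];
        else b[i] = a[i1++];
    }
}

// Hypothetical driver: two sort threads, then one merge thread.
std::vector<int> threaded_sort(std::vector<int> a) {
    const std::size_t mid = a.size() / 2;
    std::vector<int> b(a.size());

    // The sort threads touch disjoint halves of a, so they need no locking.
    std::thread sort0([&] { std::sort(a.begin(), a.begin() + mid); });
    std::thread sort1([&] { std::sort(a.begin() + mid, a.end()); });
    sort0.join();
    sort1.join();

    // The merge thread starts only after both sort threads have exited.
    std::thread merger([&] { merge_halves(a, b); });
    merger.join();
    return b;
}
```

Joining both sort threads before starting the merge thread is what guarantees that the merge sees two fully sorted halves.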
Example 2: Multi-threaded Sudoku Solution Validator
- A thread to check that each column contains the digits 1 to 9.
- A thread to check that each row contains the digits 1 to 9.
- A thread to check that each 3*3 block contains the digits 1 to 9.
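    bool checkDigit(int a[][9], int rowStart, int rowEnd, int columnStart, int columnEnd)
    {
        // Purpose: check that a given row, column or 3*3 block contains
        // each of the digits 1 to 9 exactly once. The region covers rows
        // rowStart to rowEnd-1 and columns columnStart to columnEnd-1.
        int count[9];
        for (int i = 0; i < 9; i++) count[i] = 0;
        for (int row = rowStart; row < rowEnd; row++)
            for (int column = columnStart; column < columnEnd; column++) {
                int digit = a[row][column];
                if (digit < 1 || digit > 9) return false;  // avoid indexing outside count[]
                count[digit - 1]++;
            }
        for (int i = 0; i < 9; i++)
            if (count[i] != 1) return false;
        return true;
    }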
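Check row, column and block

Because checkDigit compares with row < rowEnd and column < columnEnd, the end values are exclusive.

    Check               rowStart  rowEnd  columnStart  columnEnd
    Check Row 0             0        1         0           9
    Check Column 0          0        9         0           1
    Check Column 8          0        9         8           9
    Check first block       0        3         0           3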
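One way the three validator threads could be organized is sketched below in C++. It assumes end-exclusive ranges, matching the `<` comparisons in checkDigit. The names `Grid`, `checkRegion`, `validSudoku` and `exampleGrid` are hypothetical, and `exampleGrid` merely builds a known-valid grid from a shifting pattern for demonstration.

```cpp
#include <array>
#include <cassert>
#include <thread>

using Grid = std::array<std::array<int, 9>, 9>;

// True if rows rowStart..rowEnd-1 and columns colStart..colEnd-1 together
// contain each digit 1..9 exactly once.
bool checkRegion(const Grid& g, int rowStart, int rowEnd, int colStart, int colEnd) {
    assert(0 <= rowStart && rowEnd <= 9 && 0 <= colStart && colEnd <= 9);
    int count[9] = {0};
    for (int r = rowStart; r < rowEnd; r++)
        for (int c = colStart; c < colEnd; c++) {
            int d = g[r][c];
            if (d < 1 || d > 9) return false;   // not a legal digit
            count[d - 1]++;
        }
    for (int i = 0; i < 9; i++)
        if (count[i] != 1) return false;
    return true;
}

// One thread checks all rows, one all columns, one all 3*3 blocks.
// Each thread writes only its own flag, so no locking is needed.
bool validSudoku(const Grid& g) {
    bool rowsOk = true, colsOk = true, blocksOk = true;
    std::thread rows([&] {
        for (int r = 0; r < 9; r++) rowsOk = rowsOk && checkRegion(g, r, r + 1, 0, 9);
    });
    std::thread cols([&] {
        for (int c = 0; c < 9; c++) colsOk = colsOk && checkRegion(g, 0, 9, c, c + 1);
    });
    std::thread blocks([&] {
        for (int r = 0; r < 9; r += 3)
            for (int c = 0; c < 9; c += 3)
                blocksOk = blocksOk && checkRegion(g, r, r + 3, c, c + 3);
    });
    rows.join();
    cols.join();
    blocks.join();
    return rowsOk && colsOk && blocksOk;
}

// Hypothetical helper: a valid grid built from a shifted 1..9 pattern.
Grid exampleGrid() {
    Grid g{};
    for (int r = 0; r < 9; r++)
        for (int c = 0; c < 9; c++)
            g[r][c] = (3 * (r % 3) + r / 3 + c) % 9 + 1;
    return g;
}
```

The flags are read only after all three joins, so the main thread never observes a partially computed result.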
Example 3: MPEG 8*8 Block Discrete Cosine Transform
A 640*480 image is divided into 80*60 blocks of 8 rows * 8 pixels/row. Each 8*8 block undergoes one DCT, independently of the other blocks. Thus each 8*8 DCT can be performed by one thread.

\[
X(k_1, k_2) = \sum_{n_1=0}^{7} \sum_{n_2=0}^{7} x(n_1, n_2)
\cos\left[\frac{\pi}{8}\left(n_1 + 0.5\right) k_1\right]
\cos\left[\frac{\pi}{8}\left(n_2 + 0.5\right) k_2\right]
\]

Here x(n_1, n_2) denotes the pixel in row n_1 and column n_2 of the block.

Some blocks are 16 rows * 16 pixels/row.
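The per-block independence can be illustrated with a short C++ sketch. It assumes the unnormalized two-dimensional DCT-II, in which X(k1, k2) is the sum over n1 and n2 from 0 to 7 of x(n1, n2) * cos[(pi/8)(n1 + 0.5)k1] * cos[(pi/8)(n2 + 0.5)k2]. The names `Block`, `dct8x8` and `dctAllBlocks` are hypothetical, and real MPEG codecs use scaled, much faster transforms.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

using Block = std::array<std::array<double, 8>, 8>;

// Direct (slow) evaluation of the unnormalized 8*8 DCT of one block.
Block dct8x8(const Block& x) {
    const double pi = std::acos(-1.0);
    Block X{};
    for (int k1 = 0; k1 < 8; k1++)
        for (int k2 = 0; k2 < 8; k2++) {
            double sum = 0.0;
            for (int n1 = 0; n1 < 8; n1++)
                for (int n2 = 0; n2 < 8; n2++)
                    sum += x[n1][n2]
                         * std::cos(pi / 8 * (n1 + 0.5) * k1)
                         * std::cos(pi / 8 * (n2 + 0.5) * k2);
            X[k1][k2] = sum;
        }
    return X;
}

// One thread per block, as on the slide. Each thread writes only out[i],
// so the threads share no mutable data and need no locking.
std::vector<Block> dctAllBlocks(const std::vector<Block>& blocks) {
    std::vector<Block> out(blocks.size());
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < blocks.size(); i++)
        threads.emplace_back([&blocks, &out, i] { out[i] = dct8x8(blocks[i]); });
    for (auto& t : threads) t.join();
    return out;
}
```

A 640*480 image has 4,800 blocks, so a practical program would more likely hand several blocks to each of a small number of threads than create one thread per block.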
Amdahl's Law
If an N-core system runs an application whose serial portion is S, then
    Speedup <= 1/[S + (1-S)/N]
Example: an application with a 40% serial component (S = 0.4)
    2-core: speedup <= 1/[0.4 + 0.6/2] = 1/0.7 (about 1.43)
    4-core: speedup <= 1/[0.4 + 0.6/4] = 1/0.55 (about 1.82)
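The bound is easy to evaluate for other core counts. A minimal C++ sketch follows; the function name `amdahlSpeedup` is hypothetical.

```cpp
#include <cassert>

// Upper bound on speedup for serial fraction s (between 0 and 1) on n cores.
double amdahlSpeedup(double s, int n) {
    assert(s >= 0.0 && s <= 1.0 && n >= 1);
    return 1.0 / (s + (1.0 - s) / n);
}
```

For S = 0.4 the term (1-S)/N shrinks toward zero as N grows, so the bound can never exceed 1/0.4 = 2.5 however many cores are added.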
Example of Serial Code

Simulation programs are mostly serial.
Fibonacci sequence: F[n] = F[n-1] + F[n-2], with F[0] = 0 and F[1] = 1.
Using the above recursive definition, F[n] can't be calculated before F[n-1] and F[n-2] are calculated.

    void fibonacci(int F[], int n)
    {
        // F must have room for n+1 entries, and n must be at least 1.
        F[0] = 0; F[1] = 1;
        for (int i = 2; i <= n; i++) F[i] = F[i-1] + F[i-2];
    }

[Figure: dependency chain F[0], F[1] -> F[2] -> F[3]]
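Example of Parallel Code

    for (int i = 0; i < n; i++) a[i] = b[i] + c[i];

No iteration depends on any other, so the loop can be split into two halves that run at the same time:

    for (int i = 0; i < n/2; i++) a[i] = b[i] + c[i];
    for (int i = n/2; i < n; i++) a[i] = b[i] + c[i];

[Figure: the additions are independent of one another]
    a[0] = b[0] + c[0]
    a[1] = b[1] + c[1]
    a[2] = b[2] + c[2]
    a[3] = b[3] + c[3]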
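A sketch of how the two half-loops might be given to two threads, using C++ `std::thread`. The names `addRange` and `parallelAdd` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Computes a[i] = b[i] + c[i] for begin <= i < end.
void addRange(std::vector<int>& a, const std::vector<int>& b,
              const std::vector<int>& c, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; i++) a[i] = b[i] + c[i];
}

// Splits the loop into two halves, one per thread.
std::vector<int> parallelAdd(const std::vector<int>& b, const std::vector<int>& c) {
    assert(b.size() == c.size());
    const std::size_t n = b.size();
    std::vector<int> a(n);
    // The halves write disjoint elements of a, so no locking is required.
    std::thread lower([&] { addRange(a, b, c, 0, n / 2); });
    std::thread upper([&] { addRange(a, b, c, n / 2, n); });
    lower.join();
    upper.join();
    return a;
}
```

For small n the cost of creating two threads outweighs the saved work; the split pays off only for large arrays.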
Matrix Multiplication A = B * C

    // A is n x m, B is n x p and C is p x m, with n, m, p <= N.
    int A[N][N], B[N][N], C[N][N];
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < m; j++) {
            A[i][j] = 0;
            for (int k = 0; k < p; k++)
                A[i][j] += B[i][k] * C[k][j];
        }
    }
Parallel Code for Matrix Multiplication A = B * C

    int A[N][N], B[N][N], C[N][N];

    // First half of the rows of A (can be run by one thread)
    for (int i = 0; i < n/2; i++) {
        for (int j = 0; j < m; j++) {
            A[i][j] = 0;
            for (int k = 0; k < p; k++)
                A[i][j] += B[i][k] * C[k][j];
        }
    }

    // Second half of the rows of A (can be run by another thread)
    for (int i = n/2; i < n; i++) {
        for (int j = 0; j < m; j++) {
            A[i][j] = 0;
            for (int k = 0; k < p; k++)
                A[i][j] += B[i][k] * C[k][j];
        }
    }
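The row split can be turned into a runnable two-thread sketch in C++. The names `Matrix`, `multiplyRows` and `parallelMultiply` are hypothetical, and vectors replace the fixed N x N arrays so that the dimensions n, m and p come from the inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Computes rows rowBegin..rowEnd-1 of A = B * C.
void multiplyRows(Matrix& A, const Matrix& B, const Matrix& C,
                  std::size_t rowBegin, std::size_t rowEnd) {
    const std::size_t m = C[0].size();  // columns of C and of A
    const std::size_t p = C.size();     // columns of B = rows of C
    for (std::size_t i = rowBegin; i < rowEnd; i++)
        for (std::size_t j = 0; j < m; j++) {
            A[i][j] = 0;
            for (std::size_t k = 0; k < p; k++) A[i][j] += B[i][k] * C[k][j];
        }
}

// One thread computes the top half of the rows, another the bottom half.
Matrix parallelMultiply(const Matrix& B, const Matrix& C) {
    assert(!B.empty() && !C.empty() && B[0].size() == C.size());
    const std::size_t n = B.size();
    Matrix A(n, std::vector<int>(C[0].size()));
    // Both threads only read B and C and write disjoint rows of A.
    std::thread top([&] { multiplyRows(A, B, C, 0, n / 2); });
    std::thread bottom([&] { multiplyRows(A, B, C, n / 2, n); });
    top.join();
    bottom.join();
    return A;
}
```

Splitting by rows of A keeps each thread's writes disjoint, which is why the two loops need no synchronization beyond the final joins.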
4.5 Thread Operations
- Threads and processes have common operations
    - Create
    - Exit (terminate)
    - Suspend
    - Resume
    - Sleep
    - Wake
4.5 Thread Operations
- Thread operations do not correspond precisely to process operations
    - Cancel
        - Indicates that a thread should be terminated, but does not guarantee that the thread will be terminated
        - Threads can mask the cancellation signal
    - Join
        - A primary thread can wait for all other threads to exit by joining them
        - The joining thread blocks until the thread it joined exits
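Join, and the advisory nature of cancel, can be illustrated in C++. Note the substitution: `std::thread` has no cancel operation, so this sketch models the cancellation request with an atomic flag that the worker polls. This is a different mechanism from the cancellation signal described above, and the function name `runUntilCancelled` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Starts a worker, requests cancellation after `delay`, joins the worker and
// returns how many work steps it completed before honoring the request.
long runUntilCancelled(std::chrono::milliseconds delay) {
    assert(delay.count() >= 0);
    std::atomic<bool> cancelRequested{false};
    long steps = 0;

    std::thread worker([&] {
        // The worker checks the flag only between steps, so a step is never
        // abandoned halfway; at least one step always runs.
        do {
            steps++;
            std::this_thread::yield();
        } while (!cancelRequested.load());
    });

    std::this_thread::sleep_for(delay);
    cancelRequested.store(true);  // a request only; the worker decides when to stop
    worker.join();                // the joining thread blocks until the worker exits
    return steps;                 // safe to read: join() orders it after the worker's writes
}
```

Because the worker chooses the point at which it stops, shared data is never left half-updated, which is the reason many thread libraries make cancellation a request rather than a forced kill.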
4.6 Threading Models
- Three most popular threading models
    - User-level threads
    - Kernel-level threads
    - Combination of user- and kernel-level threads
4.6.1 User-level Threads
- User-level threads perform threading operations in user space
    - Threads are created by runtime libraries that cannot execute privileged instructions or access kernel primitives directly
- User-level thread implementation
    - Many-to-one thread mappings
        - Operating system maps all threads in a multithreaded process to a single execution context
    - Advantages
        - User-level libraries can schedule their threads to optimize performance
        - Synchronization is performed outside the kernel, which avoids context switches
        - More portable
    - Disadvantages
        - Kernel views a multithreaded process as a single thread of control
        - Can lead to suboptimal performance if a thread issues I/O
        - Cannot be scheduled on multiple processors at once
4.6.1 User-level Threads

Figure 4.3 User-level threads.
4.6.2 Kernel-level Threads
- Kernel-level threads attempt to address the limitations of user-level threads by mapping each thread to its own execution context
    - Kernel-level threads provide a one-to-one thread mapping
    - Advantages: increased scalability, interactivity, and throughput
    - Disadvantages: overhead due to context switching and reduced portability due to OS-specific APIs
- Kernel-level threads are not always the optimal solution for multithreaded applications
4.6.2 Kernel-level Threads

Figure 4.4 Kernel-level threads.
4.6.3 Combining User- and Kernel-level Threads
- The combination of user- and kernel-level thread implementation
    - Many-to-many thread mapping (m-to-n thread mapping)
        - Number of user and kernel threads need not be equal
        - Can reduce overhead compared to one-to-one thread mappings by implementing thread pooling
    - Worker threads
        - Persistent kernel threads that occupy the thread pool
        - Improve performance in environments where threads are frequently created and destroyed
        - Each new thread is executed by a worker thread
    - Scheduler activation
        - Technique that enables a user-level library to schedule its threads
        - Occurs when the operating system calls a user-level threading library that determines if any of its threads need rescheduling
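The thread-pooling idea described above, a fixed set of persistent worker threads that execute many short tasks, can be sketched in C++ as follows. This is an application-level illustration under stated assumptions, not an operating system's m-to-n implementation: `ThreadPool`, `submit` and `countTasksRun` are hypothetical names, and `std::thread` is assumed to map to a kernel thread, as it does on mainstream platforms.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t workers) {
        assert(workers > 0);
        for (std::size_t i = 0; i < workers; i++)
            workers_.emplace_back([this] { workerLoop(); });
    }
    // Lets the workers drain the queue, then stops and joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_) w.join();
    }
    // Queues a task; some idle worker will pick it up.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }
private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();                          // run the task outside the lock
        }
    }
    std::mutex m_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Hypothetical usage: run `count` tiny tasks on `workers` pooled threads.
int countTasksRun(std::size_t workers, int count) {
    std::atomic<int> done{0};
    {
        ThreadPool pool(workers);
        for (int i = 0; i < count; i++) pool.submit([&done] { done++; });
    }  // the destructor returns only after every queued task has run
    return done.load();
}
```

Here 100 tasks would cost only as many thread creations as there are workers, which is the saving the slide attributes to environments where threads are frequently created and destroyed.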
4.6.3 Combining User- and Kernel-level Threads

Figure 4.5 Hybrid threading model.
4.7.1 Thread Signal Delivery
- Two types of signals
    - Synchronous
        - Occur as a direct result of program execution
        - Should be delivered to the currently executing thread
    - Asynchronous
        - Occur due to an event typically unrelated to the current instruction
        - Threading library must determine each signal's recipient so that asynchronous signals are delivered properly
- Each thread is usually associated with a set of pending signals that are delivered when it executes
- A thread can mask all signals except those that it wishes to receive
4.7.1 Thread Signal Delivery

Figure 4.6 Signal masking.
4.7.2 Thread Termination
- Thread termination (cancellation)
    - Differs between thread implementations
    - Prematurely terminating a thread can cause subtle errors in processes because multiple threads share the same address space
    - Some thread implementations allow a thread to determine when it can be terminated, to prevent the process from entering an inconsistent state
4.8 POSIX and Pthreads
- Threads that use the POSIX threading API are called Pthreads
    - POSIX states that processor registers, stack and signal mask are maintained individually for each thread
    - POSIX specifies how operating systems should deliver signals to Pthreads, in addition to specifying several thread-cancellation modes
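As a small illustration of the Pthreads API, the sketch below creates two threads with `pthread_create` and waits for them with `pthread_join`, called from C++. The `SumJob` record and the functions `sumWorker` and `sumWithTwoPthreads` are hypothetical, and a POSIX system is assumed.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical argument/result record handed to each thread.
struct SumJob {
    const int* data;
    int count;
    long result;
};

// Thread start routine: Pthreads requires the signature void* f(void*).
extern "C" void* sumWorker(void* arg) {
    SumJob* job = static_cast<SumJob*>(arg);
    job->result = 0;
    for (int i = 0; i < job->count; i++) job->result += job->data[i];
    return nullptr;
}

// Sums data[0..n-1] with two Pthreads, each handling one half.
long sumWithTwoPthreads(const int* data, int n) {
    SumJob lower{data, n / 2, 0};
    SumJob upper{data + n / 2, n - n / 2, 0};
    pthread_t t0, t1;
    int rc0 = pthread_create(&t0, nullptr, sumWorker, &lower);
    int rc1 = pthread_create(&t1, nullptr, sumWorker, &upper);
    assert(rc0 == 0 && rc1 == 0);  // pthread_create returns 0 on success
    pthread_join(t0, nullptr);     // block until each thread has exited
    pthread_join(t1, nullptr);
    return lower.result + upper.result;
}
```

Each thread runs `sumWorker` on its own stack with its own registers, while both read the same `data` array through the shared address space.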
4.9 Linux Threads
- Linux allocates the same type of process descriptor to processes and threads (tasks)
- Linux uses the UNIX-based system call fork to spawn child tasks
- To enable threading, Linux provides a modified version named clone
    - clone accepts arguments that specify which resources to share with the child task
4.9 Linux Threads

Figure 4.7 Linux task state-transition diagram.
4.10 Windows XP Threads
- Threads
    - Actual unit of execution dispatched to a processor
    - Execute a piece of the process's code in the process's context, using the process's resources
    - Execution context contains
        - Runtime stack
        - State of the machine's registers
        - Several attributes
4.10 Windows XP Threads
- Windows XP threads can create fibers
    - A fiber is scheduled for execution by the thread that creates it, rather than by the scheduler
- Windows XP provides each process with a thread pool that consists of a number of worker threads, which are kernel threads that execute functions specified by user threads
4.10 Windows XP Threads

Figure 4.8 Windows XP thread state-transition diagram.
4.11 Java Multithreading Case Study, Part I: Introduction to Java Threads
- Java allows the application programmer to create threads that can port to many computing platforms
- Threads
    - Created by class Thread
    - Execute code specified in a Runnable object's run method
- Java supports operations such as naming, starting and joining threads

Figure 4.9 Java threads being created, starting, sleeping and printing. (Part 1 of 4.)
Reference
Andrew Tanenbaum, Modern Operating Systems, 2nd Edition, Prentice-Hall, 2001.
