
IT-318: Scalable and Cloud Computing

Programming at scale; Concurrency and consistency

1
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Where are we?
 Basics: Scalability, concurrency, consistency...
 Cloud basics: EC2, EBS, S3, SimpleDB, ...
 Cloud programming: MapReduce, Hadoop, ...
 Algorithms: PageRank, adsorption, ...
 Web programming, servlets, XML, Ajax...
 Beyond MapReduce: Dryad, Hive, PigLatin, ...

2
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Scale increases complexity
(Figure: systems at increasing scale, from a single-core machine to a multicore server to a cluster to a large-scale distributed system spanning a wide-area network. Each step adds challenges:)
 Single-core machine: known concurrency
 Multicore server: true concurrency
 Cluster: network, message passing, more failure modes (faulty nodes, ...)
 Large-scale distributed system: wide-area network, even more failure modes, incentives, laws, ...
3
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Fear not!
 You will hear about many tricky challenges
 Packet loss, faulty machines, network partitions, inconsistency, variable memory latencies, deadlocks...

 But there are frameworks that will help you


 This course is NOT about low-level programming

 Nevertheless, it is important to know about these challenges
 Why?

4
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Symmetric Multiprocessing (SMP)

(Figure: four cores, each with its own cache, connected by a memory bus to a single shared memory)

 For now, assume we have multiple cores that can access the same shared memory
 Any core can access any byte; speed is uniform (no byte
takes longer to read or write than any other)
 Not all machines are like that -- other models discussed later
5
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Plan for the next two lectures
 Parallel programming and its challenges NEXT

 Parallelization and scalability, Amdahl's law


 Synchronization, consistency
 Mutual exclusion, locking, issues related to locking
 Architectures: SMP, NUMA, Shared-nothing
 All about the Internet in 30 minutes
 Structure; packet switching; some important protocols
 Latency, packet loss, bottlenecks, and why they matter
 Distributed programming and its challenges
 Network partitions and the CAP theorem
 Faults, failures, and what we can do about them

6
© 2013 A. Haeberlen, Z. Ives University of Gujrat
What is scalability?
 A system is scalable if it can easily adapt to
increased (or reduced) demand
 Example: A storage system might start with a capacity of
just 10TB but can grow to many PB by adding more nodes
 Scalability is usually limited by some sort of bottleneck

 Often, scalability also means...


 the ability to operate at a very large scale
 the ability to grow efficiently
 Example: 4x as many nodes → ~4x capacity (not just 2x!)

7
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Parallelization
void bubblesort(int nums[]) {
    boolean done = false;
    while (!done) {
        done = true;
        for (int i=1; i<nums.length; i++) {
            if (nums[i-1] > nums[i]) {
                swap(nums[i-1], nums[i]);
                done = false;
            }
        }
    }
}

int[] mergesort(int nums[]) {
    int numPieces = 10;
    int pieces[][] = split(nums, numPieces);
    for (int i=0; i<numPieces; i++)
        sort(pieces[i]);          // <-- can be done in parallel!
    return merge(pieces);
}

 The left algorithm works fine on one core


 Can we make it faster on multiple cores?
 Difficult - need to find something for the other cores to do
 There are other sorting algorithms where this is much easier
 Not all algorithms are equally parallelizable
 Can you have scalability without parallelism?
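To make the parallel version concrete, here is a minimal runnable Java sketch (ours, not the slide's pseudocode): the two halves of the array have no dependency on each other, so one of them can be sorted on a second thread before a standard two-way merge.

import java.util.Arrays;

public class ParallelSortSketch {
    public static int[] sort(int[] nums) throws InterruptedException {
        int mid = nums.length / 2;
        int[] left = Arrays.copyOfRange(nums, 0, mid);
        int[] right = Arrays.copyOfRange(nums, mid, nums.length);

        // The two pieces have no dependency on each other,
        // so they can be sorted in parallel
        Thread helper = new Thread(() -> Arrays.sort(left));
        helper.start();
        Arrays.sort(right);   // sort the other piece on this thread
        helper.join();        // wait until the helper is done

        return merge(left, right);
    }

    // Standard two-way merge of two sorted arrays
    static int[] merge(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length)
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return out;
    }
}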
8
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Scalability

 Speedup: S_N = T_1 / T_N, where T_1 is the completion time with one core and T_N the completion time with N cores

(Figure: numbers sorted per second vs. cores used; the "Ideal" curve grows linearly, while the "Expected" curve levels off)
 If we increase the number of processors, will the speed also increase?
 Yes, but (in almost all cases) only up to a point

9
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Amdahl's law
(Figure: the sequential parts of the algorithm run one after another on a single core; the parallel part runs on cores #1 through #6 at the same time, so adding cores shortens only the parallel part)

 Usually, not all parts of the algorithm can be parallelized
 Let f be the fraction of the algorithm that can be parallelized, and let S_part be the corresponding speedup
 Then: S_overall = 1 / ((1 - f) + f / S_part)
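To get a feel for the formula, here is a tiny sketch (our illustration) that simply evaluates it; even an algorithm that is 95% parallelizable tops out around 17x on 100 cores.

public class Amdahl {
    // Overall speedup when a fraction f of the work
    // is sped up by a factor sPart (Amdahl's law)
    static double speedup(double f, double sPart) {
        return 1.0 / ((1.0 - f) + f / sPart);
    }

    public static void main(String[] args) {
        // Even if 95% of the algorithm parallelizes perfectly
        // across 100 cores, the overall speedup is only ~16.8x
        System.out.println(speedup(0.95, 100.0));
    }
}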
10
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Is more parallelism always better?
(Figure: numbers sorted per second vs. cores used; the "Ideal" curve keeps growing linearly, the "Expected" curve levels off, and the "Reality (often)" curve peaks at a sweet spot and then declines)

 Increasing parallelism beyond a certain point can cause performance to decrease! Why?
 Time for serial parts can depend on #cores
 Example: Need to send a message to each core to tell it what
to do
11
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Granularity
 How big a task should we assign to each core?
 Coarse-grain vs. fine-grain parallelism

 Frequent coordination creates overhead


 Need to send messages back and forth, wait for other cores...
 Result: Cores spend most of their time communicating

 Coarse-grain parallelism is usually more efficient
 Bad: Ask each core to sort three numbers
 Good: Ask each core to sort a million numbers
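The difference is easy to see in code. Below is a small sketch (ours; the pool setup and helper names are illustrative): each submitted task is one coordination point, so sorting in chunks of 3 drowns the pool in scheduling overhead, while chunks of a million amortize it.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Granularity {
    // Sort each chunk of 'data' independently, one task per chunk.
    // chunkSize controls the granularity: tiny chunks mean many tasks
    // and mostly coordination; large chunks mean real work per task.
    static void sortChunks(int[] data, int chunkSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int start = 0; start < data.length; start += chunkSize) {
            final int from = start;
            final int to = Math.min(start + chunkSize, data.length);
            tasks.add(() -> { Arrays.sort(data, from, to); return null; });
        }
        pool.invokeAll(tasks);   // wait for all chunks; one handoff per task
        pool.shutdown();
    }
}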
12
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Dependencies

(Figure: two task graphs from START to DONE. Left, "embarrassingly parallel": the individual tasks are all independent. Right, "with dependencies": some tasks cannot start until others have finished.)

 What if tasks depend on other tasks?


 Example: Need to sort lists before merging them
 Limits the degree of parallelism
 Minimum completion time (and thus maximum speedup) is
determined by the longest path from start to finish
 Assumes resources are plentiful; actual speedup may be lower
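The sort-before-merge dependency can be written down directly with futures; in this sketch (ours), the two sorts run in parallel, but the merge is on the longest path and must wait for both.

import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

public class Dependencies {
    public static void main(String[] args) {
        int[] a = {5, 2, 9}, b = {7, 1, 3};

        // The two sorts are independent and can run in parallel...
        CompletableFuture<int[]> sortA =
            CompletableFuture.supplyAsync(() -> { Arrays.sort(a); return a; });
        CompletableFuture<int[]> sortB =
            CompletableFuture.supplyAsync(() -> { Arrays.sort(b); return b; });

        // ...but the merge depends on BOTH sorts having finished,
        // so it sits on the longest path from start to finish
        CompletableFuture<int[]> merged =
            sortA.thenCombine(sortB, Dependencies::merge);

        System.out.println(Arrays.toString(merged.join()));
    }

    // Standard two-way merge of two sorted arrays
    static int[] merge(int[] x, int[] y) {
        int[] out = new int[x.length + y.length];
        int i = 0, j = 0, k = 0;
        while (i < x.length && j < y.length)
            out[k++] = (x[i] <= y[j]) ? x[i++] : y[j++];
        while (i < x.length) out[k++] = x[i++];
        while (j < y.length) out[k++] = y[j++];
        return out;
    }
}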
13
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Heterogeneity
 What if...
 some tasks are larger than others?
 some tasks are harder than others?
 some tasks are more urgent than others?
 not all cores are equally fast, or they have different resources?

 Result: Scheduling problem


 Can be very difficult
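One classic heuristic for this kind of scheduling problem (our illustration, not from the slide) is "longest processing time first": always hand the largest remaining task to the least-loaded core.

import java.util.Arrays;
import java.util.PriorityQueue;

public class LptScheduler {
    // Greedy LPT: process tasks in decreasing size order, always
    // assigning the next task to the currently least-loaded core.
    // Returns the total load per core; the max entry is the makespan.
    static long[] schedule(long[] taskSizes, int numCores) {
        long[] sizes = taskSizes.clone();
        Arrays.sort(sizes);                           // ascending
        long[] load = new long[numCores];
        PriorityQueue<Integer> cores = new PriorityQueue<>(
                (x, y) -> Long.compare(load[x], load[y]));
        for (int c = 0; c < numCores; c++) cores.add(c);
        for (int t = sizes.length - 1; t >= 0; t--) { // largest task first
            int core = cores.poll();                  // least-loaded core
            load[core] += sizes[t];
            cores.add(core);                          // reinsert with new load
        }
        return load;
    }
}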

14
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Recap: Parallelization
 Parallelization is hard
 Not all algorithms are equally parallelizable -- need to pick
very carefully

 Scalability is limited by many things


 Amdahl's law
 Dependencies between tasks
 Communication overhead
 Heterogeneity
 ...

15
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Plan for the next two lectures
 Parallel programming and its challenges
 Parallelization and scalability, Amdahl's law
 Synchronization, consistency NEXT

 Mutual exclusion, locking, issues related to locking


 Architectures: SMP, NUMA, Shared-nothing
 All about the Internet in 30 minutes
 Structure; packet switching; some important protocols
 Latency, packet loss, bottlenecks, and why they matter
 Distributed programming and its challenges
 Network partitions and the CAP theorem
 Faults, failures, and what we can do about them

16
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Why do we need synchronization?

void transferMoney(customer A, customer B, int amount)
{
    showMessage("Transferring "+amount+" to "+B);
    int balanceA = getBalance(A);
    int balanceB = getBalance(B);
    setBalance(B, balanceB + amount);
    setBalance(A, balanceA - amount);
    showMessage("Your new balance: "+(balanceA-amount));
}

 Simple example: Accounting system in a bank


 Maintains the current balance of each customer's account
 Customers can transfer money to other customers

17
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Why do we need synchronization?
Alice transfers $100 to Bob; at the same time, Bob transfers $500 to Alice.
Starting balances: Alice $200, Bob $800.

Alice's thread:                    Bob's thread:
1) B=Balance(Bob)                  1) A=Balance(Alice)
2) A=Balance(Alice)                2) B=Balance(Bob)
3) SetBalance(Bob,B+100)           3) SetBalance(Alice,A+500)
4) SetBalance(Alice,A-100)         4) SetBalance(Bob,B-500)

 What can happen if this code runs concurrently?
 One possible interleaving: both threads read the balances first, then both write back:

                  start   after A3   after B3   after B4   after A4
Alice's balance:  $200    $200       $700       $700       $100
Bob's balance:    $800    $900       $900       $300       $300

Each write clobbers the other thread's update: the two accounts end with $400 of the original $1,000.
18
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Problem: Race condition
void transferMoney(customer A, customer B, int amount)
{
    showMessage("Transferring "+amount+" to "+B);
    int balanceA = getBalance(A);       // executed concurrently by
    int balanceB = getBalance(B);       // Alice's and Bob's threads
    setBalance(B, balanceB + amount);   // of execution
    setBalance(A, balanceA - amount);
    showMessage("Your new balance: "+(balanceA-amount));
}

 What happened?
 Race condition: Result of the computation depends on the
exact timing of the two threads of execution, i.e., the order
in which the instructions are executed
 Reason: Concurrent updates to the same state
 Can you get a race condition when all the threads are reading the data,
and none of them are updating it?
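The lost-update effect is easy to reproduce. In this stand-alone demo (ours, not from the slide), two threads perform the same unsynchronized read-modify-write on a shared balance; on most runs the final value is well below the expected 201000.

public class RaceDemo {
    static int balance = 1000;

    public static void main(String[] args) throws InterruptedException {
        Runnable deposit = () -> {
            for (int i = 0; i < 100_000; i++) {
                int b = balance;    // read...
                balance = b + 1;    // ...write: not atomic together!
            }
        };
        Thread t1 = new Thread(deposit), t2 = new Thread(deposit);
        t1.start(); t2.start();
        t1.join();  t2.join();
        // Expected 201000, but usually less: updates were lost
        System.out.println(balance);
    }
}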

19
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Goal: Consistency
 What should have happened?
 Intuition: It shouldn't make a difference whether the
requests are executed concurrently or not

 How can we formalize this?


 Need a consistency model that specifies how the system
should behave in the presence of concurrency

20
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Sequential consistency
(Figure: actual execution: Core #1 runs T2, T4, T5 while Core #2 runs T1, T3, T6. Hypothetical execution: a single core runs the same operations T1-T6 in some sequential order. From the same start state, both executions produce the same result.)

 Sequential consistency:
 The result of any execution is the same as if the operations
of all the cores had been executed in some sequential order,
and the operations of each individual processor appear in
this sequence in the order specified by the program

21
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Other consistency models
 Strong consistency
 After update completes, all subsequent accesses will return
the updated value
 Weak consistency
 After update completes, accesses do not necessarily return
the updated value; some condition must be satisfied first
 Example: Update needs to reach all the replicas of the object

 Eventual consistency
 Specific form of weak consistency: If no more updates are
made to an object, then eventually all reads will return the
latest value
 Variants: Causal consistency, read-your-writes, ...
 More on this later in the course!

 How do we build systems that achieve this?


22
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Plan for the next two lectures
 Parallel programming and its challenges
 Parallelization and scalability, Amdahl's law
 Synchronization, consistency
 Mutual exclusion, locking, issues related to locking NEXT

 Architectures: SMP, NUMA, Shared-nothing


 All about the Internet in 30 minutes
 Structure; packet switching; some important protocols
 Latency, packet loss, bottlenecks, and why they matter
 Distributed programming and its challenges
 Network partitions and the CAP theorem
 Faults, failures, and what we can do about them

23
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Mutual exclusion
void transferMoney(customer A, customer B, int amount)
{
    showMessage("Transferring "+amount+" to "+B);
    // critical section:
    int balanceA = getBalance(A);
    int balanceB = getBalance(B);
    setBalance(B, balanceB + amount);
    setBalance(A, balanceA - amount);
    // end of critical section
    showMessage("Your new balance: "+(balanceA-amount));
}

 How can we achieve better consistency?


 Key insight: Code has a critical section where accesses from
other cores to the same resources will cause problems
 Approach: Mutual exclusion
 Enforce restriction that only one core (or machine) can
execute the critical section at any given time
 What does this mean for scalability?
24
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Locking
void transferMoney(customer A, customer B, int amount)
{
    showMessage("Transferring "+amount+" to "+B);
    // critical section:
    int balanceA = getBalance(A);
    int balanceB = getBalance(B);
    setBalance(B, balanceB + amount);
    setBalance(A, balanceA - amount);
    // end of critical section
    showMessage("Your new balance: "+(balanceA-amount));
}

 Idea: Implement locks


 If LOCK(X) is called and X is not locked, lock X and continue
 If LOCK(X) is called and X is locked, wait until X is unlocked
 If UNLOCK(X) is called and X is locked, unlock X
 How many locks, and where do we put them?
 Option #1: One lock around the critical section
 Option #2: One lock per variable (A's and B's balance)
 Pros and cons? Other options?
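As one concrete option, here is a hedged Java sketch of option #2 (class and names are ours): each account carries its own lock, and the critical section is bracketed by lock/unlock calls. Note that taking the locks in argument order is exactly the recipe for the deadlock on the next slide.

import java.util.concurrent.locks.ReentrantLock;

class Account {
    final ReentrantLock lock = new ReentrantLock();
    int balance;
    Account(int initial) { balance = initial; }
}

public class Bank {
    // Option #2: one lock per account. Locking in argument order is
    // deadlock-prone when two transfers run in opposite directions.
    static void transferMoney(Account from, Account to, int amount) {
        from.lock.lock();   // LOCK(from)
        to.lock.lock();     // LOCK(to) -- careful: see next slide!
        try {
            int balanceFrom = from.balance;       // critical section:
            int balanceTo = to.balance;           // no other thread can
            to.balance = balanceTo + amount;      // touch these two
            from.balance = balanceFrom - amount;  // accounts right now
        } finally {
            to.lock.unlock();     // UNLOCK in reverse order
            from.lock.unlock();
        }
    }
}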
25
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Locking helps!
As before, Alice transfers $100 to Bob while Bob transfers $500 to Alice.
Starting balances: Alice $200, Bob $800.

Alice's thread:                    Bob's thread:
1) LOCK(Bob)                       1) LOCK(Alice)
2) LOCK(Alice)                     2) LOCK(Bob)
3) B=Balance(Bob)                  3) A=Balance(Alice)
4) A=Balance(Alice)                4) B=Balance(Bob)
5) SetBalance(Bob,B+100)           5) SetBalance(Alice,A+500)
6) SetBalance(Alice,A-100)         6) SetBalance(Bob,B-500)
7) UNLOCK(Alice)                   7) UNLOCK(Bob)
8) UNLOCK(Bob)                     8) UNLOCK(Alice)

One possible run: Bob's thread blocks on LOCK(Alice) while Alice's thread runs steps 1-8; once the lock is released, Bob's thread runs its steps 1-8:

                  start   after A5   after A6   after B5   after B6
Alice's balance:  $200    $200       $100       $600       $600
Bob's balance:    $800    $900       $900       $900       $400

Both transfers take effect, and no money is lost.
26
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Problem: Deadlock
Same setup as before: Alice transfers $100 to Bob while Bob transfers $500 to Alice.

Alice's thread:                    Bob's thread:
1) LOCK(Bob)                       1) LOCK(Alice)
2) LOCK(Alice)                     2) LOCK(Bob)
3) B=Balance(Bob)                  3) A=Balance(Alice)
4) A=Balance(Alice)                4) B=Balance(Bob)
5) SetBalance(Bob,B+100)           5) SetBalance(Alice,A+500)
6) SetBalance(Alice,A-100)         6) SetBalance(Bob,B-500)
7) UNLOCK(Alice)                   7) UNLOCK(Bob)
8) UNLOCK(Bob)                     8) UNLOCK(Alice)

But in another possible run, Alice's thread executes step 1 and then blocks at step 2, waiting for the lock on Alice; meanwhile, Bob's thread executes its step 1 and blocks at step 2, waiting for the lock on Bob.

 Neither processor can make progress!


27
© 2013 A. Haeberlen, Z. Ives University of Gujrat
The dining philosophers problem
(Figure: five philosophers sit around a table, with one fork between each pair of neighbors)
Philosopher:

repeat
    think
    pick up left fork
    pick up right fork
    eat
    put down forks
forever

28
© 2013 A. Haeberlen, Z. Ives University of Gujrat
What to do about deadlocks
 Many possible solutions, including:
 Lock manager: Hire a waiter and require that philosophers
must ask the waiter before picking up any forks
 Consequences for scalability?

 Resource hierarchy: Number forks 1-5 and require that each philosopher pick up the fork with the lower number first
 Problem?

 Chandy/Misra solution:
 Forks can either be dirty or clean; initially all forks are dirty
 After the philosopher has eaten, all his forks are dirty
 When a philosopher needs a fork he can't get, he asks his neighbor
 If a philosopher is asked for a dirty fork, he cleans it and gives it up
 If a philosopher is asked for a clean fork, he keeps it
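The resource-hierarchy solution carries over directly to the earlier bank example: if every thread locks accounts in one global order, say by account number, a circular wait can never form. A sketch under that assumption (the id field is ours):

class NumberedAccount {
    final long id;          // unique account number, fixed at creation
    final java.util.concurrent.locks.ReentrantLock lock =
            new java.util.concurrent.locks.ReentrantLock();
    int balance;
    NumberedAccount(long id, int initial) { this.id = id; this.balance = initial; }
}

public class DeadlockFreeBank {
    // Lock the lower-numbered account first, no matter which direction
    // the money flows; then no circular wait (and thus no deadlock) can occur
    static void transferMoney(NumberedAccount from, NumberedAccount to, int amount) {
        NumberedAccount first  = (from.id < to.id) ? from : to;
        NumberedAccount second = (first == from) ? to : from;
        first.lock.lock();
        second.lock.lock();
        try {
            to.balance += amount;
            from.balance -= amount;
        } finally {
            second.lock.unlock();
            first.lock.unlock();
        }
    }
}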
29
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Plan for the next two lectures
 Parallel programming and its challenges
 Parallelization and scalability, Amdahl's law
 Synchronization, consistency
 Mutual exclusion, locking, issues related to locking
 Architectures: SMP, NUMA, Shared-nothing NEXT

 All about the Internet in 30 minutes


 Structure; packet switching; some important protocols
 Latency, packet loss, bottlenecks, and why they matter
 Distributed programming and its challenges
 Network partitions and the CAP theorem
 Faults, failures, and what we can do about them

30
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Other architectures
 Earlier assumptions:
 All cores can access the same memory
 Access latencies are uniform

 Why is this problematic?


 Processor speeds are in GHz; the speed of light is 299,792,458 m/s
 At 1 GHz, one clock cycle lasts 1 ns, so a processor can do 1 addition in the time a signal travels just 30 cm
 Putting memory very far away is not a good idea

 Let's talk about other ways to organize cores and memory!
 SMP, NUMA, Shared-nothing
31
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Symmetric Multiprocessing (SMP)

(Figure: four processors, each with its own cache, connected by a memory bus to a single shared memory)

 All processors share the same memory


 Any CPU can access any byte; latency is always the same
 Pros: Simplicity, easy load balancing
 Cons: Limited scalability (~10 processors), expensive

32
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Non-Uniform Memory Architecture (NUMA)

(Figure: four processors, each with its own cache and its own local memory, connected by a cache-coherent interconnect)

 Memory is local to a specific processor


 Each CPU can still access any byte, but accesses to 'local'
memory are considerably faster (2-3x)
 Pros: Better scalability
 Cons: Complicates programming a bit, scalability still limited
33
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Example: Intel Nehalem

(Figure: Nehalem-EP memory-latency diagram. Source: "The Architecture of the Nehalem Microprocessor and Nehalem-EP SMP Platforms", M. Thomadakis, Texas A&M)
 Access to remote memory is slower
 In this case, 105ns vs 65ns

34
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Shared-Nothing

(Figure: independent machines, each with its own processor, cache, and local memory, connected only by a network)

 Independent machines connected by network


 Each CPU can only access its local memory; if it needs data
from a remote machine, it must send a message there
 Pros: Much better scalability
 Cons: Nontrivial programming model
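To make "send a message there" concrete, here is a minimal sketch; the host, port, and one-line GET protocol are invented for illustration. Instead of a load instruction, the program opens a connection and asks the machine that owns the data.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class RemoteRead {
    // In a shared-nothing system there is no load instruction for
    // remote memory: we open a connection, send a request message,
    // and wait for the owning machine's reply.
    static int readRemoteValue(String host, int port, String key)
            throws IOException {
        try (Socket s = new Socket(host, port);
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream()))) {
            out.println("GET " + key);               // request message
            return Integer.parseInt(in.readLine());  // reply message
        }
    }
}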

35
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Plan for the next two lectures
 Parallel programming and its challenges
 Parallelization and scalability, Amdahl's law
 Synchronization, consistency
 Mutual exclusion, locking, issues related to locking
 Architectures: SMP, NUMA, Shared-nothing
 All about the Internet in 30 minutes NEXT

 Structure; packet switching; some important protocols


 Latency, packet loss, bottlenecks, and why they matter
 Distributed programming and its challenges
 Network partitions and the CAP theorem
 Faults, failures, and what we can do about them

36
© 2013 A. Haeberlen, Z. Ives University of Gujrat
Stay tuned

 Next time you will learn about: Internet basics; faults and failures

(Photo credit: http://www.flickr.com/photos/brandoncripps/623631374/sizes/l/in/photostream/)

37
© 2013 A. Haeberlen, Z. Ives University of Gujrat
