Kernel Synchronization

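National Chung Cheng University (國立中正大學)
Graduate Institute of Computer Science and Information Engineering
Instructor: Prof. 羅習五 (Shi-Wu Lo)

A small part of the content is adapted from Prof. 薛智文 (Chih-Wen Hsueh)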
Chapter 5: Kernel Synchronization
• Kernel Control Paths
• When Synchronization is Not Necessary
• Synchronization Primitives
• Synchronizing Accesses to Kernel Data
Structures
• Examples of Race Condition Prevention

Kernel
• You could think of the kernel as a server that
answers requests; these requests can come
either from a process running on a CPU or an
external device issuing an interrupt request.

[Figure labels: top halves, bottom halves]
Kernel Control Paths
• Kernel Control Path (KCP)
– a sequence of instructions executed by the kernel
to handle an interrupt or an exception of some kind
• Each kernel request is handled by a different
KCP
– system call request (software interrupt):
system_call → ret_from_sys_call

[Figure labels: system call, top halves, bottom halves]
Kernel Requests
• A process executing in User Mode causes an
exception (e.g., a division by zero).
• A process executing in Kernel Mode causes a Page
Fault exception.
• An external device sends a signal to a programmable
interrupt controller (PIC), and the corresponding
interrupt is enabled.
• A process running in Kernel Mode on a multiprocessor
system raises an interprocessor interrupt (IPI).
Kernel Control Paths
• The CPU interleaves KCPs when:
– A process switch occurs. (it relinquishes control of
CPU, e.g., sleep/wait)
– An interrupt occurs.
– A deferrable function is executed.
• Interleaving improves the throughput of PIC
and device controllers.

A fully preemptable kernel
• Nonpreemptive kernel vs. preemptive kernel
– Nonpreemptive kernel: Linux kernel up to 2.4
– Preemptive kernel: Linux kernel 2.6
• Kernel 2.4 + preempt_count* = kernel 2.6
• The value of preempt_count is greater than 0 when:
– The kernel is executing an ISR
– The deferrable functions are disabled
– Kernel preemption has been explicitly
disabled

*: This field is in the thread_info descriptor.
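
The three conditions above can be pictured as three nesting counters packed into one integer. The sketch below is a simplified user-space model, not kernel code: the bit layout (bits 0-7 preemption depth, bits 8-15 softirq depth, bits 16-27 hardirq depth) follows the 2.6-era convention as an assumption, and every `model_*` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout of the counter (simplified model of a 2.6-era kernel):
 * bits 0-7 preemption-disable depth, bits 8-15 softirq-disable depth,
 * bits 16-27 hardirq nesting depth. */
#define MODEL_PREEMPT_OFFSET (1u << 0)
#define MODEL_SOFTIRQ_OFFSET (1u << 8)
#define MODEL_HARDIRQ_OFFSET (1u << 16)

/* One counter per thread in the real kernel; a single global here. */
static unsigned int model_preempt_count;

static void model_preempt_disable(void) { model_preempt_count += MODEL_PREEMPT_OFFSET; }
static void model_preempt_enable(void)  { model_preempt_count -= MODEL_PREEMPT_OFFSET; }
static void model_bh_disable(void)      { model_preempt_count += MODEL_SOFTIRQ_OFFSET; }
static void model_bh_enable(void)       { model_preempt_count -= MODEL_SOFTIRQ_OFFSET; }
static void model_irq_enter(void)       { model_preempt_count += MODEL_HARDIRQ_OFFSET; }
static void model_irq_exit(void)        { model_preempt_count -= MODEL_HARDIRQ_OFFSET; }

/* In this model the current path may be preempted only when all three
 * counters are zero, i.e. when the whole field is zero. */
static bool model_preemptible(void)
{
    return model_preempt_count == 0;
}
```

Because the counters nest, leaving an ISR while preemption is still explicitly disabled keeps the field non-zero, so the path stays non-preemptible until the matching enable call.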


When Sync. Is Not Necessary
Simplifying assumptions:
• Interrupt handlers and tasklets need not be
coded as reentrant functions
– Interrupt handlers, softirqs, and tasklets are all
nonpreemptable and non-blocking
• Per-CPU variables accessed only by softirqs and
tasklets do not require sync.
• A data structure accessed by only one kind of
tasklet does not require sync.
Synchronization Primitives
• Per-CPU variables
– One element per each CPU in the system
• Atomic operations
– memory bus lock, read-modify-write (rmw) ops
• Memory barriers
– avoid compiler and CPU instruction re-ordering
• Spin locks
– only on SMP systems; keep them short!
– general, read/write, big reader
• Semaphores
– general, read/write
• Local interrupt disabling
• Local softirq disabling
• Read-copy-update (RCU)
Synchronization Primitives
Technique                    Description                                    Scope
Atomic operation             Atomic read-modify-write instruction           All CPUs
                             to a counter
Memory barrier               Avoid instruction re-ordering                  Local CPU
Spin lock                    Lock with busy wait                            All CPUs
Semaphore                    Lock with blocking wait                        All CPUs
Local interrupt disabling    Forbid interrupt handling on a single CPU      Local CPU
Local softirq disabling      Forbid deferrable function handling            Local CPU
                             on a single CPU
Global interrupt disabling   Forbid interrupt and softirq handling          All CPUs
                             on all CPUs
Atomic Operations
• Many instructions are not atomic in hardware on multiprocessors (MP)
– rmw instructions: inc, test-and-set, swap
– unaligned memory accesses
– rep-prefixed (string) instructions
• The compiler may not generate atomic code
– even i++ is not necessarily atomic! (i=i+1)
• Linux – atomic_ macros
– atomic_t – atomic counter (only 24 bits were guaranteed on older kernels)
– Intel implementation (atomic, for MP)
• lock prefix byte 0xf0 – locks the memory bus
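
A user-space illustration of why the counter update must be a single atomic read-modify-write: the sketch below uses C11 `<stdatomic.h>` and POSIX threads as stand-ins for the kernel's `atomic_t` and `atomic_inc()`. The names `inc_worker`, `run_atomic_inc_demo`, `NTHREADS`, and `NITERS` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS 4
#define NITERS   100000

static atomic_int atomic_counter;   /* shared counter, updated atomically */

/* Each worker performs NITERS atomic read-modify-write increments. */
static void *inc_worker(void *arg)
{
    (void)arg;
    for (int n = 0; n < NITERS; n++)
        atomic_fetch_add(&atomic_counter, 1);   /* user-space counterpart of atomic_inc() */
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count. */
static int run_atomic_inc_demo(void)
{
    pthread_t tid[NTHREADS];
    int started = 0;

    atomic_store(&atomic_counter, 0);
    while (started < NTHREADS &&
           pthread_create(&tid[started], NULL, inc_worker, NULL) == 0)
        started++;
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    assert(started == NTHREADS);
    return atomic_load(&atomic_counter);
}
```

With a plain `int` and `counter++` in place of `atomic_fetch_add`, increments from different threads could be lost, because the load, the add, and the store are then separate steps.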

Atomic operations in Linux
Function                     Description
atomic_read(v)               Return *v
atomic_set(v,i)              Set *v to i
atomic_add(i,v)              Add i to *v
atomic_sub(i,v)              Subtract i from *v
atomic_sub_and_test(i, v)    Subtract i from *v and return 1 if the result is
                             zero; 0 otherwise
atomic_inc(v)                Add 1 to *v
atomic_dec(v)                Subtract 1 from *v
atomic_dec_and_test(v)       Subtract 1 from *v and return 1 if the result
                             is zero; 0 otherwise
atomic_inc_and_test(v)       Add 1 to *v and return 1 if the result is zero;
                             0 otherwise
atomic_add_negative(i, v)    Add i to *v and return 1 if the result is
                             negative; 0 otherwise
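
The return-value conventions in the table can be mimicked in user space with C11 atomics. The functions below are hypothetical stand-ins (note the `my_` prefix), not the kernel implementations; they only illustrate that the test is made on the result of the same atomic operation that changed the counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* atomic_fetch_sub/add return the OLD value, so the new value is derived
 * from it without reading the counter a second time. */
static bool my_atomic_sub_and_test(int i, atomic_int *v)
{
    return atomic_fetch_sub(v, i) - i == 0;
}

static bool my_atomic_dec_and_test(atomic_int *v)
{
    return atomic_fetch_sub(v, 1) == 1;     /* old value 1 means new value 0 */
}

static bool my_atomic_inc_and_test(atomic_int *v)
{
    return atomic_fetch_add(v, 1) == -1;    /* old value -1 means new value 0 */
}

static bool my_atomic_add_negative(int i, atomic_int *v)
{
    return atomic_fetch_add(v, i) + i < 0;
}
```

Reading the counter again after the update (for example `atomic_dec(v); if (atomic_read(v) == 0)`) would reopen the race, since another CPU could modify the counter between the two steps.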
Atomic bit handling functions in Linux
Function                         Description
test_bit(nr, addr)               Return the value of the nrth bit of *addr
set_bit(nr, addr)                Set the nrth bit of *addr
clear_bit(nr, addr)              Clear the nrth bit of *addr
change_bit(nr, addr)             Invert the nrth bit of *addr
test_and_set_bit(nr, addr)       Set the nrth bit of *addr and return its old
                                 value
test_and_clear_bit(nr, addr)     Clear the nrth bit of *addr and return its old
                                 value
test_and_change_bit(nr, addr)    Invert the nrth bit of *addr and return its old
                                 value
atomic_clear_mask(mask, addr)    Clear all bits of addr specified by mask
atomic_set_mask(mask, addr)      Set all bits of addr specified by mask
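
A possible user-space rendering of the test-and-modify bit operations, assuming a bitmap stored as an array of `atomic_ulong` words. The `my_` functions and the two helpers are hypothetical names; the kernel versions operate on plain `unsigned long` arrays with architecture-specific code.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Locate the word that holds bit nr, and the mask of that bit in the word. */
static atomic_ulong *word_of(atomic_ulong *addr, unsigned int nr)
{
    return addr + nr / BITS_PER_WORD;
}

static unsigned long mask_of(unsigned int nr)
{
    return 1UL << (nr % BITS_PER_WORD);
}

static bool my_test_bit(unsigned int nr, atomic_ulong *addr)
{
    return (atomic_load(word_of(addr, nr)) & mask_of(nr)) != 0;
}

/* Set the bit and report its previous value in one atomic step. */
static bool my_test_and_set_bit(unsigned int nr, atomic_ulong *addr)
{
    return (atomic_fetch_or(word_of(addr, nr), mask_of(nr)) & mask_of(nr)) != 0;
}

/* Clear the bit and report its previous value in one atomic step. */
static bool my_test_and_clear_bit(unsigned int nr, atomic_ulong *addr)
{
    return (atomic_fetch_and(word_of(addr, nr), ~mask_of(nr)) & mask_of(nr)) != 0;
}
```

A caller that sees `my_test_and_set_bit()` return false knows that it was the one that set the bit, which is what makes the operation usable as a simple ownership flag.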
Memory Barriers
• Compilers and hw re-order memory accesses
– as an optimization
– true on SMP and even UP systems!
• Memory barrier – instruction to hw/compiler to complete all
pending accesses before issuing more
– read memory barrier – acts on read requests
– write memory barrier – acts on write requests
• Linux macros
– for UP and MP: mb(), rmb(), wmb()
– for MP only: smp_mb(), smp_rmb(), smp_wmb()
Memory barriers in Linux

Macro         Description
mb( )         Memory barrier for MP and UP
rmb( )        Read memory barrier for MP and UP
wmb( )        Write memory barrier for MP and UP
smp_mb( )     Memory barrier for MP only
smp_rmb( )    Read memory barrier for MP only
smp_wmb( )    Write memory barrier for MP only
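
A typical pairing of a write barrier with a read barrier is "publish data, then set a flag". The sketch below is a user-space analogue using C11 fences, which play roughly the roles of `smp_wmb()` and `smp_rmb()` here (the C11 fences are somewhat stronger). All function and variable names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* plain data published by the producer */
static atomic_int ready;   /* flag polled by the consumer */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                           /* 1. write the data */
    atomic_thread_fence(memory_order_release);              /* 2. roughly smp_wmb() */
    atomic_store_explicit(&ready, 1, memory_order_relaxed); /* 3. set the flag */
    return NULL;
}

static void *consumer(void *arg)
{
    int *seen = arg;

    while (!atomic_load_explicit(&ready, memory_order_relaxed))
        ;                                                   /* wait for the flag */
    atomic_thread_fence(memory_order_acquire);              /* roughly smp_rmb() */
    *seen = payload;                                        /* data is now visible */
    return NULL;
}

/* Runs one producer and one consumer; returns the value the consumer saw. */
static int run_barrier_demo(void)
{
    pthread_t prod, cons;
    int seen = 0;

    atomic_store(&ready, 0);
    if (pthread_create(&cons, NULL, consumer, &seen) != 0)
        return -1;
    if (pthread_create(&prod, NULL, producer, NULL) != 0) {
        atomic_store(&ready, 1);      /* release the consumer before giving up */
        pthread_join(cons, NULL);
        return -1;
    }
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return seen;
}
```

Without the two fences, the compiler or the CPU would be free to make the flag visible before the data, and the consumer could read a stale `payload`.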
Peterson’s Solution
• Two process solution
• Assume that the LOAD and STORE instructions are atomic;
that is, cannot be interrupted.
• The two processes share two variables:
– int turn;
– Boolean flag[2]
• The variable turn indicates whose turn it is to enter the critical
section.
• The flag array is used to indicate if a process is ready to enter
the critical section. flag[i] = true implies that process Pi is
ready!

Algorithm for Process Pi
while (true) {
    flag[i] = TRUE;
    turn = j;
    while ( flag[j] && turn == j);

    /*CRITICAL SECTION*/

    flag[i] = FALSE;

    /*REMAINDER SECTION*/
}
If the two stores in the entry protocol are re-ordered, both tasks can enter
the critical section (time flows downward):

Task_i                               Task_j
turn = j;
                                     turn = i;
                                     flag[j] = TRUE;
                                     while ( flag[i] && turn == i);
                                     /* flag[i] is FALSE: Task_j enters */
                                     /*CRITICAL SECTION*/
flag[i] = TRUE;
while ( flag[j] && turn == j);
/* turn is i: Task_i enters */
/*CRITICAL SECTION*/
Peterson’s Solution
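while (true) {
    flag[i] = TRUE;
    mb( );      /* keep the flag store ahead of the turn store */
    turn = j;
    mb( );      /* make both stores visible before the loads below */
    while ( flag[j] && turn == j);
    /*CRITICAL SECTION*/
    flag[i] = FALSE;
    /*REMAINDER SECTION*/
}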
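
For experimentation outside the kernel, Peterson's algorithm can be written with C11 sequentially consistent atomics, which give the ordering that the explicit barriers provide above. This is a sketch under that assumption; `peterson_lock`, `peterson_unlock`, `peterson_task`, `run_peterson_demo`, and `ROUNDS` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define ROUNDS 20000

static atomic_bool flag[2];   /* flag[i]: task i wants to enter */
static atomic_int  turn;      /* whose turn it is to wait */
static long shared_counter;   /* protected only by Peterson's algorithm */

static void peterson_lock(int i)
{
    int j = 1 - i;

    atomic_store(&flag[i], true);   /* seq_cst store */
    atomic_store(&turn, j);         /* seq_cst store, ordered before the loads */
    while (atomic_load(&flag[j]) && atomic_load(&turn) == j)
        ;                           /* busy wait */
}

static void peterson_unlock(int i)
{
    atomic_store(&flag[i], false);
}

static void *peterson_task(void *arg)
{
    int i = *(int *)arg;

    for (int n = 0; n < ROUNDS; n++) {
        peterson_lock(i);
        shared_counter++;           /* critical section */
        peterson_unlock(i);
    }
    return NULL;
}

/* Runs the two tasks; returns the final counter, or -1 on thread failure. */
static long run_peterson_demo(void)
{
    pthread_t tid[2];
    int id[2] = {0, 1};
    int started = 0;

    shared_counter = 0;
    atomic_store(&flag[0], false);
    atomic_store(&flag[1], false);
    while (started < 2 &&
           pthread_create(&tid[started], NULL, peterson_task, &id[started]) == 0)
        started++;
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    return started == 2 ? shared_counter : -1;
}
```

If mutual exclusion held for every round, `run_peterson_demo()` returns `2 * ROUNDS`; weakening the atomics to relaxed ordering removes that guarantee.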
Spin Lock

• A special kind of lock designed to work in a
multiprocessor environment.
– Spin lock
– R/W spin lock
– Sequential lock
• Useless in a uniprocessor environment (?)
Spin lock functions
Function               Description
spin_lock_init( )      Set the spin lock to 1 (unlocked)
spin_lock( )           Cycle until spin lock becomes 1
                       (unlocked), then set it to 0 (locked)
spin_unlock( )         Set the spin lock to 1 (unlocked)
spin_unlock_wait( )    Wait until the spin lock becomes 1
                       (unlocked)
spin_is_locked( )      Return 0 if the spin lock is set to 1
                       (unlocked); 1 otherwise
spin_trylock( )        Set the spin lock to 0 (locked), and
                       return 1 if the lock is obtained; 0
                       otherwise
Spin lock functions
spin_lock(slp):
1:  lock; decb slp
    jns 3f
2:  cmpb $0,slp
    pause
    jle 2b
    jmp 1b
3:

spin_unlock(slp):
    movb $1, slp
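
The assembly above can be read as the following C11 sketch: 1 means unlocked, 0 or a negative value means locked. It is a user-space analogue with hypothetical `my_spin_*` names, and the counter is an `int` rather than a byte; the kernel code differs per architecture and kernel version.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 1 = unlocked, <= 0 = locked (same convention as the table above). */
static void my_spin_lock_init(atomic_int *slp)
{
    atomic_init(slp, 1);
}

static void my_spin_lock(atomic_int *slp)
{
    for (;;) {
        /* "lock; decb": atomic decrement; an old value of 1 means we own it */
        if (atomic_fetch_sub_explicit(slp, 1, memory_order_acquire) == 1)
            return;
        /* "2: cmpb $0": spin with plain reads until the lock looks free */
        while (atomic_load_explicit(slp, memory_order_relaxed) <= 0)
            ;   /* the x86 code executes pause here */
    }
}

static void my_spin_unlock(atomic_int *slp)
{
    atomic_store_explicit(slp, 1, memory_order_release);   /* "movb $1" */
}

static bool my_spin_is_locked(atomic_int *slp)
{
    return atomic_load_explicit(slp, memory_order_relaxed) <= 0;
}

/* Swap in 0 (locked); the lock was obtained only if the old value was 1. */
static bool my_spin_trylock(atomic_int *slp)
{
    return atomic_exchange_explicit(slp, 0, memory_order_acquire) > 0;
}
```

Spinning on a plain read between attempts, instead of repeating the locked decrement, keeps waiting CPUs from hammering the memory bus while the lock is held.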

Read/Write Spin Locks

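Layout of the 32-bit lock word: an unlocked flag (0x01000000) above the
reader counter (# of reading).

State             Value
initial           0x01000000
write (locked)    0x00000000
one reader        0x00ffffff
two readers       0x00fffffe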
Read Spin Lock
read_lock(rwlp):
    movl $rwlp,%eax
    lock; subl $1,(%eax)
    jns 1f
    call __read_lock_failed
1:

read_unlock(rwlp):
    lock; incl rwlp

__read_lock_failed:
    lock; incl (%eax)
1:  cmpl $1,(%eax)
    js 1b
    lock; decl (%eax)
    js __read_lock_failed
    ret
Write Spin Lock
write_lock(rwlp):
    movl $rwlp,%eax
    lock; subl $0x01000000,(%eax)
    jz 1f
    call __write_lock_failed
1:

write_unlock(rwlp):
    lock; addl $0x01000000,rwlp

__write_lock_failed:
    lock; addl $0x01000000,(%eax)
1:  cmpl $0x01000000,(%eax)
    jne 1b
    lock; subl $0x01000000,(%eax)
    jnz __write_lock_failed
    ret
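
The read and write paths share one counter biased by 0x01000000. A C11 sketch of the same arithmetic follows; it is a user-space analogue with hypothetical `my_*` names and `MY_RW_LOCK_BIAS`, intended only to make the lock-word values easy to check.

```c
#include <assert.h>
#include <stdatomic.h>

#define MY_RW_LOCK_BIAS 0x01000000   /* value of an unlocked lock */

static void my_read_lock(atomic_int *rw)
{
    for (;;) {
        /* subtract 1; a non-negative result means no writer holds the lock */
        if (atomic_fetch_sub(rw, 1) - 1 >= 0)
            return;
        atomic_fetch_add(rw, 1);            /* undo, as __read_lock_failed does */
        while (atomic_load(rw) < 1)
            ;                               /* wait until a read could succeed */
    }
}

static void my_read_unlock(atomic_int *rw)
{
    atomic_fetch_add(rw, 1);
}

static void my_write_lock(atomic_int *rw)
{
    for (;;) {
        /* subtract the bias; a zero result means no reader and no writer */
        if (atomic_fetch_sub(rw, MY_RW_LOCK_BIAS) == MY_RW_LOCK_BIAS)
            return;
        atomic_fetch_add(rw, MY_RW_LOCK_BIAS);   /* undo and wait */
        while (atomic_load(rw) != MY_RW_LOCK_BIAS)
            ;
    }
}

static void my_write_unlock(atomic_int *rw)
{
    atomic_fetch_add(rw, MY_RW_LOCK_BIAS);
}
```

Starting from 0x01000000, two `my_read_lock()` calls leave 0x00fffffe in the word and one `my_write_lock()` leaves 0x00000000, matching the states listed for the lock word.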

Seqlock (sequential lock)
• A seqlock is a locking mechanism in Linux for
supporting fast writes of shared variables.
• seqlock := sequence number + lock
– The lock supports synchronization between
two writers
– The counter lets readers detect whether the data
they read is consistent
Seqlock (sequential lock)
– the writer increments the sequence number, both after
acquiring the lock and before releasing the lock.
– Readers read the sequence number before and after
reading the shared data.
do {
while (((old_seq_num = seq_num)%2) != 0);
//READER: critical section
} while (old_seq_num != seq_num);
• Seqlock was first applied to system time counter
updating.
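
A compact user-space sketch of the protocol, assuming a single writer (so the writer-side lock is omitted) and using sequentially consistent C11 atomics for simplicity. The structure `my_seq_time` and the functions `my_time_write` and `my_time_read` are hypothetical; the kernel's seqlock API is different.

```c
#include <assert.h>
#include <stdatomic.h>

/* A pair of values that must always be read as a consistent snapshot. */
struct my_seq_time {
    atomic_uint seq;     /* even: stable, odd: write in progress */
    atomic_long sec;
    atomic_long nsec;
};

/* Single writer assumed; with several writers a lock would be taken here. */
static void my_time_write(struct my_seq_time *t, long sec, long nsec)
{
    atomic_fetch_add(&t->seq, 1);    /* becomes odd: readers will retry */
    atomic_store(&t->sec, sec);
    atomic_store(&t->nsec, nsec);
    atomic_fetch_add(&t->seq, 1);    /* becomes even again */
}

/* Retry until the sequence number is even and unchanged across the reads. */
static void my_time_read(struct my_seq_time *t, long *sec, long *nsec)
{
    unsigned int before, after;

    do {
        before = atomic_load(&t->seq);
        *sec = atomic_load(&t->sec);
        *nsec = atomic_load(&t->nsec);
        after = atomic_load(&t->seq);
    } while ((before & 1u) != 0 || before != after);
}
```

Readers never write to shared memory and never block the writer; the cost is that a reader may have to repeat its critical section.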

MONITOR & MWAIT
(x86, for thread synchronization)
• MONITOR defines an address range used to
monitor write-back stores.

• MWAIT is used to indicate that the software
thread is waiting for a write-back store to the
address range defined by the MONITOR
instruction.
Read-copy-update (RCU)
• It allows extremely low overhead, wait-free reads.
• RCU updates can be expensive
– they must leave the old versions of the data structure in
place to accommodate pre-existing readers.
– These old versions are reclaimed after all pre-existing
readers finish their accesses.
• RCU is a new addition in Linux 2.6; it is used in the
networking layer and in the virtual file system (VFS).
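
The read, copy, update, and reclaim steps can be tried out with a toy user-space model. This sketch is not how the kernel implements RCU: it assumes a single writer, and it tracks readers with a shared counter, which real RCU avoids in order to keep reads cheap. All names (`struct config`, `global_ptr`, `active_readers`, `reader_sum`, `writer_add`) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct config { int a; int b; };

static _Atomic(struct config *) global_ptr;   /* the shared pointer */
static atomic_int active_readers;             /* crude stand-in for grace-period tracking */

/* Reader: take a local copy of the pointer and use only that copy. */
static int reader_sum(void)
{
    atomic_fetch_add(&active_readers, 1);              /* read-side section begins */
    struct config *local = atomic_load(&global_ptr);
    int sum = local ? local->a + local->b : 0;
    atomic_fetch_sub(&active_readers, 1);              /* read-side section ends */
    return sum;
}

/* Writer (single writer assumed): copy, update the copy, switch the
 * pointer atomically, wait for readers, then reclaim the old version. */
static int writer_add(int da, int db)
{
    struct config *old = atomic_load(&global_ptr);
    struct config *fresh = malloc(sizeof *fresh);

    if (fresh == NULL)
        return -1;
    if (old != NULL) {
        *fresh = *old;                                 /* copy */
    } else {
        fresh->a = 0;
        fresh->b = 0;
    }
    fresh->a += da;                                    /* update the copy */
    fresh->b += db;
    atomic_store(&global_ptr, fresh);                  /* atomic pointer switch */
    while (atomic_load(&active_readers) != 0)
        ;                                              /* wait for pre-existing readers */
    free(old);                                         /* reclaim the old version */
    return 0;
}
```

A reader that loaded the old pointer before the switch keeps a valid object until it leaves its read-side section, because the writer frees the old version only after the reader count has dropped to zero.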

Reference: Paul E. McKenney: Read-copy-update (RCU),
http://www.rdrop.com/users/paulmck/rclock/
IPDPS 2006 Best Paper
Read-copy-update (RCU)
[Figure labels: reader, Local_PTR, PTR, data]

RCU allows extremely low overhead, wait-free reads.
Read-copy-update (RCU)
[Figure labels: reader, Local_PTR, PTR, data; writer, New_PTR, data (new):
kmalloc + copy + update]

RCU updates can be very expensive…
Read-copy-update (RCU)
[Figure labels: reader, PTR, data; writer, New_PTR, data (new);
PTR = data (new) is an atomic operation]

Remove pointers to a data structure, so that
subsequent readers cannot gain a reference to it.
Read-copy-update (RCU)
[Figure labels: reader, PTR, data; writer, new_PTR, data (new);
PTR = data (new)]

Wait for all previous readers to complete their RCU
read-side critical sections.
Read-copy-update (RCU)

[Figure labels: data; writer or GC; data (new), PTR; kfree(old_ptr)]

The “GC” can safely reclaim the data (the old
version).
Read-copy-update (RCU)

[Figure labels: data (new), PTR]
Read-copy-update (RCU)
[Figure labels: reader, writer, GC, scheduler; Lock scheduler, Unlock
scheduler, CTX_SW]

Lock_scheduler := preempt_count++
Unlock_scheduler := preempt_count--
Semaphores
• Kernel semaphores
– used by kernel control paths.
– can be acquired only by functions that are allowed
to sleep; interrupt handlers and deferrable
functions cannot use them.
• System V IPC semaphores
– used by User Mode processes

Semaphores
• struct semaphore
– count (atomic_t):
• >0 free; 0 inuse, no waiters; <0 inuse, waiters
– wait: wait queue
– sleepers: 0 (none), 1 (some), occasionally 2
• implementation requires lower-level synch!
– atomic updates, spinlock, interrupt disabling
• optimized assembly code for normal case (down())
– C code for slower “contended” case (__down())
Semaphores
up:
    movl $sem,%ecx
    lock; incl (%ecx)
    jg 1f
    pushl %eax
    pushl %edx
    pushl %ecx
    call __up
    popl %ecx
    popl %edx
    popl %eax
1:

down:
    movl $sem,%ecx
    lock; decl (%ecx)
    jns 1f
    pushl %eax
    pushl %edx
    pushl %ecx
    call __down
    popl %ecx
    popl %edx
    popl %eax
1:
__down

[Figure labels: WaitingQ.ins, WaitingQ.del]
Read/Write Semaphores
• New feature of Linux 2.4
• Read/Write Semaphores
• FIFO
• complex implementation
– similar to regular semaphores
• operations:
– down_read(), down_write()
– up_read(), up_write()
Read/Write Semaphores
• The first process is always awoken.
– If it is a writer, the other processes in the wait
queue continue to sleep.
– If it is a reader, any other reader following the first
process is also woken up and gets the lock.
However, readers that have been queued after a
writer continue to sleep.

R R R W R W R R
Completions
• The current implementation of up( ) and
down( ) allows them to execute
concurrently on the same semaphore.
• As a result, up( ) might attempt to access a data
structure that no longer exists.
• Completions avoid this problem:
– up( ) → complete( )
– down( ) → wait_for_completion( )
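
The idea can be sketched in user space with a mutex and a condition variable standing in for the kernel's wait queue and its spin lock. The `my_completion` type, its functions, and the demo helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct my_completion {
    pthread_mutex_t lock;    /* protects done, like the wait queue's lock */
    pthread_cond_t  wait;
    unsigned int    done;    /* number of completions not yet consumed */
};

static void my_init_completion(struct my_completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->wait, NULL);
    c->done = 0;
}

/* Signal under the lock: the waiter cannot return, and so cannot destroy
 * the object, until the signaller has released the lock. */
static void my_complete(struct my_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done++;
    pthread_cond_signal(&c->wait);
    pthread_mutex_unlock(&c->lock);
}

static void my_wait_for_completion(struct my_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)
        pthread_cond_wait(&c->wait, &c->lock);
    c->done--;
    pthread_mutex_unlock(&c->lock);
}

struct demo_ctx { struct my_completion done; int result; };

static void *demo_worker(void *arg)
{
    struct demo_ctx *ctx = arg;

    ctx->result = 7;             /* produce a result ... */
    my_complete(&ctx->done);     /* ... then signal; ctx is not touched afterwards */
    return NULL;
}

/* Waits for the worker through the completion; returns its result. */
static int run_completion_demo(void)
{
    struct demo_ctx ctx = { .result = 0 };
    pthread_t tid;

    my_init_completion(&ctx.done);
    if (pthread_create(&tid, NULL, demo_worker, &ctx) != 0)
        return -1;
    my_wait_for_completion(&ctx.done);
    pthread_join(tid, NULL);
    pthread_cond_destroy(&ctx.done.wait);
    pthread_mutex_destroy(&ctx.done.lock);
    return ctx.result;
}
```

The waiter and the signaller serialize on the same lock, which is the property that the lock-free fast paths of up( ) and down( ) do not provide.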

Completions
[Figure labels: columns 1 and 2; create_sem, down, del_sem, up, del_sem]
Local Interrupt Disabling
• Local interrupt disabling does not protect
against concurrent accesses to data structures
by interrupt handlers running on other CPUs.

• In multiprocessor systems, local interrupt
disabling is often coupled with spin locks.

Spin locks
• only on SMP systems; keep them short!
• general, read/write, big reader
Global Interrupt Disabling
• A typical scenario consists of a driver that
needs to reset the hardware device.
• Global interrupt disabling significantly lowers
the system concurrency level.
• An interrupt service routine should never
execute the cli( ) macro.

__global_cli()
• wait for top and bottom halves to complete
• disable local interrupts
• grab spinlock
• disable all interrupts

Disabling Deferrable Functions
• disabling interrupts disables deferred
functions
• possible to disable deferred functions but not
all interrupts
• ops (macros):
– local_bh_disable()
– local_bh_enable()

Choosing Synch Primitives
• avoid synch if possible! (clever instruction
ordering)
– example: inserting in linked list (needs barrier still)
– Example: task migration
• use atomics or rw spinlocks if possible
• use semaphores if you need to sleep
• complicated structures accessed by deferred
functions
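
The linked-list example can be made concrete: a new element is fully initialized first, and only then made reachable, with a barrier between the two steps. The sketch below is a user-space analogue that assumes a single writer and uses a C11 release store where kernel code would use a write barrier; `struct node`, `list_head`, `list_push_front`, and `list_sum` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

static _Atomic(struct node *) list_head;

/* Single writer assumed: initialize the node completely, then publish it.
 * The release store keeps the initialization ahead of the pointer update. */
static void list_push_front(struct node *n, int value)
{
    n->value = value;
    n->next = atomic_load_explicit(&list_head, memory_order_relaxed);
    atomic_store_explicit(&list_head, n, memory_order_release);
}

/* Readers need no lock: any node they can reach is fully initialized. */
static int list_sum(void)
{
    int sum = 0;

    for (struct node *p = atomic_load_explicit(&list_head, memory_order_acquire);
         p != NULL; p = p->next)
        sum += p->value;
    return sum;
}
```

If the pointer update were allowed to become visible before the stores to `value` and `next`, a concurrent reader could follow the new head into an uninitialized node.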

Example Race Conditions
• reference counters for sharing structs
– get/put functions
– deallocate when 0
• memory map semaphore
• slab cache list semaphore
• inode semaphore
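
The reference-counter pattern (get/put, deallocate when 0) might look as follows in user space. The object type and the `obj_*` functions are hypothetical; `freed_objects` exists only so the example can observe when deallocation happens.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct shared_obj {
    atomic_int refcount;   /* number of users currently holding the object */
    int payload;
};

static int freed_objects;  /* bookkeeping for the example only */

/* Create an object with one reference, owned by the caller. */
static struct shared_obj *obj_create(int payload)
{
    struct shared_obj *o = malloc(sizeof *o);

    if (o != NULL) {
        atomic_init(&o->refcount, 1);
        o->payload = payload;
    }
    return o;
}

/* get: the caller must already hold a reference. */
static struct shared_obj *obj_get(struct shared_obj *o)
{
    atomic_fetch_add(&o->refcount, 1);
    return o;
}

/* put: drop one reference; whoever drops the last one frees the object.
 * The decrement and the test are one atomic step (cf. atomic_dec_and_test). */
static void obj_put(struct shared_obj *o)
{
    if (atomic_fetch_sub(&o->refcount, 1) == 1) {
        free(o);
        freed_objects++;
    }
}
```

Exactly one caller observes the old value 1, so the object is freed exactly once even when several users call `obj_put()` concurrently.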
