Vishal Sapre
What is concurrency?
Wikipedia:
In computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other.
Concurrency is one of the most well-researched and most difficult subjects in computer science today.
It's difficult to express a sequential process in a parallel fashion without rethinking and re-engineering the system architecture itself. Retrofitting concurrency onto sequential processes is NOT EASY.
Literally hundreds of papers have been published on various aspects of concurrency.
Most machines today have multiple compute resources available. Our software should be able to make use of these resources.
Any decently significant software project interacts with the external world. The nature of things in the external world can and will be highly asynchronous (e.g. HTTP requests, users' interactions with the software (UI / command line), interrupts from an external device, events in a simulation, etc.). Ideally, software should be able to do its own stuff and at the same time cater to interactions with the external world.
The financial worth of a company's products may depend on the use of concurrency methods. Our job security as software engineers may depend on our know-how of concurrency methods.
Multi-Processing
Distributed Computing
C extensions
Cooperative Multitasking
Alternate Python interpreter
Multi-Threading
Distribute work across multiple threads
Python has real OS threads available for use
Very easy to set up and use (will be shown shortly)
Cheap memory-wise (because data is shared)
Very good for I/O-bound applications
Best used for heavily I/O-bound operations!!!
Best avoided for heavily CPU-bound operations!!!
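A minimal sketch of the threading module for I/O-bound work. Here `time.sleep` stands in for a real I/O wait (a socket read, a disk read); CPython releases the GIL while a thread sleeps or blocks on I/O, so the waits overlap:

```python
import threading
import time

results = []
lock = threading.Lock()

def fetch(n):
    # Simulated I/O-bound task: the GIL is released during the sleep,
    # so the other threads make progress in the meantime.
    time.sleep(0.1)
    with lock:                      # protect the shared list
        results.append(n * n)

threads = [threading.Thread(target=fetch, args=(i,)) for i in range(4)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

print(sorted(results))   # [0, 1, 4, 9]
# The four 0.1 s waits overlap, so elapsed is ~0.1 s, not ~0.4 s.
```

Note that even with the GIL, access to shared data still needs a lock; the GIL only serializes bytecode execution, not your program's logical operations.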
The interpreter (python.exe) is shared among threads, and only one thread uses it at a time (Demo 1)
The Python interpreter was never designed to be thread-safe
Threads were bolted onto an existing interpreter
Python decided to provide a thin wrapper around OS threads (since OSes already have threads)
Allows C extensions to be written without worrying about thread-safety issues
Eases interpreter maintenance (remember, Python is mostly maintained by volunteers)
To share the interpreter, each thread:
Acquires a lock on the interpreter (the Global Interpreter Lock, or GIL) and then does its stuff
Releases the lock after every n interpreter operations, OR while waiting for I/O, OR while sleeping, OR when it exits
Sends a signal to the OS; the OS performs a context switch and tries to schedule other runnable threads
As a ramification, thread contention results for CPU-bound operations
The OS context-switch time is nondeterministic and mostly greater than the time the original thread waits before reacquiring the GIL
So the thread that held the lock initially mostly keeps the interpreter until its work is done, while the other threads fight to get it
This affects performance negatively and acutely
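The contention described above can be observed with a small experiment: a pure-Python CPU-bound loop run twice back to back, then in two threads. On standard CPython the threaded version is no faster, and often slower, because the two threads just take turns holding the GIL (exact timings vary by machine):

```python
import threading
import time

def count(n):
    # Pure-Python CPU-bound loop; the GIL lets only one thread execute it
    # at a time, so threads cannot run this in parallel on CPython.
    while n > 0:
        n -= 1

N = 2_000_000

# Sequential: two runs, one after the other.
t0 = time.time()
count(N)
count(N)
sequential = time.time() - t0

# Threaded: two threads, but they share the interpreter via the GIL.
t0 = time.time()
a = threading.Thread(target=count, args=(N,))
b = threading.Thread(target=count, args=(N,))
a.start(); b.start()
a.join(); b.join()
threaded = time.time() - t0

print(f"sequential: {sequential:.2f}s  threaded: {threaded:.2f}s")
```

Typically the threaded run takes about as long as the sequential one (or longer), which is exactly the "thread contention for CPU-bound operations" the slide describes.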
Multi-Processing
Distribute work across multiple processes (multiple python.exe invocations) multiprocessing module in Python 2.6+
Creates new Python processes
Follows the threading API very closely
Mostly a drop-in replacement for the threading module if multi-processing is required
Uses fork() on Unix/Linux and CreateProcess() on Windows
Uses cPickle to pickle objects to send to the new process
On Windows, only picklable entities can be exchanged between processes
Parallel Python
Communicate using subprocess.PIPE and stdin, stdout, stderr
The idea is to manage child processes, not to do multiprocessing
The API is not geared towards multiprocessing, data safety, etc.
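A minimal sketch of that pipe-based style: launch a child Python interpreter and exchange text over stdin/stdout. Notice how low-level this is compared to multiprocessing, since you serialize and parse everything yourself:

```python
import subprocess
import sys

# Start a child python.exe whose job is to double whatever it reads.
proc = subprocess.Popen(
    [sys.executable, "-c", "print(int(input()) * 2)"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,               # work with str instead of bytes
)
# Send input, wait for the child to exit, collect its stdout.
out, _ = proc.communicate("21\n")
print(out.strip())   # 42
```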
Setting up a new process is an expensive operation
Nothing is shared; all memory has to be copied for the new process
Large data exchanges (e.g. one list containing 1 million chart points, or 10 lists of 100,000 points each) will bring performance down.
Need to amortize the communication cost against the computation done in individual processes
We had better have lots of work for the CPU in each process!!!
XML-RPC
Well supported on most platforms and languages
Created to do web services (e.g. blip.tv getting a video stream from youtube.com)
JSON: lightweight and higher-performing than its XML cousin
Almost every type in a JSON stream maps directly to some type in Python / C++
Python 2.6+ has the parser and dumper in the standard library as the json module
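The direct type mapping is easy to see with the standard json module: object ↔ dict, array ↔ list, string ↔ str, number ↔ int/float, true/false ↔ bool, null ↔ None:

```python
import json

payload = {"id": 7, "tags": ["video", "stream"], "live": True, "score": 9.5}

encoded = json.dumps(payload)    # Python objects -> JSON text
decoded = json.loads(encoded)    # JSON text -> Python objects

# The round trip is lossless for these basic types.
print(decoded == payload)   # True
```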
Distributed Computing
Distribute work across multiple compute resources (local or remote)
Essentially gives a handle to a remote process
Depends on RPC mechanisms
Prior examples: CORBA, COM/DCOM, Java RMI (Remote Method Invocation)
Basic idea:
Create a (remote or local) object that acts as a server and call methods on it
Transparently manage method calling, error handling, return-value ordering, and security aspects
The user sees just a function call; a lot of magic happens under the hood
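A minimal local round trip using the standard library's XML-RPC support shows the "it looks like a plain function call" idea. (Module names shown are the Python 3 ones, xmlrpc.server / xmlrpc.client; in Python 2 they were SimpleXMLRPCServer and xmlrpclib.)

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Server object exposing one method; port 0 lets the OS pick a free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda a, b: a + b, "add")
port = server.server_address[1]

# Serve exactly one request in a background thread.
t = threading.Thread(target=server.handle_request)
t.start()

# The client sees just a function call; marshalling, HTTP transport,
# and error propagation all happen under the hood.
proxy = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}")
result = proxy.add(2, 3)
t.join()
server.server_close()

print(result)   # 5
```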
The first known distributed computing framework in Python
Found cryptic by many people
Most people say this is the easiest route to distributed computing in Python
PyScripter uses this to allow remote Python debugging
mpi4py module
C Extensions
Write code in C/C++ functions and call them from Python using the Python/C API
Once inside C/C++, we can make use of whatever concurrency method we want
Py_BEGIN_ALLOW_THREADS, Py_END_ALLOW_THREADS
Code it by hand
Use Cython to create C extensions or wrap existing C/C++ code
Use SWIG to wrap existing C code
One of the oldest methods to connect Python to existing C++
Makes it easier than using the Python/C API directly
Use Boost.Python to wrap existing C/C++ code
Use SIP to wrap existing C/C++ code
Converts Python code to C++ code using static code analysis (compiler stuff!!!)
Actively maintained only on Linux
Very new
Cooperative Multitasking
Distribute work across multiple agents, each of which cooperatively yields control to the caller under certain conditions
A scheduler is the absolutely essential part of this kind of concurrency method
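The scheduler idea can be sketched with plain Python generators: each agent does a bit of work and then yields to hand control back, and a round-robin scheduler decides who runs next (the names `worker` and `run` here are illustrative, not from any library):

```python
from collections import deque

trace = []

def worker(name, steps):
    # A cooperative agent: works a little, then yields control.
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield                      # hand control back to the scheduler

def run(tasks):
    # The scheduler: resumes each task in turn until all are finished.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)             # resume until the task's next yield
            ready.append(task)     # still alive: back of the queue
        except StopIteration:
            pass                   # task finished, drop it

run([worker("a", 2), worker("b", 2)])
print(trace)   # ['a0', 'b0', 'a1', 'b1']
```

The interleaved trace shows both agents making progress on a single thread; nothing runs in parallel, the agents simply take turns by choice.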
Coroutines
Stackless Python : Different Python interpreter (used by Cisco and EVE online) Greenlet module : C Extension that mimics Stackless, with the standard interpreter
Asynchronous I/O + cooperative multitasking often results in high-performance multitasking systems
Basically, every I/O request is passed on to the underlying OS, control is passed back to the caller, and the OS generates an event when the I/O completes
Unix/Linux: use the Python select module to employ the built-in epoll mechanism
Windows: use the Python Windows Extensions (pywin32) and employ Windows OVERLAPPED I/O
Eventlet, gevent, cogen, multitask, etc.
Most are geared towards Unix/Linux, less so towards Windows
May require many basic libraries (socket, time.sleep, threading) to be monkey-patched to support asynchronous I/O
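A minimal sketch of the OS-event idea using the portable `select.select` call (a `socketpair` stands in for real network peers): instead of blocking on each socket in turn, we ask the OS which sockets are ready and only touch those:

```python
import select
import socket

# A connected pair of sockets, both non-blocking.
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

b.sendall(b"ping")

# Block (with a 1 s timeout) until the OS reports a readable socket;
# with many sockets, only the ready ones come back in the first list.
readable, _, _ = select.select([a], [], [], 1.0)
messages = [sock.recv(1024) for sock in readable]

a.close()
b.close()

print(messages)   # [b'ping']
```

On Linux, `select.epoll` offers the same pattern with better scaling for thousands of sockets; the portable `select` call shown here is the lowest common denominator.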
IronPython
Python implemented in C#
Python source gets compiled to .NET byte code
Source-level compatibility with Python 2.6
Provides access to all .NET internals (or Mono on Unix/Linux)
Allows the user to employ whatever concurrency method .NET provides
Single-threaded performance << CPython
Jython
Python implemented in Java
Python source gets compiled to Java byte code
Source-level compatibility with Python 2.5.2
Provides access to the entire Java ecosystem
Allows users to employ all concurrency primitives that Java provides
Single-threaded performance < CPython
Once committed to one of them, it would be difficult to move across platforms!!!
Threads for heavily I/O-bound operations
Processes for heavy computations
Let's use Cython to convert the computationally intensive code to C
And remember to release the GIL once we are inside C
Create a library that employs async I/O and Python coroutines for simple concurrency within our systems
Q&A