
Concurrency in Python

Vishal Sapre

What is concurrency?
Wikipedia:
In computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other.

Concurrency is one of the most heavily researched and most difficult subjects in Computer Science today.
It is difficult to express a sequential process in a parallel fashion without rethinking and re-engineering the system architecture itself. Retrofitting concurrency onto sequential processes is NOT EASY.

Hundreds of papers have been published on various aspects of concurrency.

Why is concurrency needed?


Performance:

Most machines today have multiple compute resources available. Our software should be able to make use of these resources.

Flexibility to interact with outside world:

Any reasonably significant software project interacts with the external world, and the external world can and will be highly asynchronous (e.g. HTTP requests, user interactions with the software via the UI or command line, interrupts from an external device, events in a simulation, etc.). Ideally, software should be able to do its own work and at the same time cater to interactions with the external world.

As a means of survival in the near future:

The financial worth of a company's products may depend on the use of concurrency methods. Our job security as software engineers may depend on our know-how of these methods.

Concurrency primitives in Python


This talk focuses on options available within Python. Following are the usually accepted concurrency options in Python:

- Multi-Threading
- Multi-Processing
- Distributed Computing
- C extensions
- Cooperative Multitasking
- Alternate Python interpreters

Concurrency primitives in Python

Multi-Threading

Concurrency primitives in Python


Multi-threading

- Distribute work across multiple threads
- Python has real OS threads available for use
  - POSIX threads on Unix / Linux
  - Windows threads on Windows
- Very easy to set up and use (will be shown shortly)
- Memory-wise cheap (because data is shared)
- Very good for I/O bound applications
  - GUI
  - Networking
  - Database operations
- Best used for heavily I/O bound operations !!!
- Best avoided for heavily CPU bound operations !!!
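
As a quick illustration (a minimal sketch, not from the original slides): downloading a few URLs on worker threads. The URLs are placeholders, and the urllib.request name assumes Python 3 (urllib2 on Python 2); the blocking network calls release the GIL, which is why threads help here.

    import threading
    import urllib.request   # Python 3 name; on Python 2 use urllib2

    URLS = [
        "http://example.com/a",   # placeholder URLs
        "http://example.com/b",
        "http://example.com/c",
    ]

    def fetch(url):
        # Blocking network I/O: the GIL is released while waiting,
        # so the other threads keep running.
        with urllib.request.urlopen(url) as resp:
            print(url, len(resp.read()))

    threads = [threading.Thread(target=fetch, args=(u,)) for u in URLS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()            # wait for all downloads to finish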

Concurrency primitives in Python


Multi-threading

- The interpreter (python.exe) is shared among threads, and only one thread uses it at a time (Demo 1)
- The Python interpreter was never designed to be thread-safe; threads were bolted onto an existing interpreter
- Python decided to provide a thin wrapper around OS threads (since OSes already have threads)
  - Allows C extensions to be written without worrying about thread safety issues
  - Eases interpreter maintenance (remember Python is mostly maintained by volunteers)
- To share the interpreter, each thread:
  - Acquires a lock on the interpreter (called the Global Interpreter Lock, or GIL) and then does its work
  - Releases the lock after every n interpreter operations, OR while waiting for I/O, OR while sleeping, OR when it exits
  - Sends a signal to the OS; the OS performs a context switch and tries to schedule the other available threads
- As a ramification, thread contention results for CPU bound operations
  - The OS context switch time is nondeterministic and mostly greater than the time the original thread waits before reacquiring the GIL
  - So the thread that held the lock initially mostly keeps the interpreter until its work is done, while the other threads fight to get it
  - This affects performance negatively and acutely
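
A minimal sketch of the kind of demo referred to above (the original Demo 1 is not reproduced here): the same CPU-bound countdown run sequentially and then on two threads. On stock CPython the threaded run is typically no faster, and often slower, because both threads contend for the GIL; the loop size is arbitrary.

    import threading
    import time

    def count(n):
        # Pure-Python CPU-bound work: the thread holds the GIL almost continuously.
        while n > 0:
            n -= 1

    N = 10000000

    start = time.time()
    count(N)
    count(N)
    print("sequential :", time.time() - start)

    start = time.time()
    t1 = threading.Thread(target=count, args=(N,))
    t2 = threading.Thread(target=count, args=(N,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print("two threads:", time.time() - start)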

Concurrency primitives in Python

Multi-Processing

Concurrency primitives in Python


Multi-Processing

- Distribute work across multiple processes (multiple python.exe invocations)
- multiprocessing module in Python 2.6+
  - Creates new Python processes
  - Follows the threading API very closely; mostly a drop-in replacement for the threading module if multi-processing is required (see the sketch below)
  - Uses fork() on Unix / Linux and CreateProcess() on Windows
  - Uses cPickle to pickle objects sent to the new process; on Windows, only picklable entities can be exchanged between processes
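
A minimal sketch of how close multiprocessing stays to the threading API (the work() function is an arbitrary CPU-bound placeholder): compared with the threading version, essentially only the import and the class name change.

    import multiprocessing

    def work(n):
        # Arbitrary CPU-bound placeholder.
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == "__main__":          # required on Windows, which has no fork()
        procs = [multiprocessing.Process(target=work, args=(5000000,))
                 for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()                    # each process runs its own interpreter, so no GIL contention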

- Parallel Python
  - A full-fledged framework for parallelizing applications in Python using processes

- subprocess module in Python 2.4+
  - Communicate using subprocess.PIPE and stdin, stdout, stderr
  - The idea is to manage child processes, not to do multiprocessing; the API is not geared towards multiprocessing, data safety, etc. (a small sketch follows)
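
A small sketch of driving a child process over subprocess.PIPE; the child command here is just another Python interpreter echoing one line back, purely as a placeholder.

    import subprocess
    import sys

    # Spawn a child Python that reads one line from stdin and echoes it to stdout.
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.readline())"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )

    out, _ = child.communicate(b"hello from the parent\n")
    print(out.decode().strip())         # -> hello from the parent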

Concurrency primitives in Python


Multi-Processing

- Setting up a new process is an expensive operation
- No shared state: all memory has to be copied for the new process
- Large data exchanges (e.g. one list containing 1 million chart points, or 10 lists of 100,000 points each) will bring performance down
- In most cases it is worthwhile only for CPU bound operations
  - We need to amortize the communication cost against the computing done in the individual processes
  - We had better have lots of work for the CPU in each process !!! (see the Pool sketch below)
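
A hedged sketch of the amortization point using multiprocessing.Pool: batch enough work into each task that the computation outweighs the cost of pickling arguments and results back and forth. The chunk sizes here are arbitrary.

    import multiprocessing

    def heavy(chunk):
        # Enough CPU work per task to outweigh the pickling / IPC overhead.
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        # 10 large tasks instead of 1,000,000 tiny ones.
        chunks = [range(i, i + 100000) for i in range(0, 1000000, 100000)]
        pool = multiprocessing.Pool()
        partial_sums = pool.map(heavy, chunks)
        pool.close()
        pool.join()
        print(sum(partial_sums))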

- We may need to use a separate RPC mechanism (an XML-RPC sketch follows)

- XML-RPC
  - Well supported on most platforms and languages
  - Created to do web services (e.g. blip.tv fetching a video stream from youtube.com)

- JSON-RPC (JavaScript Object Notation)
  - Lightweight and higher performing than its XML cousin
  - Almost every type in a JSON stream maps directly to some type in Python / C++
  - Python 2.6+ has the parser and dumper in the standard library as the json module
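
A minimal XML-RPC sketch using only the standard library (Python 3 module names shown; on Python 2 the server class lives in SimpleXMLRPCServer and the client in xmlrpclib). The port number and the square() function are placeholders, and the two halves run in separate processes.

    # --- server process ---
    from xmlrpc.server import SimpleXMLRPCServer

    def square(x):
        return x * x

    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(square, "square")
    server.serve_forever()

    # --- client process ---
    import xmlrpc.client

    proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
    print(proxy.square(12))     # the call travels over HTTP as an XML-RPC request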


Concurrency primitives in Python

Distributed Computing

Concurrency primitives in Python


Distributed Computing:

- Distribute work across multiple compute resources (local or remote)
- Essentially gives a handle to a remote process
- Depends on RPC mechanisms
- Prior examples: CORBA, COM/DCOM, Java RMI (Remote Method Invocation)
- Basic idea:
  - Create a (remote or local) object that acts as a server and call methods on it
  - Transparently manage method calling, error handling, return value ordering and security aspects
  - The user sees just a function call; a lot of magic happens under the hood

- PyRO (Python Remote Objects)
  - The first known distributed computing framework in Python
  - Found cryptic by many people

- RPyC (Remote Python Call)
  - Most people say this is the easiest route to distributed computing in Python (a small sketch follows)
  - PyScripter uses this to allow remote Python debugging

- MPI (Message Passing Interface)
  - mpi4py module
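
A small sketch of the RPyC style of remote call, following its documented service API; the CalcService name, port number and exposed_add method are illustrative, the two halves run in separate processes, and the details should be treated as an approximation rather than a verified recipe.

    # --- server process ---
    import rpyc
    from rpyc.utils.server import ThreadedServer

    class CalcService(rpyc.Service):
        def exposed_add(self, a, b):    # exposed_ methods are remotely callable
            return a + b

    ThreadedServer(CalcService, port=18861).start()

    # --- client process ---
    import rpyc

    conn = rpyc.connect("localhost", 18861)
    print(conn.root.add(2, 3))          # looks like a local call; executes on the server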

Concurrency primitives in Python

C Extensions

Concurrency primitives in Python


C Extensions:

- Write code in C/C++ functions and call them from Python using the Python/C API
- Once inside C/C++ we can make use of whatever concurrency method we want
- The interpreter is released/reacquired using a pair of macros in the Python/C API:
  - Py_BEGIN_ALLOW_THREADS, Py_END_ALLOW_THREADS

- The Python/C API can be used by:
  - Coding it by hand
  - Using Cython to create C extensions or to wrap existing C/C++ code
  - Using SWIG to wrap existing C/C++ code
    - One of the oldest methods to connect Python to existing C++
    - Makes it easier than using the Python/C API directly
  - Using Boost.Python to wrap existing C/C++ code
  - Using SIP to wrap existing C/C++ code
    - Created for PyQt

- ShedSkin (Python to C++ compiler)
  - Converts Python code to C++ code using static code analysis (compiler stuff !!!)
  - Actively maintained only on Linux
  - Very new

Concurrency primitives in Python

Cooperative Multitasking

Concurrency primitives in Python

Cooperative Multitasking:

- Distribute work across multiple agents, each of which cooperatively yields control to the caller under certain conditions
- A scheduler is the absolutely essential piece of this kind of concurrency method
- Coroutines: Python generators turned inside-out (a small sketch follows)
- Stackless Python: a different Python interpreter (used by Cisco and EVE Online)
- Greenlet module: a C extension that mimics Stackless with the standard interpreter

- Asynchronous I/O + cooperative multitasking often results in high performance multitasking systems
  - Basically, every I/O request is passed on to the underlying OS, control is passed back to the caller, and an event is generated by the OS when the I/O completes
- Asynchronous I/O is different on Unix / Linux and Windows
  - Unix / Linux: use the Python select module to employ the built-in epoll mechanism
  - Windows: use the Python Windows Extensions and employ Windows OVERLAPPED I/O

- Many Python projects use greenlets + coroutines + event loops:
  - Eventlet, gevent, Cogen, multitask, etc.
  - Most are geared towards Unix/Linux, less so towards Windows
  - They may require many basic libraries (socket, time.sleep, threading) to be patched to support asynchronous I/O
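
A minimal sketch of the "generators turned inside-out" idea: a generator-based coroutine that has values pushed into it with send() instead of being pulled with next(). The grep example and the priming call are the standard illustration, not something specific to this talk.

    def grep(pattern):
        # A coroutine: it parks at the yield until a value is send() into it.
        print("looking for", pattern)
        while True:
            line = (yield)
            if pattern in line:
                print(line)

    g = grep("python")
    next(g)                                  # prime the coroutine up to its first yield
    g.send("java is rather verbose")
    g.send("python threads and the GIL")     # matches, gets printed
    g.close()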

Concurrency primitives in Python

Alternate Python Implementations

Concurrency primitives in Python


Alternate Python implementations:

IronPython

- Python implemented in C#
- Python source gets compiled to .NET byte code
- Source-level compatibility with Python 2.6
- Provides access to all .NET internals (or Mono on Unix / Linux)
- Allows the user to employ whatever concurrency methods .NET provides
- Single-threaded performance << CPython

Jython

- Python implemented in Java
- Python source gets compiled to Java byte code
- Source-level compatibility with Python 2.5.2
- Provides access to the entire Java ecosystem
- Allows users to employ all concurrency primitives that Java provides
- Single-threaded performance < CPython

Once tied to one of these implementations, it would be difficult to move across platforms !!!

Concurrency primitives in Python

Moral of the Story:


There is no single best way. We cannot follow a blanket approach. We have to work with available options on a case-by-case basis.

Concurrency primitives in Python


Let's Use (as a proposal):

1. Threads for heavily I/O bound operations; processes for heavy computations
2. multiprocessing, subprocess with JSON-RPC, or RPyC
3. If the data passed between processes is large:
   - Let's use Cython to convert the computationally intensive code to C
   - And remember to release the GIL once we are inside C
4. Mix the 2nd and 3rd approaches
5. Create a library that employs async I/O and Python coroutines for simple concurrency within our systems

Concurrency primitives in Python

Q&A
