
International Conference on Computer, Communication and Electrical Technology ICCCET 2011, 18th & 19th March, 2011

Realization of a parallel operating SIMD-MIMD architecture for image processing application


Y. Krishnakumar1, T. D. Prasad2, K. V. S. Kumar2, P. Raju2, B. Kiranmai2
Asst. Professors, Dept. of ECE, AITAM College of Engg.1, GITAM University2
Abstract--Typical real-time computer vision tasks require a huge amount of processing power and time. Parallel processing is found to be the only solution for obtaining the processing speed required by high-speed image processing applications. Generally, SIMD architectures are suitable for low-level processing, while MIMD architectures are suitable for high-level processing. This paper works on the realization of a parallel operating SIMD-MIMD architecture for image processing applications. The work is divided into two parts: first, the creation of a programming model supporting parallel processing, and second, the creation of image processing algorithms for the simulation framework. The programming model is created using a kernel program realized in the C language. The image processing algorithms are realized on the Matlab platform and linked with the developed support for parallel processing.

I. INTRODUCTION
The evident goal of the use of parallel computers is to speed up computations by using multiple CPUs, or to perform larger computations which are not possible on a single-processor machine. In general, we can divide parallel computing architectures into two main classes with respect to memory layout: the first is the class of distributed memory machines; the second is that of shared memory machines.

A. Image Processing
In this section the application to image processing is considered. We first consider a grey-scale image represented by a two-dimensional integer-valued array im(H,W), where H and W are the height and the width of the image, respectively. The first coordinate (x) denotes the row number (scan line), while the second coordinate (y) denotes the column number. Since grey-scale images are discretizations of real black-and-white photographs, there is an implicit underlying grid. We consider the case of 4-connectivity, meaning that pixels (except for boundary pixels) have four neighbors (north, east, south, west). Two neighboring pixels that have the same image value are considered to be in the same connected component. So the graph under consideration has the pixels as nodes, and two pixels are connected if they are neighbors and have the same image value. Since this graph is derived from a rectangular grid, we can distribute it over the N processes by splitting it into equally sized slices. We have decided to distribute the image on the last coordinate of a pixel; it follows that we split the image into (almost) equally sized slices.
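As an illustration of this formulation, the sketch below labels 4-connected components with a simple union-find structure. It is a minimal single-process example of the graph view described above, not the parallel algorithm itself, and assumes the image is stored row-major.

```c
#include <stdlib.h>

/* Union-find over H*W pixel nodes. */
static int find(int *parent, int i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];  /* path halving */
        i = parent[i];
    }
    return i;
}

static void unite(int *parent, int a, int b) {
    parent[find(parent, a)] = find(parent, b);
}

/* Label the 4-connected components of im[H*W]; labels[i] receives the
   representative pixel index of pixel i's component. */
void label_components(const int *im, int *labels, int H, int W) {
    int *parent = malloc((size_t)H * W * sizeof *parent);
    for (int i = 0; i < H * W; i++) parent[i] = i;
    for (int x = 0; x < H; x++) {
        for (int y = 0; y < W; y++) {
            int i = x * W + y;
            /* join east and south neighbors that have the same value */
            if (y + 1 < W && im[i] == im[i + 1]) unite(parent, i, i + 1);
            if (x + 1 < H && im[i] == im[i + W]) unite(parent, i, i + W);
        }
    }
    for (int i = 0; i < H * W; i++) labels[i] = find(parent, i);
    free(parent);
}
```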

B. Parallel Image Processing
The structure of the computational tasks in many low-level and mid-level image processing routines readily suggests a natural parallel programming approach. On either a fine-grain or coarse-grain architecture, the most natural approach is for each computational node to be responsible for computing the output image at a spatially compact set of image locations. This generally minimizes the amount of data that needs to be distributed to the individual nodes and therefore minimizes the overall communication cost. This section discusses the approach to data distribution used in the toolkit and the message passing system used to implement it.

C. Data Distribution
For large classes of image processing tasks, the input image data required to compute a given portion of the output is spatially localized. In the simplest case, an output image is computed by independently processing single pixels of the input image. More generally, a neighborhood (or window) of pixels from the input image is used to compute an output pixel. Hence, the output pixels can be computed independently and in parallel. This high degree of natural parallelism can easily be exploited by parallel algorithms; in fact, many image processing routines achieve near-linear speedup with the addition of processing nodes (over a reasonable number of nodes). A fine-grained parallel decomposition of a window-operator-based image processing algorithm assigns an output pixel per processor and assigns the necessary windowed data required for each output pixel to the corresponding processor. Each processor then performs the necessary computations for its output pixel. A coarse-grained decomposition (suitable for MIMD or SPMD parallel environments) assigns large contiguous regions of the output image to each of a small number of processors. Each processor performs the appropriate window-based operations on its own region of the image. Appropriately overlapping regions of the image are assigned to properly accommodate the window operators at the image boundaries.
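The slice arithmetic behind such a coarse-grained decomposition can be made concrete. The sketch below is a minimal illustration under our own assumptions: a row-major image split into row slices, with a halo of half the window height on each side; the function name is ours, not the toolkit's.

```c
#include <stdio.h>

/* Inclusive row range (with halo) that worker `rank` of `nprocs`
   needs for a window operator of height `win`. */
typedef struct { int first, last; } rows;

rows slice_with_halo(int H, int nprocs, int rank, int win) {
    int base = H / nprocs, rem = H % nprocs;
    /* the first `rem` workers take one extra row so slices stay balanced */
    int first = rank * base + (rank < rem ? rank : rem);
    int last  = first + base + (rank < rem ? 1 : 0) - 1;
    int halo  = win / 2;
    rows r;
    r.first = first - halo < 0     ? 0     : first - halo;
    r.last  = last  + halo > H - 1 ? H - 1 : last  + halo;
    return r;
}

int main(void) {
    /* example: 512 rows, 4 workers, 5x5 window operator */
    for (int r = 0; r < 4; r++) {
        rows s = slice_with_halo(512, 4, r, 5);
        printf("rank %d: rows %d..%d\n", r, s.first, s.last);
    }
    return 0;
}
```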







D. Message Passing
One of the challenges in developing a parallel image processing library is making it portable to the various (and diverse) types of parallel hardware that are available, both now and in the future. In order to make parallel code portable, it is important to adopt a model of parallelism that is supported by a large number of potential target architectures. The most widely used and best understood paradigm for implementing parallel programs on distributed memory architectures is message passing. Several message passing libraries are available in the public domain, including Parallel Virtual Machine (PVM), PICL, and Zipcode. In 1994, a core of library routines (influenced strongly by existing libraries) was standardized as the Message Passing Interface (MPI). Public domain implementations of MPI are widely available. More importantly, all vendors of parallel machines and high-end workstations provide native versions of MPI optimized for their hardware.
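To make the paradigm concrete, the following minimal MPI sketch scatters equal row slices of an image to all ranks, thresholds them locally, and gathers the results. It is an illustration only: it assumes the image height divides evenly by the number of ranks and omits the boundary halo handling discussed above.

```c
#include <mpi.h>
#include <stdlib.h>

#define H 512   /* image height (assumed divisible by nprocs) */
#define W 512   /* image width */
#define T 128   /* threshold value */

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int rows = H / nprocs;                 /* rows per worker */
    unsigned char *image = NULL;
    if (rank == 0) {                       /* manager holds the full image */
        image = malloc((size_t)H * W);
        for (int i = 0; i < H * W; i++) image[i] = (unsigned char)(i % 256);
    }
    unsigned char *slice = malloc((size_t)rows * W);

    /* distribute slices, process locally, collect results */
    MPI_Scatter(image, rows * W, MPI_UNSIGNED_CHAR,
                slice, rows * W, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);
    for (int i = 0; i < rows * W; i++)
        slice[i] = slice[i] > T ? 255 : 0; /* local threshold */
    MPI_Gather(slice, rows * W, MPI_UNSIGNED_CHAR,
               image, rows * W, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD);

    free(slice);
    if (rank == 0) free(image);
    MPI_Finalize();
    return 0;
}
```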

E. Execution Model
The PIPT uses a manager/worker scheme in which a manager process reads an image file from disk, partitions it into equally sized slices, and sends the pieces to worker processes. The worker programs invoke a specified image processing routine to process their slices and then send the processed slices back to the manager. The manager reassembles the processed slices to create the final processed output image. Since the PIPT's internal image format stores rows of pixels contiguously in memory, slices are likewise composed of rows of pixels from the original image. While the manager/worker model is not scalable to large numbers of nodes, it does scale well over machine sizes that most users are likely to have at their disposal (the present model is effective to at least 32 nodes). Even without a designated manager, the disk (and perhaps visualization) I/O of the images already represents a bottleneck which would not be alleviated by using a different model: the workers would still have to read their respective slices from a networked file system, which would effectively be serialized by the file server. Scalability to very large numbers of nodes would thus require a parallel file system as well as a manager-less collection of workers. Although commercial and public domain parallel file systems exist, making a parallel file system a prerequisite for the PIPT introduces complications that, for most users, are unnecessary. It should be noted that the manager does not actually process any of the image slices; they are all distributed to the workers for computation. Instead, the manager only performs the data distribution and load balancing schemes.

II. PARALLEL IMAGE PROCESSING

A parallel system is defined as a set of processing units (PUs), or processors, that can work cooperatively, in parallel, on an application. An execution mode of a system is the way the data is processed (e.g. in parallel). A homogeneous parallel system is a parallel system where all (data) processors are identical and have only one execution mode. A heterogeneous parallel system is a parallel system where the (data) processors are of two or more different types and/or have different execution modes. Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. Multiple-issue processors issue (start) multiple instructions in a clock cycle. A superscalar processor can issue a varying number of instructions per clock cycle, while a Very Long Instruction Word (VLIW) processor issues a fixed number of instructions formatted either as one large instruction or as a fixed instruction packet.

A. Parallelism
Although the performance of (sequential) general purpose computers continues to increase steadily every year (Moore's Law, stating that the number of transistors on a chip doubles every 18 months, still holds), there are strong arguments that drive research in the area of parallel computing to identify and exploit parallelism in applications. Firstly, due to physical limitations (i.e. the speed of light and physical limits on miniaturization), single-processor performance will be limited at some point in the future. Secondly, the computing power demands of applications tend to grow as more computing power becomes available. One way of characterizing parallelism is by looking at the data and the control flow.

B. Data Parallelism
In data parallelism the processing itself is identical, or very similar, for all data items, and no dependence relation exists between the items being processed. For example, thresholding an image is a data parallel operation: all pixels in the image are independently compared to a threshold value and then set to a background or foreground value.
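A minimal sketch of such a data parallel threshold, here using OpenMP as one possible shared-memory realization (the paper itself does not prescribe OpenMP):

```c
#include <omp.h>

/* Threshold `n` pixels in place: background 0, foreground 255. */
void threshold(unsigned char *pix, long n, unsigned char t) {
    /* every iteration is independent, so the loop is data parallel */
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        pix[i] = pix[i] > t ? 255 : 0;
}
```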

C. Control Parallelism
Here the parallelism is present in different independent control flows that proceed concurrently. An example is the independent processing of a single image where there is no relation or dependence between the operations; one operation could, for instance, transpose the image while the other calculates the histogram of the image. This is also referred to as task parallelism. Another way of looking at parallelism is to consider the so-called grain size of the parallelism, indicating the size of the parallel operations. We can then distinguish fine-grain and coarse-grain parallelism. Fine-grain parallelism is the kind where the actions that can be performed in parallel are numerous and relatively small. With coarse-grain parallelism the number of parallel actions is much smaller and the actions themselves are bigger (in code and data). Performing small independent operations in parallel on the pixels of an image is an example of fine-grain parallelism; applying a filter algorithm to two separate images in parallel is an example of coarse-grain parallelism.
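The transpose-plus-histogram example can be sketched with two POSIX threads. This is an illustration of control (task) parallelism under our own simplifying assumptions (a square image and a 256-level histogram), not code from the paper's toolkit.

```c
#include <pthread.h>

#define N 256  /* square image side, also the number of grey levels */

static int img[N][N];          /* input image (shared, read-only here) */
static int transposed[N][N];   /* result of task 1 */
static long hist[N];           /* result of task 2 */

static void *transpose_task(void *arg) {
    (void)arg;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            transposed[j][i] = img[i][j];
    return NULL;
}

static void *histogram_task(void *arg) {
    (void)arg;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            hist[img[i][j] % N]++;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    /* the two control flows are independent, so they may run in parallel */
    pthread_create(&t1, NULL, transpose_task, NULL);
    pthread_create(&t2, NULL, histogram_task, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```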

Fig. (a): Architecture overview diagram

III. DESIGN APPROACH


The parallel computing functionality was developed with processor-intensive tasks in mind, such as performing s-fold cross-validation on a large amount of data in MATLAB. The basic operations that needed to be performed were: setting up the server farm, i.e. an array of active processors running the remote MATLAB engine; distributing data to the remote hosts in the farm; executing commands on the remote data; fetching the results after the data had been processed; and, finally, once the tasks were done, shutting down the farm of active processor nodes. The aim was to use the parallel functionality transparently from within the MATLAB environment. This meant that programming would be done using MATLAB M-files which would utilize the parallel functionality in the form of special functions to achieve parallelism. Figure (a) shows the overview of the architecture stack. A collection of routines was developed, based on the Master/Slave paradigm, to do parallel programming in MATLAB. Keeping in mind the task of s-fold cross-validation, this approach was suitable, as minimal inter-slave communication was needed: data could be distributed to and processed separately by each slave process and the results retrieved at the end. Figure (b) depicts the Master/Slave paradigm graphically.

Fig. (b): Architecture based on the Master/Slave paradigm
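The farm lifecycle described above can be summarized in code. The sketch below is purely illustrative: all farm_* names are hypothetical stand-ins for the special functions mentioned in the text, not the actual routines of the developed library, and are stubbed with prints so the sketch is self-contained.

```c
#include <stdio.h>

/* Hypothetical master-side view of the farm lifecycle (illustrative only). */
static void farm_open(int n)               { printf("start %d remote engines\n", n); }
static void farm_put(int s, const char *v) { printf("send %s to slave %d\n", v, s); }
static void farm_exec(int s, const char *c){ printf("slave %d runs: %s\n", s, c); }
static void farm_get(int s, const char *v) { printf("fetch %s from slave %d\n", v, s); }
static void farm_close(void)               { printf("shut the farm down\n"); }

int main(void) {
    int s = 4;                       /* s-fold cross-validation, one fold per slave */
    farm_open(s);                    /* set up the server farm                      */
    for (int k = 0; k < s; k++) {
        farm_put(k, "fold");         /* distribute this slave's data fold          */
        farm_exec(k, "err = validate(fold);");  /* execute remotely                */
    }
    for (int k = 0; k < s; k++)
        farm_get(k, "err");          /* retrieve results at the end                */
    farm_close();
    return 0;
}
```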

IV. RESULT ANALYSIS

INPUT CONSIDERATIONS

Fig. 4.1: Input images considered for multi-dimensional analysis



Fig. 4.2: Output images obtained for multi-dimensional analysis

SIMULATION RESULTS

Fig. 4.3: Plot of image dimension vs. computational time
The dimension analysis for the two systems is obtained by applying input images of varying dimensions to a single process. From the graph obtained it is seen that the computational time for the sequential processing system is higher than that of the parallel processing system.

Fig. 4.4: Plot of number of processes vs. computational time
The process analysis for the two systems is obtained by applying a single input image to a varying number of processes. From the graph obtained it is seen that the computational time for the sequential processing system is higher than that of the parallel processing system.

Fig. 4.5: Plot of system performance vs. time
The figure above illustrates the system performance level obtained for the two implemented systems, namely the sequential processing system (SPS) and the parallel processing system (PPS). The performance level for the two systems is obtained by varying the image dimensions and the number of processes:

Performance = (number of processes x total image dimension) / (total computation time)



From the graph obtained it is seen that the sequential performance is degraded when compared with the parallel performance.
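For illustration of the metric only (these numbers are hypothetical, not values measured from the figures): four processes on a 512 x 512 image (total image dimension 262144) completing in 2 s would give Performance = (4 x 262144) / 2 = 524288, while the same image on a single process taking 6 s would give (1 x 262144) / 6, approximately 43691, consistent with the higher score of the parallel system.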


Fig. 4.6: Plots for accuracy
The figure above illustrates the error level obtained for the two implemented systems, namely the sequential processing system (SPS) and the parallel processing system (PPS). From the graph obtained it is seen that the accuracy level of the two systems remains almost similar as the image dimension varies.

V. CONCLUSION
In this paper an enhanced pipelined SIMD-MIMD architecture is developed for the parallel processing of single and multiple inputs over single and multiple processors. The designed architecture integrates the image processing toolbox with the developed linking libraries and operation layer. The proposed system is evaluated for its efficiency and accuracy over various parameters such as image dimension, processing complexity, and the error level in the retrieved data. From the observations made on the implemented system, the sequential system is similar in performance at the lower bound but degrades as the input level increases. An average enhancement of 40 to 65% is observed for the parallel computing system over the existing sequential system.

