and have the same image value. Since the graph under consideration is derived from a rectangular grid, we can distribute it over the N processes by splitting it into equally sized slices. We have decided to distribute the image along the last coordinate of a pixel. It follows that we split the image into (almost) equally sized slices.
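As an illustrative sketch (this routine is not given in the paper; the `label_components` name and plain-list representation of the im(H,W) array are assumptions), 4-connected components of equal-valued pixels can be labeled with a breadth-first flood fill:

```python
from collections import deque

def label_components(im):
    """Label 4-connected components of equal-valued pixels.

    im is a 2-D list of H rows and W columns, mirroring the
    im(H,W) array convention used in the text. Returns a
    same-shaped array where pixels in the same connected
    component share a label.
    """
    H, W = len(im), len(im[0])
    labels = [[-1] * W for _ in range(H)]
    next_label = 0
    for sx in range(H):
        for sy in range(W):
            if labels[sx][sy] != -1:
                continue
            # Breadth-first flood fill from an unlabeled seed pixel.
            labels[sx][sy] = next_label
            q = deque([(sx, sy)])
            while q:
                x, y = q.popleft()
                # Visit the four neighbors: north, south, west, east.
                for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                    if (0 <= nx < H and 0 <= ny < W
                            and labels[nx][ny] == -1
                            and im[nx][ny] == im[x][y]):
                        labels[nx][ny] = next_label
                        q.append((nx, ny))
            next_label += 1
    return labels
```

On a distributed image, each process would run such a labeling on its own slice and then merge labels along slice boundaries.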
I. INTRODUCTION
The evident goal of using parallel computers is to speed up computations by employing multiple CPUs, or to perform larger computations that are not possible on a single-processor machine. In general, parallel computing architectures can be divided into two main classes with respect to memory layout. The first is the class of distributed memory machines. The second class is that of shared memory machines.
C. Data Distribution
For large classes of image processing tasks, the input image data required to compute a given portion of the output is spatially localized. In the simplest case, an output image is computed simply by independently processing single pixels of the input image. More generally, a neighborhood (or window) of pixels from the input image is used to compute an output pixel. Hence, the output pixels can be computed independently and in parallel. This high degree of natural parallelism can be easily exploited by parallel algorithms. In fact, many image processing routines can achieve near-linear speedup with the addition of processing nodes (over a reasonable number of nodes). A fine-grained parallel decomposition of a window-operator-based image processing algorithm assigns one output pixel per processor and assigns the windowed data required for each output pixel to the corresponding processor. Each processor performs the necessary computations for its output pixel. A coarse-grained decomposition (suitable for MIMD or SPMD parallel environments) assigns large contiguous regions of the output image to each of a small number of processors. Each processor performs the appropriate window-based
A. Image Processing
In this section we focus on the application to image processing. We first consider a grey-scale image represented by a two-dimensional integer-valued array im(H,W), where H and W are the height and the width of the image, respectively. The first coordinate (x) denotes the row number (scan line), while the second coordinate (y) denotes the column number. Since grey-scale images are discretizations of real black-and-white photographs, there is an implicit underlying grid. We consider the case of 4-connectivity, meaning that pixels (except for boundary pixels) have four neighbors (north, east, south, west). Two neighboring pixels that have the same image value are considered to be in the same connected component. So, the graph considered has the pixels as nodes, and two pixels are connected if they are neighbors
International Conference on Computer, Communication and Electrical Technology ICCCET 2011, 18th & 19th March, 2011
operations to its own region of the image. Appropriate overlapping regions of the image would be assigned to properly accommodate the window operators at the image boundaries.
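The coarse-grained decomposition with overlapping boundary regions can be sketched as follows (a minimal illustration, not the library's actual code; the 3-tap vertical mean filter and the function names are assumptions). Each band of rows is padded with a one-row halo on each side so the window operator produces the same result at band boundaries as a whole-image pass:

```python
def smooth_rows(rows):
    """3-tap vertical mean filter with replicated borders."""
    H = len(rows)
    out = []
    for x in range(H):
        up, dn = rows[max(x - 1, 0)], rows[min(x + 1, H - 1)]
        out.append([(u + c + d) // 3
                    for u, c, d in zip(up, rows[x], dn)])
    return out

def filter_in_slices(im, num_workers):
    """Coarse-grained decomposition: each worker receives a
    contiguous band of rows plus a one-row overlap (halo) on each
    side, enough to cover the 3-row window at band boundaries."""
    H = len(im)
    rows_per = -(-H // num_workers)            # ceiling division
    result = []
    for w in range(num_workers):
        a, b = w * rows_per, min((w + 1) * rows_per, H)
        if a >= b:
            continue                           # more workers than bands
        lo, hi = max(a - 1, 0), min(b + 1, H)  # band plus halo rows
        band = smooth_rows(im[lo:hi])
        result.extend(band[a - lo:a - lo + (b - a)])  # trim the halo
    return result
```

Because the halo fully covers the window, processing the bands independently and concatenating the trimmed results reproduces the whole-image filter exactly.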
computation. Instead, the manager only performs the data distribution and load balancing schemes.
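The manager's partition-and-reassemble flow can be mocked sequentially as follows (a sketch of the data distribution only, not the actual library API; `invert` is a hypothetical stand-in for a worker routine, and each loop iteration stands in for one worker process):

```python
def invert(slice_rows, maxval=255):
    """Stand-in worker routine: invert grey-scale pixel values."""
    return [[maxval - p for p in row] for row in slice_rows]

def manager_worker(im, num_workers, routine):
    """Sequential mock of the manager/worker flow: the manager
    partitions the image into (almost) equally sized row slices,
    each 'worker' applies the routine to its slice, and the
    manager reassembles the processed slices in order."""
    H = len(im)
    per = -(-H // num_workers)                 # ceiling division
    slices = [im[i:i + per] for i in range(0, H, per)]
    processed = [routine(s) for s in slices]   # one call per worker
    out = []
    for s in processed:                        # reassemble the output image
        out.extend(s)
    return out
```

Since rows of pixels are stored contiguously, slicing by rows keeps each message to a worker a single contiguous buffer.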
D. Message Passing
One of the challenges in developing a parallel image processing library is making it portable to the various (and diverse) types of parallel hardware that are available, both now and in the future. In order to make parallel code portable, it is important to adopt a model of parallelism that is supported by a large number of potential target architectures. The most widely used and well-understood paradigm for the implementation of parallel programs on distributed memory architectures is that of message passing. Several message passing libraries are available in the public domain, including Parallel Virtual Machine (PVM), PICL, and Zipcode. In 1994, a core of library routines (influenced strongly by existing libraries) was standardized in the Message Passing Interface (MPI). Public domain implementations of MPI are widely available. More importantly, all vendors of parallel machines and high-end workstations provide native versions of MPI optimized for their hardware.
E. Execution Model
The PIPT uses a manager/worker scheme in which a manager process reads an image file from disk, partitions it into equally sized slices, and sends the pieces to worker processes. The worker programs invoke a specified image processing routine to process their slices and then send the processed slices back to the manager. The manager reassembles the processed slices to create a final processed output image. Since the PIPT's internal image format stores rows of pixels contiguously in memory, slices are likewise composed of rows of pixels from the original image. While the manager/worker model is not scalable to large numbers of nodes, it does scale well over machine sizes that most users are likely to have at their disposal (the present model is effective to at least 32 nodes). Even without a designated manager, the disk (and perhaps visualization) I/O of the images already represents a bottleneck which would not be alleviated by using a different model. The workers would still have to read their respective slices from a networked file system, which would effectively be serialized by the file server. Scalability to very large numbers of nodes would thus require a parallel file system as well as a manager-less collection of workers. Although commercial and public domain parallel file systems exist, making a parallel file system a prerequisite for the PIPT introduces complications that, for most users, are unnecessary. It should be noted that the manager does not actually process any of the image slices; they are all distributed to the workers for
A. Parallelism
Although the performance of (sequential) general-purpose computers continues to increase steadily every year (Moore's Law, stating that the number of transistors on a chip doubles every 18 months, still holds), there are arguments that drive research in the area of parallel computing to identify and exploit parallelism in applications. Firstly, it should be realized that due to physical limitations (i.e. the speed of light and physical limits on miniaturization) performance will level off at some point in the future. Secondly, the computing power demands of applications tend to grow as more computing power becomes available. One way of characterizing parallelism is by looking at the data and the control flow.
B. Data Parallelism
Here data is processed in parallel: the processing itself is identical, or very similar, for all data items, and no dependence relation exists between the items being processed. For example, thresholding an image is a data-parallel operation; all pixels in the image are independently compared to a threshold value and then set to a background or foreground value.
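The thresholding example can be sketched as follows (an illustration; the function name and default foreground/background values are assumptions). Every pixel is handled by the same expression with no dependence on any other pixel, which is exactly what makes the operation data-parallel:

```python
def threshold(im, t, fg=255, bg=0):
    """Set each pixel to fg if it is at least t, else bg.

    Each comparison depends only on its own pixel, so all of
    them could be executed in parallel.
    """
    return [[fg if p >= t else bg for p in row] for row in im]
```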
C. Control Parallelism
Here the parallelism is present in different independent control flows that are present concurrently. An example is the independent processing of a single image
where there is no relation or dependence; one operation could, for instance, transpose the image while the other calculates the histogram of the image. This is also referred to as task parallelism. Another way of looking at parallelism is to consider the so-called grain size of the parallelism, indicating the size of the parallel operations. We can then distinguish fine-grain and coarse-grain parallelism. Fine-grain parallelism is the kind where the actions that can be performed in parallel are numerous and relatively small. With coarse-grain parallelism the number of parallel actions is much smaller and the actions themselves are bigger (in code and data). Performing small independent operations in parallel on the pixels of an image is an example of fine-grain parallelism. Applying a filter algorithm to two separate images in parallel is an example of coarse-grain parallelism.
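The transpose-plus-histogram example of control (task) parallelism can be sketched with two concurrent tasks (a sketch using Python threads as the execution vehicle; the function names are assumptions, and a real system would use processes or MPI ranks instead):

```python
from concurrent.futures import ThreadPoolExecutor

def transpose(im):
    """Swap rows and columns of the image."""
    return [list(col) for col in zip(*im)]

def histogram(im, levels=256):
    """Count the occurrences of each grey level."""
    h = [0] * levels
    for row in im:
        for p in row:
            h[p] += 1
    return h

def run_coarse_grain(im):
    """Control parallelism: two independent operations on the
    same image are submitted as separate concurrent tasks."""
    with ThreadPoolExecutor(max_workers=2) as ex:
        t = ex.submit(transpose, im)
        h = ex.submit(histogram, im)
        return t.result(), h.result()
```

Neither task reads the other's output, so they can run in either order or simultaneously without synchronization beyond the final join.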
IV. SIMULATION RESULTS
Fig: 4.2 Output images obtained for processing multi-dimensional analysis
Fig: 4.4 Plot for Number of Processes v/s Computational Time
The process analysis for the two systems is obtained by applying a single input image to a varying number of processes. From the graph obtained it is seen that the computational time for the sequential processing system is higher than that of the parallel processing system.
Fig: 4.3 Plot for Image Dimension v/s Computational Time
The dimension analysis for the two systems is obtained by varying the dimensions of the input images applied to a single process. From the graph obtained it is seen that the computational time for the sequential processing system is higher than that of the parallel processing system.
Fig: 4.5 Plot for System Performance v/s Time
The figure shown above illustrates the system performance level obtained for the two implemented systems, namely the sequential processing system (SPS) and the parallel processing system (PPS). The system performance level for the two systems is obtained by varying the image dimensions and the number of processes.
Performance = [(no. of processes) * (total image dimension)] / (total computation time)
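The performance metric can be computed directly from its definition; as a worked example (the numbers are illustrative, not taken from the paper's measurements):

```python
def performance(num_processes, total_image_dim, total_time):
    """Performance metric as defined in the text:
    (no. of processes * total image dimension) / total computation time.
    """
    return num_processes * total_image_dim / total_time

# Illustrative values: 4 processes, a 512-pixel image dimension,
# and 2.0 seconds of total computation time.
p = performance(4, 512, 2.0)   # 4 * 512 / 2.0 = 1024.0
```

The metric grows with throughput: for a fixed workload, halving the computation time doubles the performance figure.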
From the graph obtained it is seen that the sequential performance is degraded when compared with the parallel performance.
Fig: 4.6 Plots for Accuracy
The figure shown above illustrates the error level obtained for the two implemented systems, namely the sequential processing system (SPS) and the parallel processing system (PPS). From the graph obtained it is seen that the accuracy level for the two systems remains almost the same as the image dimension varies.
V. CONCLUSION
In this paper an enhanced pipelined SIMD-MIMD architecture is developed for the parallel processing of single and multiple inputs over single and multiple processors. The designed architecture integrates the image processing toolbox with the developed linking libraries and operation layer. The proposed system is evaluated for its efficiency and accuracy over various parameters such as the dimension of the image, the complexity of processing, and the error level of the retrieved data. From the observations made on the implemented system, it is seen that the present sequential system is similar in performance at the lower bound but degrades as the input level increases. An average enhancement of 40 to 65% is observed for the parallel computing system over the existing sequential system.