
August 2010 Master of Computer Application (MCA) Semester 6 MC0086 Digital Image Processing 4 Credits

(Book ID: B1007)

Assignment Set 1 (60 Marks)

1. Explain the following: A) Fundamental Steps in Digital Image Processing Ans: Processing of a digital image involves the following steps, carried out in sequence: image acquisition, image enhancement, image restoration, color image processing, wavelets and multiresolution processing, compression, morphological processing, segmentation, representation and description, and finally object recognition. Image acquisition is the first process. It requires an imaging sensor and the capability to digitize the signal produced by the sensor. The sensor could be a monochrome or a color TV camera that produces an entire image of the problem domain every 1/30 second. The imaging sensor could also be a line-scan camera that produces a single image line at a time. If the output of the camera or other imaging sensor is not already in digital form, an analog-to-digital converter digitizes it. Note that acquisition could be as simple as being given an image that is already in digital form. Generally, the image acquisition stage involves preprocessing, such as scaling. Image enhancement is one of the simplest and most appealing areas of digital image processing. Basically, the idea behind enhancement techniques is to bring out detail that is obscured, or simply to highlight certain features of interest in an image. A familiar example of enhancement is increasing the contrast of an image because it looks better. It is important to keep in mind that enhancement is a very subjective area of image processing. Image restoration is an area that also deals with improving the appearance of an image. However, unlike enhancement, which is subjective, image restoration is objective, in the sense that restoration techniques tend to be based on mathematical or probabilistic models of image degradation. Color image processing is an area that has been gaining in importance because of the significant increase in the use of digital images on the Internet.
Color is used as the basis for extracting features of interest in an image. Wavelets are the foundation for representing images in various degrees of resolution. In particular, this is used for image data compression and for pyramidal representation, in which images are subdivided successively into smaller regions. Compression deals with techniques for reducing the storage required to save an image, or the bandwidth required to transmit it. Although storage technology has improved significantly over the past decade, the same cannot be said for transmission capacity. This is true particularly in uses of the Internet, which are characterized by significant pictorial content. Image compression is familiar (perhaps inadvertently) to most users of computers in the form of image file extensions. Morphological processing deals with tools for extracting image components that are useful in the representation and description of shape. Segmentation procedures partition an image into its constituent parts or objects.

In general, autonomous segmentation is one of the most difficult tasks in digital image processing. A rugged segmentation procedure brings the process a long way toward successful solution of imaging problems that require objects to be identified individually. On the other hand, weak or erratic segmentation algorithms almost always guarantee eventual failure. In terms of character recognition, the key role of segmentation is to extract individual characters and words from the background. The next stage is representation and description. Here, the first decision that must be made is whether the data should be represented as a boundary or as a complete region. Boundary representation is appropriate when the focus is on external shape characteristics, such as corners and inflections. Regional representation is appropriate when the focus is on internal properties, such as texture or skeletal shape. Choosing a representation is only part of the solution for transforming raw data into a form suitable for subsequent computer processing. Description, also called feature selection, deals with extracting attributes that result in some quantitative information of interest or are basic for differentiating one class of objects from another. Recognition is the process that assigns a label (e.g., vehicle) to an object based on its descriptors. Knowledge about a problem domain is coded into an image processing system in the form of a knowledge database. This knowledge may be as simple as detailing regions of an image where the information of interest is known to be located, thus limiting the search that has to be conducted in seeking that information. The knowledge base can also be quite complex, such as an interrelated list of all major possible defects in a materials inspection problem, or an image database containing high-resolution satellite images of a region in connection with change-detection applications.
In addition to guiding the operation of each processing module, the knowledge base also controls the interaction between the modules. B) Components of an Image Processing System Ans: With reference to sensing, two elements are required to acquire digital images. The first is a physical device that is sensitive to the energy radiated by the object we wish to image. The second, called a digitizer, is a device for converting the output of the physical sensing device into digital form. For instance, in a digital video camera, the sensors produce an electrical output proportional to light intensity. The digitizer converts these outputs to digital data. Specialized image processing hardware usually consists of the digitizer just mentioned plus hardware that performs other primitive operations, such as an arithmetic logic unit (ALU), which performs arithmetic and logical operations in parallel on entire images. This type of hardware sometimes is called a front-end subsystem, and its most distinguishing characteristic is speed. In other words, this unit performs functions that require fast data throughputs (e.g., digitizing and averaging video images at 30 frames/s) that the typical main computer cannot handle. The computer in an image processing system is a general-purpose computer and can range from a PC to a supercomputer. Software for image processing consists of specialized modules that perform specific tasks. A well-designed package also includes the capability for the user to write code that, as a minimum, utilizes the specialized modules. More sophisticated software packages allow the integration of those modules and general-purpose software commands from at least one computer language. Mass storage capability is a must in image processing applications. An image of size 1024*1024 pixels, in which the intensity of each pixel is an 8-bit quantity, requires one megabyte of storage space if the image is not compressed.
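The one-megabyte figure above is easy to verify. A minimal sketch, assuming 1 MB means 2^20 bytes:

```python
# Verify: a 1024 x 1024 image with 8 bits per pixel occupies one
# megabyte uncompressed (1 MB taken here as 2**20 bytes).
rows, cols, bits_per_pixel = 1024, 1024, 8
size_bytes = rows * cols * bits_per_pixel // 8
print(size_bytes)            # 1048576
print(size_bytes == 2**20)   # True: exactly one megabyte
```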
When dealing with thousands, or even millions, of images, providing adequate storage in an image processing system can be a challenge. Digital storage for image

processing applications falls into three principal categories: (1) short-term storage for use during processing, (2) on-line storage for relatively fast recall, and (3) archival storage, characterized by infrequent access. Storage is measured in bytes (eight bits), Kbytes (one thousand bytes), Mbytes (one million bytes), Gbytes (meaning giga, or one billion, bytes), and Tbytes (meaning tera, or one trillion, bytes). Image displays in use today are mainly color (preferably flat screen) TV monitors. Monitors are driven by the outputs of image and graphics display cards that are an integral part of the computer system. Seldom are there requirements for image display applications that cannot be met by display cards available commercially as part of the computer system. In some cases, it is necessary to have stereo displays, and these are implemented in the form of headgear containing two small displays embedded in goggles worn by the user. Hardcopy devices for recording images include laser printers, film cameras, heat-sensitive devices, inkjet units, and digital units, such as optical and CD-ROM disks. Film provides the highest possible resolution, but paper is the obvious medium of choice for written material. For presentations, images are displayed on film transparencies or in a digital medium if image projection equipment is used. The latter approach is gaining acceptance as the standard for image presentations. Networking is almost a default function in any computer system in use today. Because of the large amount of data inherent in image processing applications, the key consideration in image transmission is bandwidth. In dedicated networks, this typically is not a problem, but communications with remote sites via the Internet are not always as efficient. Fortunately, this situation is improving quickly as a result of optical fiber and other broadband technologies.
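To see why bandwidth dominates image transmission, the ideal transfer time of one uncompressed 1024*1024, 8-bit image can be sketched as follows; the link speeds are illustrative assumptions, not figures from the text:

```python
# Ideal transfer time for one uncompressed image over links of
# different speeds (protocol overhead ignored).

def transmit_seconds(size_bytes: int, bits_per_second: float) -> float:
    return size_bytes * 8 / bits_per_second

size = 1024 * 1024  # bytes in one 8-bit 1024x1024 image
for rate in (56e3, 10e6, 1e9):   # dial-up, 10 Mbit/s LAN, gigabit
    print(rate, transmit_seconds(size, rate))
```

Over the 56 kbit/s link the single image takes roughly two and a half minutes, which is why compression and transmission bandwidth matter far more than local storage speed.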

2. Explain the following: A) Light and the Electromagnetic Spectrum Ans: In 1666, Sir Isaac Newton discovered that when a beam of sunlight is passed through a glass prism, the emerging beam of light is not white but consists instead of a continuous spectrum of colors ranging from violet at one end to red at the other. The range of colors we perceive in visible light represents a very small portion of the electromagnetic spectrum. On one end of the spectrum are radio waves with wavelengths billions of times longer than those of visible light. At the other end of the spectrum are gamma rays with wavelengths millions of times smaller than those of visible light. The electromagnetic spectrum can be expressed in terms of wavelength, frequency, or energy. Wavelength (λ) and frequency (ν) are related by the expression

λ = c / ν (2.3-1)

where c is the speed of light (2.998 * 10^8 m/s). The energy of the various components of the electromagnetic spectrum is given by the expression

E = hν (2.3-2)

where h is Planck's constant. The units of wavelength are meters, with the terms microns (denoted μm and equal to 10^-6 m) and nanometers (10^-9 m) being used frequently. Frequency is measured in Hertz (Hz), with one Hertz being equal to one cycle of a sinusoidal wave per second. Light is a particular type of electromagnetic radiation that can be seen and sensed by the human eye. The visible band of the electromagnetic spectrum spans the range from approximately 0.43 μm (violet) to about 0.79 μm (red). For convenience, the color spectrum is divided into six broad regions: violet, blue, green, yellow, orange, and red. No color (or other component of the electromagnetic spectrum) ends abruptly; rather, each range blends smoothly into the next. The colors that humans perceive in an object are determined by the nature of the light reflected from the object. A body that reflects light relatively balanced in all visible wavelengths appears white to the observer. However, a body that favors reflectance in a limited range of the visible spectrum exhibits some shade of color. For example, green objects reflect light with wavelengths primarily in the 500 to 570 nm range while absorbing most of the energy at other wavelengths. Light that is void of color is called achromatic or monochromatic light. The only attribute of such light is its intensity, or amount. The term gray level is generally used to describe monochromatic intensity because it ranges from black, to grays, and finally to white. Chromatic light spans the electromagnetic energy spectrum from approximately 0.43 to 0.79 μm, as noted previously. Three basic quantities are used to describe the quality of a chromatic light source: radiance, luminance, and brightness. Radiance is the total amount of energy that flows from the light source, and it is usually measured in watts (W). Luminance, measured in lumens (lm), gives a measure of the amount of energy an observer perceives from a light source.
For example, light emitted from a source operating in the far infrared region of the spectrum could have significant energy (radiance), but an observer would hardly perceive it; its luminance would be almost zero. At the short-wavelength end of the electromagnetic spectrum, we have gamma rays and hard X-rays. Gamma radiation is important for medical and astronomical imaging, and for imaging radiation in nuclear environments. Hard (high-energy) X-rays are used in industrial applications. Moving still higher in wavelength, we encounter the infrared band, which radiates heat, a fact that makes it useful in imaging applications that rely on heat signatures. The part of the infrared band close to the visible spectrum is called the near-infrared region. The opposite end of this band is called the far-infrared region. This latter region blends with the microwave band. This band is well known as the source of energy in microwave ovens, but it has many other uses, including communication and radar. Finally, the radio wave band encompasses television as well as AM and FM radio. In the higher energies, radio signals emanating from certain stellar bodies are useful in astronomical observations.
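The relations λ = c/ν (2.3-1) and E = hν (2.3-2) are easy to evaluate numerically. A small sketch, using the speed-of-light value quoted above and the standard value of Planck's constant:

```python
# Wavelength-frequency-energy relations for electromagnetic radiation.
C = 2.998e8      # speed of light, m/s (value used in the text)
H = 6.626e-34    # Planck's constant, J*s

def frequency_from_wavelength(wavelength_m: float) -> float:
    return C / wavelength_m                              # Eq. (2.3-1): nu = c / lambda

def photon_energy(wavelength_m: float) -> float:
    return H * frequency_from_wavelength(wavelength_m)   # Eq. (2.3-2): E = h * nu

# Green light at 550 nm, inside the 500-570 nm band mentioned above:
print(frequency_from_wavelength(550e-9))   # ~5.45e14 Hz
print(photon_energy(550e-9))               # ~3.6e-19 J per photon
```

Shorter wavelengths (X-rays, gamma rays) give proportionally higher photon energies, consistent with their ordering on the spectrum.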

B) Image Sensing and Acquisition Ans: Most types of images are generated by the combination of an illumination source and the reflection or absorption of energy from that source by the elements of the scene being imaged. For example, the illumination may originate from a source of electromagnetic energy such as radar, infrared, or X-ray

energy. But, as noted earlier, it could originate from less traditional sources, such as ultrasound or even a computer-generated illumination pattern. Similarly, the scene elements could be familiar objects, but they can just as easily be molecules, buried rock formations, or a human brain. We could even image a source, such as acquiring images of the sun. Depending on the nature of the source, illumination energy is reflected from, or transmitted through, objects. An example in the first category is light reflected from a planar surface. An example in the second category is when X-rays pass through a patient's body for the purpose of generating a diagnostic X-ray film. In some applications, the reflected or transmitted energy is focused onto a photoconverter (e.g., a phosphor screen), which converts the energy into visible light. There are three principal sensor arrangements used to transform illumination energy into digital images: single sensors, sensor strips, and sensor arrays. In each case, incoming energy is transformed into a voltage by the combination of input electrical power and sensor material that is responsive to the particular type of energy being detected. The output voltage waveform is the response of the sensor(s), and a digital quantity is obtained from each sensor by digitizing its response. In this section, we look at the principal modalities for image sensing and generation. 2.4.1 Image Acquisition using a Single Sensor: The most common sensor of this type is the photodiode, which is constructed of silicon materials and whose output voltage waveform is proportional to light. The use of a filter in front of a sensor improves selectivity. For example, a green (pass) filter in front of a light sensor favors light in the green band of the color spectrum.
As a consequence, the sensor output will be stronger for green light than for other components in the visible spectrum. In order to generate a 2-D image using a single sensor, there have to be relative displacements in both the x- and y-directions between the sensor and the area to be imaged. One arrangement used in high-precision scanning mounts a film negative onto a drum whose mechanical rotation provides displacement in one dimension. The single sensor is mounted on a lead screw that provides motion in the perpendicular direction. Since mechanical motion can be controlled with high precision, this method is an inexpensive (but slow) way to obtain high-resolution images. Other similar mechanical arrangements use a flat bed, with the sensor moving in two linear directions. These types of mechanical digitizers sometimes are referred to as microdensitometers. Another example of imaging with a single sensor places a laser source coincident with the sensor. Moving mirrors are used to control the outgoing beam in a scanning pattern and to direct the reflected laser signal onto the sensor. 2.4.2 Image Acquisition using Sensor Strips: A geometry that is used much more frequently than single sensors consists of an in-line arrangement of sensors in the form of a sensor strip. The strip provides imaging elements in one direction. Motion perpendicular to the strip provides imaging in the other direction. This is the type of arrangement used in most flat bed scanners. Sensing devices with 4000 or more in-line sensors are possible. In-line sensors are used routinely in airborne imaging applications, in which the imaging system is mounted on an aircraft that flies at a constant altitude and speed over the geographical area to be imaged. One-dimensional imaging sensor strips that respond to various bands of the electromagnetic spectrum are mounted perpendicular to the direction of flight.
The imaging strip gives one line of an image at a time, and the motion of the strip completes the other dimension of a two-dimensional image. Lenses or other focusing schemes are used to project the area to be scanned onto the sensors. Sensor strips mounted in a ring configuration are used in medical and industrial imaging to obtain cross-sectional (slice) images of 3-D objects. A rotating X-ray source provides illumination and the portion of the sensors

opposite the source collect the X-ray energy that passes through the object (the sensors obviously have to be sensitive to X-ray energy). This is the basis for medical and industrial computerized axial tomography (CAT) imaging. It is important to note that the output of the sensors must be processed by reconstruction algorithms whose objective is to transform the sensed data into meaningful cross-sectional images. 2.4.3 Image Acquisition using Sensor Arrays: Individual sensors can be arranged in the form of a 2-D array. Numerous electromagnetic and some ultrasonic sensing devices are arranged frequently in an array format. This is also the predominant arrangement found in digital cameras. A typical sensor for these cameras is a CCD array, which can be manufactured with a broad range of sensing properties and can be packaged in rugged arrays of 4000 * 4000 elements or more. CCD sensors are used widely in digital cameras and other light-sensing instruments. The response of each sensor is proportional to the integral of the light energy projected onto the surface of the sensor, a property that is used in astronomical and other applications requiring low-noise images. The first function performed by the imaging system is to collect the incoming energy and focus it onto an image plane. If the illumination is light, the front end of the imaging system is a lens, which projects the viewed scene onto the lens focal plane. The sensor array, which is coincident with the focal plane, produces outputs proportional to the integral of the light received at each sensor. Digital and analog circuitry sweeps these outputs and converts them to a video signal, which is then digitized by another section of the imaging system. The output is a digital image. 3. Explain the following with respect to Basic concepts in Sampling and Quantization: A) Representation of Digital Images Ans: The result of sampling and quantization is a matrix of real numbers.
We will use two principal ways to represent digital images. Assume that an image f(x, y) is sampled so that the resulting digital image has M rows and N columns. The values of the coordinates (x, y) now become discrete quantities. For notational clarity and convenience, we shall use integer values for these discrete coordinates. Thus, the values of the coordinates at the origin are (x, y) = (0, 0). The next coordinate values along the first row of the image are represented as (x, y) = (0, 1). It is important to keep in mind that the notation (0, 1) is used to signify the second sample along the first row. The notation used above allows us to write the complete M*N digital image in the following compact matrix form:

f(x, y) = [ f(0, 0)      f(0, 1)      ...  f(0, N-1)
            f(1, 0)      f(1, 1)      ...  f(1, N-1)
            ...
            f(M-1, 0)    f(M-1, 1)    ...  f(M-1, N-1) ]

The right side of this equation is by definition a digital image. Each element of this matrix array is called an image element, picture element, pixel, or pel. In some discussions, it is advantageous to use a more traditional matrix notation to denote a digital image and its elements:

A = [ a(0, 0)      a(0, 1)      ...  a(0, N-1)
      a(1, 0)      a(1, 1)      ...  a(1, N-1)
      ...
      a(M-1, 0)    a(M-1, 1)    ...  a(M-1, N-1) ]

Clearly, a(i, j) = f(x=i, y=j) = f(i, j), so the two matrices are identical. Expressing sampling and quantization in more formal mathematical terms can be useful at times. Let Z and R denote the set of integers and the set of real numbers, respectively. The sampling process may be viewed as partitioning the xy plane into a grid, with the coordinates of the center of each grid cell being a pair of elements (zi, zj) from the Cartesian product Z^2, which is the set of all ordered pairs of elements with zi and zj being integers from Z. Hence, f(x, y) is a digital image if (x, y) are integers from Z^2 and f is a function that assigns a gray-level value (that is, a real number from the set of real numbers, R) to each distinct pair of coordinates (x, y). This functional assignment is the quantization process described earlier. If the gray levels also are integers (as usually is the case), Z replaces R, and a digital image then becomes a 2-D function whose coordinates and amplitude values are integers. This digitization process requires decisions about values for M, N, and for the number, L, of discrete gray levels allowed for each pixel. There are no requirements on M and N, other than that they have to be positive integers. However, due to processing, storage, and sampling hardware considerations, the number of gray levels typically is an integer power of 2:

L = 2^k (Equation 1)

We assume that the discrete levels are equally spaced and that they are integers in the interval [0, L-1]. Sometimes the range of values spanned by the gray scale is called the dynamic range of an image, and we refer to images whose gray levels span a significant portion of the gray scale as having a high dynamic range. When an appreciable number of pixels exhibit this property, the image will have high contrast. Conversely, an image with low dynamic range tends to have a dull, washed-out gray look. The number, b, of bits required to store a digitized image is

b = M * N * k (Equation 2)

When M = N, this equation becomes

b = N^2 * k (Equation 3)

B) Spatial and Gray-Level Resolution Ans: Sampling is the principal factor determining the spatial resolution of an image. Basically, spatial resolution is the smallest discernible detail in an image. Suppose that we construct a chart with vertical lines of width W and with the space between the lines also having width W. A line pair consists of one such line and its adjacent space.
Thus, the width of a line pair is 2W, and there are 1/(2W) line pairs per unit distance. A widely used definition of resolution is simply the smallest number of discernible line pairs per unit distance; for example, 100 line pairs per millimeter. Gray-level resolution similarly refers to the smallest discernible change in gray level, but measuring discernible changes in gray level is a highly subjective process. We have considerable discretion regarding the number of samples used to generate a digital image, but this is not true for the number of gray levels. Due to hardware considerations, the number of gray levels is usually an integer power of 2, as mentioned in the previous section. The most common number is 8 bits, with 16 bits being used in some applications where enhancement of specific gray-level ranges is necessary. Sometimes we find systems that can digitize the gray levels of an image with 10 or 12 bits of accuracy, but these are exceptions rather than the rule. When an actual measure of physical resolution relating pixels to the level of detail they resolve in the original scene is not necessary, it is not uncommon to refer to an L-level digital image of size M*N as having a spatial resolution of M*N pixels and a gray-level resolution of L levels. We will use this terminology from time to time in subsequent discussions, making a reference to actual resolvable detail only when necessary for clarity. Consider, for example, an image of size 1024*1024 pixels whose gray levels are represented by 8 bits, and the results of subsampling it to smaller sizes. The subsampling was accomplished by deleting the appropriate number of rows and columns from the original image. For example, the 512*512 image was obtained by deleting every other row and column from the

1024*1024 image. The 256*256 image was generated by deleting every other row and column in the 512*512 image, and so on. The number of allowed gray levels was kept at 256. These images show the dimensional proportions between various sampling densities, but their size differences make it difficult to see the effects resulting from a reduction in the number of samples. The simplest way to compare these effects is to bring all the subsampled images up to size 1024*1024 by row and column pixel replication. C) Aliasing and Moiré Patterns Ans: Functions whose area under the curve is finite can be represented in terms of sines and cosines of various frequencies. The sine/cosine component with the highest frequency determines the highest frequency content of the function. Suppose that this highest frequency is finite and that the function is of unlimited duration (such functions are called band-limited functions). Then the Shannon sampling theorem tells us that, if the function is sampled at a rate equal to or greater than twice its highest frequency, it is possible to recover completely the original function from its samples. If the function is undersampled, then a phenomenon called aliasing corrupts the sampled image. The corruption is in the form of additional frequency components being introduced into the sampled function. These are called aliased frequencies. Note that the sampling rate in images is the number of samples taken (in both spatial directions) per unit distance. As it turns out, except for a special case discussed in the following paragraph, it is impossible to satisfy the sampling theorem in practice. We can only work with sampled data that are finite in duration. We can model the process of converting a function of unlimited duration into a function of finite duration simply by multiplying the unlimited function by a gating function that is valued 1 for some interval and 0 elsewhere.
Unfortunately, this function itself has frequency components that extend to infinity. Thus, the very act of limiting the duration of a band-limited function causes it to cease being band limited, which causes it to violate the key condition of the sampling theorem. The principal approach for reducing the aliasing effects on an image is to reduce its high-frequency components by blurring the image prior to sampling. However, aliasing is always present in a sampled image. The effect of aliased frequencies can be seen under the right conditions in the form of so-called Moiré patterns. D) Zooming and Shrinking Digital Images Ans: We conclude the treatment of sampling and quantization with a brief discussion on how to zoom and shrink a digital image. This topic is related to image sampling and quantization because zooming may be viewed as oversampling, while shrinking may be viewed as undersampling. The key difference between these two operations and sampling and quantizing an original continuous image is that zooming and shrinking are applied to a digital image. Zooming requires two steps: the creation of new pixel locations, and the assignment of gray levels to those new locations. Let us start with a simple example. Suppose that we have an image of size 500*500 pixels and we want to enlarge it 1.5 times to

750*750 pixels. Conceptually, one of the easiest ways to visualize zooming is laying an imaginary 750*750 grid over the original image. Obviously, the spacing in the grid would be less than one pixel because we are fitting it over a smaller image. In order to perform gray-level assignment for any point in the overlay, we look for the closest pixel in the original image and assign its gray level to the new pixel in the grid. When we are done with all points in the overlay grid, we simply expand it to the original specified size to obtain the zoomed image. This method of gray-level assignment is called nearest-neighbor interpolation. Pixel replication is applicable when we want to increase the size of an image an integer number of times. For instance, to double the size of an image, we can duplicate each column. This doubles the image size in the horizontal direction. Then, we duplicate each row of the enlarged image to double the size in the vertical direction. The same procedure is used to enlarge the image by any integer number of times (triple, quadruple, and so on). Duplication is just done the required number of times to achieve the desired size. The gray-level assignment of each pixel is predetermined by the fact that new locations are exact duplicates of old locations. A slightly more sophisticated way of accomplishing gray-level assignments is bilinear interpolation using the four nearest neighbors of a point. Let (x, y) denote the coordinates of a point in the zoomed image, and let v(x, y) denote the gray level assigned to it. For bilinear interpolation, the assigned gray level is given by

v(x, y) = ax + by + cxy + d

where the four coefficients are determined from the four equations in four unknowns that can be written using the four nearest neighbors of point (x, y). Image shrinking is done in a similar manner as just described for zooming. The equivalent process of pixel replication is row-column deletion. For example, to shrink an image by one-half, we delete every other row and column. We can use the zooming grid analogy to visualize the concept of shrinking by a noninteger factor, except that we now expand the grid to fit over the original image, do gray-level nearest neighbor or bilinear interpolation, and then shrink the grid back to its original specified size. To reduce possible aliasing effects, it is a good idea to blur an image slightly before shrinking it.
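The zooming and shrinking operations described above can be sketched with plain nested lists, so nothing beyond the standard library is needed; the function names are illustrative:

```python
# Nearest-neighbor zooming and row-column-deletion shrinking.

def zoom_nearest(image, factor):
    """Lay a finer grid over the image and copy the nearest original
    pixel into each new location (nearest-neighbor interpolation).
    With an integer factor this reduces to pixel replication."""
    rows, cols = len(image), len(image[0])
    return [[image[int(r / factor)][int(c / factor)]
             for c in range(int(cols * factor))]
            for r in range(int(rows * factor))]

def shrink_by_half(image):
    """Row-column deletion: keep every other row and column."""
    return [row[::2] for row in image[::2]]

img = [[10, 20],
       [30, 40]]
big = zoom_nearest(img, 2)
print(big)                    # each pixel becomes a 2x2 block
print(shrink_by_half(big))    # deleting every other row/column restores img
```

As the text notes, shrinking by row-column deletion can introduce aliasing, so a slight blur before shrinking is advisable in practice.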

4. Explain the following with respect to Image Enhancement: A) Edge Crispening Ans:

Psychophysical experiments indicate that a photograph or visual signal with accentuated or crispened edges is often more subjectively pleasing than an exact photometric reproduction. We will discuss the linear and statistical differencing techniques for edge crispening. 4.5.1 Linear Edge Crispening: Edge crispening can be performed by discrete convolution, as defined by Eq. 4.8, in which the impulse response array H is of high-pass form. Several common high-pass masks are given below:
       [  0 -1  0 ]        [ -1 -1 -1 ]        [  1 -2  1 ]
H1 =   [ -1  5 -1 ]   H2 = [ -1  9 -1 ]   H3 = [ -2  5 -2 ]
       [  0 -1  0 ]        [ -1 -1 -1 ]        [  1 -2  1 ]
These masks possess the property that the sum of their elements is unity, to avoid amplitude bias in the processed image. 4.5.2 Statistical Differencing: Statistical differencing involves the generation of an image by dividing each pixel value by its estimated standard deviation D(j, k) according to the basic relation

G(j, k) = F(j, k) / D(j, k)

where the estimated standard deviation
D(j, k) = [ (1 / W^2) * SUM(m = j-w .. j+w) SUM(n = k-w .. k+w) ( F(m, n) - M(j, k) )^2 ]^(1/2)
is computed at each pixel over some W * W neighborhood where W = 2w + 1. The function M(j,k) is the estimated mean value of the original image at point (j, k), which is computed as
M(j, k) = (1 / W^2) * SUM(m = j-w .. j+w) SUM(n = k-w .. k+w) F(m, n)
The enhanced image G(j, k) is increased in amplitude with respect to the original at pixels that deviate significantly from their neighbors, and is decreased in relative amplitude elsewhere. B) Color Image Enhancement Ans: The image enhancement techniques discussed previously have all been applied to monochrome images. We will now consider the enhancement of natural color images and introduce the pseudocolor and false color image enhancement methods. Pseudocolor produces a color image from a monochrome image, while false color produces an enhanced color image from an original natural color image or from multispectral image bands. 4.6.1 Natural Color Image Enhancement: The monochrome image enhancement methods described previously can be applied to natural color images by processing each color component individually. This is accomplished by intracomponent and intercomponent processing algorithms. Intracomponent Processing: Typically, color images are processed in the RGB color space. This approach works quite

well for noise cleaning algorithms in which the noise is independent between the R, G and B components. Edge crispening can also be performed on an intracomponent basis, but more efficient results are often obtained by processing in other color spaces. Contrast manipulation and histogram modification intracomponent algorithms often result in severe shifts of the hue and saturation of color images. Hue preservation can be achieved by using a single point transformation for each of the three RGB components. For example, form a sum image, and then compute a histogram equalization function, which is used for each RGB component. Intercomponent Processing: The intracomponent processing algorithms previously discussed provide no means of modifying the hue and saturation of a processed image in a controlled manner. One means of doing so is to transform a source RGB image into a three-component image, in which the three components form separate measures of the brightness, hue and saturation (BHS) of a color image. Ideally, the three components should be perceptually independent of one another. 4.6.2 Pseudocolor Pseudocolor is a color mapping of a monochrome image array which is intended to enhance the detectability of detail within the image. The pseudocolor mapping of an array is defined as R(j, k) = OR{F(j, k)}, G(j, k) = OG{F(j, k)}, B(j, k) = OB{F(j, k)}, where R(j, k), G(j, k), B(j, k) are display color components and OR{F(j, k)}, OG{F(j, k)}, OB{F(j, k)} are linear or nonlinear functional operators. This mapping defines a path in three-dimensional color space parametrically in terms of the array F(j, k). Mapping A represents the achromatic path through all shades of gray; it is the normal representation of a monochrome image. Mapping B is a spiral path through color space. Another class of pseudocolor mappings includes those mappings that exclude all shades of gray. Mapping C, which follows the edges of the RGB color cube, is such an example.
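A minimal sketch of such a pseudocolor mapping (the lookup tables below are illustrative choices of ours, not the text's mappings A, B or C; NumPy is assumed):

```python
import numpy as np

def pseudocolor(F, lut_r, lut_g, lut_b):
    """Map a monochrome array through three 256-entry lookup tables,
    playing the roles of the functional operators OR, OG and OB."""
    return np.stack([lut_r[F], lut_g[F], lut_b[F]], axis=-1)

# An illustrative path through color space: dark levels map to blue,
# bright levels to red, avoiding shades of gray in between.
levels = np.arange(256)
lut_r = levels.astype(np.uint8)
lut_g = np.zeros(256, dtype=np.uint8)
lut_b = (255 - levels).astype(np.uint8)

F = np.array([[0, 128, 255]], dtype=np.uint8)
rgb = pseudocolor(F, lut_r, lut_g, lut_b)  # shape (1, 3, 3)
```

Each monochrome gray level is replaced by a display color, so subtle amplitude differences become hue differences that are easier to detect.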
4.6.3 False color False color is a point-by-point mapping of an original color image, described by its three primary colors (or of a set of multispectral image planes of a scene), to a color space defined by display tristimulus values that are linear or nonlinear functions of the original image pixel values. A common intent is to provide a displayed image with objects possessing different or false colors from what might be expected. For example, blue sky in a normal scene might be converted to appear red, and green grass transformed to blue. One possible reason for such a color mapping is to place normal objects in a strange color world so that a human observer will pay more attention to the objects than if they were colored normally. Another reason for false color mappings is the attempt to color a normal scene to match the color sensitivity of a human viewer. For example, it is known that the luminance response of cones in the retina peaks in the green region of the visible spectrum. Thus, if a normally red object is false colored to appear green, it may become more easily detectable. Another psychophysical property of color vision that can be exploited is the contrast sensitivity of the eye to changes in blue light. In some situations it may be worthwhile to map the normal colors of objects with fine detail into shades of blue. In a false color mapping, the red, green and blue display color components are related to natural or multispectral images Fi by

RD = OR{F1, F2, ...}, GD = OG{F1, F2, ...}, BD = OB{F1, F2, ...}, where OR{ }, OG{ }, OB{ } are general functional operators. As a simple example, the set of red, green and blue sensor tristimulus values (RS = F1, GS = F2, BS = F3) may be interchanged according to the relation. C) Multispectral Image Enhancement Ans: Multispectral image enhancement techniques are applied to the multispectral bands of a scene in order to accentuate salient features to assist in subsequent human interpretation or machine analysis. These procedures include individual image band enhancement techniques, such as contrast stretching, noise cleaning and edge crispening, as discussed earlier. Other methods involve the joint processing of multispectral image bands. Multispectral image bands can be subtracted in pairs according to the relation Dm,n(j, k) = Fm(j, k) - Fn(j, k) in order to accentuate reflectivity variations between the multispectral bands. An associated advantage is the removal of any unknown but common bias components that may exist. Another simple but highly effective means of multispectral image enhancement is the formation of ratios of the image bands. The ratio image between the mth and nth multispectral bands is defined as Rm,n(j, k) = Fm(j, k) / Fn(j, k). It is assumed that the image bands are adjusted to have nonzero pixel values. In many multispectral imaging systems, the image band Fn(j, k) can be modeled by the product of an object reflectivity function Rn(j, k) and an illumination function I(j, k) that is identical for all multispectral bands. Ratioing of such imagery provides an automatic compensation of the illumination factor. The ratio Fm(j, k) / [Fn(j, k) ± Δ(j, k)], for which Δ(j, k) represents a quantization level uncertainty, can vary considerably if Fn(j, k) is small. This variation can be reduced significantly by forming the logarithm of the ratios, defined by Lm,n(j, k) = log Rm,n(j, k) = log Fm(j, k) - log Fn(j, k). 5.
Describe the following with respect to Image restoration: A) General Image Restoration Models Ans: In order to effectively design a digital image restoration system, it is necessary to quantitatively characterize the image degradation effects of the physical imaging system, the image digitizer and the image display. Basically, the procedure is to model the image degradation effects and then perform operations to undo the model to obtain a restored image. It should be emphasized that accurate image modeling is often the key to effective image restoration. There are two basic approaches to the modeling of image degradation effects: a priori modeling and a posteriori modeling. In the former case, measurements are made on the physical imaging system, digitizer and display to determine their response to an arbitrary image field. In some instances, it will be possible to model the

system response deterministically, while in other situations it will only be possible to determine the system response in a stochastic sense. The a posteriori modeling approach is to develop the model for the image degradations based on measurements of a particular image to be restored. Basically, these two approaches differ only in the manner in which information is gathered to describe the character of the image degradation. Consider a general model of a digital imaging system and restoration process. In the model, a continuous image light distribution C(x, y, t, λ), dependent on spatial coordinates (x, y), time (t) and spectral wavelength (λ), is assumed to exist as the driving force of a physical imaging system subject to point and spatial degradation effects and corrupted by deterministic and stochastic disturbances. Potential degradations include diffraction in the optical system, sensor nonlinearities, optical system aberrations, film nonlinearities, atmospheric turbulence effects, image motion blur and geometric distortion. Noise disturbances may be caused by electronic imaging sensors or film granularity. In this model, the physical imaging system produces a set of output image fields FO(i)(x, y, tj) at time instant tj described by the general relation
FO(i)(x, y, tj) = OP{ C(x, y, t, λ) }
where OP{ . } represents a general operator that is dependent on the space coordinates (x, y), the time history (t), the wavelength (λ) and the amplitude of the light distribution (C). For a monochrome imaging system, there will only be a single output field, while for a natural color imaging system, FO(i)(x, y, tj) may denote the red, green and blue tristimulus bands for i = 1, 2, 3, respectively. Multispectral imagery will also involve several output bands of data.
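To make the "model the degradation, then undo it" idea concrete, here is a sketch of the forward (degradation) half only, under our own simplifying assumptions: a 3 x 3 mean blur standing in for the point-spread degradation and additive Gaussian noise standing in for the stochastic disturbance. A restoration algorithm would then attempt to invert this operator.

```python
import numpy as np

def degrade(C, blur_w=3, noise_sigma=2.0, seed=0):
    """Forward degradation model: spatial blur followed by additive noise."""
    rng = np.random.default_rng(seed)
    pad = blur_w // 2
    P = np.pad(C.astype(float), pad, mode="edge")
    blurred = np.zeros(C.shape)
    for j in range(C.shape[0]):
        for k in range(C.shape[1]):
            # Mean over the blur window: a crude point-spread approximation.
            blurred[j, k] = P[j:j+blur_w, k:k+blur_w].mean()
    return blurred + rng.normal(0.0, noise_sigma, C.shape)
```

Running this on a point source spreads its energy into neighboring pixels, which is exactly the behavior a restoration filter must characterize before it can be undone.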

B) Optical system Models Ans: One of the major advances in the field of optics during the past 50 years has been the application of system concepts to optical imaging. Imaging devices consisting of lenses, mirrors, prisms and so on, can be considered to provide a deterministic transformation of an input spatial light distribution to some output spatial light distribution. Also, the system concept can be extended to encompass the spatial propagation of light through free space or some dielectric medium. In the study of geometric optics, it is assumed that light rays always travel in a straight-line path in a homogeneous medium. By this assumption, a bundle of rays passing through a clear aperture onto a screen produces a geometric light projection of the aperture. However, if the light distribution at the region between the light and dark areas on the screen is examined in detail, it is found that the boundary is not sharp. This effect is more pronounced as the aperture size is decreased. For a pinhole aperture, the entire screen appears diffusely illuminated. From a simplistic viewpoint, the aperture causes a bending of rays called diffraction. Diffraction of light can be quantitatively characterized by considering light as electromagnetic radiation that satisfies Maxwell's equations. The formulation of a complete theory of optical imaging from the basic electromagnetic principles of diffraction theory is a complex and lengthy task.

C) Photographic Process Models Ans: There are many different types of materials and chemical processes that have been utilized for photographic image recording. No attempt is made here either to survey the field of photography or to deeply investigate the physics of photography. Rather, the attempt here is to develop mathematical models of the photographic process in order to characterize quantitatively the photographic components of an imaging system. 5.4.1 Monochromatic Photography The most common material for photographic image recording is silver halide emulsion. In this material, silver halide grains are suspended in a transparent layer of gelatin that is deposited on a glass, acetate or paper backing. If the backing is transparent, a transparency can be produced, and if the backing is a white paper, a reflection print can be obtained. When light strikes a grain, an electrochemical conversion process occurs, and part of the grain is converted to metallic silver. A development center is then said to exist in the grain. In the development process, a chemical developing agent causes grains with partial silver content to be converted entirely to metallic silver. Next, the film is fixed by chemically removing unexposed grains. The photographic process described above is called a nonreversal process. It produces a negative image in the sense that the silver density is inversely proportional to the exposing light. A positive reflection print of an image can be obtained in a two-stage process with nonreversal materials. First, a negative transparency is produced, and then the negative transparency is illuminated to expose negative reflection print paper. The resulting silver density on the developed paper is then proportional to the light intensity that exposed the negative transparency. A positive transparency of an image can be obtained with a reversal type of film.
5.4.2 Color Photography Modern color photography systems utilize an integral tripack film to produce positive or negative transparencies. In a cross section of this film, the first layer is a silver halide emulsion sensitive to blue light. A yellow filter following the blue emulsion prevents blue light from passing through to the green and red silver emulsions that follow in consecutive layers and are naturally sensitive to blue light. A transparent base supports the emulsion layers. Upon development, the blue emulsion layer is converted into a yellow dye transparency whose dye concentration is proportional to the blue exposure for a negative transparency and inversely proportional for a positive transparency. Similarly, the green and red emulsion layers become magenta and cyan dye layers, respectively. Color prints can be obtained by a variety of processes. The most common technique is to produce a positive print from a color negative transparency onto nonreversal color paper. In the establishment of a mathematical model of the color photographic process, each emulsion layer can be considered to react to light as does an emulsion layer of a monochrome photographic material. To a first approximation, this assumption is correct. However, there are often significant interactions between the emulsion and dye layers, and each emulsion layer possesses its own characteristic sensitivity. 6. Describe the following in the context of Morphological Image processing:

A) Basic operations Ans: The foundation of morphological processing is in the mathematically rigorous field of set theory. We will discuss some fundamental concepts of image set algebra which are the basis for defining the generalized dilation and erosion operators. Consider a binary-valued source image function F(j, k). A pixel at coordinate (j, k) is a member of F(j, k), as indicated by the symbol ∈, if and only if it is a logical 1. A binary-valued image B(j, k) is a subset of a binary-valued image A(j, k), as indicated by B(j, k) ⊆ A(j, k), if for every spatial occurrence of a logical 1 of B(j, k), A(j, k) is a logical 1.

A reflected image F~(j, k) is an image that has been flipped from left to right and from top to bottom; the complement of an image is formed by logically inverting each of its pixels. Translation of an image, as indicated by the function G(j, k) = Tr,c{F(j, k)}, consists of spatially offsetting F(j, k) with respect to itself by r rows and c columns, where -R ≤ r ≤ R and -C ≤ c ≤ C. 6.2.1 Dilation With dilation, an object grows uniformly in spatial extent. Generalized dilation is expressed symbolically as G(j, k) = F(j, k) ⊕ H(j, k), where F(j, k), for 1 ≤ j, k ≤ N, is a binary-valued image and H(j, k), for 1 ≤ j, k ≤ L, where L is an odd integer, is a binary-valued array called a structuring element. For notational simplicity, F(j, k) and H(j, k) are assumed to be square arrays. Generalized dilation can be defined mathematically and implemented in several ways. The Minkowski addition definition is
G(j, k) = ∪ Tr,c{F(j, k)}, where the union is taken over all (r, c) for which H(r, c) is a logical 1
6.2.2 Erosion With erosion, an object shrinks uniformly. Generalized erosion is expressed symbolically as G(j, k) = F(j, k) ⊖ H(j, k), where H(j, k) is an odd-size L * L structuring element. Generalized erosion is defined to be
G(j, k) = ∩ T-r,-c{F(j, k)}, where the intersection is taken over all (r, c) for which H(r, c) is a logical 1
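The union-of-translates and intersection-of-translates definitions can be coded directly. This is a sketch under our own assumptions (the `translate` helper, logical-0 fill at the borders, and boolean NumPy arrays are not from the text):

```python
import numpy as np

def translate(F, r, c):
    """T_{r,c}{F}: offset F by r rows and c columns, filling with logical 0."""
    G = np.zeros_like(F)
    h, w = F.shape
    G[max(0, r):h + min(0, r), max(0, c):w + min(0, c)] = \
        F[max(0, -r):h - max(0, r), max(0, -c):w - max(0, c)]
    return G

def dilate(F, H):
    """Union of translates of F over the logical-1 positions of H."""
    L = H.shape[0]
    off = L // 2
    G = np.zeros_like(F)
    for r in range(L):
        for c in range(L):
            if H[r, c]:
                G |= translate(F, r - off, c - off)
    return G

def erode(F, H):
    """Intersection of translates of F by the negated offsets of H."""
    L = H.shape[0]
    off = L // 2
    G = np.ones_like(F)
    for r in range(L):
        for c in range(L):
            if H[r, c]:
                G &= translate(F, off - r, off - c)
    return G
```

With a 3 x 3 all-ones structuring element, a single pixel dilates to a 3 x 3 block, and eroding that block shrinks it back to the single center pixel.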
6.2.3 Properties of Dilation and Erosion i. Dilation is commutative: A ⊕ B = B ⊕ A, but in general, erosion is not commutative: A ⊖ B ≠ B ⊖ A. ii. Dilation and erosion are opposite in effect; dilation of the background of an object behaves like erosion of the object. This statement can be quantified by the duality relationship.
[A ⊖ B]^c = A^c ⊕ B~, where A^c denotes the complement of A and B~ is the reflection of B
6.2.4 Close and Open Dilation and erosion are often applied to an image in concatenation. Dilation followed by erosion is called a close operation. It is expressed symbolically as G(j, k) = F(j, k) ● H(j, k), where H(j, k) is an L * L structuring element. The close operation is defined as G(j, k) = [F(j, k) ⊕ H(j, k)] ⊖ H~(j, k). Closing of an image with a compact structuring element without holes (zeros), such as a square or circle, smooths contours of objects, eliminates small holes in objects and fuses short gaps between objects. B) Morphological algorithm operations on gray scale images Ans: Morphological concepts can be extended to gray scale images, but the extension often leads to theoretical issues and to implementation complexities. When applied to a binary image, dilation and erosion operations cause an image to increase or decrease in spatial extent, respectively. To generalize these concepts to a gray scale image, it is assumed that the image contains visually distinct gray scale objects set against a gray background. Also, it is assumed that the objects and background are both relatively spatially smooth. 6.5.1 Gray Scale Image Dilation and Erosion Dilation or erosion of an image could, in principle, be accomplished by hit-or-miss transformations in which the quantized gray scale patterns are examined in a 3 * 3 window and an output pixel is generated for each pattern. This approach is, however, not computationally feasible. For example, if a look-up table implementation were to be used, the table would require 2^72 entries for 256-level quantization of each pixel. The common alternative is to use gray scale extremum operations over a 3 * 3 pixel neighborhood. Consider a gray scale image F(j, k) quantized to an arbitrary number of gray levels. According to the extremum method of gray scale image dilation, the dilation operation is defined as
G(j, k) = MAX{ F(j, k), F(j, k+1), F(j-1, k+1), F(j-1, k), F(j-1, k-1), F(j, k-1), F(j+1, k-1), F(j+1, k), F(j+1, k+1) }
where MAX{S1, ..., S9} generates the largest-amplitude pixel of the nine pixels in the neighborhood. By the extremum method, gray scale image erosion is defined as
G(j, k) = MIN{ F(j, k), F(j, k+1), F(j-1, k+1), F(j-1, k), F(j-1, k-1), F(j, k-1), F(j+1, k-1), F(j+1, k), F(j+1, k+1) }
where MIN{S1, ..., S9} generates the smallest-amplitude pixel of the nine pixels in the 3 * 3 pixel neighborhood.
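The extremum method amounts to a moving 3 x 3 MAX or MIN filter; a sketch (border replication is our own choice, since the text does not address image borders):

```python
import numpy as np

def gray_dilate(F):
    """Extremum-method gray scale dilation: MAX over each 3 x 3 neighborhood."""
    P = np.pad(F, 1, mode="edge")
    G = np.zeros_like(F)
    for j in range(F.shape[0]):
        for k in range(F.shape[1]):
            G[j, k] = P[j:j+3, k:k+3].max()
    return G

def gray_erode(F):
    """Extremum-method gray scale erosion: MIN over each 3 x 3 neighborhood."""
    P = np.pad(F, 1, mode="edge")
    G = np.zeros_like(F)
    for j in range(F.shape[0]):
        for k in range(F.shape[1]):
            G[j, k] = P[j:j+3, k:k+3].min()
    return G
```

A bright object grows by one pixel per dilation and shrinks by one pixel per erosion, mirroring the binary behavior without any 2^72-entry lookup table.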

August 2010 Master of Computer Application (MCA) Semester 6 MC0086 Digital Image Processing 4 Credits
(Book ID: B1007)

Assignment Set 2 (60 Marks)

Answer all questions. Each question carries TEN marks.

1. Describe the following texture features of Image Extraction: A) Fourier Spectra Methods Ans: Several studies have considered textural analysis based on the Fourier spectrum of an image region, as discussed in Section 7.3. Because the degree of texture coarseness is proportional to its spatial period, a region of coarse texture should have its Fourier spectral energy concentrated at low spatial frequencies. Conversely, regions of fine texture should exhibit a concentration of spectral energy at high spatial frequencies. Although this correspondence exists to some degree, difficulties often arise because of spatial changes in the period and phase of texture pattern repetitions. Experiments have shown that there is considerable spectral overlap of regions of distinctly different natural texture, such as urban, rural and woodland regions extracted from aerial photographs. On the other hand, Fourier spectral analysis has proved successful in the detection and classification of coal miners' black lung disease, which appears as diffuse textural deviations from the norm. B) Edge Detection Methods: Ans: Rosenfeld and Troy have proposed a measure of the number of edges in a neighborhood as a textural measure. As a first step in their process, an edge map array E(j, k) is produced by some edge detector such that E(j, k) = 1 for a detected edge and E(j, k) = 0 otherwise. Usually, the detection threshold is set lower than the normal setting for the isolation of boundary points. This texture measure is defined as
T(j, k) = (1/W^2) SUM(m = j-w ... j+w) SUM(n = k-w ... k+w) E(m, n)
where W = 2w + 1 is the dimension of the observation window. A variation of this approach is to substitute the edge gradient G(j, k) for the edge map array in Eq. 6.
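The Rosenfeld-Troy measure is simply a windowed average of the edge map; a sketch (NumPy and zero padding outside the image are our own assumptions):

```python
import numpy as np

def edge_density(E, w=2):
    """T(j,k): fraction of detected edge pixels in a W x W window, W = 2w + 1."""
    W = 2 * w + 1
    P = np.pad(E.astype(float), w, mode="constant")
    T = np.zeros(E.shape)
    for j in range(E.shape[0]):
        for k in range(E.shape[1]):
            T[j, k] = P[j:j+W, k:k+W].sum() / (W * W)
    return T
```

Busy, fine-textured regions accumulate many edge hits per window and thus a high T(j, k); smooth regions score near zero.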

C) Autocorrelation Methods Ans: The autocorrelation function has been suggested as the basis of a texture measure. Although it has been demonstrated in the preceding section that it is possible to generate visually different stochastic fields with the same autocorrelation function, this does not necessarily rule out the utility of an autocorrelation feature set for natural images. The autocorrelation function is defined as
A(m, n; j, k) = SUM SUM F(j, k) F(j - m, k - n) / SUM SUM [F(j, k)]^2
for computation over a W x W window with -T ≤ m, n ≤ T pixel lags. Presumably, a region of coarse texture will exhibit a higher correlation for a fixed shift than will a region of fine texture. Thus, texture coarseness should be proportional to the spread of the autocorrelation function. Faugeras and Pratt have proposed the following set of autocorrelation spread measures:

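The normalized autocorrelation at a single pixel lag can be sketched as follows (our own simplification: only nonnegative lags, with the window taken to be the whole array):

```python
import numpy as np

def autocorr(F, m, n):
    """A(m, n) = sum F(j,k) F(j-m,k-n) / sum F(j,k)^2, for lags m, n >= 0."""
    F = F.astype(float)
    h, w = F.shape
    num = (F[m:, n:] * F[:h - m, :w - n]).sum()
    return num / (F ** 2).sum()

# Coarse (blocky) texture versus fine (checkerboard) texture.
coarse = np.kron(np.array([[1, 0], [0, 1]]), np.ones((4, 4)))
fine = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
```

At a one-pixel lag, the blocky field stays strongly correlated with itself while the checkerboard decorrelates completely, illustrating why autocorrelation spread tracks coarseness.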
2. Describe the following features of Edge detection: A) Edge, Line and Spot models Ans: A continuous domain, one-dimensional ramp edge is modeled as a ramp increase in image amplitude from a low to a high level, or vice versa. The edge is characterized by its height, slope angle and the horizontal coordinate of the slope midpoint. An edge exists if the edge height is greater than a specified value. An ideal edge detector should produce an edge indication localized to a single pixel located at the midpoint of the slope. If the slope angle is 90°, the resultant edge is called a step edge. In a digital imaging system, step edges usually exist only for artificially generated images such as test patterns and bilevel graphics data. Digital images, resulting from digitization of optical images of real scenes, generally do not possess step edges because the antialiasing low-pass filtering prior to digitization reduces the edge slope in the digital image caused by any sudden luminance change in the scene. The one-dimensional profile of a line consists of two opposed ramps; in the limit, as the line width w approaches zero, the resultant amplitude discontinuity is called a roof edge. The vertical ramp edge model contains a single transition pixel whose amplitude is at the midvalue of its neighbors. This edge model can be obtained by performing a 2 * 2 pixel moving window average on the vertical step edge model. There are also two versions of a diagonal ramp edge: the single pixel transition model contains a single midvalue transition pixel between the regions of high and low amplitude, while the smoothed transition model is generated by a 2 * 2 pixel moving window average of the diagonal step edge model. Similar models exist for a discrete step and ramp corner edge. The edge location for discrete step edges is usually marked at the higher-amplitude side of an edge transition.

B) First-Order Derivative Edge Detection Ans: There are two fundamental methods for generating first-order derivative edge gradients. One method involves generation of gradients in two orthogonal directions in an image; the second utilizes a set of directional derivatives. We will be discussing the first method. 8.3.1 Orthogonal Gradient Generation An edge in a continuous domain edge segment F(x, y) can be detected by forming the continuous one-dimensional gradient G(x, y) along a line normal to the edge slope, which is at an angle θ with respect to the horizontal axis. If the gradient is sufficiently large (i.e., above some threshold value), an edge is deemed present. The gradient along the line normal to the edge slope can be computed in terms of the derivatives along orthogonal axes according to the following:
G(x, y) = (∂F/∂x) cos θ + (∂F/∂y) sin θ
For computational efficiency, the gradient amplitude is sometimes approximated by the magnitude combination
G(j, k) = |GR(j, k)| + |GC(j, k)|
The orientation of the spatial gradient with respect to the row axis is
θ(j, k) = arctan{ GC(j, k) / GR(j, k) }
The remaining issue for discrete domain orthogonal gradient generation is to choose a good discrete approximation to the continuous differentials of Eq. 8.3a. The simplest method of discrete gradient generation is to form the running difference of pixels along rows and columns of the image. The row gradient is defined as
GR(j, k) = F(j, k) - F(j, k-1)
and the column gradient is
GC(j, k) = F(j, k) - F(j+1, k)
Diagonal edge gradients can be obtained by forming running differences of diagonal pairs of pixels. This is the basis of the Roberts cross-difference operator, which is defined in magnitude form as
G(j, k) = |F(j, k) - F(j+1, k+1)| + |F(j, k+1) - F(j+1, k)|
and in square-root form as
G(j, k) = [ (F(j, k) - F(j+1, k+1))^2 + (F(j, k+1) - F(j+1, k))^2 ]^(1/2)
Prewitt has introduced a pixel edge gradient operator described by a 3 * 3 pixel numbering convention in which the eight neighbors A0, ..., A7 are arranged clockwise around the center pixel, starting from the upper left. The Prewitt operator square-root edge gradient is defined as
G(j, k) = [ GR(j, k)^2 + GC(j, k)^2 ]^(1/2)
with row and column gradients
GR(j, k) = (1/(K + 2)) [ (A2 + K A3 + A4) - (A0 + K A7 + A6) ]
GC(j, k) = (1/(K + 2)) [ (A0 + K A1 + A2) - (A6 + K A5 + A4) ]
where K = 1. In this formulation, the row and column gradients are normalized to provide unit-gain positive and negative weighted averages about a separated edge position. The Sobel operator edge detector differs from the Prewitt edge detector in that the values of the north, south, east and west pixels are doubled (i.e., K = 2). The motivation for this weighting is to give equal importance to each pixel in terms of its contribution to the spatial gradient.
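A sketch of the Prewitt/Sobel gradient built from the row and column masks implied by these formulas (NumPy and replicated borders are our own assumptions; K = 1 gives Prewitt, K = 2 gives Sobel):

```python
import numpy as np

def gradient_magnitude(F, K=2):
    """Square-root edge gradient with unit-gain row/column masks."""
    F = F.astype(float)
    h_row = np.array([[-1, 0, 1], [-K, 0, K], [-1, 0, 1]]) / (K + 2)
    h_col = np.array([[1, K, 1], [0, 0, 0], [-1, -K, -1]]) / (K + 2)
    P = np.pad(F, 1, mode="edge")
    G = np.zeros(F.shape)
    for j in range(F.shape[0]):
        for k in range(F.shape[1]):
            win = P[j:j+3, k:k+3]
            GR = (win * h_row).sum()  # right column minus left column
            GC = (win * h_col).sum()  # top row minus bottom row
            G[j, k] = np.sqrt(GR ** 2 + GC ** 2)
    return G
```

Because the masks are unit-gain, a step edge of height 10 yields a gradient magnitude of 10 at the edge and zero in the flat regions.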

C) Second-Order Derivative Edge Detection Ans: Second-order derivative edge detection techniques employ some form of spatial second- order differentiation to accentuate edges. An edge is marked if a significant spatial change occurs in the second derivative. We will consider Laplacian second-order derivative method. The edge Laplacian of an image function F(x,y) in the continuous domain is defined as
G(x, y) = -∇²{ F(x, y) }
where the Laplacian is
∇² = ∂²/∂x² + ∂²/∂y²
The Laplacian G(x,y) is zero if F(x,y) is constant or changing linearly in amplitude. If the rate of change of F(x,y) is greater than linear, G(x,y) exhibits a sign change at the point of inflection of F(x,y). The zero crossing of G(x,y) indicates the presence of an edge. The negative sign in the definition of Eq. 8.4a is present so that the zero crossing of G(x,y) has a positive slope for an edge whose amplitude increases from left to right or bottom to top in an image. Torre and Poggio have investigated the mathematical properties of the Laplacian of an image function. They have found that if F(x,y) meets certain smoothness constraints, the zero crossings of G(x,y) are closed curves. In the discrete domain, the simplest approximation to the continuous Laplacian is to compute the difference of slopes along each axis:
G(j, k) = [F(j, k) - F(j, k-1)] - [F(j, k+1) - F(j, k)] + [F(j, k) - F(j+1, k)] - [F(j-1, k) - F(j, k)] = 4F(j, k) - F(j, k-1) - F(j, k+1) - F(j-1, k) - F(j+1, k)
This four-neighbor Laplacian can be generated by the convolution operation
G(j, k) = F(j, k) ⊛ H(j, k), with the impulse response array
H =  0 -1  0
    -1  4 -1
     0 -1  0
The four-neighbor Laplacian is often normalized to provide unit-gain averages of the positive weighted and negative weighted pixels in the 3 * 3 pixel neighborhood. The gain-normalized four-neighbor Laplacian impulse response is defined by
H = (1/4) *  0 -1  0
            -1  4 -1
             0 -1  0
Prewitt has suggested an eight-neighbor Laplacian defined by the gain normalized impulse response array
H = (1/8) * -1 -1 -1
            -1  8 -1
            -1 -1 -1
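A sketch of the gain-normalized four-neighbor Laplacian (border replication is our own choice; the mask is symmetric, so correlation and convolution coincide):

```python
import numpy as np

def laplacian4(F):
    """Gain-normalized four-neighbor Laplacian of an image array."""
    H = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]]) / 4.0
    P = np.pad(F.astype(float), 1, mode="edge")
    G = np.zeros(F.shape)
    for j in range(F.shape[0]):
        for k in range(F.shape[1]):
            G[j, k] = (P[j:j+3, k:k+3] * H).sum()
    return G

# A linearly increasing ramp has zero Laplacian away from the borders.
ramp = np.tile(np.arange(5.0), (5, 1))
```

The response is zero wherever F is constant or changing linearly, and it changes sign at an inflection, which is the zero crossing used to mark the edge.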
3. Describe the following with respect to Image Segmentation: A) Detection of Discontinuities Ans: There are three basic types of discontinuities in a digital image: points, lines and edges. In practice, the most common way to look for discontinuities is to run a mask through the image. For a 3 x 3 mask, this procedure involves computing the sum of products of the coefficients with the gray levels contained in the region encompassed by the mask. That is, the response of the mask at any point in the image is R = w1 z1 + w2 z2 + ... + w9 z9, where zi is the gray level of the pixel associated with mask coefficient wi. As usual, the response of the mask is defined with respect to its center location. When the mask is centered on a boundary pixel, the response is computed by using the appropriate partial neighborhood.

9.2.1 Point Detection The detection of isolated points in an image is straightforward. We say that a point has been detected at the location on which the mask is centered if
|R| > T
where T is a nonnegative threshold and R is the response of the mask at any point in the image. Basically, all that this formulation does is measure the weighted differences between the center point and its neighbors. The idea is that the gray level of an isolated point will be quite different from the gray level of its neighbors. 9.2.2 Line Detection

Line detection is an important step in image processing and analysis. Lines and edges are features in any scene, from simple indoor scenes to noisy terrain images taken by satellite. Most of the earlier methods for detecting lines were based on pattern matching. The patterns directly followed from the definition of a line. These pattern templates are designed with suitable coefficients and are applied at each point in an image. Consider a set of four 3 x 3 line-detection masks. If the first mask were moved around an image, it would respond more strongly to lines oriented horizontally. With constant background, the maximum response would result when the line passed through the middle row of the mask. This is easily verified by sketching a simple array of 1s with a line of a different gray level running horizontally through the array. A similar experiment would reveal that the second mask responds best to lines oriented at +45°; the third mask to vertical lines; and the fourth mask to lines in the -45° direction. These directions can also be established by noting that the preferred direction of each mask is weighted with a larger coefficient (i.e., 2) than the other possible directions. Let R1, R2, R3 and R4 denote the responses of the four masks, from left to right, where the Rs are given by equation 9.2. Suppose that all masks are run through an image. If, at a certain point in the image, |Ri| > |Rj| for all j ≠ i, that point is said to be more likely associated with a line in the direction of mask i. For example, if at a point in the image, |R1| > |Rj| for j = 2, 3, 4, that particular point is said to be more likely associated with a horizontal line. 9.2.3 Edge Detection Edge detection is the most common approach for detecting meaningful discontinuities in gray level. We discuss approaches for implementing first-order derivative (gradient operator) and second-order derivative (Laplacian operator) edge detection. Basic Formulation An edge is a set of connected pixels that lie on the boundary between two regions.
An edge is a local concept whereas a region boundary, owing to the way it is defined, is a more global idea. We start by modeling an edge intuitively. This will lead us to a formalism in which meaningful transitions in gray levels can be measured. In practice, optics, sampling, and other acquisition imperfections yield edges that are blurred, with the degree of blurring being determined by factors such as the quality of the image acquisition system, the sampling rate, and illumination conditions under which the image is acquired. The slope of the ramp is inversely proportional to the degree of blurring in the edge. In this model, we no longer have a thin (one pixel thick) path. Instead, an edge point now is any point contained in the ramp, and an edge would then be a set of such points that are connected. The thickness is determined by the length of the ramp. The length is determined by the slope, which is in turn determined by the degree of blurring. Blurred edges tend to be thick and sharp edges tend to be thin.

The first derivative is positive at the points of transition into and out of the ramp as we move from left to right along the profile; it is constant for points in the ramp; and it is zero in areas of constant gray level. The second derivative is positive at the transition associated with the dark side of the edge, negative at the transition associated with the light side of the edge, and zero along the ramp and in areas of constant gray level. The following are two additional properties of the second derivative around an edge: it produces two values for every edge in an image (an undesirable feature), and an imaginary straight line joining the extreme positive and negative values of the second derivative would cross zero near the midpoint of the edge (the zero-crossing property).
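These derivative properties are easy to verify numerically on a toy one-dimensional ramp edge (the profile values below are our own illustration):

```python
import numpy as np

# Dark flat region, rising ramp, light flat region.
profile = np.array([2.0, 2.0, 2.0, 4.0, 6.0, 8.0, 8.0, 8.0])

first = np.diff(profile)        # constant and nonzero only along the ramp
second = np.diff(profile, n=2)  # one positive and one negative pulse

# The positive pulse sits on the dark side of the edge and the negative
# pulse on the light side; a straight line joining them crosses zero
# near the edge midpoint (the zero-crossing property).
```

The two opposite-signed pulses are the "two values per edge" noted above, and their ordering (positive first for a rising edge) encodes which side of the edge is dark.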

B) Edge Linking and Boundary Detection Ans: Edge linking and boundary detection operations are fundamental steps in any image understanding. The edge linking process takes an unordered set of edge pixels produced by an edge detector as an input to form an ordered list of edges. Local edge information is utilized by the edge linking operation; thus edge detection algorithms typically are followed by a linking procedure to assemble edge pixels into meaningful edges. 9.3.1 Local Processing

One of the simplest approaches of linking edge points is to analyze the characteristics of pixels in a small neighborhood (say, 3 x 3 or 5 x 5) about every point (x, y) in an image that has undergone edge detection. All points that are similar are linked, forming a boundary of pixels that share some common properties. The two principal properties used for establishing similarity of edge pixels in this kind of analysis are (1) the strength of the response of the gradient operator used to produce the edge pixel; and (2) the direction of the gradient vector. The first property is given by the value of the gradient, ∇f. Thus an edge pixel with coordinates (x0, y0) in a predefined neighborhood of (x, y) is similar in magnitude to the pixel at (x, y) if
|∇f(x, y) - ∇f(x0, y0)| ≤ E
where E is a nonnegative threshold. The direction (angle) of the gradient vector is given by α(x, y). An edge pixel at (x0, y0) in the predefined neighborhood of (x, y) has an angle similar to the pixel at (x, y) if
|α(x, y) - α(x0, y0)| < A
where A is a nonnegative angle threshold. As noted in 9.2.3, the direction of the edge at (x, y) is perpendicular to the direction of the gradient vector at that point.

A point in the predefined neighborhood of (x, y) is linked to the pixel at (x, y) if both the magnitude and direction criteria are satisfied. This process is repeated at every location in the image, and a record must be kept of linked points as the center of the neighborhood is moved from pixel to pixel. A simple bookkeeping procedure is to assign a different gray level to each set of linked edge pixels.
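The local linking rule can be sketched as follows (a hedged, illustrative implementation; all names and the sample data are ours). Each edge pixel carries a gradient magnitude and angle; two neighboring pixels are linked when both the magnitude and angle differences fall below thresholds E and A, and each set of linked pixels receives its own label, which is the simple bookkeeping procedure of assigning a distinct gray level:

```python
def link_edges(mag, ang, E, A):
    rows, cols = len(mag), len(mag[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    for y in range(rows):
        for x in range(cols):
            if mag[y][x] == 0 or labels[y][x]:
                continue
            # flood over the 8-neighborhood using the similarity criteria
            labels[y][x] = next_label
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < rows and 0 <= nx < cols and \
                           mag[ny][nx] and not labels[ny][nx] and \
                           abs(mag[ny][nx] - mag[cy][cx]) <= E and \
                           abs(ang[ny][nx] - ang[cy][cx]) < A:
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))
            next_label += 1
    return labels

# two edge fragments: a horizontal run of similar pixels, plus one
# isolated pixel with a very different gradient direction
mag = [[9, 9, 9, 0],
       [0, 0, 0, 0],
       [0, 8, 0, 0]]
ang = [[0.0, 0.1, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0],
       [0.0, 2.0, 0.0, 0.0]]

labels = link_edges(mag, ang, E=2, A=0.5)
print(labels)   # the top run shares one label; the lone pixel gets another
```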

9.3.2 Global Processing via the Hough transform

In this section, points are linked by determining first whether they lie on a curve of specified shape. Unlike the local analysis method, we now consider global relationships between pixels. Suppose that, for n points in an image, we want to find subsets of these points that lie on straight lines. One possible solution is to first find all lines determined by every pair of points and then find all subsets of points that are close to particular lines. The problem with this procedure is that it involves finding n(n−1)/2 lines and then performing n · n(n−1)/2 (on the order of n³) comparisons of every point to all lines. This approach is computationally prohibitive in all but the most trivial applications. The Hough transform avoids this exhaustive search by letting every point vote, in an accumulator array over a parameter space, for all curves of the specified shape that pass through it; collinear points then reveal themselves as peaks in the accumulator.

9.3.3 Global Processing via Graph-Theoretic Techniques

In this section, a global approach based on representing edge segments in the form of a graph and searching the graph for low-cost paths that correspond to significant edges is discussed. This representation provides a rugged approach that performs well in the presence of noise, although, as might be expected, the procedure is considerably more complicated and requires more processing time. A graph G = (N, A) is a finite, nonempty set of nodes N, together with a set A of unordered pairs of distinct elements of N. Each pair in A is called an arc. A graph in which the arcs are directed is called a directed graph. If an arc is directed from node ni to node nj, then nj is said to be a successor of its parent node ni. The process of identifying the successors of a node is called expansion of the node. In each graph we define levels, such that level 0 consists of a single node, called the start node, and the nodes in the last level are called goal nodes. A cost c(ni, nj) can be associated with every arc (ni, nj). A sequence of nodes n1, n2, ..., nk, with each node ni being a successor of node ni−1, is called a path from n1 to nk, and the cost of the path is

c = Σ (i = 2 to k) c(ni−1, ni)
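For the line-detection case, the Hough transform replaces the O(n³) pairwise search with voting in a (theta, rho) accumulator, using the normal line parameterization rho = x cos(theta) + y sin(theta). The following is a minimal illustrative sketch (the bin counts and rounding are arbitrary choices, not from the text):

```python
import math

def hough_lines(points, n_theta=180):
    # sparse accumulator keyed by (theta_bin, rounded rho)
    acc = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[(t, rho)] = acc.get((t, rho), 0) + 1
    return acc

# five collinear points on the vertical line x = 3 (theta = 0, rho = 3)
points = [(3, y) for y in range(5)]
acc = hough_lines(points)
best = max(acc, key=acc.get)
print(best, acc[best])   # the peak bin collects all five votes
```

In a full implementation the accumulator is thresholded and the surviving peaks are traced back to the contributing edge pixels, which performs the actual linking.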

4. Describe the following with respect to Region Based segmentation: A) Basic Formulation Ans: Let R represent the entire image region. We may view segmentation as a process that partitions R into n sub regions, R1, R2, ..., Rn, such that

(a) the union of R1, R2, ..., Rn equals R;
(b) Ri is a connected region, for i = 1, 2, ..., n;
(c) Ri ∩ Rj = ∅ for all i and j, i ≠ j;
(d) P(Ri) = TRUE for i = 1, 2, ..., n; and
(e) P(Ri ∪ Rj) = FALSE for adjacent regions Ri and Rj.

Here, P(Ri) is a logical predicate over the points in set Ri, and ∅ is the null set. Condition (a) indicates that the segmentation must be complete; that is, every pixel must be in a region. Condition (b) requires that points in a region must be connected. Condition (c) indicates that the regions must be disjoint. Condition (d) deals with the properties that must be satisfied by the pixels in a segmented region; for example, P(Ri) = TRUE if all pixels in Ri have the same gray level. Finally, condition (e) indicates that adjacent regions Ri and Rj are different in the sense of predicate P.

B) Region Growing Ans: Region growing is one of the conceptually simplest approaches to image segmentation; neighboring pixels of similar amplitude are grouped together to form a segmented region. Region-growing approaches exploit the fact that pixels which are close together have similar gray values. Start with a single pixel (seed) and add new pixels slowly: (1) choose the seed pixel; (2) check the neighboring pixels and add them to the region if they are similar to the seed; (3) repeat step 2 for each of the newly added pixels, and stop when no more pixels can be added. How do we choose the seed(s) in practice? It depends on the nature of the problem. If targets need to be detected using infrared images, for example, choose the brightest pixel(s). Without a priori knowledge, compute the histogram and choose the gray-level values corresponding to the strongest peaks. How do we choose the similarity criterion (predicate)? The homogeneity predicate can be based on any characteristic of the regions in the image, such as:

average intensity
variance

color
texture
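The three steps above can be sketched directly (an illustrative implementation with names of our choosing; the similarity predicate used here is a simple gray-level difference from the seed value):

```python
def region_grow(img, seed, threshold):
    rows, cols = len(img), len(img[0])
    seed_val = img[seed[0]][seed[1]]
    region = {seed}          # step 1: start from the chosen seed
    frontier = [seed]
    while frontier:          # steps 2-3: grow until nothing can be added
        y, x = frontier.pop()
        # 4-connected neighbors
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < rows and 0 <= nx < cols and (ny, nx) not in region \
               and abs(img[ny][nx] - seed_val) <= threshold:
                region.add((ny, nx))
                frontier.append((ny, nx))
    return region

img = [[10, 11, 50, 52],
       [12, 10, 51, 50],
       [ 9, 11, 49, 53]]

# seed on the bright right-hand target, as one might after picking the
# strongest histogram peak
bright = region_grow(img, (0, 2), threshold=5)
print(sorted(bright))   # the six right-hand pixels, none of the dark ones
```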

C) Region Splitting and Merging Ans: Sub-divide an image into a set of disjoint regions and then merge and/or split the regions in an attempt to satisfy the conditions stated in section 10.3.1. Let R represent the entire image and select a predicate P. One approach for segmenting R is to subdivide it successively into smaller and smaller quadrant regions so that, for any resulting region Ri, P(Ri) = TRUE. We start with the entire region. If P(R) = FALSE, the image is divided into quadrants. If P is FALSE for any quadrant, we subdivide that quadrant into sub-quadrants, and so on. This particular splitting technique has a convenient representation in the form of a so-called quadtree (that is, a tree in which each interior node has exactly four descendants). The root of the tree corresponds to the entire image, and each node corresponds to a subdivision; a quadrant is subdivided further only when its predicate is FALSE. If only splitting were used, the final partition would likely contain adjacent regions with identical properties. This drawback may be remedied by allowing merging as well as splitting. Satisfying the constraints of section 10.3.1 requires merging only adjacent regions whose combined pixels satisfy the predicate P; that is, two adjacent regions Rj and Rk are merged only if P(Rj ∪ Rk) = TRUE.
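The splitting half of the procedure can be sketched as a recursive quadtree descent (a hedged illustration with names of our choosing; the predicate P here is "all pixels in the block are equal", and the merging of adjacent compatible leaves would follow in a full implementation):

```python
def split(img, y, x, size, regions):
    block = [img[r][x:x + size] for r in range(y, y + size)]
    vals = {v for row in block for v in row}
    if len(vals) == 1 or size == 1:          # P(R) = TRUE: uniform block
        regions.append((y, x, size))
        return
    h = size // 2                            # P(R) = FALSE: split into quadrants
    for dy, dx in ((0, 0), (0, h), (h, 0), (h, h)):
        split(img, y + dy, x + dx, h, regions)

img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 1, 1],
       [1, 1, 1, 1]]

regions = []
split(img, 0, 0, 4, regions)
print(regions)   # whole image non-uniform -> four 2x2 quadrants, each uniform
```

Note that the two bottom quadrants come out as separate uniform leaves even though they have identical gray level, which is exactly the drawback the merging step is meant to remedy.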

5. Describe the following with respect to Shape Analysis: A) Shape Orientation Descriptors Ans: The spatial orientation of an object with respect to a horizontal reference axis is the basis of a set of orientation descriptors developed at the Stanford Research Institute. These descriptors are defined below.

1. Image-oriented bounding box: the smallest rectangle oriented along the rows of the image that encompasses the object
2. Image-oriented box height: dimension of box height for the image-oriented box
3. Image-oriented box width: dimension of box width for the image-oriented box
4. Image-oriented box area: area of the image-oriented bounding box
5. Image-oriented box ratio: ratio of box area to enclosed area of the object for the image-oriented box
6. Object-oriented bounding box: the smallest rectangle oriented along the major axis of the object that encompasses the object
7. Object-oriented box height: dimension of box height for the object-oriented box
8. Object-oriented box width: dimension of box width for the object-oriented box
9. Object-oriented box area: area of the object-oriented bounding box
10. Object-oriented box ratio: ratio of box area to enclosed area of the object for the object-oriented box
11. Minimum radius: the minimum distance between the centroid and a perimeter pixel
12. Maximum radius: the maximum distance between the centroid and a perimeter pixel
13. Minimum radius angle: the angle of the minimum radius vector with respect to the horizontal axis
14. Maximum radius angle: the angle of the maximum radius vector with respect to the horizontal axis
15. Radius ratio: ratio of minimum radius to maximum radius

B) Fourier Descriptors Ans: The perimeter of an arbitrary closed curve can be represented by its instantaneous curvature at each perimeter point. Consider a continuous closed curve drawn in the complex plane, in which a point on the perimeter is measured by its polar position z(s) as a function of arc length s.
The complex function z(s) may be expressed in terms of its real part x(s) and imaginary part y(s) as

z(s) = x(s) + iy(s)

The tangent angle θ(s) of the curve is given by

θ(s) = arctan[ (dy(s)/ds) / (dx(s)/ds) ]

and the curvature k(s) is the derivative of the tangent angle, k(s) = dθ(s)/ds.

The coordinate points [x(s), y(s)] can be obtained from the curvature function by the reconstruction formulas

x(s) = x(0) + ∫ (0 to s) cos[θ(t)] dt

y(s) = y(0) + ∫ (0 to s) sin[θ(t)] dt

where θ(t) is obtained by integrating the curvature.

where x(0) and y(0) are the starting point coordinates. Because the curvature function is periodic over the perimeter length P, it can be expanded in a Fourier series as

k(s) = Σ (n = −∞ to ∞) c_n exp(i 2π n s / P)

where the coefficients c_n are obtained from

c_n = (1/P) ∫ (0 to P) k(s) exp(−i 2π n s / P) ds

This result is the basis of an analysis technique developed by Cosgriff and Brill in which the Fourier expansion of a shape is truncated to a few terms to produce a set of Fourier descriptors. These Fourier descriptors are then utilized as a symbolic representation of shape for subsequent recognition. If an object has sharp discontinuities (e.g., a rectangle), the curvature function is undefined at these points. This analytic difficulty can be overcome by the utilization of a cumulative shape function

φ(s) = ∫ (0 to s) k(t) dt − 2π s / P

This function is also periodic over P and can therefore be expanded in a Fourier series for a shape description.

C) Thinning and Skeletonizing Ans: We have previously discussed the use of morphological conditional erosion as a means of thinning or skeletonizing a binary object to obtain a stick-figure representation of the object. There are other, non-morphological methods of thinning and skeletonizing. Some of these methods create thinner, minimally connected stick figures; others are more computationally efficient. Thinning and skeletonizing algorithms can be classified as sequential or parallel. In a sequential algorithm, pixels are examined for deletion (erasure) in a fixed sequence over several iterations of the algorithm. The erasure of a pixel in the nth iteration depends on all operations performed in the (n−1)th iteration plus all pixels already processed in the incomplete nth iteration. In a parallel algorithm, erasure of a pixel in the nth iteration depends only upon the result of the (n−1)th iteration. Sequential operators are, of course, designed for sequential computers or pipeline processors, while parallel algorithms take advantage of parallel processing architectures. Sequential algorithms can be classified as raster scan or contour following. The morphological conditional erosion operators are examples of raster scan operators. With these operators, pixels are examined in a 3 x 3 window and are marked either for erasure or not for erasure. In a second pass, the conditionally marked pixels are sequentially examined in a 3 x 3 window, and a conditionally marked pixel is erased only if its erasure does not break a connected object into two or more objects. In the contour following algorithms, an image is first raster scanned to identify each binary object to be processed. Then each object is traversed about its periphery by a contour following algorithm, and the outer ring of pixels is conditionally marked for erasure.
This is followed by a connectivity test to eliminate erasures that would break the connectivity of an object. Rosenfeld, and Arcelli and Sanniti di Baja, developed some of the first connectivity tests for contour-following thinning and skeletonizing.
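As an illustration of the parallel class, the following sketches the classic Zhang-Suen thinning algorithm (a standard example, not one of the operators named in the text): in each sub-iteration every pixel is tested against the result of the previous pass only, and all marked pixels are erased together.

```python
def neighbors(img, y, x):
    # p2..p9, clockwise from the pixel directly above
    return [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
            img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]

def zhang_suen(img):
    img = [row[:] for row in img]
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            marked = []
            for y in range(1, len(img) - 1):
                for x in range(1, len(img[0]) - 1):
                    if not img[y][x]:
                        continue
                    p = neighbors(img, y, x)
                    b = sum(p)
                    # a = number of 0 -> 1 transitions around the pixel
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    if step == 0:
                        cond = p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0
                    else:
                        cond = p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        marked.append((y, x))
            for y, x in marked:       # erase all marked pixels at once
                img[y][x] = 0
            if marked:
                changed = True
    return img

# a thick horizontal bar, padded with a background border
bar = [[0]*9] + [[0] + [1]*7 + [0] for _ in range(3)] + [[0]*9]
thin = zhang_suen(bar)
for row in thin:
    print("".join(".#"[v] for v in row))
```

The connectivity and transition tests (a = 1, 2 <= b <= 6) play the same role as the connectivity test described above: they prevent an erasure from breaking a connected object apart or deleting an endpoint of the skeleton.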

6. Describe the following: A) Image Pyramids Ans: Analyzing, manipulating and generating data at various scales should be a familiar concept to anybody involved in computer graphics. We will start with image pyramids. In pyramids such as a MIP map used for filtering, successive averages are built from the initial signal values. This can be seen as the result of applying box filters, scaled and translated, over the signal. For n initial values we have log2(n) levels in the result. Moreover, because of the order we have chosen for the operations, we only had to compute n − 1 additions (and shifts, if means are stored instead of sums). This is not a good scheme for reconstruction, since all we need is the last row of values to reconstruct the signal (of course they are sufficient, since they are the initial values, but they are also necessary, since only a sum of adjacent values is available from the levels above). We can observe, though, that there is some redundancy in the data. Calling s_{i,j} the jth element of level i (0 being the top of the pyramid, k = log2(n) being the bottom level), we have:

s_{i,j} = s_{i+1,2j} + s_{i+1,2j+1}

(or the mean of the two values, if means are stored instead of sums).

We can instead store s_{0,0} as before, but at the level below we store only the difference term:

s'_{1,0} = (s_{1,0} − s_{1,1}) / 2

(using the means version, in which s_{0,0} = (s_{1,0} + s_{1,1}) / 2).

It is clear that by adding s_{0,0} and s'_{1,0} we retrieve s_{1,0}, and by subtracting s'_{1,0} from s_{0,0} we retrieve s_{1,1}. We therefore have the same information with one less element. The same modification applied recursively through the pyramid results in n − 1 difference values being stored across the levels; since we need the top value s_{0,0} as well, and the sums as intermediary results, a total of n values represents the signal. The price we have to pay is that now, to effect a reconstruction, we have to start at the top of the pyramid and stop at the level desired. If we look at the operations as applying a filter to the signal, we can see easily that the successive filters in the difference pyramid are (1/2, 1/2) and (1/2, −1/2), with their scales and translates. We will see that these are characteristic of the Haar transform. Notice also that this scheme computes the pyramid in O(n) operations.

B) Series Expansion Ans: The standard Fourier transform is especially useful for stationary signals, that is, for signals whose properties do not change much with time (or through space, for images); stationarity can be defined more precisely for stochastic processes, but a vague concept is sufficient here. For signals such as images with sharp edges and other discontinuities, however, one problem with Fourier transform and Fourier synthesis is that in order to accommodate a discontinuity, high frequency terms appear, and they are not localized but are added everywhere. In the following examples we will use, for simplicity and clarity, piece-wise constant signals and piece-wise constant basis functions to show the characteristics of several transforms and encoding schemes. Two sample 1-D signals will be used, one with a single step, the other with a (small) range of scales in constant spans.
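The mean pyramid and its difference form described in part A can be sketched as follows (a small illustrative implementation, assuming the means version; all function names are ours). Level i is built from pairwise means of level i+1; the difference pyramid keeps only the top value and the half-differences, from which every level can be reconstructed top-down:

```python
def mean_pyramid(signal):
    levels = [signal]
    while len(levels[0]) > 1:
        cur = levels[0]
        levels.insert(0, [(cur[2*j] + cur[2*j+1]) / 2 for j in range(len(cur) // 2)])
    return levels                      # levels[0] is the top (a single value)

def to_differences(levels):
    # half-differences per level: s'_{i,j} = (s_{i,2j} - s_{i,2j+1}) / 2
    diffs = []
    for cur in levels[1:]:
        diffs.append([(cur[2*j] - cur[2*j+1]) / 2 for j in range(len(cur) // 2)])
    return levels[0][0], diffs

def reconstruct(top, diffs):
    cur = [top]
    for d in diffs:
        nxt = []
        for s, dd in zip(cur, d):
            nxt += [s + dd, s - dd]    # mean + diff and mean - diff
        cur = nxt
    return cur

signal = [9, 7, 3, 5, 4, 2, 8, 6]
levels = mean_pyramid(signal)
top, diffs = to_differences(levels)
print(top)                       # 5.5, the overall mean of the signal
print(reconstruct(top, diffs))   # recovers the original signal exactly
```

Note that the difference pyramid stores exactly n − 1 half-differences plus the single top value, matching the count given in the text, and that the pairwise mean and half-difference are precisely the (1/2, 1/2) and (1/2, −1/2) filters.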

The Walsh functions W_i(t) are piece-wise constant functions taking only the values +1 and −1 on the unit interval. They have various orderings for their index i, so always make sure you know which ordering is used when dealing with W_i(t). The most common, used here, is where i is equal to the number of zero crossings of the function (the so-called sequency order). There are various definitions for them. A simple recursive one is:

with W_0(t) = 1, where j ranges over the nonnegative integers and q = 0 or 1. The Walsh transform is a series of coefficients given by:

w_i = ∫ (0 to 1) f(t) W_i(t) dt

and the function can be reconstructed as:

f(t) = Σ (over i) w_i W_i(t)

Note that since the original signals have discontinuities only at integral values, the signals are exactly represented by the first 32 Walsh bases at most. But we should also note that in this example, as would also be the case for a Fourier transform, the presence of a single discontinuity at 21 for signal 1 introduces the highest-frequency basis, and it has to be added globally for all t. In general the coefficients for each basis function decrease rapidly as the order increases, and that usually allows for a simplification (or compression) of the representation of the original signal by dropping the basis functions whose coefficients are small (obviously with loss of information if the coefficients are not 0).
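A Walsh-Hadamard transform on 8 samples can be sketched as follows. For simplicity this uses the natural (Hadamard) ordering produced by the Sylvester construction rather than the sequency ordering discussed above; the basis functions are the same set of +1/−1 piece-wise constant functions, only indexed differently.

```python
def hadamard(n):
    # Sylvester construction: H_{2m} = [[H, H], [H, -H]]
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def wht(signal):
    n = len(signal)
    h = hadamard(n)
    return [sum(h[i][j] * signal[j] for j in range(n)) / n for i in range(n)]

def inverse_wht(coeffs):
    n = len(coeffs)
    h = hadamard(n)   # the Sylvester-Hadamard matrix is symmetric
    return [sum(h[j][i] * coeffs[i] for i in range(n)) for j in range(n)]

# a piece-wise constant signal with a single step, like sample signal 1
signal = [1, 1, 1, 1, 5, 5, 5, 5]
coeffs = wht(signal)
print(coeffs)                # only two nonzero coefficients for this step
print(inverse_wht(coeffs))   # recovers the original signal
```

Because the step falls exactly on a basis-function boundary, only the mean term and one other coefficient are nonzero, illustrating how piece-wise constant bases can represent such signals compactly.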

C) Scaling functions - Continuous Wavelet Transform Ans: We can choose any set of windows to achieve the constant relative bandwidth, but a simple version is one in which all the windows are scaled versions of each other. To simplify notation, let us define h(t) as:

and scaled versions of h(t):

h_a(t) = (1 / √|a|) h(t / a)

where a is the scale factor (that is, f = f0 / a), and the constant 1 / √|a| is for energy normalization. The WFT now becomes:

WT(τ, a) = (1 / √|a|) ∫ f(t) h*((t − τ) / a) dt

This is known as a wavelet transform, and h(t) is the basic wavelet. It is clear from the above formula that the basic wavelet is scaled, translated and convolved with the signal to compute the transform. The translation corresponds to moving the window over the time signal, and the scaling, which is often called dilation in the context of wavelets, corresponds to the filter frequency bandwidth scaling.

We have used the particular form of h(t) related to the window w(t), but the transform can be defined with any function h(t) satisfying the requirements for a band-pass function: it is sufficiently regular, its square integral is finite, and its integral ∫ h(t) dt = 0. We can rewrite the basic wavelet as:

h_{a,τ}(t) = (1 / √|a|) h((t − τ) / a)

The transform is then written as:

WT(τ, a) = ∫ f(t) h*_{a,τ}(t) dt

We can reconstruct the signal as:

f(t) = (1/c) ∫∫ WT(τ, a) h_{a,τ}(t) (da dτ / a²)

where c is a constant depending on h(t). The reconstruction looks like a sum of coefficients over orthogonal bases, but the h_{a,τ}(t) are in fact highly redundant, since they are defined for every point in the (a, τ) space. Since there is a lot of redundancy in the continuous application of the basic wavelet, a natural question is whether we can discretize a and τ in such a way that we obtain a true orthonormal basis. Following Daubechies, one can notice that if we consider two scales a0 < a1, the coefficients at scale a1 can be sampled at a lower rate than for a0, since they correspond to a lower frequency. In fact the sampling rate can be proportional to a0 / a1. Generally, if:

a = a0^i and τ = j T a0^i

(i and j integers, T a period), the wavelets are:

h_{i,j}(t) = a0^{−i/2} h(a0^{−i} t − j T)

and the discretized wavelet coefficients are:

c_{i,j} = ∫ f(t) h*_{i,j}(t) dt

We hope that with a suitable choice of h(t), a0 and T we can then reconstruct f(t) as:

f(t) ~ Σ (over i, j) c_{i,j} h_{i,j}(t)

It is clear that for a0 close to 1 and T small, we are close to the continuous case, and the conditions on h(t) will be mild; but as a0 increases, only very special h(t) will work.
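One h(t) that does work is the Haar wavelet with a0 = 2 and T = 1: the family h_{i,j}(t) = a0^{-i/2} h(a0^{-i} t - jT) is then a true orthonormal set. The following sketch checks that numerically with a Riemann-sum approximation of the inner-product integrals (grid size and the scales/translations tested are our arbitrary choices):

```python
def haar(t):
    # the Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def h_ij(i, j, t, a0=2.0, T=1.0):
    return a0 ** (-i / 2) * haar(a0 ** (-i) * t - j * T)

N = 1024                      # grid resolution over the support [0, 4)
dt = 4.0 / N
ts = [k * dt for k in range(N)]

def inner(f, g):
    # Riemann-sum approximation of the inner product integral
    return sum(f(t) * g(t) for t in ts) * dt

# scales i = 0, 1 (widths 1 and 2) and a few translations j
family = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
for i1, j1 in family:
    for i2, j2 in family:
        ip = inner(lambda t: h_ij(i1, j1, t), lambda t: h_ij(i2, j2, t))
        expected = 1.0 if (i1, j1) == (i2, j2) else 0.0
        assert abs(ip - expected) < 1e-9, ((i1, j1), (i2, j2), ip)
print("discretized Haar family is orthonormal on the grid")
```

Each function has unit energy and is orthogonal to every translate and every other scale, which is exactly the orthonormality that the dyadic discretization (a0 = 2) is meant to achieve.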