
Chapter 1 Introduction

After the terrorist attacks of September 11th, security has become a greater concern for many Americans. One of the most important steps toward improving security is the ability to identify a person so that an authorized user can be distinguished from an unauthorized one. There are a number of ways a security system can confirm that someone is an authorized user, but most systems today look for three main things. The first is something you have: a token or a smart card, for example, where a machine must read data from the card before granting or denying access to a building or secured area. The problem is that something you have is also something you might lose or leave at home. The second is something you know, which usually involves typing a password to log onto a secure network. Passwords have their own drawback: people tend to forget them. The third way for a system to identify a person is through something they are, which usually entails the use of biometrics. Biometrics is a way to identify or authenticate a person using their biological traits, and it includes fingerprint scans, iris scans, signature verification, and facial recognition. Only recently has there been a strong demand for biometrics, driven mainly by companies looking for more advanced security precautions. This is also evident from recent market trends, which show biometrics sales growing steadily from $6.6 million in 1990 to $63 million in 1999, according to the U.S. National Biometric Test Center in San Jose. The Cahners In-Stat Group has gone further, predicting that biometrics sales will reach $520 million by 2006.

Other Types of Biometrics


One of the most widely used types of biometrics today is the fingerprint. In the same way that a tire has rubber ridges to grip the road, fingers have ridges of skin that allow us to grip things more easily. These ridges can also act as a sort of built-in identity card that is always accessible. The ridges form from a combination of genetics and environmental factors [i]. The skin forms according to the general orders of the DNA code, but the particular way it forms is the outcome of random events: the exact position of the fetus in the womb at a particular moment and the exact makeup and density of the surrounding amniotic fluid all help decide how every individual ridge on a person's finger will form. Because of the countless environmental factors influencing the formation of a person's fingers, there is virtually no chance of the same fingerprint pattern forming on two separate individuals. This fact alone makes fingerprints one of the most widely used forms of identification. A fingerprint scanning system works in two steps. First, the system scans or obtains an image of your fingerprint. Second, it compares the ridges and valleys of the new image with an image of your fingerprint that was previously scanned and saved in a database. Although there are a number of ways a fingerprint can be scanned, the two most common are optical scanners, which use light and a charge-coupled device (CCD) to take a picture of the fingerprint [3], and capacitance scanners, which use an electric current to sense the fingerprint. Once the image is captured, specific features of the fingerprint called minutiae are extracted and compared using complex recognition algorithms.
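As a rough illustration of how such a comparison might work, the sketch below pairs up minutiae whose positions and ridge angles agree within a tolerance. The coordinates, tolerances, and match threshold are all hypothetical, not those of any commercial matcher, and a real system would first align the two prints for rotation and translation:

```python
import math

def match_minutiae(template, candidate, dist_tol=10.0, angle_tol=0.3, min_matches=12):
    """Count minutiae pairs that agree in position and ridge angle.

    Each minutia is an (x, y, angle) tuple. This greedy pairing is a
    simplification: production matchers align the prints first and use
    far more robust pairing strategies.
    """
    matched = 0
    used = set()
    for (x1, y1, a1) in template:
        for j, (x2, y2, a2) in enumerate(candidate):
            if j in used:
                continue
            close = math.hypot(x1 - x2, y1 - y2) <= dist_tol
            aligned = abs(a1 - a2) <= angle_tol
            if close and aligned:
                matched += 1
                used.add(j)
                break
    return matched >= min_matches, matched

# Hypothetical minutiae sets: (x, y, ridge angle in radians)
stored = [(10, 20, 0.1), (40, 55, 1.2), (70, 30, 2.0)]
probe = [(11, 21, 0.12), (41, 54, 1.25), (90, 90, 0.5)]
ok, n = match_minutiae(stored, probe, min_matches=2)
```

Two of the three probe minutiae fall within tolerance of stored ones, so with this (artificially low) threshold the prints are declared a match.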
Comparing fingerprints can be thought of as measuring the relative positions of the minutiae, much as you would recognize a region of the sky by the relative positions of the stars. To declare a match, the system has to find an adequate number of minutiae patterns that the two prints have in common; the required number varies with the scanning program.

Another biometric used to verify an individual's identity is the iris scan. The iris makes a great biometric for many reasons. One is that no two people have the same iris; indeed, the two irises of a single individual are unique. Irises also contain thousands of features, such as arching ligaments, furrows, ridges, crypts, rings, coronas, and freckles, all of which can be used to build a so-called reference template for that particular eye. Because so many features can be included in a reference template for a particular iris, false acceptances and rejections are extremely unlikely. With new technology that can detect muscle movement within the iris in response to lighting, these systems can now differentiate between a living iris and an artificial one. These are just some of the many reasons why the iris is one of the best choices for verifying a person's identity.

Biometric systems are now also capable of verifying a person's identity by their signature; such systems are often referred to as dynamic signature verification (DSV). Using a specially designed pen and tablet, DSV systems capture characteristics such as the angle of the pen, the speed at which the signature is written, the number of times the pen is lifted, the pressure exerted on the tablet, and variations in speed during different parts of the signing. Combining all this information about a person's signature creates a biometric trait that a security system can use to identify that person. Since variations occur each time a person signs their name, more than one sample of the signature must be taken. The data for each characteristic is then averaged and stored in a database for use during the comparison process when a person uses the signature authentication system.

Why Facial Recognition?
When discussing ways of verifying a person's identity, we must go back to basics, to the way we have identified people since birth: by their faces. Using facial recognition as a biometric means using a person's face as the tool to verify that person's identity. Although other biometrics such as iris scanning are more accurate than facial recognition, which has roughly a one percent error rate, facial recognition will most likely become the more commonly accepted

type of biometric because it is not intrusive; in other words, it is a form of passive surveillance. It does not require a person to walk up to a machine and put their eye into a hole so that it can take a picture of their iris, nor does it require them to stop to sign or type something into a computer. Also, companies that already have security cameras in place and pictures of all their employees on file often need to install nothing beyond new software, which is obviously less expensive than installing a whole new system with checkpoints for an iris scanning setup. As Frances Zelazney, director of corporate communications at Visionics, put it, "unlike other biometrics, facial recognition provides for inherent human backup because we naturally recognize one another." Zelazney added that if the facial recognition system fails or goes down, you can simply pull up an ID with the employee's picture as a backup, something that would not be possible with other biometrics such as iris scans and fingerprint devices. Facial recognition systems can also reduce user frustration. For example, in a large company that treats security as a top priority, it is not uncommon for a user to have upwards of a dozen different passwords, since such companies institute standards to ensure high-quality passwords that must also be changed on a regular basis [1]. Users then forget their passwords, and the work time lost at the IT (Information Technology) office resetting forgotten passwords can be enormous. Users are frustrated at having to remember complicated passwords that are easy to forget, while the companies are upset about all the time lost from work.
The use of facial biometrics would prevent all of this and make life much simpler for the user, since the only thing the user would have to do is show up for work so that the cameras can see and authenticate them. One of the main reasons facial recognition stands out from other biometrics is that it is a passive surveillance tool: a user walking down a hall can be granted or denied access to a secure area before they even reach the door.

How Facial Recognition Works

Facial recognition software is based on the ability first to detect faces and then to measure various features of each face. There are certain distinguishable landmarks on each person's face, and the combination of these peaks and valleys makes up the distinct features of each human face. Visionics, a leading developer of facial recognition technology, calls these features or landmarks nodal points. While there are about eighty nodal points on a human face, Visionics' FaceIt facial recognition software needs only 14-22 of them to complete its detection process, according to Zelazney. Some of the nodal points used by FaceIt are the distance between the eyes, the width of the nose, the tip of the nose, the depth of the eye sockets, the cheekbones, the jaw line, and the chin. FaceIt uses these 14-22 nodal points to concentrate on what is called the golden triangle, which runs from temple to temple and just over the lip. By concentrating on the more stable inner region of the face, FaceIt is able to recognize people who have grown a beard, put on glasses, gained weight, or aged. Facial recognition first requires capturing a picture of a person to analyze. To capture an image, the recognition software searches the video camera's field of view for a face. A multi-scale algorithm is used to search for faces, and once a head-like shape is in view of the camera it is detected within a fraction of a second. The system then switches to a high-resolution search. Once detection is complete, the system ascertains the head's position, size, and pose. For the system to recognize the face, the head must be turned toward the camera to within thirty-five degrees.
Next, the image of the head is rotated and scaled so that it can be mapped into a standard size and pose. This step is often referred to as normalization, and it is performed regardless of the head's distance from and position relative to the camera; lighting has no bearing on the process. Once normalization is done, the system converts the facial data into a unique code; this conversion is considered the heart of many facial recognition systems. Visionics' FaceIt system uses a Local Feature Analysis (LFA) algorithm, which plots the relative positions of certain nodal points and produces a

unique long string of numbers called a faceprint for that particular face [6]. With the faceprint, the system can more easily compare a new faceprint with the faceprints stored in the database. As comparisons are made with faces in the database, each comparison is assigned a value on a scale of one to ten, with ten being a 100% match. If a score is above a predetermined threshold (usually 8), a match is declared. When a match is found in the database, an operator double-checks the two pictures to make sure the computer has made a correct match.

Applications for Facial Recognition

Because facial identification is a form of passive surveillance, and because it is relatively cheap to implement compared with many other biometric systems, facial recognition systems are seeing ever wider use. For example, Visionics Corp. of Jersey City, N.J., gained widespread attention after its FaceIt system was installed by Tampa police. The Tampa Police Department was loaned the system for one year in order to match images captured by dozens of cameras in an entertainment district against digital mug shots of known criminals. These 36 cameras, positioned around the crowded streets of Ybor City, have allowed police to keep a closer eye on general activity. In January 2001, Tampa Bay police were also able to use FaceIt at Super Bowl XXXV to conduct what has been called the biggest police lineup in history: officers used FaceIt to take snapshots of faces in the crowd and match them against a database of criminal mug shots. There have been many other applications for facial recognition systems similar to Visionics' FaceIt. These identification systems are even finding their way into casinos, which find them useful for scanning their premises for known cheats.
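The comparison and threshold step described earlier (a one-to-ten score with a match declared above 8) can be sketched as follows. The three-element "faceprints" and the distance-to-score mapping are illustrative stand-ins for the long numeric codes an algorithm such as LFA actually produces:

```python
import math

def similarity_score(faceprint_a, faceprint_b):
    """Map the distance between two faceprint vectors onto a 1-10 scale,
    where 10 means an identical print. Vectors are assumed normalised
    to the range [0, 1] in each dimension."""
    dist = math.dist(faceprint_a, faceprint_b)
    max_dist = math.sqrt(len(faceprint_a))  # largest possible distance
    return 10 - 9 * min(dist / max_dist, 1.0)

def find_match(probe, database, threshold=8.0):
    """Return the best-scoring identity if it clears the threshold."""
    best_name, best_score = None, 0.0
    for candidate_name, stored in database.items():
        score = similarity_score(probe, stored)
        if score > best_score:
            best_name, best_score = candidate_name, score
    return (best_name, best_score) if best_score >= threshold else (None, best_score)

# Hypothetical stored faceprints and a new probe
db = {"alice": [0.2, 0.8, 0.5], "bob": [0.9, 0.1, 0.4]}
probe = [0.21, 0.79, 0.52]
name, score = find_match(probe, db)
```

In a deployed system the declared match would then go to a human operator for the double-check described above.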
The FaceIt surveillance system has even been added to the closed-circuit television at Keflavik International Airport in Iceland as a security measure, allowing officials to search for terrorists. Facial recognition is also being used by motor vehicle officials in West Virginia to scan databases of driver's license pictures for duplicates and fraud. Israeli officials have likewise used Visionics' FaceIt software to monitor and manage the flow of people entering and exiting the Gaza Strip.

In Britain in 1999, police in the London Borough of Newham installed some 300 cameras to search for known criminals; authorities there have credited the facial recognition system with a 34% drop in crime. Facial identification systems have also been used to help eliminate voter fraud. To influence an election, people sometimes register several times under false aliases so that they can vote multiple times. This problem led to one of the most innovative uses of facial recognition, by the Mexican government in the 2000 presidential election, to eradicate duplicate voter registrations. Officials can search the digital photos in the voter database for duplicates at registration time, comparing images of the person registering with the images already on file to catch those who try to register twice. Facial recognition programs can even be used at home. Adding a webcam to a home computer and installing facial recognition software allows the user to log into the computer with their face instead of typing a password; this technology has even been integrated into IBM's screensaver for the A, T, and X series ThinkPad laptops. Additionally, since 1998 Visionics has received about $800,000 each year as part of a Justice Department project to develop a way to search Internet pornography sites for images of exploited and missing children; in 2001 West Virginia officials began using the newly developed software to perform these searches. There has also been much talk about potential applications for facial recognition systems. For example, facial recognition could be used in ATMs (Automated Teller Machines): the software could speedily verify a customer's face, and once the FaceIt system has created a faceprint of the user, that faceprint could be used to protect the customer against fraudulent transactions and even identity theft.
With banks using facial identification to identify ATM users, there would no longer be a need for photo IDs or personal identification numbers (PINs), since identity would be verified by the system. As the examples above show, more and more organizations are adopting facial recognition as a means of verifying a person's identity. It would also seem that only good could come from these systems, designed as they are to take criminals

off our streets, secure our airports, secure our computers, and help locate missing children on the Internet. So why aren't more identification systems like FaceIt being used?

Reasons against the Use of Facial Recognition


Although facial recognition appears to have many advantages, many people are opposed to the use of this kind of software. One argument against facial recognition is its roughly one percent error rate for matches. That one percent could allow at least one terrorist to board any full commercial jet, and the figure only grows with a jumbo jet. The same one percent false positive rate could also lead to at least one innocent person on every flight being wrongly identified as a criminal or suspicious person. On a larger scale, one false positive per flight would mean thousands of people falsely identified as suspicious, and thousands of angry customers. Moreover, out-of-date photos, poor lighting, and poor-quality photos in the database all add to the number of misidentifications. To some, all these factors combined lead to so many false identifications that the costs outweigh the benefits, rendering the identification tool ineffective. Another argument against facial recognition software is the belief that, although the technology is designed to protect us, it will in the end invade our privacy. For example, an operator could easily take your picture while you are entirely unaware of the camera, and the system could just as easily be used to spy on citizens as they move around public places. There is also concern that state governments might incorporate facial recognition and other biometric technology into Department of Motor Vehicles (DMV) identification systems, which could result in a de facto national identification system. Also, according to the American Civil Liberties Union (ACLU), the FaceIt system installed in Tampa, Florida did not improve safety.
System logs revealed many false matches between those photographed by police cameras and mug shots in the department's database. The ACLU also reported that the FaceIt software often matched male subjects with female ones, and matched people with considerable differences in age and weight [8]. The ACLU is not alone in opposing the FaceIt software: even House Majority Leader Dick Armey, R-Texas, publicly urged the Tampa Police Department to stop using it, saying it was too invasive [9]. Some Tampa residents have also protested against the facial identification system, as shown by a Saturday protest staged by about one hundred masked demonstrators who compared the Visionics face recognition system to Big Brother [9]. As with any controversial product on the market, there is a great deal of opposition, and that is apparent here. When dealing with security and surveillance there is a fine line to walk: too much surveillance can land you on the opposite side from the ACLU, while too much privacy can render your product ineffective.
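The false-positive arithmetic behind these objections is easy to make concrete. The sketch below uses the one percent rate quoted above together with a hypothetical watch-list prevalence; Bayes' rule shows why, when the people being searched for are rare, most alarms are false ones:

```python
def expected_false_positives(passengers, false_positive_rate=0.01):
    """Expected number of innocent passengers wrongly flagged per flight."""
    return passengers * false_positive_rate

def flag_is_real_probability(prevalence, tpr=0.99, fpr=0.01):
    """Bayes' rule: probability that a flagged person really is on the
    watch list, given how rare watch-listed people are (prevalence).
    The true/false positive rates here are illustrative assumptions."""
    flagged = prevalence * tpr + (1 - prevalence) * fpr
    return prevalence * tpr / flagged

per_flight = expected_false_positives(300)  # innocent flags on a 300-seat jet
posterior = flag_is_real_probability(1e-5)  # assuming 1-in-100,000 prevalence
```

With these numbers, roughly three innocent passengers are flagged on every full flight, and well under one percent of alarms point at a genuine watch-list member.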

Chapter 2 Literature Survey


2.1 Introduction
Caricatures are often gross distortions of the faces they represent. Moreover, a cartoon caricature is usually depicted with very few lines. Despite the impoverished and distorted nature of the information in caricatures, we are remarkably good at recognising the individuals they represent. Indeed, recent findings (Rhodes et al., 1987; see below) indicate that line-drawn caricatures can be recognised more efficiently than veridical line drawings. It is likely, therefore, that an understanding of how caricatures "work" in activating representations of familiar individuals will help our understanding of how normal images of faces are stored in memory and recognised in everyday life. This paper compares the perception and recognition of caricatures and veridical images of familiar faces. We first describe techniques for producing caricatures automatically. These were originally devised for manipulating line drawings, but we have extended them to continuous-tone images. These techniques have allowed us to examine whether caricature effects found with line drawings have more general applicability to the processing of natural images. Secondly, they have allowed us to examine the bizarre possibility that an image of a face can be artificially transformed so that it looks more like the person than the original natural image.

How Caricatures Work

Perkins (1975) speculated on what parameters were necessary for caricaturing to work. He suggests that humour, "ugliness" and expression of personality are irrelevant to making caricatures recognisable, even though these are probably the most enjoyable aspects of cartoon art. It is generally agreed that caricatures work by selectively accentuating particular details of a face (Brennan, 1985; Perkins, 1975). Presumably, the details which get accentuated in caricaturing are those which are characteristic of that individual. The skill of the caricature artist begins with realising which features are characteristic.
There seems to be some consensus across artists as to which features should be caricatured for a given face. For example, Goldman and Hagen (1978) studied the caricaturing of Richard Nixon by 17 artists and found a high degree of concordance as to what was caricatured. The features of an individual face that are characteristic are those which differentiate that person from the general population in which they live. This definition of characteristic features

embodies a comparison between the features of the face in question and those which are normal or average for a population of faces. The degree to which a feature departs from the population average is a measure of how characteristic that feature is. It follows that not all faces are suitable for caricaturing. If a person has facial proportions close to average then there will be nothing deviant and nothing to caricature.

2.2 Choosing the Appropriate Norm for Comparison


Careful attention must be given to the definition of the average face against which an individual's face is compared. Chance, Goldstein and McBride (1975) and Shepherd (1981) showed that people recognise faces of their own race better than faces of others, and therefore cross-racial caricaturing is undesirable. Unless some humorous distortion is required, cross-sex caricaturing would also be unsuitable because of configural differences between male and female faces. Stereotypical influences are also critical to successful caricaturing: pop stars are often thought of and portrayed as having deviant appearances (bright clothing, long hair, excesses of jewellery, etc.), while politicians tend to conform (a clean image, suits, well-groomed hair). The age parameter is important in the same context, with many pop stars depicted as either very young and naive or as aged hippies from the 1960s, and political leaders as typically old and haggard. From these considerations, successful caricaturing of a face needs to be done with respect to an average derived from the same age range. Contrasting a face against an average face of a younger age is likely to enhance the apparent age of the target (this may be desirable for generating amusing effects but is likely to impair recognition). Stereotypical influences of age and socioeconomic class are also important in the recognition of faces (see Cross, Cross & Daly, 1971; Dion, Berscheid & Walster, 1972; Klatzky, Martin & Kane, 1982a; 1982b; Klatzky & Forrest, 1984). In caricaturing and recognising a face, we may compare it to a norm for the appropriate age, sex, race, and perhaps perceived socioeconomic status or occupation. Once the characteristic features of a face have been established, a decision must be made as to the degree to which those features are to be reduced or exaggerated in the caricature. This has been found to depend on a number of factors. Different artists use different degrees of exaggeration.
For Nixon's face, artists varied in the degree of exaggeration from 12 to 86% of

the veridical feature dimensions (Goldman & Hagen, 1978). The extent of exaggeration also increased with time and with the decline in Nixon's popularity.

Automated Caricature Synthesis

Brennan (1982; 1985) developed a computer model to generate line-drawn caricatures. Brennan's procedure utilised the idea of comparing a target face to the norm for a population of faces and then exaggerating deviations. Images of real faces were digitised and aligned so that interocular separation was constant across faces. For each face a fixed number of points around the main features (eyes, nose, mouth, ears, smile lines, hair, etc.) was manually recorded. Veridical line drawings of the original faces were generated by linking appropriate points using spline curves. To caricature a given face, the x-y co-ordinates of the feature points were compared with those of the "facial norm" (produced by averaging the data sets for many faces), and the distance between the target face and the norm calculated. These values indicate the extent to which particular facial feature points deviate from the norm. Caricatures were produced by multiplying each deviation by a fixed percentage. (Amplifying differences by a constant percentage means that features are exaggerated in proportion to their difference from the norm, so features close to the norm are relatively unaffected.) For example, in a caricature exaggerating deviations by +50%, a point on the end of the nose which was 20 units from the norm would be displaced a further 10 units, to a position deviating from the norm by 30 units. Similarly, to produce a representation diminishing all deviations by 50% (a 50% anticaricature, or −50% caricature), the nose point would be shifted to a final distance of 10 units from the norm. Having calculated all the modified feature positions, line-drawn caricatures were produced by linking up the appropriate points around each facial feature.
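Brennan's exaggeration step can be sketched in a few lines. The code below reproduces the worked example above (a nose point 20 units from the norm); the point coordinates are illustrative:

```python
def caricature(points, norm, exaggeration=0.5):
    """Shift each feature point away from (or toward) the facial norm.

    points and norm are lists of (x, y) tuples; exaggeration is the
    caricature level (+0.5 = +50% caricature, -0.5 = 50% anticaricature).
    Each deviation from the norm is scaled by (1 + exaggeration), so
    points close to the norm are barely moved.
    """
    out = []
    for (x, y), (nx, ny) in zip(points, norm):
        out.append((nx + (x - nx) * (1 + exaggeration),
                    ny + (y - ny) * (1 + exaggeration)))
    return out

# A nose-tip point 20 units from the norm, as in the worked example
norm = [(0.0, 0.0)]
face = [(20.0, 0.0)]
plus50 = caricature(face, norm, +0.5)   # point moves out to 30 units
minus50 = caricature(face, norm, -0.5)  # point moves in to 10 units
```

Because the same scale factor is applied to every deviation, the relative proportions among the exaggerated features are preserved, which is the property the next paragraph discusses.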
It should be noted that the caricature process exaggerates differences in feature positions relative to the interocular separation (which is standardised for all faces). Because the process is relative, all dimensions, including the interocular separation, are evenly caricatured. For a face that is average in every dimension except an unusually narrow separation of the eyes, the outline of the face will be evenly expanded by caricaturing. Thus in the end product the eyes will lie even closer together relative to the overall shape of the face. Other proportions of the face, such as the height/width ratio, will remain normal, though the face will appear magnified.

2.3 Advantages and Limitations of Automatic Caricaturing

This process has the advantage that it does not require the skills of a caricature artist. Because all deviations are amplified to the same extent, those deviations which are characteristic of a given individual should also be accentuated. Furthermore, all exaggerations can be controlled quantitatively. Another advantage of the automated process is that holistic cues, such as the spatial configuration between features, are also transformed. While it is clear that faces are recognised on the basis of information from individual features, it has also been established that the configuration of features has a profound impact on facial appearance (Haig, 1984; 1986; Rhodes, 1988; Sergent, 1984; Shepherd, Ellis & Davies, 1977). The dimensions of faces which make up configuration cues are not well established (Yamane, Kaji & Komatsu, 1988); however, it is not necessary to know which features or configurational cues are perceptually important, as the process automatically amplifies all deviations and should therefore include all relevant cues. The process is not without problems. Perhaps the least obvious limitation is the demarcation of the feature points on the original image of the face; this can be difficult and is subjective (particularly with cheek bones). Different operators will apply different subjective criteria to determine where smile lines finish, etc. As a result, lines will be differentially accentuated, because small differences in feature points get amplified in the caricature. Finally, while the process is good for exaggerating feature shape and configuration, it is not good for exaggerating hair texture or style. Indeed, it is the internal features, rather than the external hairstyle, etc., which are important in the recognition of familiar faces (Ellis, Shepherd & Davies, 1979; Young et al., 1985).
There is, however, a danger in removing the hair or other parts of the face, in so far as a lack of contextual information may impair recognition. A further limitation is that the input faces must all have roughly the same pose. This usually involves standardisation to the frontal view. Only details visible in the frontal view are therefore accentuated. Unfortunately, the nose profile so often caricatured in cartoons is not visible from this view. Furthermore, there are a number of psychological studies which indicate that the face turned half way to profile presents a perspective view which has advantages in certain recognition and face-matching tasks (Bruce, Valentine & Baddeley, 1987; Thomas, Perrett, Davis & Harries, submitted). This effect is not entirely consistent, as the half-profile view

does not seem to confer an advantage on the recognition of familiar faces (for a discussion, see Bruce et al., 1987; Harries, Perrett & Lavender, in press). In principle, using a standard half-profile pose could circumvent the limitation, but such standardisation is even more difficult with pictures of famous faces.

Comparisons Between Caricatures and Veridical Images

We are adept at recognising caricatures despite the relative lack of information that they contain. Many authors have questioned whether caricatures are in any way better representations than natural images. Perhaps the caricature contains not only the essential minimum of information but, because the information is accentuated, may also be a "super-fidelity" or "super-normal" stimulus. Such a concept derives from ethological studies in which accentuation of particular dimensions of natural stimuli can produce behaviour which is more marked than that produced by the natural stimuli. For example, if a nesting herring gull is given a choice between two eggs, one a natural egg and one larger than life size, it will attempt to roll the large egg back to the nest in preference to the natural egg (Hinde, 1982). Hagen and Perkins (1983) and Tversky and Baratz (1985) attempted to assess the validity of the super-fidelity concept of caricatures. They found no advantage for caricatures over veridical representations of faces when comparing recognition performance; the latter study also failed to find an advantage for name/face matching. In these experiments, however, comparisons were made across two different media: photographs were used for the veridical representations, but the caricatures were line drawn. Photographs clearly contain much more information than line drawings (Davies, Ellis & Shepherd, 1978), so any potential advantage which caricatures had as better representations could have been offset by the impoverished medium of display. Rhodes et al.
(1987) used Brennan's procedure to make a balanced comparison between recognitipn of veridical and caricature line drawings. They compared recognition of normal line drawings with caricatures which had been exaggerated 50% away from the average face and "anticaricatures" where departures from average were attenuated 50%. For faces of familiar individuals (departmental students and staff), student subjects were significantly quicker to name caricatures than veridical line drawings or anticaricatures. The mean reaction times were 3.2, 6.4 and 12.3 sec for +50% caricatures, 0% veridical drawings and 50% anticaricatures respectively.

Although the caricatures were recognised more quickly than the veridical images, they were not identified more accurately. The proportions of correct identifications were 33, 38 and 27% (for caricature, veridical and anticaricature images), but the differences were not significant. Thus there would appear to be a caricature advantage from the reaction time data, but not from the accuracy data. Other studies of caricature recognition for famous faces have found a speed/accuracy trade-off, with faster but less accurate recognition of +50% caricatures (Carey, 1989, pers. comm.). Rhodes et al. also investigated how well subjects perceived the resemblance of the caricature images of familiar faces to the depicted individual. Subjects rated the goodness of likeness of seven randomly ordered pictures consisting of ±75, ±50, ±25 and 0% caricatures. The highest ratings were found for the veridical (0%) images and +25% caricatures (these images were rated approximately equally). The distribution of scores was not symmetrical about the 0% veridical image; positive caricatures (+25, +50 and +75%) were rated higher than their counterpart anticaricatures (-25, -50 and -75%). Indeed, the distribution of scores was significantly shifted away from the veridical image towards images with positive caricaturing. Again the data suggested a caricature advantage. Interpolating from the data, the peak of the rating distribution occurred at a caricature level of +16%. Because Rhodes et al. (1987) measured ratings only for 0, ±25, ±50 and ±75% caricatures, actual ratings for a +16% caricature have yet to be obtained. Two possibilities exist. If the distribution of the ratings is sharply peaked, then the +16% caricature level might produce significantly higher ratings than the 0% veridical image. In this case, the +16% image could be considered supernormal. Alternatively, the distribution might be fairly "flat topped", in which case there would be no differences in ratings for 0, +16 and +25% caricatures.
In a recent series of studies, Rhodes and McLean (submitted) examined the recognition of familiar birds whose line-drawing representation had been enhanced using the caricature "algorithm". For expert ornithologists (but not for non-specialist subjects), some evidence was found for a caricature advantage with significantly faster reaction times to +50% caricatures than to veridical line drawings. However, the experts were significantly less accurate in recognising the +50% caricatures compared to the veridicals (and all other levels). This study, then, provides some evidence that the caricature advantage might not be restricted to faces.

Chapter 3 Technical analysis


The findings described above are important because they suggest that we might store in memory the way in which faces deviate from a norm, rather than storing a veridical structural description of particular features or feature configurations. The caricature advantage found by Rhodes et al. (1987) was based on line drawings. While the recognition advantage found may have implications for the nature of mental representations, it is necessary to entertain the possibility that the results rely on processes unique to line drawings. For example, cartoon conventions may apply only to simplified line-drawn illustrations. Independent of whether veridical line drawings and natural images access the same stored representations, the caricaturing process may yield different effects on the recognition of line drawings and natural images. It is important therefore to determine whether a caricature effect can be obtained with natural photographic stimuli. To this end, we have sought to extend the methods of Brennan (1985) and Rhodes et al. (1987) to compare the perception and recognition of normal images and caricatures with photographic detail.

Experiment 1

Rationale

The objective of Experiment 1 was to determine whether images could be manipulated such that they looked more like the target faces than the original images. Evidence for facial caricatures being perceptually supernormal images comes from interpolation only (Rhodes et al., 1987) and, as noted, is open to other interpretations. Furthermore, the perceptual advantage of caricatures detected so far is only small. It could be argued that the use of caricature photographs might enlarge the caricature advantage at the perceptual level.

3.1 Methods

Subjects. A total of 30 voluntary members of staff, postgraduates and undergraduates from the Department of Psychology at St Andrews took part in the experiment. All of the subjects were familiar with the people depicted in the photographs.

Stimuli.
The stimuli were produced using the techniques of Brennan described above, but extended to allow pixel-based images to be transformed in accordance with the caricature distortions (Benson & Perrett, in press a, b; Benson, Perrett & Davies, in press). Full-colour (24-bit) or grey-scale (8-bit) images of faces (live, photographs or video-tape) were frame-grabbed using a Pluto 24i graphics system and a JVC BY-110 video camera and RS-110 remote control unit. Interpupillary distance for each face was standardised. Images were then transferred to a Silicon Graphics IRIS 3130 graphics computer for processing. For each image, the x and y coordinates of 186 feature points were defined manually (see Appendix; see also Brennan, 1985). The feature points of the input face were compared to those of a facial norm. Appropriate norms were prepared for adult Caucasian male and female faces (using 14 faces for the male, and 11 for the female). Differences between the feature points and the average face were then accentuated or diminished by 16 or 32%, producing five data sets (-32, -16, 0, +16 and +32% caricatures). These degrees of distortion were chosen because Rhodes et al. indicated that a +16% caricature would produce the best likeness. Following Brennan's procedure, feature points were linked to form line drawings. Line drawings were not used further in image processing, but formed a useful tool to determine whether the face had been correctly delineated. The face was divided up into a mesh of triangular tessellations, the vertices of each polygon formed by 3 adjacent feature points from the 186 originally delineated. One tessellation, for example, joins the innermost points of the left and right eyebrows and a point at the middle of the lower hairline. Sets of corresponding tessellations were produced for the veridical image and for each level of image caricaturing. The grey-scale pixel values from each tessellation of the veridical image were then remapped into the corresponding tessellation in the caricatured image (see Benson & Perrett, in press b, for details of this process).
If the eyebrows become separated during caricaturing, the triangle between the eyebrows and forehead becomes wider at the base, and hence the pixel information within that space becomes "stretched". Seven famous faces (television actors and personalities) were digitized and caricatured at the ±32, ±16 and 0% levels. The final versions of the veridical (0%) images used for study were constructed by back transforming a +16% caricature; in effect, this used the pixels and the feature co-ordinates of the +16% caricature as the source and the feature co-ordinates of the original 0% image as the destination. This was done as a precaution to ensure that any anomalies (at pixel level) in the pixel remapping process required to produce ±16 and ±32% caricatures would also be present in the 0% caricature. Otherwise, the 0% caricature might be detectable as the only image without "glitches" and hence regarded as the best (most natural) representation. In fact, no obvious anomalies were visible in any of the processed images. Continuous tone 5 x 7 inch black-and-white photographs were taken of each caricature displayed on the computer terminal (examples are given in Fig. 1). The pictures were mounted horizontally on card strips such that two sets of stimuli were produced. An anticaricature-biased set contained -32, -16, 0 and +16% distortions, while a caricature-biased set contained -16, 0, +16 and +32% distortions. Two partial sets were constructed rather than a complete set of five images to avoid the truest likeness or least distorted image (0%) always lying in the middle of a set.
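The feature-point manipulation at the core of the procedure described above can be sketched in a few lines: each delineated point is displaced along its difference vector from the corresponding point of the facial norm by the chosen percentage. The following is an illustrative sketch, not the original software; the function name and the toy coordinates are invented for the example.

```python
# Illustrative sketch of the Brennan-style caricature transform: each
# delineated feature point is shifted away from (positive level) or towards
# (negative level) the corresponding point of the facial norm.
# Names and coordinates are invented for the example.

def caricature(points, norm, level):
    """Return feature points distorted by `level` relative to `norm`.

    level = 0.0 reproduces the veridical positions, +0.16 gives a +16%
    caricature and -0.16 gives a -16% anticaricature.
    """
    return [
        (nx + (1.0 + level) * (px - nx), ny + (1.0 + level) * (py - ny))
        for (px, py), (nx, ny) in zip(points, norm)
    ]

# A toy face with three feature points and a corresponding norm:
face = [(10.0, 20.0), (30.0, 22.0), (20.0, 40.0)]
norm = [(12.0, 20.0), (28.0, 20.0), (20.0, 38.0)]

veridical = caricature(face, norm, 0.0)    # identical to `face`
plus16 = caricature(face, norm, 0.16)      # differences exaggerated by 16%
minus16 = caricature(face, norm, -0.16)    # differences attenuated by 16%
```

In the full system the same displacement is applied to all 186 points, and the triangular tessellations then carry the grey-scale pixel values along with the moved vertices.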

FIG. 1. Images of Nicholas Parsons exaggerating or diminishing the differences in position and dimensions of facial features from an average face. The -48% caricature (left) shows the target face as having a 48% less distinctive feature configuration than the veridical (0%). Conversely, those distinctive features found by the generator in the veridical have been accentuated by 48%, giving a +48% caricature (right).

Procedure. The subjects were presented with either four anticaricature-biased sets and three caricature-biased sets or vice versa, varying equally for each of the seven faces amongst all 30 subjects. Thus no subject saw the entire range of five caricature levels for a given face. Each subject was required to give three verbal ratings: (a) familiarity of the person depicted (7 = highly familiar, 1 = don't know person); (b) best likeness of that person, by selecting from the four images presented the picture which looked most like the depicted person; and (c) goodness of likeness of the photograph chosen (7 = very good, 1 = very bad).

Assessment of Image Transformations by Caricature Artists. It is possible that mistakes could occur in the computer caricaturing process itself. For example, small inaccuracies in the manual placement of feature points in the veridical image might lead to inappropriate distortions in the caricature production process. We were keen to obtain some independent measure of the quality of the caricature manufacturing process. We therefore consulted three artists familiar with portrait and caricature production. They were informed as to the nature of the image transformation (a comparison with a facial norm and accentuation of differences) and were then shown the 0, +16 and +32% caricature images. They were interviewed for any general comments and were asked to rate the images of the seven processed faces. Specifically, they were asked to compare the features distorted in the +16 and +32% images with the features they found characteristic for the depicted individual, and to judge whether the distortions introduced were ones they would have expected for successful caricaturing. Their ratings were requested on a 7-point scale (7 = good caricature accentuating the right details; 4 = no remarkable change to image; 1 = bad caricature with the wrong details accentuated).

Results

A two-way repeated measures analysis of variance was carried out on the "best likeness" ratings with the seven target faces and the two sets of stimuli (caricature- and anticaricature-biased sets) as main factors. There was a significant effect of the face subjected to caricaturing [F(6, 196) = 3.021, P < 0.01], which may indicate that only some of the faces caricatured well.
There was no effect of stimulus set (anticaricature- vs caricature-biased sets) [F(1, 196) = 0.186, P > 0.6] and there was no interaction between target face and stimulus set [F(6, 196) = 1.052, P > 0.3]. Because there was no significant effect of the presentation set (-32 to +16% or -16 to +32%) on ratings of best likeness, the data were pooled for further analysis. Figure 2 gives the overall frequency with which images at different levels of caricaturing were chosen as the best likeness. The mean of the distribution occurred at a caricature level of +4.4%. A t-test was conducted to determine whether this sample mean departed from that expected under the null hypothesis that caricature distortions had no effect on the image perceived as the best likeness of a face. Under this hypothesis, subjects would be expected to rate the 0% caricature or veridical image as most like the target individual. The distribution of the level of caricatures (0, ±16, ±32%) chosen as the best representation of the seven faces by the 30 subjects is significantly different from the 0% mean expected under the null hypothesis (t = 2.18, d.f. = 29, P < 0.04). Because the subjects may have seen Margaret Thatcher's face in a caricatured state before the experiment, the overall effect of the caricature manipulation was also assessed using the data for the six other faces. These data still showed a significant bias towards positive caricaturing for ratings of best likeness (t = 2.09, d.f. = 29, P < 0.05). Further, the subjects rated caricatures as more like the target individuals than anticaricatures [matched-pairs t-test, pooled caricature (+16% and +32%) vs anticaricature (-16% and -32%) data: t = 3.021, d.f. = 6, P < 0.05]. Thus images accentuating distinctiveness were rated higher than images decreasing distinctiveness.

Caricaturing of Individual Faces. Because the overall analysis indicated a significant effect of the face caricatured on ratings of likeness, the results for the different faces were analysed separately. Figure 3 gives the distribution of ratings for different caricature levels for each of the seven faces. For each face, the mean level of caricaturing for images chosen as the best likeness is given in Table 1. A positive percentage in column 1 indicates that more subjects chose images with +16 or +32% distortion than images with the same degree of distortion in the anticaricature domain (-16%, -32%). The faces of Anita Dobson, Margaret Thatcher and Nicholas Parsons all have distributions significantly different from that expected under the null hypothesis (t = 4.74, d.f. = 29, P < 0.001; t = 4.37, d.f. = 29, P < 0.001; t = 2.57, d.f. = 29, P < 0.02). For each of these faces, it would seem that the image producing the best likeness was one with a small degree of positive caricaturing (+11.2, +7.5 and +5.3% respectively).
For the other faces, the average percentage of caricaturing for best likeness ratings was not significantly different from 0%.

FIG. 2. Experiment 1: Overall ratings of perceived likeness for caricatures of seven familiar faces. Ordinate = proportion of subjects choosing an image as the best likeness of the target face. Preference for images is expressed as a proportion of the number of times the image was available for choice (thus correcting for the different number of subjects seeing ±32% than other levels of caricature). Abscissa = distortion of image from the veridical image. The data show a preference for caricatures over anticaricatures; the mean of the distribution is shifted away from 0% and lies at +4.4%.

Notes to Table 1. Column 1: mean (± S.E.M.) level of % caricaturing for the image selected as best likeness (interpolated from the distributions displayed in Fig. 3). Column 2: mean (± S.E.M.) rating of subjects' familiarity with the individual depicted. Column 3: mean (± S.E.M.) rating of goodness of likeness of the image selected as most similar to the depicted individual. Column 4: rank order of ratings of the success of the caricature process (comparing 0% and +32% caricatures) judged by experts (caricature/portrait artists).

Quality of Original Image. It might be expected that poor quality starting images would produce ineffective caricatures. Starting images could be poor in terms of photographic quality or because they were uncharacteristic of the target person. Accentuating differences from average would therefore accentuate inappropriate features. For such faces with poor quality starting images one might expect the image chosen as best likeness to be the 0% caricature, as any computer-based transformation would further degrade the initially inadequate image. This consideration leads to the prediction that ratings of image quality (whether the image is a good likeness of the target person) should correlate with the level of caricaturing chosen as the best representation. The correlation between the quality judged for the image chosen as best likeness and the level of caricaturing of that image was not significant (Spearman's rank correlation: rs = 0.025, d.f. = 208, P > 0.1). Furthermore, the quality of images was judged fairly uniform across faces and was generally high (see Table 1). Correlation of likeness ratings for the veridical image and level of caricature of best likeness might be more appropriate here. Such a correlation is, however, unlikely to yield a different result, because the veridical image turned out to be the best likeness for most faces and observers. Furthermore, all subjects judged the images to differ in likeness very subtly. Not surprisingly, ratings of quality of the veridical image and the image chosen as best likeness are very close.

Familiarity with Target Face. The rating of familiarity with the target person correlated with the degree of caricaturing of the image chosen to best represent that person (rs = 0.251, d.f. = 208, P < 0.005). That is, for more familiar faces, the caricaturing process was more successful and positive caricatures enhanced the likeness of the original image. For less familiar faces, the caricaturing was less successful. The effect was still significant when the results for Margaret Thatcher's face were excluded (rs = 0.228, d.f. = 178, P < 0.02).

Ratings by Caricature Artists. The assessments of the three caricature artists showed a high degree of concordance as to which faces they thought the caricature process had distorted in the correct direction and for which faces the process had produced inappropriate results. Of greater importance was the correspondence between their judgements of the success of the computer caricaturing process and the extent of the caricature bias in the selection of images judged to be most like the target faces. The artists rated the caricatures of Margaret Thatcher and Anita Dobson as most successful. The distortions for the faces of Michael Cashman and Nicholas Parsons were also seen as in the right direction.
For Leslie Grantham, there was little perceptible distortion, whereas the distortions for Letitia Dean and Susan Tully were seen as ineffectual or in the wrong direction. Overall, the magnitude of the caricature bias for subjects making perceptual judgements of the best likeness showed a significant correlation with the rank order of expert ratings of the quality of the caricaturing process (rs = 0.883, d.f. = 5, P < 0.01).

Experiment 2

Rationale

Experiment 1 revealed a small but significant bias in subjects' perceptual ratings of manipulated images. Interpolating from the results, images judged to look most like a target figure would have a small degree of positive caricaturing.

Rhodes et al. (1987) found that the effects of caricaturing on recognition of familiar faces were stronger than the effects on perceptual ratings of how like an individual a line drawing was judged to be. With line drawings, Rhodes et al. found that +50% caricatures were named faster than veridical line drawings. Despite this, +50% caricatures were perceptually judged to be less like the target individuals than veridical drawings. Thus the speed of recognition appears more likely to benefit from caricaturing than perceptual judgements of the goodness of likeness. Experiment 2, therefore, set out to determine whether a more marked caricature effect could be found with caricatured photographs using a recognition task. Unfortunately, the small number of stimulus faces used in Experiment 1 meant that this material was not suited to the naming recognition task used by Rhodes et al. (1987). We therefore employed a name-face matching task following a similar design to Rhodes and McLean (submitted) and Tversky and Baratz (1985).

Methods

Subjects. A total of 11 subjects participated in the experiment, which lasted approximately 1 hour and for which they were paid £3. None of the subjects in Experiment 2 had taken part in Experiment 1.

Stimuli. The images used were identical to those used in Experiment 1, with the addition of ±48% caricatures for each of the famous faces. The grey-scale pictures were stored in on-line disk files on the IRIS workstation to be recalled individually for display on the monitor. A black cardboard mask was made to cover the monitor display, leaving only an elliptical viewing aperture through which the seven faces would appear. This was used so as to remove extraneous background details and limit subjects' use of hair outline as a recognition cue. In this way, the subjects' attention was focused on internal facial features.

Procedure. The subjects were seated approximately 80 cm from the 19-inch IRIS display. One of the seven names appeared in the centre of the screen (through the aperture) for 1 sec, followed by a blank display for 1 sec. This was followed by one of the seven faces, which remained visible until a response was made. The subjects were requested to press the "yes" key if the name matched the face and the "no" key otherwise; the lateralisation of response keys was alternated between subjects. The stimuli were allocated to four experimental blocks. Each block contained each of the seven faces caricatured at each of the seven levels (0, ±16, ±32 and ±48%), with two trials in which the name and subsequent face matched and two trials in which the name and face did not match. The order of faces, caricature levels and match/non-match trials was randomised within each block. Thus, overall, each subject made 16 decisions about each face at each caricature level. For each level of caricature, it was arranged that on non-match trials each face was paired with all possible non-matching names. No practice session was administered in case any caricature recognition advantage persisted only for the first block of trials.

Results

Level of Caricature Producing Subjects' Most Efficient Responses. Figure 4a gives the distribution of caricature levels producing the fastest overall reaction time for each of the 11 subjects. For this analysis, correct match and non-match trials for all of the seven target faces were averaged together. The averaging was performed separately for each of the seven levels of caricaturing (±16, ±32, ±48 and 0%) for each subject. The caricature level producing the fastest reaction time for each subject was then evident by comparing the means for the seven levels. If all image manipulations away from the original image affect reaction times adversely, one would predict a peaked distribution with most of the subjects having their fastest reaction times for the veridical image (0% level). (It is also possible that image manipulations have no effect on processing efficiency, in which case the mean level of caricaturing producing the fastest reaction time would again be 0%, but the distribution of reaction times across levels of caricature would be flat.) As can be seen from Fig. 4a, the majority of subjects had their fastest reaction times when the image was positively caricatured +16 or +32%. The mean of the distribution was significantly shifted from the 0% level of image manipulation [F(1,10) = 5.2, P = 0.01]. In this way, caricaturing can be seen to enhance recognition. (Interpolating from the distribution, optimal speed of processing would occur for +19% caricatures.)
The effect does not reflect any kind of speed-accuracy trade-off, because the distribution of caricature levels producing the most accurate performance for each subject showed a similar trend for better performance with images that were positively caricatured (Fig. 4b). The mean of the distribution of highest accuracies was, however, not significantly different from the 0% caricature level [F(1,10) = 1.78, P = 0.1].

Reaction Times. To assess the evidence for effects of distinctiveness of features on reaction time, planned comparisons were used to contrast different levels of positive caricaturing (accentuating distinctiveness) with matched levels of negative caricaturing (diminishing distinctiveness). Planned comparisons indicated a distinctiveness effect, with caricatures being processed faster than anticaricatures at both the 16 and 32% levels [F(1,10) = 8.7, P = 0.014; F(1,10) = 7.24, P = 0.023, respectively]. There was no distinctiveness effect at the 48% levels of caricature (P = 0.9). To assess the evidence for a caricature advantage, the mean reaction time for each level of caricaturing was compared to the mean reaction time for the veridical image. The veridical image was processed significantly faster than both the +48 and -48% caricature levels [F(1,10) = 5.06, P = 0.048; F(1,10) = 9.55, P = 0.011], but was not significantly different from other levels of caricature.

FIG. 4. Experiment 2: (a) Distribution of conditions producing the fastest reaction times for 11 subjects across different levels of caricaturing. (b) Distribution of conditions producing the highest accuracy of response for 11 subjects across different levels of face caricaturing.

FIG. 5. Experiment 2: Distribution of reaction times for correct responses in the name/face matching task, with match and non-match trials plotted separately. Positive caricatures significantly improve subjects' accuracy in rejecting non-match trials. Error bars denote the 95% confidence interval of mean reaction times. ●, Match; ○, non-match.

Match Trials. To facilitate interpretation of the results, two separate one-way analyses of variance were performed on the match and non-match trials (Fig. 5). For congruous (match) trials, the analysis indicated a significant overall effect of caricaturing on reaction times [F(6,60) = 2.66, P = 0.024]. Planned comparisons indicated no significant advantage for caricatures over anticaricatures (distinctiveness) or over the veridical image (caricature advantage). Indeed, the only emergent differences were that the veridical image led to faster reaction times than both the +48 and -48% caricatures [F(1,10) = 7.51, P = 0.021; F(1,10) = 8.7, P = 0.014].

Non-match Trials. With non-match trials, a different picture emerged. One-way ANOVA revealed a significant overall effect of caricature level on the reaction time to correctly reject a face not matching a preceding name [F(6,60) = 3.73, P = 0.003].

FIG. 6. Experiment 2: Effect of level of facial caricaturing on accuracy of subjects' judgements for match or non-match between a name and a subsequently presented face image. Error bars denote the 95% confidence interval. ●, Match; ○, non-match.

Distinctiveness. Planned comparisons indicated a distinctiveness effect, with caricatures being processed faster than anticaricatures at both the +16 and +32% levels [F(1,10) = 13.4, P = 0.004; F(1,10) = 5.57, P = 0.04, respectively]. There was no distinctiveness effect at the ±48% levels of caricature (P = 0.96).

Caricature Advantage. Planned comparisons between performance with veridical images and that with different levels of caricature revealed that reaction times were significantly faster for +16% caricatures than for veridical images [F(6,60) = 7.02, P = 0.024]. No other differences were evident.

Accuracy. An analysis of the number of errors using two-way ANOVA revealed a significant main effect of the degree of caricature [F(6,60) = 3.47, P = 0.005]. Figure 6 shows the overall accuracy of subjects. The effect of trial type (match/non-match) showed a trend for more accurate performance with incongruous trials, which did not reach significance [F(1,10) = 4.60, P = 0.058]. There was no significant interaction between these factors [F(6,60) = 1.58, P = 0.17]. Planned contrasts revealed an increase in accuracy with the 0 and +32% images over the -32% anticaricature [F(1,10) = 5.52, P = 0.041; F(1,10) = 8.07, P = 0.018, respectively], but no other differences. Separate one-way ANOVAs for caricature level effects on the accuracy data revealed accuracy differences for the match trials [F(6,60) = 2.69, P = 0.023] but not for the non-match trials [F(6,60) = 2.2, P = 0.055]. For match trials, planned contrasts showed improved accuracy in favour of caricatures at +32 and +48% over the respective anticaricature levels [F(1,10) = 7.45, P = 0.021; F(1,10) = 5.2, P = 0.046, respectively]. There was no evidence of a caricature advantage.
Effect of Novelty of Caricature on Recognition. A two-way ANOVA (four blocks of trials, seven caricature levels) was performed to assess whether caricature effects were affected by experience with the stimuli within the task. There was no effect of trial block on reaction times [F(3,40) = 0.728, P = 0.539], and block number did not interact with caricaturing effects on reaction times [F(18,240) = 1.004, P = 0.457]. As expected, the degree of image distortion affected performance [F(6,240) = 5.277, P < 0.0001]. Separate one-way ANOVAs showed no effect of trial block on either match [F(3,40) = 0.795, P = 0.502] or non-match [F(3,40) = 0.65, P = 0.587] responses, and there was no interaction between the effects of block and level of caricature on response time for either trial type [F(18,240) = 1.136, P = 0.328; F(18,240) = 1.466, P = 0.104, respectively]. Thus there was no evidence that the caricature effect was only prevalent within the first series of trials or that the novelty of seeing the caricatures led to improved recognition.

3.2 Linear discriminant analysis


Linear discriminant analysis (LDA) and the related Fisher's linear discriminant are methods used in statistics and machine learning to find the linear combination of features which best separates two or more classes of objects or events. The resulting combination may be used as a linear classifier or, more commonly, for dimensionality reduction before later classification. LDA is closely related to ANOVA (analysis of variance) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements. In the other two methods, however, the dependent variable is a numerical quantity, while for LDA it is a categorical variable (i.e. the class label). LDA is also closely related to principal component analysis (PCA) and factor analysis in that all three look for linear combinations of variables which best explain the data. LDA explicitly attempts to model the difference between the classes of data. PCA, on the other hand, does not take into account any difference in class, and factor analysis builds the feature combinations based on differences rather than similarities. Discriminant analysis is also different from factor analysis in that it is not an interdependence technique: a distinction between independent variables and dependent variables (also called criterion variables) must be made. LDA works when the measurements made on each observation are continuous quantities. When dealing with categorical variables, the equivalent technique is discriminant correspondence analysis.

LDA for two classes


Consider a set of observations x (also called features, attributes, variables or measurements) for each sample of an object or event with known class y. This set of samples is called the training set. The classification problem is then to find a good predictor for the class y of any sample of the same distribution (not necessarily from the training set) given only an observation x. LDA approaches the problem by assuming that the conditional probability density functions p(x | y = 0) and p(x | y = 1) are both normally distributed, with mean and covariance parameters (μ0, Σ0) and (μ1, Σ1) respectively. Under this assumption, the Bayes optimal solution is to predict points as being from the second class if the log of the likelihood ratio exceeds some threshold T, so that

(x − μ0)ᵀ Σ0⁻¹ (x − μ0) + ln|Σ0| − (x − μ1)ᵀ Σ1⁻¹ (x − μ1) − ln|Σ1| > T

Without any further assumptions, the resulting classifier is referred to as QDA (quadratic discriminant analysis). LDA also makes the simplifying homoscedastic assumption (i.e. that the class covariances are identical, so y = 0 = y = 1 = ) and that the covariances have full rank. In this case, several terms cancel and the above decision criterion becomes a threshold on the dot product

for some constant c, where

This means that the probability of an input x being in a class y is purely a function of this linear combination of the known observations.
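The two-class rule above can be sketched numerically. The following is a minimal illustration on synthetic Gaussian data (all values and variable names are illustrative, not from the text); it forms $\vec{w} = \Sigma^{-1}(\vec{\mu}_1 - \vec{\mu}_0)$ and thresholds the dot product at the midpoint between the projected class means:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes sharing one covariance (the LDA assumption).
mu0, mu1 = np.array([0.0, 0.0]), np.array([3.0, 3.0])
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X0 = rng.multivariate_normal(mu0, cov, 200)
X1 = rng.multivariate_normal(mu1, cov, 200)

# w = Sigma^{-1} (mu1 - mu0); classify by comparing w.x against a threshold c.
w = np.linalg.solve(cov, mu1 - mu0)
c = w @ (mu0 + mu1) / 2          # midpoint threshold (equal priors assumed)

pred0 = X0 @ w > c               # False => predicted class 0
pred1 = X1 @ w > c               # True  => predicted class 1
accuracy = (np.sum(~pred0) + np.sum(pred1)) / 400
print(accuracy)
```

With the classes this well separated, the linear rule classifies nearly all points correctly.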

Canonical discriminant analysis for k classes

Canonical discriminant analysis finds axes (k − 1 canonical coordinates, where k is the number of categories) that best separate the categories. These linear functions are uncorrelated and define, in effect, an optimal (k − 1)-dimensional space through the n-dimensional cloud of data that best separates (the projections in that space of) the k groups. See "Multiclass LDA" below.

Fisher's linear discriminant


The terms Fisher's linear discriminant and LDA are often used interchangeably, although Fisher's original article, The Use of Multiple Measurements in Taxonomic Problems (1936), actually describes a slightly different discriminant, which does not make some of the assumptions of LDA such as normally distributed classes or equal class covariances.

Suppose two classes of observations have means $\vec{\mu}_{y=0}, \vec{\mu}_{y=1}$ and covariances $\Sigma_{y=0}, \Sigma_{y=1}$. Then the linear combination of features $\vec{w} \cdot \vec{x}$ will have means $\vec{w} \cdot \vec{\mu}_{y=i}$ and variances $\vec{w}^T \Sigma_{y=i} \vec{w}$ for $i = 0, 1$. Fisher defined the separation between these two distributions to be the ratio of the variance between the classes to the variance within the classes:

$S = \dfrac{\sigma_{between}^2}{\sigma_{within}^2} = \dfrac{(\vec{w} \cdot \vec{\mu}_{y=1} - \vec{w} \cdot \vec{\mu}_{y=0})^2}{\vec{w}^T \Sigma_{y=0} \vec{w} + \vec{w}^T \Sigma_{y=1} \vec{w}}$

This measure is, in some sense, a measure of the signal-to-noise ratio for the class labelling. It can be shown that the maximum separation occurs when

$\vec{w} \propto (\Sigma_{y=0} + \Sigma_{y=1})^{-1} (\vec{\mu}_{y=1} - \vec{\mu}_{y=0})$

When the assumptions of LDA are satisfied, the above equation is equivalent to LDA. Note that the vector $\vec{w}$ is the normal to the discriminant hyperplane. As an example, in a two-dimensional problem, the line that best divides the two groups is perpendicular to $\vec{w}$. Generally, the data points to be discriminated are projected onto $\vec{w}$. However, to find the actual plane that best separates the data, one must additionally solve for the bias term $b$ in $\vec{w}^T \vec{\mu}_1 + b = -(\vec{w}^T \vec{\mu}_2 + b)$.
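The closed-form maximiser of Fisher's criterion can be checked empirically. The sketch below (synthetic data; variable names are illustrative) estimates the class statistics, computes $\vec{w}^* \propto (\Sigma_{y=0} + \Sigma_{y=1})^{-1}(\vec{\mu}_1 - \vec{\mu}_0)$, and verifies that no random direction achieves a higher separation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Estimate class means and covariances from synthetic 2-D data.
X0 = rng.normal([0, 0], 1.0, size=(300, 2))
X1 = rng.normal([2, 1], 1.0, size=(300, 2))
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
S0 = np.cov(X0, rowvar=False)
S1 = np.cov(X1, rowvar=False)

def separation(w):
    """Fisher criterion: between-class over within-class variance."""
    return (w @ (mu1 - mu0)) ** 2 / (w @ (S0 + S1) @ w)

# Closed-form maximiser: w* proportional to (S0 + S1)^{-1} (mu1 - mu0).
w_star = np.linalg.solve(S0 + S1, mu1 - mu0)

# Sampled directions never beat the closed-form solution.
best_random = max(separation(rng.normal(size=2)) for _ in range(1000))
print(separation(w_star) >= best_random)
```

Note that $S$ is scale-invariant in $\vec{w}$, so only the direction of $\vec{w}^*$ matters.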

Multiclass LDA
In the case where there are more than two classes, the analysis used in the derivation of the Fisher discriminant can be extended to find a subspace which appears to contain all of the class variability. Suppose that each of C classes has a mean $\vec{\mu}_i$ and the same covariance $\Sigma$. Then the between-class variability may be defined by the sample covariance of the class means,

$\Sigma_b = \dfrac{1}{C} \sum_{i=1}^{C} (\vec{\mu}_i - \vec{\mu})(\vec{\mu}_i - \vec{\mu})^T$

where $\vec{\mu}$ is the mean of the class means. The class separation in a direction given by $\vec{w}$ in this case will be

$S = \dfrac{\vec{w}^T \Sigma_b \vec{w}}{\vec{w}^T \Sigma \vec{w}}$

This means that when $\vec{w}$ is an eigenvector of $\Sigma^{-1} \Sigma_b$, the separation will be equal to the corresponding eigenvalue. Since $\Sigma_b$ is of rank at most C − 1, these non-zero eigenvectors identify a vector subspace containing the variability between features. These vectors are primarily used in feature reduction, as in PCA. The smaller eigenvalues will tend to be very sensitive to the exact choice of training data, and it is often necessary to use regularisation as described in the next section.

Other generalizations of LDA for multiple classes have been defined to address the more general problem of heteroscedastic distributions (i.e., where the data distributions are not homoscedastic). One such method is Heteroscedastic LDA (see e.g. HLDA among others).

If classification is required, instead of dimension reduction, there are a number of alternative techniques available. For instance, the classes may be partitioned and a standard Fisher discriminant or LDA used to classify each partition. A common example of this is "one against the rest", where the points from one class are put in one group and everything else in the other, and LDA is then applied. This results in C classifiers, whose results are combined. Another common method is pairwise classification, where a new classifier is created for each pair of classes (giving C(C − 1)/2 classifiers in total), with the individual classifiers combined to produce a final classification.
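The eigenvalue statement above can be demonstrated directly. This sketch (synthetic three-class data; names are illustrative) builds the pooled within-class covariance and the between-class covariance of the class means, then counts the non-zero eigenvalues of $\Sigma^{-1}\Sigma_b$, which should be at most C − 1 = 2:

```python
import numpy as np

rng = np.random.default_rng(2)

# Three classes in 4-D with a shared (identity) covariance.
means = [np.array([0., 0, 0, 0]), np.array([3., 0, 0, 0]), np.array([0., 3, 0, 0])]
X = [rng.normal(m, 1.0, size=(100, 4)) for m in means]

# Pooled within-class covariance and between-class covariance of the means.
Sw = sum(np.cov(Xi, rowvar=False) for Xi in X) / 3
class_means = [Xi.mean(axis=0) for Xi in X]
mu = np.mean(class_means, axis=0)
Sb = sum(np.outer(m - mu, m - mu) for m in class_means) / 3

# Discriminant directions are eigenvectors of Sw^{-1} Sb.
eigvals = np.sort(np.linalg.eig(np.linalg.solve(Sw, Sb))[0].real)[::-1]
n_nonzero = int(np.sum(eigvals > 1e-8))
print(n_nonzero)   # at most C - 1 = 2 non-zero eigenvalues
```

The remaining eigenvalues are numerically zero because $\Sigma_b$, built from three centred mean vectors, has rank at most 2.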

Practical use
In practice, the class means and covariances are not known. They can, however, be estimated from the training set. Either the maximum likelihood estimate or the maximum a posteriori estimate may be used in place of the exact value in the above equations. Although the estimates of the covariance may be considered optimal in some sense, this does not mean that the resulting discriminant obtained by substituting these values is optimal in any sense, even if the assumption of normally distributed classes is correct.

Another complication in applying LDA and Fisher's discriminant to real data occurs when the number of measurements of each sample (i.e. the dimensionality of each data vector) exceeds the number of samples. In this case, the covariance estimates do not have full rank and so cannot be inverted. There are a number of ways to deal with this. One is to use a pseudo-inverse instead of the usual matrix inverse in the above formulae. Another, called regularised discriminant analysis, is to artificially increase the number of available samples by adding white noise to the existing samples. These new samples do not actually have to be calculated, since their effect on the class covariances can be expressed mathematically as

$\Sigma_{new} = \Sigma + \lambda I$

where $I$ is the identity matrix and $\lambda$ is the amount of noise added, called in this context the regularisation parameter. The value of $\lambda$ is usually chosen to give the best results on a cross-validation set. The new value of the covariance matrix is always invertible and can be used in place of the original sample covariance in the above formulae.

Also, in many practical cases linear discriminants are not suitable. LDA and Fisher's discriminant can be extended for use in non-linear classification via the kernel trick. Here, the original observations are effectively mapped into a higher-dimensional non-linear space. Linear classification in this non-linear space is then equivalent to non-linear classification in the original space. The most commonly used example of this is the kernel Fisher discriminant.
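The rank problem and its regularised fix can be shown in a few lines. Here there are deliberately more features (20) than samples (10), so the sample covariance is singular; adding $\lambda I$ restores full rank (the value of $\lambda$ is illustrative, not tuned by cross-validation):

```python
import numpy as np

rng = np.random.default_rng(3)

# More feature dimensions than samples: the sample covariance is singular.
X = rng.normal(size=(10, 20))            # 10 samples, 20 features
S = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(S))          # strictly less than 20

# Regularised estimate: add noise variance lam on the diagonal.
lam = 0.1                                # illustrative regularisation parameter
S_reg = S + lam * np.eye(20)
print(np.linalg.matrix_rank(S_reg))      # full rank: 20

# S_reg is now invertible, so the LDA formulae can be applied.
w = np.linalg.solve(S_reg, rng.normal(size=20))
```

In practice $\lambda$ would be swept over a grid and chosen by the cross-validation performance mentioned above.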

LDA can be generalized to multiple discriminant analysis, where $c$ becomes a categorical variable with $N$ possible states, instead of only two. Analogously, if the class-conditional densities $p(\vec{x} \mid c = i)$ are normal with shared covariances, the sufficient statistic for $P(c \mid \vec{x})$ are the values of $N$ projections, which are the subspace spanned by the $N$ means, affine projected by the inverse covariance matrix. These projections can be found by solving a generalized eigenvalue problem, where the numerator is the covariance matrix formed by treating the means as the samples, and the denominator is the shared covariance matrix.

Applications
In addition to the examples given below, LDA is applied in positioning, product management, and marketing research.

Bankruptcy prediction
In bankruptcy prediction based on accounting ratios and other financial variables, linear discriminant analysis was the first statistical method applied to systematically explain which firms entered bankruptcy and which survived. Despite limitations, including the known nonconformance of accounting ratios to the normal distribution assumptions of LDA, Edward Altman's 1968 model is still a leading model in practical applications.

Face recognition
In computerised face recognition, each face is represented by a large number of pixel values. Linear discriminant analysis is primarily used here to reduce the number of features to a more manageable number before classification. Each of the new dimensions is a linear combination of pixel values, which form a template. The linear combinations obtained using Fisher's linear discriminant are called Fisher faces, while those obtained using the related principal component analysis are called eigenfaces.
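The eigenface-then-Fisherface pipeline described above can be sketched end to end. The example below uses tiny synthetic "images" as a stand-in for face data (the data, dimensions, and variable names are all illustrative): PCA first reduces the pixel vectors to a manageable number of components, then a Fisher discriminant direction is found in the reduced space.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for face data: 2 "people", 20 images each, 64 pixels per image.
base = rng.normal(size=(2, 64))
images = np.vstack([b + 0.3 * rng.normal(size=(20, 64)) for b in base])
labels = np.repeat([0, 1], 20)

# Step 1 (eigenfaces): PCA via SVD to reduce 64 pixels to 10 components.
mean_face = images.mean(axis=0)
U, s, Vt = np.linalg.svd(images - mean_face, full_matrices=False)
pca = (images - mean_face) @ Vt[:10].T

# Step 2 (Fisherface): LDA direction in the reduced space.
mu0, mu1 = pca[labels == 0].mean(0), pca[labels == 1].mean(0)
Sw = np.cov(pca[labels == 0], rowvar=False) + np.cov(pca[labels == 1], rowvar=False)
w = np.linalg.solve(Sw, mu1 - mu0)

# The two people separate along the Fisher direction.
proj = pca @ w
print(proj[labels == 1].mean() > proj[labels == 0].mean())
```

The PCA step is what keeps the within-class scatter matrix well conditioned: with raw 64-pixel vectors and only 40 images, Sw could be singular, which is exactly the small-sample problem treated in the previous section.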

Marketing

In marketing, discriminant analysis is often used to determine the factors which distinguish different types of customers and/or products on the basis of surveys or other forms of collected data. The use of discriminant analysis in marketing is usually described by the following steps:
1. Formulate the problem and gather data - Identify the salient attributes consumers use to

evaluate products in this category - Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential customers concerning their ratings of all the product attributes. The data collection stage is usually done by marketing research professionals. Survey questions ask the respondent to rate a product from one to five (or 1 to 7, or 1 to 10) on a range of attributes chosen by the researcher. Anywhere from five to twenty attributes are chosen. They could include things like: ease of use, weight, accuracy, durability, colourfulness, price, or size. The attributes chosen will vary depending on the product being studied. The same question is asked about all the products in the study. The data for multiple products is codified and input into a statistical program such as R, SPSS or SAS. (This step is the same as in Factor analysis).
2. Estimate the Discriminant Function Coefficients and determine the statistical significance

and validity - Choose the appropriate discriminant analysis method. The direct method involves estimating the discriminant function so that all the predictors are assessed simultaneously. The stepwise method enters the predictors sequentially. The two-group method should be used when the dependent variable has two categories or states. The multiple discriminant method is used when the dependent variable has three or more categorical states. Use Wilks's Lambda to test for significance in SPSS, or the F statistic in SAS. The most common method used to test validity is to split the sample into an estimation or analysis sample and a validation or holdout sample. The estimation sample is used in constructing the discriminant function. The validation sample is used to construct a classification matrix which contains the number of correctly classified and incorrectly classified cases. The percentage of correctly classified cases is called the hit ratio.
3. Plot the results on a two dimensional map, define the dimensions, and interpret the

results. The statistical program (or a related module) will map the results. The map will plot each product (usually in two-dimensional space). The distance of products to each other indicates how different they are. The dimensions must be labelled by the

researcher. This requires subjective judgement and is often very challenging. See perceptual mapping.

SUMMARY OF RESULTS

Experiment 1

Perceptual ratings of the degree to which images resembled depicted individuals were found to vary with the level of caricaturing. Interpolation indicated that the best likeness would occur with a small degree of positive caricaturing (+4.4% on average). The magnitude of the caricature advantage at the perceptual level correlated with the familiarity of the faces and with the quality of the caricaturing process as judged by caricature experts.

Experiment 2

Overall analysis of the degree of image manipulation producing the fastest reaction times for individual subjects revealed a caricature advantage. This increased speed of processing for caricatured images did not reflect any speed-accuracy trade-off. Caricaturing images can therefore produce more efficient processing in a task requiring matching of a person's face and name. In the overall analysis of variance of reaction times (containing match and non-match trials), the caricature advantage did not achieve statistical significance. Three factors might have contributed to the lack of effect. First, the caricature advantage was relatively small in magnitude, amounting to a 3% increase in speed. Secondly, the amount of caricaturing producing optimal speed of processing varied across subjects, some performing best with +16% caricatures, others with +32% caricatures. Finally, and of more theoretical interest, the effects of caricaturing appeared to depend on the type of trial. There was no caricature advantage on congruous trials, when the name matched the subsequently presented face image. The caricature advantage was prevalent, however, on incongruous trials, where the face and name did not match. With non-match trials, +16% caricatures were processed significantly faster than the veridical images. Again, the increase in speed of processing was not an artifact produced by a speed-accuracy trade-off.

GENERAL DISCUSSION

Investigations using computer-generated caricatures have indicated that a systematic distortion of a facial line drawing improves recognition. Images with a slight degree of positive caricaturing were found to provide a better likeness of an individual than veridical images. The

results obtained here thus supplement Rhodes et al.'s (1987) study. The present results indicate that the caricature advantage is not restricted to line drawings but also occurs for images containing photographic detail. The caricature advantage may therefore tell us about the processing of natural images and cannot be taken to reflect simply a series of artistic conventions used in line-drawn cartoons.

Explanations of the Caricature Advantage

These results have implications for the nature of representations stored in memory. Rhodes et al. (1987) suggested two explanations for a caricature advantage. The first explanation suggested that caricatured representations of faces, rather than veridical representations, are actually stored in memory. The details of feature configuration, shape, size and colouration stored in memory would be exaggerated in the way they differ from the norm or the prototypical face. It could be predicted from this hypothesis that caricatures would be more efficiently recognised because they are closer to the stored representations. A second explanation is that the representations stored in memory are veridical but that caricaturing aids the process of matching the input image to the veridical representations. This retrieval advantage can be explained as follows. Rhodes et al. (1987) suggested that the attempt to match the caricature to stored veridical templates might lead to a greater relative activation of the target face compared to non-targets (distractors), even though the absolute level of activation of the target might be reduced compared to that produced by the veridical line drawing. Searching for a potential match against the stored representations of all familiar faces is presumably an extensive task. The caricaturing process could "constrain" the search, because exaggerating the features would make it easier to realise qualitatively what kind of features the target face possesses.
Thereafter, search could be restricted to only those faces with approximately the right feature dimensions, e.g. face X has a big nose, therefore do not attempt matches with representations of faces with small noses. The second explanation would be appropriate if faces are stored as distances in multidimensional space at the centre of which is the norm for a particular face type (McClelland & Rumelhart, 1985; Rhodes, 1988; Valentine & Bruce, 1986). Thus, nose length might be one dimension, interocular separation another. When a caricature is presented for recognition, the exaggeration of particular feature deviations from the

norm will increase the distances (in the multidimensional space) of the caricatured face from the representations of other faces. One potential problem with this interpretation is that a caricature image will also be further away from the position of the target face in multidimensional space as compared with the veridical image. The caricature advantage must come, therefore, from the fact that a small increase in distance from the representation of the target face is more than offset by the large increase in distance from the representations of non-target faces.

In summary, the advantage for caricatures could result from (1) mimicking the stored information or (2) optimising the retrieval process. The first model can be taken to make the prediction that caricatured images should be perceived as being more like the face they represent than veridical images. From the study of Rhodes et al. (1987) and the present study, there is a small but significant trend in the data to support this claim. Interpolation from the data of both studies indicates that the highest rating for best likeness occurs for images with a small degree of positive caricaturing (4-16%). In the present study, the strength of the caricature advantage was found to vary across faces, and the overall effect might well have been stronger if the quality of the processing had been uniformly good (see below). The interpretation of perceptual ratings of different images is not, however, clear cut. The actual distribution of ratings could be flat between 0% and some level of positive caricaturing (+16% in our study and +25% in that of Rhodes et al., 1987). Thus, although the distribution of ratings might statistically peak at +4.4% caricature in the present study and at +16% caricature in the study of Rhodes et al., the ratings for these slightly caricatured images might not be significantly higher than for the veridical image.
One could also say slightly positive caricatures are no better but also no worse than the veridical image as representations of individuals. On the other hand, anticaricatures are consistently perceived as poorer representations than veridical images. The second model (where caricatures give faster access to veridical representations held in memory) is perhaps favoured by the dissociation between recognition and perception of likeness. Rhodes et al. (1987) found +50% caricatures were recognised faster than the veridical representations

but were judged to be of inferior likeness compared to the veridical. Here, too, we found a significant caricature advantage in the recognition of +16% caricatures, yet no differences in the perceptual ratings of 0 and +16% caricatures.

The Distinctiveness Hypothesis

A number of authors (Bartlett, Hurry & Thorley, 1984; Cohen & Carr, 1975; Going & Read, 1974; Light, Kayra-Stuart & Hollander, 1979; Winograd, 1981) note that when subjects concentrate upon the atypical features of a face, they are less likely to confuse that face with others. It is perhaps not surprising that more distinctive faces are easier to recognise. An effect of distinctiveness can also be seen in the present study. Positive caricaturing accentuates deviations from the norm and hence should make a face more distinctive. Positive caricatures (+16 and +32%) were judged better likenesses than the anticaricatures of the same degree. Furthermore, in the name/face matching task, there was evidence to show that the +16 and +32% caricatures were processed more efficiently than anticaricatures. These distinctiveness effects, like those of Rhodes et al. (1987) and Rhodes and McLean (submitted), occur even though the magnitude of the image deformation from the veridical image is exactly matched for positive and negative caricatures.

Representing Relative and Metric Proportions of a Face

The caricature advantage is argued (above) to reflect the existence of an abstract configural representation for familiar faces. This representation stores the configural information about how faces differ from one another and hence how individual faces deviate from the average. If we now consider the case where a positive caricature of Parsons's face follows Cashman's name, it is evident that the evidence for a mismatch can be accumulated more rapidly.
The caricature of Nicholas Parsons will form a very bad match to the abstract representation of Cashman's face and a poor (possibly very poor) match to the metric representation of Cashman's face. For the non-match trials, caricaturing can be seen to aid recognition because it increases the discrepancy between input image and the stored representation(s) of the target face. This explanation predicts an increasing advantage for more exaggerated caricatures. Alternatively, on non-match trials the advantage may come because a caricatured image of Nicholas Parsons may be recognised as Nicholas Parsons quicker than a veridical image. Matching the abstract representation of Parsons is quick and performed in parallel without reference to the matching to

the veridical representation of Parsons. Any evidence that the face is not Cashman is a signal to stop the recognition search.

Photographic and Line-drawn Representations

The magnitude of the caricature advantage was small in the present study (see Figs 2 and 3). With photographic images, interpolation from the data indicates that a caricature level of +4.4% would on average be chosen as the best likeness, whereas with line drawings Rhodes et al. (1987) found a value of +16%. Likewise, in the recognition task with photographic caricatures we found a caricature advantage with +16% caricatures but a disadvantage with much larger caricatures (+48%), whereas Rhodes et al. reported a significant advantage for +50% caricatures of line drawings. Simple line-drawn faces are impoverished stimuli containing no texture, colour, shadows, etc. At the level of the metric code for facial attributes, line drawings will match stored representations less well than real photographs. Line drawings may, however, maintain all the configurational information necessary to access abstract codes of the relative proportions of facial features. Thus line-drawn caricatures can reap the benefits of improved matching to representations at the abstract configurational level without suffering such a disadvantage at the metric level. In this sense, greater advantages would be expected for caricaturing impoverished representations of faces.

Quality of Starting Image

The amount of caricaturing present in the image chosen as the best likeness was affected by the identity of the face portrayed. Because faces differ in the amount that they deviate from the norm, different levels of caricaturing might be required for efficient matching of the input to the stored representation. If a face has a highly deviant nose and lips, then these may not need further exaggeration in caricatures.
The difference found here between the levels of caricaturing required for the best likeness may thus reflect the feature dimensions of the target face chosen. Alternatively, this result may arise from limitations of the processing technique or from the poor quality of the starting images. In the present study, the subjects rated all the images they chose as best likenesses as being good representations of the target individuals. Thus we can assume that the starting image for each of the faces was at least adequate. A photograph may be of good quality in terms of contrast, focus, pose and lighting. In this sense, it may be a good likeness of an individual; however, it is often the case that nuances such as expressions, gestures, facial asymmetry and posture that are

typical of a person are absent from a given photograph. One only has to inspect family snapshots to appreciate this; often someone we know very well, while still recognisable in a snapshot, is none the less caught with an atypical posture or expression. Leslie Grantham smiled for the BBC portrait photograph that we used as a starting image, but how frequently did he smile as "Dirty Den" in the TV series EastEnders? Perhaps the subjects were more familiar with the stern, cynical or even morose expressions that were more typical of his TV role. In attempting to assess the quality of our original images, it is possible that the subjects rated the goodness of likeness more with reference to photographic quality than with reference to the visibility of characteristic features and expressions. Caricature artists noted that when preparing a caricature, they have the opportunity to experience many instances and views of a target's face before constructing a portrait, and can therefore spot the appearance of facial features, expressions, mannerisms, etc., which are characteristic of the individual. The automated process has access only to one starting image, and no matter how good the photographic quality is, if important idiosyncratic features or expressions are absent in that starting image they will not be accentuated in the final caricature.

Expert Assessment of Caricature Processing

For some of the faces, particular feature transformations, such as raising the forehead, were considered distortions rather than accentuations typical of a caricature of the face in question. While individual artists may pick on slightly different features to accentuate, the three artists interviewed were in agreement as to which images had been caricatured successfully and which had not. The ratings of artists with experience in caricaturing faces thus provided an independent measure of the quality of the computerized image transformations.
Of great interest was the finding that the ratings of the experts as to the quality of the caricature processing correlated with a tendency by the subjects to choose caricatured images as those most like the target faces. This provides evidence for thinking that the magnitude of the potential caricature advantage was underestimated in the present study. It might have been larger if the computer processing or the selection of starting images had been improved. It is important to note that this argument is not circular, and not artificially produced by both the artists and the other subjects judging the same dimension. The two ratings were qualitatively different; the caricature experts had to determine the extent to which the deviations introduced by the computer were those that they would have introduced. The subjects, on the other hand, were simply choosing which out of a set of images they thought looked most like a target face. They were aware that images had been deformed to

differing extents but were attempting to choose the image most similar to how the person looked in real life.

Familiarity and Caricaturing

Rhodes et al. (1987) found a caricature advantage for the recognition of familiar faces but no advantage for unfamiliar faces. Rhodes and McLean (submitted) also found a caricature advantage for line drawings of birds, but only with subjects who were highly familiar with the targets. The results of the present study also show an effect of face familiarity. The caricature advantage at the perceptual level was greater for faces that were more familiar to the subjects. That is, images with a greater degree of positive caricaturing were judged to be more like the target face when the face was highly familiar. The relation of caricaturing success to face familiarity is expected from both explanations of the caricature advantage (in terms of mimicking the stored representation or optimising retrieval), because there would be no long-term representation for unfamiliar faces. Under the first explanation, caricatures would fail because there would be no caricatured representation in memory. Under the second explanation, caricatures would also fail because there would be no veridical representation in memory to match the caricatured input image. It is true that to recognise an unfamiliar face, even after a short interval, some representation must be stored, but evidently the type of representation and/or matching process used for unfamiliar faces is qualitatively different from that for familiar faces and is not affected by caricaturing. Familiarity is known to produce qualitative differences in the processing of faces. For familiar faces, more attention is paid to the internal facial features, whereas for unfamiliar faces, more attention is paid to the external detail of the hair. Future work on caricaturing might show stronger effects if the hair of target faces was masked out during perceptual and recognition tasks.
The results suggest that to achieve the biggest caricature advantage, highly familiar or famous faces should be chosen as target images. There are problems, however, with using the faces of some highly famous politicians and media stars because they may already have been subjected

to caricaturing. In our present study, this confound of familiarity and previous caricaturing was present only for Thatcher's face. Data from the other faces, not known to have been subjected to caricaturing before the experiment, still revealed a significant correlation between familiarity and the degree of positive caricaturing accepted as providing the best likeness.

Applications: Reducing False Positives in Identification

The present study has demonstrated advantages in recognising caricatures generated automatically from photographs of faces. The effect of caricaturing did not appear to improve recognition performance by enhancing the hit rate or speed, perhaps because of the conflict between the different ways information about individual faces is represented in the brain. Nevertheless, the study does suggest that the caricaturing process could have practical applications, because it may aid recognition by decreasing the number of false positives. The evidence presented here suggests that false targets are eliminated from consideration with greater efficiency when the image under scrutiny is a caricature than when it is a veridical image. In other words, with caricatures, subjects may not be any more able to identify the target face but may be more sure of who it is not ("I don't know who it is but I'm certain it's not X, or Y").

Chapter 4 Conclusion
The events of 9/11 have led to an increased need for security. As companies look for new methods to authenticate users, one realizes that passwords are becoming harder and harder to remember, while smart cards are constantly being lost or left at home; authentication by biometrics seems to answer our call. Biometrics such as iris scans, fingerprint scans, signature verification and facial recognition all seem to make our lives easier, since we don't have to remember any password or keep anything on our person at all times; the user basically isn't required to do much. With iris scans you have to put your eye up to a machine to be scanned, while for signature verification you have to sign your name on a tablet. Even small tasks like these can be too time consuming when dealing with a high volume of people. Yet for a facial recognition system this is not a problem, since your face can be scanned and authenticated before you even reach the secured area. Facial recognition can be used for numerous applications: eliminating voter fraud, eliminating the use of PINs and photo IDs at the bank, acting as a security precaution for your computer, helping to search internet pornography sites for missing children, and helping to secure our streets by picking known criminals out of a large crowd.

References

[1] W. Zhao, R. Chellappa, A. Rosenfeld and P. J. Phillips. Face Recognition: A Literature Survey. Technical Report TR4167, University of Maryland, USA: 399-458, 2000.

[2] M. A. Mottaleb and A. Elgammal. Face Detection in Complex Environments from Color Images. IEEE ICIP: 622-626, 1999.

[3] J. Ross Beveridge et al. A Nonparametric Statistical Comparison of Principal Component and Linear Discriminant Subspaces for Face Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 535-542, 2001.

[4] J. Lu, K. N. Plataniotis and A. N. Venetsanopoulos. Face Recognition Using Kernel Direct Discriminant Analysis Algorithms. IEEE Transactions on Neural Networks, 14(1): 117-126, 2003.

[5] I. T. Jolliffe. Principal Component Analysis, 2nd edition. Springer-Verlag, New York, 2002.

[6] M. Turk and A. Pentland. Face Recognition Using Eigenfaces. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Maui, Hawaii: 586-591, 1991.

[7] P. C. Yuen and J. H. Lai. Independent Component Analysis of Face Images. IEEE Workshop on Biologically Motivated Computer Vision, Seoul, 2000.

[8] K. Etemad and R. Chellappa. Discriminant Analysis for Recognition of Human Face Images. Journal of the Optical Society of America A, 14(8): 1724-1733, 1997.

[9] S. Haykin. Neural Networks: A Comprehensive Foundation. Macmillan, New York, NY, 1994.

[10] D. S. Bolme, J. Ross Beveridge, M. L. Teixeira and B. A. Draper. The CSU Face Identification Evaluation System: Its Purpose, Features and Structure. Proceedings of the 3rd International Conference on Computer Vision Systems, Graz, Austria, 2003.

[11] R. Kimmel, D. Shaked and M. Elad. A Variational Framework for Retinex.

[12] D. P. Bertsekas. Non-Linear Programming. Athena Scientific, Belmont, Massachusetts, 1995.

[13] Z. Rahman, D. J. Jobson and G. A. Woodell. Retinex Processing for Automatic Image Enhancement. Human Vision and Electronic Imaging VII, SPIE Symposium on Electronic Imaging, SPIE 4662, 2002.

[14] R. C. Gonzalez and R. E. Woods. Digital Image Processing, Second Edition. Pearson Education, India, 2002.

Appendix A Illumination Invariant Elastic Bunch Graph Matching for Efficient Face Recognition
1. Introduction

During the past decade, face recognition has drawn significant attention [1][2][3][4] from the perspective of different applications. A general statement of the face recognition problem can be formulated as follows: given still or video images of a scene, identify or verify one or more persons in the scene using a stored database of faces. The environment surrounding a face recognition application can cover a wide spectrum, from a well controlled environment to an uncontrolled one. In a controlled environment, frontal and profile photographs of human faces are taken with a uniform background and identical poses among the participants. In an uncontrolled environment, human faces must be recognized at different scales, positions, luminance levels and orientations, and in the presence of facial hair, makeup, turbans etc. This challenging and interesting problem has attracted researchers from various backgrounds: psychology, pattern recognition, neural networks, computer vision and computer graphics. The challenges associated with face recognition can be attributed to the following factors:

Pose: The images of a face vary with the relative camera-face pose (frontal, tilted, profile, upside down).

Presence or absence of structural components: Facial features such as beards, mustaches, and glasses may or may not be present, and there is a great deal of variability among these components, including shape, color and size.

Facial expression and emotions: The appearance of a face is directly affected by the person's facial expression and emotions.

Occlusion: Faces may be partially occluded by other objects. For example, in an image of a group of people, some faces may partially occlude others.

Image orientation: Face images vary directly with rotation about the camera's optical axis.

Imaging conditions: When the image is formed, factors such as lighting and camera characteristics affect the appearance of a face.

In general, face recognition algorithms can be divided into two groups based on the face representation: 1) appearance-based methods, which use holistic texture features applied to either the whole face or specific regions of the face image, and 2) feature-based methods, which use geometric facial features (mouth, eyes, brows, cheeks etc.) and the geometric relationships between them. Holistic methods use the whole face region as input to the recognition system. Subspace analysis projects an image into a lower dimensional subspace built from training face images, after which recognition is performed by measuring the distance between known images and the image to be recognized. The most challenging part of such a system is finding an adequate subspace. Well known holistic algorithms include Principal Component Analysis (PCA) [5][6], Independent Component Analysis (ICA) [7], Linear Discriminant Analysis (LDA) [8] and Probabilistic Neural Network Analysis (PNNA) [9]. However, all holistic algorithms depend heavily on design decisions: the method of subspace analysis, the dimension of the subspace and the choice of similarity measure. In addition, successful implementation requires a large training set with multiple images of each person in different poses and expressions, and performance varies with the training set. The choice of the best design is still an open problem.
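As a concrete illustration of the subspace idea described above, here is a minimal PCA (eigenfaces-style) sketch on synthetic data. The array sizes, variable names and nearest-neighbour matching rule are illustrative assumptions, not details from this paper.

```python
import numpy as np

def pca_subspace(train, k):
    """Build a k-dimensional PCA subspace from flattened face images (rows)."""
    mean = train.mean(axis=0)
    centered = train - mean
    # SVD of the centered data gives the principal axes directly
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]           # mean face and top-k eigenfaces

def project(image, mean, axes):
    """Coordinates of an image in the face subspace."""
    return axes @ (image - mean)

def nearest_match(probe, gallery, mean, axes):
    """Identify the probe as the gallery image closest in subspace distance."""
    p = project(probe, mean, axes)
    dists = [np.linalg.norm(p - project(g, mean, axes)) for g in gallery]
    return int(np.argmin(dists))

# Synthetic "faces": 10 random 16x16 images, flattened to length-256 vectors
rng = np.random.default_rng(0)
gallery = rng.random((10, 256))
mean, axes = pca_subspace(gallery, k=5)
probe = gallery[3] + 0.01 * rng.random(256)   # noisy copy of face 3
print(nearest_match(probe, gallery, mean, axes))
```

The design decisions the text warns about appear directly here as the subspace dimension k and the (Euclidean) distance measure.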
On the other hand, feature-based methods extract local features such as the eyes, nose, and mouth and feed them into a structural classifier. An example is Elastic Bunch Graph Matching (EBGM) [10]. The major advantage of EBGM is that, instead of taking a holistic approach, it recognizes faces by comparing their parts. The algorithm is memory efficient, as a face is represented internally as a face graph containing a node for each landmark position together with the features extracted there. Hence it needs less human effort, and the system does not require a large training set for efficient recognition. One major disadvantage is that the eyes must be open, as the system aligns images based on the eye locations. Another limitation is that the system is not illumination invariant: a change in lighting conditions deteriorates its results. Considering the advantages of the EBGM algorithm over other existing holistic and feature-based face recognition algorithms, this algorithm has been chosen as the basis for employing illumination invariance. The idea is to preprocess the input image with an illumination invariance algorithm to remove the effect of changes in luminance, and then pass the result to the EBGM algorithm for recognition. The Retinex method is a powerful image enhancement tool first introduced by Edwin Land forty years ago [11]. It is used for a wide range of applications such as dynamic range compression, gamut mapping and illumination invariance. The Retinex algorithm, together with color constancy, handles the problem of separating the illumination from the reflectance, thereby compensating for non-uniform lighting. This paper uses the Retinex idea to handle illumination problems in EBGM, proposing an illumination invariant face recognition system. The next section provides a detailed discussion of the proposed system. Results are presented in Section 3. The paper is concluded in Section 4.

2. The Proposed System

The input image is preprocessed by separating the illumination from the reflectance part. Separating the illumination from the source image yields the reflectance image, which is expected to be free of non-uniform illumination.
This illumination estimation problem can be formulated as a Quadratic Programming optimization problem that can be efficiently solved by the Projected Normalized Steepest Descent (PNSD) algorithm [12], accelerated by a multiresolution technique.

The RETINEX algorithm [13] can be used to enhance different regions of the image. The algorithm is applied twice. First it is applied to the input image to enhance details in its dark areas. It is then invoked again on the inverse image (and the result re-inverted afterwards) to enhance details in the bright areas. Together, the two RETINEX images reveal more detail than the input image. After applying the Retinex algorithm, the image is further enhanced by histogram equalization, color restoration and image stretching. This illumination invariant image is passed to the EBGM algorithm for recognition.

2.1. RETINEX Algorithm

The RETINEX algorithm [13] decomposes a given image S into two images, the reflectance image R and the illumination image L, such that each point (x, y) in the image domain can be expressed as Equation 1:

S(x, y) = R(x, y) · L(x, y)    (1)

By taking the image to the logarithmic domain, one gets s = l + r, where s = log S, l = log L and r = log R. The algorithm assumes spatial smoothness of the illumination field. In addition, knowledge of the limited dynamic range of the reflectance is used as a constraint in the recovery process. The physical nature of reflecting objects is such that they reflect only a part of the incident light. Thus the reflectance is restricted to the range R ∈ [0, 1], so L ≥ S, which implies l ≥ s. The retinex algorithm is thus used to decompose the image into reflectance and illumination images [11][13]. Two iterations of the algorithm are applied, on the original and on the inverted image, to give the bright and dark retinex images, which reveal more information from the original image, as shown in Figure 1(a) and Figure 1(b).
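The log-domain decomposition s = l + r can be sketched in code. Note that the paper estimates the illumination by solving a variational problem (Section 2); the sketch below instead approximates the smooth illumination with a Gaussian surround, a common single-scale Retinex shortcut, applied to a 1-D signal for brevity.

```python
import numpy as np

def gaussian_blur_1d(x, sigma):
    """Gaussian smoothing by direct convolution, with edge padding."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(x, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def single_scale_retinex(S, sigma=8.0, eps=1e-6):
    """Approximate S = R * L in the log domain: s = l + r, with the
    smooth illumination l estimated by a Gaussian surround."""
    s = np.log(S + eps)
    l = np.log(gaussian_blur_1d(S, sigma) + eps)  # smooth illumination estimate
    r = s - l                                     # reflectance in log domain
    return np.exp(r), np.exp(l)

# A flat grey strip observed under a left-to-right illumination gradient
reflectance = np.full(128, 0.5)
light = np.linspace(0.2, 1.0, 128)
observed = reflectance * light
R, L = single_scale_retinex(observed)
# The recovered reflectance varies far less (relatively) than the input
print(R.std() / R.mean() < observed.std() / observed.mean())
```

The constraint l ≥ s from the text is what the variational formulation enforces exactly; the surround approximation only satisfies it approximately.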

Figure 1: (a), (b)

The reflectance image obtained by RETINEX is sometimes over-enhanced. This may be because 1) the human visual system usually prefers some illumination in the image, and 2) removing the illumination completely exposes noise that might exist in darker regions of the original image. The illumination image, L = exp(l), is therefore tuned by a gamma correction with a free parameter γ to obtain a new illumination image L′, which is multiplied by R to give the output image S′ = L′ · R. The gamma correction is given by Equation 2.

L′ = W · (L / W)^(1/γ)    (2)

where W is the highest value of the dynamic range (equal to 255 in 8-bit images). Multiplying L′ by R, one gets the image S′ as given in Equation 3:

S′ = L′ · R = (L′ / L) · S    (3)

Note that for γ = 1, S′ = S, i.e., the whole illumination is added back, and for γ → ∞, S′ = R · W, i.e., no illumination is added back, which is the reflectance image R obtained by the original RETINEX, stretched to the interval [0, W]. The RETINEX algorithm is applied to the input image and its inverse to produce the bright and dark RETINEX images. After gamma correction these two images are combined by averaging, as shown in Figure 2.

Figure 2: Combined image
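Equations 2 and 3 translate directly into code; the following is a small numpy sketch with toy values, using the two limiting cases (γ = 1 and γ → ∞) as sanity checks.

```python
import numpy as np

W = 255.0  # top of the 8-bit dynamic range

def add_back_illumination(S, L, gamma):
    """Tune the illumination with a gamma correction and recombine.

    Eq. 2: L' = W * (L / W)**(1/gamma); Eq. 3: S' = L'.R = (L'/L).S,
    so the reflectance R never has to be formed explicitly.
    """
    L_prime = W * (L / W) ** (1.0 / gamma)
    return (L_prime / L) * S

# Toy image and illumination values (illustrative, not from the paper)
S = np.array([[40.0, 120.0], [200.0, 90.0]])
L = np.array([[80.0, 160.0], [240.0, 120.0]])
# gamma = 1 adds the whole illumination back: S' = S
print(np.allclose(add_back_illumination(S, L, 1.0), S))        # True
# gamma -> infinity adds none back: S' = R.W = (S/L).W
print(np.allclose(add_back_illumination(S, L, 1e9), (S / L) * W))  # True
```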

2.2. Histogram Enhancement

The RETINEX algorithm reveals details in the bright and dark areas, but it shifts the colors back towards the middle of the spectrum, giving a non-uniform histogram. To obtain a more uniform distribution without losing colors, histogram enhancement is performed.
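Histogram equalization is one standard way to flatten the distribution; the paper does not specify its exact histogram method, so the following greyscale sketch is an assumption.

```python
import numpy as np

def equalize_histogram(img):
    """Map grey levels through the image's own CDF so the output
    histogram is roughly uniform (img: uint8 array)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    # Normalize the CDF to [0, 255], ignoring empty leading bins
    cdf_min = cdf[np.nonzero(cdf)][0]
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]

# Mid-spectrum image: values bunched into [100, 140]
rng = np.random.default_rng(1)
img = rng.integers(100, 141, size=(32, 32)).astype(np.uint8)
out = equalize_histogram(img)
print(out.min(), out.max())   # spread toward the full 0..255 range
```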

2.3. Color Restoration

To restore color loss, each channel's (Red, Green and Blue) relative color is calculated from the original image, and each pixel of the enhanced image is multiplied by its relative color, as given in Equation 4:

I′_channel = I_channel · f( I_channel / (I_R + I_G + I_B) )    (4)

where f is an ascending monotonic function that may be linear or logarithmic, I_channel is the given channel of the image (R, G or B), and I_R, I_G and I_B are the values of the Red, Green and Blue channels. The result of color restoration is shown in Figure 3.

Figure 3: Color restored image

But the Color Restoration stage may create colors outside the normal color range, so the Image Stretching stage is needed after Color Restoration.
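Equation 4 can be sketched as follows. The choice f(x) = 3x is an illustrative linear option (scaled so that a perfectly grey pixel, whose channel share is 1/3, is left unchanged), not the paper's.

```python
import numpy as np

def restore_color(enhanced, original, f=lambda x: 3.0 * x):
    """Per-pixel color restoration (Eq. 4): each channel of the enhanced
    image is weighted by that channel's relative share in the original.
    f is an ascending monotonic function."""
    total = original.sum(axis=2, keepdims=True) + 1e-6  # I_R + I_G + I_B
    weights = f(original / total)                       # relative color per channel
    return enhanced * weights

# A grey enhanced image keeps its values where the original was grey,
# and is tinted where the original was saturated red.
original = np.zeros((1, 2, 3))
original[0, 0] = [0.2, 0.2, 0.2]            # grey pixel
original[0, 1] = [0.6, 0.0, 0.0]            # pure red pixel
enhanced = np.full((1, 2, 3), 0.9)
out = restore_color(enhanced, original)
print(np.allclose(out[0, 0], 0.9, atol=1e-3))   # grey pixel essentially unchanged
print(out[0, 1])                                 # red channel boosted, G and B zeroed
```

The boosted red value here exceeds 1.0, which is exactly the out-of-range effect the Image Stretching stage below exists to correct.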

2.4. Image Stretching

Even after the Histogram Enhancement stage there is usually still a few percent of the color space left unused or rarely used. Although a few overexposed or underexposed pixels are of little concern, the visual range of the image can be increased by stretching it [14], saturating a small percentage of the pixels. The idea is to map the pixel range into a new range, [0.0, 1.0], applying the appropriate linear transformation and then saturating the pixels with colors out of range. The outcome is the color restored image shown in Figure 4, which is passed to the EBGM module for verification.

Figure 4: Image Stretched image
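A percentile-based version of this stage might look as follows; the saturation percentage and helper name are assumptions.

```python
import numpy as np

def stretch(img, saturate=1.0):
    """Linearly map the image into [0.0, 1.0], saturating roughly
    `saturate` percent of pixels at each end of the range."""
    lo, hi = np.percentile(img, [saturate, 100.0 - saturate])
    out = (img - lo) / (hi - lo + 1e-12)   # the linear transformation
    return np.clip(out, 0.0, 1.0)          # saturate out-of-range colors

rng = np.random.default_rng(2)
img = 0.3 + 0.2 * rng.random((64, 64))     # narrow range, roughly [0.3, 0.5]
out = stretch(img)
print(out.min(), out.max())                # full [0.0, 1.0] range used
```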

2.5. Elastic Bunch Graph Matching (EBGM)

Face recognition using elastic bunch graph matching [10] recognizes novel faces by estimating a set of novel features using a data structure called a bunch graph. The algorithm operates in two modes: a) training mode and b) testing mode. Training involves creating the bunch graph by manually selecting landmarks on training face images to form the model imagery. These landmarks are convolved with Gabor wavelets to form Gabor jets. The bunch graph holds, for each facial landmark, a bunch of model jets extracted from the model imagery. To add an image to the database, the following steps are performed.

Step 1: Estimate and locate landmark positions with the use of the bunch graph, as given in Equation 5:

p_n = p_m + v_mn    (5)

where v_mn is the difference between the two landmark nodes in the bunch graph, and p_n and p_m are the average locations of the nodes for the two landmarks.

Step 2: Calculate the novel jet's displacement from its actual position by comparing it to the most similar model jet.

Step 3: Create another data structure, the face graph, containing a node for each landmark position and the jet values for that landmark.

Similarly, for each query image the landmarks are estimated and located using the bunch graph. The features are then extracted by convolution with a number of Gabor filter instances, followed by the creation of the face graph. The matching score is calculated from the similarity between the face graphs of the database and query images. The similarity between landmark jets in the two face graphs is obtained by Equation 6:
L_jet(G, G′) = (1/n) Σ_{i=1..n} S_x(J_i, J′_i)    (6)

where n is the number of Gabor filter instances convolved with the landmark point, S_x is a similarity function (phase, magnitude etc.), J_i and J′_i are the jets extracted from the landmark positions, and G and G′ are the face graphs of the database and query face images respectively. The face graph similarity between the database and query images is calculated as the average of the landmark jet similarities.
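Equation 6 and the face graph comparison can be sketched as follows. The magnitude-based similarity S_x and the toy landmark names are illustrative choices, since the text leaves S_x open (phase, magnitude etc.).

```python
import numpy as np

def jet_similarity_magnitude(j1, j2):
    """One choice of S_x: normalized dot product of the magnitudes of
    two Gabor jets, represented here as complex coefficient vectors."""
    a, b = np.abs(j1), np.abs(j2)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def face_graph_similarity(graph_a, graph_b):
    """Average the per-landmark jet similarities (Eq. 6 applied at every
    landmark); each graph maps landmark name -> jet array."""
    scores = [jet_similarity_magnitude(graph_a[k], graph_b[k]) for k in graph_a]
    return sum(scores) / len(scores)

# Two toy face graphs with jets at two hypothetical landmarks
rng = np.random.default_rng(3)
jets = {name: rng.random(40) + 1j * rng.random(40)
        for name in ("left_eye", "nose")}
same = face_graph_similarity(jets, jets)                       # identical graphs
other = {name: rng.random(40) + 1j * rng.random(40) for name in jets}
diff = face_graph_similarity(jets, other)
print(same > diff)   # a graph matches itself better than a different face
```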

3. Experimental Results

The proposed system has been tested on the IIT Kanpur database, consisting of more than 1000 facial images with five images per person under different illumination and expressions. In the first experiment, the EBGM algorithm was tested standalone under different background and lighting conditions. The face images were acquired against light and dark backgrounds under different illumination, using a SONY digital camera at a distance of about 12 cm, in different sessions. To maintain consistency, the same setup was used for all acquisitions. The test results have been computed along with FAR, FRR and accuracy graphs.

Table 1 shows the FAR, FRR and accuracy of the system.

Table 1: FAR, FRR and Accuracy

                    FAR      FRR       Accuracy
    EBGM            2.83%    10.52%    93.32%

In the next experiment, the RETINEX algorithm together with the color constancy method was plugged in before EBGM. The Retinex and color constancy algorithms preprocess the image to make it illumination invariant. Testing was performed to calculate FAR, FRR and accuracy, and the results are found to be very encouraging, as shown in Table 2. The FAR and FRR graph is shown in Figure 5, the accuracy graph in Figure 6, and the best matching result in Figure 7.

Table 2: FAR, FRR and Accuracy

                        FAR      FRR      Accuracy
    Proposed System     0.59%    6.49%    96.46%
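FAR and FRR are computed by thresholding genuine and impostor match scores; a small sketch follows. The accuracy formula shown, 100 − (FAR + FRR)/2, is one common convention and an assumption here, since the text does not state how its accuracy values were derived.

```python
import numpy as np

def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: fraction of impostor attempts accepted at the threshold;
    FRR: fraction of genuine attempts rejected. Both in percent."""
    genuine = np.asarray(genuine_scores)
    impostor = np.asarray(impostor_scores)
    far = float(np.mean(impostor >= threshold)) * 100
    frr = float(np.mean(genuine < threshold)) * 100
    accuracy = 100 - (far + frr) / 2   # assumed summary formula
    return far, frr, accuracy

# Toy similarity scores: genuine pairs score high, impostors low
genuine = [0.91, 0.88, 0.95, 0.79, 0.85]
impostor = [0.42, 0.55, 0.61, 0.83, 0.30]
far, frr, acc = far_frr(genuine, impostor, threshold=0.80)
print(far, frr, acc)   # 20.0 20.0 80.0
```

Sweeping the threshold over the score range is what produces FAR/FRR curves like those in Figure 5.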

Figure 5: FAR & FRR Graph

Figure 6: The accuracy Graph

Figure 7: Best Matching Images

