A biometric recognition system is an automated system that verifies or identifies a person's identity using physiological and/or behavioral characteristics, as pointed out by Jain et al (2004). Face recognition has grown rapidly in the past few years owing to its numerous uses in law enforcement, biometrics, security and other commercial areas, and it has several advantages over other biometric methods. Biometric-based technologies perform identification based on biological characteristics and behavioral traits.
Practically all these technologies require some voluntary action by the user: the user is supposed to place his hand on a hand rest for fingerprinting or hand-geometry detection, and has to stand in a static position in front of a camera for iris or retina identification. Moreover, these technologies require expensive equipment and are sensitive to body motion. Voice recognition is prone to background noise in public places, and audio may fluctuate on a phone line or tape recording. In security systems based on signatures, there is a greater chance that the signature can be altered or copied, resulting in forgery. When many individuals use the same equipment to capture their biological characteristics, there is also a greater chance of germs being transmitted from one user to another. Facial images, in contrast, can be obtained easily using inexpensive fixed cameras. Good face recognition algorithms and appropriate pre-processing of the images can compensate for noise and slight variations in orientation, scale and illumination.
Robust face recognition requires the ability to recognize identity in spite of many variations in appearance that the face can have in a scene. Also, the output of the detection and recognition system has to be accurate. The recognition system has to associate an identity for each face it comes across by matching it to a large database of individuals and the system must be robust to typical image-acquisition problems such as noise, video-camera distortion and lighting conditions.
1.2 KEY ISSUES IN FACE RECOGNITION
The input of a face recognition system is an image or video stream, and the output is an identification or verification of the subject or subjects that appear in the image or video. Zhao et al (2003) defined a face recognition system as a three-stage process, as shown in Figure 1.1.
Figure 1.1 A basic face recognition system
In a face recognition system, face detection is the first task: determining all possible faces at different locations and with different sizes in a given image. Face detection has countless computer vision applications and encompasses several sub-problems. In some systems detection and localization are done at the same time; in others detection is performed first and then, if positive, face localization is performed. Face detection is thus a two-class problem in which a decision has to be made as to whether or not there is a face in the picture.
Once the face is detected, the next stage is feature extraction, which involves obtaining relevant facial features from the data. These features could be certain face regions, variations, angles or measures. This phase has other applications such as facial feature tracking and emotion recognition. Feature extraction also reduces the amount of data to be processed: when analysis is performed directly on complex data, the large number of variables leads to problems, and a classification algorithm trained on such data may generalize poorly to new samples. Figure 1.2 shows the process of feature extraction in an image.
Figure 1.2 Face detection and facial feature extraction in a group photo
Finally, the system enters the third phase to perform recognition where the task could be to perform identification or verification. In an identification (one-to-many matching) task, the system would report an identity from a database. When an image of an unknown individual is given, the identity of the person can be determined by comparing it with the database of images of known individuals. In the verification (one-to-one matching) task, when a face image of an unknown individual along with a claim of identity is presented, the task is to determine whether the individual is who he/she claims to be.
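The distinction between identification (one-to-many matching) and verification (one-to-one matching) can be sketched in a few lines of Python. This is only an illustrative toy example: the feature vectors, names and the distance threshold are hypothetical, and real systems would use high-dimensional face descriptors rather than two numbers.

```python
import numpy as np

def identify(probe, gallery):
    """One-to-many matching: return the identity whose enrolled
    template is closest (in Euclidean distance) to the probe."""
    names = list(gallery)
    dists = [np.linalg.norm(probe - gallery[n]) for n in names]
    return names[int(np.argmin(dists))]

def verify(probe, claimed, gallery, threshold=0.5):
    """One-to-one matching: accept the claim only if the probe lies
    within `threshold` of the claimed identity's template."""
    return bool(np.linalg.norm(probe - gallery[claimed]) <= threshold)

# Toy gallery of enrolled feature vectors (hypothetical data).
gallery = {"alice": np.array([0.0, 1.0]), "bob": np.array([1.0, 0.0])}
probe = np.array([0.1, 0.9])
```

With this toy gallery, the probe identifies as "alice", and a claim of "bob" with the same probe would be rejected.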
Any face detection system must cope with challenges such as variations in pose, illumination, facial expression, occlusion and image quality.
Face detection techniques have been investigated for many years and considerable progress has been reported. Most face detection methods concentrate on detecting frontal faces under good lighting conditions. An overview of the most representative face detection and recognition methods is presented below.
1.3 SCHEMES FOR FACE DETECTION
Face detection is the method of discovering all possible faces at different locations and with different sizes in a given image. It is an important first step for many advanced computer vision, biometric recognition and multimedia applications, such as face tracking, face recognition and video surveillance.
Adelson and Bergen (1986) investigated a technique for real-time face detection using spatio-temporal filtering. Real-time face detection means detecting a face in a series of frames from a video capture device. Although such a system requires complex hardware, from a computer vision standpoint real-time face detection is actually a simpler process than detecting a face in a still image. The reason is that, unlike the surrounding environment, people are continually moving: they walk around, blink, fidget, wave their hands and so on. In real-time face detection the system is presented with a series of frames in which detection has to be done. By using spatio-temporal filtering (finding the difference between subsequent frames), the area of the frame that has changed can be identified and the individual detected.
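The simplest form of this spatio-temporal filtering, frame differencing, can be sketched as follows. The frames here are tiny synthetic arrays standing in for consecutive video frames; the threshold value is an arbitrary illustrative choice.

```python
import numpy as np

def motion_mask(prev_frame, curr_frame, threshold=25):
    """Threshold the absolute difference between subsequent frames to
    find pixels that changed, i.e. regions where someone moved."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# Two toy 4x4 greyscale frames: a bright "head" moves one pixel right.
prev_frame = np.zeros((4, 4), dtype=np.uint8)
prev_frame[1:3, 0:2] = 200
curr_frame = np.zeros((4, 4), dtype=np.uint8)
curr_frame[1:3, 1:3] = 200

mask = motion_mask(prev_frame, curr_frame)
```

Only the trailing and leading edges of the moving region show up in the mask; the overlap between the two positions is unchanged and is therefore excluded.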
Turk and Pentland (1991a, 1991b) observed that in real-time face detection the exact face location can be identified easily when head movement is slow, since adjacent frames are similar and heads do not jump around irregularly. Real-time face detection has thus become a rather simple problem, and detection has been found to be possible even in uncontrolled environments (e.g. in difficult lighting conditions, or when the input image is noisy).
Turk and Pentland (1991a) used Principal Component Analysis, known as the eigenfaces approach, to preprocess the gray levels. They described a simple way to check whether an image is really of a face: transform the image into face space and then transform it back (reconstruct it) into the original image space. It was found that about forty eigenfaces were sufficient for a good description of human faces, and the reconstructed image showed very small pixel-by-pixel errors.
Yuille et al (1992) demonstrated face detection using templates of variable size (deformable templates). Instead of using several fixed-size templates, a single non-rigid deformable template was utilized, and by changing the size of the template a face in an image could be detected.
Automatic extraction of human head and face boundaries and facial features is critical in the areas of face recognition, criminal identification, security and surveillance systems, and human computer interfacing. The concept of head and face boundary extraction for face detection applications was proposed by Samal and Iyengar (1992).
An efficient detection system should be capable of eliminating unwanted details, such as the background and non-face areas like hair, that are not necessary for face recognition, while extracting the information useful for the task. In still images this is possible by running a window across the image, the so-called window-based technique for face detection. Brunelli and Poggio (1993) described the use of this technique to check whether a face is present inside the window.
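The window-based technique can be sketched as follows. The classifier here is a deliberately trivial stand-in (a brightness test on a toy image); in practice each window would be passed to a trained face/non-face classifier.

```python
import numpy as np

def sliding_windows(image, win, step):
    """Run a fixed-size window across the image; each crop would be
    passed to a face/non-face classifier."""
    h, w = image.shape
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            yield (y, x), image[y:y + win, x:x + win]

def looks_like_face(window):
    """Stand-in classifier (hypothetical): simply a brightness test."""
    return window.mean() > 128

image = np.zeros((8, 8), dtype=np.uint8)
image[2:6, 2:6] = 255                      # a bright 4x4 "face" region

detections = [pos for pos, win in sliding_windows(image, win=4, step=2)
              if looks_like_face(win)]
```

On this toy image only the window exactly covering the bright region fires, at position (2, 2).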
Burel and Carel (1994) proposed a neural network method for automatic detection and localization of faces in digital images. In this technique, a large number of training examples of faces and non-faces are compressed into fewer examples, using multiresolution analysis and learning by example. The learning data have to be managed carefully in order to improve performance.
Penev and Atick (1996) proposed Local Feature Analysis (LFA) to encode the local topological structure of face images. LFA is considered as a local method as it utilizes a set of kernels to implicitly detect the local structure such as eyes, nose and mouth.
According to Yang and Huang (1997), face detection methods can be categorized into four types: knowledge-based, feature-invariant, appearance-based and template-matching methods. In the knowledge-based methods, the facial features are modeled in a simple way, as two symmetric eyes, a nose in the middle and a mouth underneath, as observed by Yang and Huang (1997). In the feature-invariant methods, facial features that are invariant to pose, lighting condition or rotation, such as skin color, edges and shapes, are analyzed, as demonstrated by Sung and Poggio (1998). Lanitis et al (1995) proposed calculating the correlation between a test image and pre-selected facial templates, a technique classified as template matching. In the appearance-based category, machine learning techniques are adopted to extract discriminative features from a pre-labeled training set; one of the important methods in this category, the Eigenfaces method, was developed by Turk and Pentland (1991a).
Sung and Poggio (1998) presented an example-based learning approach for detecting vertical frontal views of human faces in complex and difficult scenes. A distribution-based model of face patterns was built, together with a set of distance parameters for distinguishing between face and non-face window patterns. The suggested method could also be used for feature detection and pattern recognition tasks in other problem domains.
Jeng et al (1998) proposed an approach to face detection based on the configuration of facial features, in which detection was performed on both frontal-view and tilted faces. This approach falls under the geometric-relationship category. However, the technique does not work well on images with face sizes smaller than 80 × 80 pixels, or on images with multiple faces.
Rowley et al (1998) proposed a neural network-based upright frontal face detection system. Small windows of the image are examined by connected neural networks, and a decision is made as to whether each window contains a face. Performance could be improved by using multiple networks instead of a single network.
Haiyuan Wu et al (1999) demonstrated a new method to detect faces in color images based on fuzzy theory. Two fuzzy models were created to describe skin color and hair color, respectively. A uniform color space was used to represent the color information in order to increase accuracy and stability. The two models were used to extract the skin-color and hair-color regions, which were compared with prebuilt head-shape models using a fuzzy-theory-based pattern-matching method to detect face candidates. The method sometimes failed to detect real faces and produced false positives under certain conditions.
Schneiderman and Kanade (2000) described a statistical method for 3D object detection. Product of histograms has been used to represent the statistics of both object appearance and non-object appearance. Each histogram represents the joint statistics of a subset of wavelet coefficients and their position on the object. The algorithm developed is the first of its kind that can detect human faces with out-of-plane rotation.
Nikolaidis and Pitas (2000) developed a combined approach using the adaptive Hough transform, template matching, an active contour model and projective geometry properties. The adaptive Hough transform was used to detect curves, template matching for locating the inner facial features, the active contour model for inner face contour detection, and projective geometry properties for accurate pose determination.
Viola and Jones (2001, 2004) proposed an AdaBoost-based face detection technique: a framework capable of processing images very quickly while achieving high detection rates. For fast computation of the features used by the face detector, the concept of the integral image was first introduced as an image representation. A simple and efficient classifier was built using the AdaBoost learning algorithm. The classifiers are combined in a cascade, which allows background regions of the image to be quickly rejected while spending more computation on face-like regions.
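The integral image idea can be sketched as follows: after one cumulative-sum pass, the sum of any rectangle (and hence any Haar-like feature, which is a difference of rectangle sums) costs at most four array lookups. The 4×4 test image is a toy example.

```python
import numpy as np

def integral_image(img):
    """Integral image: ii[y, x] = sum of all pixels above and to the
    left of (y, x), inclusive. Built with two cumulative sums."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] from at most four
    lookups, which is what makes Haar-like features cheap."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

img = np.arange(16, dtype=np.int64).reshape(4, 4)
ii = integral_image(img)
```

For the 4×4 ramp image above, the inner 2×2 rectangle sums to 5 + 6 + 9 + 10 = 30, and the full-image rectangle reproduces the total pixel sum.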
Ryu and Oh (2001) developed an algorithm based on eigenfeatures and neural networks for the extraction of eyes and mouth, using rectangle fitting on gray-level face images. Since eigenfeatures and a sliding window were used, a large training set was not required. The performance of the algorithm degrades for face images with glasses or beards.
Wong et al (2001) introduced an algorithm for face detection and facial feature extraction based on a genetic algorithm and eigenfaces. Chen et al (2000) presented an algorithm to detect multiple faces against complex backgrounds. It was assumed that in frontal-view face images the centers of the two eyes and the center of the mouth form an isosceles triangle, while in side-view face images the center of one eye, the center of the ear hole and the center of the mouth form a right triangle. The algorithm fails to perform well when the images are too dark or the eyes are occluded by hair.
Consequently, many improved methods were researched. Li and Zhang (2004) proposed a novel learning procedure called FloatBoost for training the classifier. FloatBoost learning uses a backtracking mechanism after each iteration of AdaBoost learning to minimize the error rate: the backtracking step removes unfavorable weak classifiers from the existing set. A new statistical model was also introduced for learning the best weak classifiers using a stage-wise approximation of the posterior probability.
Wu et al (2004) carried out rotation-invariant multi-view face detection using a nested structure and real AdaBoost. The whole 360-degree range was divided into 12 sub-ranges, and corresponding view-based detectors were designed separately. Experiments were conducted using the CMU and MIT face datasets. Although the classifiers discussed above are robust, they remain slow, since they calculate features over the entire image.
Shih and Chuang (2004) presented a new approach for the extraction of head, face and facial features using the double-threshold method. Head boundary is traced using the high-thresholded image and the face boundary using the low-threshold image. The facial features such as eyes, nostrils and mouth are extracted using X- and Y-projections.
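The X- and Y-projection idea used for feature extraction can be sketched as follows. This is a toy illustration, not the authors' implementation: the "thresholded" image and the dark "mouth" row are synthetic.

```python
import numpy as np

def projection_peaks(binary):
    """X- and Y-projections: summing a thresholded face image along
    rows and columns gives profiles whose peaks indicate dark
    features such as the eyes, nostrils and mouth."""
    y_proj = binary.sum(axis=1)   # one value per row
    x_proj = binary.sum(axis=0)   # one value per column
    return int(np.argmax(y_proj)), int(np.argmax(x_proj))

# Toy thresholded image with a dark horizontal "mouth" on row 5.
binary = np.zeros((8, 8), dtype=int)
binary[5, 2:6] = 1
row, col = projection_peaks(binary)
```

The Y-projection peaks at the row containing the feature; the X-projection peak marks where the feature begins horizontally.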
Timo et al (2004) adopted the local binary pattern (LBP), which originated in texture analysis, for face representation. In this method the LBP operator is first applied, and the resulting LBP image is then divided into small regions from which histogram features are extracted. In this component-based method the face is divided into blocks, which are finally fed to the classifiers as inputs.
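The basic 3×3 LBP operator and the histogram step can be sketched as follows; this minimal version ignores the uniform-pattern and circular-sampling refinements used in the LBP literature.

```python
import numpy as np

def lbp_code(patch):
    """Basic 3x3 LBP: threshold the 8 neighbours against the centre
    pixel and pack the results into one byte (clockwise from top-left)."""
    c = patch[1, 1]
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= c:
            code |= 1 << bit
    return code

def lbp_histogram(image, bins=256):
    """Apply the LBP operator at every interior pixel and histogram
    the codes; per-region histograms form the face descriptor."""
    h, w = image.shape
    codes = [lbp_code(image[y - 1:y + 2, x - 1:x + 2])
             for y in range(1, h - 1) for x in range(1, w - 1)]
    return np.bincount(codes, minlength=bins)
```

On a perfectly flat patch every neighbour ties with the centre, so all eight bits are set and the code is 255; a flat image therefore yields a histogram with all mass in that one bin.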
Dalal and Triggs (2005) studied the feature sets required for robust visual object recognition, adopting linear Support Vector Machine (SVM) based human detection as a test case. After reviewing existing edge- and gradient-based descriptors, they showed experimentally that grids of histogram of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. The influence of each stage of the computation on performance was studied, and a more challenging dataset containing over 1800 human images with a large range of pose variations and backgrounds was introduced.
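The core of a HOG descriptor, the per-cell orientation histogram, can be sketched as follows. This omits the block normalization and overlapping-block steps of the full descriptor; the bin count of 9 matches the usual unsigned-orientation setup, and the test cell is a synthetic vertical edge.

```python
import numpy as np

def hog_cell_histogram(cell, n_bins=9):
    """Histogram of oriented gradients for one cell: compute centred
    gradients, then accumulate gradient magnitude into unsigned
    orientation bins over [0, 180) degrees."""
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bins = (angle / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist

# A vertical edge: the gradient points horizontally, so all the
# energy should land in the 0-degree bin.
cell = np.tile(np.array([0.0, 0.0, 10.0, 10.0]), (4, 1))
hist = hog_cell_histogram(cell)
```

For the vertical edge above, every gradient vector is horizontal, so the entire magnitude mass falls into the first orientation bin.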
Jang and Kim (2008) suggested evolutionary algorithms for face detection. The algorithm minimizes the number of classifiers without any degradation in detection accuracy, and also reduces the number of weak classifiers. The detector is AdaBoost-based, with evolutionary pruning employed to perform detection.
Jain and Learned-Miller (2010) pointed out that, due to the significant pose and lighting variations, face detection in completely unconstrained settings is very complicated and remains a challenging task.
1.4 SCHEMES FOR FACE RECOGNITION (USING LBP, LTP AND GABOR FEATURES)
The face is one of the most common cues people use to recognize each other. Over the course of its evolution, the human brain has developed highly specialized areas dedicated to the analysis of facial images. While face recognition has increased significantly in reliability, it is still not accurate all the time. The ability to correctly classify an image depends on a variety of factors, including lighting and pose as suggested by Gross and Brajovic (2003), facial expressions as researched by Georghiades et al (2001), and image quality as discussed by Shan et al (2003).
As one of the most successful applications of image analysis and understanding, face recognition has recently gained significant attention, especially during the past several years. There are at least two reasons for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after several years of research, as noted by Zhao et al (2003). A large number of face recognition algorithms have been proposed over the past two decades, and there are various face representation methods based on global features, including a great number of subspace-based methods and some spatial-frequency techniques.
In computer vision, face recognition has a distinguished history going back as far as the 1960s. Bledsoe (1966) designed a man-machine system in which human faces were classified on the basis of fiducial marks drawn by humans on projected photographs and their spatial relationships (angles, lengths, ratios, etc.).
Template-based face detection was suggested by Sakai et al (1969): a program was written to detect a face in a photograph using templates for the mouth, nose, eyes and head contour. Kelly (1970) proposed a method for face detection by measuring a set of parameters from a full-length photograph of a person, and the technique was used to identify about twelve persons.
Kanade (1973) first introduced a completely automatic version of such a system; the drawback was that the recognition rate was very low, even for small image databases. Goldstein et al (1971, 1972) and Kaya and Kobayashi (1972) concluded that if the features were extracted manually, acceptable results could be obtained. This prototype persisted for nearly thirty years, but recognition rates remained low even on small data sets.
Baron (1981) suggested the simple technique of comparing grey-scale intensity values for face recognition. The basis of this template matching approach is to extract whole facial regions and compare them with the stored images of known individuals. Euclidean distance was used to find the closest match.
Craw et al (1987) proposed a face recognition system using geometrical features, in which features such as nose width and length, mouth position and chin shape are computed from the picture of the face. This set of features is then matched with the features of known individuals, with Euclidean distance used to find the closest match. The advantage is that recognition is possible even at very low resolutions and with noisy images; the main drawback is that automated extraction of the facial geometrical features is difficult.
Sirovich and Kirby (1987) developed a technique for human face representation using Principal Component Analysis (PCA). First, the principal components of the distribution of faces, expressed in terms of eigenvectors, are found. Each individual face in the set can then be approximated by a linear combination of the largest eigenvectors, called eigenfaces, with appropriate weights.
O’Toole et al (1993) demonstrated the importance of eigenfaces with large and small eigenvalues. Eigenvectors with larger eigenvalues deliver information relative to the basic shape and structure of the faces. This kind of information is useful in classifying faces according to sex, race etc. Eigenvectors with smaller eigenvalues capture information that is specific to small subsets and are useful for differentiating a particular face from any other face.
Lanitis et al (1995) described the use of flexible models for representing the shape and grey-level appearance of human faces. These models are controlled by a small number of parameters, which can be used to code the overall appearance of a face for classification purposes. The model parameters control both inter-class and within-class variation. Experiments were conducted using face images that show variability in 3D viewpoint, lighting and facial expression.
Adini et al (1997) determined that the performance of face recognition systems is affected by strong variations in pose and illumination. It was found that the variation between images of different faces can be smaller than that between images of the same face taken in different environments: changes induced by illumination can be larger than the differences between individuals, causing systems based on direct image comparison to misclassify the identity of the input image.
Wiskott et al (1997) investigated pre-processing using Elastic Bunch Graph Matching with Gabor filters. Extensive pre-processing and transformation of the extracted grey-level intensity values are the major highlights of the technique; illumination-insensitive feature sets are extracted directly from the given image.
Chengjun Liu and Harry Wechsler (1999) demonstrated the relative usefulness of Independent Component Analysis (ICA) for face recognition. Comparative assessments were made regarding (i) ICA sensitivity to the dimension of the space in which it is carried out, and (ii) ICA discriminant performance alone or when combined with other discriminant criteria such as the Bayesian framework or Fisher's Linear Discriminant (FLD).
Nikolaidis and Pitas (2000) developed a combined approach using adaptive Hough transform, template matching, active contour model, and projective geometry properties. Adaptive Hough transform has been used for curve detection, template matching for locating the facial features, active contour model for detecting the face contour, and projective geometry properties for accurate pose determination.
Bijita Biswas et al (2001) proposed a new methodology for matching of digital gray images using fuzzy membership-distance products, called moment descriptors. Three common kinds of image attributes namely edge, shade and mixed range are chosen and descriptors are estimated for these attributes.
Gross and Brajovic (2003) introduced a simple and automatic image-processing algorithm for compensation of illumination-induced variations in images. Initially, an estimate of the illumination field is computed and then the algorithm compensates for it. The algorithm provides large performance improvements for standard face recognition algorithms across multiple face databases.
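The estimate-then-compensate idea can be sketched as follows. This is only a crude stand-in for the algorithm of Gross and Brajovic: the illumination field is estimated here with a simple local mean filter rather than their anisotropic smoothing, and the test image is a synthetic illumination ramp.

```python
import numpy as np

def box_blur(img, k=3):
    """Crude local mean filter used as the illumination estimate."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.empty_like(img, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = padded[y:y + k, x:x + k].mean()
    return out

def compensate_illumination(img, k=3, eps=1e-3):
    """Estimate the slowly varying illumination field with a local
    mean, then divide it out, leaving mostly reflectance detail."""
    return img / (box_blur(img, k) + eps)

# A pure illumination ramp: after compensation the interior of the
# image should be close to uniform (about 1.0 everywhere).
ramp = np.outer(np.ones(7), np.arange(1.0, 8.0))
flattened = compensate_illumination(ramp)
```

Because the ramp contains no reflectance detail at all, dividing out the estimated illumination field flattens its interior to a nearly constant value.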
Keun-Chang Kwak and Witold Pedrycz (2004) developed a method for recognizing face images by combining wavelet decomposition, the Fisherface method and the fuzzy integral. Wavelet decomposition was used in the first stage to extract the intrinsic features of face images, yielding four sub-images, to which the Fisherface method was applied. The conclusion was that the Fisherface-based approach outperformed the PCA method.

Mario I. Chacon et al (2007) presented a new approach to designing a fuzzy face recognition system. Face feature lines, incorporated in the feature vector, are used to design the pattern recognition system. Besides the face feature lines, the feature vector incorporates eigenvectors of the face image obtained with the Karhunen-Loeve transformation. The fuzzy face recognition system is based on the Gath-Geva fuzzy clustering method and the Abonyi and Szeifert classification scheme. The approach was applied to face recognition of newborns.
Jianming Lu et al (2007) presented a method for face recognition based on fuzzy clustering and parallel neural networks. The face patterns were divided among small-scale neural networks based on fuzzy clustering, and their outputs were combined to obtain the recognition result.
Aruna Chakraborty et al (2009) presented a fuzzy relational approach to human emotion recognition from facial expressions and its control. External stimuli were used to excite specific emotions in human subjects, and the facial expressions were analyzed by segmenting and localizing the individual frames into regions of interest. Selected facial features, such as eye opening, mouth opening and the length of eyebrow constriction, are extracted from the localized regions, fuzzified, and mapped onto an emotion space by employing Mamdani-type relational models.
Vishwakarma and Gupta (2010) illustrated a new approach to information extraction for a face recognition system based on fuzzy logic. A fuzzification operation using the π membership function was applied to extract the pixel-wise association of face images to different classes. Nearest-neighbor classification using the correlation coefficient and principal component analysis were used to obtain the classification error.
Seyed Mohammad Seyedzade et al (2010) proposed an approach for face recognition using Symlet decomposition, Fisherface algorithm, and Sugeno and Choquet Fuzzy Integral. Recognition is performed using the extracted intrinsic facial features.
Shreeja et al (2011) compared face recognition methods based on neural networks and neuro-fuzzy systems. Feature extraction is performed using the curvelet transform, and the feature vector is obtained by extracting statistical quantities of the curvelet coefficients. The technique is time consuming, since most of the time is spent training the network.
Manish Gupta and Govind Sharma (2012) developed an efficient face recognition system based on a sub-window extraction algorithm, with recognition based on principal component analysis (PCA) and the back-propagation algorithm. In the extraction phase, face images are captured, enhanced using filtering, clipping and histogram equalization, and converted into an edge image using the Sobel operator. In the recognition phase, the back-propagation algorithm (BPA) and PCA are used.
1.5 SCHEMES FOR FACE RECOGNITION USING LOCAL AND GLOBAL FEATURES
Chengjun Liu and Harry Wechsler (2002) considered the use of the Gabor-Fisher Classifier (GFC) for face recognition. The technique derives a Gabor feature vector whose dimensionality is then reduced using the Enhanced Fisher linear discriminant model. The algorithm was tested on 600 FERET frontal face images corresponding to 200 subjects, captured under variable illumination and facial expressions.
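A single Gabor kernel, the building block of such feature vectors, can be sketched as follows. The size, wavelength and sigma values here are arbitrary illustrative choices; a real Gabor feature vector would convolve the face image with a bank of these kernels at several scales and orientations.

```python
import numpy as np

def gabor_kernel(size=9, wavelength=4.0, theta=0.0, sigma=2.0):
    """2-D Gabor kernel: a sinusoidal carrier at orientation `theta`
    under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + y_t**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_t / wavelength)
    return envelope * carrier

kernel = gabor_kernel()
```

The kernel peaks at its centre (envelope and carrier both equal 1 there) and, for this even cosine carrier, is symmetric under a 180-degree rotation.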
Kepenekci and Akar (2004) proposed a new approach to feature-based frontal face recognition using Gabor wavelets and Support Vector Machines (SVM). The feature points are extracted automatically using the local characteristics of each individual face. The identity of a test face is found by labeling each of its feature vectors with support vector machines and making the decision by considering all of those labels.
Timo et al (2004) suggested a new method for face representation based on the local binary pattern (LBP), which originated in texture analysis. The LBP operator is first applied, and the resulting LBP image is then divided into small regions from which histogram features are extracted.
Ahonen et al (2006) demonstrated the use of local binary pattern (LBP) texture features for efficient facial image representation. LBP feature distributions are extracted after dividing the face image into several regions, and the distributions are then combined to form an enhanced feature vector used as a face descriptor. The method was evaluated for face recognition under different challenging conditions.
Savvides et al (2006) presented a technique for improving face recognition performance on a large database of over 36,000 facial images, focusing mainly on images captured in uncontrolled conditions. A novel approach using discrete cosine transform (DCT) features, which improves performance significantly, was proposed. The conclusion was that working in the DCT domain is preferable to working in the original spatial-pixel domain, which yields a lower verification rate. Hwang et al (2006) proposed a spatial-frequency technique of feature extraction using the Fourier transform.
Alice et al (2007) compared seven contemporary face recognition algorithms with humans on a face-matching task; the algorithms were also tested and compared extensively with each other. Humans and algorithms determined whether pairs of face images, taken under different illumination conditions, were pictures of the same person or of different people. Although illumination variation continues to challenge face recognition algorithms, current algorithms compete favorably with humans.
Hassan et al (2007) introduced a new facial feature extraction approach using the Walsh-Hadamard Transform (WHT). The transform is based on correlation between local pixels of the face image and involves simple computation. The WHT approach was compared with PCA and DCT and, despite its simple computation, gave results very close to those obtained with PCA and DCT.
Abusham et al (2008) presented a novel approach for face recognition based on the integration of nonlinear dimensionality reduction, and confirmed that a high recognition rate could be obtained with the method.
Yu Su et al (2009) proposed a novel face recognition method that uses both global and local discriminative features. Global features are extracted from the whole face images using the low-frequency coefficients of Fourier transform. Gabor wavelets are used for local feature extraction. Fisher’s Linear Discriminant (FLD) is separately applied to the global Fourier features and each local patch of Gabor features. Finally, all these classifiers are combined to form a joint classifier.