Biometric systems use the physiological and/or behavioral traits of an individual for recognition (Jain et al 2004). Chapter 1 introduced biometric systems, their modules, functionalities and types. Before moving to multibiometrics, this chapter discusses the design and development of unimodal systems, using methods chosen after an exhaustive study of each biometric trait used in this thesis. It presents unimodal designs built for the traits iris, face, ear and hand vein patterns, and evaluates the recognition performance of each trait on public datasets. A review of related work for each trait is presented in the respective sections.

The iris is a thin circular diaphragm in the human eye, lying between the cornea and the lens, positioned around a circular aperture called the pupil. The iris controls the amount of light entering through the pupil: the sphincter and dilator muscles adjust the size of the pupil during constriction and dilation. The mean diameter of the iris is approximately 12 mm, and the size of the pupil can vary from 10% to 80% of the iris diameter (Daugman 1993, Libor 2003).

Wolff (1976) described the anatomy of the eye and its functions in remarkable detail. The iris consists of an epithelium layer and a stromal layer and contains blood vessels, pigment cells and two muscles. The visible surface of the multi-layered iris contains two zones, which often differ in color: an outer ciliary zone and an inner pupillary zone, separated by a zigzag pattern known as the collarette. Pattern formation begins in the third month of the embryonic stage, the structure is complete during the first year, and pigmentation continues for a few more years. These iris patterns are random and mutually independent due to their epigenetic nature, and they remain stable throughout adult life. Even identical twins possess uncorrelated patterns, and the iris is a well-protected organ. These characteristics make the iris a successful biometric trait for identifying individuals; indeed, iris recognition has already been deployed by governments and large corporations as a security measure.

Iris recognition is considered the most reliable form of biometric technology compared with other biometric technologies such as face, speech and fingerprint recognition systems (Sanderson & Erbetta 2000). The most commonly used recognition frameworks are based on a concept for iris recognition patented by Leonard Flom & Aran Safir in 1987 and on the work of John Daugman (1993), the first accurate algorithm for iris biometrics. Daugman (1993; 2004; 2007) used an integro-differential operator to segment the iris region. The located iris region was normalized to constant dimensions to produce an iris code template for the input iris image. A metric called the normalized Hamming Distance (HD) was used to measure the fraction of bits in which two iris codes disagree; a low normalized Hamming distance implies a strong similarity of iris codes. Daugman has since made numerous improvements to the original algorithm.

The iris can be processed for recognition using different methods, such as phase-based methods (Daugman 1993; 2003; 2004), texture-analysis-based methods (Wildes 1994; Wildes et al 1997), feature-based methods, appearance-based methods, the zero-crossing representation method (Boles & Boashash 1998) and approaches based on intensity variations (Li Ma et al 2003). Other approaches address factors such as occlusion, lighting and the number of pixels on the iris that affect image quality (Bowyer et al 2008).

An automatic method for iris localization was presented by Ibrahim et al (2012) in two stages. Initially, the pupil was localized using a circular moving window which finds gray levels enclosing the pupil. Later, the eyelashes were removed by median filtering and the iris boundary was estimated by taking the gradient of rows within the pupil.

The most critical stages of iris recognition are iris segmentation and occlusion detection. Many improvements in iris segmentation have been proposed (Ross & Shah 2006; Arvacheh & Tizhoosh 2006; Ryan et al 2008), and a few works have focused on off-angle iris images; several segmentation algorithms were developed to deal with off-angle images (Abhyankar et al 2005; Xin Li 2006; Bowyer et al 2008; Khan et al 2011). The next critical part of iris recognition is occlusion detection, i.e., eyelash and eyelid detection, addressed by approaches such as Boles & Boashash (1998), Kong & Zhang (2003) and Huang et al (2004). The following subsections present the modules of the iris recognition system in detail. Every input image is normalized and enhanced using a median filter, which removes impulsive noise while preserving sharp edges; each output pixel takes the median value of its neighborhood.
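As an illustration of the median filtering step, a minimal pure-Python 3×3 median filter (border pixels are left unchanged here for brevity; the thesis implementation may differ):

```python
def median_filter_3x3(img):
    """Replace each interior pixel with the median of its 3x3 neighborhood."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]          # copy; border pixels stay unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]          # median of the 9 values
    return out

# A single salt-noise pixel (255) in a flat region is removed,
# while the surrounding values are preserved:
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
print(median_filter_3x3(noisy)[1][1])      # -> 10
```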

The iris is the annular region between the pupil (inner boundary) and the sclera (outer boundary). The first and foremost step in iris recognition is to find accurate pupil and iris boundaries in the given input eye image; the iris portion is then segmented for further processing. Iris segmentation takes place in two steps:

Iris Localization

Iris Normalization

Iris localization refers to the detection of the iris boundary. This helps locate the unique portion of the eye, i.e., the iris; the preliminary step, however, is to detect the pupil, the black circular region surrounded by iris tissue. The center of the pupil is used to detect the outer radius of the iris pattern. If eyelids and eyelashes cover the iris region, they are eliminated first and only the unique iris portion is considered. Various iris localization algorithms exist in the literature, among them the integro-differential operator, the circular Hough transform, and edge and contour detectors such as Canny and Sobel. Based on several experiments with different edge detection methods, iris localization in this design is performed by the following methods:

Canny Edge Detector for detecting all edges in the eye

Circular Hough Transform for circular boundary detection

Canny edge detector

Edge detection aims at identifying points in a digital image at which the image brightness changes sharply. The purpose of edge detection is to significantly reduce the amount of data in an image while preserving the structural properties needed for further processing. The Sobel, Prewitt, Roberts, Laplacian of Gaussian, zero-crossing and Canny methods are some of the edge detection operators. The Canny edge detector is used here because, compared to other edge detection methods, it provides robust edge detection, localization and linking. The Canny operator uses both Gaussian smoothing and the derivative function to obtain optimal edges (Canny 1986). The Canny edge detection algorithm has five separate steps:

Smoothing:

To prevent noise from being mistaken for edges, the image is smoothed by applying a Gaussian filter:

S(x,y) = I(x,y) * G(x,y,σ) (2.1)

Finding gradients:

The Canny algorithm finds edges where the grayscale intensity of the image changes most, which is determined by calculating the gradient magnitude of the image.

The gradient magnitude M(x,y) and the orientation θ(x,y) are calculated as follows:

M(x,y) = √(X² + Y²) (2.2)

θ(x,y) = arctan(−Y, X) (2.3)

where X and Y are the horizontal and vertical derivatives of the smoothed image.

Non-maximum suppression:

Non-maxima suppression is performed to obtain thin edges that are one pixel wide. Pixels are suppressed if they do not constitute a local maximum. Non-maxima suppression uses the orientation image, which is assumed to give feature-normal orientation angles in degrees. Bilinear interpolation is used to estimate intensity values at locations on each side of a pixel in order to determine whether it is a local maximum.

Steps:

For each pixel in the gradient image, compare its value with the values of the neighboring pixels along the gradient orientation.

Retain the pixels whose values are greater than those of their neighbors.

Suppress the remaining pixels, which are not local maxima.

Double thresholding:

The Canny algorithm uses a double-thresholding method to collect all edges. Edge pixels stronger than the high threshold are marked as strong; those weaker than the low threshold are suppressed; those between the two thresholds are marked as weak.

Edge tracking by hysteresis:

Strong edges are interpreted as “certain edges” which are to be included in the final edge image. Weak edges are included only if they are connected to strong edges.

Steps:

Find the indices of pixels whose values are greater than the higher threshold T1, mark them as edge points and store them in a stack.

For each edge point stored in the stack, find its neighboring pixels and check whether their values are greater than the lower threshold T2 and whether they are connected to an already detected edge point; if so, mark them as edge points.

Finally, mark the remaining non-edge pixels as '0'.
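The double-thresholding and edge-tracking steps above can be sketched in Python (a minimal sketch on a small gradient-magnitude grid; the threshold values are illustrative only):

```python
def hysteresis(mag, t_low, t_high):
    """Keep strong pixels (> t_high) and weak pixels (> t_low) that are
    8-connected to a strong pixel; suppress everything else."""
    h, w = len(mag), len(mag[0])
    edge = [[0] * w for _ in range(h)]
    # Step 1: strong pixels are certain edges; seed the stack with them.
    stack = [(y, x) for y in range(h) for x in range(w) if mag[y][x] > t_high]
    for y, x in stack:
        edge[y][x] = 1
    # Step 2: grow edges through connected weak pixels.
    while stack:
        y, x = stack.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not edge[ny][nx] \
                        and mag[ny][nx] > t_low:
                    edge[ny][nx] = 1
                    stack.append((ny, nx))
    return edge                              # remaining pixels stay 0

mag = [[0, 40, 90],      # 90 is strong; 40 is weak but connected to it
       [0,  0,  0],
       [30, 0,  0]]      # 30 is weak and isolated -> suppressed
print(hysteresis(mag, 25, 60))   # -> [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
```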

In general, the Hough transform can be used for any kind of shape, although the complexity of the transformation increases with the number of parameters needed to describe the shape. The Hough transform can be described as a transformation of a point in the x,y-plane to a parameter space, which is defined according to the shape of the object of interest. The Circular Hough Transform is a specialized form of the Hough transform (HT), capable of extracting circular objects from an image. It represents the circle in parameter space (Figure 2.5). The parametric representation of the circle is

x = a + r cos θ,  y = b + r sin θ (2.4)

By the property of a circle, if (a, b) is the center of a circle and r is the radius, then a point (x, y) lies on the circle if:

r² = (x − a)² + (y − b)² (2.5)

Similarly, if (x − a)² + (y − b)² < r², the point lies inside the circle, and if it is > r², the point lies outside the circle. These relations are useful for masking unwanted regions outside or inside a circular boundary.
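The relations above can be expressed as a small point-classification helper, e.g. for building circular masks (an illustrative sketch):

```python
def classify_point(x, y, a, b, r):
    """Classify (x, y) against the circle with centre (a, b) and radius r,
    using d^2 = (x-a)^2 + (y-b)^2 compared with r^2."""
    d2 = (x - a) ** 2 + (y - b) ** 2
    if d2 == r ** 2:
        return "on"
    return "inside" if d2 < r ** 2 else "outside"

print(classify_point(3, 4, 0, 0, 5))   # 3^2 + 4^2 = 25 = 5^2 -> "on"
print(classify_point(1, 1, 0, 0, 5))   # -> "inside"
print(classify_point(9, 0, 0, 0, 5))   # -> "outside"
```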

Algorithm:

The circular Hough transform (Farouk 2011) includes the following steps to find circles in an image (Figure 2.6).

1) Detect the minimum and the maximum radii of the circle from the input image.

2) Find edges in the input image using a suitable edge detector (canny).

3) Assign the minimum radius value to 'r'.

4) For each edge point in the image,

Draw a circle centered at the edge point with radius r, and increment all accumulator cells that the perimeter of the circle passes through.

5) Increment value of r by 1.

6) Repeat step 4 until r equals the maximum radius of the circle to be found.

7) Find one or several maxima in the accumulator.

8) Map the found parameters (r, a, b) corresponding to the maxima back to the original image.
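Steps 1)–8) can be sketched as a brute-force accumulator in Python (illustrative only: synthetic edge points, a coarse 5° perimeter sampling, and integer-rounded centers):

```python
import math
from collections import Counter

def circular_hough(edge_points, r_min, r_max):
    """Vote for candidate centres (a, b) at each radius r; return the
    (r, a, b) accumulator cell with the most votes (steps 3-8)."""
    acc = Counter()
    for r in range(r_min, r_max + 1):            # steps 3, 5, 6
        for (x, y) in edge_points:               # step 4
            for t in range(0, 360, 5):           # sample the circle perimeter
                a = round(x - r * math.cos(math.radians(t)))
                b = round(y - r * math.sin(math.radians(t)))
                acc[(r, a, b)] += 1              # vote in the accumulator
    return acc.most_common(1)[0][0]              # steps 7, 8

# Edge points sampled from a circle of radius 10 centred at (20, 20):
pts = [(20 + round(10 * math.cos(math.radians(t))),
        20 + round(10 * math.sin(math.radians(t)))) for t in range(0, 360, 10)]
print(circular_hough(pts, 8, 12))                # peak near (10, 20, 20)
```

Due to the integer rounding, the recovered centre may land within a pixel of the true one; a real implementation would use the Canny edge map from the previous step as `edge_points`.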

Varying image-capture conditions can influence the size of the iris. The system must compensate for the stretching of the iris texture as the pupil changes size, and produce a model of the iris that removes the non-concentricity of the iris and the pupil. Daugman's rubber sheet model overcomes pupil dilation and size inconsistencies by representing the iris region in a normalized form with constant dimensions (Daugman 2004).

The rubber sheet model maps each point in the iris region to a pair of polar coordinates (r,θ), where r is in the interval [0,1] and θ is an angle in [0,2π]. The remapping of the iris region from Cartesian coordinates (x,y) to the normalized non-concentric polar representation is modeled as

I(x(r,θ), y(r,θ)) → I(r,θ) (2.6)

x(r,θ) = (1 − r) x_p(θ) + r x_i(θ) (2.7)

y(r,θ) = (1 − r) y_p(θ) + r y_i(θ) (2.8)

where I(x,y) is the iris region,

(x,y) are the original Cartesian coordinates, (r,θ) are the corresponding normalized polar coordinates, and

(x_i, y_i) and (x_p, y_p) are the coordinates of the iris and pupil boundaries along the θ direction.

In this way, the iris region is modeled as a rubber sheet with the pupil center as the reference point. Two images of the same iris might be very different as a result of the size of the image, the size of the pupil and the orientation of the iris. To cope with this, the image is normalized (Figure 2.7) by converting it from Cartesian to polar coordinates, which also helps to remove occlusions such as eyelashes.

1) Input the segmented iris with noise pixels set to NaN.

2) Calculate the displacement between the iris center and the pupil center.

3) If displacement > 0

Use the following remapping formula to express the radius of the iris region as a function of θ:

r′ = √α · β ± √(αβ² − α + r_I²) (2.9)

with

α = o_x² + o_y² (2.10)

β = cos(π − arctan(o_y/o_x) − θ) (2.11)

where (o_x, o_y) is the displacement of the center of the pupil relative to the center of the iris, r′ is the distance between the pupil and iris edges at angle θ around the region, and r_I is the radius of the iris.

4) Now, the center of the pupil is considered the reference point and a set of radial vectors is drawn at equal intervals through the iris region. The number of radial vectors is determined by the angular resolution.

5) A number of data points are to be selected along each radial line, determined by the radial resolution.

6) Create a normalized polar representation by storing the Cartesian location of each data point in the iris region.

7) Data points which pass on the pupil border or the iris border are discarded to prevent non-iris region data from corrupting the normalized representation.

8) Create a mesh grid of the input image and interpolate with the normalized representation to extract the intensity values, which yields the normalized iris representation (Figure 2.8).

9) Identify the NaN regions in the iris and create a noise array based on their coordinate location.

10) Output both the polar array and the noise array.
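For the concentric case (displacement = 0), steps 4)–8) reduce to sampling along radial lines between the pupil and iris circles, as in this minimal sketch (nearest-neighbour sampling, no noise handling; the resolutions and the synthetic image are illustrative):

```python
import math

def rubber_sheet(img, cx, cy, r_pupil, r_iris, radial_res=8, angular_res=16):
    """Unwrap the annulus between r_pupil and r_iris, both centred at
    (cx, cy), into a radial_res x angular_res polar rectangle."""
    polar = []
    for i in range(radial_res):
        r = i / (radial_res - 1)                     # r in [0, 1]
        row = []
        for j in range(angular_res):
            theta = 2 * math.pi * j / angular_res    # theta in [0, 2*pi)
            # Concentric case of x(r,th) = (1-r) x_p(th) + r x_i(th):
            radius = (1 - r) * r_pupil + r * r_iris
            x = int(round(cx + radius * math.cos(theta)))
            y = int(round(cy + radius * math.sin(theta)))
            row.append(img[y][x])                    # nearest-neighbour sample
        polar.append(row)
    return polar

# Synthetic eye image whose intensity equals the distance from (30, 30):
img = [[min(int(math.hypot(x - 30, y - 30)), 255) for x in range(61)]
       for y in range(61)]
polar = rubber_sheet(img, 30, 30, r_pupil=5, r_iris=20)
print(polar[0][0], polar[-1][0])   # inner row samples ~5, outer row ~20
```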

2.2.3 Feature Extraction

The most discriminating information present in an iris pattern must be extracted to provide an accurate recognition result; only the significant features are encoded to form a biometric template. Most iris recognition systems use a band-pass decomposition method to create the iris template. The template generated in the feature encoding process requires a suitable matching metric to measure the similarity between two iris templates. The metric should produce two distinct ranges of values: one for intra-class comparisons (same iris) and another for inter-class comparisons (different irises). These two cases should generate distinct, well-separated values so that a final decision can be made with high confidence as to whether two templates are from the same iris or from two different irises.

Gabor filters provide an optimum conjoint representation of a signal in space and spatial frequency. A Gabor filter is created by modulating a sine/cosine wave with a Gaussian. The sine wave is perfectly localized in frequency but not localized in space; modulation with a Gaussian provides localization in space, at the cost of some localization in frequency. Decomposition of a signal is carried out using a quadrature pair of Gabor filters, with the real part specified by a cosine and the imaginary part by a sine, each modulated by a Gaussian. The real and imaginary filters are also referred to as the even-symmetric and odd-symmetric components respectively (Figure 2.9).

The frequency of the sine/cosine wave determines the center frequency of the filter and the width of the Gaussian specifies the bandwidth of the filter. A 2D Gabor filter (Sanderson & Erbetta 2000) over an image domain (x,y) is represented as

G(x,y) = e^(−π[(x − x₀)²/α² + (y − y₀)²/β²]) e^(−2πi[u₀(x − x₀) + v₀(y − y₀)]) (2.12)

where (x₀, y₀) specifies the position in the image, (α, β) specify the effective width and length, and (u₀, v₀) specify the modulation, which has spatial frequency ω₀ = √(u₀² + v₀²).

The output of the Gabor filters is demodulated to compress the data by quantizing the phase information into four levels, one for each quadrant of the complex plane, because phase carries the most significant information in an image rather than amplitude. Considering only phase allows the encoding of discriminating information in the iris while discarding redundant information, such as illumination, which is represented by the amplitude component. The four levels are represented using two bits of data, so each pixel in the normalized iris pattern corresponds to two bits of data in the iris template.
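The four-level phase quantization can be sketched as mapping each complex filter response to two bits according to the signs of its real and imaginary parts (one bit pair per quadrant; the particular bit convention here is an illustrative assumption):

```python
def quantize_phase(responses):
    """Encode each complex Gabor response as 2 bits:
    (Re >= 0, Im >= 0) gives one distinct bit pair per quadrant
    of the complex plane; amplitude is discarded."""
    bits = []
    for z in responses:
        bits.append(1 if z.real >= 0 else 0)
        bits.append(1 if z.imag >= 0 else 0)
    return bits

# One response per quadrant -> bit pairs (1,1), (0,1), (0,0), (1,0):
print(quantize_phase([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]))
# -> [1, 1, 0, 1, 0, 0, 1, 0]
```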

The polar coordinates are obtained in the normalization process; therefore, in polar form the filters are given as

H(r,θ) = e^(−iω(θ − θ₀)) e^(−(r − r₀)²/α²) e^(−(θ − θ₀)²/β²) (2.13)

where (α, β) are the effective width and length and (r₀, θ₀) specify the center frequency of the filter.

The demodulation and phase quantization process can be represented as,

h{Re,Im} = sgn{Re,Im} ∫_ρ ∫_φ I(ρ,φ) e^(−iω(θ₀ − φ)) e^(−(r₀ − ρ)²/α²) e^(−(θ₀ − φ)²/β²) ρ dρ dφ (2.14)

where h{Re,Im} can be regarded as a complex-valued bit whose real and imaginary components depend on the sign of the 2D integral, and I(ρ,φ) is the raw iris image in a dimensionless polar coordinate system (Lee 1996). Finally, an iris template is generated for the respective samples of the instances (Figure 2.10).

The weakness of the Gabor filter is that the even symmetric filter will have a DC component whenever the bandwidth is larger than one octave (Field 1987). However, zero DC components can be obtained for any bandwidth by using a Gabor filter which is Gaussian on a logarithmic scale known as the Log-Gabor filter.

The frequency response of a Log-Gabor filter is given as,

G(f) = exp(−(log(f/f₀))² / (2 (log(σ/f₀))²)) (2.15)

where f₀ represents the center frequency and σ gives the bandwidth of the filter. The filter is convolved with the normalized iris image for the given wavelength and bandwidth to produce the iris template and the corresponding noise mask (Figure 2.11).

The iris template generated for the test sample is compared against the templates stored in the database during enrollment using Hamming Distance (HD) method to determine whether it is one among them.

The Hamming distance is used as the matching metric in iris recognition; it gives a measure of similarity between two bit patterns. Comparing bit patterns D and P, the Hamming distance (HD) (Daugman 2001) is defined as the sum of disagreeing bits divided by N, the total number of bits in the bit pattern.

HD = (1/N) Σ_{i=1}^{N} (D_i ⊕ P_i) (2.16)

The Hamming distance algorithm employed here additionally incorporates noise masking, so that only significant bits are used in calculating the Hamming distance between two iris templates. When taking the Hamming distance, only bits in the iris patterns that correspond to '0' bits in the noise masks of both patterns are used in the calculation. The Hamming distance is thus computed using only bits generated from the true iris region, resulting in the modified Hamming distance (Aly et al 2010), given as

HD = ( Σ_{i=1}^{N} (T_i ⊕ P_i) ∩ M_i^T ∩ M_i^P ) / ( N − Σ_{i=1}^{N} M_i^T ∩ M_i^P ) (2.17)

where N is the number of bits in each template, T and P are the two templates being compared, and M^T and M^P are the corresponding noise masks.

In order to handle the noise in the input image, threshold based HD is used in this work. The steps are as follows:

1) Input two templates and their corresponding noise masks. Convert them to logical values.

2) Let s = −(number of shifts to be performed on the template).

3) Shift template_1 and mask_1 by s bits.

4) Perform logical AND between mask_2 and shifted mask_1 and let the result be mask.

5) Perform logical XOR between template_1 and template_2 only for those bits that have a mask value of 0.

6) Compute the hamming distance by finding the number of ones in the above result and dividing it by the total number of bits considered.

7) Increment s by 1.

8) If the value of s is greater than the number of shifts to be performed, return the lowest hamming distance found as the final result. Otherwise, go to step 4.
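The masked, shift-tolerant matching described above can be sketched with plain bit lists (here a mask value of 1 marks a noise bit to exclude, an assumed convention consistent with step 5; templates and shift count are illustrative):

```python
def masked_hd(t1, t2, m1, m2):
    """Hamming distance over only those bits whose combined noise mask is 0."""
    valid = [a == 0 and b == 0 for a, b in zip(m1, m2)]
    n = sum(valid)                                   # bits actually compared
    diff = sum(1 for x, y, v in zip(t1, t2, valid) if v and x != y)
    return diff / n if n else 1.0

def best_shift_hd(t1, t2, m1, m2, shifts=2):
    """Minimum masked HD over circular bit shifts of template 1,
    compensating for rotation of the iris between captures."""
    best = 1.0
    for s in range(-shifts, shifts + 1):
        st1 = t1[-s:] + t1[:-s] if s else t1         # shift template and mask
        sm1 = m1[-s:] + m1[:-s] if s else m1
        best = min(best, masked_hd(st1, t2, sm1, m2))
    return best

t1 = [0, 1, 1, 0, 1, 0]          # same pattern as t2, rotated by one bit
t2 = [0, 0, 1, 1, 0, 1]
m  = [0, 0, 0, 0, 0, 0]          # no noise bits
print(masked_hd(t1, t2, m, m))   # unshifted distance is large
print(best_shift_hd(t1, t2, m, m))  # -> 0.0 once the rotation is undone
```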

2.3 FACE AS A BIOMETRIC

Informally, the face is used as an index to recognize individuals; with technology, it can also be used formally as a trait in a biometric recognition system. The face can be captured from a distance, unlike most other traits, and face recognition can be applied without the subject's knowledge; for example, it can be used to find missing children or to track criminals or unknown subjects using surveillance cameras. Face recognition uses the spatial and geometrical features of the given face image, such as the structure, shape and proportions of the face, the distances between the eyes, nose, mouth and jaw, the sides of the mouth, and the locations of the nose and eyes.

The behavioral patterns of the face may vary due to stress, fatigue or illness. Face recognition offers high accuracy and low intrusiveness. Though it cannot compete with biometric technologies like iris or fingerprint systems, which have achieved very low error rates, no other biometric technology matches its convenience of identification and the low level of cooperation demanded of the individual.

Though face recognition has been studied since the 1960s, the first automatic face recognition system was developed by Kanade in 1973. Face recognition represents a good compromise between social acceptance and reliability, and balances security and privacy better than other successful biometrics such as iris and fingerprint. Automated systems were adopted and funded by government agencies, especially to identify individuals involved in terrorist attacks. Several other security and forensic applications, such as security checks at airports, also require face recognition.

Face recognition can be performed manually or by machine. Manual face recognition employs both global and local features of a human face (Zhao et al 2003), whereas a machine-based face recognition system utilizes geometric or statistical features derived from face images (Belhumeur et al 1997; Batur et al 2001; Belkin & Niyogi 2003; Liu et al 2005). In the literature, face recognition is broadly categorized into two types, namely appearance-based and feature-based approaches (Zhao et al 2003). Global and local features play an important role in occluded face recognition. Face recognition methods under partial occlusion are classified into three groups.

Feature based methods - deal with features like the eyes, mouth and nose.

Part based methods - based on holistic features of the face image.

Fractal based methods - deal with hybrid local and global features.

Face detection has become an important challenge to researchers due to increase in security concerns and use of applications in human computer interaction, surveillance, etc. Some of the main applications of face recognition are document control, access control and database retrieval. In the early days of development in the biometric field, Turk & Pentland (1991) have presented an excellent paper stating the importance of Eigen faces, and Brunelli & Poggio (1993) have suggested a holistic approach of using template matching for face recognition. Navarrete & Ruiz (2002) have presented the comparison of appearance based approaches, and finally a genetic algorithm was used to compute an optimal subspace and tested with different distance metrics.

In the literature, a number of techniques are available for face detection and recognition (Samal & Iyengar 1992; Valentin & Abdi 1994; Chellappa et al 1995; Zhao et al 2003; Hjelmas & Low 2001; Yang et al 2002). Some of these methods have achieved excellent accuracy and reduced processing time. Several survey papers summarize the various approaches designed and executed by researchers for 2D and 3D face detection and recognition.

Some of the 2D face databases designed to address the above challenges are FERET (Philips et al 2000), CMU-PIE (Sim et al 2003), AR Faces (Martinez & Benavente 1998), the Yale Face database, etc.

Many of the face detection and recognition approaches designed to improve the performance of the face biometric are summarized in Figure 2.13 and Figure 2.14. In recent years, substantial progress has been made in face recognition through the development of many techniques and approaches. Even though many techniques give substantial improvement, the problem of uncontrolled noisy environments remains unsolved. Noise in imaging systems is usually either additive or multiplicative, and its nature has to be identified before features are extracted. The various forms of noise identified in practice are Gaussian (amplifier), impulsive (salt and pepper), shot, film grain, quantization and non-isotropic noise. A detailed study of noise removal filtering algorithms is presented by Al-Khaffaf et al (2008), which helps determine the type of filtering required to rectify a noisy image according to the type of noise involved.

The input image is preprocessed using a median filter, to remove impulsive noise while preserving edge information, and a Gaussian filter to reduce other noise. Then the ROI, the face region, is detected using an adaptive boosting method, the Viola-Jones algorithm (Viola & Jones 2004), the first effective real-time face detection method, in which three ingredients work in concert to enable quick and accurate detection:

Integral image for feature computation,

Adaboost for feature selection

Attentional cascade for efficient computational resource allocation.

Viola-Jones algorithm typically gives multiple detections and can be applied to color images.

The Integral Image: The integral image, also known as a summed area table, allows quick and efficient computation of the sum of values above and to the left of each pixel; the sum over any rectangular subset of the grid can then be computed from four corner values. It was first introduced to the computer graphics field by Crow (1984) for use in texture mapping. Viola and Jones applied the integral image to the rapid computation of Haar-like features (Viola & Jones 2004).
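A minimal sketch of the summed-area table and a four-corner rectangle sum (zero-padded to avoid edge cases; not the Viola-Jones implementation itself):

```python
def integral_image(img):
    """ii[y][x] = sum of img over all pixels above and to the left, inclusive.
    A zero row/column pad means rectangle sums need no special cases."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of img over the rectangle [x0..x1] x [y0..y1], inclusive,
    from just four corner lookups."""
    return (ii[y1 + 1][x1 + 1] - ii[y0][x1 + 1]
            - ii[y1 + 1][x0] + ii[y0][x0])

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))   # 5 + 6 + 8 + 9 = 28
```

This constant-time rectangle sum is what makes evaluating many Haar-like features per sub-window affordable.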

AdaBoost Learning: Boosting is a method of finding a highly accurate hypothesis by combining many "weak" hypotheses, each with moderate accuracy. AdaBoost is used to select the important relevant features when redundant features exist. Once these features are found, a weighted combination of them is used to decide whether a given image contains a face or not (Viola & Jones 2004).

Attentional Cascade: This is a critical component in the Viola-Jones detector. The objective at each stage is to determine whether a given sub-window is definitely not a face or may be a face; if it is not a face, the sub-window is discarded immediately. In general, the classifier splits the features into several stages, each stage forming a strong classifier with a certain number of features. Training a cascade involves choosing the number of stages, the features, and the threshold for each strong classifier (Viola & Jones 2004).

2.3.3 Feature Extraction

The face region contains all the information necessary to uniquely identify a person. Existing methods for face recognition from a single intensity image are broadly classified into three types.

Holistic methods - identify a face using the whole face image as input; examples include 2DPCA, noise models and discriminant Eigenfaces.

Local methods - use local facial features for recognition; examples include neural network methods, hidden Markov models and local binary patterns.

Hybrid methods - use both local and holistic features; virtual samples with local features fall under this category.

An appearance-based approach has been implemented for extracting facial features and tested on a few public face datasets. The unimodal face design extracts Eigenfaces as the principal face features; local features of the face are also extracted using Gabor filters, as in iris feature extraction. The extraction of Eigenfaces using PCA is discussed in detail.

The features of the detected face region are extracted using Principal Component Analysis (PCA), a statistical technique applied in fields such as face recognition and image compression, and a common, powerful technique for finding patterns in high-dimensional data (Turk & Pentland 1991).

Compute the eigenvector and Eigenvalue of the covariance matrix as follows:

(A − λI)V = 0 (2.18)

where λ is an Eigenvalue and V is an eigenvector. The eigenvectors of the covariance matrix provide information about the patterns in the data. The eigenvector with the highest Eigenvalue is the principal component of the data set, so the Eigenvalues should be ordered from highest to lowest; this gives the components in order of importance. The less significant components can be ignored. To create the feature vector, take the eigenvectors to be retained and build a matrix with these eigenvectors as its columns.

Eigenvectors possess the following properties:

They can be determined only for square matrices.

An n × n matrix has n eigenvectors (and corresponding Eigenvalues).

All eigenvectors of a symmetric matrix are perpendicular, i.e., at right angles to each other.

Advantages of PCA:

PCA extracts the relevant information from the images which may or may not be directly related to the local information associated with the image.

In order to reduce the computation and space complexities, every image represents the whole face using a small number of parameters.

PCA captures a set of features that characterizes the global variation among the images. These set of features account for the most variance in the training set.

Feature vectors are extracted as follows:

Prepare a training dataset. This contains M x N image matrices (not necessarily square).

Reshape the 2D image in the dataset into 1D image vectors of size 1 x MN, simply by concatenating the rows of pixels in the original image.

The row-wise mean is calculated for the entire dataset.

For PCA to work properly, subtract the mean from each dimension and place the result in matrix A.

To calculate the Eigenvectors and Eigenvalues, the matrix must be square. Construct the n x n covariance matrix A'*A.

Find the Eigenvalues and eigenvectors of the dataset by solving equation (2.18); they reveal the hidden properties of A. Typically, there will be n solutions to the equation, giving n paired Eigenvalues and eigenvectors. The Eigenvalues will be real and non-negative, because the covariance matrix is symmetric and positive semi-definite.

The Eigenvectors are arranged in the order of decreasing Eigenvalue. That is, the first Eigenvector corresponds to the largest Eigenvalue. This gives the components in the order of significance.

The Eigenvector with the highest Eigenvalue is the principal component of the data set.

To reduce the dimensionality, ignore the components that are less significant. Hence, choose only the first p Eigenvectors (p<n).

Feature vectors are formed by taking the p eigenvectors and forming a matrix with each eigenvector as a column:

Feature vector = (eig_1, eig_2, …, eig_p)    (2.19)

This is the final step in the PCA. Once the feature vector is formed, take the transpose of the vector and multiply it with the original data set. This will give the original data in terms of the vectors.
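The steps above can be sketched in Python. This is a minimal NumPy illustration, not the exact implementation used in this thesis; it uses the small-covariance (eigenface-style) trick of diagonalizing the n x n matrix AA' rather than the MN x MN matrix A'A, which yields the same leading components for image-sized data:

```python
import numpy as np

def pca_features(images, p):
    """Compute a PCA feature matrix from a set of equally sized 2D images.

    images: iterable of M x N arrays; p: number of components to keep (p < n).
    """
    # Reshape each M x N image into a 1 x MN row vector
    X = np.array([img.reshape(-1) for img in images], dtype=float)

    # Subtract the mean image from every sample to form A
    mean = X.mean(axis=0)
    A = X - mean

    # Eigen-decompose the small n x n matrix A A' (symmetric, PSD)
    eigvals, eigvecs = np.linalg.eigh(A @ A.T)   # ascending order

    # Sort by decreasing eigenvalue and keep the first p components
    order = np.argsort(eigvals)[::-1][:p]
    components = A.T @ eigvecs[:, order]         # MN x p
    components /= np.linalg.norm(components, axis=0)

    # Project the mean-subtracted training set onto the components
    features = A @ components                    # n x p
    return mean, components, features
```

A test image is projected the same way: subtract `mean`, reshape, and multiply by `components`.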

The distance between the feature vector generated for the test sample and the reference templates is computed using the Euclidean distance (ED). If the distance is higher than a threshold, the person is rejected; otherwise, the system outputs a match, i.e., the claimed identity is verified. The Euclidean distance measure can be used to find the similarity between two images and also to order a set of images by distance. A Euclidean distance of zero between a pair represents a perfect match. The generalized Euclidean distance between n-dimensional points a=(x_1, x_2, …, x_n) and b=(y_1, y_2, …, y_n) is defined as follows:

ED(a,b) = √(∑_(i=1)^n (x_i - y_i)^2)    (2.20)

The steps are as follows:

First, the original images of the training set are transformed into a set of eigenfeature vectors E1.

For an unknown image, calculate the feature vectors E2.

Calculate the Euclidean distance D between the Eigen feature vectors E1 and E2 using the equation (2.20).

If the distance D exceeds some threshold value, then the vector of the unknown image lies away from images in the dataset. In this case, the person is rejected. Otherwise, the person is considered as authenticated.
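The verification steps above can be sketched as follows. This is a minimal illustration; the threshold value and the template store are assumptions, and the feature vectors would come from the PCA stage described earlier:

```python
import numpy as np

def euclidean_distance(a, b):
    """Equation (2.20): Euclidean distance between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def verify(test_features, reference_templates, threshold):
    """Accept the identity claim if the closest reference template
    lies within the threshold; otherwise reject."""
    distances = [euclidean_distance(test_features, t)
                 for t in reference_templates]
    d = min(distances)
    return d <= threshold, d
```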

2.4 EAR AS A BIOMETRIC

The geometry and shape of the ear have been observed to vary significantly among individuals (Jain et al 2004). The various reasons for choosing the ear as a biometric are: i) ear recognition does not suffer from the problems associated with other non-contact biometrics, such as face recognition; ii) the ear is a suitable and promising trait for combination with the face biometric, especially under pose variations; iii) the ear captured in surveillance videos can be used for human recognition where the face may be partly or entirely occluded. The structure of the ear (Figure 2.18) is unique and permanent, as the appearance of the ear does not vary over the lifetime of a human. Similar to the face biometric, acquisition of ear images does not necessarily require a subject's full cooperation. The shape of the ear has long been valued in forensics for identifying criminals. However, the ear appears to degrade slightly with age (Meijerman et al 2007; Sforza et al 2009). Even though current ear detection and recognition systems have reached a certain level of success, their performance is still limited by variation in illumination. A few open research problems remain, namely hair occlusion, ear symmetry and ear classification. The essential properties satisfied by the ear biometric are given in Table 2.9.

2.4.1 Related Research

Many papers on ear biometrics have been presented by Pun & Moon (2004), Yan & Bowyer (2005), Choras (2007), Islam et al (2008), Antakis (2009), Pflug & Busch (2012) and Abaza et al (2013), summarizing the various approaches to ear detection and recognition in 2D and 3D images. Abaza et al (2013) have provided an exceptional survey on ear recognition; their study encompasses the history of ear biometrics, the available databases and a review of 2D and 3D ear recognition systems.

The ear recognition system developed by Ali et al (2007) used the wavelet transform for feature extraction. The ear images are cropped manually from the side-head image of a person and the cropped ear image is normalized. The Haar wavelet transform is applied to extract a feature vector of size 256 bytes. Matching is performed based on Euclidean distance measurements.

Michał Choraś (2007) presented an automated approach to ear biometrics. To detect the contour, the difference between the maximum and minimum pixel intensity values is compared with a threshold formed from the mean and standard deviation of each ear region. The centroid of the image is chosen as the reference point for geometrical feature extraction.

Li Yuan & Zhichun Mu (2005) proposed a method for feature extraction from ear images using the Neighborhood Preserving Embedding (NPE) algorithm. The ear image is divided into overlapping sub-windows, and the NPE algorithm is applied to obtain a projection matrix for each sub-window region. The feature vector is extracted and its dimensionality reduced using the projection matrix.

Ear detection is an essential step for an automated ear recognition system. Many have achieved this manually, and there are also approaches for fully automated ear detection. Table 2.10 describes some of the ear detection and recognition methods proposed in the literature. Essentially, shape features add value to the ear biometric.

2.4.2 Ear Detection

An acquired ear image has uneven illumination and is often surrounded by hair. Therefore, the ear recognition system performs automated segmentation of the region of interest, the ear, before feature extraction. The block diagram of the ear segmentation method is shown in Figure 2.20 and the outputs obtained are shown in Figure 2.21.

Noise Suppression

The acquired ear image is first subjected to a preprocessing step that removes the noise. Noise suppression is carried out by smoothing the ear image with an averaging filter that blurs the image and removes the irrelevant information.

Binarization

The resultant image is binarized, converting the gray-level image into a binary image. Binarization separates the region of interest from its background and is carried out using Otsu's threshold (Otsu 1979).
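Otsu's method selects the threshold that best separates the two intensity classes. The following is a minimal sketch of the classic between-class-variance formulation for 8-bit images, not the exact implementation used here:

```python
import numpy as np

def otsu_threshold(image):
    """Otsu's method: pick the threshold that maximizes the
    between-class variance of background and foreground pixels."""
    hist, _ = np.histogram(image.ravel(), bins=256, range=(0, 256))
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, 0.0
    w_b, sum_b = 0, 0.0
    for t in range(256):
        w_b += hist[t]                  # background pixel count
        if w_b == 0:
            continue
        w_f = total - w_b               # foreground pixel count
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(image, threshold):
    """Binary image: 1 where intensity exceeds the threshold, else 0."""
    return (image > threshold).astype(np.uint8)
```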

Morphological operations

Morphological operations are used for extracting image components that are useful in the representation and description of region shape. The shape information is obtained using a structuring element, S. The shape and size of the structuring element are selected based on prior information about the acquired images. The morphological operation (Hsiao et al 2005) is carried out as follows:

The preprocessed image is first subjected to a closing operation to enhance the image for edge detection. The closing operation is a dilation followed by an erosion: the dilation I ⊕ S expands the image, while the erosion I ⊖ S removes unwanted details from it. The closing operation smoothens the contour and fills gaps in the contour.

Finally the edge E(x, y) is detected by subtracting the result of the closing operation from preprocessed image I(x,y).

E(x,y) = I(x,y) - ((I(x,y) ⊕ S) ⊖ S)    (2.21)
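These morphological operations can be sketched directly in NumPy. This is a minimal grayscale illustration with a flat structuring element, not the exact implementation used in this thesis:

```python
import numpy as np

def dilate(image, s):
    """Grayscale dilation I (+) S: maximum over the neighborhood
    defined by the flat boolean structuring element s."""
    pr, pc = s.shape[0] // 2, s.shape[1] // 2
    padded = np.pad(image.astype(float), ((pr, pr), (pc, pc)),
                    constant_values=-np.inf)
    out = np.empty(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = padded[i:i + s.shape[0], j:j + s.shape[1]][s].max()
    return out

def erode(image, s):
    """Grayscale erosion I (-) S: minimum over the neighborhood."""
    pr, pc = s.shape[0] // 2, s.shape[1] // 2
    padded = np.pad(image.astype(float), ((pr, pr), (pc, pc)),
                    constant_values=np.inf)
    out = np.empty(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = padded[i:i + s.shape[0], j:j + s.shape[1]][s].min()
    return out

def edge_by_closing(image, s):
    """Equation (2.21): E = I - ((I (+) S) (-) S)."""
    closing = erode(dilate(image, s), s)
    return image.astype(float) - closing
```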

Mask Generation

The edge information obtained from the series of morphological operations is binarized. The next step is to generate a mask for the binarized image in order to remove the skin region. The mask is generated by performing mathematical operations over the image; applying the mask acts like a filter that removes the noisy parts as the mask is moved over the image.

Multiplication

The resulting mask is combined with the binarized form of the original image generated earlier to completely eliminate the areas surrounding the ear.

Post processing

The resulting image is further subjected to grayscale morphological operations to remove the noise and to fill the gaps in the boundaries of the ear region.

2.4.3 Feature Extraction and Matching

Ear recognition is investigated using different features based on template matching, shape-based features and geometrical moment invariants.

In the template-based matching method, the extracted ear region contains all the information necessary to uniquely identify a person. The features of the detected ear region are extracted using the suitable feature extraction methods presented in the iris section, namely PCA and Gabor, for testing and analyzing the effect of the ear biometric by calculating the distance between the feature vectors. If the distance is higher than a threshold, the person is rejected; otherwise, the system outputs a match, i.e., the claimed identity is verified. The detailed steps of the matching process were described in the iris section.

Since geometric moment invariants have been popularly used for shape description and reconstruction, the zeroth- and first-order geometrical moment invariants (Hu 1962; Xu & Li 2008) were computed here using central moments (Prokop & Reeves 1992) from the preprocessed, edge-sharpened input ear image to determine the ear region. The experimental results are compared with the results obtained using the other methods.
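As an illustration of the underlying quantities, the raw and central moments from which such invariants are built can be computed directly from an image. This is a minimal sketch; the specific moment orders passed in are for illustration only:

```python
import numpy as np

def raw_moment(image, p, q):
    """Raw moment m_pq = sum over pixels of x^p * y^q * I(x, y)."""
    y, x = np.mgrid[:image.shape[0], :image.shape[1]]
    return float((x ** p * y ** q * image).sum())

def central_moment(image, p, q):
    """Central moment mu_pq, taken about the image centroid so that
    the result is invariant to translation."""
    m00 = raw_moment(image, 0, 0)
    xc = raw_moment(image, 1, 0) / m00   # centroid x
    yc = raw_moment(image, 0, 1) / m00   # centroid y
    y, x = np.mgrid[:image.shape[0], :image.shape[1]]
    return float(((x - xc) ** p * (y - yc) ** q * image).sum())
```

By construction, the first-order central moments mu_10 and mu_01 vanish, which is what makes centroid-based moments useful for locating and describing the ear region.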