ISSN : 2288-9604(Online)
Effective face detection and robust face tracking using hybrid filter
Abstract
- 1. Introduction
- 2. Effective face detection
- 2.1. Image enhancement
- 2.2. Skin color segmentation
- 2.3. Labeling and face authentication
- 3. Face tracking using hybrid filter
- 3.1. Camshift algorithm
- 3.2. Kalman filter
- 3.3. Overview of my face tracking method
- 4. Experimental results
- 5. Conclusion
- Acknowledgement
1. Introduction
Automatic human face detection and tracking is a key task in many commercial applications such as video conferencing, security access control, and content-based video indexing.
The first step of face tracking is to determine whether there are any faces in an image and, if so, to detect the location of each. In general, face detection algorithms can be roughly divided into four categories: knowledge-based techniques, template matching, appearance-based approaches and feature-based approaches. Feature-based approaches try to find invariant facial features such as the nose, mouth, texture and skin color for face detection [1]. Among these features, skin color has been the most widely used because of its invariance to translation, rotation and scale changes. I propose a feature-based approach using skin and edge information for face detection. First, image enhancement is carried out. After that, skin segmentation using an explicitly defined rule is conducted. At the same time, the edges of the gray-scale version of the input image are determined using the Canny edge detection algorithm. The skin tone percentage index method is applied to the skin-segmented image to refine the segmentation result. The edge image and the skin tone image are multiplied to separate all non-face regions from the candidate faces. Face candidate verification is then applied to decide which of the candidate regions actually correspond to a face, using primitive shape features. The advantage of the proposed method is that it can detect faces of different size, pose and expression under unconstrained illumination conditions.
In recent years, more and more improved algorithms for face tracking have been proposed, for example, Particle filter algorithms[2], meanshift, Camshift[3], Kalman filter algorithms[4] and so on.
A Particle filter requires an object model of high complexity and a large number of samples for exact tracking.
Meanshift has a problem when the size of the object changes, and it carries a heavy computational burden. Camshift is used to solve these problems. Camshift is a color-based target tracking approach; color is invariant to rotation and scale and insensitive to changes in size and orientation. Camshift is computationally efficient and achieves good performance in a simple environment. But in real, complicated applications Camshift has some demerits. Firstly, Camshift, being based on a color histogram, is less robust when the object color is similar to the background color and when the object is fully or partly occluded. Secondly, it can lose moving objects in a dynamic background. Thirdly, Camshift can fail when tracking small and fast-moving objects because it becomes trapped in a local maximum[5].
The Kalman filter gives an optimal estimate of the next state from externally observed values and measurement errors. It is easily implemented with few operations because it uses a sequential and recursive algorithm. The Kalman filter has the smallest estimation error when compared with several other filters. I combined the Kalman filter with Camshift to enable track recovery after occlusions by finding the global maximum, and to avoid the tracking failures caused by objects and backgrounds with colors similar to faces. The experimental results show that my tracking method obtains better results than Camshift on occlusion sequences and dynamic backgrounds.
2. Effective face detection
2.1. Image enhancement
Image enhancement is a very important preprocessing step for face detection. Many techniques for the enhancement of gray-level images, such as histogram equalization and contrast stretching, can be found in the literature. But those methods are not directly applicable to color images due to the presence of color information as well as gray-level information. Buzuloiu et al. [6] proposed an adaptive-neighborhood histogram equalization method, and Trahanias et al. [7] proposed a 3D histogram equalization method in the RGB cube. Li Tao and V. K. Asari [8] presented an adaptive and integrated neighborhood dependent approach for nonlinear enhancement (AINDANE) of color images. They applied the enhancement to the gray component of the original color image and obtained the enhanced output color image by a linear color restoration process.
In this paper I follow the algorithm described in [8] for the luminance and contrast enhancement of the V component of the input image.
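The AINDANE details are given in [8]; as a rough, simplified sketch of the idea of enhancing only the V channel while preserving hue and saturation (the power-law transfer and the `gamma` parameter below are illustrative stand-ins, not the actual AINDANE transfer function):

```python
import colorsys

def enhance_pixel(r, g, b, gamma=0.6):
    """Enhance one RGB pixel (channel values in [0, 1]) by applying a
    nonlinear transfer to the V (value) channel in HSV space, leaving
    hue and saturation untouched.  The power-law curve here is a
    simplified stand-in for the adaptive luminance enhancement of [8]."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    v_enhanced = v ** gamma  # brightens dark pixels more than bright ones
    return colorsys.hsv_to_rgb(h, s, v_enhanced)

# Example: a dark reddish pixel is brightened while its hue is preserved
print(enhance_pixel(0.2, 0.1, 0.1))
```

Applying this per pixel brightens dark regions while keeping the chromatic content intact, which is the property the face detector relies on.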
2.2. Skin color segmentation
Skin color has proven to be a useful and robust cue for face detection, localization and tracking. Skin detection methods mainly fall into two categories: pixel-based methods and region-based methods. Pixel-based skin detection methods classify each pixel as skin or non-skin individually, independently of its neighbors. Color-based methods fall into this category. In contrast, region-based methods try to take the spatial arrangement of skin pixels into account during the detection stage to improve performance. These methods require additional knowledge such as texture. The final goal of skin color detection is to build a decision rule that discriminates between skin and non-skin pixels. One way to build a skin classifier is to define explicitly (through a number of rules) the boundaries of the skin cluster in some color space. For example, the simplest model defines a region of skin tone pixels in the YCbCr color space using Cr and Cb values obtained from samples of skin-colored pixels. With carefully chosen thresholds [CrLow, CrHigh] and [CbLow, CbHigh], a pixel is classified as skin tone if its chromaticity values (Cr, Cb) fall within the ranges, i.e., CrLow ≤ Cr ≤ CrHigh and CbLow ≤ Cb ≤ CbHigh. The skin color distribution can also be modeled in parametric form by an elliptical Gaussian joint probability density function, defined as:
Here, x is a color vector and m and C are the distribution parameters (mean vector and covariance matrix respectively). The parameters are estimated from the training data of skin pixels.
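The density itself is not reproduced in this copy. In the standard form consistent with the description, and assuming the two-dimensional chrominance vector x = (Cr, Cb)ᵀ, it reads:

```latex
p(\mathbf{x} \mid \mathrm{skin}) \;=\;
\frac{1}{2\pi\,|C|^{1/2}}\,
\exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\mathbf{m})^{\mathsf T} C^{-1} (\mathbf{x}-\mathbf{m})\Big)
```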
The skin tone percentage index is used to filter out non-skin-tone face areas. To overcome the disadvantages of morphological operations in refining the result of skin segmentation, I used the skin tone percentage index method [9]. The main advantage of this method is that, with a carefully selected threshold, noise-like skin pixels can be removed from the skin-segmented image while, at the same time, non-skin pixels are filled in as skin pixels if most of their neighboring pixels are skin pixels.
In this method the binary image is filtered in two stages. Applying Eq. (2), the first stage computes the sum matrix of the binary image: for a binary image I(x, y), the sum matrix Sn(x, y) is the sum of the binary values over the 3 x 3 neighborhood of (x, y), including the point itself. The possible values of Sn(x, y) therefore lie in the set {0, 1, 2, ..., 9}. Depending on the relative frequencies of the values 1, 2 and 3 in Sn(x, y), the threshold T may be set to 1, 2, 3, etc.
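A minimal sketch of the two steps above, the explicit Cb/Cr rule and the sum-matrix filter; the chrominance thresholds are commonly cited values, not necessarily those used in this paper:

```python
def is_skin(cb, cr, cb_range=(77, 127), cr_range=(133, 173)):
    """Explicit-rule skin classifier on (Cb, Cr).  The threshold ranges
    here are commonly used literature values, given as an assumption."""
    return int(cb_range[0] <= cb <= cb_range[1] and
               cr_range[0] <= cr <= cr_range[1])

def sum_matrix(binary):
    """Compute Sn(x, y): the number of skin pixels in the 3x3
    neighborhood of (x, y), including the pixel itself."""
    h, w = len(binary), len(binary[0])
    s = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        s[y][x] += binary[ny][nx]
    return s

def percentage_filter(binary, t=2):
    """Keep a pixel as skin only if more than t of its 3x3 neighborhood
    are skin: isolated noise pixels are removed, and small holes
    surrounded by skin are filled in."""
    s = sum_matrix(binary)
    return [[1 if s[y][x] > t else 0 for x in range(len(binary[0]))]
            for y in range(len(binary))]
```

With t = 2, a lone skin pixel (at most two skin neighbors) is removed, while a hole inside a solid skin blob (eight skin neighbors) is filled, which is exactly the refinement behavior described above.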
2.3. Labeling and face authentication
The resulting image, the combination of skin color segmentation and edge detection, is now searched for connected components using 8-neighbor adjacency. The algorithm divides the binary image into groups of connected pixels and works as follows:
1) The 1D connected components of each row are labeled first.
2) The labels of row-adjacent components are merged by using an associative memory scheme.
3) After merging, a relabeling is done to give a consecutive set of positive component labels.
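The labeling above can be sketched as follows; this BFS flood fill reaches the same consecutive labeling as the row-merge scheme, though by a different route:

```python
from collections import deque

def label_components(binary):
    """Label the 8-connected components of a binary image with
    consecutive positive integers, returning (labels, count)."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                next_label += 1          # start a new component
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:             # flood-fill its 8-neighbors
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w and
                                    binary[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = next_label
                                queue.append((ny, nx))
    return labels, next_label
```

Note that with 8-adjacency, diagonally touching pixels belong to the same component.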
Each connected component is then analyzed to judge whether it is a face or not. In order to filter out explicit non-face regions, suitable discriminative criteria are defined, extracted from prior knowledge about facial geometric structure and shape features. Each of these features can be considered a weak classifier. The features are area, bounding box proportions, centroid, extent and standard deviation range. Generally, components with small areas correspond to non-face components, so components with small areas are dropped. The anatomy of the face suggests that the ratio R of the bounding box width to its height lies in some fixed range, so any component satisfying the following condition is classified as a face component: Tl < R < Th. In my experiment I use Tl = 0.5 and Th = 1.2. A face is evenly distributed within the region where it is located; therefore, the centroid of a face region should lie within a small window centered in the middle of the bounding box. The dimensions of this window were set to 20% of the dimensions of the bounding box. Any region whose centroid is outside this window corresponds to a blob that is not evenly distributed and is therefore not a face. The extent of a blob is its area divided by the area of its bounding box. From experiments, the extent for a face is between 0.45 and 0.80; any region whose extent is outside this range is eliminated. The standard deviation of a face should also lie in some range. The standard deviation of the normalized gray-scale region of the input image corresponding to the labeled component is calculated; if it is between 0.25 and 0.55, the component is classified as a face.
This cascade is applied to the connected components: any component which agrees with all five of the above conditions is a face, and any component which disagrees with any of the conditions is dropped.
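The five-stage cascade can be sketched as below; `min_area` is an assumed value (the paper does not state its area threshold), while the other thresholds follow the text:

```python
import math

def is_face_candidate(comp, gray_values,
                      min_area=400,              # assumed; not given in the paper
                      ratio_range=(0.5, 1.2),
                      extent_range=(0.45, 0.80),
                      std_range=(0.25, 0.55),
                      centroid_window=0.20):
    """Apply the five weak-classifier tests to one labeled component.
    `comp` is a list of (x, y) pixel coordinates; `gray_values` holds
    the normalized [0, 1] gray levels of those pixels."""
    # 1) area: drop small components
    area = len(comp)
    if area < min_area:
        return False
    xs = [p[0] for p in comp]
    ys = [p[1] for p in comp]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    bw, bh = x1 - x0 + 1, y1 - y0 + 1
    # 2) bounding-box aspect ratio: Tl < R < Th
    if not (ratio_range[0] < bw / bh < ratio_range[1]):
        return False
    # 3) centroid inside a window of 20% of the box, centered in it
    cx, cy = sum(xs) / area, sum(ys) / area
    if (abs(cx - (x0 + x1) / 2) > centroid_window * bw / 2 or
            abs(cy - (y0 + y1) / 2) > centroid_window * bh / 2):
        return False
    # 4) extent: component area over bounding-box area
    if not (extent_range[0] <= area / (bw * bh) <= extent_range[1]):
        return False
    # 5) standard deviation of the normalized gray levels
    mean = sum(gray_values) / area
    std = math.sqrt(sum((g - mean) ** 2 for g in gray_values) / area)
    return std_range[0] <= std <= std_range[1]
```

Each test is cheap and rejects early, so the cascade only computes the gray-level statistics for components that already look face-shaped.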
3. Face tracking using hybrid filter
3.1. Camshift algorithm
The Particle filter is a probabilistic tracking method based on factored sampling and is used to track a specific object in complex images. But an object model of high complexity and a large number of samples are required for exact tracking.
The implementation of meanshift is simple because it uses a single candidate. But there is a problem when the size of the object changes, because the size of the search window is fixed, and the heavy computational burden makes it hard to use in real-time applications. Camshift is used to solve these problems and is quite effective in face tracking. Camshift is computationally efficient; it improves the meanshift algorithm with a color segmentation method for use in a streaming environment, and it overcomes the disadvantage of meanshift by adapting the size of the search window.
When video frame sequences change over time, their color probability distribution varies simultaneously. The Camshift algorithm adaptively adjusts the size and location of the search window: the localization result for the tracked object in the current frame is used to set the size and location of the search window in the next frame. The Camshift algorithm is as follows[3].
1) Choose the initial location and size of the search window.
2) Compute the color probability distribution within the search window.
3) Run meanshift to obtain the mean location and the new size of the search window.
4) Center the search window in successive frame at the mean location obtained in step 3, and repeat step 3.
5) Calculate orientation and scale of the target.
The zeroth and second moments of the probability distribution I(x, y) within the search window are

M00 = Σx Σy I(x, y), M20 = Σx Σy x² I(x, y), M02 = Σx Σy y² I(x, y), M11 = Σx Σy x y I(x, y).

Then the length l and width w of the target are

l = sqrt( ((a + c) + sqrt(b² + (a − c)²)) / 2 ), w = sqrt( ((a + c) − sqrt(b² + (a − c)²)) / 2 ),

where a = M20/M00 − xc², b = 2(M11/M00 − xc yc), c = M02/M00 − yc², and (xc, yc) = (M10/M00, M01/M00) is the mean location obtained from the first moments.

The angle of the target's length direction relative to the horizontal is

θ = (1/2) arctan( b / (a − c) ).
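The moment computations of this subsection can be sketched as follows; `prob` is the back-projected probability image within the search window (illustrative data), and constant scale factors relating l and w to the final window size are omitted:

```python
import math

def target_size_orientation(prob):
    """Compute target length l, width w and orientation theta from the
    zeroth, first and second moments of a 2D probability image `prob`,
    following the Camshift formulation of Bradski [3]."""
    M00 = M10 = M01 = M20 = M02 = M11 = 0.0
    for y, row in enumerate(prob):
        for x, p in enumerate(row):
            M00 += p
            M10 += x * p
            M01 += y * p
            M20 += x * x * p
            M02 += y * y * p
            M11 += x * y * p
    xc, yc = M10 / M00, M01 / M00          # centroid (mean location)
    a = M20 / M00 - xc * xc                # central second moments
    b = 2.0 * (M11 / M00 - xc * yc)
    c = M02 / M00 - yc * yc
    common = math.sqrt(b * b + (a - c) ** 2)
    l = math.sqrt((a + c + common) / 2.0)            # major-axis length
    w = math.sqrt(max(0.0, (a + c - common) / 2.0))  # minor axis; clamp rounding
    theta = 0.5 * math.atan2(b, a - c)     # atan2 also handles a == c
    return l, w, theta
```

For a horizontal strip of uniform probability the method reports θ ≈ 0 with w ≈ 0, and for a vertical strip θ ≈ π/2, as expected.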
3.2. Kalman filter
Camshift, being based on a color histogram, is less robust when the object color is similar to the background color. The success rate of tracking drops markedly when the object is fully or partly occluded, because the algorithm becomes trapped in a local maximum.
The Kalman filter gives an optimal estimate of the next state from externally observed values and measurement errors in a linear dynamic system with white Gaussian noise. It is easily implemented with few operations because it uses a sequential and recursive algorithm. The Kalman filter has shown excellent results in past research and has the smallest estimation error when compared with several other filters[4]. I combined the Kalman filter with Camshift to enable track recovery after occlusions by finding the global maximum, and to avoid the tracking failures caused by objects and backgrounds with colors similar to faces.
In this paper, I set the center position (x, y) and the velocity (displacement) of the target as the state vector. The state vector of the Kalman filter at time t is as follows[4].
The state equation for prediction using the state vector of the tracking model is as follows.
The Kalman filter predicts the state of the system from the set of measurement values. The set z(t) of observation values is as follows if there is a linear relation between the state of the system and the measurement values.
The state transition matrix Φ(t) is as follows if the tracking trajectory of the moving object has uniform velocity and a linear direction.
The state vector is 4-dimensional, containing the coordinates and velocity components along the x and y axes. The observation matrix H(t) relates the state vector to the observation vector.
w(t) in equation (11) and v(t) in equation (12) are the process noise and the measurement noise. They have zero mean and Gaussian distributions with diagonal covariances Q(t) and R(t), and they are mutually independent.
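The equations referenced in this subsection are not reproduced in this copy. A standard constant-velocity formulation consistent with the description (state = center position plus velocity, frame interval Δt, diagonal Q(t) and R(t)) would be:

```latex
\mathbf{x}(t) = \begin{bmatrix} x(t) \\ y(t) \\ \dot x(t) \\ \dot y(t) \end{bmatrix},
\qquad
\mathbf{x}(t+1) = \Phi(t)\,\mathbf{x}(t) + \mathbf{w}(t) \quad (11)
\qquad
\mathbf{z}(t) = H(t)\,\mathbf{x}(t) + \mathbf{v}(t) \quad (12)

\Phi(t) =
\begin{bmatrix}
1 & 0 & \Delta t & 0 \\
0 & 1 & 0 & \Delta t \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad
H(t) =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0
\end{bmatrix}
```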
3.3. Overview of my face tracking method
First, I obtain the detected face with the proposed face detection method. In traditional Camshift, the iteration starts from a specified location (such as the upper left corner of the image) in every new frame. But in continuous video the displacement between successive frames is in fact very small. Therefore, I set a region of interest (ROI) in the image to reduce the computational complexity.
The window width (or height) of the last frame plus 2*region gives the width (or height) of the ROI; the value of region is 50 in my experiment. After that, the color histogram of the ROI is computed and I obtain the back-projected probability distribution (also known as the probability distribution function, PDF) of the ROI. Camshift climbs the gradient of this back-projected probability distribution, computed from re-scaled color histograms, to find the nearest peak within the search window. The size and position of the window are obtained after the Camshift iteration. The first Kalman filter is used to predict the position of the window, and the second Kalman filter is used to predict its size. Kalman filtering predicts the next starting point of the Camshift iteration. Camshift is less robust when a face color is similar to the background color and the face is fully or partly occluded, and it can fail when tracking small and fast-moving objects because it becomes trapped in a local maximum. The Kalman filter enables track recovery after occlusions and helps avoid convergence to local maxima. So, I combined the Kalman filter with Camshift to avoid those problems.
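A minimal sketch of this loop, assuming one independent constant-velocity Kalman filter per coordinate (the diagonal Q and R make the axes separable) and a hypothetical `camshift_step` callable standing in for the Camshift iteration; `q` and `r` are assumed noise variances, not values from the paper:

```python
class Kalman1D:
    """Minimal 1-D constant-velocity Kalman filter (state: [p, v]).
    Two instances track the window center (x, y); two more could track
    width and height, as in the method above."""
    def __init__(self, p0, q=1e-2, r=1.0):
        self.p, self.v = p0, 0.0           # state estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]  # estimate covariance
        self.q, self.r = q, r

    def predict(self):
        self.p += self.v                   # p' = p + v  (dt = 1 frame)
        P = self.P                         # P' = F P F^T + Q, F = [[1,1],[0,1]]
        self.P = [[P[0][0] + P[1][0] + P[0][1] + P[1][1] + self.q,
                   P[0][1] + P[1][1]],
                  [P[1][0] + P[1][1],
                   P[1][1] + self.q]]
        return self.p

    def correct(self, z):
        S = self.P[0][0] + self.r                    # innovation variance
        k0, k1 = self.P[0][0] / S, self.P[1][0] / S  # Kalman gain
        resid = z - self.p
        self.p += k0 * resid
        self.v += k1 * resid
        P = self.P                                   # P = (I - K H) P
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        return self.p

def track(frames, camshift_step, x0, y0):
    """Hybrid loop: the Kalman prediction seeds the Camshift search,
    and the Camshift result (when it converges) corrects the filter.
    `camshift_step` is a hypothetical callable returning the new window
    center, or None when tracking fails (e.g. under occlusion)."""
    kx, ky = Kalman1D(x0), Kalman1D(y0)
    path = []
    for frame in frames:
        px, py = kx.predict(), ky.predict()     # start Camshift here
        result = camshift_step(frame, px, py)
        if result is not None:                  # normal tracking
            x, y = kx.correct(result[0]), ky.correct(result[1])
        else:                                   # occlusion: coast on prediction
            x, y = px, py
        path.append((x, y))
    return path
```

When Camshift loses the target, the loop coasts on the predicted trajectory, which is what allows the track to be recovered once the face reappears near the predicted position.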
Figure 1. Flow chart of My Face Tracking Method.
4. Experimental results
To evaluate the performance of the proposed face detection method I collected 100 color pictures at random from the web and from a personal collection, containing 228 faces. In order to evaluate the performance of the proposed face detection algorithm, the following parameters are chosen: the detection rate and the false positive rate.
From Table 1 it can be seen that, out of 228 faces, 196 were correctly detected by the proposed face detection method. The detection rate is 85.9% with a 22.8% false positive rate. The same image set was used for detecting faces with the Viola and Jones face detection method [10], which achieved a 77.6% correct detection rate with a 6.1% false positive rate. The computation time of the Viola and Jones method is better than that of the proposed method. All experiments were carried out in the Visual C++ programming environment on a computer with a 2.40 GHz CPU and 3.25 GB of RAM.
Table 1. Face Detection Performance
The field of face detection has made significant progress in the past decade. In particular, the seminal work by Viola and Jones has made face detection practically feasible in real-world applications such as digital cameras and photo organization software. If one were asked to name the single face detection algorithm with the most impact in the 2000s, it would most likely be the seminal work by Viola and Jones. The Viola-Jones face detector contains three main ideas that make it possible to build a successful face detector that can run in real time: the integral image, classifier learning with AdaBoost, and the attentional cascade structure.
OpenCV contains a face detector library implementing the Viola-Jones face detection algorithm. The Viola-Jones face detector is widely used as a standard for comparison of detection performance in other papers, so I used it for comparison with the face detector proposed in this paper. Face detection results using the proposed method on some test images are shown in Figure 2. Note the variation in pose, scale, position, illumination, and expression. Obtaining over an 85% detection rate on images under such variation is considered a good result. I obtained better results than the Viola and Jones face detection method.
Figure 2. The results of Viola & Jones method (a,c) and proposed face detection method (b,d).
I employ 8 sequences with a 640*360 image size for the experiments on my tracking method. Four sequences were captured in the laboratory and the other four in a house.
In the house sequences there is a door with a color similar to skin. There is an occlusion in the number-1 sequences (lab_1 and house_1). The number-3 sequences contain faces tilted up and down, and the number-4 sequences contain profile faces. In the number-5 sequences the faces move back and forth or left and right. As can be seen in the lab_1 sequence of Figure 3, the face moves to the right into an occlusion and then moves back to the left. My tracking method is able to track a face under occlusion, but Camshift cannot recover the track after the occlusion in frames 137 to 190.
Figure 3. The result of face tracking in lab_1 sequence under the occlusion. The first row shows the results of Camshift tracking. The second row shows the results of my tracking method.
In the lab and house sequences of Tables 2 and 3, the average tracking rates of Camshift and my tracking method are 45.7% and 94.0%, respectively. My tracking method achieved excellent tracking rates of 100% and 91.09% in occlusion environments such as the lab_1 and house_1 sequences.
Table 2. Performances of Camshift.
Table 3. Performances of my tracking method.
By contrast, Camshift achieved poor tracking rates of 7.49% and 3.08%. These experimental results show that my tracking method helps recover a track after occlusion. Both Camshift and my tracking method achieved a 100% tracking rate in the lab_3, 4 and 5 sequences under simple environments.
My tracking method also achieved excellent tracking rates in the house_3, 4 and 5 sequences, which contain an object and background with a color similar to skin, while Camshift achieved poor tracking rates of 19.65%, 11.92% and 23.75%. These experimental results show that my tracking method is robust against tracking failures caused by backgrounds and objects with similar colors.
The average frame rates of Camshift and my tracking method are 35.9 and 20.2 fps; only a small computational cost is added by my tracking method compared with Camshift.
5. Conclusion
In this paper, I presented my face detection and tracking method. First, image enhancement is applied; I used a method for image enhancement in HSV space based on local processing of the image. I propose a lighting-invariant face detection system based on the edge and skin tone information of the input color image. The experimental results show that the proposed method is invariant to the lighting conditions under which the image is taken. The results also reveal the robustness and efficiency of this method under varying conditions such as pose, expression, scale and position. I obtained better results than the Viola and Jones face detection method.
I combined the Kalman filter with Camshift to enable track recovery after occlusions and to avoid the tracking failures caused by objects and backgrounds with colors similar to faces. The experimental results show that my tracking method obtained better results than Camshift under occlusion and with backgrounds of color similar to skin. My tracking method adds only a small computational cost compared with Camshift.
Acknowledgement
This work was supported by the research professor fund of Kunsan National University in 2010.
References
2. B. Ristic, S. Arulampalam, and N. Gordon, Beyond the Kalman Filter: Particle Filters for Tracking Applications, Boston, Artech House, 2004.
3. G. R. Bradski, Computer vision face tracking as a component of a perceptual user interface, Proc. of WACV '98, Princeton, NJ, 214, 1998.
4. C. K. Chui and G. Chen, Kalman Filtering with Real-Time Applications, Springer-Verlag, 42, 1991.
5. Y. Yue, Y. Gao and X. Zhang, An improved Camshift algorithm based on dynamic background, Proc. of the 1st Int. Conf. on ICISE, Nanjing, China, 1141, 2009.
6. V. Buzuloiu, M. Ciuc, R. M. Rangayyan, and C. Vertan, Adaptive-neighborhood histogram equalization of color images, J. of Electronic Imaging, 10(2), 445, 2001.
7. P. E. Trahanias and A. N. Venetsanopoulos, Color image enhancement through 3-D histogram equalization, Proc. 11th IAPR Conf. on Pattern Recognition, The Hague, Netherlands, 545, 1992.
8. L. Tao and V. K. Asari, Adaptive and integrated neighborhood dependent approach for nonlinear enhancement of color images, J. of Electronic Imaging, 14(4), 043006, 2005.
9. Y. Huang, X. Ao, and Y. Li, Real time face detection based on skin tone detector, IJCSNS, 9(7), 71, 2009.
10. P. Viola and M. Jones, Robust real-time face detection, Int. J. of Computer Vision, 57(2), 137, 2004.