This patent application is assigned to
The following quote was obtained by the news editors from the background information supplied by the inventors: "Disclosed embodiments relate to video monitoring and interpretation by software-aided methodology, and more particularly, to a system and method for improving the utility of video images in systems handling video, such as, for example, for system-interpreted analysis of video images for security purposes.
"Video analytics is an industry term for the automated extraction of information from video. Video analytic systems can include a combination of imaging, computer vision and/or machine intelligence applied to real-world problems. Its utility spans several industry segments including video surveillance, retail, and automation. Video analytics is distinct from machine vision, machine inspection, and automotive vision. Known applications of video analytics can include, for example, detecting suspicious objects and activities for improved security, license plate recognition, traffic analysis for intelligent transportation systems, and customer counting and queue management for retail applications. Advantages of automated video surveillance systems include increased effectiveness and lower cost compared to human-operated systems.
"Some known surveillance systems can accurately detect changes in a scene. Changes in a scene lead to changes of pixel values in a camera image. Scene changes can be induced by global or local illumination changes (e.g. sun light, car headlights, street lights), by environmental motion (e.g. blowing debris, shaking trees, running water), or by moving objects (e.g. moving people, cars, and pets). One of the challenging tasks of a surveillance system is to discriminate changes caused by moving objects from illumination changes.
"Changes in pixel values for a given image can be computed relative to pixel values in a reference image of the same scene. A reference image, which can be referred to as a background image, generally depicts motionless objects (e.g. buildings, trees, light posts, roads, and parked cars) in a scene. To discriminate changes caused by illumination and environmental conditions (also called clutter motion) from changes caused by moving foreground objects of interest (e.g., moving objects), one known technique assumes that a small pixel and/or illumination change in an image corresponds to static objects, and a large pixel and/or illumination change in an image corresponds to moving objects. A pixel and/or illumination difference threshold can be used to differentiate static objects from moving objects. In a given scene, pixel differences below a threshold can be classified as static objects and pixel differences above the threshold can be classified as moving objects. Defining a threshold that accurately separates moving objects from clutter and background motion can be difficult. The more accurate the threshold, the greater the number of pixels corresponding to moving objects can be detected. Such an assumption is often violated by drastic and/or non-uniform illumination changes, such as regions of brightness and regions having shadows resulting from, for example, headlights or clouds, respectively. Pixel differences associated with such regions can be large, and thus, the regions are classified as moving objects rather than background. In addition to illumination change, large pixel differences can be associated with clutter motion (e.g. moving leaves, moving water, fountains). Pixels corresponding to clutter motion often do not correspond to objects of surveillance interest. As such, it is desirable to exclude such regions from the moving object detection.
"Known systems can group pixels corresponding to moving objects into blobs for analysis. During blob analysis, some blobs can be rejected from consideration as moving objects while other blobs can be passed to high level post processing (e.g. recognition, classification). By employing moving object detection, a surveillance system reacts immediately with a lesser chance for a false alarm (i.e. the surveillance system almost instantly detects a moving object and quickly analyzes its behavior).
"FIG. 1 illustrates the flow of moving object detection. Collectively, these steps can be referred to as background subtraction or background maintenance. The initial step can be to build a background image, representing an observed scene (e.g. by acquiring a frame of video input), at 100. This step can be performed at startup. Steps 2-4 can be repeated for each subsequent frame. The second step, for each 'current' frame, can be to determine illumination differences between the current scene image and the background image (e.g., compute a background difference image), at 105. The third step can be to filter out noise (e.g., illumination changes due to static objects) in the background difference image using a pixel difference threshold, at 110. The pixels can then be grouped into blobs, at 115, and the blobs analyzed, at 120. The background image can be updated with objects that are deemed to be part of the background. Each step includes algorithms of different complexity, and the performance of moving object detection is dependent on the performance of such algorithms. The ultimate goal of a background subtraction algorithm is to provide accurate moving object detection while maintaining a low rate of false alarms from clutter or illumination changes.
"The performance of the background subtraction algorithm can bound the performance and capabilities of a surveillance system because downstream processing such as recognition and classification depend on the quality and accuracy of blobs. Therefore, there is a constant demand for improving the performance of the background subtraction algorithm. Background subtraction is a key step for moving object detection in the presence of either static or dynamic backgrounds. To understand what occurs in a scene, the background subtraction algorithm can be used jointly with object tracking, recognition, classification, behavior analysis, and statistical data collection. Background subtraction is suitable for any application in which background removal is a guide for both reducing the search space and detecting regions of interest for further processing.
"Many known approaches to background subtraction include modeling the color value of each background pixel by a Gaussian I(x, y).apprxeq.N (.mu.(x, y).SIGMA.(x, y)) The parameters of the Gaussian distribution are determined from a sequence of consecutive frames. Once the background model is built, a likelihood function is used to decide whether a pixel value of a current frame corresponds to the Gaussian model, N(.mu.(x,y).SIGMA.(x,y)).
"Another approach uses a mixture of Gaussians to model color pixel values in outdoor scenes. Another approach uses not only color information but spatial information as well. That is, each pixel of a current frame is matched to both the corresponding pixel in the background image and pixels neighboring the corresponding pixel. Another approach uses a three-component system: the first component predicts pixel value in a current frame, the second component fills in homogeneous regions of foreground objects, and the third component detects sudden global changes. Yet another approach aggregates the color and texture information for small image blocks.
"Existing techniques use a mathematical model for the background or foreground blobs using scene statistics. However, they fail to address some challenges that occur in real-world usage. In many scenes, the assumption that a pixel value can be modeled by a Gaussian distribution is only true part of the time, which makes it difficult to build a robust algorithm. Additionally, the temporal updating of the background image or model is an unsolved issue that can instantly and drastically decrease the performance of the whole system. Accordingly, improvement is still required to complement existing algorithms. Some drawbacks of various known approaches are identified below in connection with real-world situations.
"Pixel Difference Thresholding
"The illumination of outdoor scenes cannot easily be controlled. Accordingly, the pixel difference between an image and its corresponding background image cannot be modeled robustly (i.e. the values of the background image can fluctuate drastically, chaotically and non-uniformly). Shadows, glares, reflections and the nature of object surfaces are examples of factors that can cause unpredictable behavior of pixel values and, as such, the pixel difference. Observing that pixel differences corresponding to groups of pixels behave more predictably and/or less chaotically, models were developed to calculate pixel differences by considering groups of pixels around a particular pixel. Such models assume that a pixel difference calculated using spatially close pixels behave more predictably than differences calculated using individual pixels. Although spatial modeling of a pixel difference provides some improvement, clutter motion (e.g., moving leaves on trees) remains a problem when values of the grouped pixels change both simultaneously and unpredictably. As a consequence, clutter motion regions can be identified as moving objects and can cause a surveillance system to generate false alarms.
"One known solution for eliminating illumination changes caused by clutter motion regions is based on complex modeling of a background image. Multiple models, rather than a single model, can be used. Complex modeling assumes that the pixel value may fluctuate around several average values. Theoretically, the assumption is quite valid and indeed imitates real life scenarios. One or more thresholds can be applied to the difference between current and average pixel values. However, complex modeling can be sensitive to weather conditions (e.g. snow, rain, wind gusts), and the required processing power makes its implementation impractical due to the need for continuous moving object detection at a high frame rate. Complex modeling relies on continuous moving object detection and the accurate updating of individual models, which depends on the natural environment and specifics of the surveillance scene. Accordingly, statistics of a pixel model of the background image are updated after each input frame has been processed. An error in updating the background image or model directly affects the pixel difference threshold, and problems related to illumination change can accordingly reappear. One solution is to manually mask out the clutter motion regions, which results in the system failing to detect objects in the masked-out regions. However, artifacts of video compression (e.g., blockiness and ringing) can raise problems similar to those caused by clutter motion regions. Hence, manual masking is frequently not an acceptable solution.
"Choosing a Threshold
"Applying thresholds to pixel differences between an image and a corresponding background discards pixels assumed to be in the background. Once the pixels are discarded from the process (i.e., pixels are classified as background pixels), it can be difficult to re-classify them as object pixels. Moreover, estimating and updating the threshold value can be difficult. In some systems, for example, thresholds depend on the specifics of the surveillance scene, and hence, require tuning during the installation phase. Additionally, different thresholds can be used for different parts of the scene. Such tuning increases the cost of installation without guaranteeing high performance in an uncontrolled outdoor environment. Tuning may require a human to manually re-tune the system under unpredictable weather conditions. The process of choosing thresholds is unpredictable and depends strongly on aspects of background modeling. In systems based on Gaussian modeling, the threshold is uniform and constant for the entire scene. However, in some outdoor scenes each pixel cannot be modeled and updated using the same parametric model.
"There is therefore a need to improve the performance of background subtraction so that it is robust to global and local illumination change, clutter motion, and is able to reliably update the background image/model."
In addition to the background information obtained for this patent application, VerticalNews journalists also obtained the inventors' summary information for this patent application: "A foreground object's motion can occlude edges corresponding to background objects and can induce motion of the edges corresponding to foreground objects. These edges provide strong cues for the detection of foreground objects. Human object perception analyzes object edges to determine the object's contour, location, size and three-dimensional (3D) orientation. Once the object contours are analyzed, inner object edges can be analyzed to assist in understanding the object's structure. Similar to human perception, a surveillance system can prioritize the analysis of an object's edges. As such, the surveillance system can analyze edge information first and then analyze the finer grained appearance inside the object boundaries.
"The proposed approach utilizes techniques for non-linear weighting, edge detection and automatic threshold updating. Non-linear weighting facilitates the discrimination of pixel differences owing to illumination from changes induced by object motion. Edge detection is performed by a modified Laplacian of Gaussian filter, which preserves the strength of edges. Any edge detection algorithm that does not convert a grayscale (or color) image into a binary image but instead preserves the edge strength maybe used in the proposed approach. Such an edge detection algorithm can be used to localize motion in the image. Automatic threshold updating can keep the background current and can react quickly to localized illumination changes in both space and time.
"Disclosed embodiments for moving object detection can include separating true and false moving object detections, edge detection, and the automatic updating of thresholds. In addition to Gaussian smoothing, disclosed embodiments can include automatic non-linear weighting of pixel differences. Non-linear weighting does not depend on the specifics of a surveillance scene. Additionally, non-linear weighting significantly separates pixel differences corresponding to static objects and pixel differences induced by moving objects. Benefits of non-linear weighting include suppressing noise with a large standard of deviation, simplifying the choice of threshold values, and allowing longer periods of time between the updating of pixel values of the background image. A standard technique, based on a Laplacian of Gaussian (LoG) filter to detect edges can be modified such that the strength of edges is preserved in a non-binary image. Non-linear weighting together with the modified edge detection technique can accurately discriminate edges of moving objects from edges induced by illumination change, clutter motion, and video compression artifacts. Further, two thresholds can be used for each edge image of pixel differences to increase the accuracy in moving object detection. These threshold values can be updated automatically. The threshold updating mechanism does not depend on specifics of a surveillance scene, time of day, or weather conditions and is directly controlled by a pixel difference. The edge detection of moving objects allows preservation of pixels corresponding to low illumination change and eliminates pixels corresponding to high illumination change.
BRIEF DESCRIPTION OF THE DRAWINGS
"FIG. 1 is a flow chart illustrating the flow of moving object detection.
"FIG. 2 is a flow chart illustrating a method for moving object detection, according to an embodiment.
"FIGS. 3A and 3B show a non-smoothed and smoothed image, respectively.
"FIG. 4A shows a smoothed background image.
"FIG. 4B shows a smoothed subsequent image of the same scene as in FIG. 4A.
"FIG. 5A illustrates an absolute Altitude Image Difference between the images of FIGS. 4A and 4B.
"FIG. 5B illustrates the application of a universal threshold to the image of FIG. 5A.
"FIGS. 6A and 6B show the image of FIG. 5A smoothed with a Gaussian kernel using two different values for sigma.
"FIG. 7A shows an image produced by multiplying the pixel values of the images of FIGS. 5A and 6A.
"FIG. 7B shows an image produced by convolution of the image of FIG. 7A using a Laplacian of Gaussian kernel.
"FIG. 8 illustrates a convolution using a Laplacian of Gaussian kernel.
"FIGS. 9A and 9B show images produced by the application of a low threshold and a high threshold, respectively, to the image of FIG. 7B.
"FIG. 10A shows an image produced by the combination of FIGS. 9A and 9B.
"FIG. 10B shows an image produced by filling the blobs in the image of FIG. 10A.
"FIGS. 11A and 11B show values of an adaptive Current Low and High Threshold, respectively, for each pixel in an image.
"FIGS. 12A and 12B show values of an adaptive Base Low and High Threshold, respectively, for each pixel in an image.
"FIGS. 13-16 compare the application of the GMM method and a method according to a disclosed embodiment to four sets of images."
For the URL and more information on this patent application, see: Gagvani, Nikhil; Gritai, Alexei. System and Method for Motion Detection in a Surveillance Video. Filed
Keywords for this news article include: Algorithms,
Our reports deliver fact-based news of research and discoveries from around the world. Copyright 2014, NewsRx LLC