# Object tracking using a Kalman filter (MATLAB)

The Kalman filter is useful for tracking different types of moving objects. It was originally invented by Rudolf Kalman at NASA to track the trajectory of spacecraft. At its heart, the Kalman filter is a method of combining noisy (and possibly missing) measurements and predictions of the state of an object to achieve an estimate of its true current state. Kalman filters can be applied to many different types of linear dynamical systems and the “state” here can refer to any measurable quantity, such as an object’s location, velocity, temperature, voltage, or a combination of these.

In a previous article, I showed how face detection can be performed in MATLAB using OpenCV. In this article, I will combine this face detector with a Kalman filter to build a simple face tracker that can track a face in a video.

If you are unfamiliar with Kalman filters, I suggest you read up first on how alpha beta filters work. They are a simplified version of the Kalman filter that are much easier to understand, but still apply many of the core ideas of the Kalman filter.

## Face tracking without a Kalman filter

The OpenCV-based face detector can be applied to every frame to detect the location of the face. Because it may detect multiple faces, we need a method to find the relationship between a detected face in one frame to another face in the next frame — this is a combinatorial problem known as data association. The simplest method is the nearest neighbour approach, and some other methods can be found in this survey paper on object tracking. However, to greatly simplify the problem, the tracker I have implemented is a single face tracker and it assumes there is always a face in the frame. This means that every face that is detected can be assumed to be the same person’s face. If more than one face is detected, only the first face is used. If no faces are detected, a detection error is assumed. The MATLAB code below will detect the face location in a sequence of images and output the bounding box coordinates to a CSV file.

function detect_faces(imgDir, opencvPath, includePath, outputFilename) % Load the required libraries if libisloaded('highgui100'), unloadlibrary highgui100, end if libisloaded('cv100'), unloadlibrary cv100, end if libisloaded('cxcore100'), unloadlibrary cxcore100, end loadlibrary(... fullfile(opencvPath, 'bin\cxcore100.dll'), @proto_cxcore); loadlibrary(... fullfile(opencvPath, 'bin\cv100.dll'), ... fullfile(opencvPath, 'cv\include\cv.h'), ... 'alias', 'cv100', 'includepath', includePath); loadlibrary(... fullfile(opencvPath, 'bin\highgui100.dll'), ... fullfile(opencvPath, 'otherlibs\highgui\highgui.h'), ... 'alias', 'highgui100', 'includepath', includePath); % Load the cascade classifierFilename = 'C:/Program Files/OpenCV/data/haarcascades/haarcascade_frontalface_alt.xml'; cvCascade = calllib('cv100', 'cvLoadHaarClassifierCascade', classifierFilename, libstruct('CvSize',struct('width',int16(100),'height',int16(100)))); % Create memory storage cvStorage = calllib('cxcore100', 'cvCreateMemStorage', 0); % Get the list of images imageFiles = dir(imgDir); detections = struct; h = waitbar(0, 'Performing face detection...'); % progress bar % Open the output CSV file fid = fopen(outputFilename, 'w'); fprintf(fid, 'filename,x1,y1,x2,y2'); for i = 1:numel(imageFiles) if imageFiles(i).isdir; continue; end imageFile = fullfile(imgDir, imageFiles(i).name); % Load the input image cvImage = calllib('highgui100', ... 'cvLoadImage', imageFile, int16(1)); if ~cvImage.Value.nSize error('Image could not be loaded'); end % Perform face detection cvSeq = calllib('cv100', 'cvHaarDetectObjects', cvImage, cvCascade, cvStorage, 1.1, 2, 0, libstruct('CvSize',struct('width',int16(40),'height',int16(40)))); % Save the detected bounding box, if any (and if there's multiple % detections, just use the first one) detections(i).filename = imageFile; if cvSeq.Value.total == 1 cvRect = calllib('cxcore100', ... 'cvGetSeqElem', cvSeq, int16(1)); fprintf(fid, '%s,%d,%d,%d,%d', imageFile, ... cvRect.Value.x, cvRect.Value.y, ... cvRect.Value.x + cvRect.Value.width, ... cvRect.Value.y + cvRect.Value.height); else fprintf(fid, '%s,%d,%d,%d,%d', imageFile, 0, 0, 0, 0); end % Release image calllib('cxcore100', 'cvReleaseImage', cvImage); waitbar(i / numel(imageFiles), h); end % Release resources fclose(fid); close(h); calllib('cxcore100', 'cvReleaseMemStorage', cvStorage); calllib('cv100', 'cvReleaseHaarClassifierCascade', cvCascade); end

We can then run our face detector and generate an output file, `faces.csv`

, like this:

imgDir = 'images'; opencvPath = 'C:\Program Files\OpenCV'; includePath = fullfile(opencvPath, 'cxcore\include'); detect_faces(imgDir, opencvPath, includePath, 'faces.csv');

In the video below, I have run this script on the FGnet Talking Face database (which is free to download) and displayed the bounding boxes overlayed on the image sequence. You can download a copy of the `faces.csv`

file that was used to generate the video from here.

The bounding box roughly follows the face, but its trajectory is quite noisy and the video contains numerous frames where the bounding box disappears because the face was not detected. The Kalman filter can be used to smooth this trajectory and estimate the location of the bounding box when the face detector fails.

## Kalman filtering: The gritty details

The Kalman filter is a recursive two-stage filter. At each iteration, it performs a *predict* step and an *update* step.

The predict step predicts the current location of the moving object based on previous observations. For instance, if an object is moving with constant acceleration, we can predict its current location, , based on its previous location, , using the equations of motion.

The update step takes the measurement of the object’s current location (if available), , and combines this with the predicted current location, , to obtain an *a posteriori* estimated current location of the object, .

The equations that govern the Kalman filter are given below (taken from the Wikipedia article):

- Predict stage:
- Predicted (
*a priori*) state: - Predicted (
*a priori*) estimate covariance:

- Predicted (
- Update stage:
- Innovation or measurement residual:
- Innovation (or residual) covariance:
- Optimal Kalman gain:
- Updated (
*a posteriori*) state estimate: - Updated (
*a posteriori*) estimate covariance:

They can be difficult to understand at first, so let’s first take a look at what each of these variables are used for:

- is the current state vector, as estimated by the Kalman filter, at time .
- is the measurement vector taken at time .
- measures the estimated accuracy of at time .
- describes how the system moves (ideally) from one state to the next, i.e. how one state vector is projected to the next, assuming no noise (e.g. no acceleration)
- defines the mapping from the state vector, , to the measurement vector, .
- and define the Gaussian process and measurement noise, respectively, and characterise the variance of the system.
- and are control-input parameters are only used in systems that have an input; these can be ignored in the case of an object tracker.

Note that in a simple system, the current state and the measurement will contain the same set of state variables (only will be a filtered version of ) and will be an identity matrix, but many real-world systems will include latent variables that are not directly measured. For example, if we are tracking the location of a car, we may be able to directly measure its location from a GPS device and its velocity from the speedometer, but not its acceleration.

In the predict stage, the state of the system and its error covariance are transitioned using the defined transition matrix , and can be implemented in MATLAB as:

function [x,P] = kalman_predict(x,P,F,Q) x = F*x; %predicted state P = F*P*F' + Q; %predicted estimate covariance end

In the update stage, we first calculate the difference between our predicted and measured states. We then calculate the Kalman gain matrix, , which is used to weight between our predicted and measured states and is adjusted based on a ratio of prediction error to measurement noise .

Finally, the state vector and its error covariance are then updated with the measured state. It can be implemented in MATLAB as:

function [x,P] = kalman_update(x,P,z,H,R) y = z - H*x; %measurement error/innovation S = H*P*H' + R; %measurement error/innovation covariance K = P*H'*inv(S); %optimal Kalman gain x = x + K*y; %updated state estimate P = (eye(size(x,1)) - K*H)*P; %updated estimate covariance end

Both the stages only update two variables: , the state variable, and , the prediction error covariance variable.

The two stages of the filter correspond to the state-space model typically used to model linear dynamical systems. The first stage solves the *process equation*:

The process noise is additive Gaussian white noise (AWGN)with zero mean and covariance defined by:

The second one is the *measurement equation*:

The measurement noise is also AGWN with zero mean and covariance defined by:

## Defining the system

In order to implement a Kalman filter, we have to define several variables that model the system. We have to choose the variables contained by and , and also choose suitable values for , , and , as well as an initial value for .

We will define our measurement vector as:

where and are the upper-left and lower-right corners of the bounding box around the detected face, respectively. This is simply the output from the Viola and Jones face detector.

A logical choice for our state vector is:

where and are the first-order derivatives. Other vectors are also possible; for example, some papers introduce a “scale” variable, which assumes that the bounding box maintains a fixed aspect ratio.

The transition matrix defines the equations used to transition from one state vector to the next vector (without taking into account any measurements, ). It is plugged in to the process equation:

Let’s look at some basic equations describing motion:

We could express this system using the following recurrence:

These same equations can also be used to model the variables and their derivatives. Referring back to the process equation, we can thus model this system as:

The process noise matrix measures the variability of the input signal away from the “ideal” transitions defined in the transition matrix. Larger values in this matrix mean that the input signal has greater variance and the filter needs to be more adaptable. Smaller values result in a smoother output, but the filter is not as adaptable to large changes. This can be a little difficult to define, and may require some fine tuning. Based on our definition of the measurement noise above, our process noise matrix is defined as:

where and .

The measurement matrix maps between our measurement vector and state vector . It is plugged in to the measurement equation:

The variables and are mapped directly from to , whereas the derivative variables are latent (hidden) variables and so are not directly measured and are not included in the mapping. This gives us the measurement matrix:

The matrix defines the error of the measuring device. For a physical instrument such as a speedometer or voltmeter, the measurement accuracy may be defined by the manufacturer. In the case of a face detector, we can determine the accuracy empirically. For instance, we may find that our Viola and Jones face detector detects faces to within 10 pixels of the actual face location 95% of the time. If we assume this error is Gaussian-distributed (which is a requirement of the Kalman filter), this gives us a variance of 6.5 pixels for each of the coordinates, so the measurement noise vector is then given by:

The errors are independent, so our covariance matrix is given by:

Decreasing the values in means we are optimistically assuming our measurements are more accurate, so the filter performs less smoothing and the predicted signal will follow the observed signal more closely. Conversely, increasing means we have less confidence in the accuracy of the measurements, so more smoothing is performed.

The estimate covariance matrix is a measure of the estimated accuracy of at time . It is adjusted over time by the filter, so we only need to supply a reasonable initial value. If we know for certain the exact state variable at start-up, then we can initialise to a matrix of all zeros. Otherwise, it should be initialised as a diagonal matrix with a large value along the diagonal:

where . The filter will then prefer the information from the first few measurements over the information already in the model.

## Implementing the face tracker

The following script implements the system we have defined above. It loads the face detection results from CSV file, performs the Kalman filtering, and displays the detected bounding boxes.

% read in the detected face locations fid = fopen('detect_faces.csv'); fgetl(fid); %ignore the header detections = textscan(fid, '%[^,] %d %d %d %d', 'delimiter', ','); fclose(fid); % define the filter x = [ 0; 0; 0; 0; 0; 0 ]; F = [ 1 0 0 0 1 0 ; ... 0 1 0 0 0 1 ; ... 0 0 1 0 1 0 ; ... 0 0 0 1 0 1 ; ... 0 0 0 0 1 0 ; ... 0 0 0 0 0 1 ]; Q = [ 1/4 0 0 0 1/2 0 ; ... 0 1/4 0 0 0 1/2 ; ... 0 0 1/4 0 1/2 0 ; ... 0 0 0 1/4 0 1/2 ; ... 1/2 0 1/2 0 1 0 ; ... 0 1/2 0 1/2 0 1 ] * 1e-2; H = [ 1 0 0 0 0 0 ; ... 0 1 0 0 0 0 ; ... 0 0 1 0 0 0 ; ... 0 0 0 1 0 0 ]; R = eye(4) * 42.25; P = eye(6) * 1e4; nsamps = numel(detections{1}); for n = 1:nsamps % read the next detected face location meas_x1 = detections{2}(n); meas_x2 = detections{4}(n); meas_y1 = detections{3}(n); meas_y2 = detections{5}(n); z = double([meas_x1; meas_x2; meas_y1; meas_y2]); % step 1: predict [x,P] = kalman_predict(x,P,F,Q); % step 2: update (if measurement exists) if all(z > 0) [x,P] = kalman_update(x,P,z,H,R); end % draw a bounding box around the detected face img = imread(detections{1}{n}); imshow(img); est_z = H*x; est_x1 = est_z(1); est_x2 = est_z(2); est_y1 = est_z(3); est_y2 = est_z(4); if all(est_z > 0) && est_x2 > est_x1 && est_y2 > est_y1 rectangle('Position', [est_x1 est_y1 est_x2-est_x1 est_y2-est_y1], 'EdgeColor', 'g', 'LineWidth', 3); end drawnow; end

The results of running this script are shown in the following video:

Clearly we can see that this video has a much smoother and more accurate bounding box around the face than the unfiltered version shown previously, and the video no longer has frames with missing detections.

## Closing remarks

In the future, I aim to write an article on the extended Kalman filter (EKF) and unscented Kalman filter (UKF) (and the similar particle filter). These are both non-linear versions of the Kalman filter. Although face trackers are usually implemented using the linear Kalman filter, the non-linear versions have some other interesting applications in image and signal processing.

### Trackbacks & Pingbacks

- How to Detect and Track Objects Using Matlab - Into Robotics
- Kalman filter in computer vision: the choice of Q and R noise covariancesCopyQuery CopyQuery | Question & Answer Tool for your Technical Queries,CopyQuery, ejjuit, query, copyquery, copyquery.com, android doubt, ios question, sql query, sqlite query,
- Kalman filter in computer vision: the choice of Q and R noise covariances | Stackforum.com
- Kalman filter in computer vision: the choice of Q and R noise covariances | Zaren Answers

Learned about kalman filter from your blog… which is new to me.

hi alister,

Awesome work. I was wondering if all those can be applied in real time, i.e., plug in a USB camera and apply the haar classifier version and Kalman filtered version face detection in real time?

If you think so, do you have any suggestion on how to implement that?

Thanks!

Xi

Yes – Haar-like object detectors are very efficient and are well suited to real-time face detection. I’d recommend you implement it in pure C++ using OpenCV rather than mixing MATLAB code as I have done. OpenCV contains an easy to use Kalman filter (see cvKalmanCorrect and cvKalmanPredict) and some example code of how to stream video from a webcam.

Hello,

I am new in kalman filter development, so I apologize in advance for my questions.

To develop a multi-tracking system using kalman filter, Do I need to have one kalman filter for each object? \

I would need a data association step to relate each detection to the right track. And maybe I have to define different motion models to track objects of different type (for example: pedestrians, trucks, bikes, …). Am I right?

Thanks in advance.

Yes, that’s correct. For the data association step, a simple approach is to associate the objects between frames that have the minimal Euclidean distance, with thresholds for detecting new objects entering the scene or old objects leaving the scene. For more robust data association, particularly in more complex scenarios where objects may occasionally occlude each other, take a look at the Multiple Hypothesis Tracker – I plan to write an article on this in the future.

hello,

im a student and really new to kalman filter.

can you expakin to me about “associate the objects between frames that have the minimal Euclidean distance”??

it’s hard enough for me to understand.

thanks in advance.

For each video frame, we are detecting the locations of all of the faces in the image. This means that to track individual faces, we need a method to determine whether face X detected in frame 1 is the same as face Y detected in frame 2. A very simple way to solve this problem is to say that if face X is within a certain distance of face Y, we assume they are the same face. Otherwise, we assume they are two different faces. This Wikipedia article gives a good description of the technique (it’s about radar tracking, but the concepts are the same as in face tracking).

do we need that “minimal Euclidean distance” for tracking multi-person for counting people?

thanks,

very cool, thanks for sharing, I’m also working on an optimized code for tracking 3D objects in noisy environment. I’ll share it here as soon as I publish it.

Glad you found it helpful, your project sounds interesting!

hi i’m working on a project extraction of pulmonary fissures using kalman filters want to know is it possible to detect it using this filter?

will it be efficient?

is it possible to do with kalman filters? if means how to do it?

thanks in advance…..

Sorry, I don’t have any experience with medical imaging so can’t really advise whether the Kalman filter is appropriate for your project.

Hi there.

Excellent example, thanks. I’ve download and tested it a bit, seems to work. One thing is a bit odd to me- F is not used in kalman_update function, although it is one of the inputs. Can you explain why?

Another thing- is there any reason to calculate [x1,y1,x2,y2] simultaneously? I’ve implemented a single state Kalman filter (a simplified case f what you’ve described here, also described at Wikipedia), and used it four times. This way I can use it for any inter frame translation operators vector such as [x,y,Scale,Rotation, Occlusion ratio, etc…) without changing the Kalman filter model. Will be happy to hear your opinion on that.

Best regards,

Nikolay.

Of course you’re correct, the F parameter is not used and can be removed from the kalman_update function – that’s a typo.

To your second question, if your state variables are independent, they don’t need to be included in the same Kalman filter. Using multiple filters is perfectly reasonable (and should generate the same results).

Hi,

Thanks a lot for your quick tutorial, it is very helpful.

One question ..

For Q, you have …”this gives us a variance of 6.5 pixels”

then in the covariance matrix calculation, the 6.5 is squared again.

I thought the variance is already squared, am I missing something ???

Thanks,

Tien

Hi alister, I wanted to know how your code can be extended to work with continuous streaming of video instead of the saved images?If it is not possible to work with continuous video,then could you explain the process of framing the videos into images since I am unable to work with other videos apart from the sample data files you have provided. Thank you and keep up the wonderful work.

Take a look at this example of using OpenCV to capture live video from a camera. The

`cvQueryFrame`

function used in the example returns`IplImage`

objects, so you should be able to just substitute the`cvLoadImage`

function in my code with this function.I installed OpenCv1.1pre version,the one which you have written the code in. I have VS2008 and 2010 Express. In the initial step it returns error ??? Error using ==> loadlibrary at 279 Microsoft Visual C++ 2005 or 2008 is required to use this feature.

Error in ==> opencv_test_prototype at 4

loadlibrary(…

Now,I do not know if we have to extract opencv.sln via VS since I found in wiki that we have to install however,the steps mentioned there are incompatible with Vs2010 and the earlier version of opencv1.1. Please let me know how to resolve the issue.Thank you

hi

I need particle filter Matlb code

can you help me?

Do we have to give the coordinates of face for each and every frame? And till what value the loop has to run?

i have a problem in the exécution at th kalman filter matlab’s.he said that:??? Error using ==> imread at 315

File “images\franck_00000.jpg” does not exist.

Error in ==> kf at 47

img = imread(detections{1}{n});

can you help me please?!!

Hi Alister,

you wrote fantastic article, which was very useful for my master thesis. I’d like to quote you, but I couldn’t find your full name, can you post it ot me please?

Thank you very much for the article and for your reply as well.

Ales

It was very useful for me,thank you a lot,but I have a little question and this is about the way you have reached the variance of 6.5 for each coordinates,would you please explain it a little more?

Thank you

Hi ramtin. You can derive it from the properties of normal distribution (http://en.wikipedia.org/wiki/68-95-99.7_rule). Basically around 95% percent of estimated face corners positions will lay within the distance of 2 standard deviations from real position. We get the exact value of multipier from the table (http://en.wikipedia.org/wiki/Standard_deviation#Rules_for_normally_distributed_data). Assuming standard deviation of 5 pixels (the corner can be moved 5 pixels in each direction = 10px all together) we get:

1.959963984540 * std. deviation = 5

std. deviation = 2.55106728

variances = std. deviation ^2 = 6.50794429 =~ 6.5

hello Alister,

I’m really very new to kalman filter.Can you help me please to get a very special part of this artical in opencv, the part is the detection of a face tracking object .

detections = textscan(fid, ‘%[^,] %d %d %d %d’, ‘delimiter’, ‘,’);

and

nsamps = numel(detections{1});

for n = 1:nsamps

% read the next detected face location

meas_x1 = detections{2}(n);

meas_x2 = detections{4}(n);

meas_y1 = detections{3}(n);

meas_y2 = detections{5}(n);

thank you.

Hello Alister,

Thank you for this interesting artical, but i have some problem:

Can you tell me please how you are calculated the coordinates of the measure (images.csv) file because it is the major problem of mine.

Best regards,

hb2012

Good tutorial. But there is a slight problem. I was implementing this filter but due to some reason the K matrix elements were becoming Inf in Matlab and then NaN. Also the resulting tracking was worse than when face detection was applied straight away to each frame.

other than kalman filter is their any other filters for object detection

hi ramaraju,

there are other filters too for object detection, like particle filter

Hi, I’ve followed your code to follow pedestrians, kalman works well except for cases where the object tracked changed its speed suddenly!, so how can I add more speed to the model, or fix the acceleration value?

BR

Esteban

sir my self rakesh mandal, student of nit rourkela, recently i work in robust kalman filtering for time delayed system, now i want to work some real time application of kalman filter, my guide show these blog, i am trying to run your program, but error shows ??? Input argument “opencvPath” is undefined.

Error in ==> detect_faces at 9

loadlibrary(…

how i run these program, plz send me the exact code.

very good guide how Kalman Filter can be used together with Matlab. This is one of the reason that I choose to add this guide to my article http://www.intorobotics.com/how-to-detect-and-track-objects-using-matlab/

Hi, can you explain Q choise (and its coefficient/acceleration a), please? Thanks

*choice

Why a=0.1 ?

Every weekend i used to visit this webb site, because i

wish for enjoyment, as this this web page conations really good funny information too.

Hi alister,, good tutorial. At last, I learnt new thing about kalman filter. I have question, I am planning to combine this kalman filter with my project HOG human detection. Do you think it is possible? and I wanna ask about the parameter R, here you state that R = eye(4) * 42.25. Is it always constant for different objects, for example human body? or this value is only for face part?

Hi alister,

I want to know how you calculate matrix Q? I really don’t understand it.

Hello sir, thanks for the posts, I just found this blog today and I think you have the information am searching for regarding my dissertation. am working of kalman filter for facial tracking. I really don’t have much idea about kalman filter and would be happy if you can put me through in it. really need your guide on how to go about. however, I wish to implement it with MATLAB. I will be very glad if you can get back to me.

Hello,

Thanks a lot for this wonderful report and implementation.

I was wondering if you could explain a bit more the choice of the matrix {\mathbf{Q}}. I do not see the relation between the motion equations expressed before. For instance, the first coefficient of the state is computed as: {x_{1,t+1} = x_{1,t} + dx_t + d^2x_t/2 * T}

It looks like there is some unity problem here, since you are adding position and speed together.

Besides this, once you have the expression of the state x_{1,t+1}, how do you obtain the matrix Q? You argue “Based on our definition of the measurement noise {\mathbf{v}_{t}} above, our process noise matrix is defined as..”. However, what is the relation between the measurement noise and the process matrix? Which definition do you refer to? I feel like you should base this on the definition of the process noise {\mathbf{w}_{t}} instead.

Thanks in advance,

Lucas