
Real-time Emotion Detection System with PyTorch and OpenCV


I built a real-time emotion detection system using PyTorch and OpenCV. Today, we're going to test it on two different players reacting to the Novak Djokovic visa ban in Australia. If you want to learn how to build it from scratch, stick with me until the end of this tutorial.

In this tutorial, you will learn how to use the trained model from the previous tutorial to analyze, in video files, the constantly fluid state of human emotions as players react to the Novak Djokovic visa ban in Australia.

This tutorial is the second in a series on building a Real-Time Emotion Detection system with PyTorch and OpenCV.

  1. Training an Emotion Detection System from Scratch using PyTorch (previous tutorial)
  2. Real-time Emotion Detection using PyTorch and OpenCV (this tutorial)

Let’s now configure our environment.

Configuring your Development Environment

To follow this tutorial, you'll need PyTorch, OpenCV, scikit-learn, and a few supporting libraries installed on your system or in a virtual environment.

You don't have to worry too much about installing OpenCV and scikit-learn, as I've covered them in this tutorial here (mostly for Linux users). You can simply pip install them, and you're ready to go.

However, if you are configuring your development environment for PyTorch specifically, I recommend you follow the installation guide on their website. Trust me when I say it's easy to use, and you'll be up and running in no time.

The installation procedure of PyTorch on your virtual environment

All that’s required of you is to select your preferences and run the install command inside of your virtual environment, via the terminal.

Project Structure

Before we get started implementing our Python script for this tutorial, let’s first review our project directory structure:

We have two Python scripts to review today:

  1. The first contains an additional method to resize our video frames.
  2. The second loads our trained model from disk, makes predictions on the facial expressions in a video, and displays the results on our screen.

Let’s have a look inside these scripts:

This script contains a ready-made resize function that resizes each video frame while preserving the aspect ratio of the image. You can learn more about this code snippet in the video/blog tutorial I've already prepared for you. Moving on!
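As a rough illustration of the idea behind an aspect-ratio-preserving resize, here is a minimal sketch of the dimension calculation. The function name `scaled_size` and its parameters are hypothetical and not taken from the original script; the actual helper may differ.

```python
def scaled_size(height, width, target_width=None, target_height=None):
    """Return (new_width, new_height) that keeps the original aspect ratio.

    Hypothetical helper: only one of target_width / target_height is used,
    and target_width takes priority when both are given.
    """
    if target_width is None and target_height is None:
        # Nothing requested, so keep the original size.
        return width, height
    if target_width is not None:
        ratio = target_width / float(width)
        return target_width, int(round(height * ratio))
    ratio = target_height / float(height)
    return int(round(width * ratio)), target_height


# A 1280x720 frame scaled down to a width of 640 pixels.
new_size = scaled_size(720, 1280, target_width=640)
print(new_size)
```

The resulting `(width, height)` pair is in the order that `cv2.resize(image, new_size, interpolation=cv2.INTER_AREA)` expects for its size argument.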

Implementing the Real-Time Emotion Detection System Script

We begin by importing our required Python packages (Lines 2 – 13). From there, we parse our command-line arguments. Our script requires 5 arguments:

  • --video: The path to the video file we want to run the detection upon.
  • --model: The path to the trained model we’ve seen in the previous tutorial.
  • --prototxt: The path to the deployed prototxt.txt model architecture file.
  • --caffemodel: The path to the Caffe model containing the model’s weights.
  • --confidence: The minimum probability used to filter out weak face detections.
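The five arguments above could be declared with `argparse` roughly as follows. This is a sketch only: the help strings, the `0.5` default for `--confidence`, and the example file names are assumptions, not values from the original script.

```python
import argparse


def build_parser():
    """Build a parser for the five arguments described above."""
    parser = argparse.ArgumentParser(
        description="Real-time emotion detection on a video file")
    parser.add_argument("--video", type=str, required=True,
                        help="path to the input video file")
    parser.add_argument("--model", type=str, required=True,
                        help="path to the trained emotion model")
    parser.add_argument("--prototxt", type=str, required=True,
                        help="path to the Caffe prototxt architecture file")
    parser.add_argument("--caffemodel", type=str, required=True,
                        help="path to the Caffe model weights")
    # The default threshold below is an assumption for illustration.
    parser.add_argument("--confidence", type=float, default=0.5,
                        help="minimum probability to keep a face detection")
    return parser


# Placeholder paths, parsed from a list instead of the real command line.
args = vars(build_parser().parse_args([
    "--video", "clip.mp4",
    "--model", "model.pt",
    "--prototxt", "deploy.prototxt.txt",
    "--caffemodel", "weights.caffemodel",
]))
print(args)
```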

After we have our arguments defined (Lines 15 – 27), let's move on to loading our face detection model and emotion detection model, and configuring our video stream so our trained model can make real-time predictions for us:

On Line 31, we load the face detection model using the --prototxt and --caffemodel files, and Line 34 checks which device is available for running our model, either the CPU or the GPU.

On Lines 37 – 38, we initialize the mapping of the different target labels in a Python dictionary.

Then we load the model weights into the EmotionNet we created in the previous tutorial, call to(device) to move the model to either our CPU or GPU, and set the model to evaluation mode (Lines 41 – 45).
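The device selection and weight loading described above might look like the sketch below. `TinyEmotionNet` is a hypothetical stand-in for the EmotionNet class from the previous tutorial, the choice of 7 classes is an assumption, and an in-memory buffer takes the place of the file passed via `--model`. It also assumes the checkpoint stores a `state_dict`, which may not match how the original model was saved.

```python
import io

import torch
import torch.nn as nn


class TinyEmotionNet(nn.Module):
    """Hypothetical stand-in for the EmotionNet from the previous tutorial."""

    def __init__(self, num_classes=7):  # 7 classes is an assumption
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.classifier(x)


# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Simulate a checkpoint on disk with an in-memory buffer.
buffer = io.BytesIO()
torch.save(TinyEmotionNet().state_dict(), buffer)
buffer.seek(0)

# Load the weights, move the model to the device, and switch to eval mode.
model = TinyEmotionNet()
model.load_state_dict(torch.load(buffer, map_location=device))
model.to(device)
model.eval()
```

Calling `eval()` matters at inference time because it disables training-only behavior such as dropout and batch-norm statistic updates.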

To preprocess each detected face before feeding it to the model, we create a Compose instance from the torchvision.transforms module on Lines 48 – 53. It takes the detected face, converts it into a PIL image, converts it to grayscale, resizes it, and then turns it into a tensor.

To begin, we initialize our video stream, which takes the path to the video file as input (Line 56), grab the next frame from the VideoCapture (Line 62), and check whether there is a frame to be read (Lines 65 – 66). We then resize the original video frame to 1500×1500 pixels (Line 69) and make a copy of the image (Line 70) before converting the color channels from BGR to RGB (Line 71).

On Line 74, we instantiate a 300×300 canvas filled with zeros of type uint8, where the output probabilities will be displayed to help us understand the emotional state of the players.

On Lines 77 – 78, we grab the image's width and height and convert the current frame into a blob using the cv2.dnn module.

Next, we set the generated blob as the input to the loaded neural network (Line 81) and then apply OpenCV's deep learning-based face detector to find the faces in the input image (Line 82).

After we've detected and localized the faces, we iterate through the detections and grab each confidence value to filter out weak detections below the minimum confidence (Lines 85 – 92). Then we extract only the regions of interest (the faces) by multiplying the normalized bounding-box coordinates by the frame's width and height, which scales them back to pixel coordinates for the steps that follow (Lines 95 – 96).
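The filtering and scaling logic can be sketched with NumPy alone. The function name `extract_face_boxes` is hypothetical, and the synthetic array below only mimics the (1, 1, N, 7) output layout of OpenCV's SSD-style face detector, where index 2 holds the confidence and indices 3 to 6 hold normalized corner coordinates.

```python
import numpy as np


def extract_face_boxes(detections, frame_w, frame_h, min_confidence=0.5):
    """Return (start_x, start_y, end_x, end_y) boxes above the threshold."""
    boxes = []
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence < min_confidence:
            continue  # skip weak detections
        # Scale normalized coordinates back to pixel coordinates.
        scale = np.array([frame_w, frame_h, frame_w, frame_h])
        box = (detections[0, 0, i, 3:7] * scale).astype("int")
        start_x, start_y, end_x, end_y = (int(v) for v in box)
        # Clamp the box so it stays inside the frame.
        start_x, start_y = max(0, start_x), max(0, start_y)
        end_x, end_y = min(frame_w - 1, end_x), min(frame_h - 1, end_y)
        boxes.append((start_x, start_y, end_x, end_y))
    return boxes


# Two synthetic detections: one strong (0.9) and one weak (0.2).
detections = np.zeros((1, 1, 2, 7), dtype=np.float32)
detections[0, 0, 0] = [0, 1, 0.9, 0.25, 0.25, 0.5, 0.5]
detections[0, 0, 1] = [0, 1, 0.2, 0.1, 0.1, 0.3, 0.3]

boxes = extract_face_boxes(detections, frame_w=400, frame_h=200)
print(boxes)
```

Each surviving box can then be used to crop the face with array slicing, for example `frame[start_y:end_y, start_x:end_x]`.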

Now it’s time to activate our emotion detection algorithm for inference:

Lines 101 – 104 crop out the region of interest (the face), apply the preprocessing steps to the image, add a batch dimension, (C, H, W) => (N, C, H, W), and send the image to the available device.

Then, Lines 109 – 112 run the model to get its predictions and convert them into probabilities.
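The batch-dimension and probability steps could be sketched as below. The small linear model is a hypothetical stand-in for the trained EmotionNet, the 48×48 input and 7 output classes are assumptions, and using softmax to convert the outputs into probabilities is the usual choice rather than something confirmed by the original listing.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the trained emotion model (7 assumed classes).
model = nn.Sequential(nn.Flatten(), nn.Linear(48 * 48, 7)).to(device)
model.eval()

# Random tensor standing in for a preprocessed face of shape (C, H, W).
face = torch.rand(1, 48, 48)

# Add the batch dimension: (C, H, W) => (N, C, H, W), then move to device.
batch = face.unsqueeze(0).to(device)

# Disable gradient tracking for inference and turn logits into probabilities.
with torch.no_grad():
    logits = model(batch)
    probabilities = F.softmax(logits, dim=1)

# Most likely class and its probability.
top_prob, top_class = torch.max(probabilities, dim=1)
print(batch.shape, top_class.item(), top_prob.item())
```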

Now it’s time to get the probabilities of different emotional states of the top tennis players.

To understand human reactions to any given situation, we can't just label or interpret someone's feelings based on a single output because, as I mentioned in the previous tutorial, human emotions are in a constant fluid state. Depending on the situation, we can experience a mixture of emotions.

So rather than assign a single label to a given frame, it's much better to represent the results as probabilities by displaying a bar chart of the fixed emotions we want to detect along with their probability values, which can later be studied as a research task (Lines 118 – 134).

To round off the script, we display both the original video frame and the canvas containing the probability values. We then wait for the "q" key to be pressed on the keyboard to terminate the video analysis, close any open windows, and release the video pointer (Lines 137 – 147).

Real-Time Emotion Detection Results

Now that the script is implemented, it's time to run it. So, fire up your terminal, and execute the following command:

The output we’ll get should match the short video you can see in this video clip.


In this tutorial, you learned how to use the trained model from the previous tutorial to perform inference on real-time video streams using PyTorch and OpenCV.

What’s Next?

In the following tutorial, we will explore more of the OpenCV library's functionality. Until then, share and like the video above, comment, and subscribe.

