Howdy folks, welcome to another awesome tutorial! In this tutorial, you will build a real-time face detection JavaScript web app that can access your webcam and identify your facial expressions, face landmarks, and much more, right inside the browser! We are going to make this possible using just JavaScript, HTML5 (canvas), and face-api.js. Sounds amazing, right?

The finished product will look like this:

finished version

You can download the completed project from GitHub - Face Detection JS

Or you can get the corresponding final codes at the end of each section too...

Prerequisites

I assume you are already familiar with basic JavaScript (ES6). You should also have a decent understanding of HTML5 Canvas and CSS. I am not going to explain the CSS code, because there will be only a few lines of it. But if you don't understand the CSS, please comment below. I am always here to help you 😇. Some knowledge of JavaScript Promises is also expected.

Besides the things mentioned above, you will need the following:

  • A modern, up-to-date web browser (such as Firefox or Chrome).
  • A webcam (HD if possible) to capture the live video stream. Your laptop's built-in camera is fine too.
  • A clear surrounding with enough light, since our model can only detect faces it can see well.
  • Finally, a text editor.

I am going to use Visual Studio Code as my text editor. You are free to use whatever code editor you like. Finally, make sure you have the Live Server extension by Ritwick Dey installed in Visual Studio Code. If you don't have Live Server installed, then:

  • Open Visual Studio Code
  • Go to the Extensions tab (Ctrl + Shift + X)
  • Search for "Live Server" by Ritwick Dey
  • Then install it

Now let's understand what this face-api.js is all about...

face-api.js

face-api.js is a JavaScript face recognition module/API for the browser, built on top of the tensorflow.js core. You can use it with Node.js too. This is the module that is going to help us detect the faces in our webcam video stream. face-api.js ships with several pre-trained face detection models.

Now, if you are a beginner in machine learning, you may ask: what is a model?

You can think of a model as a file/algorithm that has been trained to recognize certain types of patterns in data.

As mentioned earlier, there are several models available with face-api.js, including face detection, face landmark detection, facial expression recognition, and more. You can view all the available pre-trained models here. Each model solves a particular problem: you can use one model to detect face landmarks, another to detect facial expressions, and so on.

I hope you now have a decent understanding of the face-api.js module and the models available.

In short, we are going to use the face-api.js (minified version) and all the needed models to build this project.

But wait, what will our project look like and how will it function?

Planning our Project

Visually our project will look like this:

breakdown of the finished product

Here we only have two HTML elements:

  • A video element with the id #video. This is where we will display the live webcam video stream.
  • Then a canvas with the id #canvas. We use this #canvas to draw the prediction results on top of the face, as you can see in the picture.

That's the breakdown of the HTML part. Now let's look at how we are going to make it work:

  • First, we will download the face-api.js module and all the pre-trained models.
  • Then, we will put them in our working directory.
  • Then, from within our javascript file, we will load all the needed models.
  • After that, we will capture the live webcam's stream and display it on the #video element.
  • Finally, we use face-api.js to make predictions and draw the bounding box and other details on top of the face.

Now let's get started...

Getting Started

Create a folder structure similar to below:

folder

  • First, create the root folder that holds everything and name it "Face Detection JS" or anything you like.
  • Then open this folder inside Visual Studio Code.
  • Then, directly inside this root folder, create the following files:
    • index.html - this is going to hold all our HTML code.
    • styles.css - this will hold all of our CSS styles.
    • script.js - this is the most important file. It will hold all of our JavaScript logic.
  • Then download this models folder and face-api.min.js file. Unzip each of them, then put both directly inside our root folder. The models folder contains all the pre-trained models, and face-api.min.js is the minified version of the face-api.js module.

As you can see, inside the models folder, each model's .json manifest file sits alongside its corresponding weight files (shard files). You should keep them together like this too.

That's it; you should now have a folder structure similar to the one above.
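For reference, the root folder ends up looking something like this (the exact file names inside models/ depend on the face-api.js release you downloaded):

Face Detection JS/          (root folder)
  models/                   (pre-trained model .json manifests + their shard/weight files)
  face-api.min.js           (minified face-api.js module)
  index.html
  styles.css
  script.js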

Now the HTML part.

HTML

Open "index.html" and type the following:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <!-- Link to our CSS stylesheet -->
    <link rel="stylesheet" type="text/css" href="./styles.css">
    <title>Face Detection JS</title>
  </head>
  <body>

    <!-- #video and #canvas will go here -->

    
    <!-- IMPORTANT Links -->
    <!-- Link to the face-api script file -->
    <script defer src="./face-api.min.js"></script>
    <!-- Link to our script.js file -->
    <script defer src="script.js"></script>
  </body>
</html>

Besides the normal HTML boilerplate, we have included the following:

  • First, we linked the styles.css file.
  • Then at the bottom, we linked the face-api.min.js file we downloaded. Remember, this is the most important file here. The defer attribute tells the browser to fetch the script in parallel but only execute it after the whole HTML document has been parsed, so every DOM element exists by the time the script runs (see the note after this list).
  • Then below that, we linked our script.js file. It also has the defer attribute.
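As a side note (you don't need to type this), if we didn't use defer, we could get a similar effect by waiting for the DOMContentLoaded event inside script.js before touching the DOM. A minimal sketch of that idea:

// without defer, wait until the DOM has been fully parsed before grabbing elements
document.addEventListener('DOMContentLoaded', function() {
  const video = document.getElementById('video');
  // ...the rest of the setup would go here
});

We will stick with defer in this tutorial, since it is simpler and keeps our script.js code flat.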

Now we need a #video element to hold our live webcam stream and a dummy #canvas which we can use to draw the bounding box and other details.

Below this comment <!-- #video and #canvas will go here -->, type the following:

<!-- place to put the live webcam video stream -->
    <video id="video" width="600" height="450" autoplay="true"></video>

    <!-- later we will use this #canvas to draw bounding box around our face -->
    <canvas id="canvas"></canvas>

  • The #video element should have a definite width and height. We also set autoplay to true, which ensures the video starts playing automatically as soon as we hand it the live webcam stream.
  • Then we have the dummy #canvas.

That's it, now the final code for the index.html file should look like the following:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <!-- Link to our CSS stylesheet -->
    <link rel="stylesheet" type="text/css" href="./styles.css">
    <title>Face Detection JS</title>
  </head>
  <body>

    <!-- #video and #canvas will go here -->
    <!-- place to put the live webcam video stream -->
    <video id="video" width="600" height="450" autoplay="true"></video>

    <!-- later we will use this #canvas to draw bounding box around our face -->
    <canvas id="canvas"></canvas>

    
    <!-- IMPORTANT Links -->
    <!-- Link to the face-api script file -->
    <script defer src="./face-api.min.js"></script>
    <!-- Link to our script.js file -->
    <script defer src="script.js"></script>
  </body>
</html>

Now open it with Live Server by right-clicking the "index.html" file (inside Visual Studio Code), scrolling down, and clicking the "Open with Live Server" option, then take a look at it in the browser.

You won't see anything yet, because we only linked the "styles.css" file but didn't write any CSS styles. So let's do that now and add some styles...

CSS

You can either copy or type these styles, since CSS is not our main focus here.

Open the "styles.css" file you created and type or copy the following:

/* * selects everything */
* {
  padding: 0;
  margin: 0;
}

/* style for body */
body {
  padding: 0;
  margin: 0;
  width: 100vw; /* 100% view-port width */
  height: 100vh; /* 100% view-port height */
  display: flex;
  justify-content: center; /* centers horizontally */
  align-items: center; /* centers vertically */
}

/* #video */
/* we use hash # to select elements based on their id  */
/* initially give a light grey background */
#video {
  background-color: #444;
}

/* #canvas */
#canvas {
  position: absolute; /* take the canvas out of normal document flow so it can sit on top of the #video */
}

This is pretty self-explanatory. Try to decode it on your own. If you get stuck then please comment below.

At this stage, you should see something like this in the browser:

a placeholder grey box in place of where we will put our live webcam stream

Now it's time to make our app actually work. So let's write the JavaScript logic.

Javascript

The first thing, as always, is to grab references to the necessary HTML elements from the DOM. The only element we need for now is #video.

Open the "script.js" file and type the following:

// 1). grab needed elements' reference
// #video element
const video = document.getElementById('video');

Now let's load all the needed models below the above code:

// 2). load the models
async function loadModels() {
  /* ssdMobilenetv1 - for face detection */
  await faceapi.nets.ssdMobilenetv1.loadFromUri('./models');
  /* faceExpressionNet - to detect the expressions on face */
  await faceapi.nets.faceExpressionNet.loadFromUri('./models');
  /* faceLandmark68Net - to detect face landmarks */
  await faceapi.nets.faceLandmark68Net.loadFromUri('./models');

  // start the live webcam video stream
  startVideoStream();
}

// 3). startVideoStream() function definition

// activate the loadModels() function
loadModels()

We load all the needed models through faceapi.nets. We can use the faceapi variable straight out of the box because we linked the face-api.min.js file in index.html, which makes faceapi available globally.

  • The models we need are the following:
    • ssdMobilenetv1 - this model detects the faces. SSD (Single Shot Multibox Detector) MobileNet V1 is a model based on MobileNet V1 that aims for high accuracy in detecting face bounding boxes. It computes the location of each face in an image and returns the bounding boxes together with a probability for each detected face.
    • faceExpressionNet - this detects the expressions on our faces. The face expression recognition model is lightweight, fast, and provides reasonable accuracy.
    • faceLandmark68Net - this detects face landmarks. It is also lightweight, fast, and accurate.
  • The async and await keywords are used here because loading a model from a file takes some time, and we don't want to block the rest of the page while that happens. Marking loadModels() as async lets us await each loadFromUri() call, which pauses the function until the model has loaded without blocking the main thread (see the optional sketch after this list).
  • After loading the models, we call a function named startVideoStream(). This function will be responsible for displaying the webcam video stream in the browser. We only called it but haven't created it yet. So let's do that.
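As an optional variation (not required for this tutorial), the three loadFromUri() calls above run one after another. Since the models are independent of each other, you could also load them in parallel with Promise.all; a quick sketch:

// 2). load the models (parallel variation - just a sketch)
async function loadModels() {
  // kick off all three downloads at once and wait for all of them to finish
  await Promise.all([
    faceapi.nets.ssdMobilenetv1.loadFromUri('./models'),
    faceapi.nets.faceExpressionNet.loadFromUri('./models'),
    faceapi.nets.faceLandmark68Net.loadFromUri('./models'),
  ]);

  // start the live webcam video stream
  startVideoStream();
}

Either version works; the sequential one used in this tutorial is just easier to follow.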

Type the following code below this comment // 3). startVideoStream() function definition:

function startVideoStream() {
  // check if getUserMedia API is supported
  if (navigator.mediaDevices.getUserMedia) {
    // get the webcam's video stream
    navigator.mediaDevices.getUserMedia({ video: true })
    .then(function(stream) {
      // set the video.srcObject to stream
      video.srcObject = stream;
    })
    .then(makePredictions)
    .catch(function(error) {
      /* if something went wrong, just console.log() error. */
      console.log(error);
    });
  }
}

// 4). makePredictions() function definition

  • We access the webcam through the getUserMedia API. Most modern browsers support it, but it doesn't hurt to check, and that's exactly what the if statement does.
  • If the getUserMedia API is supported, we can request the webcam video stream. getUserMedia accepts a constraints object as its argument. Here we only need video, so we pass { video: true } (see the sketch after this list).
  • getUserMedia returns a promise that resolves to a MediaStream object. If the promise resolves successfully, we get the webcam video stream.
  • In our code, we keep things simple and just assign that stream to the video element's srcObject property, like this:

video stream

  • Then, after we have streamed the webcam visuals to the browser, we activate another function called makePredictions. This is going to do the rest of the job; we will create it in a bit.
  • Finally, if any error happens, we catch it and just console.log() it.
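By the way, the constraints object passed to getUserMedia can be more specific than { video: true }. For example (just a sketch, you don't need this for the tutorial), you could ask the browser for a stream close to our #video element's dimensions:

// request a video stream roughly matching the #video element's size
navigator.mediaDevices.getUserMedia({ video: { width: 600, height: 450 } })
  .then(function(stream) {
    video.srcObject = stream;
  });

The plain { video: true } is enough for our purposes, so that's what we stick with.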

Now below this comment "// 4). makePredictions() function definition", let's make the makePredictions() function:

function makePredictions() {
  // get the #canvas
 
  /* get "detections" for every 100 milliseconds */
  
}

To make predictions, we can again use faceapi. We can do something like this (you don't have to type this until I tell you to):

detections
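In plain code, the screenshot above boils down to a single call like this (just a preview; the version you will actually type comes a little later and lives inside an async function):

// detect all faces in the #video element (one-off sketch)
const detections = await faceapi.detectAllFaces(video);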

This calls detectAllFaces() on the element we pass in; in this case, we pass the video element we grabbed earlier. The "detections" array we get back has everything we need, like the "prediction results" as well as the "bounding box" configuration needed to draw the bounding box around our faces!

If you want to get face landmarks and facial expressions, then you can chain them like this:

chaining
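And here, as a preview, is what the chained version from the screenshot looks like:

// detect faces, then also get face landmarks and expressions for each face
const detections = await faceapi.detectAllFaces(video)
  .withFaceLandmarks()
  .withFaceExpressions();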

But there is a problem here: this code only runs once, and we want it to run constantly, right? Only then can we make continuous predictions. So we can do something like the following.

Type the following code below this comment "/* get "detections" for every 100 milliseconds */" inside makePredictions() function:

setInterval(async function() {
    /* this "detections" array has all the things like the "prediction results" as well as the "bounding box" configurations! */
    const detections = await faceapi.detectAllFaces(video).withFaceLandmarks().withFaceExpressions();

  }, 100);

  • We use setInterval() to run this code every 100 milliseconds.

Now let's get the #canvas and draw the bounding boxes and other things.

Type the following code below this comment "// get the #canvas" inside makePredictions() function:

const canvas = document.getElementById('canvas');
  // resize the canvas to the #video dimensions
  const displaySize = { width: video.width, height: video.height };
  faceapi.matchDimensions(canvas, displaySize);

I think this is pretty self-explanatory. So try to understand this on your own.

The above code only grabs the #canvas and gives it the same dimensions as the video. We still need to draw the bounding boxes and other details. Thankfully, face-api.js also provides some high-level drawing functions we can use, and the "detections" array we got earlier contains all the information about where and what to draw. Our life is so much easier now.

Type the following code inside the setInterval() callback, just below the const detections = ... line:

/* resize the detected boxes to match our video dimensions */
    const resizedDetections = faceapi.resizeResults(detections, displaySize);

    // before start drawing, clear the canvas
    canvas.getContext('2d').clearRect(0, 0, canvas.width, canvas.height);

    // use faceapi.draw to draw "detections"
    faceapi.draw.drawDetections(canvas, resizedDetections);
    // to draw expressions
    faceapi.draw.drawFaceExpressions(canvas, resizedDetections);
    // to draw face landmarks
    faceapi.draw.drawFaceLandmarks(canvas, resizedDetections);

  • In the above code, we first resize the detected boxes to match our video element's dimensions.
  • Then we clear the #canvas before drawing anything. Otherwise, our boxes would be drawn on top of each other, like this:

real mess

  • The last 3 lines are the actual code that draws the bounding box and other elements.

That's it, the complete code for the makePredictions() function should look like this:

// 4). makePredictions() function definition
function makePredictions() {
  // get the #canvas
  const canvas = document.getElementById('canvas');
  // resize the canvas to the #video dimensions
  const displaySize = { width: video.width, height: video.height };
  faceapi.matchDimensions(canvas, displaySize);

  /* get "detections" for every 100 milliseconds */
  setInterval(async function() {
    /* this "detections" array has all the things like the "prediction results" as well as the "bounding box" configurations! */
    const detections = await faceapi.detectAllFaces(video).withFaceLandmarks().withFaceExpressions();

    /* resize the detected boxes to match our video dimensions */
    const resizedDetections = faceapi.resizeResults(detections, displaySize);

    // before start drawing, clear the canvas
    canvas.getContext('2d').clearRect(0, 0, canvas.width, canvas.height);

    // use faceapi.draw to draw "detections"
    faceapi.draw.drawDetections(canvas, resizedDetections);
    // to draw expressions
    faceapi.draw.drawFaceExpressions(canvas, resizedDetections);
    // to draw face landmarks
    faceapi.draw.drawFaceLandmarks(canvas, resizedDetections);

  }, 100);
}

And here is the final code for the "script.js" file. Make sure the code you have typed up to this point inside "script.js" exactly matches the following:

// 1). grab needed elements' reference
// #video element
const video = document.getElementById('video');

// 2). load the models
async function loadModels() {
  /* ssdMobilenetv1 - for face detection */
  await faceapi.nets.ssdMobilenetv1.loadFromUri('./models');
  /* faceExpressionNet - to detect the expressions on face */
  await faceapi.nets.faceExpressionNet.loadFromUri('./models');
  /* faceLandmark68Net - to detect face landmarks */
  await faceapi.nets.faceLandmark68Net.loadFromUri('./models');

  // start the live webcam video stream
  startVideoStream();
}

// 3). startVideoStream() function definition
function startVideoStream() {
  // check if getUserMedia API is supported
  if (navigator.mediaDevices.getUserMedia) {
    // get the webcam's video stream
    navigator.mediaDevices.getUserMedia({ video: true })
    .then(function(stream) {
      // set the video.srcObject to stream
      video.srcObject = stream;
    })
    .then(makePredictions)
    .catch(function(error) {
      /* if something went wrong, just console.log() error. */
      console.log(error);
    });
  }
}

// 4). makePredictions() function definition
function makePredictions() {
  // get the #canvas
  const canvas = document.getElementById('canvas');
  // resize the canvas to the #video dimensions
  const displaySize = { width: video.width, height: video.height };
  faceapi.matchDimensions(canvas, displaySize);

  /* get "detections" for every 100 milliseconds */
  setInterval(async function() {
    /* this "detections" array has all the things like the "prediction results" as well as the "bounding box" configurations! */
    const detections = await faceapi.detectAllFaces(video).withFaceLandmarks().withFaceExpressions();

    /* resize the detected boxes to match our video dimensions */
    const resizedDetections = faceapi.resizeResults(detections, displaySize);

    // before start drawing, clear the canvas
    canvas.getContext('2d').clearRect(0, 0, canvas.width, canvas.height);

    // use faceapi.draw to draw "detections"
    faceapi.draw.drawDetections(canvas, resizedDetections);
    // to draw expressions
    faceapi.draw.drawFaceExpressions(canvas, resizedDetections);
    // to draw face landmarks
    faceapi.draw.drawFaceLandmarks(canvas, resizedDetections);

  }, 100);
}

// activate the loadModels() function
loadModels()

That's it!!!

Hooray, we have reached the end :). Our application should now work perfectly :). Go check it out in the browser...

Wrapping Up

I hope you enjoyed this tutorial.

For further exploration, check out this link - justadudewhohacks

If you have any doubts, please comment below. Thank you ;)