Friendly Faces

Benjamin Keener
6 min read · Dec 16, 2020
(By Jimmy answering questions.jpg: Wikimania2009, Beatrice Murch; derivative work: Sylenius (talk), CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=11309460)

Face Detection is quickly becoming easier and easier as AI and Neural Networks become better and more efficient, but how does it work and how can we apply it to Augmented Reality?

Quick Overview of Facial Detection

Facial Detection is a tool for detecting human faces, usually in an image or, in the case of video, a series of images. At its simplest, facial detection will tell you that there is a face and where it is, but more advanced implementations can gauge emotion, guess age, and more. In the image above, you can see that the program used (an OpenCV implementation) detects three of the faces present in the image and displays where they are within the frame.
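If you want to try something similar yourself, here’s a minimal sketch using the Haar cascade face detector that ships with OpenCV (this is not necessarily the exact program used for the image above, and the filenames are placeholders):

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade (ships with opencv-python).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

# Detection runs on a grayscale version of the image.
image = cv2.imread("crowd.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Returns one (x, y, width, height) rectangle per detected face.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw a box around each detection, just like the image above.
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("crowd_detected.jpg", image)
print(f"Found {len(faces)} face(s)")
```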

So, How Does It Work?

(Attribution: https://commons.wikimedia.org/w/index.php?curid=442472)

Almost all implementations of facial detection use some form of Artificial Intelligence. One of the best-known classic approaches is eigenfaces, which I’ll focus on here. By comparison, eigenfaces are much simpler than many object-detection approaches, such as Convolutional Neural Networks. There are a few steps that I’ll break down to create a system that detects faces using eigenfaces.

Computing the Algorithm

First, you’d need to collect a bunch of data. This is always the most important (and often most difficult) step of any artificial intelligence application. The data in this case would be faces. Thankfully, there is no need to collect actual faces; pictures will do just fine. It’s crucial, however, that these images are standardized. For the detection to be effective, the pictures should all be taken from the same perspective, with the same camera, under the same lighting conditions. Of course, if you want to be able to (effectively) detect more than one face, you’re going to need variety in the dataset: a lot of people who look different. With all of those pictures, you’re still going to need to make some modifications. To be most effective, the images should be edited so that the eyes and mouth sit in the same position across all images.
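Here’s a rough sketch of that last alignment step, assuming you already know (or have hand-labeled) the eye and mouth coordinates in each photo. The canonical landmark positions and the 64×64 crop size are arbitrary choices for illustration:

```python
import cv2
import numpy as np

# Hypothetical canonical landmark positions (pixels) in a 64x64 face crop:
# left eye, right eye, mouth center. Any consistent choice works.
CANONICAL = np.float32([[20, 24], [44, 24], [32, 48]])

def align_face(image, left_eye, right_eye, mouth, size=(64, 64)):
    """Warp a face photo so its eyes and mouth land on fixed positions."""
    src = np.float32([left_eye, right_eye, mouth])
    # Three point pairs define an affine transform (rotation/scale/shift/shear).
    matrix = cv2.getAffineTransform(src, CANONICAL)
    return cv2.warpAffine(image, matrix, size)
```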

Next, we need to get the average face. This is relatively easy, since we have a bunch of face pictures, and pictures are just numbers. The simplest way to do this is to average each pixel from each position of all of the pictures. For example, the top-left pixel of the “average face” image will be the average of all top-left pixels of all of the gathered data.
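In code, this averaging is nearly a one-liner with NumPy. The sketch below assumes you’ve already stacked your aligned photos into a single array (the filename is hypothetical):

```python
import numpy as np

# faces: all N aligned grayscale photos stacked into one (N, H, W) array.
faces = np.load("aligned_faces.npy")  # hypothetical file of prepared images

# Averaging along the first axis averages each pixel position separately.
mean_face = faces.mean(axis=0)

# e.g. the top-left pixel of the average is the mean of every top-left pixel:
assert np.isclose(mean_face[0, 0], faces[:, 0, 0].mean())
```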

After finding the “average face”, we would take the difference between each dataset image and the average. These difference images capture the ways the human face can vary. In reality, they are used to build something called a “Covariance Matrix”, and the eigenfaces are the eigenvectors of that matrix (hence the name). The full math is beyond this blog post, but thinking of these eigenfaces as different ways the human face deviates from an “average face” will suffice. The most important property of these eigenfaces is that each one describes a direction of change from the average face. That’s an odd sentence, so let me give an example. People often have different cheekbones, some raised and more pronounced, some more receded. An eigenface can be used to describe that cheekbone difference. If there’s an eigenface called “A” which describes raised cheekbones, we could describe someone’s face with somewhat raised cheekbones as The Average Face + 50% of A. Again, that simplified equation specifies a face with “half-raised” cheekbones.
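For the curious, here’s a sketch of how you might compute eigenfaces with NumPy. Rather than building the covariance matrix explicitly, it uses a singular value decomposition, which produces the same eigenvectors more efficiently (aligned_faces.npy is again a hypothetical file of prepared images):

```python
import numpy as np

faces = np.load("aligned_faces.npy")  # (N, H, W), hypothetical file
n, h, w = faces.shape

# Flatten each image into a row vector and subtract the average face,
# leaving only each photo's deviation from the average.
mean_face = faces.mean(axis=0)
deviations = (faces - mean_face).reshape(n, h * w)

# The eigenvectors of the covariance matrix of these deviations are the
# eigenfaces. SVD yields them directly without forming the (huge) matrix.
_, _, vt = np.linalg.svd(deviations, full_matrices=False)
eigenfaces = vt.reshape(-1, h, w)  # one (H, W) eigenface per row of vt
```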

Apples and Oranges

This system is carried across all sorts of variance in the human face, where each eigenface describes a way (or several ways) that a face varies. We can use a combination of several eigenfaces to describe a very specific face, simply by adding them together. If you want to get into the nitty-gritty, you can think of each eigenface as a dimension. Okay, hold on, stick with me for at least a few more sentences. Pretend you’re trying to describe apples that have only two properties: size and tastiness. You could imagine the “average apple” as an apple that has a size of 1 and a tastiness of 1. Any deviation from that apple could be described along dimensions (or axes) on an imaginary plot (like the image below).

Therefore, any other apple you could want to describe would exist somewhere in this plot. For example, “Apple A” is a bit bigger than the average apple, but isn’t quite as tasty. You could describe this apple as Average Apple + 0.1 Size − 0.4 Tastiness. This is exactly how it works with facial recognition, but instead of the axes being things like “size” or “tastiness”, they describe variations in the human face using images.

This sort of classification allows any face (just like apples) to be described using an equation relative to the average face.
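In code, “describing a face relative to the average” means projecting it onto the eigenfaces to get one weight per axis, and those same weights can rebuild an approximation of the face. This sketch assumes the mean_face and eigenfaces arrays from the earlier snippets:

```python
import numpy as np

def describe(face, mean_face, eigenfaces):
    """Return the face's coordinates along each eigenface axis."""
    deviation = (face - mean_face).ravel()
    flat = eigenfaces.reshape(len(eigenfaces), -1)
    return flat @ deviation  # one weight per eigenface

def reconstruct(weights, mean_face, eigenfaces):
    """Rebuild an approximate face: average + weighted sum of eigenfaces."""
    flat = eigenfaces.reshape(len(eigenfaces), -1)
    return mean_face + (weights @ flat).reshape(mean_face.shape)
```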

Applying the Dataset

Using this system to detect and classify human faces is straightforward once the data and algorithm are prepared. A common method of finding the actual face in an image is to scan a face-sized window across the image, comparing each window against the “average face” and the eigenfaces. A window needs to hit a certain level of “faciness” for its contents to be considered a human face. After the face has been detected in the image, key points such as the positions of the eyes and mouth can be extracted from it, which is often very useful data.
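Here’s a simplified sketch of that scanning idea: slide a face-sized window across a grayscale image and score each window by how well the eigenfaces can reconstruct it, which serves as our measure of “faciness”. The step size and threshold are placeholder values you would tune:

```python
import numpy as np

def face_distance(patch, mean_face, eigenfaces):
    """How far a patch is from "face space": project the patch onto the
    eigenfaces, rebuild it, and measure what was lost. Real faces
    reconstruct well (small distance); random scenery doesn't."""
    flat = eigenfaces.reshape(len(eigenfaces), -1)
    deviation = (patch - mean_face).ravel()
    reconstruction = (flat @ deviation) @ flat
    return np.linalg.norm(deviation - reconstruction)

def detect_faces(image, mean_face, eigenfaces, threshold, step=8):
    """Slide a face-sized window over a grayscale image and keep every
    window whose distance from face space is under the threshold."""
    h, w = mean_face.shape
    hits = []
    for y in range(0, image.shape[0] - h + 1, step):
        for x in range(0, image.shape[1] - w + 1, step):
            patch = image[y:y + h, x:x + w]
            if face_distance(patch, mean_face, eigenfaces) < threshold:
                hits.append((x, y, w, h))  # (left, top, width, height)
    return hits
```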

Use Cases In Augmented Reality

A very mainstream use of Facial Detection/Recognition is in iPhones. With most newer iPhones, you can unlock your phone with your face; that’s your phone doing facial recognition! However, there’s another way your phone uses facial detection: Memoji! The last few thousand years of mathematical and technological advancements have allowed you to morph your expressions onto a virtual unicorn. This uses the aforementioned “key points” of the face to figure out how to modify the unicorn’s face to match yours.

A much cooler, more futuristic application is AR glasses. Imagine you’re out and about wearing your cool AR glasses with all sorts of heads-up display gadgets. You wouldn’t want a text message alert to pop up in front of the person you’re talking to! Facial detection could be used to keep AR info out of the way of important things (like human faces).

Lastly, a more boring (yet very practical) implementation is similar to the AR glasses one: remote conferencing apps like Zoom. Facial detection can be used to determine the focus of a video for use with digital backgrounds. It can also be used to determine who the active speaker is and highlight them, instead of full-screening the person who has a vacuum running in the next room.

Thank you for getting all the way through (or skipping to the end)! My name is Ben Keener and I am a Full-Stack developer who loves learning and aspires to bring good ideas to life! If you’d like to give me feedback on this blog post or my bad puns, you can reach me in a number of ways: LinkedIn, Twitter, or GitHub.
