Aug 29, 2019

Live avatars (like animoji in the browser) with face-api.js

In this article, I explain how we added live avatars to our app to help teammates feel more present with one another on distributed teams.

Background

Pesto is an online office for remote workers. In Pesto, I wanted a way for teammates to feel present together while working, without invading their privacy. We started by trying "always on" video, but that felt too creepy. Then we decided to try something like streamed Animoji.

Here’s what it looks like in the end:

How it Works

Live avatars are remarkably easy to code up.

We built Pesto with React and Electron (for the desktop version). To implement live avatars, we did the following:

1) Add an avatar creation flow

We used the Avataaars library by Pablo Stanley (specifically this project by fangpenlin). We evaluated a couple of other libraries, but Avataaars offered the most configurability and diversity (race, gender expression, etc.).

2) Detect faces with face-api.js

I set face-api.js to detect the bounding box and expression attributes. I played with eyebrow detection but couldn't get it to feel right.
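As a rough illustration of what "bounding box and expression attributes" means in practice, here is a minimal sketch. It assumes the result comes from something like face-api.js's `faceapi.detectSingleFace(video, new faceapi.TinyFaceDetectorOptions()).withFaceExpressions()`; the choice of detector, the locally declared types, the `toReading` helper, and the `minScore` cutoff are all assumptions for illustration, not the code we shipped.

```typescript
// Structural subset of a face-api.js single-face result with expressions,
// declared locally so the sketch does not depend on the library itself.
interface FaceBox { x: number; y: number; width: number; height: number }
interface FaceResult {
  detection: { box: FaceBox; score: number };
  // Assumed to behave like a plain record: expression name -> probability.
  expressions: Record<string, number>;
}

// Hypothetical shape for what the rest of the app consumes.
interface FaceReading { box: FaceBox; expression: string; confidence: number }

// Reduce a raw detection to the two things the avatar needs.
// Returns null when no face was found or the detection is too weak.
function toReading(result: FaceResult | undefined, minScore = 0.5): FaceReading | null {
  if (!result || result.detection.score < minScore) return null;

  let expression = "neutral";
  let confidence = 0;
  for (const [name, probability] of Object.entries(result.expressions)) {
    if (probability > confidence) {
      expression = name;
      confidence = probability;
    }
  }
  return { box: result.detection.box, expression, confidence };
}
```

Returning `null` for "no face" keeps the avatar code simple: it can leave the avatar where it was instead of reacting to a missing detection.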

3) Transform the live avatar

Once the face has been detected, we use the bounding box to horizontally translate the avatar. We tried zooming the avatar in and out, but it felt strange to us, so we nixed it.
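The horizontal translation can be sketched roughly as follows. The function name, the `maxOffsetPx` parameter, and the mirroring option are assumptions for illustration; the idea is just to turn the center of the bounding box into a pixel offset for the avatar.

```typescript
// Only the horizontal part of the bounding box matters here.
interface HorizontalBox { x: number; width: number }

// Map the face's horizontal position in the video frame to an avatar offset.
// frameWidth is the width of the frame the box was measured in.
// maxOffsetPx is how far the avatar may move from center in either direction.
// mirrored flips the direction so the avatar moves like a mirror image.
function avatarOffsetX(
  box: HorizontalBox,
  frameWidth: number,
  maxOffsetPx: number,
  mirrored = true,
): number {
  if (frameWidth <= 0) return 0;

  const centerX = box.x + box.width / 2;
  // -1 at the left edge of the frame, 0 in the middle, +1 at the right edge.
  const normalized = Math.max(-1, Math.min(1, (centerX / frameWidth) * 2 - 1));
  const signed = mirrored ? -normalized : normalized;
  // Adding 0 turns a possible -0 into 0 so it prints cleanly in CSS.
  return signed * maxOffsetPx + 0;
}
```

The result can be applied with a CSS transform such as `translateX(${offset}px)`. Pairing it with a CSS transition is one way to smooth the jump between infrequent detections.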

We map the detected expression to the avatar's mouth: "happy" becomes a smile with teeth ("Smile"), "neutral" becomes a normal smile ("Twinkle"), and "sad" becomes a more neutral / serious expression ("Serious").
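A minimal sketch of that mapping is below. The three pairs come straight from the description above; everything else (ignoring the other expressions face-api.js reports, the `minScore` threshold, and falling back to "Twinkle") is an assumption added for illustration.

```typescript
// The three Avataaars mouth types used for the live avatar.
type MouthType = "Smile" | "Twinkle" | "Serious";

// Detected expression -> avatar mouth, as described in the text.
const MOUTH_BY_EXPRESSION: Record<string, MouthType> = {
  happy: "Smile",
  neutral: "Twinkle",
  sad: "Serious",
};

// Pick the mouth for the strongest of the three mapped expressions.
// Expressions outside the mapping (angry, surprised, ...) are ignored, and
// if none of the mapped ones scores above minScore the fallback is used.
function mouthFor(
  expressions: Record<string, number>,
  fallback: MouthType = "Twinkle",
  minScore = 0.3,
): MouthType {
  let best: MouthType = fallback;
  let bestScore = minScore;
  for (const [name, mouth] of Object.entries(MOUTH_BY_EXPRESSION)) {
    const score = expressions[name] ?? 0;
    if (score > bestScore) {
      best = mouth;
      bestScore = score;
    }
  }
  return best;
}
```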

Development Tip

Development was much easier when I could actually see the video and photo that face-api.js was using. Example below:

Issues

The system works better than we expected, mainly because face-api.js is pretty good at what it does. There are still a few gotchas...

Performance

A single face detection with face-api.js can take anywhere from 10 ms to 3-4 s, depending on how many detections it has already run and how loaded your computer is.

This means that, sometimes, your otherwise snappy app turns laggy for a moment.

I minimize this by not running face-api.js too frequently (about once every 5 seconds seems right) and by letting it "warm up" when it first boots. The first detection always seems to take the longest, since there is more work to be done up front (though that might just be me being a bad developer).
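One way to express that throttling is a small gate that decides whether a detection may start. This is a sketch under assumptions rather than our actual implementation: the `DetectionGate` class, its method names, and the rule that the interval is measured from the end of the previous run are all hypothetical. Timestamps are passed in so the logic stays independent of timers.

```typescript
// Decides when a (possibly slow) detection is allowed to run.
class DetectionGate {
  private readonly intervalMs: number;
  private running = false;
  private lastFinishedAt: number | null = null;
  private completedRuns = 0;

  constructor(intervalMs = 5000) {
    this.intervalMs = intervalMs;
  }

  // Returns true and marks a run as in flight if a detection may start now.
  // Refuses while a run is still in flight, so slow detections never pile up,
  // and until intervalMs has passed since the previous run finished.
  tryStart(nowMs: number): boolean {
    if (this.running) return false;
    if (this.lastFinishedAt !== null && nowMs - this.lastFinishedAt < this.intervalMs) {
      return false;
    }
    this.running = true;
    return true;
  }

  // Call when a detection completes (or fails).
  // Returns true if this was the first, "warm-up" run.
  finish(nowMs: number): boolean {
    this.running = false;
    this.lastFinishedAt = nowMs;
    this.completedRuns += 1;
    return this.completedRuns === 1;
  }
}
```

In an app, a periodic tick could call `gate.tryStart(Date.now())`, start the detection only when it returns true, and call `gate.finish(Date.now())` once the detection promise settles. The return value of `finish` lets the caller treat the slow first run as warm-up, for example by triggering it at startup before the avatar is shown.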

Responsiveness

Since face-api.js takes time and CPU, we throttled it so that it doesn't run all the time. This means it doesn't have the responsive, real-time feel of Animoji. Instead, it's more passive. However, that works very well for our use case: passive presence.

Devices

If you have multiple devices connected, managing which video device the app is using can be a nightmare, especially since it turns out Chrome might decide to ignore the one you pick anyway.

This becomes a particular problem if you use the device for other things (like we do when the user is in a video conference). We still have some open bugs around this.
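For what it's worth, here is a sketch of the device-selection logic, kept as pure functions so it can run outside a browser. The helper names are hypothetical. In a browser, the device list would come from `navigator.mediaDevices.enumerateDevices()`, the constraints would be passed to `navigator.mediaDevices.getUserMedia()`, and the device actually in use can be read from `stream.getVideoTracks()[0].getSettings().deviceId`. Requesting the device with `exact` may help, since the request should then fail rather than quietly fall back to another camera, but I would not assume it fixes every case described above.

```typescript
// Structural subset of the browser's MediaDeviceInfo.
interface DeviceLike { deviceId: string; kind: string; label: string }

// Choose the preferred camera if it is still connected, else the first camera.
function pickCamera(devices: DeviceLike[], preferredId?: string): DeviceLike | undefined {
  const cameras = devices.filter((d) => d.kind === "videoinput");
  return cameras.find((d) => d.deviceId === preferredId) ?? cameras[0];
}

// Build getUserMedia constraints for a specific camera.
// With `exact`, the request is rejected if that camera cannot be used.
function cameraConstraints(deviceId?: string) {
  return {
    audio: false,
    video: deviceId ? { deviceId: { exact: deviceId } } : true,
  };
}

// After opening the stream, compare what was asked for with what was opened.
function gotRequestedCamera(requestedId: string, actualId: string | undefined): boolean {
  return actualId === requestedId;
}
```

Checking `gotRequestedCamera` after every stream is opened at least makes a mismatch visible, so the app can log it or retry instead of silently analyzing the wrong camera.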

[Update] What happened next

We ended up cancelling this feature well before shutting down Pesto. In some ways, it's the worst of both worlds - you have to look at the camera and act actively engaged, but people don't actually get to see your face. It was a fun project though.