Aug 25, 2020

Fixing high usage with Firebase Realtime Database

Several months ago, Pesto ground to a halt. Our users couldn't click on anything, and some couldn't even load data at all. All writes were timing out to Firebase.

We looked at Firebase's console, and we found this scary graph under "Usage" (100%!).

In this article, I explain how we diagnosed this scale issue and resolved it.

Step 1: Profile writes

The Firebase team hinted that writes could be a problem and asked us to profile the database. For a guide on this, check out this article: https://firebase.google.com/docs/database/usage/profile

We ran the profiler for around 10s, then looked at the "Write Speed" section to better understand our write volume. In retrospect, I'd recommend running it for several minutes, rather than seconds, to get better data. A few lines stood out (anonymized):

The rest of our writes combined for ~100. We decided to investigate these three.

Step 2: Trace paths to polling code

/integration1 and /integration2 were pretty simple. The code updating them polled every 10s and wrote to Firebase on every poll. This ended up being far too much when we hit thousands of online users.

/users/$wildcard was more complicated; a lot of code paths wrote here. We were pretty confident, however, that one particular polling write was the problem.

Pro tip

To improve our reporting, we've since switched all single-path updates to sets. For example:

// don't do this
firebase.database().ref(`users/${userId}`).update({ name: name })

// do this
firebase.database().ref(`users/${userId}/name`).set(name)

Now, our reports break down our writes by sub-path.
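
If a call site passes several keys to update(), one way to get the same per-sub-path breakdown is to fan the object out into one set() per key. Here's a minimal sketch; toPathWrites is a hypothetical helper I made up for illustration, not part of the Firebase SDK. Keep in mind that separate set() calls are not atomic the way a single update() is, so this only makes sense where that doesn't matter.

```javascript
// Hypothetical helper (not part of the Firebase SDK): turns an update-style
// object into a list of { path, value } pairs, one per child key.
function toPathWrites(basePath, updates) {
  return Object.entries(updates).map(([key, value]) => ({
    path: `${basePath}/${key}`,
    value,
  }))
}

// Possible usage with the namespaced API shown above:
// toPathWrites(`users/${userId}`, { name }).forEach(({ path, value }) =>
//   firebase.database().ref(path).set(value)
// )
```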

Short term fix: Only write when your data changes

In all 3 instances, we were polling data on an interval, then immediately writing it to Firebase. However, Firebase doesn't check if the data is actually new; it writes it no matter what. In our case, 99% of the writes were the same data over and over again.

The fix? Change the code to only write data when the data changes. For example, imagine the following React function (using this useInterval implementation).

const PollingCode = ({ userId }) => {
  useInterval(() => {
    // Poll the integration, then write the result unconditionally.
    let data = loadSomeData()
    firebase.database().ref(`users/${userId}/integration1`).set(data)
  }, 5000)
  // ...
}

Instead, do the following.

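import { useRef } from 'react'

const PollingCode = ({ userId }) => {
  // Holds the last value we wrote, across renders.
  let lastData = useRef(null)
  useInterval(() => {
    let data = loadSomeData()
    // Note: !== compares objects by reference, so this check only skips
    // writes when data is a primitive or the same object instance.
    if (lastData.current !== data) {
      firebase.database().ref(`users/${userId}/integration1`).set(data)
      lastData.current = data
    }
  }, 5000)
  // ...
}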

We were able to make these changes and deploy them quickly. As a result, our total write volume dropped by 80%. Success!
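
One caveat with the check above: if the polled data is an object or array, a fresh copy arrives on every poll and a reference comparison will never treat it as unchanged. A rough sketch of a content-based check follows. createChangeOnlyWriter is a hypothetical name, and comparing JSON.stringify output is an assumption that works for plain JSON-like data with a stable key order; a deep-equality function would be more robust.

```javascript
// Hypothetical helper: wraps a write function so it only runs when the
// serialized value differs from the last value written.
function createChangeOnlyWriter(write) {
  let lastSerialized = null // null means nothing has been written yet
  return (data) => {
    // Compares by content, but is sensitive to object key order.
    const serialized = JSON.stringify(data)
    if (serialized === lastSerialized) return false // unchanged: skip the write
    write(data)
    lastSerialized = serialized
    return true
  }
}

// Possible usage inside the polling callback:
// const writeIfChanged = createChangeOnlyWriter((data) =>
//   firebase.database().ref(`users/${userId}/integration1`).set(data)
// )
// useInterval(() => writeIfChanged(loadSomeData()), 5000)
```

In a React component, the writer would need to be created once (for example with useRef or useMemo) so the remembered value survives re-renders.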

Long term fix: subscribe for changes, don't poll

Most modern APIs support a subscription model as well as a polling model. In the long term, it's better to subscribe to changes than to poll for them.
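
To make the difference concrete, here's a minimal sketch of the subscription shape. Everything in it is an assumption for illustration: the client object, its subscribe(callback) method that returns an unsubscribe function, and the syncOnChange helper are all hypothetical, and real integrations will expose their own APIs.

```javascript
// Hypothetical stand-in for an integration client that pushes changes.
function createFakeClient() {
  const listeners = new Set()
  return {
    // Registers a listener and returns a function that removes it.
    subscribe(callback) {
      listeners.add(callback)
      return () => listeners.delete(callback)
    },
    // Simulates the integration reporting new data.
    emit(data) {
      listeners.forEach((callback) => callback(data))
    },
  }
}

// Writes only when the client reports a change; no timer involved.
function syncOnChange(client, write) {
  return client.subscribe((data) => write(data))
}

// Possible usage in a React component:
// useEffect(() => syncOnChange(client, (data) =>
//   firebase.database().ref(`users/${userId}/integration1`).set(data)
// ), [userId])
```

Returning the unsubscribe function lets useEffect clean up the listener when the component unmounts or the user changes.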

Benefits

We've switched some polling logic to subscriptions, but not all of it yet.

More resources

Thanks for reading! If you're curious about how we use React with Firebase, check out my guides below: