MongoDB percentile

The 90th percentile of a dataset is the value that cuts off the bottom 90 percent of the data values from the top 10 percent of data values when all of the values are sorted from least to greatest.

To find the 90th percentile of a dataset in Google Sheets, you can use one of the following two functions:

  • =PERCENTILE(data, percentile)
  • =PERCENTILE.INC(data, percentile)

Both functions will return the same value.

For both functions, the data is the list of values in your dataset and percentile is the percentile you’d like to find between 0 and 1.

To find the 90th percentile, we will use 0.9 for percentile.

Note that there is also a function called =PERCENTILE.EXC that calculates percentiles between 0 and 1, exclusive. This function is rarely used in practice.
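As a rough illustration of what the inclusive calculation does, the Python sketch below follows the linear-interpolation method commonly described for PERCENTILE and PERCENTILE.INC. The function name percentile_inc and the sample values are hypothetical and are not part of Google Sheets.

```python
import math


def percentile_inc(data, percentile):
    """Inclusive percentile: interpolate linearly between the two closest ranks."""
    if not 0 <= percentile <= 1:
        raise ValueError("percentile must be between 0 and 1")
    values = sorted(data)  # the input does not need to be pre-sorted
    if not values:
        raise ValueError("data must not be empty")
    rank = percentile * (len(values) - 1)  # fractional 0-based position
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return values[lower] + (values[upper] - values[lower]) * fraction


scores = [4, 1, 3, 2]  # hypothetical values, deliberately unsorted
print(percentile_inc(scores, 0.9))  # roughly 3.7
```

With four values the 90th percentile sits at position 2.7, so the result lies 70% of the way from the third value to the fourth.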

The following example shows how to calculate the 90th percentile of a dataset in Google Sheets.

Example: Calculating the 90th Percentile in Google Sheets

Suppose we have the following dataset that shows the exam scores of 20 students in a particular class:

[Image: exam scores in cells A2:A21]

We can use the following formula to find the 90th percentile of the exam scores:

=PERCENTILE(A2:A21, 0.9)

The following screenshot shows how to use this formula in practice:

[Screenshot: the PERCENTILE formula applied to the exam scores]

The 90th percentile turns out to be 94.1.

This is the score that a student must receive in order to have a score that is greater than 90% of the exam scores in the entire class.

Notes

Keep in mind the following notes when calculating percentiles in Google Sheets:

  • The value for percentile must always be between 0 and 1.
  • The percentile function will display a #VALUE! error if you enter a non-numeric value for percentile.
  • The data in our example was sorted from lowest to highest exam scores, but a dataset does not need to be pre-sorted in this manner for the percentile function to work.

    Back in 2021, we noticed some strange performance issues in our response times for USW2. Why are we bringing this up now? Well, we’re still benefiting from the updates we made and we’re looking to share the love.

    We first noticed our rate limit service was showing consistent response times of 150ms for our processes in USW2. Weirdly, the same code was averaging around 5ms in the 99th percentile of requests elsewhere. This latency was particularly odd since we average around a 2ms response time in the 50th percentile. So, there was something there we really needed to investigate.

    Our starting point was to compare metrics between USW2 and USE1. We added some additional metrics to report how long it took to fetch rate limit definitions from MongoDB (when the cache missed), which allowed us to get a better understanding of what was going on.
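One way to picture that kind of metric is a cache wrapper that times only the misses. This is a minimal Python sketch and not the team's actual instrumentation: the TimedCache class, its fetch callable, and the sample key are all hypothetical.

```python
import time


class TimedCache:
    """Cache that records how long each miss takes to fetch from the backing store."""

    def __init__(self, fetch):
        self._fetch = fetch          # hypothetical callable standing in for a MongoDB query
        self._cache = {}
        self.miss_durations_ms = []  # one entry per cache miss

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # hit: nothing to time
        start = time.perf_counter()
        value = self._fetch(key)
        self.miss_durations_ms.append((time.perf_counter() - start) * 1000)
        self._cache[key] = value
        return value


cache = TimedCache(lambda key: {"limit": 100})  # stand-in for the real database fetch
cache.get("tenant-a")  # miss: fetched and timed
cache.get("tenant-a")  # hit: served from the cache
print(len(cache.miss_durations_ms))  # 1
```

Reporting these durations per region is what makes a slow database path stand out from a slow service.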

    [Graph: time to fetch rate limit definitions from MongoDB, by region]

    The resulting graph confirmed the 150ms trend in USW2 but also showed 2-5ms response times from MongoDB in USE1. This isolated the issue, confirming that something was up with the MongoDB configuration affecting USW2 specifically.

    So, that was ‘Step 1: Locating the issue’ completed.

    Next, we took a deep dive into MongoDB documentation. The eureka moment came once we understood the MongoDB main topology and discovered the MongoDB default read preferences. MongoDB clients prefer the primary instance in a cluster when performing reads. It’s not always nice when your DB plays favorites.

    In the default MongoDB client configuration (read preference primary), the client in USW2 was reading cross-region from the primary node in USE1. MongoDB also offers read preference modes under which operations read from secondary members, unless the set has only a single primary.

    Their documentation indicated that if we were to use readPreference=secondaryPreferred in our MongoDB client, we would be able to connect to the local MongoDB nodes instead of only the primary (USE1).
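For illustration, the option is an ordinary query parameter on the connection string. The Python sketch below only edits the URI text using the standard library; the helper name with_read_preference, the hosts, and the database name are hypothetical, and passing the result to a driver is not shown.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_read_preference(uri, mode="secondaryPreferred"):
    """Return the connection string with its readPreference option set to mode."""
    parts = urlsplit(uri)
    # Drop any existing readPreference (option names are case-insensitive).
    options = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() != "readpreference"]
    options.append(("readPreference", mode))
    return urlunsplit(parts._replace(query=urlencode(options)))


# Hypothetical hosts and database name, for illustration only.
uri = with_read_preference(
    "mongodb://db-a.example.com:27017,db-b.example.com:27017/ratelimits?tls=true"
)
print(uri)
```

Keeping the change in the URI means the read preference can differ per region without touching application code.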

    Perfect. This structuring would allow us to optimize performance, spreading reads across multiple secondary servers where each server is responding to fewer read requests. We thought that updating our read preference would solve the issue but it’s never that simple, is it? Sometimes the situation is more of a clusterf@%* (pun intended).

    We really thought we nailed it with those read preferences but now we were failing to connect to the cluster using the URI:

    Turns out, we were using an older version of the MongoDB client (go.mongodb.org/mongo-driver v1.0.2) that didn’t support our TLS options. Once we upgraded to go.mongodb.org/mongo-driver v1.7.3 and fixed a minor URI format issue with our framework, we were finally able to deploy and see whether the update to our read preference had any impact:

    PSA: If your URI options are not working and the Mongo driver doesn’t give you an error, you may be running an older version of the Mongo driver and need to update.

    This finally did the trick. The effect was dramatic, resulting in a ~30x improvement (response time of 5-10ms in the 99th percentile) in read performance for USW2. Not bad.

    Our solution started with an investigation of fetch rates to isolate the affected server and led us to optimize our read preference logic based on MongoDB client documentation. Ultimately, we discovered this was also a version issue and we needed to update our MongoDB Go Driver to successfully connect to the cluster URI and improve our response time.

    Diagnosing root cause is part of the dev journey, and resolving performance issues is done a layer at a time. If you’re experiencing similar latency in your metrics, start by assessing your servers and rate limiting services, and don’t forget to check which features your Mongo driver version supports.

    Investigations and updates like this are how we work to create a better overall system and user experience. So, if you want to hear more from our Engineering team, be sure to subscribe so you don’t miss out on future stories and operation insights.

    How to calculate percentile in MongoDB?

    MongoDB can compute a percentile inside an aggregation pipeline with a custom accumulator ($accumulator), giving the equivalent of e.g. np.percentile([0,1,10,30,100], 25) to get the 25th percentile of that array. The accumulator:
      • accumulates on the field b (accumulateArgs).
      • is initialised to an empty array (init).
      • accumulates b items in an array (accumulate and merge).
      • and finally performs the percentile calculation on b items (finalize).
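The init, accumulate, merge and finalize steps listed above correspond to options of MongoDB's $accumulator operator. A minimal sketch of such a pipeline, assembled in Python, might look like the following. The output field name p25, the hard-coded 0.25, and the interpolation used in finalize are assumptions made for illustration; the JavaScript bodies run on the server, which must have server-side JavaScript enabled.

```python
# Python only assembles the pipeline; the JavaScript strings execute inside MongoDB.
FINALIZE_JS = """function(values) {
  values.sort(function(x, y) { return x - y; });
  var rank = 0.25 * (values.length - 1);  // assumption: 25th percentile
  var lo = Math.floor(rank), hi = Math.ceil(rank);
  return values[lo] + (values[hi] - values[lo]) * (rank - lo);
}"""

pipeline = [
    {
        "$group": {
            "_id": None,  # one group covering every document
            "p25": {      # hypothetical output field name
                "$accumulator": {
                    "init": "function() { return []; }",
                    "accumulate": "function(state, b) { state.push(b); return state; }",
                    "accumulateArgs": ["$b"],
                    "merge": "function(left, right) { return left.concat(right); }",
                    "finalize": FINALIZE_JS,
                    "lang": "js",
                }
            },
        }
    }
]
```

A driver's aggregate call would take this list as its pipeline argument. Collecting every value into an array keeps the sketch simple but can be memory-heavy on large collections.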

    What is the 95th percentile for the dataset?

    In networking, the 95th percentile is the highest value remaining after the top 5% of a data set is removed. For example, if you have 100 data points, you begin by removing the five largest values. The highest value left represents the 95th percentile.
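The discard-the-top-5% description can be sketched in a few lines of Python. The function name and the sample readings are hypothetical, and rounding the 5% down to a whole number of samples is an assumption for data sets whose size is not a multiple of 20.

```python
def percentile_95_by_discard(samples):
    """Drop the top 5% of samples and return the highest remaining value."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    to_drop = len(ordered) * 5 // 100  # assumption: round the 5% down to a whole count
    kept = ordered[: len(ordered) - to_drop]
    return kept[-1]


readings = list(range(1, 101))  # hypothetical: 100 data points, 1 through 100
print(percentile_95_by_discard(readings))  # 95
```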

    How do I calculate the percentile?

    To calculate a percentile:
      • Rank the values in the data set in order from smallest to largest.
      • Multiply k (percent) by n (total number of values in the data set).
      • Round up or down.
      • Use your ranked data set to find your percentile.
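The steps above can be sketched in Python as a nearest-rank calculation. The text does not say which way to round, so this sketch assumes rounding up, which is one common convention; the function name and sample data are hypothetical.

```python
import math


def percentile_nearest_rank(data, k):
    """Nearest-rank percentile: k is a percent in (0, 100]."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < k <= 100:
        raise ValueError("k must be greater than 0 and at most 100")
    ranked = sorted(data)                        # step 1: rank the values
    position = math.ceil(k * len(ranked) / 100)  # steps 2 and 3: k times n, rounded up (assumption)
    return ranked[position - 1]                  # step 4: look up the 1-based position


print(percentile_nearest_rank([15, 20, 35, 40, 50], 50))  # 35
```

Unlike interpolating methods, this approach always returns a value that actually occurs in the data set.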

    What is the percentile of a data set?

    In statistics, a percentile is a term that describes how a score compares to other scores from the same set. While there is no universal definition of percentile, it is commonly expressed as the percentage of values in a set of data scores that fall below a given value.
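Under the "percentage of values that fall below a given value" reading, the percentile of a single score can be sketched in Python as follows. The function name and the scores are hypothetical, and other definitions (for example, counting ties as half) would give different results.

```python
def percentile_rank(scores, value):
    """Percentage of scores strictly below the given value."""
    if not scores:
        raise ValueError("scores must not be empty")
    below = sum(1 for score in scores if score < value)
    return 100 * below / len(scores)


print(percentile_rank([50, 60, 70, 80, 90], 80))  # 60.0
```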