Everything in this world is built with finite resources.

Restaurants, for example, have a maximum number of seats, and when a restaurant is forced to serve significantly more people than that number, the quality of service drops (e.g. slow delivery) and guests' safety can even be put at risk.

The same principle applies to Application Programming Interfaces (APIs), where a "rate limit" is applied to ensure the API can provide an optimal quality of service for its users, while also keeping those users safe.

For example, rate limiting can protect an API from slow performance when too many bots access it for malicious purposes, or when it is under a DDoS attack. Rate limiting is also useful when too many legitimate users access the API at the same time.

What is API rate limiting?

The basic principle of API rate limiting is fairly simple: if access to the API is unlimited, anyone (or anything) can use it as much as they want at any time, potentially preventing other legitimate users from accessing it.

API rate limiting is, in a nutshell, restricting how often people (and bots) can access the API, based on rules or policies set by the API's operator or owner.

We can think of rate limiting as a form of both security and quality control. This is why rate limiting is integral for any API product’s growth and scalability. Many API owners would welcome growth, but high spikes in the number of users can cause a massive slowdown in the API’s performance. Rate limiting can ensure the API is properly prepared to handle this sort of spike.

An API's processing capacity is typically measured in a metric called transactions per second (TPS), and API rate limiting essentially enforces a limit on the number of transactions per second or on the quantity of data users can consume. That is, we either limit the number of transactions or the amount of data in each transaction.

Why is API rate limiting necessary?

API rate limiting can serve both as a defensive security measure and as a quality control method. As a shared service, an API must protect itself from excessive use to ensure an optimal experience for everyone using it.

Rate limiting on both server-side and client-side is extremely important for maximizing reliability and minimizing latency, and the larger the systems/APIs, the more crucial rate limiting will be.

Here are some key benefits in implementing API rate limiting:

Protecting Resource Usage

All APIs operate on finite resources, and rate limiting is essential to keep the API service available to as many users as possible by avoiding excessive resource usage. While resource starvation can be caused by attackers via DDoS attacks, many DoS incidents are actually caused by errors in software rather than outside attacks.

This is often called friendly-fire denial of service (DoS), and implementing rate limiting is crucial to avoid this issue.

Controlling Data Flow

This is especially important in APIs that process and transmit large volumes of data. Rate limiting can be implemented to control data flow, for example by merging many data streams into a single service.

For example, we can distribute data more evenly between two elements of an API by limiting the flow into each element. This prevents a single API data processor from handling too many items while other processors sit idle. This function is especially useful in complex APIs that involve different data streams.

Maximizing Cost-Efficiency

Rate limiting can also be implemented to control cost, for example by preventing the use of too many resources, which can accumulate large bills. Every resource consumed generates a cost, and the more requests an API receives, the higher the costs it accumulates. Rate limiting can therefore be extremely important to the profitability of an API.

Controlling Quotas Between Users

When the capacity of an API's service is shared among many users, rate limiting can (and should) be applied to individual users' usage to ensure fair use without disrupting other users' access. We can do this by applying the rate limit over a certain time period (e.g. per day) or by limiting the resource quantity where possible. These allocation limits are often referred to as quotas.
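As a sketch of how a per-user, per-day quota could be tracked, consider the following Python snippet (the class and method names are illustrative, not from any particular library):

```python
import time
from collections import defaultdict

class DailyQuota:
    """Illustrative per-user quota: each user may make at most
    `limit` requests per UTC day. The clock is injectable so the
    behavior can be demonstrated deterministically."""
    def __init__(self, limit, clock=time.time):
        self.limit = limit
        self.clock = clock
        self.usage = defaultdict(int)  # (user, day) -> request count

    def allow(self, user):
        day = int(self.clock() // 86400)  # current day number
        key = (user, day)
        if self.usage[key] >= self.limit:
            return False                  # quota exhausted for today
        self.usage[key] += 1
        return True

quota = DailyQuota(limit=2, clock=lambda: 0.0)
print([quota.allow("alice") for _ in range(3)])  # [True, True, False]
print(quota.allow("bob"))                        # True: quotas are per user
```

Because the counter is keyed by `(user, day)`, one user exhausting their quota never affects another user's allowance, which is exactly the fairness property quotas are meant to provide.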

How does API rate limiting work?

An API is a way to request specific functionality from a program. While APIs are invisible to most users, they are essential for an application to perform optimally.

For example, when we order a ride on a rideshare service, an API call is made so that we, as users, get an accurate fare for the trip. We don't interact directly with this API; through the rideshare app's interface, we make a request to the API, probably without even knowing it.

Every time an API responds to a request, the owner of the API has to pay for resources. In the example above, the rideshare app’s API integration will cause the fare calculation service to pay for compute time whenever an app user requests a ride.

Thus, any service that offers an API to developers will typically implement a rate limit on how many API calls can be made. The limiting can be performed in various ways, such as capping the number of API calls per hour, per day, or per unique user, or limiting the amount of data generated per call, among others.

API rate limiting can also help protect the API from malicious bot attacks and DDoS attacks. Bots can make repeated requests to an API to block its service from legitimate users, slow down its performance, or completely shut the API down for a time as a form of DDoS attack.

Different Methods of Rate Limiting

As discussed above, various methods can be used to perform API rate limiting, but three are most common:

1. Throttling

Throttling works by setting up a temporary state within the API so that each request can be properly assessed. Based on certain rules, specific types of requests are throttled during this temporary state; when throttled, a user may either be slowed considerably (by reducing the bandwidth of the service) or completely disconnected from the API.

We can implement throttling at the API level, user level, and application level, making it a versatile method for rate limiting.

2. Request Queues

Another popular method of rate limiting is the "request queue", which limits the number of requests served in any given period of time. For example, we can set the rate limit at three requests per second.
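A minimal sketch of a request queue in Python might look like this (the names and the one-second "window" framing are illustrative; a production queue would also bound its own length and time its drain loop):

```python
from collections import deque

class RequestQueue:
    """Illustrative request queue: at most `rate` requests are served
    per time window; excess requests wait in FIFO order for a later
    window instead of being rejected outright."""
    def __init__(self, rate):
        self.rate = rate
        self.pending = deque()

    def submit(self, request):
        self.pending.append(request)  # enqueue; nothing is dropped

    def drain_one_window(self):
        """Serve up to `rate` queued requests for the current window."""
        served = []
        for _ in range(min(self.rate, len(self.pending))):
            served.append(self.pending.popleft())
        return served

rq = RequestQueue(rate=3)
for i in range(5):
    rq.submit(f"req-{i}")
print(rq.drain_one_window())  # ['req-0', 'req-1', 'req-2']
print(rq.drain_one_window())  # ['req-3', 'req-4']
```

Unlike a hard limit that rejects excess traffic, a queue smooths bursts: the fourth and fifth requests above are simply deferred to the next window.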

3. Algorithm-Based

In this approach, we use algorithms to implement the API rate limit, and there are various ready-to-use algorithms to choose from:

  • Fixed Window

In this method, we use a "fixed" number as the limit and a simple incremental counter to count requests. If this fixed window limit is reached within a set period of time (e.g. 3,000 per hour), additional requests are blocked temporarily until the next window begins.
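A fixed window counter can be sketched in a few lines of Python (the class name and injectable clock are illustrative, used here to make the window rollover easy to demonstrate):

```python
import time

class FixedWindowLimiter:
    """Illustrative fixed-window counter: allow at most `limit`
    requests per `window` seconds; the counter resets whenever a
    new window starts."""
    def __init__(self, limit, window, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.window:
            self.window_start = now  # a new window begins: reset
            self.count = 0
        if self.count >= self.limit:
            return False             # limit reached: block the request
        self.count += 1
        return True

fake_time = [0.0]
limiter = FixedWindowLimiter(limit=3, window=1.0, clock=lambda: fake_time[0])
print([limiter.allow() for _ in range(4)])  # [True, True, True, False]
fake_time[0] = 1.5                          # the next window begins
print(limiter.allow())                      # True
```

The appeal of this algorithm is its tiny state (one counter and one timestamp); its known weakness is that a burst straddling a window boundary can briefly admit up to twice the limit.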

  • Leaky Bucket

Here, incoming requests are put in a FIFO (first in, first out) queue of fixed capacity and processed at a constant rate, so the first request to enter the queue is the first one served by the API. When the queue (the "bucket") is full, additional requests overflow and are discarded.
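The bucket mechanics can be sketched as follows in Python (illustrative names; in practice `leak` would be driven by a timer at the constant processing rate):

```python
from collections import deque

class LeakyBucket:
    """Illustrative leaky bucket: requests enter a FIFO queue of
    fixed capacity; a separate process "leaks" (serves) them at a
    constant rate. A full bucket rejects new requests."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.bucket = deque()

    def add(self, request):
        if len(self.bucket) >= self.capacity:
            return False              # bucket full: request overflows
        self.bucket.append(request)
        return True

    def leak(self):
        """Serve the oldest queued request (called at a fixed rate)."""
        return self.bucket.popleft() if self.bucket else None

lb = LeakyBucket(capacity=2)
print(lb.add("a"), lb.add("b"), lb.add("c"))  # True True False
print(lb.leak())                              # 'a' — first in, first out
print(lb.add("c"))                            # True: space was freed
```

Because requests leave the bucket at a constant rate regardless of how they arrive, the leaky bucket turns bursty inbound traffic into a smooth outbound stream.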

  • Sliding Log

In this method, a time-stamped log of each user's requests is kept. With each new request, log entries older than the time window are discarded and the remaining entries are counted; if the count exceeds the rate limit, the new request is rejected.
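A sliding log limiter might be sketched like this in Python (names and the injectable clock are illustrative):

```python
import time
from collections import defaultdict, deque

class SlidingLogLimiter:
    """Illustrative sliding log: keep a timestamped log of each
    user's requests; on every new request, drop entries older than
    `window` seconds and reject if `limit` entries remain."""
    def __init__(self, limit, window, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.logs = defaultdict(deque)  # user -> request timestamps

    def allow(self, user):
        now = self.clock()
        log = self.logs[user]
        while log and now - log[0] >= self.window:
            log.popleft()               # discard entries outside the window
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True

t = [0.0]
sl = SlidingLogLimiter(limit=2, window=10.0, clock=lambda: t[0])
print(sl.allow("u"), sl.allow("u"), sl.allow("u"))  # True True False
t[0] = 10.0                                          # old entries expire
print(sl.allow("u"))                                 # True
```

The sliding log is exact (no boundary-burst problem like the fixed window), but it must store one timestamp per request, which is why the sliding window approximation below exists.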

  • Sliding Window

Essentially a combination of the fixed window and sliding log algorithms, this approach keeps counters for the current and previous windows and weights the previous count by how much of that window still overlaps the sliding interval. The small amount of data needed to assess each request allows a faster calculation, making it ideal for processing large numbers of requests.
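One common way to realize this (a sketch with illustrative names, assuming the weighted-two-counter variant of the sliding window) looks like the following:

```python
import time

class SlidingWindowLimiter:
    """Illustrative sliding-window counter: estimate the request
    rate from the current window's count plus a weighted share of
    the previous window's count, so only two counters are needed
    instead of a full per-request log."""
    def __init__(self, limit, window, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.prev_count = 0
        self.curr_count = 0
        self.curr_window = int(clock() // window)

    def allow(self):
        now = self.clock()
        window_id = int(now // self.window)
        if window_id != self.curr_window:
            # Roll over; if more than one window passed, prev is 0.
            self.prev_count = (self.curr_count
                               if window_id == self.curr_window + 1 else 0)
            self.curr_count = 0
            self.curr_window = window_id
        # Weight the previous window by its remaining overlap.
        elapsed = (now % self.window) / self.window
        estimate = self.prev_count * (1 - elapsed) + self.curr_count
        if estimate >= self.limit:
            return False
        self.curr_count += 1
        return True

t = [0.0]
sw = SlidingWindowLimiter(limit=2, window=10.0, clock=lambda: t[0])
print(sw.allow(), sw.allow(), sw.allow())  # True True False
```

Compared with the sliding log, this trades exactness for constant memory per user, which is why it is the variant usually chosen at high request volumes.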

API Rate Limiting With DataDome

With the DataDome bot protection solution, you can implement rate limiting to block selected traffic to your APIs (or any other endpoint of your choice), based on the number of requests generated during a specified time period (fixed window method).

Bad bots are blocked by default, but if the visitor is a good bot or an allow-listed bot, its requests will be allowed as long as they remain below the threshold. Once the number of requests meets the threshold, DataDome will either block the request or present a CAPTCHA challenge.

To apply rate limiting with DataDome, we can simply open the Response menu in the dashboard, then select Rate Limiting.

By selecting “new” in Response, you will open up a new dialog box that allows you to configure your Rate Limiting settings, such as:

  1. Define the threshold for the number of hits/requests. All traffic will be allowed until the number of requests reaches this threshold.
  2. Define the time period over which the threshold is applied.
  3. Define whether to apply a hard block or a CAPTCHA challenge once the number of requests reaches the threshold.

You can also implement rate limiting with a Custom Rule.

Conclusion

Nowadays, users are sensitive to user experience and performance in apps and software, and a study by Dimensional Research has suggested that 80% of users will only tolerate three performance issues before uninstalling an app.

When API requests are unregulated, performance can suffer both for the API itself and for the website or application that depends on it. Poor website performance can also lead to other issues, like weaker SEO performance and higher bounce rates.

In short, rate limiting is essential for security, efficiency, and ensuring the quality of your API, application, and website.