How to effectively monitor APIs - before users complain

1 Nov 2022

Now that you have an API ready to release to the world, how should you monitor it?

Monitoring matters because a failing API damages both our reputation and our commitment to users. Good monitoring alerts us to a failure before users notice, so it can be fixed quickly.

As a developer, I love to write the code, realize my vision, and then throw it over the wall to an Ops team to deploy and run. Things have changed for the better, though. With Agile teams owning the whole software life cycle and DevOps bringing development and operations together, that responsibility is now shared between developers and Ops engineers.

Monitoring as testing

Monitoring can also be thought of as continuous testing at runtime. We check not just API correctness but also operational concerns.

Internal vs External Monitoring

Monitoring can be done both internally and externally.

Internal monitoring is a white-box technique that needs access to, and insight into, the API implementation and deployment. With the right logging triggers, faults are identified and alerts are raised. A logging and alerting system is essential; on public clouds, one could use a service such as AWS CloudWatch or the GCP Operations suite.

External monitoring is a black-box technique needing no internal access at all. It is done using simulated users making API requests from different locations around the globe. This closely reflects what the user sees. However, because it is external, an alert may lack the internal context needed to resolve the issue. Ideally, both approaches are used together to resolve an incident.

Monitoring Challenges

Wearing the DevOps hat, we face several challenges in making sure APIs are served well. A good monitoring solution addresses each of the following.

Uptime

Uptime is the percentage of time the API is available to users. Of course, 100% uptime is the goal, and depending on the SLA, a given uptime level is both a requirement and a commitment.
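To make the numbers concrete, here is a minimal sketch of the arithmetic. It assumes uptime is estimated from the share of successful checks, which only approximates time-based availability, and the function names are illustrative rather than taken from any particular tool.

```python
def uptime_percent(total_checks: int, failed_checks: int) -> float:
    """Estimate uptime as the percentage of checks that succeeded."""
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    return 100.0 * (total_checks - failed_checks) / total_checks


def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Downtime budget implied by an SLA target over the given period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - sla_percent / 100.0)
```

Under these assumptions, a 99.9% target over 30 days leaves a budget of about 43 minutes of downtime, which shows how little room an SLA leaves for slow detection.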

Latency

Users have a very short attention span. Since most UIs are built on top of APIs, fast API responses are essential. A monitoring system should present response times over time, both as graphs and as statistics, so that trends and outliers are easy to spot.
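As an illustration of the statistical side, the sketch below summarizes a list of response times. The choice of mean, median, and 95th and 99th percentiles is an assumption about what is useful here, not a prescribed set, and `latency_summary` is a hypothetical helper name.

```python
import statistics


def latency_summary(samples_ms):
    """Summarize response times given in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index i-1 is the i-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples_ms),
    }
```

Percentiles are often more telling than the mean, because a handful of very slow responses can hide behind a healthy-looking average.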

Reachability

In a global economy, customers are all over the world. Thus, it is good to know the uptime and latency from the perspective of customer locations. Running checks from a number of different locations shows the API as customers in each region experience it.
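A rough sketch of a per-location view follows. It assumes each check result is recorded as a (location, ok, latency_ms) tuple; that record shape and the location names in the test data are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def summarize_by_location(results):
    """Group (location, ok, latency_ms) tuples into per-location figures."""
    grouped = defaultdict(list)
    for location, ok, latency_ms in results:
        grouped[location].append((ok, latency_ms))

    summary = {}
    for location, checks in grouped.items():
        # Only successful checks contribute to the latency figure.
        successes = [latency for ok, latency in checks if ok]
        summary[location] = {
            "uptime_percent": 100.0 * len(successes) / len(checks),
            "median_latency_ms": median(successes) if successes else None,
        }
    return summary
```

A region that looks fine globally but poor in this breakdown usually points to a routing, CDN, or regional deployment problem rather than a bug in the API itself.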

Correctness

An incorrect API response is worse than no response. We test during development, but unexpected errors can still appear once the API is deployed. Internal monitoring exposes some of these fairly effectively, but a few assertions in the external monitoring system give much more confidence in real time.

A few example tests:

Check for proper HTTP response codes
Compare expected response by string or regex matching
Validate JSON responses with JSONPath expression matching
Assert the call time is below a maximum value
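The four tests above can be sketched as one function that takes an already-received response and returns a list of failures. This is an assumption-laden illustration: `check_response` and `lookup` are hypothetical names, and `lookup` is a tiny dotted-path stand-in for a real JSONPath library, not JSONPath itself.

```python
import json
import re


def lookup(document, dotted_path):
    """Follow dict keys and list indexes separated by dots, e.g. 'items.0.id'."""
    current = document
    for part in dotted_path.split("."):
        if isinstance(current, list):
            current = current[int(part)]
        else:
            current = current[part]
    return current


def check_response(status, body, elapsed_ms, *, expected_status=200,
                   body_pattern=None, json_expectations=None, max_ms=None):
    """Return a list of failure messages; an empty list means all checks passed."""
    failures = []
    if status != expected_status:
        failures.append(f"status {status} != {expected_status}")
    if body_pattern is not None and re.search(body_pattern, body) is None:
        failures.append(f"body does not match {body_pattern!r}")
    if json_expectations:
        try:
            document = json.loads(body)
        except json.JSONDecodeError:
            failures.append("body is not valid JSON")
        else:
            for path, expected in json_expectations.items():
                try:
                    actual = lookup(document, path)
                except (KeyError, IndexError, ValueError, TypeError):
                    failures.append(f"{path}: not found")
                    continue
                if actual != expected:
                    failures.append(f"{path}: {actual!r} != {expected!r}")
    if max_ms is not None and elapsed_ms > max_ms:
        failures.append(f"took {elapsed_ms} ms, limit is {max_ms} ms")
    return failures
```

Returning every failure, rather than stopping at the first, gives the alert more context about what went wrong.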

API Simulation Capability

An API is not an isolated entity and often requires context. Common challenges include custom authentication schemes and complex input generation. At times, a network call may be needed to obtain certain input values. A good monitoring solution not only provides easy ways to construct the request but also offers an escape hatch for running a custom code snippet in a language like JavaScript.
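As an example of what such a snippet might do, the sketch below signs a request for a made-up HMAC-SHA256 authentication scheme. It is written in Python for illustration; the message layout, the header names `X-Timestamp` and `X-Signature`, and the function name are all assumptions rather than any real API's scheme.

```python
import hashlib
import hmac


def sign_request(method, path, body, secret, timestamp):
    """Build auth headers for a hypothetical HMAC-SHA256 signing scheme."""
    # Hypothetical canonical message: method, path, timestamp, and body.
    message = "\n".join([method.upper(), path, str(timestamp), body]).encode("utf-8")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {"X-Timestamp": str(timestamp), "X-Signature": signature}
```

A monitoring check would run a snippet like this just before sending the request, passing in the current time, so that each simulated call carries fresh credentials.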

Run Check now

During development and for spot checks, it is immensely useful to be able to send an API request from different locations on demand, in real time.

Alerting

A good monitoring solution checks all the above items and alerts developers immediately, with context. Since developers use many different messaging channels, such as email, Slack, Microsoft Teams, and Discord, the solution needs to support these options and make them configurable.

Attention is the most important commodity, so a good monitoring system avoids false-positive alerts. It should let you set a threshold on failures before an alert is generated. Two well-known options are to alert only after a specified number of failures (e.g., 3) or after failures have persisted for a specified amount of time (e.g., 5 minutes).
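Both thresholds can be sketched in a few lines. The version below is one possible reading: it counts consecutive failures, fires when either threshold is reached, and alerts only once per incident. The class and parameter names are hypothetical.

```python
class AlertThreshold:
    """Decide when a run of failed checks should raise an alert."""

    def __init__(self, max_failures=3, max_seconds=300.0):
        self.max_failures = max_failures
        self.max_seconds = max_seconds
        self.consecutive_failures = 0
        self.first_failure_at = None
        self.alerted = False

    def record(self, ok, now):
        """Record one check result at time `now` (seconds); return True to alert."""
        if ok:
            # Any success ends the incident and resets all state.
            self.consecutive_failures = 0
            self.first_failure_at = None
            self.alerted = False
            return False
        if self.first_failure_at is None:
            self.first_failure_at = now
        self.consecutive_failures += 1
        exceeded = (
            self.consecutive_failures >= self.max_failures
            or now - self.first_failure_at >= self.max_seconds
        )
        if exceeded and not self.alerted:
            self.alerted = True  # alert once per incident, not on every failure
            return True
        return False
```

With checks every minute and the defaults above, a single failed check stays silent, while three failures in a row trigger exactly one alert.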

Bonus Feature - Public status pages

As the title says, showing the API status directly to customers on a public status page demonstrates our confidence and openness.

Conclusion

I hope this has brought to your attention the salient aspects of an API or website monitoring system. ProAutoma Monitoring SaaS is a hand-built monitoring service designed to address all the above concerns and more.

—

Get your ProAutoma Free Account to monitor your Website or API. 50K monthly checks are free, which is enough for 5 sites checked every 5 minutes.

Try our Free API Tester tool to explore and test your APIs. All you need is a URL to get started.