How to handle On-Call at 3 AM? Monitoring And Alerting System Tear Down

Anu Ganesan
5 min read · Aug 4, 2021

On-call has evolved over time, from pagers to PagerDuty. Recent monitoring and alerting systems are powered by automation, with predictive analytics isolating and resolving issues even before they happen.

The success of a product depends on understanding the target audience and continuously improving customer activation and retention with best-in-class competitive analysis. But it also depends on how well the product is monitored with near-zero downtime.

Tear-Down Product to improve Product Sustainability & Profitability

The tear-down process involves experimenting with the product, alongside customer testimonials and competitive analysis, to find the next big thing. While experimentation provides the “what” of the problem statement, customer feedback gathered through surveys, focus groups, and polls should focus on the “why”.

Don’t forget the human element: be inclusive and empathetic while composing the user story for the next big thing, whether it is a feature or an enhancement.

Evolution of Monitoring System along with Product Lifecycle

Monitoring should not be an afterthought for any product. It should evolve along with product development.

  • Investment in a monitoring system should start at an early stage when products are being developed.
  • After the initial customer acquisition, in order to sustain the growth phase of the product, the monitoring system should progress along with product growth.
  • As the product matures with a growing customer base, the monitoring system should proactively detect and alert on product metrics indicating any product decline.

Investing in a full monitoring system may not be feasible for an introductory product that is still testing the waters. An established enterprise would have a well-defined monitoring system in which any new product is monitored automatically once it is registered. For startups and SMBs, the investment depends on operational cost and product growth. As the product grows, a monitoring system becomes mandatory for its success.

Investing in a monitoring system should be a phased approach

Phase 1: Capture Product Metrics

During each phase of the product lifecycle, the Monitoring System should keep track of the following product metrics.

Metrics at Product Launch Phase

Number of Leads, such as signups and demo requests, which measures product awareness

Promotional Metrics measuring campaign and social media activity, such as email open rates and click-through rates

Website Traffic, including the number of page views on the product, landing, and company pages

Metrics at Product Growth Phase

Product Trials help evaluate real interest in the product, an early indicator of long-term adoption

Customer Retention confirms whether growth is holding and steers product features to evolve with customer interest and competitive analysis

Metrics at Product Maturity Phase

Revenue captures the actual sales required for product sustainability

Market Share measures how well the product is doing against its competitors

Phase 2: Link Technical Failures with Product Metrics

Make sure all product releases and captured bugs are timestamped and associated with any decline in metrics. Fail fast and fail safe by collecting this information into a detailed event log covering feature release dates, bugs captured, failure events, and so on.
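As a rough illustration of linking events to a metric decline, here is a minimal Python sketch. The `Event` and `MetricPoint` shapes, the 20% drop threshold, the 24-hour lookback window, and the sample data are all assumptions made up for this example, not part of any particular monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """A timestamped release, bug report, or failure (hypothetical schema)."""
    kind: str          # e.g. "release", "bug", "failure"
    name: str
    at: datetime


@dataclass
class MetricPoint:
    at: datetime
    value: float


def find_declines(series, drop_ratio=0.2):
    """Return points whose value fell by at least drop_ratio versus the previous point."""
    declines = []
    for prev, cur in zip(series, series[1:]):
        if prev.value > 0 and (prev.value - cur.value) / prev.value >= drop_ratio:
            declines.append(cur)
    return declines


def events_before(decline, events, window=timedelta(hours=24)):
    """Events that happened within `window` before the decline, newest first."""
    candidates = [e for e in events if timedelta(0) <= decline.at - e.at <= window]
    return sorted(candidates, key=lambda e: e.at, reverse=True)


# Made-up sample data: daily signups and a small event log.
signups = [
    MetricPoint(datetime(2021, 8, 1), 120),
    MetricPoint(datetime(2021, 8, 2), 118),
    MetricPoint(datetime(2021, 8, 3), 70),   # sharp drop
]
events = [
    Event("release", "checkout-v2", datetime(2021, 8, 2, 15)),
    Event("bug", "signup form error", datetime(2021, 8, 2, 20)),
    Event("release", "pricing-page", datetime(2021, 7, 28, 9)),
]

for decline in find_declines(signups):
    suspects = events_before(decline, events)
    print(decline.at.date(), [e.name for e in suspects])
```

The events listed next to a decline are only candidates for investigation. Proximity in time suggests where to look first; it does not prove the cause.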

Phase 3: Centralized Alert Portal

All the failure metrics, feature release data, and captured bugs become irrelevant if there is no place to visualize the failure. Invest in an Alert Management System that manages the alert lifecycle from creation to resolution.

Monitoring a product for its success is a repetitive process that starts with monitoring for failures. If an event fails, immediate acknowledgement and resolution should follow in order to stay competitive.
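One way to picture an alert lifecycle is as a small state machine. The sketch below is an assumption-laden illustration: the three states (`OPEN`, `ACKNOWLEDGED`, `RESOLVED`) and the allowed transitions are chosen for this example, and real alert management tools typically have richer models.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class AlertState(Enum):
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# Allowed transitions; anything else is rejected.
TRANSITIONS = {
    AlertState.OPEN: {AlertState.ACKNOWLEDGED, AlertState.RESOLVED},
    AlertState.ACKNOWLEDGED: {AlertState.RESOLVED},
    AlertState.RESOLVED: set(),
}


@dataclass
class Alert:
    title: str
    state: AlertState = AlertState.OPEN
    history: list = field(default_factory=list)

    def __post_init__(self):
        # Record the creation of the alert as the first history entry.
        self._record(self.state)

    def _record(self, state):
        self.history.append((state, datetime.now(timezone.utc)))

    def move_to(self, new_state):
        """Advance the alert, keeping a timestamped trail of every state change."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self._record(new_state)


alert = Alert("Signup conversion dropped")
alert.move_to(AlertState.ACKNOWLEDGED)
alert.move_to(AlertState.RESOLVED)
print([state.value for state, _ in alert.history])
```

Keeping the timestamped history on the alert itself is what later makes it possible to measure how quickly alerts are acknowledged and resolved.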

Phase 4: Effective Notification System

Product failures often go undetected without a notification system. An effective notification system alerts the appropriate team to fix the failures. At the same time, be mindful of the alert severity and the time zone of the support team when sending notifications.

An effective notification system should also respect each user's preference for how they would like to be notified and when to escalate.
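To make the routing idea concrete, here is a hedged Python sketch that takes severity, time zone, and channel preference into account. Everything in it is hypothetical: the `Responder` fields, the 9-to-18 working hours, the fixed UTC offsets (which ignore daylight saving), the "phone" override for critical alerts, and the 15-minute escalation timeout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Responder:
    name: str
    utc_offset_hours: float      # fixed offset; ignores daylight saving for simplicity
    channel: str                 # preferred channel, e.g. "sms", "email", "chat"
    work_start: int = 9          # local working hours, 24h clock
    work_end: int = 18

    def is_working(self, now_utc):
        local = now_utc.astimezone(timezone(timedelta(hours=self.utc_offset_hours)))
        return self.work_start <= local.hour < self.work_end


def pick_responder(severity, responders, now_utc):
    """Return (responder, channel) for an alert, or None to hold it for later.

    Prefer a responder who is inside local working hours. If nobody is,
    critical alerts still page the first listed responder; lower severities are held.
    """
    at_work = [r for r in responders if r.is_working(now_utc)]
    if at_work:
        return at_work[0], at_work[0].channel
    if severity == "critical":
        return responders[0], "phone"   # override the preference so the page is noticed
    return None


def should_escalate(sent_at, acknowledged, now_utc, timeout=timedelta(minutes=15)):
    """Escalate when an alert is still unacknowledged after `timeout`."""
    return not acknowledged and now_utc - sent_at >= timeout


team = [Responder("us-oncall", -7, "sms"), Responder("jp-oncall", 9, "chat")]
three_am_utc = datetime(2021, 8, 4, 3, 0, tzinfo=timezone.utc)
responder, channel = pick_responder("warning", team, three_am_utc)
print(responder.name, channel)
```

With a team spread across time zones, an alert at 3 AM in one region can land with someone who is in the middle of their working day, which is the whole point of routing by time zone.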

Phase 5: Resolve Automatically

The time to resolve issues can be further improved along the product lifecycle journey by automating the resolution. Automation not only improves the mean time to acknowledge (MTTA) and mean time to resolve (MTTR) but also improves product sustainability and profitability.
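A simple way to think about automated resolution is a runbook that maps known alert types to remediation steps and hands everything else to a human. The sketch below assumes hypothetical alert types (`service_down`, `stale_cache`) and placeholder handlers that only return a message; a real handler would call whatever deployment or orchestration tooling the team uses.

```python
def restart_service(alert):
    # Placeholder: a real handler would trigger an actual restart here.
    return f"restarted {alert['service']}"


def clear_cache(alert):
    # Placeholder: a real handler would invalidate the cache here.
    return f"cleared cache for {alert['service']}"


# Hypothetical runbook: alert type -> automated remediation.
RUNBOOK = {
    "service_down": restart_service,
    "stale_cache": clear_cache,
}


def auto_resolve(alert):
    """Try the automated fix for a known alert type; otherwise escalate to a person."""
    handler = RUNBOOK.get(alert["type"])
    if handler is None:
        return {"resolved": False, "action": "escalate to on-call"}
    try:
        return {"resolved": True, "action": handler(alert)}
    except Exception as exc:
        # A failed automated fix should never hide the problem from humans.
        return {"resolved": False, "action": f"escalate to on-call ({exc})"}


print(auto_resolve({"type": "service_down", "service": "checkout"}))
print(auto_resolve({"type": "unknown_failure", "service": "checkout"}))
```

Only alerts without a known fix, or whose fix failed, reach the on-call engineer, which is where the MTTA and MTTR gains come from.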

Phase 6: Be Proactive with Predictive Analytics

Artificial Intelligence is not just a powerhouse for product innovation but also for monitoring product growth and success. Predictive analytics enriches product growth by proactively detecting unforeseen issues before they occur.
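Predictive analytics can be far more sophisticated than this, but a plain linear trend already shows the idea of alerting before a problem occurs. The sketch below fits a least-squares line to recent samples and estimates how many more samples remain before a threshold is reached. The metric (an error rate in percent), the threshold, and the function names are assumptions made for illustration.

```python
def fit_line(values):
    """Least-squares slope and intercept for values sampled at x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def steps_until_breach(values, threshold):
    """Estimate how many future samples remain before the trend reaches `threshold`.

    Returns 0 if the latest sample is already at or above the threshold, and None
    if the trend is flat or moving away from it.
    """
    if values[-1] >= threshold:
        return 0
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None
    x_breach = (threshold - intercept) / slope
    return max(0.0, x_breach - (len(values) - 1))


# Made-up daily error rates (%) climbing steadily toward a 9% alert threshold.
error_rate = [5.0, 5.5, 6.0, 6.5, 7.0]
print(steps_until_breach(error_rate, threshold=9.0))
```

A forecast like this lets the team receive a low-severity warning during working hours instead of a critical page once the threshold is actually crossed.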

How to measure the effectiveness of a monitoring system?

The monitoring system and its processes should undergo continuous improvement through an iterative tear-down process.

While a product teardown involves experimenting with and evaluating the product to identify the next big thing to implement in the upcoming user story, a monitoring system teardown evaluates the speed at which an issue is identified and resolved. Observability metrics such as mean time to failure, mean time to acknowledge, and mean time to resolve define the effectiveness of the monitoring system.
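As a small worked example, MTTA and MTTR can be computed directly from incident timestamps. The incident data below is made up, and the sketch assumes both metrics are measured from the moment the alert was created; teams define MTTR in different ways, so treat this as one possible convention.

```python
from datetime import datetime, timedelta


def mean_delta(pairs):
    """Average of (end - start) over a list of (start, end) datetime pairs."""
    deltas = [end - start for start, end in pairs]
    return sum(deltas, timedelta(0)) / len(deltas)


incidents = [
    # (created, acknowledged, resolved): made-up sample data
    (datetime(2021, 8, 1, 3, 0), datetime(2021, 8, 1, 3, 4), datetime(2021, 8, 1, 3, 40)),
    (datetime(2021, 8, 2, 14, 0), datetime(2021, 8, 2, 14, 10), datetime(2021, 8, 2, 15, 20)),
]

# MTTA: creation to acknowledgement. MTTR: creation to resolution (assumed convention).
mtta = mean_delta([(created, acked) for created, acked, _ in incidents])
mttr = mean_delta([(created, resolved) for created, _, resolved in incidents])
print(f"MTTA: {mtta}, MTTR: {mttr}")   # MTTA: 0:07:00, MTTR: 1:00:00
```

Tracking these two numbers from one teardown to the next shows whether changes to the monitoring system are actually shortening the path from failure to fix.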

Nobody wants to attend on-call at 3 AM.

An efficient monitoring system should be able to scale with growing product needs, with the ability to fail fast and fail safe. It affects not just the product quality but also the morale of the team.

Boost both your product and team productivity with a well-structured monitoring system that meets the demands of the different phases of the product lifecycle.

Follow this space for more information on product management!!!
