
How to Handle Production Issues

What Is a “Production Issue”?

In the product lifecycle, a production issue is one that occurs after release.

A production issue means the service provided to users is fully or partially unavailable, performs poorly, or delivers a poor user experience. In early product stages, shipping new features quickly often takes priority over quality in order to win market share, and this accumulates technical debt. When some of that debt comes due, it causes production issues, lowers customer satisfaction, and can even lead to direct financial loss.

Production Issue Flow Overview

Discover -> Handle -> Summarize -> Feedback

Severity

Severity | Description | Handling
Critical | Highest level; system or service fully down or unusable | Take emergency measures immediately
Major | Second level; system or service partially down or restricted | Take measures as soon as possible
Normal | General level; no obvious impact on system or service | Handle within a reasonable time to prevent escalation
Minor | Lowest level; small issues or anomalies with no impact | Address gradually during routine maintenance
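
The severity levels above could be encoded so that alerts and tickets share one vocabulary. The following is a minimal sketch, not the team's actual tooling: the `classify` function and its three boolean inputs are hypothetical simplifications of the descriptions in the table.

```python
from enum import Enum


class Severity(Enum):
    """Severity levels, each mapped to its handling guidance."""
    CRITICAL = "Take emergency measures immediately"
    MAJOR = "Take measures as soon as possible"
    NORMAL = "Handle within a reasonable time to prevent escalation"
    MINOR = "Address gradually during routine maintenance"


def classify(fully_down: bool, partially_down: bool, any_impact: bool) -> Severity:
    """Pick a severity from three coarse observations (assumed inputs).

    Checks run from most to least severe, so the first match wins.
    """
    if fully_down:
        return Severity.CRITICAL
    if partially_down:
        return Severity.MAJOR
    if any_impact:
        return Severity.NORMAL
    return Severity.MINOR


if __name__ == "__main__":
    level = classify(fully_down=False, partially_down=True, any_impact=True)
    print(level.name, "->", level.value)
```

Checking from most to least severe means an incident that matches several descriptions is always filed at the higher level.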

Handling Process

User Feedback & Monitoring Alerts

On-call Strategy

Weekly rotation: Every Monday at 10:00 AM, the on-call person for the week is announced in the DingTalk group, as shown below.

[Image: alert]
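
A weekly rotation like this can be computed rather than maintained by hand. The sketch below assumes a fixed roster and a reference date (`epoch`) from which weeks are counted; the names, the roster, and the message format are all hypothetical, and actually posting the message to DingTalk is deliberately left out.

```python
from datetime import date, datetime, time, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def on_call_for(day: date, roster: list[str], epoch: date) -> str:
    """Return who is on call during the week containing `day`.

    The roster advances by one person per week, counted from the
    week containing `epoch` (an assumed reference date).
    """
    weeks = (week_start(day) - week_start(epoch)).days // 7
    return roster[weeks % len(roster)]


def announcement(day: date, roster: list[str], epoch: date) -> str:
    """Build the Monday 10:00 AM announcement text for that week."""
    when = datetime.combine(week_start(day), time(10, 0))
    person = on_call_for(day, roster, epoch)
    return f"[{when:%Y-%m-%d %H:%M}] On-call this week: {person}"


if __name__ == "__main__":
    roster = ["alice", "bob", "carol"]  # hypothetical names
    print(announcement(date(2024, 1, 10), roster, epoch=date(2024, 1, 1)))
```

Deriving the on-call person from the date keeps the schedule reproducible: anyone can check who is on call for a given week without looking up the announcement.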

Error Analysis Report

Every day we analyze logs and push a log error classification report to the DingTalk alert group.

Before leaving each day, the on-call RD reviews the report and reports any error that turns out to be a production issue to QA for tracking.
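
The daily classification step might look roughly like the sketch below. It is an illustration under stated assumptions, not the actual pipeline: the category names, the regular expressions, and the convention that error lines contain the token `ERROR` are all hypothetical, and delivery to the DingTalk group is omitted.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical categories; a real rule set would come from the team's own logs.
RULES = [
    ("timeout", re.compile(r"timed? ?out", re.IGNORECASE)),
    ("database", re.compile(r"\b(sql|database|deadlock)\b", re.IGNORECASE)),
    ("null_reference", re.compile(r"NullPointerException|NoneType")),
]


def classify_line(line: str) -> str:
    """Return the first matching category, or "other" if none match."""
    for name, pattern in RULES:
        if pattern.search(line):
            return name
    return "other"


def count_errors(lines: Iterable[str]) -> Counter:
    """Count error lines per category (assumes errors contain 'ERROR')."""
    return Counter(classify_line(line) for line in lines if "ERROR" in line)


def build_report(lines: Iterable[str]) -> str:
    """Format the counts as plain text, most frequent category first."""
    rows = [f"{name}: {n}" for name, n in count_errors(lines).most_common()]
    return "\n".join(["Log error classification report", *rows])


if __name__ == "__main__":
    sample = [
        "2024-01-08 09:00:01 ERROR request timed out",
        "2024-01-08 09:00:02 INFO health check ok",
        "2024-01-08 09:00:03 ERROR SQL deadlock detected",
        "2024-01-08 09:00:04 ERROR upstream connection timeout",
    ]
    print(build_report(sample))
```

Sorting by frequency puts the noisiest category at the top, which is usually where the on-call RD should start reading.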

Postmortem Report

Incident Postmortem (Critical)

[Image: review-report]

Biweekly Review Report (Overall)

[Image: review-report]
