What Is a “Production Issue”?
In the product lifecycle, a production issue happens after release.
A production issue means that the service provided to users is fully or partially unavailable, performs poorly, or delivers a bad user experience. In the early stages of a product, shipping new features quickly often takes priority over quality in order to win market share, and this accumulates technical debt. When part of that debt comes due, it causes production issues, lowers customer satisfaction, and can even lead to direct financial loss.
Production Issue Flow Overview
Discover -> Handle -> Summarize -> Feedback

Severity
| Severity | Description | Handling |
|---|---|---|
| Critical | Highest level: the system or service is fully down or unusable | Take emergency measures immediately |
| Major | Second level: the system or service is partially down or restricted | Take measures as soon as possible |
| Normal | General level: no obvious impact on the system or service | Handle within a reasonable time to prevent escalation |
| Minor | Lowest level: small issues or anomalies with no impact | Address gradually during routine maintenance |
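
As a rough illustration, the severity table could be encoded so that alerting or ticketing scripts share one definition. This is a minimal sketch, not an existing tool: the `classify` helper and its three boolean inputs are hypothetical, and reading "Normal" as "no obvious impact yet, but may escalate" is an assumption drawn from the Handling column.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the table; a lower value means more severe."""
    CRITICAL = 1
    MAJOR = 2
    NORMAL = 3
    MINOR = 4


# Handling guidance copied from the severity table.
HANDLING = {
    Severity.CRITICAL: "Take emergency measures immediately",
    Severity.MAJOR: "Take measures as soon as possible",
    Severity.NORMAL: "Handle within a reasonable time to prevent escalation",
    Severity.MINOR: "Address gradually during routine maintenance",
}


def classify(fully_down: bool, partially_down: bool, may_escalate: bool) -> Severity:
    """Hypothetical triage helper: check the most severe condition first."""
    if fully_down:
        return Severity.CRITICAL
    if partially_down:
        return Severity.MAJOR
    if may_escalate:
        # Assumption: no obvious impact yet, but worth handling before it grows.
        return Severity.NORMAL
    return Severity.MINOR


if __name__ == "__main__":
    level = classify(fully_down=False, partially_down=True, may_escalate=False)
    print(level.name, "->", HANDLING[level])
```

Using an ordered enum keeps comparisons such as `level <= Severity.MAJOR` readable when deciding whether an issue needs immediate attention.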
Handling Process
User Feedback & Monitoring Alerts

On-call Strategy
Weekly rotation: Every Monday at 10:00 AM, the on-call person for the week is announced in the DingTalk group, as shown below.
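
The weekly rotation rule itself can be sketched in a few lines. This is only an illustration under stated assumptions: the roster names and the rotation start date are hypothetical placeholders, and posting the result to the DingTalk group is left out.

```python
from datetime import date, datetime, time, timedelta

# Hypothetical roster and rotation start (2024-01-01 is a Monday).
ROSTER = ["alice", "bob", "carol"]
ROTATION_START = date(2024, 1, 1)


def week_start(day: date) -> date:
    """Return the Monday of the week that contains `day`."""
    return day - timedelta(days=day.weekday())


def on_call_for(day: date, roster=ROSTER, start=ROTATION_START) -> str:
    """Pick the on-call person by counting whole weeks since the rotation start."""
    weeks = (week_start(day) - week_start(start)).days // 7
    return roster[weeks % len(roster)]


def next_announcement(now: datetime) -> datetime:
    """Return the next Monday 10:00 that is at or after `now`."""
    candidate = datetime.combine(week_start(now.date()), time(10, 0))
    if candidate < now:
        candidate += timedelta(weeks=1)
    return candidate


if __name__ == "__main__":
    today = date.today()
    print(f"On call this week: {on_call_for(today)}")
    print(f"Next announcement: {next_announcement(datetime.now())}")
```

Deriving the on-call person from the date, rather than storing a pointer that advances each week, means a missed announcement does not shift the schedule.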

Error Analysis Report
Every day, we analyze the logs and push a log error classification report to the DingTalk alert group.
Before the end of each workday, the on-call R&D engineer reviews the report and reports any error that turns out to be a production issue to QA for tracking.
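
A log error classification report of this kind might be produced along the following lines. This is a minimal sketch under assumptions: the log format (a literal `ERROR` level token in each line), the category names, and the regular expressions are all hypothetical, and delivery to the DingTalk alert group is not shown.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical classification rules; the first matching pattern wins.
RULES = [
    ("timeout", re.compile(r"timed? ?out", re.IGNORECASE)),
    ("database", re.compile(r"sql|database|deadlock", re.IGNORECASE)),
    ("null_reference", re.compile(r"NullPointerException|NoneType", re.IGNORECASE)),
]


def classify_line(line: str) -> Optional[str]:
    """Return a category for an ERROR line, or None for non-error lines."""
    if "ERROR" not in line:
        return None
    for category, pattern in RULES:
        if pattern.search(line):
            return category
    return "other"


def count_errors(lines: Iterable[str]) -> Counter:
    """Count error lines per category."""
    return Counter(c for c in map(classify_line, lines) if c is not None)


def build_report(lines: Iterable[str]) -> str:
    """Render the counts as plain text, most frequent category first."""
    rows = [f"{category}: {n}" for category, n in count_errors(lines).most_common()]
    return "\n".join(["Log error classification report"] + rows)


if __name__ == "__main__":
    sample = [
        "2024-01-01 10:00:00 ERROR Request timed out after 30s",
        "2024-01-01 10:00:01 INFO Service started",
        "2024-01-01 10:00:02 ERROR SQL deadlock detected",
    ]
    print(build_report(sample))
```

Sorting categories by frequency puts the noisiest error class at the top, which is usually where the on-call engineer's daily review would start.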
Postmortem Report
Incident Postmortem (Critical)

Biweekly Review Report (Overall)
