Incidents Overview

An incident is any event that can disrupt your organization's workflow. Managing such events calls for a methodical procedure, and the incident management feature gives your organization a structured way to identify and resolve incidents efficiently.

Incidents are integrated into the Flanksource system alongside configs and health checks, so you can view them in context and filter for the incidents relevant to your situation.

Involved People

| Person | Description |
| --- | --- |
| Incident Commander | An individual responsible for overseeing the entire response to an incident. |
| Communicator | An individual responsible for managing all communications related to an incident. |
| Creator | The person who created the incident. Can also be the system if the incident was created automatically. |

Severity Levels

Incidents can be classified into different severity levels based on their potential impact on your operations. The severity levels are as follows:

| Severity | Description |
| --- | --- |
| Critical | This is the highest level of severity, indicating an incident that has caused or threatens to cause major disruptions. |
| High | This level indicates a significant incident that has a substantial impact but doesn't qualify as critical. |
| Medium | This level is for moderate incidents that cause some disruption but can be managed without significant diversion of resources. |
| Low | This is for minor incidents that have minimal impact on normal operations. |
| Info | This level is used for incidents that don't impact operations but still need to be recorded for informational purposes. |
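The ordering of these levels can be made explicit in code, which helps when filtering incidents by impact. The following is a minimal sketch, not the Flanksource API: the `Severity` enum, its numeric values, the `Incident` record, and the `at_least` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from the table above.

    Assumption: the numeric values are arbitrary and only encode the
    ordering (a higher value means a more severe incident).
    """
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class Incident:
    # Hypothetical record; the field names are illustrative only.
    title: str
    severity: Severity


def at_least(incidents, threshold):
    """Return incidents at or above the threshold, most severe first."""
    matching = [i for i in incidents if i.severity >= threshold]
    return sorted(matching, key=lambda i: i.severity, reverse=True)


# Made-up sample data for the sketch.
incidents = [
    Incident("Checkout API returning errors", Severity.CRITICAL),
    Incident("Slow dashboard queries", Severity.MEDIUM),
    Incident("Certificate renewed", Severity.INFO),
    Incident("Disk usage above threshold", Severity.HIGH),
]

urgent = at_least(incidents, Severity.HIGH)
print([i.title for i in urgent])
```

Using an `IntEnum` here is a design choice for the sketch: it lets severities be compared with `>=` directly instead of maintaining a separate ranking table.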

Types

Incidents can also be classified based on their nature. The incident types are as follows:

| Type | Description |
| --- | --- |
| Availability | Incidents that affect the availability of services or resources. |
| Cost | Incidents that cause unexpected increases in costs or resource usage. |
| Performance | Incidents that impact the performance of services or resources. |
| Security | Incidents involving security breaches or vulnerabilities. |
| Technical Debt | Incidents caused by accumulated technical issues that haven't been addressed. |
| Compliance | Incidents related to non-compliance with regulatory requirements or standards. |
| Integration | Incidents that involve issues with integrated systems or services. |
| Reliability | Incidents that affect the consistent and dependable performance of services or resources over time. |
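Classifying incidents by type makes it straightforward to see where problems cluster. The sketch below counts incidents per type. The `IncidentType` enum, its string values, and the `count_by_type` helper are assumptions made for illustration and do not reflect how Flanksource stores or exposes types.

```python
from collections import Counter
from enum import Enum


class IncidentType(Enum):
    """Incident types from the table above (string values are assumed)."""
    AVAILABILITY = "availability"
    COST = "cost"
    PERFORMANCE = "performance"
    SECURITY = "security"
    TECHNICAL_DEBT = "technical_debt"
    COMPLIANCE = "compliance"
    INTEGRATION = "integration"
    RELIABILITY = "reliability"


def count_by_type(types):
    """Count how many incidents fall under each type."""
    return Counter(types)


# Hypothetical sample: the type recorded for each of five incidents.
recorded = [
    IncidentType.AVAILABILITY,
    IncidentType.SECURITY,
    IncidentType.AVAILABILITY,
    IncidentType.COST,
    IncidentType.AVAILABILITY,
]

counts = count_by_type(recorded)
most_common_type, occurrences = counts.most_common(1)[0]
print(most_common_type.name, occurrences)
```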

Status

Status describes the current state of the incident. The incident status labels are as follows:

| Status | Description |
| --- | --- |
| Open | The incident has been reported and is awaiting investigation. |
| Closed | The incident has been fully resolved and no further action is required. |
| Mitigated | Temporary measures have been taken to manage the incident, but further investigation or action is needed. |
| Resolved | The root cause of the incident has been addressed, resolving the issue. |
| Cancelled | The incident was closed without a resolution, typically because it was a false alarm or duplicate. |
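One way to work with these statuses is to separate those that still call for work from those that do not. The sketch below is an interpretation of the descriptions above rather than documented Flanksource behavior: the `Status` enum, its string values, and the grouping in `NEEDS_ACTION` are assumptions.

```python
from enum import Enum


class Status(Enum):
    """Incident statuses from the table above (string values are assumed)."""
    OPEN = "open"
    CLOSED = "closed"
    MITIGATED = "mitigated"
    RESOLVED = "resolved"
    CANCELLED = "cancelled"


# Assumption: only Open (awaiting investigation) and Mitigated (further
# investigation or action needed) describe outstanding work.
NEEDS_ACTION = frozenset({Status.OPEN, Status.MITIGATED})


def needs_action(status):
    """Return True if the status description implies outstanding work."""
    return status in NEEDS_ACTION


def closed_without_resolution(status):
    """Return True for incidents dropped as false alarms or duplicates."""
    return status is Status.CANCELLED


for status in Status:
    print(status.name, needs_action(status))
```

Whether Resolved should count as finished or as awaiting a final Closed step depends on your own process; adjust the `NEEDS_ACTION` set accordingly.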