Cloud incidents refer to Google Cloud and Amazon Web Services events, including outages and other known issues, that may affect the performance or availability of your services.
The DoiT Console provides advanced monitoring capabilities for the availability/uptime of your infrastructure on both Google Cloud and Amazon Web Services.
Access cloud incidents
To access cloud incident information, log in to the DoiT Console, select Governance from the top navigation bar, and then select Cloud incidents.
The Cloud incidents page consists of two parts:
Cloud infrastructure availability chart: Visualizes the impact of the cloud infrastructure on your overall service availability. You can change the time range and time interval of the chart, or use the filter to monitor specific cloud services.
Cloud incident details table: Lists cloud incidents according to your regions and services, including those without region information (for example, incidents affecting global services). You can filter cloud incidents by their properties, including Status, Platform, Product, and Title. By default, only active incidents are listed. (Active cloud incidents are also shown at the top of the list of support tickets.)
View an incident
Cloud incident details come from the cloud service providers. For AWS, we fetch information using the AWS Health API; for Google Cloud, we use an internal Google database to which we've been granted access.
To see the details of a specific cloud incident, select the View button at the rightmost end of the incident entry.
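If you want to cross-check AWS incidents against your own account, the AWS Health API mentioned above can also be queried directly. The following is a minimal sketch, not DoiT's implementation: the function name `iter_open_issues` is hypothetical, and it assumes you pass in a boto3 AWS Health client (for example, `boto3.client("health", region_name="us-east-1")`) from an account whose support plan includes AWS Health API access.

```python
from typing import Any, Dict, Iterator


def iter_open_issues(health_client: Any) -> Iterator[Dict[str, Any]]:
    """Yield open AWS Health events of category "issue".

    `health_client` is assumed to be a boto3 AWS Health client; any object
    exposing a compatible `describe_events` method works as well.
    """
    kwargs: Dict[str, Any] = {
        "filter": {
            "eventTypeCategories": ["issue"],  # service issues, not scheduled changes
            "eventStatusCodes": ["open"],      # only ongoing events
        }
    }
    while True:
        page = health_client.describe_events(**kwargs)
        yield from page.get("events", [])
        token = page.get("nextToken")
        if not token:
            break
        # Follow pagination until the API stops returning a token.
        kwargs["nextToken"] = token
```

Each yielded event is a dictionary with fields such as `service`, `region`, and `statusCode`, which roughly correspond to the Product, region, and Status information shown in the incident table.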
To get notified of cloud incidents, you can subscribe to:
Notification emails: Configure the notification preferences in your profile settings.
Alerts in a Slack channel: Configure notifications for the Slack channel you share with DoiT.
Once subscribed, you'll be notified of incidents published afterward (you can check the existing/ongoing incidents in the DoiT Console).
You'll find the following information in the notification emails and alerts:
Affected products, Status
Affected regions: AWS Regions and Availability Zones, Google Cloud Locations
Severity level: The severity level of Google Cloud issues is indicated by the Exposure Level:
- alert/0.0: Informational.
- alert/1.0: Minor impact.
- alert/2.0: Published on the Google Cloud Status dashboard as Service Disruption.
- alert/3.0: Published on the Google Cloud Status dashboard as Service Outage.
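If you process these notifications in your own tooling, the Exposure Level values listed above can be mapped to their descriptions and used for simple routing. The sketch below is illustrative only: the names `EXPOSURE_LEVELS`, `describe_exposure`, and `should_page`, as well as the default threshold of 2.0, are assumptions and not part of the DoiT Console.

```python
# Exposure Level descriptions, copied from the list above.
EXPOSURE_LEVELS = {
    "alert/0.0": "Informational",
    "alert/1.0": "Minor impact",
    "alert/2.0": "Service Disruption (published on the Google Cloud Status dashboard)",
    "alert/3.0": "Service Outage (published on the Google Cloud Status dashboard)",
}


def describe_exposure(level: str) -> str:
    """Return the description for an Exposure Level string such as "alert/2.0"."""
    return EXPOSURE_LEVELS.get(level.strip(), "Unknown exposure level")


def should_page(level: str, threshold: float = 2.0) -> bool:
    """Hypothetical routing rule: page on-call at or above `threshold`."""
    prefix, _, value = level.strip().partition("/")
    if prefix != "alert":
        return False
    try:
        return float(value) >= threshold
    except ValueError:
        # Malformed numeric part: treat as not page-worthy.
        return False
```

With the default threshold, only incidents published as Service Disruption or Service Outage would trigger a page; informational and minor-impact incidents would not.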
Once an incident is published, you receive notifications (via email and/or Slack thread) when there is a status update.