Cloud incidents

Cloud incidents refer to Google Cloud and Amazon Web Services events, including outages and other known issues, that may affect the performance or availability of your services.

The DoiT Console provides advanced monitoring capabilities for the availability/uptime of your infrastructure on both Google Cloud and Amazon Web Services.

Access cloud incidents

To access cloud incident information, log in to the DoiT Console, select Governance from the top navigation bar, and then select Cloud incidents.

The Cloud incidents page consists of two parts:

  • Cloud infrastructure availability chart: Visualizes the impact of the cloud infrastructure on your overall service availability. You can change the time range and time interval of the chart, or use the filter to monitor specific cloud services.

  • Cloud incident details table: Lists the cloud incidents that affect your regions and services, including incidents without region information (for example, incidents in global services). You can filter cloud incidents by their properties, including Status, Platform, Product, and Title. By default, only active incidents are listed. (Active cloud incidents are also shown at the top of the list of support tickets.)

View an incident

Cloud incident details come from the cloud service providers. For AWS, we fetch the information using the AWS Health API; for Google Cloud, we use an internal Google database to which we've been granted access.
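For readers who want to see what querying the AWS Health API can look like, here is a minimal sketch. It is not DoiT's implementation. The function name `iter_open_health_events` is hypothetical, and the sketch assumes you pass in a boto3 Health client, whose `describe_events` operation returns pages with an `events` list and an optional `nextToken`.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_open_health_events(
    client: Any, regions: Optional[List[str]] = None
) -> Iterator[Dict[str, Any]]:
    """Yield open or upcoming AWS Health events, following nextToken pagination.

    `client` is assumed to behave like boto3.client("health"); this helper
    is an illustration only.
    """
    event_filter: Dict[str, Any] = {"eventStatusCodes": ["open", "upcoming"]}
    if regions:
        # Restrict results to the given AWS Regions.
        event_filter["regions"] = regions

    kwargs: Dict[str, Any] = {"filter": event_filter}
    while True:
        page = client.describe_events(**kwargs)
        yield from page.get("events", [])
        token = page.get("nextToken")
        if not token:
            break
        # Ask for the next page on the following call.
        kwargs["nextToken"] = token


# Possible usage (requires boto3, AWS credentials, and an AWS support plan
# that includes access to the AWS Health API):
#
#   import boto3
#   health = boto3.client("health", region_name="us-east-1")
#   for event in iter_open_health_events(health, regions=["eu-west-1"]):
#       print(event["service"], event["region"], event["statusCode"])
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub object when no AWS account is available.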

To see the details of a specific cloud incident, select the View button at the rightmost end of the incident entry.

Get notifications

To get notified of cloud incidents, you can subscribe to:

Note

After you subscribe, you'll be notified only of incidents published from that point on. To check existing or ongoing incidents, use the DoiT Console.

You'll find the following information in the notification emails and alerts:

  • Issue summary

  • Affected products, Status

  • Affected regions: AWS Regions and Availability Zones, Google Cloud Locations

  • Severity level: The severity level of Google Cloud issues is indicated by the Exposure Level:

    • alert/0.0: Informational.
    • alert/1.0: Minor impact.
    • alert/2.0: Published on the Google Cloud Status dashboard as Service Disruption.
    • alert/3.0: Published on the Google Cloud Status dashboard as Service Outage.
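If you process these notifications in your own tooling, the Exposure Level values listed above can be translated into readable labels. The following is a minimal sketch: the dictionary and function names are hypothetical, and it assumes the level arrives as a plain string such as `alert/2.0`.

```python
# Labels taken from the Exposure Level list above.
# The names below are illustrative, not part of any DoiT or Google API.
EXPOSURE_LEVELS = {
    "alert/0.0": "Informational",
    "alert/1.0": "Minor impact",
    "alert/2.0": "Service Disruption",
    "alert/3.0": "Service Outage",
}

# Levels that are published on the Google Cloud Status dashboard.
DASHBOARD_LEVELS = {"alert/2.0", "alert/3.0"}


def describe_exposure_level(level: str) -> str:
    """Return a readable label for an Exposure Level string."""
    key = level.strip()
    return EXPOSURE_LEVELS.get(key, f"Unknown exposure level: {key}")


def is_on_status_dashboard(level: str) -> bool:
    """Return True if the level is published on the status dashboard."""
    return level.strip() in DASHBOARD_LEVELS


if __name__ == "__main__":
    for lvl in ("alert/0.0", "alert/3.0"):
        print(lvl, "->", describe_exposure_level(lvl))
```

Unrecognized values fall through to an explicit "Unknown" label rather than raising, so a newly introduced level does not break downstream handling.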

After an incident is published, you receive a notification (by email, in a Slack thread, or both) whenever its status is updated.