PagerDuty Automation for Incident Remediation Documentation
Automation is a key component in managing complex modern IT systems. It helps teams avoid mistakes, increase reliability, and reduce toil in their day-to-day tasks. While building a production environment might rely on a number of automation tools, the lifecycle of that environment will include unplanned incidents and other work that is often performed manually.
Human mistakes during an incident can increase the time to resolution or even make the problem worse. When our systems experience incidents outside working hours, our team might be away from their computers, unavailable, or even asleep. We want to minimize the number of incidents that require human intervention and alert the response team only when an incident genuinely needs a human.
Who Is This For?
This resource is for teams that develop or operate software applications who want to make effective use of automation tools during their incident response process.
What Is Covered?
Automation Use Cases in IT
Many teams already rely on automation to accomplish their tasks in a reliable, repeatable way. This section touches on some examples in:
Coming Soon: Automation for Incident Response Workflows
Automation for Incident Remediation
Fixing issues automatically once an alert has fired is the next step in your team's journey to uninterrupted sleep. This section covers some of the things you should think about when planning to automate the remediation of common production issues.
Getting Started with Automated Incident Remediation
Some things to keep in mind when you are working on automation for Incident Remediation:
Challenges to Automation
Not everyone will be enthusiastic at the prospect of automating parts of their job, even if they don't particularly like some of the tasks. There are challenges to introducing automation goals to established teams. Some of these challenges are well understood and others are more abstract. We can refer to decades of research on systems automation for some tips and guidance.
References and Further Reading
These are some of the references we used to create this document. If you have suggestions for additions to this list, let us know!
This documentation is provided under the Apache License 2.0. In plain English, that means you can use and modify this documentation, both commercially and privately. However, you must include any original copyright notices and the original LICENSE file.
Whether you are a PagerDuty customer or not, we want you to have the ability to use this documentation internally at your own company. You can view the source code for all of this documentation on our GitHub account; feel free to fork the repository and use it as a base for your own internal documentation.