Preparation & Planning
Build the foundation for effective incident response
- Document incident response plan with clear roles and escalation paths
- Define incident severity levels (P1-P4) with response time SLAs
- Establish communication channels (war room, Slack, PagerDuty)
- Maintain up-to-date asset inventory and ownership records
- Conduct regular tabletop exercises and simulations
- Pre-authorize response actions for common incident types
- Build relationships with legal, PR, and executive stakeholders
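
As a rough illustration of the severity levels and response-time SLAs above, the sketch below maps P1-P4 to a response target and checks whether an incident has breached it. The target durations and the names `RESPONSE_SLA`, `response_deadline`, and `is_sla_breached` are assumptions made for the example, not recommended values.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Severity(Enum):
    P1 = 1  # most severe
    P2 = 2
    P3 = 3
    P4 = 4  # least severe


# Illustrative targets only (assumed); replace with your organization's SLAs.
RESPONSE_SLA = {
    Severity.P1: timedelta(minutes=15),
    Severity.P2: timedelta(hours=1),
    Severity.P3: timedelta(hours=4),
    Severity.P4: timedelta(hours=24),
}


def response_deadline(severity: Severity, opened_at: datetime) -> datetime:
    """Return the time by which the incident must receive a first response."""
    return opened_at + RESPONSE_SLA[severity]


def is_sla_breached(severity: Severity, opened_at: datetime, now: datetime) -> bool:
    """True if `now` is past the response deadline for this severity."""
    return now > response_deadline(severity, opened_at)
```

Keeping the targets in one table makes it easy to review them during tabletop exercises and to reuse them in paging or escalation tooling.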
Detection & Analysis
Identify and assess security incidents quickly
- Centralize logs from all systems (SIEM integration)
- Implement real-time alerting, tuned to keep false-positive rates low
- Use threat intelligence feeds for indicator matching
- Establish baseline behavior for anomaly detection
- Deploy eBPF-based runtime monitoring for kernel-level visibility into process, file, and network activity
- Create detection rules for MITRE ATT&CK techniques
- Document initial findings in standardized incident tickets
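
Indicator matching against a threat intelligence feed could look roughly like the sketch below. It assumes a feed with one IP address, CIDR range, or domain per line, and log events as dictionaries with hypothetical `dest_ip` and `dns_query` fields; real feeds and SIEM schemas will differ.

```python
import ipaddress


def load_indicators(lines):
    """Split feed lines into IP networks and lowercase domain names."""
    networks, domains = [], set()
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blanks and comments
        try:
            # Accepts single addresses ("198.51.100.7") and CIDR ranges.
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            # Anything that is not an IP or range is treated as a domain.
            domains.add(entry.lower().rstrip("."))
    return networks, domains


def match_event(event, networks, domains):
    """Return the list of indicators that a single log event matches."""
    hits = []
    ip = event.get("dest_ip")  # hypothetical field name
    if ip:
        addr = ipaddress.ip_address(ip)
        hits += [f"ip:{net}" for net in networks if addr in net]
    domain = (event.get("dns_query") or "").lower().rstrip(".")
    # Check the queried name and each parent domain against the feed.
    labels = domain.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in domains:
            hits.append(f"domain:{candidate}")
    return hits
```

For example, with a feed containing `203.0.113.0/24` and `evil.example`, an event whose `dns_query` is `cdn.evil.example.` matches the `evil.example` indicator because parent domains are checked as well.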
Containment Strategies
Limit the impact and spread of security incidents
- Implement network isolation for compromised systems
- Revoke compromised credentials and access tokens immediately
- Block malicious IPs and domains at firewall/WAF level
- Isolate affected containers/pods without destroying evidence
- Disable compromised service accounts and API keys
- Implement emergency change procedures for critical systems
- Document all containment actions with timestamps
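
One lightweight way to document containment actions with timestamps is an append-only log of structured entries. The sketch below is a minimal example under that assumption; the field names (`actor`, `action`, `target`) and the helper names are illustrative, not a standard schema.

```python
import json
from datetime import datetime, timezone


def record_action(log, actor, action, target, now=None):
    """Append one containment action to `log` with a UTC ISO 8601 timestamp."""
    entry = {
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
        "actor": actor,
        "action": action,
        "target": target,
    }
    log.append(entry)
    return entry


def to_jsonl(log):
    """Serialize the log as JSON Lines, one action per line."""
    return "\n".join(json.dumps(entry, sort_keys=True) for entry in log)
```

Recording times in UTC avoids ambiguity when responders work across time zones, and a line-per-action format is easy to attach to the incident ticket or replay when building the post-incident timeline.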
Eradication & Remediation
Remove threats and fix root causes
- Identify and remove all malware and persistence mechanisms
- Patch vulnerabilities exploited in the attack
- Reset all potentially compromised credentials
- Review and harden misconfigured systems
- Rebuild compromised systems from known-good images
- Update detection rules based on observed TTPs
- Address root cause to prevent recurrence
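
Scoping which credentials count as "potentially compromised" is often the hard part of a reset. As a simple starting point, the sketch below treats every account that authenticated to a compromised host as in scope. The input shape, an iterable of `(account, host)` login pairs, is an assumption for the example, and the approach covers only one hop; it does not follow lateral movement.

```python
def accounts_to_reset(logins, compromised_hosts):
    """Return the sorted accounts that logged in to any compromised host.

    `logins` is an iterable of (account, host) pairs, for example taken
    from authentication logs covering the suspected compromise window.
    """
    compromised = set(compromised_hosts)
    return sorted({account for account, host in logins if host in compromised})
```

A usage example: with logins `[("alice", "web-1"), ("bob", "db-1"), ("svc-deploy", "web-1")]` and `web-1` marked as compromised, the function returns `["alice", "svc-deploy"]`.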
Recovery & Restoration
Safely restore systems to normal operations
- Validate system integrity before restoration
- Restore from clean backups with integrity verification
- Implement enhanced monitoring during recovery period
- Gradually re-enable services with validation checkpoints
- Monitor for signs of persistent compromise or reinfection
- Document recovery procedures and timelines
- Update business continuity plans based on lessons learned
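
Integrity verification before a restore can be as simple as comparing a backup file's hash with a value recorded when the backup was taken. The sketch below assumes SHA-256 digests stored somewhere the attacker could not modify; the function names are illustrative.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path, expected_hex):
    """True if the file's SHA-256 matches the digest recorded at backup time."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

A matching hash shows only that the file is unchanged since the digest was recorded. It does not show the backup predates the compromise, so the backup's age still has to be checked against the incident timeline.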
Post-Incident Activities
Learn from incidents and improve processes
- Conduct blameless post-mortem within 48-72 hours of incident resolution
- Document timeline, impact, and response effectiveness
- Identify gaps in detection, response, or communication
- Create action items with owners and deadlines
- Update runbooks and playbooks based on lessons learned
- Report metrics (MTTD, MTTR, impact) to leadership
- Share sanitized learnings with security community
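
The MTTD and MTTR figures reported to leadership can be computed directly from incident timestamps. The sketch below assumes each incident is a dictionary with `started`, `detected`, and `resolved` datetimes, and takes MTTD as the mean of detected minus started and MTTR as the mean of resolved minus detected. Organizations define MTTR differently (respond, recover, resolve), so adjust the interval to match your own definition.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_duration(pairs):
    """Average the (end - start) intervals in `pairs` as a timedelta."""
    seconds = [(end - start).total_seconds() for start, end in pairs]
    return timedelta(seconds=mean(seconds))  # raises StatisticsError if empty


def incident_metrics(incidents):
    """Compute MTTD and MTTR under the definitions assumed above."""
    mttd = mean_duration((i["started"], i["detected"]) for i in incidents)
    mttr = mean_duration((i["detected"], i["resolved"]) for i in incidents)
    return {"mttd": mttd, "mttr": mttr}
```

Means are sensitive to a single long-running incident, so reporting the median alongside them may give leadership a more representative picture.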