Logging, Monitoring and Observability in Google Cloud
Course Details:
This three-day instructor-led course teaches participants techniques for monitoring, troubleshooting, and improving infrastructure and application performance in Google Cloud. Guided by the principles of Site Reliability Engineering (SRE), and using a combination of presentations, demos, hands-on labs, and real-world case studies, attendees gain experience with full-stack monitoring, real-time log management and analysis, debugging code in production, tracing application performance bottlenecks, and profiling CPU and memory usage.
Call (919) 283-1674 to get a class scheduled online or in your area!
Module 1: Introduction to Google Cloud Monitoring Tools
- Understand the purpose and capabilities of Google Cloud components: Logging, Monitoring, Error Reporting, and Service Monitoring
- Understand the purpose and capabilities of Google Cloud application performance management-focused components: Debugger, Trace, and Profiler
Module 2: Avoiding Customer Pain
- Construct a monitoring base on the four golden signals: latency, traffic, errors, and saturation
- Measure customer pain with SLIs
- Define critical performance measures
- Create and use SLOs and SLAs
- Achieve developer and operations harmony with error budgets
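As a rough illustration of how the SLI, SLO, and error budget ideas in this module relate, the arithmetic can be sketched in a few lines of Python. This is a minimal sketch, not course material: the function names and the request-based availability SLI are assumptions chosen for the example.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Return the fraction of requests served successfully (a request-based SLI)."""
    if total_requests == 0:
        # Assumption for this sketch: no traffic counts as meeting the objective.
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(good_requests: int, total_requests: int,
                           slo_target: float) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means no budget has been used; 0.0 or less means it is exhausted.
    """
    # The error budget is the share of requests the SLO allows to fail.
    allowed_failures = (1.0 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    if allowed_failures == 0:
        # A 100% target (or zero traffic) leaves no room for any failure.
        return 1.0 if actual_failures == 0 else float("-inf")
    return 1.0 - actual_failures / allowed_failures


if __name__ == "__main__":
    # Hypothetical numbers: 1,000,000 requests, 500 failures, 99.9% SLO.
    print(availability_sli(999_500, 1_000_000))
    print(error_budget_remaining(999_500, 1_000_000, 0.999))
```

With a 99.9% SLO over 1,000,000 requests, roughly 1,000 failures are allowed, so 500 failures leave about half of the budget. While budget remains, teams can take on release risk; once it is spent, reliability work takes priority.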
Module 3: Alerting Policies
- Develop alerting strategies
- Define alerting policies
- Add notification channels
- Identify types of alerts and common uses for each
- Construct and alert on resource groups
- Manage alerting policies programmatically
Module 4: Monitoring Critical Systems
- Choose best practice monitoring project architectures
- Differentiate Cloud IAM roles for monitoring
- Use the default dashboards appropriately
- Build custom dashboards to show resource consumption and application load
- Define uptime checks to track aliveness and latency
Module 5: Configuring Google Cloud Services for Observability
- Integrate logging and monitoring agents into Compute Engine VMs and images
- Enable and utilize Kubernetes Monitoring
- Extend and clarify Kubernetes monitoring with Prometheus
- Expose custom metrics through code, and with the help of OpenCensus
Module 6: Advanced Logging and Analysis
- Identify and choose among resource tagging approaches
- Define log sinks (inclusion filters) and exclusion filters
- Create metrics based on logs
- Define custom metrics
- Link application errors to Logging using Error Reporting
- Export logs to BigQuery
Module 7: Monitoring Network Security and Audit Logs
- Collect and analyze VPC Flow logs and Firewall Rules logs
- Enable and monitor Packet Mirroring
- Explain the capabilities of Network Intelligence Center
- Use Admin Activity audit logs to track changes to the configuration or metadata of resources
- Use Data Access audit logs to track accesses or changes to user-provided resource data
- Use System Event audit logs to track GCP administrative actions
Module 8: Managing Incidents
- Define incident management roles and communication channels
- Mitigate incident impact
- Troubleshoot root causes
- Resolve incidents
- Document incidents in a post-mortem process
Module 9: Investigating Application Performance Issues
- Debug production code to correct code defects
- Trace latency through layers of service interaction to eliminate performance bottlenecks
- Profile and identify resource-intensive functions in an application
Module 10: Optimizing the Costs of Monitoring
- Analyze resource utilization cost for monitoring related components within Google Cloud
- Implement best practices for controlling the cost of monitoring within Google Cloud
*Please Note: Course Outline is subject to change without notice. Exact course outline will be provided at time of registration.
This course teaches participants the following skills:
- Plan and implement a well-architected logging and monitoring infrastructure
- Define Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
- Create effective monitoring dashboards and alerts
- Monitor, troubleshoot, and improve Google Cloud infrastructure
- Analyze and export Google Cloud audit logs
- Find production code defects, identify bottlenecks, and improve performance
- Optimize monitoring costs
To get the most out of this course, participants should have:
- Google Cloud Platform Fundamentals: Core Infrastructure or equivalent experience
- Basic scripting or coding familiarity
- Proficiency with command-line tools
This class is intended for the following participants:
- Cloud architects, administrators, and SysOps personnel
- Cloud developers and DevOps personnel