Advanced Datadog Health Checks to Maximize Results

The Health Check assessment reviews your current Datadog APM, Infrastructure, Logging and Security configuration and reports findings against Datadog and industry best practices, with an overall grade and recommendations for improvement.

The assessment covers areas such as the Agent version, APM dependency checks, the quality of monitors and dashboards, and the most effective use and storage of logs.

Outcomes

A comprehensive report, including a rating of overall alignment with Datadog best practices
Uncover potential areas for improvement in your configuration
Gauge your team's existing knowledge of specific areas of your Datadog platform
Assess the efficiency and effectiveness of your log management
Review your data tagging implementation and recommend improvements to consistency, efficiency, and alignment with Datadog best practices. This is critical to getting the maximum benefit from the platform and from Datadog support
Identify and recommend additional enablement to level up the knowledge of your existing Datadog team
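Part of a tagging review can be automated. The sketch below assumes resources (monitors, hosts, dashboards) have already been exported as dicts with a `name` and a `tags` list of `key:value` strings. The required keys `env`, `service` and `version` follow Datadog's unified service tagging, but the required set and all function names here are hypothetical and should be adapted to your own standard.

```python
from collections import defaultdict

# Hypothetical required keys: env/service/version mirror Datadog's unified
# service tagging; adjust this set to your own tagging standard.
REQUIRED_KEYS = {"env", "service", "version"}


def parse_tags(tags):
    """Split 'key:value' strings into a dict of key -> set of values."""
    parsed = defaultdict(set)
    for tag in tags:
        key, _, value = tag.partition(":")
        parsed[key].add(value)
    return parsed


def audit_tags(resources, required=REQUIRED_KEYS):
    """Return {resource name: sorted missing required keys} for non-compliant items."""
    report = {}
    for res in resources:
        present = set(parse_tags(res.get("tags", [])))
        missing = sorted(required - present)
        if missing:
            report[res["name"]] = missing
    return report


def value_inventory(resources, key):
    """List every distinct value used for `key`, to spot drift such as prod vs production."""
    values = set()
    for res in resources:
        values |= parse_tags(res.get("tags", [])).get(key, set())
    return sorted(values)


if __name__ == "__main__":
    monitors = [
        {"name": "CPU high", "tags": ["env:prod", "service:web", "version:1.2"]},
        {"name": "Disk full", "tags": ["env:production", "team:ops"]},
    ]
    print(audit_tags(monitors))              # {'Disk full': ['service', 'version']}
    print(value_inventory(monitors, "env"))  # ['prod', 'production']
```

Reporting distinct values per key, rather than only missing keys, surfaces inconsistencies such as `env:prod` versus `env:production` that a presence check alone would miss.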

Monitors

Review monitor lists
Alerts assessment
Alert automation potential
Escalations assessment – e.g. critical alerts to on-call, non-critical alerts to email / ticketing system
Identify suboptimally configured monitors to avoid alert fatigue
Notification content quality

Monitor tagging and grouping conventions and consistency
Identification of persistently triggering monitors
Point out any monitors in a “NO DATA” state
Point out any monitors muted indefinitely, and check whether mutes have comments
Check suggested monitors from APM / services
Check recommended monitors (Monitors → Create new, recommended tag)
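The “NO DATA” and indefinite-mute checks can be scripted against an export of your monitors. This is a minimal sketch assuming monitor dicts shaped like the Datadog v1 monitor API response, with an `overall_state` field and an `options.silenced` mapping of scope to end timestamp, where `None` means muted with no end time. Newer setups often mute through downtimes instead, so verify the shape against your own API version; the function names are hypothetical.

```python
def find_no_data(monitors):
    """Names of monitors whose overall_state is 'No Data'."""
    return [m["name"] for m in monitors if m.get("overall_state") == "No Data"]


def find_indefinite_mutes(monitors):
    """Map monitor name -> scopes muted with no end time.

    Assumes the v1-style options.silenced mapping of scope -> end timestamp,
    where None means the scope stays muted until someone unmutes it.
    """
    result = {}
    for m in monitors:
        silenced = (m.get("options") or {}).get("silenced") or {}
        forever = sorted(scope for scope, end in silenced.items() if end is None)
        if forever:
            result[m["name"]] = forever
    return result


if __name__ == "__main__":
    monitors = [
        {"name": "Queue depth", "overall_state": "No Data", "options": {}},
        {"name": "5xx rate", "overall_state": "OK",
         "options": {"silenced": {"*": None}}},
        {"name": "Latency", "overall_state": "Alert",
         "options": {"silenced": {"env:staging": 1735689600}}},
    ]
    print(find_no_data(monitors))           # ['Queue depth']
    print(find_indefinite_mutes(monitors))  # {'5xx rate': ['*']}
```

Mutes with an end timestamp are deliberately ignored, since they expire on their own; only open-ended mutes risk silently hiding real incidents.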

Monitor your infrastructure without having to learn a query language

Logs

Comprehensive log monitor review
Filters and exclusions for indexing aimed at minimising costs while maintaining effective visibility
Retention period assessment
Daily quota review
Use of log metrics within dashboards for improved visibility
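To weigh exclusion filters against visibility, it can help to estimate indexed volume before and after a change. The sketch below is a back-of-the-envelope model, not Datadog's billing logic: each hypothetical filter is a (share of ingested logs matched, share of those excluded) pair, filters are assumed to match non-overlapping traffic, and the daily quota is treated as a hard cap on indexed events.

```python
def estimate_indexed(daily_events, filters, daily_quota=None):
    """Estimate indexed log events per day under a simplified model.

    filters: list of (match_fraction, exclusion_rate) pairs, e.g. (0.4, 0.9)
    means the filter matches 40% of ingested logs and excludes 90% of those.
    Assumes filters match disjoint subsets of traffic (a simplification).
    """
    if sum(fraction for fraction, _ in filters) > 1:
        raise ValueError("match fractions of disjoint filters cannot exceed 1")
    excluded = sum(daily_events * fraction * rate for fraction, rate in filters)
    indexed = daily_events - excluded
    # The daily quota caps how many events the index accepts per day.
    if daily_quota is not None:
        indexed = min(indexed, daily_quota)
    return indexed


if __name__ == "__main__":
    filters = [(0.4, 0.9),  # hypothetical debug logs, 90% excluded
               (0.1, 1.0)]  # hypothetical health checks, fully excluded
    print(estimate_indexed(10_000_000, filters))
    print(estimate_indexed(10_000_000, filters, daily_quota=5_000_000))
```

With 10 million events a day, these two filters leave roughly 5.4 million indexed events; a 5 million daily quota would cap that at 5 million, which signals that either the filters or the quota need adjusting to avoid losing visibility late in the day.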

APM

Assess application, database, and API integrations for data quality, integration coverage, and visibility
Service and application map dependency review
End-to-end trace linkage between monitors and services
Tracing library inspection
Transaction error rate, latency, alert, and telemetry context review
Deployment transport and automation review

Dashboards

Assess how well views and visualizations align with each team or department
Report on dashboard structures, grouping, and tagging
Review dashboard alert settings for appropriate visibility, to avoid alert fatigue and speed up issue response