How to Implement the 4 Golden Signals Alerts in New Relic Using Terraform

Monitoring is essential to maintaining the reliability and performance of modern applications. The Four Golden Signals (Latency, Traffic, Errors, and Saturation) are critical metrics, introduced by Google’s Site Reliability Engineering (SRE) principles, for monitoring system health.

Using New Relic for monitoring and Terraform for Infrastructure as Code (IaC), we can automate the deployment of alerts based on these four signals, ensuring proactive issue detection and faster resolution.

This article will guide you through implementing New Relic alerts for the Four Golden Signals using Terraform.


Prerequisites

Before you begin, ensure you have:

  1. A New Relic account (you will need its account ID) and a User API key
  2. Terraform installed (>=1.0.0)
  3. The New Relic Terraform provider configured

If you haven’t configured Terraform with New Relic before, create a file called provider.tf and add the following:

terraform {
  required_providers {
    newrelic = {
      source  = "newrelic/newrelic"
      version = "~> 2.0"
    }
  }
}

provider "newrelic" {
  account_id = var.newrelic_account_id
  api_key    = var.newrelic_api_key
  region     = "US" # use "EU" for accounts hosted in the EU region
}
        

Define the required variables in variables.tf:

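variable "newrelic_account_id" {
  type        = number
  description = "Numeric ID of the New Relic account that owns the alerts."
}

variable "newrelic_api_key" {
  type        = string
  description = "New Relic User API key used by the provider."
  sensitive   = true # keeps the key out of plan and apply output
}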
        

And in terraform.tfvars (keep this file out of version control, since it holds a secret):

newrelic_account_id = "YOUR_NEW_RELIC_ACCOUNT_ID"
newrelic_api_key    = "YOUR_NEW_RELIC_API_KEY"
        

Step 1: Create an Alert Policy

New Relic requires an alert policy to group related alerts. Let’s create one for Golden Signals Alerts in alerts.tf:

resource "newrelic_alert_policy" "golden_signals" {
  name                = "Golden Signals Alerts"
  incident_preference = "PER_POLICY"
}
        

With the PER_POLICY incident preference, all violations raised by conditions in this policy are grouped into a single open incident, rather than one incident per condition.
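The incident preference is the main setting on a policy. As a minimal sketch, assuming you want to choose it per environment, the value could be exposed as a variable. The variable name `incident_preference` is hypothetical and not part of the setup above; the three allowed values are the ones New Relic accepts for alert policies.

```hcl
# Hypothetical variable (an assumption, not in the original configuration)
# for choosing how violations are rolled up into incidents.
variable "incident_preference" {
  type        = string
  default     = "PER_POLICY"
  description = "How New Relic groups violations into incidents."

  validation {
    condition = contains(
      ["PER_POLICY", "PER_CONDITION", "PER_CONDITION_AND_TARGET"],
      var.incident_preference
    )
    error_message = "Must be PER_POLICY, PER_CONDITION, or PER_CONDITION_AND_TARGET."
  }
}
```

The policy would then set `incident_preference = var.incident_preference`. PER_CONDITION opens one incident per condition, and PER_CONDITION_AND_TARGET opens one per condition and entity, which is noisier but easier to route.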


Step 2: Create Alert Conditions for the Four Golden Signals

1. Latency (Response Time)

Latency measures how long requests take to complete. We can monitor response times using an APM metric condition.

resource "newrelic_alert_condition" "latency" {
  policy_id       = newrelic_alert_policy.golden_signals.id
  name            = "High Response Time"
  type            = "apm_app_metric"
  entities        = ["YOUR_APPLICATION_ID"]
  metric          = "response_time_web"
  condition_scope = "application"

  term {
    duration      = 5 # minutes
    operator      = "above"
    priority      = "critical"
    # 2 seconds. This metric's threshold is given in seconds, not
    # milliseconds; confirm the value in the New Relic UI after applying.
    threshold     = 2
    time_function = "all"
  }
}
        

This alert will trigger if the average response time exceeds 2 seconds for 5 minutes.
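Hard-coding the application ID works, but it has to be looked up by hand. As a sketch, assuming the application already reports to New Relic APM, the provider's `newrelic_entity` data source can resolve the ID from the application name. The name `YOUR_APPLICATION_NAME` and the local `apm_application_ids` are placeholders introduced here for illustration.

```hcl
# Look up the APM application by name instead of pasting its numeric ID.
data "newrelic_entity" "app" {
  name   = "YOUR_APPLICATION_NAME" # placeholder: the name shown in APM
  type   = "APPLICATION"
  domain = "APM"
}

locals {
  # Hypothetical helper so every APM condition can share the same lookup.
  apm_application_ids = [data.newrelic_entity.app.application_id]
}
```

Each APM condition could then use `entities = local.apm_application_ids` in place of the hard-coded placeholder.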


2. Traffic (Request Throughput)

Traffic measures the number of incoming requests per minute (RPM).

resource "newrelic_alert_condition" "traffic" {
  policy_id       = newrelic_alert_policy.golden_signals.id
  name            = "Low Traffic"
  type            = "apm_app_metric"
  entities        = ["YOUR_APPLICATION_ID"]
  metric          = "throughput_web"
  condition_scope = "application"

  term {
    duration      = 5 # minutes
    operator      = "below"
    priority      = "critical"
    threshold     = 10 # Alert if traffic drops below 10 RPM
    time_function = "all"
  }
}
        

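This ensures we are alerted if the application receives fewer than 10 requests per minute for 5 minutes, which usually points to an outage upstream or a routing problem rather than a quiet period.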


3. Errors (Error Rate)

Monitoring error rates helps detect increasing failures in your application.

resource "newrelic_alert_condition" "errors" {
  policy_id       = newrelic_alert_policy.golden_signals.id
  name            = "High Error Rate"
  type            = "apm_app_metric"
  entities        = ["YOUR_APPLICATION_ID"]
  metric          = "error_percentage"
  condition_scope = "application"

  term {
    duration      = 5 # minutes
    operator      = "above"
    priority      = "critical"
    threshold     = 5 # Alert if errors exceed 5%
    time_function = "all"
  }
}
        

This alert triggers if the error rate stays above 5% for 5 minutes.
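A condition can carry more than one `term` block, so a lower warning threshold can sit next to the critical one. The variant below is a sketch of that idea, meant as an alternative to the error-rate condition rather than an addition to it. The resource name `errors_with_warning` and the 2% warning threshold are illustrative assumptions.

```hcl
# Alternative error-rate condition with a warning term and a critical term.
resource "newrelic_alert_condition" "errors_with_warning" {
  policy_id       = newrelic_alert_policy.golden_signals.id
  name            = "High Error Rate"
  type            = "apm_app_metric"
  entities        = ["YOUR_APPLICATION_ID"]
  metric          = "error_percentage"
  condition_scope = "application"

  # Early signal: assumed 2% threshold, adjust to your baseline.
  term {
    duration      = 5
    operator      = "above"
    priority      = "warning"
    threshold     = 2
    time_function = "all"
  }

  # Same critical threshold as the condition above.
  term {
    duration      = 5
    operator      = "above"
    priority      = "critical"
    threshold     = 5
    time_function = "all"
  }
}
```

Both terms must use the same operator direction to make sense together: the warning fires first as the error rate climbs, and the critical term escalates if it keeps rising.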


4. Saturation (CPU Utilization)

Saturation refers to resource exhaustion, often represented by CPU or memory usage.

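# Infrastructure metrics use the dedicated newrelic_infra_alert_condition
# resource; newrelic_alert_condition does not accept type "infra_metric".
resource "newrelic_infra_alert_condition" "saturation" {
  policy_id  = newrelic_alert_policy.golden_signals.id
  name       = "High CPU Utilization"
  type       = "infra_metric"
  event      = "SystemSample"
  select     = "cpuPercent"
  comparison = "above"
  # Placeholder filter: narrow the condition to the hosts you care about.
  where      = "(hostname LIKE '%YOUR_HOSTNAME_PATTERN%')"

  critical {
    duration      = 5  # minutes
    value         = 90 # Alert if CPU usage exceeds 90%
    time_function = "all"
  }
}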
        

This alert will trigger if CPU usage exceeds 90% for 5 minutes.
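CPU is only one face of saturation. As a sketch of the memory side mentioned above, a second infrastructure condition could watch memory usage. It assumes the hosts report `SystemSample` events with a `memoryUsedPercent` attribute; the resource name `memory_saturation`, the hostname filter, and the 90% threshold are illustrative.

```hcl
# Assumed companion condition for memory pressure (not in the original article).
resource "newrelic_infra_alert_condition" "memory_saturation" {
  policy_id  = newrelic_alert_policy.golden_signals.id
  name       = "High Memory Utilization"
  type       = "infra_metric"
  event      = "SystemSample"
  select     = "memoryUsedPercent" # assumption: attribute reported by the infra agent
  comparison = "above"
  where      = "(hostname LIKE '%YOUR_HOSTNAME_PATTERN%')" # placeholder filter

  critical {
    duration      = 5  # minutes
    value         = 90 # percent of memory in use
    time_function = "all"
  }
}
```

Keeping CPU and memory as separate conditions makes it clear from the incident title which resource is running out.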


Step 3: Configure Alert Notifications

To receive notifications, we must configure a notification channel, such as Slack, email, or PagerDuty. Here’s how to set up a Slack notification channel:

# newrelic_alert_channel is the resource that accepts a config block and
# can be attached with newrelic_alert_policy_channel.
resource "newrelic_alert_channel" "slack" {
  name = "Slack Alerts"
  type = "slack"

  config {
    url = "YOUR_SLACK_WEBHOOK_URL"
  }
}

Now, link it to our alert policy:

resource "newrelic_alert_policy_channel" "golden_signals_slack" {
  policy_id   = newrelic_alert_policy.golden_signals.id
  channel_ids = [newrelic_alert_channel.slack.id]
}
        

Step 4: Deploy the Configuration

Once the configuration is complete, apply it using Terraform:

terraform init
terraform plan
terraform apply
        

Terraform will create the alerts in New Relic, ensuring automatic monitoring of the 4 Golden Signals.
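To confirm what was created without opening the New Relic UI, an output can expose the policy ID after the apply. This is a small optional sketch; the output name `golden_signals_policy_id` is an arbitrary choice.

```hcl
# Optional: print the ID of the policy once terraform apply finishes.
output "golden_signals_policy_id" {
  description = "ID of the Golden Signals alert policy."
  value       = newrelic_alert_policy.golden_signals.id
}
```

After applying, `terraform output golden_signals_policy_id` prints the value, which is handy when other tooling needs to reference the policy.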


Conclusion

By implementing New Relic alerts with Terraform, you can proactively monitor application health based on the Four Golden Signals:

Latency - Detects slow response times

Traffic - Monitors request throughput

Errors - Alerts on increased error rates

Saturation - Tracks high CPU usage

Using Infrastructure as Code (IaC) ensures that your alerting setup is consistent, repeatable, and version-controlled.

Start monitoring your application effectively with New Relic and Terraform today! 🚀


Article by Elison G.