Monitoring & Logging

Prometheus & Grafana
ELK Stack (Elasticsearch, Logstash, Kibana)
Datadog
New Relic

Prometheus & Grafana

Prometheus CLI Commands

Check Prometheus version

prometheus --version

Start Prometheus

prometheus --config.file=prometheus.yml

Reload Prometheus configuration (requires Prometheus to be started with --web.enable-lifecycle)

curl -X POST http://localhost:9090/-/reload

List active targets

curl http://localhost:9090/api/v1/targets
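
The targets endpoint returns JSON, so unhealthy targets can be picked out in a script. The sketch below assumes the documented response shape for /api/v1/targets (a data.activeTargets list whose entries carry health, scrapeUrl, and lastError). The function name and the sample payload are illustrative only, not output from a real server.

```python
import json

def unhealthy_targets(payload: dict) -> list[tuple[str, str]]:
    """Return (scrapeUrl, lastError) for every active target whose health is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (t.get("scrapeUrl", ""), t.get("lastError", ""))
        for t in targets
        if t.get("health") != "up"
    ]

# Hypothetical sample, trimmed to the fields used above.
sample = json.loads("""
{"status": "success", "data": {"activeTargets": [
  {"scrapeUrl": "http://localhost:9100/metrics", "health": "up", "lastError": ""},
  {"scrapeUrl": "http://kube-state-metrics:8080/metrics", "health": "down",
   "lastError": "connection refused"}
]}}
""")

for url, err in unhealthy_targets(sample):
    print(f"DOWN {url}: {err}")
```

In practice the payload would come from the curl command above (for example by piping its output into the script).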

Query Prometheus API

curl "http://localhost:9090/api/v1/query?query=up"
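
A bare query such as up needs no escaping, but most PromQL expressions contain braces, brackets, or quotes that must be URL-encoded before they go into the query parameter. A minimal sketch using only the standard library; build_query_url is a hypothetical helper name, and the base URL is the local default used throughout this sheet.

```python
from urllib.parse import urlencode

def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL with the PromQL expression safely encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"

print(build_query_url("http://localhost:9090", "up"))
print(build_query_url("http://localhost:9090", "sum(rate(http_requests_total[5m]))"))
```

With curl itself, the equivalent is to pass -G together with --data-urlencode 'query=<expression>'.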

View Prometheus metrics

curl http://localhost:9090/metrics

List active alerts

curl http://localhost:9090/api/v1/alerts

Prometheus Configuration (prometheus.yml)

scrape_configs:
  - job_name: 'node'
    static_configs:
    - targets: ['localhost:9100']
  - job_name: 'kubernetes'
    static_configs:
    - targets: ['kube-state-metrics:8080']

Prometheus Alertmanager Configuration (alertmanager.yml)

route:
  receiver: 'slack'

receivers:
  - name: 'slack'
    slack_configs:
    - channel: '#alerts'
      send_resolved: true
      api_url: 'https://hooks.slack.com/services/your_webhook_url'

Useful PromQL Queries

Check target availability

up

CPU usage

100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
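
The arithmetic behind this expression can be sketched in a few lines. This is a simplification under stated assumptions: the real rate() also extrapolates to the window edges and handles counter resets, and the function name and sample numbers are made up for illustration.

```python
def cpu_usage_percent(idle_deltas: list[float], window_seconds: float) -> float:
    """Approximate the PromQL expression for one instance.

    idle_deltas: growth of each core's idle-seconds counter over the window.
    """
    # rate(...): per-second increase of each core's idle counter.
    idle_rates = [delta / window_seconds for delta in idle_deltas]
    # avg by(instance): average the per-core idle fractions.
    avg_idle = sum(idle_rates) / len(idle_rates)
    # 100 - idle% leaves the busy percentage.
    return 100 - avg_idle * 100

# Two cores that were idle for 240 s and 270 s of a 300 s (5m) window.
print(cpu_usage_percent([240.0, 270.0], 300.0))
```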

Memory usage

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

HTTP request count

sum(rate(http_requests_total[5m]))

Disk space usage

100 - (node_filesystem_free_bytes / node_filesystem_size_bytes * 100)

Active Kubernetes pods

sum(kube_pod_status_phase{phase="Running"})

Grafana CLI Commands

Check Grafana version

grafana-server -v

Start Grafana

systemctl start grafana-server

Stop Grafana

systemctl stop grafana-server

Restart Grafana

systemctl restart grafana-server

Enable Grafana on boot

systemctl enable grafana-server

Grafana API Commands

List dashboards

curl -X GET "http://localhost:3000/api/search?query=&type=dash-db" -H "Authorization: Bearer <API_TOKEN>"

Create dashboard

curl -X POST http://localhost:3000/api/dashboards/db -H "Content-Type: application/json" -H "Authorization: Bearer <API_TOKEN>" --data '@dashboard.json'

Add Prometheus data source

curl -X POST http://localhost:3000/api/datasources -H "Content-Type:application/json" -H "Authorization: Bearer <API_TOKEN>" --data '{ "name":"Prometheus", "type": "prometheus", "url": "http://localhost:9090", "access":"proxy" }'

List all users

curl -X GET http://localhost:3000/api/users -u "<ADMIN_USER>:<ADMIN_PASSWORD>"

Integrating Prometheus with Grafana

Add Prometheus Data Source in Grafana

- Grafana → Configuration → Data Sources → Add Prometheus
- Set URL to http://localhost:9090
- Click Save & Test

Import a Prebuilt Dashboard

- Grafana → Dashboards → Import
- Enter Dashboard ID from Grafana Repository
- Select Prometheus as data source → Import

Create a New Dashboard

- Grafana → Dashboards → New Dashboard
- Add PromQL queries for visualization
- Choose Panel Type (Graph, Gauge, Table, etc.)
- Click Save Dashboard

Grafana Alerting Setup

Create Alert in Grafana

- Open Dashboard → Edit Panel
- Click Alert → Create Alert
- Set Condition (e.g., CPU usage > 80%)
- Define Evaluation Interval (e.g., Every 1 min)
- Configure Notification Channels (Slack, Email, PagerDuty, etc.)
- Click Save Alert

Configure Slack Alerts in Grafana

- Grafana → Alerting → Notification Channels
- Click Add New Channel
- Select Slack, enter Webhook URL
- Click Save & Test

ELK Stack (Elasticsearch, Logstash, Kibana)

1. Elasticsearch Commands

Elasticsearch CLI Commands

Check Elasticsearch version

elasticsearch --version

Start Elasticsearch

systemctl start elasticsearch

Stop Elasticsearch

systemctl stop elasticsearch

Restart Elasticsearch

systemctl restart elasticsearch

Enable Elasticsearch on boot

systemctl enable elasticsearch

Check Elasticsearch status

curl -X GET "http://localhost:9200"

Cluster health

curl -X GET "http://localhost:9200/_cluster/health?pretty"
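
The health response carries a status field of green, yellow, or red, which maps naturally onto a check script's exit code. A minimal sketch, assuming the usual 0/1/2 (ok/warning/critical) convention; the function name and the sample response are illustrative, trimmed to a few of the fields the endpoint returns.

```python
def health_exit_code(health: dict) -> int:
    """Map cluster status to an exit code: green=0, yellow=1, anything else=2."""
    return {"green": 0, "yellow": 1}.get(health.get("status"), 2)

# Hypothetical response from /_cluster/health on a single-node cluster:
# replicas cannot be assigned, so the status is yellow.
sample = {"cluster_name": "elasticsearch", "status": "yellow",
          "number_of_nodes": 1, "unassigned_shards": 5}

code = health_exit_code(sample)
print(f"status={sample['status']} unassigned={sample['unassigned_shards']} exit={code}")
```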

List cluster nodes

curl -X GET "http://localhost:9200/_cat/nodes?v"

List all indices

curl -X GET "http://localhost:9200/_cat/indices?v"

Delete an index

curl -X DELETE "http://localhost:9200/index_name"

Index Management

Create an index

curl -X PUT "http://localhost:9200/index_name"

Search index

curl -X GET "http://localhost:9200/index_name/_search?pretty"

Insert a document

curl -X PUT "http://localhost:9200/index_name/_doc/1" -H "Content-Type: application/json" -d '{"name": "DevOps"}'

Retrieve a document

curl -X GET "http://localhost:9200/index_name/_doc/1?pretty"

Delete a document

curl -X DELETE "http://localhost:9200/index_name/_doc/1"

Elasticsearch Query Examples

Search for 'DevOps'

curl -X GET "http://localhost:9200/index_name/_search?q=name:DevOps&pretty"

Query using JSON

curl -X GET "http://localhost:9200/index_name/_search" -H "Content-Type: application/json" -d '{ "query": { "match": { "name": "DevOps" } } }'

Get all documents (first 10 hits)

curl -X GET "http://localhost:9200/index_name/_search?pretty" -H "Content-Type: application/json" -d '{ "size": 10, "query": { "match_all": {} } }'
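
A match_all search returns one page of hits, controlled by size. To walk further into the results, the from parameter sets the offset. A minimal sketch of building such request bodies; page_query is a hypothetical helper, and by default Elasticsearch limits from + size to 10,000 (index.max_result_window), so this approach suits shallow paging only.

```python
import json

def page_query(page: int, page_size: int = 10) -> dict:
    """Build a match_all search body for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "from": page * page_size,   # offset of the first hit to return
        "size": page_size,          # number of hits per page
        "query": {"match_all": {}},
    }

# Body for the third page of 10 hits; pass it to curl with -d.
print(json.dumps(page_query(2)))
```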

2. Logstash Commands

Logstash CLI Commands

Check Logstash version

logstash --version

Start Logstash with a config file

logstash -f /etc/logstash/logstash.conf

Start Logstash

systemctl start logstash

Stop Logstash

systemctl stop logstash

Restart Logstash

systemctl restart logstash

Enable Logstash on boot

systemctl enable logstash

Sample Logstash Configuration (logstash.conf)

input {
  file {
    path => "/var/log/syslog" 
    start_position => "beginning"
  }
}

filter {
  grok {
    match => {
      "message" => "%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:host} %{DATA:process}: %{GREEDYDATA:log_message}"
    }
  }
}

output { 
  elasticsearch {
    hosts => ["http://localhost:9200"] 
    index => "logstash-logs"
  }
  stdout { codec => rubydebug }
}
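
To see which fields the grok pattern above would extract, it helps to try an equivalent regular expression on a sample line. The sketch below is an approximation, not the grok engine: the regex stands in for SYSLOGTIMESTAMP, HOSTNAME, DATA, and GREEDYDATA, and the sample log line is invented.

```python
import re

# Rough stand-in for:
# %{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:host} %{DATA:process}: %{GREEDYDATA:log_message}
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>[\w.-]+) "
    r"(?P<process>.*?): "          # DATA is a lazy match
    r"(?P<log_message>.*)$"        # GREEDYDATA takes the rest of the line
)

def parse_syslog(line: str):
    """Return the extracted fields as a dict, or None if the line does not match."""
    match = SYSLOG_LINE.match(line)
    return match.groupdict() if match else None

sample = "Mar  3 14:07:21 web-01 sshd[1042]: Accepted publickey for deploy from 10.0.0.5"
print(parse_syslog(sample))
```

When a line does not match in Logstash itself, the event is kept and tagged _grokparsefailure rather than dropped.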

3. Kibana Commands

Kibana CLI Commands

Check Kibana version

kibana --version

Start Kibana

systemctl start kibana

Stop Kibana

systemctl stop kibana

Restart Kibana

systemctl restart kibana

Enable Kibana on boot

systemctl enable kibana

Kibana API Commands

Check Kibana status

curl -X GET "http://localhost:5601/api/status" -H "kbn-xsrf: true"

List all Kibana spaces

curl -X GET "http://localhost:5601/api/spaces/space" -H "kbn-xsrf: true"

Kibana Dashboard Import Example

curl -X POST "http://localhost:5601/api/saved_objects/_import" -H "kbn-xsrf: true" --form file=@dashboard.ndjson

4. Integrating ELK Stack

1. Configure Elasticsearch in Kibana

- Go to Kibana → Management → Stack Management → Data Views
- Click Create Data View
- Set Index Pattern as logstash-*
- Click Save

2. Configuring Logstash to Send Logs to Elasticsearch

Open /etc/logstash/logstash.conf
Ensure the output points to Elasticsearch:

output { 
  elasticsearch {
    hosts => ["http://localhost:9200"] 
    index => "logstash-logs"
  }
}

Restart Logstash:

systemctl restart logstash

5. Visualizing Logs in Kibana

- Go to Kibana → Discover
- Select logstash-* Data View
- Apply Filters & View Logs

6. ELK Stack Monitoring

Monitor Elasticsearch Cluster Health

Get Cluster health

curl -X GET "http://localhost:9200/_cluster/health?pretty"

Get Node health

curl -X GET "http://localhost:9200/_cat/nodes?v"

Get index health

curl -X GET "http://localhost:9200/_cat/indices?v"

Monitor Logstash Logs

tail -f /var/log/logstash/logstash-plain.log

journalctl -u logstash -f

Monitor Kibana Logs

tail -f /var/log/kibana/kibana.log

journalctl -u kibana -f

Datadog

Datadog Agent Installation (Linux)

DD_API_KEY=your_api_key bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh)"

Enable Log Monitoring

sudo nano /etc/datadog-agent/datadog.yaml

logs_enabled: true

systemctl restart datadog-agent

Datadog Agent CLI Commands

Check Datadog agent version

datadog-agent version

Check Datadog agent status

datadog-agent status

Run a specific check

datadog-agent check <integration>

Gather logs and configuration for troubleshooting

datadog-agent flare

Start Datadog agent

systemctl start datadog-agent

Stop Datadog agent

systemctl stop datadog-agent

Restart Datadog agent

systemctl restart datadog-agent

Enable Datadog agent on boot

systemctl enable datadog-agent

Metric Queries

CPU usage

avg:system.cpu.user{*}

Top 5 disk users

top(avg:system.disk.used{*} by {host}, 5, 'mean', 'desc')

Datadog API Commands

List all metrics

curl -X GET "https://api.datadoghq.com/api/v1/metrics?from=<UNIX_TIMESTAMP>" -H "DD-API-KEY: <API_KEY>" -H "DD-APPLICATION-KEY: <APP_KEY>"

Submit a custom metric

curl -X POST "https://api.datadoghq.com/api/v1/series" \
-H "DD-API-KEY: <API_KEY>" \
-H "Content-Type: application/json" \
--data '{ "series": [{ "metric": "custom.metric", "points": [[1633000000, 10]],"type": "gauge", "tags": ["env:prod"] }] }'

Datadog Configuration

/etc/datadog-agent/datadog.yaml

api_key: "<YOUR_API_KEY>"
site: "datadoghq.com"
apm_config:
  enabled: true

Datadog Log Collection Setup

/etc/datadog-agent/datadog.yaml

logs_enabled: true

systemctl restart datadog-agent

Monitor Logs in Datadog UI

- Go to Datadog → Logs → Live Tail
- Filter logs by service, environment, or host

Datadog Monitoring Commands

Check configuration validity

datadog-agent configcheck

Get the hostname recognized by Datadog

datadog-agent hostname

Check agent health

datadog-agent health

Datadog Kubernetes Agent Installation

kubectl create secret generic datadog-secret --from-literal=api-key=<YOUR_API_KEY>

kubectl apply -f https://raw.githubusercontent.com/DataDog/datadog-agent/main/Dockerfiles/manifests/agent.yaml

Datadog Kubernetes Monitoring

List Datadog agent pods

kubectl get pods -n datadog

Check logs of a Datadog agent pod

kubectl logs -n datadog <pod-name>

Describe Datadog agent pod

kubectl describe pod <pod-name> -n datadog

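Datadog Integrations for DevOps

Note: datadog-agent integration install expects a package name and version in the form datadog-<INTEGRATION>==<VERSION>, and -t marks a third-party (community) package. Core checks such as Docker and Kubernetes ship with the Agent, and AWS is set up from the Datadog UI, so verify the commands below against your Agent version.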

Install Docker integration

datadog-agent integration install -t docker

Install Kubernetes integration

datadog-agent integration install -t kubernetes

Install AWS integration

datadog-agent integration install -t aws

Install Prometheus integration

datadog-agent integration install -t prometheus

Install Jenkins integration

datadog-agent integration install -t jenkins

Install GitLab integration

datadog-agent integration install -t gitlab

Datadog Log Collection for Docker

docker run -d --name datadog-agent \
-e DD_API_KEY=<YOUR_API_KEY> \
-e DD_LOGS_ENABLED=true \
-e DD_CONTAINER_EXCLUDE="name:datadog-agent" \
-v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro \
-v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
datadog/agent

Datadog APM (Application Performance Monitoring)

Check trace agent status

datadog-agent trace-agent status

Show trace agent configuration

datadog-agent trace-agent config

Restart the trace agent

datadog-agent trace-agent restart

Datadog CI/CD Monitoring

Track CI/CD pipeline duration

curl -X POST "https://api.datadoghq.com/api/v1/series" \
-H "DD-API-KEY: <API_KEY>" \
-H "Content-Type: application/json" \
--data '{ "series": [{ "metric": "ci.pipeline.duration", "points": [[1633000000, 30]], "type": "gauge", "tags": ["pipeline:deploy"] }] }'

Datadog Synthetic Monitoring (API Tests)

curl -X POST "https://api.datadoghq.com/api/v1/synthetics/tests" \
-H "DD-API-KEY: <API_KEY>" \
-H "DD-APPLICATION-KEY: <APP_KEY>" \
-H "Content-Type: application/json" \
--data '{
  "config": { "request": { "method": "GET", "url": "https://example.com" },
  "assertions": [{ "operator": "is", "type": "statusCode", "target": 200 }] },
  "locations": ["aws:us-east-1"],
  "message": "Website should be reachable",
  "name": "Website Availability Test",
  "options": { "monitor_options": { "renotify_interval": 0 } },
  "tags": ["env:prod"],
  "type": "api"
}'

Datadog Dashboard & Alerts

Create a new dashboard

curl -X POST "https://api.datadoghq.com/api/v1/dashboard" \
-H "Content-Type: application/json" \
-H "DD-API-KEY: <API_KEY>" \
-H "DD-APPLICATION-KEY: <APP_KEY>" \
--data '{
 "title": "DevOps Dashboard",
 "layout_type": "ordered",
 "widgets": [
    {
       "definition": {
         "type": "timeseries", 
         "requests": [
            { "q": "avg:system.cpu.user{*}" }
         ]
       }
    }
 ]
}'

Create an alert

curl -X POST "https://api.datadoghq.com/api/v1/monitor" \
-H "Content-Type: application/json" \
-H "DD-API-KEY: <API_KEY>" \
-H "DD-APPLICATION-KEY: <APP_KEY>" \
--data '{
  "name": "High CPU Usage", 
  "type": "query alert",
  "query": "avg(last_5m):avg:system.cpu.user{*} > 80", 
  "message": "CPU usage is too high!",
  "tags": ["env:prod"]
}'
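
The monitor query packs several choices into one string: a time aggregation over a window, a space aggregation, the metric with its scope, and the threshold comparison. A small sketch that assembles the string from its parts; metric_monitor_query is a hypothetical helper, and the defaults reproduce the query used in the alert above.

```python
def metric_monitor_query(metric: str, threshold: float, window: str = "last_5m",
                         scope: str = "*", time_aggr: str = "avg",
                         space_aggr: str = "avg", comparator: str = ">") -> str:
    """Assemble '<time_aggr>(<window>):<space_aggr>:<metric>{<scope>} <cmp> <threshold>'."""
    return f"{time_aggr}({window}):{space_aggr}:{metric}{{{scope}}} {comparator} {threshold}"

print(metric_monitor_query("system.cpu.user", 80))
print(metric_monitor_query("system.cpu.user", 90, window="last_15m", scope="env:prod"))
```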

Datadog Incident Management

curl -X POST "https://api.datadoghq.com/api/v2/incidents" \
-H "Content-Type: application/json" \
-H "DD-API-KEY: <API_KEY>" \
-H "DD-APPLICATION-KEY: <APP_KEY>" \
--data '{
  "data": {
  "type": "incidents",
  "attributes": {
    "title": "Production Outage",
    "customer_impact_scope": "global",
    "customer_impact_duration": 30,
    "severity": "critical",
    "state": "active",
    "commander": "DevOps Team"
    }
  }
}'

New Relic

Install New Relic Agent

For Linux Servers

curl -Ls https://download.newrelic.com/install/newrelic-cli/scripts/install.sh | bash && sudo NEW_RELIC_API_KEY=<API_KEY> NEW_RELIC_ACCOUNT_ID=<ACCOUNT_ID> /usr/local/bin/newrelic install

Query Logs & Metrics

NRQL Queries (New Relic Query Language)

SELECT average(cpuPercent) FROM SystemSample SINCE 30 minutes ago

SELECT count(*) FROM Transaction WHERE appName = 'my-app'

New Relic Agent CLI Commands

Check New Relic agent version

newrelic-infra -version

Start New Relic infrastructure agent

systemctl start newrelic-infra

Stop New Relic infrastructure agent

systemctl stop newrelic-infra

Restart New Relic infrastructure agent

systemctl restart newrelic-infra

Enable agent on boot

systemctl enable newrelic-infra

View New Relic agent logs

journalctl -u newrelic-infra -f

New Relic API Commands

List all applications

curl -X GET "https://api.newrelic.com/v2/applications.json" -H "X-Api-Key:<API_KEY>" -H "Content-Type: application/json"

List monitored servers

curl -X GET "https://api.newrelic.com/v2/servers.json" -H "X-Api-Key:<API_KEY>"

Record a deployment

curl -X POST "https://api.newrelic.com/v2/applications/<APP_ID>/deployments.json" -H "X-Api-Key:<API_KEY>" -H "Content-Type: application/json" -d '{ "deployment": { "revision": "1.0.1", "description": "New deployment", "user": "DevOps Team" } }'

New Relic Configuration

/etc/newrelic-infra.yml

license_key: "<YOUR_LICENSE_KEY>"
log_file: /var/log/newrelic-infra.log
custom_attributes:
  environment: production

New Relic Log Monitoring Setup

Enable Log Forwarding

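Create /etc/newrelic-infra/logging.d/logging.yml (the infrastructure agent reads log forwarding settings from the logging.d directory, not from newrelic-infra.yml):

logs:
  - name: syslog
    file: /var/log/syslog
  - name: nginx-access
    file: /var/log/nginx/access.log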

Restart the agent:

systemctl restart newrelic-infra

View Logs in New Relic UI

- Go to New Relic → Logs
- Filter logs by application, environment, or tags

New Relic Monitoring Commands

Check agent status

newrelic-infra --status

Run a diagnostic test

newrelic-infra --test