Docs
Zero to first alert in 20 minutes.
Tracegrid watches your infrastructure and sends AI-explained incident cards to Slack. Pick the install that matches your stack.
Before you start
Prerequisites
- A Tracegrid API key — from app.tracegrid.app → Settings → API Keys.
- A Slack workspace where you can add an incoming webhook.
- One of: Linux server, Kubernetes cluster, Docker host, AWS ECS task, or Azure Container App.
Step 1
Connect Slack
Create an incoming webhook and hand the URL to Tracegrid:
- Open Slack Incoming Webhooks → Create your Slack app → From scratch.
- Name it “Tracegrid”, pick your workspace.
- Enable Incoming Webhooks → Add New Webhook to Workspace → choose a channel (e.g. #incidents).
- Copy the URL (https://hooks.slack.com/services/…) and paste it into Settings → Slack in the dashboard.
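Before pasting the URL into the dashboard, it can help to confirm the webhook works on its own. The sketch below posts a plain text message using Slack's standard incoming-webhook JSON body. The variable SLACK_WEBHOOK_URL and the function names are hypothetical, chosen only for this example, and none of this is part of Tracegrid itself.

```shell
#!/bin/sh
# Build the minimal JSON body Slack incoming webhooks accept: {"text":"..."}.
# Backslashes and double quotes are escaped so the message stays valid JSON.
slack_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text":"%s"}' "$escaped"
}

# POST a test message to the webhook.
# SLACK_WEBHOOK_URL is a hypothetical variable: set it to the URL you copied.
send_slack_test() {
  curl -sS -X POST -H 'Content-Type: application/json' \
    --data "$(slack_payload "$1")" "$SLACK_WEBHOOK_URL"
}

# Only send when a URL has actually been provided.
if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
  send_slack_test "Tracegrid webhook test"
fi
```

Run it with SLACK_WEBHOOK_URL exported in your shell. If the message appears in the channel you chose, the URL is ready to paste into Settings → Slack.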
Step 2
Install the agent
Linux VM (systemd)
curl -sSL https://tracegrid.app/install.sh | bash
Prompts for the API key, backend URL (https://api.tracegrid.app), and host name. Verify with systemctl status tracegrid-agent.
Docker Compose
curl -OL https://tracegrid.app/docker-compose.agent.yml
export TRACEGRID_API_KEY=gw_your_key_here
docker compose -f docker-compose.agent.yml up -d
Monitors host CPU/mem/disk and every container: crashes, OOM kills, health checks, crash loops.
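A quick way to check the Compose install is to confirm that the key was really exported and then look at the container state and recent logs. This is a minimal sketch: the helper key_is_set is a hypothetical name, and the service names inside docker-compose.agent.yml are not assumed, so the commands address the whole file.

```shell
#!/bin/sh
# Return 0 only if TRACEGRID_API_KEY is set and is not the docs placeholder.
key_is_set() {
  case "${TRACEGRID_API_KEY:-}" in
    ''|gw_your_key_here) return 1 ;;
    *) return 0 ;;
  esac
}

# Inspect the stack only when the key, Docker, and the compose file are all present.
if key_is_set && command -v docker >/dev/null 2>&1 && [ -f docker-compose.agent.yml ]; then
  # List the containers defined in the file and their current state.
  docker compose -f docker-compose.agent.yml ps
  # Show the last 20 log lines from each service.
  docker compose -f docker-compose.agent.yml logs --tail 20
fi
```

If ps shows a container restarting, the logs output is usually the first place to look for a rejected key or an unreachable backend.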
Kubernetes (Helm)
helm repo add tracegrid https://charts.tracegrid.app
helm repo update
helm install tracegrid-agent tracegrid/tracegrid-agent \
  --set agent.apiKey=gw_your_key_here \
  --set agent.clusterName=production
To watch only some namespaces, add --set "namespaces.watchOnly={production,staging}". Keep the quotes so the shell does not expand the braces.
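When the namespace list comes from a script, building the --set value programmatically avoids quoting mistakes. The sketch below joins names into Helm's {a,b,c} list syntax. The function watch_only_arg and the APPLY switch are hypothetical names used only in this example, and the upgrade command assumes the release name tracegrid-agent from the install step.

```shell
#!/bin/sh
# Join namespace names into Helm's {a,b,c} list syntax for --set.
watch_only_arg() {
  list=''
  for ns in "$@"; do
    # Prefix a comma only when the list already has an entry.
    list="${list:+$list,}$ns"
  done
  printf 'namespaces.watchOnly={%s}' "$list"
}

# Apply to the existing release only when explicitly requested (APPLY=1).
if [ "${APPLY:-0}" = 1 ] && command -v helm >/dev/null 2>&1; then
  helm upgrade tracegrid-agent tracegrid/tracegrid-agent \
    --reuse-values \
    --set "$(watch_only_arg production staging)"
fi
```

The command substitution is double-quoted so the braces reach Helm unchanged. Afterwards, helm status tracegrid-agent shows whether the release deployed.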
AWS ECS (sidecar)
Add the sidecar to your task definition:
{
  "name": "tracegrid-sidecar",
  "image": "ghcr.io/pradipkhuman/tracegrid-ecs-sidecar:latest",
  "essential": false,
  "environment": [
    { "name": "TRACEGRID_API_KEY", "value": "gw_your_key_here" },
    { "name": "TRACEGRID_SERVICE_NAME", "value": "your-service-name" }
  ],
  "cpu": 64,
  "memory": 64
}
Detects OOM kills via the kernel counter even when the app emits zero logs.
Azure Container Apps (sidecar)
containers:
  - name: tracegrid-sidecar
    image: ghcr.io/pradipkhuman/tracegrid-azure-sidecar:latest
    resources:
      cpu: 0.25
      memory: 0.5Gi
    env:
      - name: TRACEGRID_API_KEY
        secretRef: tracegrid-api-key
Detects spot evictions, maintenance windows, and reboots via Azure IMDS.
Step 3
Verify & trigger a test
Confirm the agent is reporting, then fire a demo incident:
# Linux: tail the agent logs
journalctl -u tracegrid-agent -f
# Trigger a demo incident card in Slack
curl -X GET https://api.tracegrid.app/internal/demo-incident \
  -H "X-Api-Key: your_internal_api_key"
Within seconds a fully formed HIGH_CPU card lands in your Slack channel, with the AI explanation and suggested fix. That’s the whole loop.
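On a Linux VM, the service can take a few moments to come up after install, so it may be worth waiting for it before firing the demo incident. The sketch below polls systemd with a small retry helper. The function retry and the CHECK_AGENT switch are hypothetical names for this example, and only the unit name tracegrid-agent comes from the install step.

```shell
#!/bin/sh
# Run a command until it succeeds: up to $1 attempts, sleeping $2 seconds between tries.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only if another attempt follows.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Poll the systemd unit only when requested (CHECK_AGENT=1) and systemctl exists.
if [ "${CHECK_AGENT:-0}" = 1 ] && command -v systemctl >/dev/null 2>&1; then
  if retry 10 3 systemctl is-active --quiet tracegrid-agent; then
    echo "tracegrid-agent is active"
  else
    echo "tracegrid-agent did not become active" >&2
  fi
fi
```

With ten attempts and a three-second delay, the check gives up after roughly half a minute. Adjust both numbers to suit your host.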
Need a hand? Email support@tracegrid.app or check the system status page.