eBPF-based GPU causal observability agent with MCP server. Traces CUDA Runtime/Driver APIs and host kernel events to build causal chains explaining GPU latency.
Overview
Ingero is an eBPF-based agent that provides production-safe, kernel-level causal tracing for GPU workloads. It answers: "Why is my GPU training/inference slow right now?"
MCP Tools (7)
- get_check — System diagnostics (kernel, BTF, NVIDIA, CUDA, GPU processes)
- get_trace_stats — Per-operation p50/p95/p99 latency stats
- get_causal_chains — Root cause analysis with severity ranking and fix recommendations
- get_stacks — Resolved call stacks (Python source file, function, line number)
- run_sql — Read-only SQL queries against the event database
- run_demo — Run synthetic demo scenarios (no GPU or root needed)
- get_test_report — GPU integration test results
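To make the p50/p95/p99 figures reported by get_trace_stats concrete, here is a minimal nearest-rank percentile sketch in Python. This is illustrative only: Ingero computes these statistics server-side, and the function and sample values below are invented for the example.

```python
# Illustrative only: how p50/p95/p99 summarize per-operation latencies.
# Not Ingero's implementation; names and data are hypothetical.

def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list (p in 0..100)."""
    ordered = sorted(samples)
    # Nearest-rank: the ceil(p/100 * n)-th smallest value, 1-indexed.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]

# Hypothetical cudaMalloc latencies in microseconds; one outlier.
latencies_us = [120, 95, 110, 3500, 130, 105, 115, 98, 125, 102]
stats = {f"p{p}": percentile(latencies_us, p) for p in (50, 95, 99)}
print(stats)  # p50 stays near 110 while p95/p99 expose the 3500 us outlier
```

The tail percentiles are what surface intermittent stalls that an average would hide, which is why the stats are reported per operation rather than as a single mean.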
How It Works
Ingero correlates four data layers (the first three captured via eBPF):
- CUDA Runtime API (libcudart.so) — cudaMalloc, cudaFree, cudaLaunchKernel, cudaStreamSynchronize
- CUDA Driver API (libcuda.so) — cuLaunchKernel, cuMemcpy, cuCtxSynchronize, cuMemAlloc
- Host kernel tracepoints — sched_switch, mm_page_alloc, oom_kill, process lifecycle
- System context from /proc — CPU, memory, load, swap
These layers combine into causal chains: system context + host event → CUDA call → root cause.
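The chain assembly above can be sketched conceptually. This is a hypothetical model of the correlation step, not Ingero's internal representation; every type, field, and threshold below is invented for illustration:

```python
# Hypothetical sketch: link a host kernel event to a CUDA call that
# overlaps it in time, then name a likely root cause. Illustrative only.
from dataclasses import dataclass

@dataclass
class HostEvent:
    kind: str        # e.g. "mm_page_alloc", "sched_switch", "oom_kill"
    pid: int
    ts_ns: int

@dataclass
class CudaCall:
    api: str         # e.g. "cudaMalloc", "cuLaunchKernel"
    pid: int
    ts_ns: int
    latency_us: int

def explain(host_events, call, window_ns=1_000_000):
    """Attribute a slow CUDA call to host events in the same pid and time window."""
    causes = [e for e in host_events
              if e.pid == call.pid and abs(e.ts_ns - call.ts_ns) <= window_ns]
    if any(e.kind == "oom_kill" for e in causes):
        return f"{call.api} slow: OOM-kill pressure on pid {call.pid}"
    if any(e.kind == "mm_page_alloc" for e in causes):
        return f"{call.api} slow: host page allocation during the call"
    return f"{call.api}: no correlated host event"

events = [HostEvent("mm_page_alloc", 42, 1_000_500)]
call = CudaCall("cudaMalloc", 42, 1_000_000, 4200)
print(explain(events, call))
```

The real agent additionally ranks chains by severity and attaches fix recommendations (see get_causal_chains); this sketch only shows the time-window correlation idea.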
Usage
# Start MCP server (stdio — for Claude Code, Cursor, etc.)
ingero mcp --db ~/.ingero/ingero.db
# Start MCP server (HTTPS — for remote clients)
ingero mcp --db ~/.ingero/ingero.db --http :8090
Key Features
- <2% overhead, zero code changes, single binary
- Python 3.10/3.11/3.12 source line attribution via DWARF
- SQLite storage with 10 GB rolling cap
- Kubernetes support (DaemonSet, Helm chart, pod metadata)
- Tested on A10, A100, H100, GH200, RTX 3090, RTX 4090
Server Config
{
  "mcpServers": {
    "ingero": {
      "command": "ingero",
      "args": [
        "mcp"
      ],
      "env": {}
    }
  }
}
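If the event database is not at the default location, the --db flag shown in Usage can be passed through args as well. A sketch (the path below is an example, not a required location):

```json
{
  "mcpServers": {
    "ingero": {
      "command": "ingero",
      "args": ["mcp", "--db", "/var/lib/ingero/ingero.db"],
      "env": {}
    }
  }
}
```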