Ingero

@ingero-io

eBPF-based GPU causal observability agent with MCP server. Traces CUDA Runtime/Driver APIs and host kernel events to build causal chains explaining GPU latency
Overview

Ingero is an eBPF-based agent that provides production-safe, kernel-level causal tracing for GPU workloads. It answers the question: "Why is my GPU training/inference slow right now?"

MCP Tools (7)

  • get_check — System diagnostics (kernel, BTF, NVIDIA, CUDA, GPU processes)
  • get_trace_stats — Per-operation p50/p95/p99 latency stats
  • get_causal_chains — Root cause analysis with severity ranking and fix recommendations
  • get_stacks — Resolved call stacks (Python source file, function, line number)
  • run_sql — Read-only SQL queries against the event database
  • run_demo — Run synthetic demo scenarios (no GPU or root needed)
  • get_test_report — GPU integration test results

How It Works

Ingero traces 4 layers via eBPF:

  1. CUDA Runtime API (libcudart.so) — cudaMalloc, cudaFree, cudaLaunchKernel, cudaStreamSync
  2. CUDA Driver API (libcuda.so) — cuLaunchKernel, cuMemcpy, cuCtxSynchronize, cuMemAlloc
  3. Host kernel tracepoints — sched_switch, mm_page_alloc, oom_kill, process lifecycle
  4. System context from /proc — CPU, memory, load, swap

These layers are correlated into causal chains: system context + host event → CUDA call → root cause.
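As a sketch of that correlation step, the snippet below joins system context, a host kernel event, and a slow CUDA call into one chain record. The event shapes, thresholds, and the `build_chain` helper are all hypothetical, not Ingero's internals:

```python
from dataclasses import dataclass

# Hypothetical records standing in for what the eBPF probes emit.
@dataclass
class HostEvent:
    kind: str        # e.g. "mm_page_alloc", "sched_switch", "oom_kill"
    ts: float

@dataclass
class CudaCall:
    api: str         # e.g. "cudaMalloc", "cuLaunchKernel"
    ts: float
    latency_us: int

def build_chain(ctx: dict, host: HostEvent, call: CudaCall) -> dict:
    """Correlate system context + a host event with a slow CUDA call
    into one causal-chain record (illustrative heuristic only)."""
    root_cause = "undetermined"
    if host.kind == "mm_page_alloc" and ctx["mem_free_mb"] < 512:
        root_cause = "host memory pressure delayed pinned allocation"
    elif host.kind == "sched_switch" and ctx["loadavg_1m"] > ctx["ncpu"]:
        root_cause = "CPU oversubscription preempted the launcher thread"
    return {
        "context": ctx,
        "host_event": host.kind,
        "cuda_call": call.api,
        "latency_us": call.latency_us,
        "root_cause": root_cause,
    }

chain = build_chain(
    {"mem_free_mb": 256, "loadavg_1m": 1.2, "ncpu": 8},
    HostEvent("mm_page_alloc", ts=10.0),
    CudaCall("cudaMalloc", ts=10.2, latency_us=4500),
)
print(chain["root_cause"])  # host memory pressure delayed pinned allocation
```

The point of the chain is the ordering: the host-side condition precedes and explains the CUDA-side latency, rather than the two being reported as unrelated metrics.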

Usage

# Start MCP server (stdio — for Claude Code, Cursor, etc.)
ingero mcp --db ~/.ingero/ingero.db

# Start MCP server (HTTPS — for remote clients)
ingero mcp --db ~/.ingero/ingero.db --http :8090

Key Features

- <2% overhead, zero code changes, single binary
- Python 3.10/3.11/3.12 source line attribution via DWARF
- SQLite storage with 10 GB rolling cap
- Kubernetes support (DaemonSet, Helm chart, pod metadata)
- Tested on A10, A100, H100, GH200, RTX 3090, RTX 4090
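The rolling cap can be pictured as an oldest-first prune on insert. The loop below is a minimal sketch of that policy using a row count instead of a byte size; the `events` schema and cap value are illustrative, not Ingero's actual pruning logic:

```python
import sqlite3

CAP_ROWS = 5  # stand-in for the 10 GB byte cap; rows are easier to demo

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, op TEXT, ts REAL)")

def insert_event(op: str, ts: float) -> None:
    db.execute("INSERT INTO events (op, ts) VALUES (?, ?)", (op, ts))
    # Rolling cap: evict the oldest rows once the table exceeds the cap.
    (n,) = db.execute("SELECT COUNT(*) FROM events").fetchone()
    if n > CAP_ROWS:
        db.execute(
            "DELETE FROM events WHERE id IN "
            "(SELECT id FROM events ORDER BY ts ASC LIMIT ?)",
            (n - CAP_ROWS,),
        )

for i in range(8):
    insert_event("cudaLaunchKernel", float(i))

remaining = [t for (t,) in db.execute("SELECT ts FROM events ORDER BY ts")]
print(remaining)  # [3.0, 4.0, 5.0, 6.0, 7.0]
```

Only the newest events survive, which keeps the database bounded while preserving the window most relevant to "slow right now" questions.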

Server Config

{
  "mcpServers": {
    "ingero": {
      "command": "ingero",
      "args": [
        "mcp"
      ],
      "env": {}
    }
  }
}