#pytorch
2 results found
Ingero
eBPF-based GPU causal observability agent with an MCP server. Traces CUDA Runtime/Driver API calls and host kernel events to build causal chains that explain GPU latency.
Forge - GPU Kernel Optimization
Turn slow PyTorch into fast CUDA/Triton kernels. 32 parallel swarm agents optimize your code on real datacenter GPUs (B200, H200, H100, A100) with up to 14x speedup over torch.compile.