#pytorch

2 results found

Ingero

eBPF-based GPU causal observability agent with an MCP server. Traces CUDA Runtime/Driver API calls and host kernel events to build causal chains that explain GPU latency.

Forge - GPU Kernel Optimization

Turn slow PyTorch code into fast CUDA/Triton kernels. 32 parallel swarm agents optimize your code on real datacenter GPUs (B200, H200, H100, A100), with up to a 14x speedup over torch.compile.
