Atlantis
Open source community
Self-hosted Terraform pull-request automation. Runs plan and apply from VCS webhooks with no per-resource SaaS billing. Apache 2.0; operators supply compute, state backend, and platform engineering capacity.
A curated directory for senior SREs, infrastructure leads, and platform teams. Every tool is organized by lifecycle stage — plus a DevEx layer for AI coding assistants — and independently rated for AI maturity, so you can tell the difference between tools where AI is core and tools that bolted on a chatbot.
Kubeshop
AI-based collaborative Kubernetes troubleshooting platform operating within Slack, Teams, Discord, and Mattermost. AI Insights provide troubleshooting and operations guidance for cluster issues. Real-time alert enrichment adds K8s context to notifications. Enables ChatOps remediation — execute kubectl commands and runbooks directly from chat. Converts passive telemetry notifications into human-in-the-loop operational actions inside the collaboration platform.
Continue Dev, Inc.
Open-source VS Code and JetBrains extension connecting any LLM for chat, autocomplete, and agentic coding.
Open-source terminal agent with 1M token context, multimodal support, and generous free daily usage limits.
Grafana Labs
Open-source observability platform with ML-powered Sift investigations and an AI assistant that generates PromQL/LogQL queries from natural language. Adaptive Telemetry automatically drops high-cardinality data before indexing, cutting ingest costs. The open-core model lets you self-host Grafana OSS free or use managed Cloud tiers.
Harness, Inc.
Enterprise CD platform with ML-based deployment verification (AIDA). Auto-detects performance and quality regressions during canary deployments by comparing metrics against historical baselines, then triggers rollback when anomalies exceed thresholds. Predictive deployment risk scoring analyzes code change characteristics to flag high-risk releases before they ship.
Infracost Inc.
Proactive FinOps platform that shifts cost management left into CI/CD and IDEs. Parses Terraform, CloudFormation, and CDK plans to generate cost breakdowns before deployment, and equips AI coding agents (Claude Code, GitHub Copilot, Cursor) with a live cloud pricing API covering 10M+ prices to generate budget-compliant infrastructure on the first attempt.
K8sGPT (CNCF Sandbox)
AI-powered Kubernetes cluster analyzer and remediation tool. Built-in analyzers scan pods, services, deployments, ingresses, and events for misconfigurations and failures, providing plain-English explanations via multiple AI backends (OpenAI, Azure, Bedrock, local models). Operator mode enables continuous in-cluster monitoring. Experimental auto-remediation patches supported resources. MCP server exposes cluster operations as tools for AI assistants.
Solo.io
CNCF Sandbox framework for running AI agents natively in Kubernetes for automated diagnostics and cluster operations.
Kilo
Open-source AI coding agent for VS Code, JetBrains, and CLI with 500+ models at provider rates and zero markup.
Stackwatch (acquired by IBM via Apptio)
Kubernetes cost allocation platform that breaks cloud spend down to namespace, pod, and label level with real-time monitoring. ML-driven rightsizing recommendations analyze historical usage patterns to suggest optimal resource requests and limits, while anomaly detection catches cost spikes before the bill arrives. Available as open-core self-hosted and fully managed SaaS.
n8n
Fair-code workflow automation with 500+ integrations, AI agent nodes, and self-hostable deployment for platform teams.
CNCF
CNCF graduated time-series database and metrics scraper. Pull-based model, multi-dimensional data, PromQL, Alertmanager. The default monitoring backbone for Kubernetes. AI angle is downstream: exemplars, vector embeddings via plugins, and AI features in Grafana, Robusta, K8sGPT, and others built on top of Prometheus data.
Robusta
Kubernetes troubleshooting and self-healing platform. Open-source core provides rule-based alert enrichment and auto-remediation playbooks that trigger operational actions — restart pods, scale deployments, rollback, run commands — in response to Prometheus alerts. HolmesGPT adds AI-powered cross-system investigation spanning AWS, GCP, OpenShift, and Kubernetes, generating root cause narratives and fix suggestions.
Semgrep
High-velocity SAST and supply chain security platform powered by Semgrep Assistant. Uses AI Memories to auto-triage findings with 96% accuracy and generate context-aware autofix code patches tailored to your codebase style. The open-source engine drives community adoption while the cloud platform adds management, reporting, and CI/CD blocking policies.
Terramate GmbH
Orchestration platform for Terraform and OpenTofu stacks with AI Mate assistant, MCP Server, and Catalyst framework for building AI agents. Uses DAG orchestration to manage complex stack dependencies. AI features are built into the CLI and Cloud interface — not a wrapper — enabling context-aware infrastructure changes, code generation, and troubleshooting inside the provisioning workflow.
Warp
Rust-based AI terminal with natural-language command generation, agentic workflows, and team collaboration.