Amp Code
Amp
Terminal coding agent with multi-model routing and specialized subagents for complex infrastructure tasks.
A curated directory for senior SREs, infrastructure leads, and platform teams. Every tool is organized by lifecycle stage — plus a DevEx layer for AI coding assistants — and independently rated for AI maturity, so you can tell the difference between tools where AI is core and tools that bolted on a chatbot.
Amp
Terminal coding agent with multi-model routing and specialized subagents for complex infrastructure tasks.
Aqua Security
Cloud native application protection platform delivering full-lifecycle container and Kubernetes security with AI-powered behavioral analytics for runtime threat detection. The dedicated Secure AI module extends protection to LLM workloads, detecting OWASP Top 10 for LLM risks, model poisoning, and prompt injection while maintaining supply chain integrity through SBOM generation.
Bito
Multi-repo codebase knowledge graph available via MCP, providing system context to AI coding agents across repositories.
CircleCI
Cloud and self-hosted CI/CD with a strong reputation for fast pipelines and config-as-code. Native AI features: AI Test Insights for flaky test detection, ML-driven test splitting, and pipeline anomaly detection. Resource classes from small to GPU. Integrates with every major SCM and cloud, with first-class macOS for iOS builds.
Anthropic
Terminal-native coding agent that edits files, runs commands, and manages git workflows with deep codebase reasoning.
CloudZero, Inc.
Cloud cost intelligence platform with an AnyCost engine that ingests billing and usage data from any source without requiring perfect tags. ML-driven anomaly detection identifies cost spikes in real-time, while unit cost analytics attribute spend to specific features, customers, and deployments to align engineering decisions with business outcomes.
Codeium (Windsurf)
Free, fast AI completion plus Windsurf, an agentic IDE. Cascade is its agent: reads the codebase, plans, edits, verifies. Supports 70+ IDEs including VS Code, JetBrains, and Neovim via plugin. Self-hosted and air-gapped option for regulated teams. Now operating under the Windsurf brand as the flagship product.
CodeRabbit
AI PR review agent with context-aware analysis, one-click fixes, and automated test generation.
Anysphere
AI-first IDE forked from VS Code, by Anysphere. Tab completion is the headline (deep multi-file edit predictions), plus Composer for repo-wide changes, agent mode, and chat that understands the whole codebase via embeddings. Bring-your-own model (Claude, GPT-5, Gemini). Privacy mode keeps your code out of training.
Datadog
Full-stack observability platform powered by Watchdog anomaly detection and Bits AI autonomous SRE. Continuously baselines metrics across hosts, containers, and traces to eliminate static thresholds and surface root causes. Bits AI handles incident investigation autonomously — correlating signals, querying logs, and proposing remediations without requiring manual runbook execution.
Cognition Labs
Autonomous AI software engineer that plans, codes, tests, and ships from a cloud environment or terminal.
env0 Inc.
IaC orchestration platform with embedded AI — Cloud Analyst for cost and compliance insights, AI PR Summaries, and an IaC Code Generator. Manages Terraform, OpenTofu, Pulumi, and CloudFormation with cost estimation, policy enforcement, and workflow automation. AI features are context-aware inside the provisioning lifecycle, not a bolted-on chatbot.
Factory
Agentic platform deploying autonomous Droids across desktop, CLI, and cloud for enterprise-scale software tasks.
Firefly
Cloud asset management platform with Thinkerbell AI agents for autonomous infrastructure operations. Continuously scans multi-cloud and Kubernetes estates for configuration drift, then triggers AI-assisted auto-remediation. One-click remediation, natural language infrastructure queries, and Cloud Resilience Posture Management with automated cross-region failover (RTO under 1 hour). Automated IaC generation brings unmanaged ClickOps resources under Terraform/Pulumi governance.
FireHydrant (acquired by Freshworks December 2025)
Runbook-driven incident management platform that automates response coordination from detection through retrospective. AI Copilot auto-generates incident summaries, links similar historical incidents, transcribes war room meetings, and drafts retrospectives. Deep service catalog mapping enforces consistency across complex microservice architectures.
GitGuardian
AI-enhanced secrets detection platform using ML for false-positive reduction (Secret Enricher) and permission-scope analysis (Secrets Analyzer) across 450+ secret types. Scans code repositories, Slack workspaces, Jira, and CI/CD pipelines to prevent secrets sprawl, with ggshield pre-commit hooks extended to AI coding assistants like Cursor and Claude Code.
GitHub (Microsoft)
GitHub-native CI/CD that runs workflows triggered by repo events. Hosted runners across Linux, macOS, and Windows or self-hosted on your infrastructure. Tight integration with GitHub Copilot for AI-assisted workflow authoring, Copilot Autofix for security findings, and Copilot agentic PR reviews. The default deploy plumbing for any team already on GitHub.
GitHub (Microsoft)
AI pair programmer from GitHub. Inline completions, multi-line suggestions, slash commands, chat for explaining and refactoring, and agent mode that can author PRs end-to-end. Trained on public code with enterprise filters for license-safe output. Available in VS Code, JetBrains, Neovim, Visual Studio, and the GitHub web UI.
GitLab
Built-in CI/CD for GitLab: pipelines defined in .gitlab-ci.yml, runners on Linux, macOS, and Windows, and Auto DevOps for opinionated deploys. GitLab Duo brings AI code suggestions, vulnerability explanations, root cause analysis on failed jobs, and chat-based incident triage. The single-application platform sell remains its differentiator vs GitHub plus add-ons.
Grafana Labs
Open-source observability platform with ML-powered Sift investigations and an AI assistant that generates PromQL/LogQL queries from natural language. Adaptive Telemetry automatically drops high-cardinality data before indexing, cutting ingest costs. The open-core model lets you self-host Grafana OSS free or use managed Cloud tiers.
Greptile
AI PR review agent with full-codebase graph analysis and multi-hop dependency investigation across 30+ languages.
Harness, Inc.
Enterprise CD platform with ML-based deployment verification (AIDA). Auto-detects performance and quality regressions during canary deployments by comparing metrics against historical baselines, then triggers rollback when anomalies exceed thresholds. Predictive deployment risk scoring analyzes code change characteristics to flag high-risk releases before they ship.
Pineapple Technology Ltd
Slack-native incident management platform that auto-generates timelines, assigns action items via AI, and runs structured retrospectives without leaving the war room. AI SRE features include an assistant that investigates root cause, drafts post-mortems, and correlates signals across your observability stack.
Infracost Inc.
Proactive FinOps platform that shifts cost management left into CI/CD and IDEs. Parses Terraform, CloudFormation, and CDK plans to generate cost breakdowns before deployment, and equips AI coding agents (Claude Code, GitHub Copilot, Cursor) with a live cloud pricing API covering 10M+ prices to generate budget-compliant infrastructure on the first attempt.
Komodor
Autonomous AI SRE platform for cloud-native infrastructure. Klaudia AI Agents perform autonomous investigation of Kubernetes issues by correlating deployment changes, config drift, alerts, and telemetry to identify root cause. Automated remediation playbooks execute operational actions — restart, scale, cordon, drain — with governance guardrails. Continuous drift detection and dynamic pod rightsizing bridge observability data to operational action.
LaunchDarkly, Inc.
Feature management platform with AI-powered Guarded Rollouts. Sequential testing engine progressively increases traffic while monitoring metrics for regressions — ML detects statistically significant negative impact and automatically pauses or rolls back the rollout. Separates deployment from release, enabling rollback without redeployment. First FedRAMP-authorized feature management solution.
n8n
Fair-code workflow automation with 500+ integrations, AI agent nodes, and self-hostable deployment for platform teams.
New Relic
Unified observability platform with ingest-based pricing and New Relic AI (NRAI) for natural language querying, automated root cause analysis, and AIOps alert correlation. MCP server integration enables agentic AI workflows via AWS DevOps Agent. 100 GB/month free tier covers most small production environments.
Octopus Deploy Pty Ltd
Deployment automation platform with AI Deployment Failure Analyzer that examines logs, process configs, and error details to identify root cause and suggest remediation. Recovery Agent diagnoses deployment failures with a single click. MCP Server enables external AI agents to query Octopus infrastructure for change management and audit workflows.
Orca Security
Agentless cloud security platform using patented SideScanning technology to read cloud configuration and workload runtime state out-of-band without deploying agents. Embeds GenAI-powered investigation and natural language querying to explain attack paths, correlate risks across multi-cloud environments, and guide remediation including paused and stopped workloads.
PagerDuty, Inc.
Event intelligence and AIOps platform that uses ML-based alert grouping, change correlation, and probable-origin analysis to cut noise by up to 90%. Gen-AI agents (Insights, SRE, Shift, Scribe) automate triage, root-cause investigation, on-call handoffs, and incident documentation across the full respond lifecycle.
Qodo
Enterprise AI code-review platform with in-IDE agents, PR automation, and multi-repo context engine.
Robusta
Kubernetes troubleshooting and self-healing platform. Open-source core provides rule-based alert enrichment and auto-remediation playbooks that trigger operational actions — restart pods, scale deployments, rollback, run commands — in response to Prometheus alerts. HolmesGPT adds AI-powered cross-system investigation spanning AWS, GCP, OpenShift, and Kubernetes, generating root cause narratives and fix suggestions.
Rootly Inc.
AI-native incident management platform built for SRE and DevOps teams. Orchestrates the entire respond lifecycle from detection to retrospective with AI-powered alert grouping, root cause analysis, conversational AI assistant in Slack, and automated post-mortem generation.
Scalr Inc.
Remote operations backend for Terraform and OpenTofu with Scalr AI (launched June 2025). Provides intelligent error analysis, AI-generated plan summaries, and natural language policy explanations. Maintains run history, state management, and cost estimation in a unified control plane. Best fit for teams scaling past local Terraform execution who need an opinionated backend with embedded AI assistance.
Semgrep
High-velocity SAST and supply chain security platform powered by Semgrep Assistant. Uses AI Memories to auto-triage findings with 96% accuracy and generate context-aware autofix code patches tailored to your codebase style. The open-source engine drives community adoption while the cloud platform adds management, reporting, and CI/CD blocking policies.
Snyk
AI-native security platform combining DeepCode AI and Evo by Snyk to perform reachability analysis, risk-based prioritization, and auto-generated fix suggestions across SAST, SCA, container, and IaC scanning. Uses symbolic AI to determine whether a vulnerability is reachable in your specific code path, cutting noise by surfacing only exploitable issues with one-click remediation in the IDE and CI pipeline.
Spacelift Inc.
Policy-as-code CI/CD platform for IaC with Spacelift Intelligence (launched March 2026). Runs Terraform, OpenTofu, Pulumi, Ansible, and CloudFormation with OPA guardrails, drift detection, and a private module registry. AI features surface plan summaries, policy violations, and remediation paths inside run context — not a side chatbot. Purpose-built for platform teams needing auditability and multi-stack support.
Tabnine
Enterprise AI coding assistant with on-prem and air-gapped deployment, zero data retention, and organizational context.
HashiCorp (IBM)
HashiCorp declarative IaC for provisioning across 4,000+ providers. HCL syntax, plan/apply lifecycle, modular composition. Now under IBM. Source code switched from MPL to BUSL in 2023 (driver of the OpenTofu fork). HCP Terraform adds remote state, policy-as-code, agents, and run tasks. AI features focus on Stacks and provider workflows.
Warp
Rust-based AI terminal with natural-language command generation, agentic workflows, and team collaboration.
Wiz
Agentless CNAPP and CSPM solution that uses an AI-powered unified risk graph to correlate vulnerabilities, misconfigurations, exposed secrets, and identity risks across AWS, Azure, GCP, Kubernetes, and Snowflake. Prioritizes risks based on actual exploitability and blast-radius analysis rather than theoretical severity, enabling teams to remediate the 1% of issues that matter.
DoiT International (acquired Zesty February 2025)
AI-powered cloud optimization platform that autonomously manages Kubernetes resources via Kompass — continuously rightsizing pod CPU/memory, autoscaling persistent volumes, and managing spot instance migrations with sub-40-second replacement. ML models analyze real-time usage patterns for ongoing adjustments without code changes. The Commitment Manager automates AWS Savings Plan and RI purchasing with micro-commitment strategies.