Tested on v0.126.0 (docs & pricing, May 2026) · Next review: August 2026
Overview
Factory.ai is a multi-agent software engineering platform built around a coordinator that decomposes work, dispatches specialized agents called Droids, and synthesizes results. The coordinator maintains high-level context and does not write code; implementation, review, testing, documentation, knowledge indexing, and deployment-oriented actions are handled by separate Droid roles.
The product is positioned as an orchestration runtime rather than a single inline coding assistant: model-agnostic routing (OpenAI, Anthropic, Gemini, and OpenAI-compatible endpoints on the documented BYOK path), first-class connectors to Linear and Jira for ticket-triggered workflows, and optional persistent execution via Droid Computers (managed cloud environments or machines registered outbound to relay.factory.ai). The vendor closed a $150M Series C at a $1.5B valuation in April 2026, with Khosla Ventures, Sequoia Capital, and Blackstone named in public materials associated with that round.
Marketing materials referencing “31× feature delivery speed” (examples cited include MongoDB and EY) measure wall-clock delegation time from ticket to pull request, not aggregate engineering throughput; methodology is not published and the figure has no independent verification in open literature. Materials describing the system as “autonomous” coexist with product behavior that, in the default configuration, requires explicit approval before git push.
Quick Facts
Vendor: Factory
Primary surface: CLI, web, IDE access (per pricing page Free/BYOK and paid tiers)
Review Droid. Reviews diffs, flags regressions, posts inline commentary, and performs STRIDE- and OWASP-oriented security reviews per vendor documentation.
Test Droid. Authors unit and integration tests, runs suites, surfaces regressions.
Docs Droid. Updates README files, internal documentation, and changelogs.
Knowledge Droid. Maintains a persistent semantic index combining HyperCode (multi-resolution structure with call graphs and latent-space similarity) and ByteRank retrieval. The codebase is indexed once, and the cost is amortized across later sessions; vendor materials describe this indexing layer, rather than the Code Droid alone, as the platform’s primary architectural anchor.
Deployer Droid. Triggers CI/CD pipelines and observes deployment outcomes.
Agent loop (documented). Task ingestion → plan presented for approval → parallel dispatch where applicable → serial code execution with iteration → review and synthesis → human approval before git push in the default posture.
Serial execution on mutations. Parallel implementation was reported in vendor materials to increase merge conflicts and divergent architectural state; therefore workers that change code run serially. Parallelism is reserved for read-only operations such as research or validation runs.
Creator–verifier separation. Fresh validator agents inspect completed work to reduce implementation bias in review.
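The scheduling rule above (fan out read-only work, serialize anything that changes code) can be sketched in a few lines. This is an illustrative sketch only, not Factory's implementation: the `Task` type, the `mutates_code` flag, and `run_tasks` are hypothetical names introduced here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """Hypothetical unit of work; not a Factory API type."""
    name: str
    run: Callable[[], str]
    mutates_code: bool  # True for implementation work, False for research/validation


def run_tasks(tasks: List[Task], max_workers: int = 4) -> List[str]:
    """Run read-only tasks in parallel, then code-mutating tasks one at a time."""
    read_only = [t for t in tasks if not t.mutates_code]
    mutating = [t for t in tasks if t.mutates_code]

    results: List[str] = []
    # Read-only work cannot conflict on files, so it can fan out across threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results.extend(pool.map(lambda t: t.run(), read_only))
    # Mutations run serially, in submission order, so each sees the previous one's state.
    for task in mutating:
        results.append(task.run())
    return results
```

In this sketch a fresh validator would simply be another read-only task submitted after the mutating ones finish, which keeps the reviewer separate from the worker that produced the change.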
Autonomy flags (CLI).
--auto low: file creation and editing
--auto medium: adds package installation, git commit/checkout, and build commands
--auto high: adds git push, deployment commands, and database migrations
--skip-permissions-unsafe: proceeds without confirmation prompts; intended for isolated containers only
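Because each level adds to the one below it, the flags behave like a cumulative allow-list. The sketch below models that reading; the category labels (`file_edit`, `git_push`, and so on) are hypothetical names chosen for illustration, and what the CLI does with an action outside the allow-list (prompt or refuse) is not specified here.

```python
from enum import IntEnum
from typing import Dict, Set


class Autonomy(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Action categories each level adds, paraphrased from the flag descriptions above.
# The string labels are illustrative, not identifiers used by the droid CLI.
ADDED_AT: Dict[Autonomy, Set[str]] = {
    Autonomy.LOW: {"file_create", "file_edit"},
    Autonomy.MEDIUM: {"package_install", "git_commit", "git_checkout", "build"},
    Autonomy.HIGH: {"git_push", "deploy", "db_migration"},
}


def allowed_actions(level: Autonomy) -> Set[str]:
    """Return every category permitted at this level (levels are cumulative)."""
    allowed: Set[str] = set()
    for tier in Autonomy:
        if tier <= level:
            allowed |= ADDED_AT[tier]
    return allowed


def is_auto_approved(action: str, level: Autonomy) -> bool:
    """True if the action falls inside the level's allow-list."""
    return action in allowed_actions(level)
```

Under this model, `is_auto_approved("git_push", Autonomy.MEDIUM)` is false, which matches the documented default of requiring human approval before a push.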
Missions. Described as parallel multi-agent execution in a research preview (March 2026); vendor documentation raises open questions about whether parallelization improves outcomes versus serial patterns.
Example CLI patterns as documented:
# Register a BYOM Droid Computer; connects outbound to relay.factory.ai (no inbound firewall rules)
droid computer register
# Example autonomy preset for a CI task (commands vary by installation)
droid exec --auto medium -- npm run build
# GitHub Actions — vendor publishes Factory-AI/droid-action; execution stays on customer runners
name: factory-droid
on:
  workflow_dispatch:
jobs:
  droid:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Factory-AI/droid-action@v1
        with:
          task: "Fix failing unit tests on main"
Platforms and Integration
Version control. Official documentation lists deep GitHub integration (GitHub App, @droid commands on pull requests, GitHub Actions). GitLab is supported for cloud and self-hosted instances using OAuth 2.0. Bitbucket and Azure DevOps are not documented in official Factory materials available as of May 2026.
CI/CD. GitHub Actions via Factory-AI/droid-action, CircleCI, Jenkins, and generic pipelines that shell out to droid exec. Execution remains on customer-controlled runners; Factory brokers model traffic.
Issue trackers and chat. Linear and Jira support assignment-driven triggers and status write-backs. Mentioning the Factory integration in Slack launches agents, which post progress in threads. Sentry and PagerDuty are documented for incident-oriented automation (for example, drafting postmortems and generating tests from stack traces). Notion is referenced as a source of specification context.
Interfaces. The Free/BYOK tier description lists CLI, web, and IDE access without Droid Computers or team features; paid tiers add cloud and local background agents per the May 2026 pricing page.
Desktop app availability aligns with Factory’s communicated April 8, 2026 desktop release milestone. MCP support is flagged for calendar Q1 2026 alongside cross-editor integration milestones listed in Factory’s roadmap-style communications around the same period.
In Practice
Coordinator overhead and token economics. Vendor documentation recommends budgeting roughly 2–3× raw third-party API spend to cover coordinator traffic and multi-step validation. Model-specific multipliers apply: Factory’s documented guidance characterizes Claude Opus at roughly 3–5× the credit consumption of Sonnet for comparable tasks. Three concurrent rolling windows govern usage (5-hour, weekly, monthly). Hitting a cap forces fallback to Droid Core (open-weight models with separate limits) or purchase of prepaid credits ($10 minimum, non-expiring). Teams and Enterprise tiers are described as exempt from rolling limits.
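The budgeting guidance reduces to simple range arithmetic. The sketch below applies the 2–3× overhead and the 3–5× Opus-versus-Sonnet ratio quoted above; the function names and the sample figures in the usage note are hypothetical, and real consumption depends on factors the ranges do not capture.

```python
from typing import Tuple


def estimate_budget(raw_api_spend: float,
                    overhead_low: float = 2.0,
                    overhead_high: float = 3.0) -> Tuple[float, float]:
    """Return a (low, high) planning budget from raw third-party API spend."""
    if raw_api_spend < 0:
        raise ValueError("raw_api_spend must be non-negative")
    return raw_api_spend * overhead_low, raw_api_spend * overhead_high


def opus_credit_range(sonnet_credits: float) -> Tuple[float, float]:
    """Scale a Sonnet credit estimate by the documented 3-5x Opus ratio."""
    if sonnet_credits < 0:
        raise ValueError("sonnet_credits must be non-negative")
    return sonnet_credits * 3.0, sonnet_credits * 5.0
```

For example, a team whose direct provider invoices would total $400 would plan for roughly $800 to $1,200 under this guidance, and a task estimated at 100 credits on Sonnet would be planned at 300 to 500 credits on Opus.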
Rolling limits and opacity. Practitioner writeups commonly describe unpredictable consumption when multipliers stack with rolling windows; some accounts report exhausting trial allowances on a single feature. The combination of concurrent windows and per-model multipliers complicates bottom-up forecasting relative to a single monthly cap.
Ticket-first workflows. Concise or underspecified Linear/Jira tickets correlate with shallow outputs more often than tickets that spell out acceptance criteria—consistent with dispatch and planning flows that ingest ticket text as structured context. Teams that primarily steer work through chat or informal requests report more friction than organizations that anchor work in issue trackers.
AskUser variance. Observers report occasional clarifying questions that appear inferable from existing repository or ticket context, increasing operator round-trips.
Context transparency. Comparisons with tools that expose fine-grained token accounting note that Factory responses are sometimes truncated without corresponding detail in an in-product token ledger.
Quality variance. Non-vendor writeups cite incorrect generated service structure, extended correction loops, and broken authentication in delivered samples; those anecdotes are not tied to a controlled public benchmark.
| Tier | Price (per month) | Includes |
| --- | --- | --- |
| Free (BYOK) | $0 | Bring your own API keys (OpenAI, Anthropic, Gemini, OpenAI-compatible endpoints); pay providers directly; CLI + web + IDE access; no Droid Computers; no team features |
| Pro | $20 | Baseline rolling rate limits; cloud and local background agents; usage dashboard |
| Plus | $100 | ~5× Pro usage; Droid Computers (managed cloud environments), added between January and April 2026 |
| Max | $200 | ~10× Pro usage; early access features |
| Teams | Custom (up to 150 seats) | SSO/SAML/SCIM, Zero Data Retention, admin controls, exempt from rolling rate limits |
| Enterprise | Custom | Unlimited seats, dedicated compute partition, on-prem options, full admin controls |
Mechanics. Usage is metered across three rolling windows simultaneously (5-hour, weekly, monthly). When a limit is exceeded, work falls back to Droid Core unless prepaid credits are purchased. Under the stated policy, Teams and Enterprise are exempt from rolling limits. Token multipliers vary by model choice.
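Metering against several rolling windows at once means a request must fit under every cap, so the tightest window at any moment decides the outcome. The following is a minimal sketch of that idea, not Factory's metering logic: the `RollingMeter` class, the cap values, and the choice of 30 days for "monthly" are all assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Tuple

HOUR = 3600
# Window lengths in seconds; treating "monthly" as 30 days is an assumption.
WINDOWS: Dict[str, int] = {
    "5-hour": 5 * HOUR,
    "weekly": 7 * 24 * HOUR,
    "monthly": 30 * 24 * HOUR,
}


@dataclass
class RollingMeter:
    caps: Dict[str, float]  # hypothetical per-window caps, in credits
    events: Deque[Tuple[float, float]] = field(default_factory=deque)  # (timestamp, credits)

    def used(self, window: str, now: float) -> float:
        """Credits spent inside the given window ending at `now`."""
        cutoff = now - WINDOWS[window]
        return sum(credits for ts, credits in self.events if ts > cutoff)

    def try_spend(self, credits: float, now: float) -> bool:
        """Record the spend only if every window stays within its cap."""
        longest = max(WINDOWS.values())
        # Events older than the longest window can never count again.
        while self.events and self.events[0][0] <= now - longest:
            self.events.popleft()
        for window in WINDOWS:
            if self.used(window, now) + credits > self.caps[window]:
                return False  # a caller would fall back or use prepaid credits here
        self.events.append((now, credits))
        return True
```

The practical consequence is that a spend can be rejected by the weekly window even when the 5-hour window has fully reset, which is one reason a single monthly figure is a poor forecast of available capacity.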
Droid Computers and tier gating. Managed Droid Computers are tied to the Plus tier and above in the May 2026 pricing matrix; Pro does not include Droid Computers. Workflows that depend on managed persistent environments therefore require at least Plus under the published page.
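The gating rule can be expressed as a lookup over the self-serve tiers. The sketch below encodes only what the May 2026 pricing page is reported to list (monthly prices and whether Droid Computers are included); the `Tier` type and `cheapest_tier` helper are hypothetical, and Teams and Enterprise are omitted because their pricing is custom.

```python
from typing import List, NamedTuple


class Tier(NamedTuple):
    name: str
    monthly_usd: int
    droid_computers: bool


# Self-serve tiers as described for the May 2026 pricing page.
TIERS: List[Tier] = [
    Tier("Free (BYOK)", 0, False),
    Tier("Pro", 20, False),
    Tier("Plus", 100, True),
    Tier("Max", 200, True),
]


def cheapest_tier(needs_droid_computers: bool) -> Tier:
    """Return the lowest-priced tier that satisfies the Droid Computers requirement."""
    candidates = [t for t in TIERS if t.droid_computers or not needs_droid_computers]
    return min(candidates, key=lambda t: t.monthly_usd)
```

Calling `cheapest_tier(True)` selects Plus, reflecting that a workflow built on managed persistent environments cannot be run on Pro.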
Free tier interpretation. The $0 BYOK tier is not “unmetered Factory inference”; it routes model spend to external accounts and excludes Droid Computers and team capabilities.
API keys from supported providers for BYOK usage on Free and for model routing on paid plans as configured.
Git hosting on a documented platform (GitHub or GitLab per official docs) for native VCS integrations.
For BYOM Droid Computers, an outbound path to relay.factory.ai and a supported host OS (Linux, macOS, or Windows per Factory documentation).
CI runners with permission to check out repositories and execute droid exec or the published GitHub Action where used.
Security and Data
Deployment patterns (vendor documentation).
Cloud-managed orchestration. Droids run locally while Factory cloud coordinates; code traverses Factory infrastructure for Knowledge Droid indexing.
Hybrid enterprise. Inference can remain inside a customer VPC with reduced metadata shipped to Factory cloud.
Air-gapped / on-prem. No outbound internet; models served locally (Ollama, vLLM) with enterprise arrangements.
Data handling controls. Zero Data Retention is listed on Teams and Enterprise tiers in Factory’s security pages. Vendor statements include “no training on customer code.” Droid Shield performs secret scanning client-side before content leaves the machine. Encryption is described as AES-256 at rest and TLS 1.2+ in transit. EU-only data residency shipped in v0.126.0 (May 2026). SIEM-export partners named alongside audit logging include Splunk, Datadog, and ELK.
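Client-side secret scanning of the kind attributed to Droid Shield generally means checking content against known credential patterns before it is transmitted. The sketch below illustrates that general technique with three widely known formats; it is not Droid Shield's rule set, which is not described here, and the `scan` and `safe_to_send` helpers are hypothetical.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Well-known credential shapes, used purely for illustration.
PATTERNS: Dict[str, Pattern[str]] = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def scan(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    findings: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def safe_to_send(text: str) -> bool:
    """True only when no pattern matched, i.e. the content may leave the machine."""
    return not scan(text)
```

Pattern matching of this kind catches only credentials with recognizable shapes; a production scanner would typically add entropy checks and provider-specific validation.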
Certifications and frameworks. SOC 2 Type 1 is listed as achieved. SOC 2 Type 2 is described for Teams/Enterprise in vendor copy, but no independently verifiable SOC 2 Type 2 report is linked from Factory’s public trust disclosures as of May 2026. ISO 42001, GDPR, and CCPA appear in Factory’s publicly stated compliance list. HIPAA is not mentioned explicitly on the public security page; Business Associate Agreement availability requires direct confirmation with the vendor. FedRAMP is not documented.
Residual exposure. Any cloud-managed indexing path implies code and derived artifacts transiting vendor systems for Knowledge Droid features unless a hybrid or air-gapped architecture is contracted.
Claude Code (Anthropic) is a single-agent terminal and IDE-integrated tool. It carries widely quoted SWE-bench figures (for example, Claude Code with Opus 4.5 at 80.9%) and is credited with more granular token accounting than Factory in published comparisons. Its pricing sits around $20–$100/month in the comparison matrix below.
Cursor ($20–$200/month in the table below) combines a VS Code-derived IDE with agentic features and multi-model support (Claude, GPT, Gemini, Grok).
GitHub Copilot Workspace ($10–$39/user/month) focuses on GitHub-centric agent flows with GPT, Claude, and Gemini model options; VCS scope is GitHub-only in the comparison matrix below.
Devin 2.0 ($20–$500/month) is listed as an end-to-end autonomous product with a sandboxed persistent environment and GitHub integration.
SWE-agent is open source with API pass-through costs; comparative benchmarking tables circulating in the ecosystem list a 33.6% SWE-bench headline for that agent harness.
| Dimension | Factory.ai | Claude Code | Cursor | GitHub Copilot Workspace | Devin 2.0 | SWE-agent |
| --- | --- | --- | --- | --- | --- | --- |
| Pricing | $20–$200/mo + custom | $20–$100/mo | $20–$200/mo | $10–$39/user/mo | $20–$500/mo | Free (API costs) |
| Agent model | Multi-agent coordinator | Single-agent + subagents | Single-agent + parallel background | Single-agent | End-to-end autonomous | Single-agent |
| Model support | Model-agnostic | Claude only | Claude, GPT, Gemini, Grok | GPT, Claude, Gemini | Proprietary | Any (API) |
| Deployment | Cloud, hybrid, air-gapped | Cloud only | Cloud only | Cloud only | Cloud only | Self-hosted |
| Persistent env | Yes (Droid Computers) | No | No | No | Yes (sandboxed) | No |
| VCS | GitHub, GitLab | Any | Any | GitHub only | GitHub | GitHub |
| Ticket-native | Linear, Jira first-class | No | No | GitHub Issues | No | No |
| FedRAMP | Not documented | Not listed | Not listed | Microsoft-covered posture noted for Copilot Workspace parent ecosystem | Not documented | N/A |
| Terminal Bench | 63.1% (#1 Dec 2025) | N/A | N/A | N/A | N/A | N/A |
| SWE-bench | 50.5% (Sonnet) | 80.9% (Opus 4.5) | ~48% | ~55% | Limited data | 33.6% |
Terminal Bench and SWE-bench measure different task shapes; comparing headline numbers across them without scenario context blurs methodology. Terminal Bench emphasizes end-to-end completion through terminal UIs (Factory listed at 63.1%, ranked #1 as of December 2025). SWE-bench scores quoted here pair specific harnesses with named models (for example, Factory Droid with Claude Sonnet at 50.5% versus Claude Code with Opus 4.5 at 80.9%), so variance reflects both scaffolding and backbone model choice.
Version & Freshness Metadata
Last reviewed: 2026-05-15 · Version referenced: v0.126.0 (EU residency note) with docs/pricing snapshot May 2026 · Next review: 2026-08-15 · Changelog: https://docs.factory.ai/changelog/release-notes