# The Lifecycle Has Changed

## A field guide to building, shipping, and running software in the agent era — for the people who already know how to do it the old way.

**Published:** May 17, 2026
**Audience:** CxO, architect, industry analyst
**Assumed baseline:** a modern digital native or enterprise delivering apps according to AWS Well-Architected or equivalent best practice.

---

## Opening hook

You already know how to ship software. Your teams run trunk-based development, blue/green deploys, multi-region failover, SLO-driven on-call, FinOps reviews, SBOMs, threat modelling, design systems, and a CI pipeline that has not had a red main in eleven weeks. You have an AWS Well-Architected review on the calendar. You know what good looks like.

That stack — the one we collectively spent ten years building between roughly 2012 and 2022 — is now the *baseline*, not the frontier. Between November 2022 and May 2026, the entire shape of app delivery changed underneath it. Not because new frameworks arrived. Because the *primary author of code* is no longer a human, the *primary consumer of your API* is increasingly not a browser, the *primary failure mode* is no longer a null pointer, and the *unit of value you ship* is no longer a feature.

This document is for the people who already did the last transformation and now have to lead the next one. It is opinionated, it is concise, and where it cites a vendor it does so because the underlying primitive matters, not because the logo does.

> **The thesis in one sentence.** Pre-2022 we shipped deterministic code that humans wrote and humans used; in May 2026 we ship probabilistic capabilities that agents write, agents call, and agents partly operate — and almost every control plane we built for the first world is the wrong shape for the second.

---

## Eight paradigm shifts

### 1. Human-authored → agent-authored code
- **Pre-2022.** Engineers write code; assistants suggest the next token.
- **May 2026.** Claude Code, Cursor 3, Codex Desktop, Devin, Windsurf and OpenCode operate as multi-file agents. SWE-bench Verified scores cluster between 60% and 78%. Nubank publicly reports an 8-12× efficiency multiplier on a 6M-line migration done by a fleet of Devins.
- **Benefit.** Throughput rises sharply on bounded, well-specified work; the marginal cost of an additional refactor approaches zero.
- **Risk.** Skill atrophy in juniors; IP exposure on training-set provenance; maintenance debt on code no human truly read line by line.

### 2. Prescriptive specs → intent-based specs
- **Pre-2022.** PRDs, Jira tickets, Figma frames, acceptance criteria. The spec is the truth.
- **May 2026.** The spec is the *eval*. Spec-Driven Development reframes the PRD as a constitution plus a Given/When/Then scenario set that compiles directly to executable evaluations. CLAUDE.md / AGENTS.md sit at the repo root.
- **Benefit.** Intent captured once, consumed by humans and agents.
- **Risk.** Eval coverage is the new test coverage; most orgs have ~1% of it.
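A minimal sketch of "the spec is the eval": a Given/When/Then scenario set held as data and run directly as an evaluation. Everything here is hypothetical — the refund scenarios, the `stub_system` standing in for a real agent — the point is the shape: one artefact consumed by humans as a spec and by CI as a test.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    """One Given/When/Then scenario from the spec, kept executable."""
    given: dict                   # initial context handed to the system
    when: str                     # the user intent, in plain language
    then: Callable[[str], bool]   # predicate over the system's output

# Hypothetical scenario set for a refund-handling capability.
SCENARIOS = [
    Scenario(
        given={"order_status": "delivered", "days_since_delivery": 3},
        when="customer asks for a refund",
        then=lambda out: "refund approved" in out.lower(),
    ),
    Scenario(
        given={"order_status": "delivered", "days_since_delivery": 90},
        when="customer asks for a refund",
        then=lambda out: "escalate" in out.lower(),
    ),
]

def run_evals(system: Callable[[dict, str], str]) -> float:
    """Run every scenario against the system; return the pass rate."""
    passed = sum(1 for s in SCENARIOS if s.then(system(s.given, s.when)))
    return passed / len(SCENARIOS)

# Stand-in for the real agent; a deployed system would call a model here.
def stub_system(ctx: dict, intent: str) -> str:
    if ctx["days_since_delivery"] <= 30:
        return "Refund approved."
    return "Escalate to a human reviewer."

pass_rate = run_evals(stub_system)
```

The eval coverage gap in the risk above is then measurable: scenarios written divided by behaviours shipped.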

### 3. Human-in-the-loop → human-on-the-loop
- **Pre-2022.** Every meaningful state change is human-initiated and human-approved.
- **May 2026.** Approval is graduated. Routine work runs human-out-of-the-loop with eval gates; bounded autonomous work runs on-the-loop with sampled review; novel or high-blast-radius work stays human-in-the-loop.
- **Benefit.** Latency and unit cost collapse for the 80% of work that is repetitive.
- **Risk.** Trust calibration becomes the hardest engineering problem in the org.

### 4. Static apps → agentic apps
- **Pre-2022.** Apps are request/response. State lives in a database; logic lives in a stateless tier.
- **May 2026.** Apps are agentic: long-running, stateful, tool-using, planning, sometimes multi-agent. Cloudflare Agents SDK models this as one Durable Object per agent — millions of named, hibernating instances.
- **Benefit.** Software finally fits the shape of the work.
- **Risk.** Cascading agent failures and cost runaway (5-30× tokens per task vs a chatbot, per Gartner).
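The control-flow difference from request/response can be sketched in a few lines: an agent loop that executes tool calls step by step under a hard step budget. The tools and plan below are hypothetical (a real agentic app would have the model choose each next step and expose tools over MCP); the step budget is the cost-runaway guard the risk above calls for.

```python
# Hypothetical tool registry; a real agent would expose these over MCP.
TOOLS = {
    "lookup_order": lambda args: {"status": "delivered"},
    "issue_refund": lambda args: {"refund_id": "r-123", "ok": True},
}

def agent_loop(goal: str, plan: list, max_steps: int = 5) -> dict:
    """Run a plan step by step under a hard step budget.

    In a real agentic app the model would produce each next step from
    the observations so far; here the plan is fixed so the control
    flow (and the budget guard) is visible.
    """
    state = {"goal": goal, "observations": []}
    for step_no, (tool, args) in enumerate(plan):
        if step_no >= max_steps:
            raise RuntimeError("step budget exhausted")  # cost-runaway guard
        state["observations"].append(TOOLS[tool](args))
    return state

result = agent_loop(
    "refund order 42",
    plan=[("lookup_order", {"id": 42}), ("issue_refund", {"id": 42})],
)
```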

### 5. REST → MCP and A2A
- **Pre-2022.** REST, gRPC, GraphQL.
- **May 2026.** The Model Context Protocol (Anthropic, Nov 2024 → donated to the Linux Foundation's Agentic AI Foundation, Dec 2025; spec 2025-11-25; >500 public servers; OAuth 2.1 baked in) is the de facto integration substrate for LLMs and tools. Google's A2A handles agent-to-agent.
- **Benefit.** The N×M integration problem collapses to N+M.
- **Risk.** Multi-tenant auth on MCP servers is unstandardised; "Shadow MCP" is the new Shadow IT.

### 6. Deterministic → probabilistic systems
- **Pre-2022.** Same input, same output.
- **May 2026.** Distributions of outputs. Testing becomes evaluation: LLM-as-judge, pairwise comparison, golden datasets, regression suites grown from production failures. SLOs include hallucination rate, refusal rate, and tool-selection accuracy alongside p99 latency.
- **Benefit.** Software can handle the ambiguous middle of real human work.
- **Risk.** Reproducibility, audit, and incident forensics all degrade.
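The operational consequence of "distributions of outputs" is that a single test run tells you almost nothing; you score a sample. A minimal sketch, with a seeded stand-in for the model call (the 90% hit rate and the SLO value are illustrative, not benchmarks):

```python
import random

def flaky_capability(prompt: str, rng: random.Random) -> str:
    """Stand-in for a model call: same input, a distribution of outputs."""
    return "grounded answer" if rng.random() < 0.9 else "hallucinated answer"

def pass_rate(prompt: str, n: int = 200, seed: int = 0) -> float:
    """Score a probabilistic system by sampling, not by a single run."""
    rng = random.Random(seed)
    hits = sum(
        flaky_capability(prompt, rng) == "grounded answer" for _ in range(n)
    )
    return hits / n

rate = pass_rate("what is our refund policy?")
SLO = 0.85          # a hallucination-rate SLO, expressed as a minimum pass rate
meets_slo = rate >= SLO
```

Note the seeded RNG: the harness around the probabilistic core stays deterministic, which is exactly the reproducibility posture the risk above demands.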

### 7. Ship code → ship capabilities
- **Pre-2022.** You ship features. Pricing is per-seat.
- **May 2026.** You ship capabilities — discrete outcomes an agent can invoke. Pricing moves to per-outcome, per-agent-task, or per-resolution. Klarna, Intercom, Zendesk and Salesforce have all publicly priced agent tiers this way.
- **Benefit.** Revenue tracks customer value, not seat sprawl.
- **Risk.** Token economics and inference cost become a CFO-level conversation.

### 8. Cloud-centric → edge-native and AI-at-the-edge
- **Pre-2022.** Centralise compute in three to five regions; pay egress; accept latency.
- **May 2026.** Every agent step adds latency, and multi-step workflows compound it. Edge-native platforms (Cloudflare Workers + Workers AI, Vercel Functions, Fly Machines, Modal) co-locate inference, state, vector search and tools at the network edge.
- **Benefit.** Lower p99, better data-residency posture, fewer "glue" services.
- **Risk.** Heavy training and large GPU clusters still belong in hyperscaler regions; edge is not the answer to everything.

---

## The lifecycle, stage by stage

### 1. Conception and ideation
- **Pre-2022.** Customer interviews, JTBD workshops, a few napkin sketches, a strategy deck.
- **May 2026.** Ideation runs as a conversation with a research agent against a corpus of customer transcripts, support tickets, win/loss notes, and competitive telemetry. The human job becomes taste and selection, not generation.
- **Shift.** Abundant ideas, scarce judgement.
- **Risk.** Convergent mediocrity — every team prompts the same models against the same public web and arrives at the same three ideas.

### 2. Planning and roadmapping
- **Pre-2022.** Quarterly OKRs, RICE scoring, capacity planning in spreadsheets.
- **May 2026.** Roadmaps are living artefacts maintained partly by agents that watch issue trackers, telemetry, and revenue dashboards. Linear, Jira and Productboard all ship MCP servers. Capacity planning incorporates an inference budget alongside headcount.
- **Shift.** Plan-then-execute → continuously re-planned execution.
- **Risk.** "Always re-planning" looks identical to "no plan at all" without strong governance.

### 3. Specification and product/UX design
- **Pre-2022.** Figma, design systems, usability testing, prescriptive PRDs.
- **May 2026.** Dual UX. Every surface is designed for two audiences: humans, and the agents acting on their behalf. Designers ship a human UI, an llms.txt, structured data, and a machine-readable capability manifest. v0, Lovable, Bolt and Figma Make produce working React from intent.
- **Shift.** UX is no longer a single channel; it is a contract surface for humans and agents simultaneously.
- **Risk.** If you only design for humans, you become invisible to the agents recommending products.

### 4. Architecture and technical design
- **Pre-2022.** Microservices, event streams, ADRs, a Well-Architected review.
- **May 2026.** Architecture absorbs four new primitives: a model layer (provider-agnostic routing through an AI Gateway), a memory layer (vector + episodic + scratchpad), a tool layer (MCP servers), and an agent runtime (durable, hibernatable, per-entity).
- **Shift.** The system diagram grows a "cognition plane" alongside the data and control planes.
- **Risk.** Vendor lock-in to a single model provider is the new database lock-in — only worse, because behaviour, not just API shape, is proprietary.

### 5. Development and coding
- **Pre-2022.** Humans write code in an IDE; reviewers gate via pull request.
- **May 2026.** Most net-new code is drafted by an agent and curated by a human. Anthropic, Google and Microsoft have all publicly reported that 30-50%+ of net-new code on some internal teams is agent-generated. The disciplines that scale are eval-gated PRs, agent-readable repo conventions (AGENTS.md, CLAUDE.md), MCP servers for every internal system, and sandboxed code execution for agent-run tests.
- **Shift.** Engineers move from authors to editors, architects, and reviewers of machine output.
- **Risk.** Review fatigue, hallucinated APIs, plausible-looking but subtly wrong concurrency. Copy-pasted licensed code is back, with attribution stripped.

### 6. Testing and quality assurance
- **Pre-2022.** Unit → integration → e2e → manual exploratory. Test pyramid.
- **May 2026.** The pyramid gains two new layers: evals (golden datasets, LLM-as-judge, pairwise comparison) and red-teaming (prompt injection, jailbreak, hallucination, behavioural). OpenAI Evals, DeepEval, Braintrust, Langfuse, Latitude and Confident AI's DeepTeam are the working toolchain.
- **Shift.** From binary pass/fail to multi-dimensional scoring with thresholds.
- **Risk.** Evals are expensive, and a bad eval shipped with confidence is worse than no eval at all.
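"Multi-dimensional scoring with thresholds" can be made concrete in a few lines: a run over a golden dataset is scored on several axes, and the gate passes only if every axis clears its threshold. The dataset, axes, and threshold values below are all hypothetical; real scores would come from LLM-as-judge or programmatic checks.

```python
# Hypothetical thresholds: a release must clear every axis, not one aggregate.
THRESHOLDS = {"groundedness": 0.90, "topic_accuracy": 0.95, "refusal_rate_max": 0.02}

def score_run(results: list) -> dict:
    """Aggregate per-case booleans into per-axis scores."""
    n = len(results)
    return {
        "groundedness": sum(r["grounded"] for r in results) / n,
        "topic_accuracy": sum(r["topic_ok"] for r in results) / n,
        "refusal_rate": sum(r["refused"] for r in results) / n,
    }

def gate(scores: dict) -> bool:
    """Pass only if every dimension clears its threshold."""
    return (
        scores["groundedness"] >= THRESHOLDS["groundedness"]
        and scores["topic_accuracy"] >= THRESHOLDS["topic_accuracy"]
        and scores["refusal_rate"] <= THRESHOLDS["refusal_rate_max"]
    )

# A stubbed run over a two-case golden set; real results come from the model.
results = [
    {"grounded": True, "topic_ok": True, "refused": False},
    {"grounded": True, "topic_ok": True, "refused": False},
]
scores = score_run(results)
```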

### 7. CI/CD, promotion and release management
- **Pre-2022.** GitHub Actions / Buildkite / Argo. Feature flags via LaunchDarkly or Unleash. Canary, blue/green, progressive delivery.
- **May 2026.** CI gains an evals stage that runs before the deploy stage. Feature flags become capability gates that route traffic between models, prompt versions, and agent configurations. Model upgrades roll out as canaries with eval-based promotion. Dynamic fallback is standard.
- **Shift.** From "ship code" to "ship a (code, prompt, model, eval) tuple."
- **Risk.** Prompt and model versions are now part of your supply chain — and most orgs do not track them as such.
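One way to make the (code, prompt, model, eval) tuple and eval-based promotion tangible — all identifiers below are hypothetical, and the threshold is illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Release:
    """The unit of deployment: a versioned tuple, not a code artefact.

    Tracking all four fields together is what makes prompt and model
    versions part of the supply chain rather than invisible state.
    """
    code_sha: str
    prompt_version: str
    model: str
    eval_suite: str

def promote(candidate: Release, eval_score: float, threshold: float = 0.9) -> str:
    """Eval-based canary promotion: graduate only above threshold."""
    return "promote" if eval_score >= threshold else "rollback"

candidate = Release(
    code_sha="a1b2c3d",
    prompt_version="refund-prompt@v14",
    model="claude-x",                 # hypothetical model id
    eval_suite="refund-golden@v7",
)
decision = promote(candidate, eval_score=0.93)
```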

### 8. Launch and go-to-market
- **Pre-2022.** SEO, content marketing, paid acquisition, sales motions, app store optimisation.
- **May 2026.** Discovery happens increasingly inside ChatGPT, Claude, Perplexity, Comet, and Gemini. GEO (Generative Engine Optimisation) is the new SEO. Princeton research shows specific GEO techniques can raise visibility in AI responses by ~27-41%. llms.txt at the domain root, schema.org markup, server-rendered substantive content, and not blocking AI crawlers are table stakes.
- **Shift.** Your audience now includes the model that recommends you.
- **Risk.** A brand can be invisible inside agentic clients while looking healthy in Google Search Console.

### 9. Observability and monitoring
- **Pre-2022.** Logs, metrics, traces. Dashboards. Humans watch graphs.
- **May 2026.** AI-native observability. OpenTelemetry GenAI semantic conventions (v1.40.0, adopted natively by Datadog, Honeycomb, New Relic, Grafana, Langfuse) standardise the gen_ai.* namespace. Honeycomb's Agent Timeline (May 2026) renders multi-agent multi-trace workflows as one coherent timeline.
- **Shift.** From dashboards humans read to agents that read dashboards and propose hypotheses.
- **Risk.** PII leaks via prompt content captured in spans; keep content capture off by default and redact at the Collector.
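A sketch of what a GenAI span carries under the `gen_ai.*` namespace, plus the redact-before-export posture above. It is built as a plain dict so it runs without the OTel SDK; a real service would set these via `span.set_attribute()`, and the exact attribute keys should be checked against the current semantic conventions rather than taken from this sketch.

```python
# Span attributes in the gen_ai.* namespace (illustrative values).
span_attributes = {
    "gen_ai.operation.name": "chat",
    "gen_ai.request.model": "claude-x",   # hypothetical model id
    "gen_ai.usage.input_tokens": 1842,
    "gen_ai.usage.output_tokens": 312,
}

# Prompt/completion content is opt-in; treat it as sensitive by default.
SENSITIVE_KEYS = {"gen_ai.prompt", "gen_ai.completion"}

def redact(attrs: dict) -> dict:
    """Drop content keys before export, mirroring a Collector-side
    redaction processor: token counts survive, raw prompts do not."""
    return {k: v for k, v in attrs.items() if k not in SENSITIVE_KEYS}

exported = redact({**span_attributes, "gen_ai.prompt": "user PII here"})
```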

### 10. Scaling and reliability engineering
- **Pre-2022.** Autoscaling groups, multi-AZ, chaos engineering, SLOs and error budgets.
- **May 2026.** Reliability now means token-budget SLOs, model-fallback SLOs, and agent-step budgets alongside the classic latency/availability ones. The Cloudflare AI Gateway pattern — caching, retries, model fallback, request buffering across reconnects — is becoming standard practice.
- **Shift.** From "is it up?" to "is it cheap, fast, and grounded enough?"
- **Risk.** Cost and quality are now part of the SLO surface.
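A token-budget SLO only bites if something enforces it mid-run. A minimal sketch of a per-agent budget with a circuit breaker (the budget and charge sizes are illustrative):

```python
class TokenBudgetBreaker:
    """Per-agent token budget with a circuit breaker: once a run exceeds
    its budget, further model calls are refused rather than silently
    billed. Limits here are illustrative."""

    def __init__(self, budget_tokens: int):
        self.budget = budget_tokens
        self.spent = 0
        self.open = False   # open circuit == calls blocked

    def charge(self, tokens: int) -> bool:
        """Record spend; return False once the breaker has tripped."""
        if self.open:
            return False
        self.spent += tokens
        if self.spent > self.budget:
            self.open = True
            return False
        return True

breaker = TokenBudgetBreaker(budget_tokens=10_000)
allowed = [breaker.charge(t) for t in (4_000, 4_000, 4_000, 500)]
```

The third call trips the breaker and the fourth is refused outright, which is the difference between a budget overrun of one step and a runaway agent.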

### 11. Iteration and continuous improvement
- **Pre-2022.** A/B testing, experimentation platforms, growth loops.
- **May 2026.** Every production failure becomes an eval case. Prompt optimisation (DSPy, GEPA-style auto-generated evals) iterates the prompt the way we used to iterate code. Memory systems (Letta, Mem0) let agents learn from their own sessions.
- **Shift.** From experimentation on humans to experimentation on prompts, models, and agent configurations.
- **Risk.** Silent behavioural drift when an upstream model update changes behaviour at step 3 of a 10-step agent.

### 12. Security and compliance
- **Pre-2022.** WAF, bot management, IAM, SBOM, SOC 2, ISO 27001, threat modelling.
- **May 2026.** OWASP Top 10 for Agentic Applications (ASI, 2026 edition) names agent goal hijack, indirect prompt injection, tool misuse, memory poisoning, recursive hijacking, and rogue sub-agents as top categories. Agent identity (OAuth for agents, scoped per-task tokens) is the new IAM.
- **Shift.** Treat every model as a hostile user; treat every agent as a service principal.
- **Risk.** Three-letter regulators are still catching up; you will be asked to prove controls you do not yet have.

### 13. Decommissioning and lifecycle management
- **Pre-2022.** Sunset notices, data export, archive, delete.
- **May 2026.** Decommissioning a capability now requires model deprecation drills, prompt and eval archives, and AIBOM retention to prove what model and data were in use when a specific output was generated.
- **Shift.** From deleting code to retaining provenance of probabilistic systems.
- **Risk.** Most teams have no AIBOM. EU AI Act high-risk obligations are enforceable from 2 August 2026; ISO/IEC 42001 is becoming a procurement requirement.

---

## The new architecture and stack

### The cognition plane

- **Inference layer.** A model gateway in front of multiple providers (OpenAI, Anthropic, Google, Mistral, Llama hosted on Workers AI/Bedrock/Vertex/Together/Fireworks). Cloudflare AI Gateway, Portkey, LiteLLM, Helicone, Bedrock Gateway.
- **Agent runtime.** Per-entity stateful, hibernatable execution. Cloudflare Agents SDK on Durable Objects is the most cohesive primitive shipping today; alternatives include Temporal + Bedrock AgentCore, Inngest, Restate, AWS Step Functions.
- **Memory.** Vectorize, Pinecone, pgvector, Turbopuffer for semantic memory; Letta and Mem0 for episodic; D1, DynamoDB, Postgres for structured.
- **Tools.** MCP servers wrapping internal APIs, with OAuth 2.1 scoping. Public MCP registries now exceed 500 servers.
- **Egress and gateway.** An AI Gateway for caching, rate limiting, retries, model fallback, prompt logging, cost attribution, and prompt-injection scanning.
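The gateway behaviour in the last bullet — ordered fallback across providers plus response caching — can be sketched in a few lines. The provider functions are stubs of my own; a real deployment would sit behind one of the gateways named above.

```python
cache = {}   # successful responses, keyed by prompt

def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary provider timed out")

def steady_fallback(prompt: str) -> str:
    return f"answer to: {prompt}"

# Providers in priority order (names are hypothetical).
PROVIDERS = [("primary", flaky_primary), ("fallback", steady_fallback)]

def gateway(prompt: str):
    """Return (provider_used, response); serve repeats from cache."""
    if prompt in cache:
        return ("cache", cache[prompt])
    last_error = None
    for name, call in PROVIDERS:
        try:
            response = call(prompt)
            cache[prompt] = response
            return (name, response)
        except Exception as err:
            last_error = err   # log and fall through to the next provider
    raise RuntimeError("all providers failed") from last_error

first = gateway("hello")    # primary fails, fallback answers
second = gateway("hello")   # served from cache
```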

### The delivery plane

- **Compute.** Edge-native (Cloudflare Workers, Vercel Functions, Fly Machines) for the agent loop and user-facing surfaces; hyperscaler GPU (AWS, GCP, Azure, CoreWeave) for training and large-batch inference.
- **Storage.** R2/S3 with zero-egress for training data portability; KV/D1/DynamoDB for hot state.
- **CI/CD.** GitHub Actions or equivalent, with an evals stage (Braintrust, Langfuse, OpenAI Evals, DeepEval) gating promotion, and capability gates routing traffic.

### Why edge matters now

In a pre-agent app the user does ~1 round trip per interaction. In an agent app the agent does 5-30 round trips per user interaction. Every 50 ms of avoidable network latency, multiplied by 15 hops, is a 750 ms tax on perceived intelligence. Co-locating inference, state, vector search and tools on a single network and runtime collapses that tax. This is why Cloudflare's Workers AI + Vectorize + Durable Objects + AI Gateway + R2 + KV + D1 + Queues + Hyperdrive + Browser Rendering + Agents SDK reads as one stack rather than a basket: the glue code an AWS reference architecture would have you write is mostly absent.
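The latency tax above is simple multiplication, which is exactly why it is so easy to ignore per hop and so painful in aggregate:

```python
def latency_tax_ms(avoidable_ms_per_hop: float, hops: int) -> float:
    """Avoidable network latency compounds linearly with agent hops."""
    return avoidable_ms_per_hop * hops

# The example from the text: 50 ms of avoidable latency across 15 hops.
tax = latency_tax_ms(50, 15)
```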

### Where edge falls short, honestly

Large model training, multi-hundred-GPU clusters, specialised hardware (TPU pods, B200/H200 clusters), and customer-data-sovereignty regimes that require named in-country regions are still hyperscaler territory. Bedrock, Vertex AI, and Azure AI Foundry are credible primary stacks; Modal, Together, Fireworks, Replicate and Baseten are credible inference specialists. The right pattern for most enterprises is edge for the agent loop, hyperscaler for the heavy iron.

---

## Risk, governance and control matrix

| Risk | Control surface | Owners | Standards anchor |
|---|---|---|---|
| Non-determinism / reproducibility | Deterministic harness around probabilistic core; full trace capture | SRE + Platform | NIST AI RMF Measure |
| Hallucination / grounding | RAG with provenance, citation-required outputs, eval thresholds | Product + ML | ISO 42001 §8 |
| Prompt injection (direct + indirect) | Prompt firewall, content security policy on tool output, allow-listed sources | Security | OWASP LLM01 / ASI01 |
| Tool misuse / confused deputy | Per-tool scoped tokens, MCP server portals, human approval on destructive actions | Security + Platform | OWASP ASI02 |
| Agent identity at scale | OAuth for agents, per-agent service identities, short-lived scoped tokens | IAM | NIST AI RMF Govern |
| Shadow AI / Shadow MCP | Egress controls, AI Gateway logging, sanctioned tooling | Security + IT | AU VAISS G1, G6 |
| Cost runaway | Per-agent token budgets, model routing, circuit breakers | FinOps + Eng | FinOps Foundation FOCUS |
| Model supply-chain | AIBOM (CycloneDX ML-BOM / SPDX 3.0), signed model artefacts | Security + Legal | EU AI Act Art. 11; OWASP AIBOM |
| Regulatory exposure | AI inventory, risk classification, conformity assessment | Legal + Risk | EU AI Act; ISO 42001; NIST AI RMF; AU VAISS |
| Skill atrophy / over-reliance | Agent-off pairing, post-mortems, deliberate practice | Engineering leadership | — |
| Vendor lock-in | Model-agnostic gateway, eval portability, open weights as fallback | Architecture | — |
| Cascading agent failure | Per-hop budgets, circuit breakers, idempotency keys | SRE | OWASP ASI05 |
| IP / copyright in generated code | Provenance scanning, license filters, indemnified providers | Legal + Eng | — |
| Trust calibration | Graduated autonomy ladder, eval-backed confidence scoring | Product | NIST AI RMF Manage |
| Eval maturity gap | Treat evals as test pyramid layer 0; growth from production failures | Engineering | (internal) |

---

## The SHIFT adoption framework

A deliberately simple framework a CxO can use on a whiteboard tomorrow.

- **S — Specify with evals.** Before you adopt a single new tool, write the evals that define "good" for your top three workflows. If you cannot write the eval, you cannot ship the agent.
- **H — Harness the loop.** Pick your agent runtime, your model gateway, and your memory layer. Standardise on two coding agents, one model gateway, one agent SDK. Make MCP your default integration shape.
- **I — Instrument everything.** Adopt OpenTelemetry GenAI semantic conventions on day one. Default content capture off, redact at the Collector, capture token usage and tool selection always.
- **F — FinOps the tokens.** Treat per-agent-task cost as a first-class SLO. Per-team budgets, model routing by complexity, circuit breakers, weekly unit economics ("cost per resolved ticket," not "cost per token").
- **T — Trust, then transfer.** Move workloads up the autonomy ladder one rung at a time. Each promotion is gated by an eval threshold, a cost ceiling, and a red-team sign-off.

### The five-rung autonomy ladder

1. **Suggest.** Agent proposes; human always acts.
2. **Draft.** Agent acts in a sandbox; human reviews before merge.
3. **Execute (bounded).** Agent acts in production within a typed, audited tool set; human samples.
4. **Operate.** Agent acts autonomously within a budget and policy; human is paged on threshold breach.
5. **Delegate.** Agent owns the outcome end-to-end; human reviews aggregate metrics.

A useful rule of thumb: **no workflow should be more than one rung above its eval coverage.** A Level-4 agent on Level-1 evals is an incident waiting to be filed.
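The rule of thumb is mechanical enough to encode as a promotion gate. A minimal sketch, assuming both autonomy rung and eval coverage are scored on the same 1-5 scale (the coverage scale is my own device, not a standard):

```python
def max_allowed_rung(eval_coverage_level: int) -> int:
    """No workflow more than one rung above its eval coverage."""
    return eval_coverage_level + 1

def promotion_ok(current_rung: int, target_rung: int,
                 eval_coverage_level: int) -> bool:
    """Gate a one-rung-at-a-time promotion on eval coverage."""
    one_rung = target_rung == current_rung + 1
    covered = target_rung <= max_allowed_rung(eval_coverage_level)
    return one_rung and covered

ok = promotion_ok(current_rung=2, target_rung=3, eval_coverage_level=2)
blocked = promotion_ok(current_rung=3, target_rung=4, eval_coverage_level=1)
```

`blocked` is the Level-4-agent-on-Level-1-evals case from the text: the promotion is one rung, but coverage refuses it.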

---

## What to do on Monday

1. **Architecture review.** Map your current stack to the cognition plane. Where are the holes? Where is the glue code that an integrated primitive would delete?
2. **Risk officer.** Walk the 15-row matrix. For each row, name a control owner. Where you cannot name one, that is your next hire.
3. **One workflow.** Pick a single bounded workflow and move it deliberately up the autonomy ladder. Write the evals first. Instrument with OpenTelemetry GenAI on day one. Set a token budget. Ship to rung 2, then rung 3, only when the evals say you can.

The lifecycle has changed. The teams that win the next decade are the ones who notice in time — and the ones who treat evals, agent identity, AIBOM, and edge proximity as load-bearing, not optional.
