
Four Labs, Four Bets: What the First Four Months of 2026 Reveal About AI Strategy

A close read of every major release from Claude Code, ChatGPT, Gemini, and Grok between January and April 2026, and the strategy hiding inside each one.

May 18, 2026 13 min read

If you read the press releases from the four labs that matter, you would think they were running the same race. New models. Bigger context windows. Better benchmarks. A steady drumbeat of agentic this and reasoning that. The vocabulary is identical. The numbers are identical. The choreography is identical.

But if you put the actual shipping logs side by side, January through April 2026, the picture inverts. The four labs are not running the same race. They are running four different races, on four different tracks, with four different definitions of victory. The shared vocabulary is a coincidence of the press cycle. The bets underneath are structurally different, and they will produce structurally different companies.

I spend most of my weekends building with all four of these products, either at hackathons or inside the companies I help on AI adoption. The interesting part is not which one is "best." The interesting part is what each one chose to ship and, more revealingly, what each one chose not to. The first four months of any year are the part of the calendar where strategy leaks through the cracks. Q1 product decisions are downstream of Q3 planning meetings the previous year. By April, the priorities are visible.

What follows is a close read of every major release from each of the four labs between January 1 and April 30, 2026, and an argument about what each one actually believes.

01

the same four months, four different stories

Here is the inventory, compressed.

Claude Code.

  • January: SKILL.md support, session forking, cloud handoff, and the --from-pr flag.
  • February: Claude Opus 4.6 and Sonnet 4.6, agent teams in research preview, fast mode, automatic memories, and PDF page ranges.
  • March: MCP elicitation support and a tool search layer that lazy-loads only the tools a session needs.
  • April: Forked subagents on external builds, persistent /model selection, the Bedrock service tier flag, and a steady stream of OAuth and PR resume fixes.

ChatGPT and OpenAI.

  • January: Voice retired on the macOS app, Custom GPTs transitioned to GPT-5.2, and reasoning thinking time reduced.
  • February: A quiet fix to the Extended thinking regression in GPT-5.2 Thinking.
  • March: The entire GPT-5.1 family retired from ChatGPT, legacy deep research mode removed, and a GPT-5.3 Instant tone refresh.
  • April: GPT-5.5 and GPT-5.5 Pro on April 23, API access the following day. Alongside the models, ChatGPT Atlas shipped agent mode on macOS for paid tiers, and ChatGPT Pulse moved from preview into general availability.

Gemini.

  • January: Personal Intelligence in beta for AI Pro and AI Ultra subscribers, and a Gemini for TV preview at CES.
  • February: Gemini 3.1 Pro on February 20, with a one-million-token context window and 114 tokens-per-second output. Nano Banana 2 on February 26, across the Gemini chatbot, Search AI Mode, and Lens.
  • March: Gemini 3.1 Flash Lite released to developers. Gemini App Actions shipped in the Pixel Drop on Android 16 QPR3. Deeper Gemini integration into Docs, Sheets, Slides, and Drive.
  • April: Gemma 4 on April 2, Deep Research Max, the Gemini Enterprise Agent Platform, Google Vids video generation at no cost, Colab Learn Mode, and Translate pronunciation practice. Project Mariner Computer Use landed inside Gemini 3.1 Pro and Flash. Cloud Next 2026 unveiled Workspace Studio, a no-code agent builder spanning Gmail, Docs, Sheets, Drive, Meet, and Chat.

Grok and xAI.

  • January: Grok 4.1 reached all users on grok.com, X, and the mobile apps, with hallucinations reduced from 12.09 percent to 4.22 percent.
  • February: Grok Imagine 1.0 with ten-second videos at 720p and meaningfully improved audio.
  • March: Extend from Frame, enabling chained clips up to fifteen seconds.
  • April: Grok 4.3 beta on April 17 for SuperGrok Heavy subscribers at three hundred dollars per month, introducing native video input, a one-million-token context window, built-in reasoning, and in-chat generation of PDFs, presentation slides, and spreadsheets. The Grok 5 flagship, originally promised for Q1, slipped to Q2 and remained in training through April.

Read as lists, the four shipping logs look comparable in volume. Read as strategies, they are unrecognizably different from one another.

02

Claude Code, the protocol bet

If you look at what Anthropic actually shipped in Claude Code over the first four months of 2026, almost none of it is consumer-facing. Almost none of it is a feature you would screenshot. The shipping log reads like infrastructure documentation:

  • Skills and plugins (a portable, declarative way to teach the model new behavior).
  • MCP elicitation and tool search (smarter, lazier tool loading inside a session).
  • Forked subagents and persistent model selection (state management for long-running agent work).
  • Bedrock service tiers, PR URL resume, OAuth fixes, agent SDK improvements (the plumbing).

This is intentional. Anthropic is building for the builder.

The SKILL.md format is the clearest signal. A skill is a folder with a markdown file and some assets. It is invoked automatically when a model thinks it needs it. The format is deliberately open: by March, the same SKILL.md files were running not just in Claude Code, but in Cursor, Gemini CLI, Codex CLI, and Antigravity IDE. That is the giveaway. Anthropic is not trying to lock skills into Claude. It is trying to make skills a standard, and to be the first lab whose tools speak the standard natively.
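To make the format concrete, here is a minimal sketch of what a skill folder might contain. The skill itself, its name, and its instructions are invented for illustration; the frontmatter carries a name and a description that tells the model when to invoke it, and the body carries the instructions.

```markdown
---
name: changelog-writer
description: Drafts a changelog entry from a merged pull request.
  Use when the user asks to summarize merged changes for release notes.
---

# Changelog writer

1. Read the PR title, description, and diff summary.
2. Classify the change as a feature, fix, or chore.
3. Draft a one-line entry matching the style of the existing CHANGELOG.md.
```

Because this is just a folder of markdown and assets, there is nothing Claude-specific to parse, which is exactly why the same files can run in Cursor or Gemini CLI.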

This is the Model Context Protocol playbook continued. MCP, which Anthropic introduced in late 2024, became by mid-2026 the de facto way agents discover and call tools. Skills extend the same logic to behavior: a portable, declarative way to give a model a new capability without retraining it. The combination is consequential. If MCP is how a model talks to the world, and Skills are how a model learns a new way of working, then the protocols above the model become the layer where switching costs accumulate. The model itself becomes more replaceable, but the orchestration that surrounds it becomes harder to leave.

The tool search release in March is the same idea applied to performance. Earlier versions of Claude Code loaded every available tool into context upfront, which inflated token usage and slowed responses. Tool search introduced lazy loading: the agent searches for the tools it needs and pulls in only those schemas. This is a humble engineering improvement. It is also a tacit admission that the bet is not on bigger context windows. It is on smarter routing inside a normal-sized window.
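The economics of that change are easy to see in miniature. The sketch below is illustrative, not Anthropic's implementation: the tool names, descriptions, and token counts are invented, but the mechanism, search a registry by description and serialize only the matching schemas into context, is the idea tool search ships.

```python
# Illustrative sketch of eager vs. lazy tool loading.
# All tool names and token counts are hypothetical.

TOOL_REGISTRY = {
    "git_commit":   {"desc": "create a git commit",       "schema_tokens": 180},
    "git_diff":     {"desc": "show git changes",          "schema_tokens": 140},
    "run_tests":    {"desc": "run the test suite",        "schema_tokens": 210},
    "browser_open": {"desc": "open a url in a browser",   "schema_tokens": 350},
    "sql_query":    {"desc": "query a sql database",      "schema_tokens": 400},
}

def eager_load() -> int:
    """Old behavior: every tool schema is serialized into context upfront."""
    return sum(t["schema_tokens"] for t in TOOL_REGISTRY.values())

def lazy_load(query_terms: list[str]) -> int:
    """New behavior: search tool descriptions, load only the matches."""
    matched = [
        t for t in TOOL_REGISTRY.values()
        if any(term in t["desc"] for term in query_terms)
    ]
    return sum(t["schema_tokens"] for t in matched)

eager = eager_load()        # all five schemas: 1280 tokens
lazy = lazy_load(["git"])   # git_commit + git_diff only: 320 tokens
```

With even a modest registry, the session that needs two git tools pays for two schemas instead of fifty, and the saving compounds on every turn the context is re-sent.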

Anthropic spent the first four months of 2026 making the layer above the model thicker, not the model itself smarter.

Opus 4.6 and Sonnet 4.6 in February were real. They lead the field on SWE-bench. But they are notable for what they enable in Claude Code, not for what they enable as raw inference. The product strategy assumes that the model wins by being the most useful surface for skilled builders, not by being the most impressive chat partner for a casual user.

This is, in Christensen's language, a bet that profits will migrate upward, into the orchestration and tooling layer, as model capability commoditizes. Anthropic is positioning itself one layer above where it competes today. If the bet is right, the model becomes a feature of the platform rather than the platform itself.

03

ChatGPT, the surface bet

OpenAI's first four months of 2026 are the opposite move. The model release cadence is, by OpenAI's own standards, slow. GPT-5.2 in January was an incremental update to Custom GPTs. GPT-5.3 Instant in March was a tone refresh, primarily reducing "teaser-style phrasing" in responses. GPT-5.5 on April 23 is the first release of the year that is unambiguously a model launch, positioned for "complex real-world work" spanning coding, research, data analysis, document creation, spreadsheets, computer use, and multi-step tool use.

Four months. One serious model. Compare that to OpenAI's 2024 release cadence and the slowdown is visible.

What replaced model velocity was surface velocity. ChatGPT Atlas, an OpenAI browser built on Chromium, shipped agent mode for Plus, Pro, and Business users on macOS. Agent mode lets ChatGPT complete end-to-end tasks inside the browser: research a meal plan, build a grocery list, add the groceries to a shopping cart. The release notes specifically highlight that agent mode is now "more persistent on repetitive and tedious tasks," like processing hundreds of emails and extracting action items. ChatGPT Pulse, which does asynchronous research once a day based on a user's past chats and memory, rolled out to Atlas for Pro users.

The pattern is unmistakable. OpenAI is no longer trying to be a destination. It is trying to be ambient.

The strategic logic is consistent with where OpenAI's structural strengths actually are.

What OpenAI has:

  • The strongest consumer brand in AI.
  • The largest weekly active user base of any AI product.
  • Microsoft as a distribution partner across every Fortune 500 IT footprint.

What OpenAI does not have:

  • Silicon of its own.
  • A mobile operating system.
  • A productivity suite.

The natural move, when your strengths are brand and user attention, is to extend the surface area where that attention lives.

The browser is the most defensible version of this. If you can put an agent inside the browser, the agent inherits everything the browser sees: every site, every login, every form, every workflow that already happens in a tab. The chat window stops being the product. The browser becomes the product, and the chat is a sidebar. Pulse extends this further into time. The agent works while you sleep.

The risk in this strategy is that it crowds the model layer. If GPT-5.5 is the only serious model launch in four months, and the focus is on surfaces that wrap the model, then by the time GPT-6 arrives, the gap with Anthropic on autonomous coding and with Google on multimodal reasoning may be uncomfortable to close. The retirement of GPT-5.1 in March, three months after launch, signals an internal acknowledgment that the model layer is moving fast enough that older variants need to be pruned aggressively. The slowdown is not because there is nothing to ship. It is because the bet has shifted.

OpenAI is betting that the lab that owns the user's attention surface wins, even if it is briefly second on benchmarks.

This is a defensible bet, but it is a different bet than Anthropic's. Anthropic is selling to the builder above the model. OpenAI is selling to the user in front of the model. They are not really competing on the same dimension.

04

Gemini, the stack bet

Google's shipping log in early 2026 reads like no one else's, because no one else has the same stack to ship into.

Look at the four months in sequence. CES brought Gemini for TV. January launched Personal Intelligence into the AI Pro and AI Ultra subscriptions. February delivered Gemini 3.1 Pro and Nano Banana 2, the latter shipped simultaneously across the chatbot, Search AI Mode, and Lens. March released a cheaper Flash Lite tier, shipped Gemini App Actions through the Pixel Drop on Android 16 QPR3, and pushed Gemini integration deeper into Workspace. April delivered Gemma 4, Deep Research Max, the Gemini Enterprise Agent Platform, free Google Vids video generation, Colab Learn Mode, and Translate pronunciation practice. Project Mariner Computer Use landed inside both Gemini 3.1 Pro and Flash. Cloud Next 2026 introduced Workspace Studio for building agents across Gmail, Docs, Sheets, Drive, Meet, and Chat.

Every one of those releases is connected to a product Google already owns. Not in passing. Structurally.

  • Gemini for TV runs on Google's TV operating system.
  • Gemini App Actions ship in the Pixel Drop.
  • Personal Intelligence runs on Workspace data.
  • Nano Banana 2 ships into Search and Lens, the two products with several billion daily users.
  • Workspace Studio targets the productivity suite that competes most directly with Microsoft 365.
  • Mariner runs in Chrome.
  • Gemma 4 runs on the open-weights stack that Google's developer ecosystem has been adopting for inference on TPUs.

The number that gives the strategy away is the API request volume. In March 2025, Google reported thirty-five billion Gemini API requests. In January 2026, that number was eighty-five billion, roughly 143 percent growth in under a year, on a base most labs would consider their endgame.

Google is the only lab whose model strategy is fully congruent with its distribution strategy.

  • Apple has comparable distribution but does not have a frontier model.
  • OpenAI has the model and the brand, but no operating system, no browser at scale yet, no productivity suite.
  • Anthropic has the model and a protocol bet, but no consumer distribution.
  • Google has the silicon (Ironwood TPUs), the model (Gemini), the productivity surface (Workspace), the operating system (Android), the browser (Chrome), the search engine, and the cloud.

Every release inside the four months above gets multiplied by an existing surface that already touches a billion users.

This is the vertical integration playbook in its most complete form. The risk in this strategy is not that it fails on the merits. The risk is regulatory: a company that owns this many layers of the stack draws antitrust scrutiny by gravity. The risk is also internal: vertical integration only compounds if the teams across the surfaces cooperate, and Google's history with cross-product cooperation is, generously, mixed. Workspace Studio is a useful tell here. It is a no-code agent builder that depends on Workspace, Drive, Calendar, Meet, Gmail, and Chat all behaving as a single platform rather than as legacy product lines. If Workspace Studio works in 2026, the stack play is real. If it stalls, Google has the strategy but not the operating reality.

No other lab can ship a single model release into a TV, a phone, a search bar, a browser, and a productivity suite on the same day. Google can, and is.

The strategic implication for the other three labs is uncomfortable. Anthropic and OpenAI are both, in different ways, trying to build a layer that wraps an existing operating system they do not own. Google is the operating system.

05

Grok, the creator bet

Grok's first four months of 2026 tell a different kind of story, and it is the one worth being most honest about.

The headline release of the year, Grok 5, did not ship. It was promised for Q1, was reported at six trillion parameters trained on Colossus 2, and slipped to Q2. As of late April, Grok 5 remained in training. This matters. The other three labs all shipped a frontier model in the window, and Grok did not.

What Grok did ship, however, is interesting in its own right and reveals a different definition of the prize.

Grok 4.1 rolled out broadly in January, and the most noteworthy number was the hallucination rate, which dropped from 12.09 percent to 4.22 percent, a sixty-five percent improvement. This is the kind of metric that matters specifically for the use case Grok is optimizing for: real-time information retrieval and commentary on the X platform, where confident wrongness is the most expensive failure mode.
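The arithmetic behind that sixty-five percent figure checks out, using the two rates from the release notes above:

```python
# Relative improvement in the reported Grok 4.1 hallucination rate.
before, after = 12.09, 4.22            # percent, as published
relative_drop = (before - after) / before
print(f"{relative_drop:.1%}")          # 65.1%
```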

Grok Imagine 1.0 in February enabled ten-second videos at 720p with significantly better audio. Extend from Frame in March chained clips up to fifteen seconds. Grok 4.3 beta in April added native video input, a one-million-token context window, and in-chat generation of PDFs, presentation slides, and spreadsheets, behind a three-hundred-dollar-per-month SuperGrok Heavy paywall.

The pattern is creator-oriented:

  • Video in, video out.
  • Slides generated inside the chat.
  • Image generation as a first-class output.
  • Real-time commentary tuned for the X platform.

The pricing is built around a small audience willing to pay a premium for a maximalist feature set, rather than a mass audience paying twenty dollars a month for a general assistant.

This is a coherent strategy, but it is a niche one. Grok is the lab least focused on enterprise procurement, least focused on developer protocol, and least focused on operating system distribution. It is focused on the creator economy and on the X platform's installed base, with a willingness to lean into capabilities (video, slides, image) that the other three labs are slower to release because of safety review and regulatory exposure.

The slipping Grok 5 timeline is the part that matters strategically. If you are competing on creator features inside a single distribution surface, frontier model lag is survivable. If you are competing on enterprise reliability and developer protocol, frontier model lag is a problem. xAI is, by its release pattern, choosing the first race.

Whether that race is large enough to justify a hundred-billion-dollar-plus valuation is a separate question. The honest summary of the first four months of 2026 for xAI is that the company shipped what it could ship, did it competently, and did not ship what it had promised. That is not a failure. It is also not the trajectory of the other three labs.

06

the pattern beneath

Step back from the individual release notes, and four sentences capture the four bets.

  • Anthropic is betting that the layer above the model is where the durable value sits, and that an open protocol they author wins by becoming the standard.
  • OpenAI is betting that the surface in front of the model is where the durable value sits, and that owning the browser and the agent loop wins by absorbing user attention.
  • Google is betting that the stack underneath the model is where the durable value sits, and that owning silicon, operating system, productivity suite, and search wins by compounding distribution.
  • xAI is betting that the niche around the model is where the durable value sits, and that owning the creator economy on the X platform wins by going first on capabilities the bigger labs will not ship.

Three of these bets are mutually compatible. Anthropic's protocol layer, OpenAI's surface layer, and Google's stack can all coexist, and probably will, for years. They are competing for adjacent kinds of value, not for the same kind of value. The customer who pays Anthropic for the developer protocol is not the same customer who pays Google for the integrated productivity stack, and neither is necessarily the customer who runs ChatGPT Atlas as their daily browser. Multi-model deployment is now the default in the enterprise, and it remains the default in the prosumer market. None of these companies need a winner-takes-all outcome to justify their current valuation. They each need to win their layer.

The fourth bet, xAI's, is the one most exposed to compression. Niche capability advantages are the most volatile kind of moat. When Google or Anthropic decides to ship a video model or a slide generator, the niche feature set is closed in months, not years. The frontier model lag becomes more punishing as the other labs accelerate. The bet only works if xAI either lands Grok 5 quickly and convincingly, or if the X-platform integration deepens enough that the distribution surface itself becomes the moat. Both are possible. Neither is settled by the April 2026 shipping log.

Four labs, four bets, four layers of the same stack. The interesting question is not which one is best. It is which layer turns out to be the load-bearing one.

I do not think anyone knows the answer yet. The honest read of the first four months of 2026 is that each lab made its choice visible, the choices are coherent, and they are about to be tested against each other over the next eighteen months.

07

where this leaves the builder

For anyone building on top of these systems, the right reading of this shipping log is not "which lab is winning." It is "which lab is most useful for the kind of work I am doing." A rough map:

  • Autonomous coding, internal developer platforms, or anything that needs a model to use many tools reliably with low token spend. Claude Code is, in my hands, ahead. The Skills format, MCP elicitation, tool search, and forked subagents are not loud features, but they compound. The token economics matter at scale. The agent reliability matters at scale. The cross-tool portability of skills matters when you do not want to bet a stack on a single vendor.
  • Consumer-facing automation, ambient agents, or anything that needs to operate inside a user's daily browser flow. ChatGPT Atlas with agent mode is the most complete product. Pulse is the right name for the right shape of feature. The risk is the model layer cadence, which is something to monitor over the next two quarters.
  • Enterprise productivity, especially anything inside Workspace, or anything that requires multimodal at low latency and free or near-free pricing. Gemini is increasingly the path of least resistance. The Workspace Studio bet is the one to watch, because it is either going to be a serious agent-building platform by the end of 2026 or a useful warning that vertical integration has limits.
  • Creator output, video, or anything where the X platform is part of the distribution loop. Grok 4.3 is the most opinionated option. The pricing is real, the niche is real, the frontier model lag is real. Build accordingly.

The thing not to do, which I see too many teams do, is treat the four labs as interchangeable and pick one on price or familiarity. They are not interchangeable. They are not even running the same race. The first four months of 2026 made that visible. The next four are likely to make it permanent.

Written by Rishab

Class of 2028, studying entrepreneurship and finance. Hacker and entrepreneur. Bay Area hackathon regular. Helping companies adopt AI, and helping kids and teens learn it without losing their creativity.
