desplega.sh State of AI archive

desplega labs newsletter archive - type `help`

19 State of AI issues loaded from Resend.

State of AI newsletter archive

#01 - Jul 15, 2026

State of AI: The Harness Bash Gambit

Just Bash at Brex, CLI Harnesses review, Chess in AI era

Our agent-swarm https://github.com/desplega-ai/agent-swarm crossed 600 stars and 50 daily active swarms around the globe (that we know of).

Here's what else happened.

----------------------------------------

The Harness Competency Paradox

tl;dr: Your team is using Claude Code, Codex, Cursor or whatever they want. Whatever they want!?

State of CLI Coding Agents https://blog.arcbjorn.com/state-of-cli-coding-agents-2026?utm_source=tldrdev is a satisfying deep dive into the evolution of our CLIs during the past 3 years. It declares the winners are:

* Claude Code. The best UX, and they know it. They are so sure about it that they are willing to charge and lock you in Apple style. I use Android devices. You do you.

* Codex. A similar feature set, ever-improving feature parity, and it’s open source. This last part is key: adapting a harness is cheap nowadays, and closed source makes something cheap impossible. Codex doesn’t.

* Oh-my-pi. The article leans heavily on why Omp is a favorite: more features. Two of them we actually believe are key: AST & task routing. They mean less token consumption and faster results. We believe everyone will start adding these layers soon, and Omp should keep running ahead.

The interesting bit here is how the industry evolved in the past 3 years. No mention of coding agents, e.g. Devin, or other tools meant to take over your dev SDLC. That’s telling.

→ Adopt. You don’t need to read the whole article, just this table https://blog.arcbjorn.com/state-of-cli-coding-agents-2026?utm_source=tldrdev#feature-leaders. Each row highlights an area in which you may want to improve. Each column gives you ideas on where to look.

----------------------------------------

Jus’ Bash.

tl;dr: 3 lessons from Brex migrating their agents from Docker to just-bash.

Simple solutions for complex problems: a mantra some people forget. just-bash reminded Brex https://x.com/brexHQ/status/2077063945085415655 of 3 rules you must follow in the AI Native era.

* Scale. Their results benefit from just-bash not after 10 line items, but after 1000s. That’s when their context becomes tricky. And actually, that’s when you see the real benefit. Many early tech individuals will think that “works for me” is still valid in a Pareto-like fashion. It’s not.

* Atomicity. The power Brex discovered with just-bash is the agent native architecture Every.to https://every.to/guides/agent-native has been preaching for months now. Granular & composable commands allow your agents to develop a myriad of unexpected solutions. I’m sure Peano and Gödel may be having a conversation about that.

* Compounding. At the end, they reflect on how, by looking at the traces, they generated new tools, scripts, and shortcuts for repeated actions. That is how your system learns and avoids spending infinite tokens.
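
A minimal sketch of the compounding idea, not Brex's implementation: count which consecutive command pairs repeat across agent traces, and treat the most frequent ones as candidates for a dedicated script. The trace format (one list of shell commands per run), the example commands, and the `min_count` threshold are assumptions made for illustration.

```python
from collections import Counter


def repeated_pairs(traces, min_count=2):
    """Return consecutive command pairs seen at least `min_count` times."""
    counts = Counter()
    for trace in traces:
        # Pair each command with the one that follows it in the same run.
        counts.update(zip(trace, trace[1:]))
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]


# Hypothetical traces: one list of shell commands per agent run.
traces = [
    ["ls invoices", "grep -l overdue invoices/*", "wc -l"],
    ["ls invoices", "grep -l overdue invoices/*", "sort"],
    ["cat README.md"],
]

candidates = repeated_pairs(traces)
```

Here the only pair that repeats is listing the invoices and then grepping them, so that is the sequence worth turning into a shortcut.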

→ Adopt. We’ve been doing this in the swarm https://github.com/desplega-ai/agent-swarm for a while, just copy it. Run us over!

----------------------------------------

The Orangutan Opening

tl;dr: Anish Giri is a controversial figure, and he reflects on how chess works nowadays.

You can see how the study of the game has shifted https://x.com/anishgiri/status/2075935342327144888. In his very own words:

Gotcha! https://x.com/anishgiri/status/2075950714665251315/photo/1

→ Challenge. If it resonates, it’s twice as dangerous. Trust is built through human interaction, and honest feedback. Think how to have a 30 min call with someone else, before blindly agreeing online…

----------------------------------------

Gotcha! 15-minutes https://calendar.app.google/jkGPyw8aTrEMH1NQ8 is enough.

Or keep reading newsletters https://www.desplega.sh/newsletters. Your call.
Best,

----------------------------------------

Sent with ❤ by Desplega Labs http://desplega.sh/ from Barcelona
unsubscribe .

#02 - Jul 08, 2026

State of AI: The Script of MultiMetaVerso

Meta, Argentina's DAO corp, AI scripting benchmark.

agent-swarm https://github.com/desplega-ai/agent-swarm/tags was featured in the tl;dr IT newsletter https://tldr.tech/it/2026-06-30. Here's what else happened.

----------------------------------------

Scripting your LLMs away

tl;dr: We finally ran a benchmark on our ‘learn by scripting’ strategy. 10x savings on toolcalls https://www.agent-swarm.dev/blog/code-mode-token-savings.

If you grep our archive https://www.desplega.sh/newsletters, or you’ve been part of our workshops, you know we are proponents of 3 ways of learning: weighted memories, deterministic verification steps, and systematically reducing LLM’s scope. For the last one we have been working on scripting the toil https://www.pleasedontdeploy.com/p/our-strategy-to-deal-with-llms-prices, similar to Google’s famous Eliminating Toil https://sre.google/sre-book/eliminating-toil/ strategy.

You should remember:

* It’s not a new idea. Codemode variants have been around for a while. Anthropic https://www.anthropic.com/engineering/code-execution-with-mcp and Cloudflare https://blog.cloudflare.com/code-mode/ have written about it. It continues to take second place in most teams, given how hard it is to generalize.

* Seed scripts are a game changer. Our agent swarm maintains an active internal repository with scripts https://github.com/desplega-ai/agent-swarm/blob/ccabfb506a21632cb5b6227d380f0af801da4eeb/src/prompts/session-templates.ts#L455. These scripts are created & maintained by our swarm exclusively.

* 10x is a conservative floor. Our results were $0.02 vs $2.44 using sonnet-5 prices. Tool results stay in the context, so each roundtrip is counted many times. We used ‘simple’ flows to minimize this in our benchmark.
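
The point about roundtrips can be made with simple arithmetic. A rough sketch with hypothetical token counts (these are not our benchmark numbers): when every tool result stays in the context, each new request re-sends the results of all previous calls, so billed input tokens grow quadratically with the number of calls, while a script returns a single result once. Output tokens and prompt caching are ignored here.

```python
def toolcall_input_tokens(base, result, calls):
    """Input tokens billed when each tool result stays in the context.

    Request i re-sends the base prompt plus the i results gathered so far;
    the last request reads all of them.
    """
    return sum(base + i * result for i in range(calls + 1))


def script_input_tokens(base, result):
    """One request to write the script, one to read its single result."""
    return base + (base + result)


# Hypothetical sizes: 2,000-token prompt, 500-token results, 10 tool calls.
chained = toolcall_input_tokens(base=2_000, result=500, calls=10)
scripted = script_input_tokens(base=2_000, result=500)
ratio = chained / scripted
```

With these made-up sizes the chained flow bills 49,500 input tokens against 4,500 for the script, an 11x gap that widens as the number of calls grows.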

→ Adopt. Learning means providing simpler deterministic solutions to complex problems. Memory is not enough. This is mandatory.

----------------------------------------

DAO AI CORP.

tl;dr: Milei’s dream of a non-human company with only stakeholders… almost.

Polsia https://polsia.com/ was the first project to try creating companies with minimal, if any, human intervention. Milei is trying https://www.reuters.com/world/americas/argentinas-plan-ai-run-companies-cant-avoid-humans-2026-07-03/ to take it one step further:

* Non-human corporation. The company will be responsible for its decisions, with the difference that it may not have an HR department. The question is: who answers beyond the company capital?

* DAO. The model allows for companies that are decentralized autonomous organizations (DAOs), built on blockchain, enabling members to vote on proposals with digital tokens. There is one small hiccup: anonymity won’t be a thing.

* Human accountability. The other limiting factor, indeed, is that your AI cannot sign some papers, so you’ll need human supervision.

→ Challenge. Is this a sign of a bubble? Is this Argentina signaling an investment opportunity? Or is the future one in which everyone invests in different AI companies?

I asked Fable for a joke on this; it wasn’t funny. It did mention the figurehead concept, which takes us to…

----------------------------------------

Metaverse means datacenters

Tl;dr: Is Mark believing too hard, again? https://finance.yahoo.com/technology/ai/articles/laying-off-8-000-employees-121545621.html?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAABYGbYEtHNghZadHhtQg03zRfcUJ5wjRD3Bm8x77q3JwnFb2F9jrKtgEE4_6DvD-Adin9FiHhc6MplhXgL2-IlEuY7wRe56mJRp8IoCQlNavOJeyFadwOVKFbqR9VKtstqfI73KNLds8IFgSYmoN93YWlcIRb1IQ9V76aWO0C01W

After announcing that all employee activity would be recorded and that most engineers would work on AI, and after continuing its rounds of layoffs, Meta woke up to the wrong culture. What to bring to the chat with your CEO:

* Accountability. You can delegate responsibility, not accountability. That accountability is limited by how much you can oversee. That’s at the core of many roles.

* It’s not taste, it’s knowledge. Arguing that taste is inherently human assumes taste is a moat, i.e. that you have better taste than others. We recommend you steer away from it. Instead, take a page from Mark's book: “it’s not accelerating in the way we expected”.

* Culture. This is the most obvious and loud insider voice we heard. Top performers are looking for new opportunities, nobody is really doing their job. Everyone is waiting for doomsday. Sounds familiar? I hope not.

AI startups are far from having a working business model https://x.com/GergelyOrosz/status/2074507651245895818?s=20. If you do, that’s what you need to protect.

→ Wait. Meta wants to sell shovels too, and may start looking again into their next multi-billion dollar moneypit. Anthropic and OpenAI may be delaying their IPO. Many startups haven’t hit their ambitious AI driven targets, Bending Spoons made a business of optimizing failed ventures...

----------------------------------------

Yesterday we found out more Ops teams are trying the swarm. Tell us your story!

Or keep reading newsletters https://www.desplega.sh/newsletters. Your call.
Best,

----------------------------------------

Sent with ❤ by desplega.sh http://desplega.sh/ from Barcelona

#03 - Jun 25, 2026

State of AI: Slacking on a Flat Highway

The flat curve society, code-health strategies & Claude Tag

We shipped https://github.com/desplega-ai/agent-swarm/tags. Here's what else happened.

----------------------------------------

Steve flats out

tl;dr: The flat curve society https://steve-yegge.medium.com/the-flat-curve-society-36c8b01eb33b is an essay on what happens once AI becomes highly regulated.

This time, it’s not about a new AI framework, it’s about seeing the finish line ahead of us. It’s an arbitrary and pragmatic reflection on the state of AI. What to look for:

* Enforced Plateau. Either because of pricing, problem space complexity, or regulation, the thesis is that new model generations will be more restricted. The first two parts of the argument (pricing & complexity) have become evident to many in the past months.

* SaaS unit economics. Token usage maturity & subsidized pricing have led teams to believe they could effectively replace SaaS providers. This year, people are realizing that they have 2 options: pay for access to expert software, and/or become AI Literate.

* AI Literacy. The final part of the essay describes how he’s seen training work in organizations: taking employees from zero AI usage to operating swarms of agents. It’s a process. He doesn’t delve into the resistance that comes into play, but he calls it out clearly.

The first two points have been addressed from different perspectives in the past few months. The third one, though…

→ Adopt. If your team did one of our multi-day workshops, then you’ll notice the AI Literacy blueprint matches ours. We agree with it. And for Tech teams, it’s needed. The biggest risk factor? The article says it cleanly: “The manager _must_ opt the team in” → change management.

----------------------------------------

Highway to legacy code.

tl;dr: Tornhill https://www.linkedin.com/feed/update/urn:li:activity:7474758377361907713/ keeps pushing for broader understanding on building tech debt.

We deep-dive one more time on Adam Tornhill (author of ‘Your Code as a Crime Scene’). We strongly believe that in 6 months everyone will be coping with:

* Complex unstructured code. Agents are good at writing it, and very bad at understanding it. In 6 months, many will reach a state where they need some mandatory ‘refactoring time’.

* Obscure design decisions. Naming conventions https://adamtornhill.substack.com/i/200246581/why-this-matters-in-ai-first-development will become key, and those forgetting about coding guidelines will pay the highest price. The “two different variables with the same name” is a problem for humans and LLMs alike.

* Quality is the final factor. Companies that rush into agentic coding https://www.luminarventures.com/post/founder-series-a-conversation-with-adam-tornhill-founder-of-codescene without the right engineering foundation risk quality issues. Your teams need help adopting these methods safely and effectively.

→ Challenge. Don’t outsource the thinking!
Introduce small practices that will stretch the horizon. We have seen good results with our weekly code-health workflow https://docs.agent-swarm.dev/docs/playbooks/code-health-alert-management leveraging knip https://github.com/webpro-nl/knip and desloppify https://github.com/peteromallet/desloppify. We also experimented with code-maat https://github.com/adamtornhill/code-maat to give a better understanding to our agents. If you know of better OSS libraries, reply!

----------------------------------------

Trojan Slacking Horse

tl;dr: Like Linear, Vercel, or anyone else? A SOTA model that you can tag https://www.anthropic.com/news/introducing-claude-tag. What could go wrong?

Claude Tag is here, a true peer, with an uncertain comp package.

* Governance. You can restrict access https://claude.com/docs/claude-tag/admins/restrict-access. You can decide what Claude is allowed to do, who’s allowed to invoke it, and what activities require a specific owner… and you can set per channel & org limits. No user budgeting just yet.

* Routines, and other pre-established prompts. Yes, you can use all the power of slack https://claude.com/docs/claude-tag/users/proactivity to trigger Claude, that’s good but… I’m sure you are doing it already.

* 1 slack message at a time. Anthropic is learning how your company operates https://techcrunch.com/2026/06/23/anthropics-claude-tag-is-learning-your-company-one-slack-message-at-a-time/, nice! It's also becoming deeply ingrained with your core business. That's the lock-in. Not the "raw channel memory", but the company operating context https://alphasignalai.substack.com/p/the-real-claude-tag-question-is-context.

→ Challenge. You probably already use multiple providers (Linear, PostHog, Devin, etc.), your own software, or an OSS solution for this. Well done! Why? Your decision matrix https://www.linkedin.com/posts/ecura_decision-matrix-ugcPost-7463213353201618944-Ff7J/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAPiJJwB0LmpHdW8bLd4wZ0_Zuy-prVZFNo should red-flag: (a) provider dependency, and (b) cost-efficiency.

As Yegge mentioned, for most tasks, you probably don’t need to be that smart.

----------------------------------------

Thoughts? Hit reply!

Or keep reading newsletters https://www.desplega.sh/newsletters. Your call.
Best,

----------------------------------------

Sent with ❤ by desplega.sh http://desplega.sh/ from Barcelona

#04 - Jun 17, 2026

State of AI: Grepping Anthropic & Eve.

Vercel launches Eve, Anthropic backpedals, grep vs vector search.

Good news, everybody!

Agent-swarm https://agent-swarm.dev/ executed 50,000+ tasks in the last 30 days.
The number is likely higher… but we'll never know ;)

We are defining our roadmap, give us your feedback https://forms.gle/XFMxhE3PbcGHzU2K6.

----------------------------------------

Anthropic backpedals

tl;dr: Silent email saying you can still use claude -p… after Fable got disabled.

Given the recent Anthropic fumbles, we are waiting on news. I’m sure they’ll have something more controversial to say. Still…

Please avoid lock-in, from any provider. The rise of predatory behavior from providers should make you think twice about where your memory and workflows live. Before you choose your next provider for this critical IP, ask yourself: can I operate without them?

If the answer is no, that’s why you should love your Tech team.

→ Challenge. Re-think your solutions across your whole company (Marketing, Sales, Ops, CX, Tech, UX). Prices will continue to increase. Which takes us to Vercel...

----------------------------------------

Dawn of eve

tl;dr: Vercel aims to be your agentic infrastructure https://vercel.com/blog/introducing-eve. We are expectant.

The trend continues, now with a new landing page and the release of Eve. Eve is an OSS (Apache 2.0) framework to build your own agents in their style. What we are seeing:

* Vercel is accelerating. A year ago, agents triggered less than 3% of their deployments. Now, they trigger around 29%, and they expect half to come from agents "soon".

* Sales, Marketing, Ops, and more. They claim their use cases go across the board, and we believe it. They have a main routing agent V, something that we consider fundamental for any AI Native company out there.

* Memory, Governance & Infra. We are deep-diving into how intertwined it is with Vercel https://github.com/vercel/eve infra, and how it actually provides governance and memory beyond the basics. The project seems early and it's not meant to be a Company Agentic OS; however, the primitives are there.

Historically, Vercel has been extremely developer friendly, and able to keep moving at a fast pace while providing quality products. We are excited to see what's next here.

→ Wait. Ask your codex/claude/swarm/etc. to create a report answering what you can improve in your system. You’d be surprised. Change management is the challenge.

----------------------------------------

Don’t overthink it, grep it.

tl;dr: Last week we applied some key learnings from "Is Grep All You Need? How Agent Harnesses Reshape Agentic Search" https://arxiv.org/abs/2605.15184 and saw a performance improvement.

The swarm wrote a nice article with our learnings https://www.agent-swarm.dev/blog/is-grep-all-you-need-agent-memory. Check the charts. Basically:

* Embeddings aren’t the problem. Weighted formulas based on time were. A curated 76-day-old memory, even if it was an exact match, kept only about 2.3% of its score.

* Minimum similarity. It’s better to return no memories than irrelevant ones; that lesson is hard to see immediately.

* Vector retrieval scored worse. grep beat vector retrieval for every harness-model pair tested. Opus under Chronos reached 93.1% vs 83.6%. GPT under Codex hit 93.1% vs 75.9%.

Definitely something to learn from, as it matched our own experience in the area.
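
To see how a time-based weight can bury a good memory, here is a minimal sketch. The exponential form and the 14-day half-life are assumptions chosen for illustration (they happen to land near the 2.3% figure above); the actual formula is in the linked article. The `min_similarity` cutoff and the sample memories are likewise hypothetical.

```python
def time_decay(age_days, half_life_days=14.0):
    """Exponential decay: the weight halves every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def retrieve(memories, min_similarity=0.6):
    """Drop weak matches, then rank the rest by similarity alone.

    Each memory is a (text, similarity, age_days) tuple. Returning an
    empty list is preferred over returning irrelevant memories.
    """
    kept = [m for m in memories if m[1] >= min_similarity]
    return sorted(kept, key=lambda m: m[1], reverse=True)


# An exact match (similarity 1.0) that is 76 days old.
decayed_score = 1.0 * time_decay(76)

memories = [
    ("deploy checklist", 1.0, 76),
    ("lunch order", 0.2, 1),
]
results = retrieve(memories)
```

Under these assumptions the exact match keeps roughly 2% of its score after 76 days, which is how a fresh but irrelevant memory can outrank it. Filtering on similarity first and ignoring age keeps the curated memory on top.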

→ Adopt. Read the paper! And consider that (a) overall scores depend strongly on which harness & tool-calling style is used, and (b) they do not claim that grep "beats" vectors in general, only that it can win end-to-end under their task distribution (e.g. scientific synthesis, visual-heavy documents, code semantics).

----------------------------------------

Do you have something to say? Reach out! https://calendar.google.com/calendar/appointments/schedules/AcZssZ2KIm8jUERawt74OzBLFH0CG-XOrdvI4wbwDbXPu1yfEaxSI3x_-_b67Iu4xxbsS1-jkQqklo9Z

Or keep reading newsletters. Your call.
Best,

----------------------------------------

Sent with ❤ by desplega.sh http://desplega.sh/ from Barcelona

#05 - Jun 10, 2026

State of AI: Fables are nice, the real world is messier

Fable, Hard Takeoff, and dynamic workflows.

We shipped https://github.com/desplega-ai/agent-swarm/tags. Here's what else happened.

Intelligence on swapping providers continues to be the trend. 3 short updates.

----------------------------------------

Anthropic says: please don’t deploy!

tl;dr: Silent capability degradation on "sensitive" topics is undetectable without systematic benchmarking.

It seems Anthropic has concerns about Hard Takeoff https://www.lesswrong.com/w/ai-takeoff approaching, that is: AI can start improving itself faster than humans can understand what’s actually happening. Three things to keep in mind:

* Focus on code and model improvement. The article is narrowly scoped to software. We know complexity usually lies at the edges (human interactions). Check it out https://www.anthropic.com/institute/recursive-self-improvement

* They are restricting users. We’ve seen anecdotal evidence of prompts being restricted simply on the assumption that you are doing something they consider wrong. It’s a reminder of Google cracking down on Gemini in the early days. Check this https://x.com/SemiAnalysis_/status/2064482714149896431?s=20.

* The Jobs Apocalypse never happened. Altman and Amodei have been vocal about this for over a year, however we haven’t really seen that happen. There’s a job re-allocation happening, though. See here. https://time.com/article/2026/05/26/sam-altman-ai-job-losses-openAI-/

→ Challenge. The most concerning part is that your models could purposely (and silently) start ignoring your commands or giving you suboptimal responses.

----------------------------------------

Flash Moral Fable

tl;dr: It’s great, but is it worth the price? (and not just the financial one)...

We used this opportunity to revisit prices since Haiku 3 launched.

* Nerfing and IP terror. Gergely puts it best https://x.com/GergelyOrosz/status/2064618497150210391?s=20: this time you are at their disposal. These are two meaningful disadvantages of how Anthropic operates: collecting data, and controlling model performance at will.

* CFOs don’t see the ROI. As prices increase, CFOs are growing more concerned about the expenditure, and the ROI they see. It’s becoming a trending topic in multiple forums. If you are leading the transformation, be ready to shift from “are you using AI?” to “why are we using AI?”

* Change our mind. Price-wise, Fable costs about 90x DeepSeek v4 Flash, 50x M3, and 2x Opus 4.8. ROI could be eroded with that price tag. DeepSeek v4 Flash & M3 are the reigning bargains.

→ Wait. Two things you need to start thinking about: (a) how do you reduce friction when changing providers in your company, and (b) how can you ensure the right models are used for the right tasks. This is an engineering problem now.
Ideas? Let’s chat https://calendar.app.google/1zzvrHnW7zxfXFt49.

----------------------------------------

Don't take shoveling advice from shovel sellers.

tl;dr: “You should be designing loops that prompt your agents.” - Peter Steinberger. https://x.com/steipete/status/2063697162748260627?s=20

They say loops, or dynamic workflows, we say asymptotically deterministic.

* AI should be doing less and less. Back in February https://tomtunguz.com/hybrid-state-machine-agents/, many started implementing strategies to reduce the % of LLM usage in their workflows. For us, ~37% of our workflow steps are deterministic, and that number keeps growing. Those script steps? Generated autonomously by the swarm from compounded learnings.

* Dynamic but deterministic. A good example of how Claude doesn’t optimize for consumption: their Dynamic Workflows https://claude.com/blog/introducing-dynamic-workflows-in-claude-code are not optimized for your bill. Instead, try something like the one-off script workflow https://docs.agent-swarm.dev/docs/guides/script-workflow-runs strategy.

* Stripe Minions https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-coding-agents. Building your infra for your agents is key, so that coding becomes less of an ordeal with human interaction and better suited to agentic tooling.
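
A minimal sketch of a hybrid workflow in which deterministic script steps and LLM steps share one pipeline. The `Step` type, the step names, and the stubbed `call_llm` function are hypothetical; this is not the agent-swarm API, only an illustration of how the deterministic share of a workflow can be measured.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[str], str]
    deterministic: bool


def call_llm(text: str) -> str:
    """Stub standing in for a real model call (hypothetical)."""
    return f"summary of: {text}"


def dedupe_lines(text: str) -> str:
    """Remove repeated lines while keeping their first-seen order."""
    return "\n".join(dict.fromkeys(text.splitlines()))


steps = [
    Step("normalize", lambda text: text.strip().lower(), deterministic=True),
    Step("dedupe_lines", dedupe_lines, deterministic=True),
    Step("summarize", call_llm, deterministic=False),
]


def run_workflow(steps, payload):
    """Feed the output of each step into the next one."""
    for step in steps:
        payload = step.run(payload)
    return payload


def deterministic_share(steps):
    """Fraction of steps that never touch a model."""
    return sum(s.deterministic for s in steps) / len(steps)


output = run_workflow(steps, "  Build failed\nbuild failed\n ")
share = deterministic_share(steps)
```

In this toy pipeline two of the three steps are plain code, so the model only sees the already cleaned, deduplicated input. Replacing an LLM step with a script raises the share without changing how the workflow is run.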

→ Adopt. Workflows are how you get the best out of your cloud agent swarms. This is true for every function in your team. Unifying those systematizes https://www.youtube.com/watch?v=B246K_G7mHU how Tech, Product, UX, Marketing, Sales, CX, etc operate. Furthermore, the specialization would allow you to use cheaper models, where the ROI is clear.

----------------------------------------

Thinking of self-hosting SSO? Check this https://docs.agent-swarm.dev/docs/guides/self-hosted-sso.

Or keep reading newsletters. Your call.
Best,

----------------------------------------

Sent with ❤ by desplega.sh http://desplega.sh/ from Barcelona

#06 - Jun 02, 2026

State of AI: How fast do you (really) need to run?

Open vs Closed LLMs, 9-9-7 fallacy, Don't outsource learning.

We shipped https://github.com/desplega-ai/agent-swarm/tags. Here's what else happened.

----------------------------------------

So CLOSED yet, so OPEN.

tl;dr: Minimax 3 https://www.minimax.io/models/text/m3, launched yesterday, matches GPT5.5 (launched 6 weeks ago); the gap is closing.

Top models are getting more expensive, and they have an edge. Is it worth it?

* Innovation Labs. By any standards, at a given point in time, the latest closed models are the ones bringing groundbreaking innovation. It’s undeniable. But...

* Open Models accelerate. Open Models are reducing the time to catch up from 1.5 years to a quarter. This is particularly true if you have specialized agents per discipline (e.g. a coding agent, a browser agent, a content agent). In those cases, the potential upside, if any at all, is eroded fast.

* Enough is Enough. Invest time choosing an accurate productivity metric. Define what ‘good’ looks like for that particular area of your business. Then optimize. See 12 ways to get it wrong for SWEs. https://third-bit.com/2026/05/20/twelve-ways-to-be-wrong/

"Deepseek caught up to Opus 4.6’s capabilities on GPQA in 3 months, and MiniMax did the same on SWE-Bench." - Stephen O’Grady https://redmonk.com/sogrady/2026/05/15/open-ai-models/; true and already outdated.

→ Wait. Don’t upgrade immediately to the latest available model; first decide which milestone you need to unlock. Above all, start specializing your harnesses and models. That’s where you become cost-effective.

----------------------------------------

Don’t outsource thinking, analyzing, learning, but…

tl;dr: “The people who blindly trust agent output are in the former camp. They're sheeple, overdrinking from a fountain of mediocrity” - Ghostty creator https://x.com/mitchellh/status/2060088112257372610?s=20.

The industry is learning what LLMs are capable and incapable of, and swinging from left to right. What you should remember: there’s a healthy middle.

* It’s your criteria. And you need to embed it into your agentic flows. That’s what Senior engineers and TLs always did, build the framework so others can operate without knowing the whole.

* Expertise saves time. We also know this: deep expertise always reaches solutions in fewer iterations. That’s why the industry is so focused on hiring Sr. individuals.

* Delegate until it hurts, hurts. Whenever you delegate, you know the result is probably not going to be as you wished. The real question is where and how you can prevent that quality drop.

In all the conversations during the past week, the learning is always the same one: we weren’t ready to give away this part of our process.

→ Adopt. Frame it in a different way: what are the areas in which you can get reasonable proof-of-work without human mediation? Those are the areas to focus on agentic first, while building better validation & verification systems. For the rest, you know how to handle it, you’ve been doing it for a while.

Need Ideas? Check our Agentic Playbooks https://docs.agent-swarm.dev/docs/playbooks

----------------------------------------

9-9-7 fallacy.

tl;dr: “A Company that is constantly on fire is a company that is not operating well” - Linear’s CEO https://x.com/karrisaarinen/status/2061139112426623054?s=20.

We live in a world where creating is becoming “easy”, and perfecting is the challenge. Here are the challenges we see:

* Brain Fry. We discussed it in the past: rest and leisure time help creativity. Time is what helps perfect a solution. If you don’t stop, the quality is going to drop. Check it out https://hbr.org/2026/03/when-using-ai-leads-to-brain-fry

* Grindmaxxing is not the objective. There are moments in which it’s needed. However if that’s the narrative, then it means you are losing sight of what actually matters. Going fast in the wrong direction, or believing that because you move fast you are on the right track is a problem in itself.

* Trust is a function of how careful you are. Industry leaders will actually trust you less, not more, if you move too fast. For businesses, Trust and Stability come first, before innovation. After all, they’d build it themselves if they thought it was critical and core for them.

The point from Nico Laqua https://x.com/nico_laqua/status/2061140358235578740?s=20 seems to be mostly about motivation and the search for higher meaning, expressed as hours worked. It’s a trap.

→ Challenge. The key here is not to discuss how much one should or shouldn’t work, but to focus on the objective you are trying to achieve. Tokenmaxxing https://x.com/chamath/status/2044126353205932452?s=20 is a good example of it.

----------------------------------------

Our swarm just learned how to use WhatsApp, Attio & Google Drive https://docs.agent-swarm.dev/docs/integrations. Wanna compare notes? https://calendar.app.google/1Qz8vjsQg9Hed9W8A

Or keep reading newsletters. Your call.
Best,

----------------------------------------

Sent with ❤ by desplega.sh http://desplega.sh/ from Barcelona

#07 - May 28, 2026

State of AI: Days of Future Past.

AI in the 1950s, 2000s & 2010s, Polsia raises $30M, Agentic Playbooks.

We shipped https://github.com/desplega-ai/agent-swarm/tags. Here's what else happened.

Agent swarm https://github.com/desplega-ai/agent-swarm is closing in on 500 ⭐ on GitHub, but the most exciting part is that the list of collaborators keeps growing. Don't be shy! Reach out at contact@desplega.sh.

----------------------------------------

A blast from the past: Great minds through tectonic technology shifts.

tl;dr: Please Don’t Deploy https://www.pleasedontdeploy.com/p/it-doesnt-matter-again explores what was thought of Tech in the 1950s, 2000s and 2010s.

‘Those that forget the past are condemned to repeat it’. We found 3 lessons worth remembering:

* IT definition accounted for AI. Written in 1958, Management in the 1980s https://stacks.stanford.edu/file/druid:fv912fw0448/fv912fw0448.pdf explicitly references the work of the AI pioneers. It questions what work will look like in the future, once AI can do the “top job”.

* Tech job changes, always. In 2003, ‘IT doesn’t matter https://www.classes.cs.uchicago.edu/archive/2014/fall/51210-1/required.reading/ITDoesntMatter.pdf’ argued that once a specific technology becomes table-stakes, it’s no longer a competitive advantage. You need to move to the next level. Tunguz sees it as a transition towards Agent Gravity https://tomtunguz.com/agent-gravity/.

* Bezos, Ballmer, and others saw it. Steve Yegge’s Google Platform Rant https://courses.cs.washington.edu/courses/cse452/23wi/papers/yegge-platform-rant.html summarizes how a visionary CEO in 2002 shifted how his teams worked and started building AWS years before its launch. IT still mattered. Today AWS has captured ~30% of a previously nonexistent market.

You can build it yourself, you can rely on experts, you can sit and watch. The last one will lead to diminishing leverage from IT.

→ Adopt. Time to talk with your CEO, or your Tech leader. Your Tech team needs to change their day-to-day, heavily. If you need ideas, check Dan Shipper’s latest podcast with Lenny https://www.lennysnewsletter.com/p/the-ai-paradox-dan-shipper.

----------------------------------------

AI Slop -> Polsia https://polsia.com/ raises $30M

tl;dr: Taras https://www.tarasyarema.com/blog/2026-05-27-slop-machines makes an argument on why you should never outsource the thinking.

‘AI That Runs Your Company While You Sleep’ means it’s not your company:

* Human-in-the-loop is key. Good companies are built through painfully long iterative processes. It’s not that what the AI will do is not good, it’s simply not what you want.

* Cognitive Meltdown. An incredible Executive Coach used to say that companies grow as fast as their founders can learn. That learning only happens if you know what you are trying.

* Lock-in. These platforms are learning with your money, while sharing revenue with you. It’s the best of both worlds for them, but how do you move away from a provider and ensure they don’t become competitors?

Agent Swarm is MIT/OSS https://agent-swarm.dev/ exactly for this last point. The danger is that when a system learns for you, it also learns for them.

→ Challenge. If you want to use a slop(t) machine, use it as such, have fun. If you have higher hopes, the numbers are quite telling https://x.com/brookejlacey/status/2059362649889178090.

----------------------------------------

Agentic Playbooks

tl;dr: We started a library with our existing Agentic Playbooks https://docs.agent-swarm.dev/docs/playbooks for our agent-swarm.dev http://agent-swarm.dev.

You can use them as templates for any agentic operating system you are running. Here is what you will find:

* GTM Playbooks. Research competitors, relevant topics, generate tooling, social media post drafts, and email drafts. Also, Account Management reports, lead generation, and more. Everything meant to be reviewed and validated by humans in the loop. That’s how the system learns.

* UX - Product - Tech Playbooks. Workflows that keep your code healthy and your user experience functional, research your observability metrics, and help you with the grunt work before the true decision making happens.

* Hot Patterns. 5 patterns https://docs.agent-swarm.dev/docs/playbooks/patterns we rely heavily on for all our automated agentic workflows. When to use them, the motivation behind them, and how they have worked thus far.

It’s a good source of ideas, and of improvements you may want to make in your own systems.

→ Adopt. A short read will give you new ideas, and if you have different ones, share them.

----------------------------------------

If you are still struggling to make your team understand what AI Native https://www.pleasedontdeploy.com/p/you-are-not-ai-native-yet looks like for you, reach out https://calendar.app.google/1Qz8vjsQg9Hed9W8A!

Or keep reading newsletters. Your call.
Best,

----------------------------------------

Sent with ❤ by desplega.sh http://desplega.sh/ from Barcelona

#08 - May 11, 2026

State of AI: The Bayesian Coin Flip.

Coinbase layoff, Anthropic & SpaceX, Bayesian memory vs LLMs

We shipped https://github.com/desplega-ai/agent-swarm/tags. Here's what else happened.

----------------------------------------

Outcome: Coinbase https://www.reuters.com/business/world-at-work/coinbase-cut-about-14-workforce-2026-05-05/ Layoff https://x.com/i/status/2051616759145185723

tl;dr: @championswimmer https://x.com/championswimmer/status/2051807284691612099 explores what these layoffs mean.

Triggered by the prospect of being laid off in 3 weeks, Arnav wrote a leadership essay:

* Input < Output < Outcome. Dex Horthy (HumanLayer’s founder) https://www.linkedin.com/posts/dexterihorthy_the-funniest-thing-about-the-token-counting-activity-7457427197834928128-50S9?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAPiJJwB0LmpHdW8bLd4wZ0_Zuy-prVZFNo arrived at the same conclusion. Our thesis: teams stopped asking ‘Why?’ and started spraying and praying instead.

* AI Native vs AI First. Coinbase aims to have one-person teams running multi-agent frameworks: DRIs https://block.xyz/inside/from-hierarchy-to-intelligence, as defined by Jack Dorsey.

* The unknown. Arnav’s essay can be read as ‘companies are buying time to learn how to work with this new technology’. We don’t disagree with that.

What’s missing is how long you can go ‘spraying and praying’ with a given user base. Anthropic, Cloudflare, GitHub, Amazon, Coinbase https://x.com/brian_armstrong/status/2052855725857329254?s=20, etc. have been taking a lot of heat for their half-baked solutions.

→ Wait. If your CFO is measuring teams by output and/or input, start tracking outcomes. Then reflect: how much more can your business lower the quality bar https://x.com/brian_armstrong/status/2052855725857329254?s=20? Center the conversation on the business’s top needs.

----------------------------------------

Bayesian learning much?

tl;dr: While many argue about context compaction, we are quietly turning to probabilistic memory.

Three good reads on why memory should be a posterior:

* MACLA https://arxiv.org/abs/2512.18950. Freeze the LLM, push all the adaptation into an external procedural memory that tracks Beta posteriors over success rates. Pick actions by expected utility, not recency.

* Bayesian Teaching https://www.nature.com/articles/s41467-025-67998-6. Google trains LLMs to approximate Bayesian inference by distilling from an optimal Bayesian system. Belief-updating as a learnable skill.

* Multi-LLM Orchestration https://arxiv.org/abs/2601.01522. Treat each model as an approximate likelihood, aggregate across them with Bayes rule + priors. 51% of the cost savings come from aggregation alone.

We've been experimenting with it in the swarm. Check this research https://github.com/desplega-ai/agent-swarm/blob/main/thoughts/taras/research/2026-05-04-bayesian-learning-memory.md or our swarm post https://www.agent-swarm.dev/blog/deep-dive-memory-poisoning-decay-model.

→ Challenge. You can ship the ~80% with three SQL columns and a rating loop. Start with rated feedback, add a verification step, and don't let compaction eat your "why."
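
To make "memory as a posterior" concrete, here is a minimal sketch of a rated memory store. It is our own illustration, not MACLA's implementation: the table layout, the column names (alpha, beta, why) and the helper functions are all hypothetical. Each memory keeps Beta pseudo-counts, the rating loop bumps one of them, and retrieval ranks by posterior mean instead of recency.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store; the schema below is an assumption for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memory ("
        " note TEXT PRIMARY KEY,"
        " alpha REAL NOT NULL DEFAULT 1.0,"  # Beta prior: pseudo-count of successes
        " beta REAL NOT NULL DEFAULT 1.0,"   # Beta prior: pseudo-count of failures
        " why TEXT NOT NULL)"                # rationale, stored so compaction cannot drop it
    )
    return conn


def remember(conn: sqlite3.Connection, note: str, why: str) -> None:
    """Store a new memory with a uniform Beta(1, 1) prior."""
    conn.execute("INSERT INTO memory (note, why) VALUES (?, ?)", (note, why))


def rate(conn: sqlite3.Connection, note: str, success: bool) -> None:
    """Rating loop: one observed outcome updates one pseudo-count."""
    column = "alpha" if success else "beta"  # fixed set of names, never user input
    conn.execute(
        f"UPDATE memory SET {column} = {column} + 1 WHERE note = ?", (note,)
    )


def best(conn: sqlite3.Connection):
    """Return (note, posterior mean, why) for the memory with the highest success rate."""
    return conn.execute(
        "SELECT note, alpha / (alpha + beta) AS mean, why "
        "FROM memory ORDER BY mean DESC, note LIMIT 1"
    ).fetchone()
```

With 0/1 rewards the posterior mean alpha / (alpha + beta) is the expected utility, so a memory rated three times as helpful (mean 0.8) outranks one with a single success and a single failure (mean 0.5), regardless of which was written last.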

----------------------------------------

Anthropic flips the script

tl;dr: Anthropic increases limits, removes peak hours and opens their memory.

After 2 months of bad news, Anthropic is back in the game:

* Anthropic ♥️ SpaceX https://www.anthropic.com/news/higher-limits-spacex. More computing power, higher limits, no more peak hours. OpenAI, your move.

* Downloadable Memory https://platform.claude.com/docs/en/managed-agents/memory. The new memory store API allows you to manage, create and download these text memory files optimized for Claude. Definitely a win for those fully bought into the Anthropic ecosystem.

* Dream on https://platform.claude.com/docs/en/managed-agents/dreams. Easy-to-use standard API to compact your agent memories. It’s versioned, and you can recreate or delete it. Check ‘Bayesian learning much?’ above to understand the thought behind it.

Anthropic lives another day, but we learned our lesson. Next time, a hot swap will be easier than ever.

→ Adopt. If you are using Managed Agents, you need to leverage these capabilities. Our only advice: keep an eye on the Dream API performance. As you know, generic solutions like /compact give mixed results.

----------------------------------------

Go contribute to an open source library https://github.com/desplega-ai/agent-swarm…

Or keep reading newsletters. Your call.
Best,

----------------------------------------

#09 - May 04, 2026

State of AI: 60Bn for an IDE that skillfully deletes your DB.

What to tell your CEO when talking about the real moat in the AI age.

We shipped https://github.com/desplega-ai/agent-swarm/releases. Here's what else happened.

-> May 13th Future Of Software Engineering Edition https://luma.com/x6hk7v26. Be #63! https://www.meetup.com/meetup-de-innoit-consulting-en-madrid/events/314295942/
-> Thinking of building an Agentic OS? You are not alone. Let's chat.

----------------------------------------

Take my money: 60Bn for Cursor.

tl;dr: The harness is the missing piece for a trillion-dollar business. If you build with AI, now you have confirmation.

After poaching 2 senior engineers, SpaceX preempted https://techcrunch.com/2026/04/22/how-spacex-preempted-a-2b-fundraise-with-a-60b-buyout-offer/ a 2Bn round with a 60Bn purchase option. Why do we care?

* Harness > model. The IDE is the distribution channel for models, and captures more value than whoever ships the next SOTA model. Look out for Claude Cowork or Workspace Agents.

* Watch the model picker. As Composer becomes the default, third-party models will slide down the menu, and the moat deepens.

* Talent signal. The most valuable AI engineers right now build harnesses, not weights.

Build your own abstraction layer. If a harness owns your workflows, you are locked in. The muscle to fast-swap providers is mandatory.
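
A minimal sketch of such an abstraction layer, assuming every provider can be wrapped as a plain "prompt in, text out" callable. ModelRouter and the provider names are hypothetical; in practice each callable would wrap a real SDK, which is deliberately left out here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# A provider is any function that takes a prompt and returns text.
Provider = Callable[[str], str]


@dataclass
class ModelRouter:
    """Hypothetical thin layer: workflows call the router, never a vendor SDK."""

    providers: Dict[str, Provider] = field(default_factory=dict)
    active: str = ""

    def register(self, name: str, provider: Provider) -> None:
        """Add a provider; the first one registered becomes the default."""
        self.providers[name] = provider
        if not self.active:
            self.active = name

    def swap(self, name: str) -> None:
        """Fast-swap the active provider without touching any workflow code."""
        if name not in self.providers:
            raise KeyError(f"unknown provider: {name}")
        self.active = name

    def complete(self, prompt: str) -> str:
        """Send the prompt to whichever provider is currently active."""
        return self.providers[self.active](prompt)
```

The design choice is that swapping is a one-line change in one place: workflows depend on complete(), so replacing the vendor behind it does not ripple through prompts, skills, or memory.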

→ Wait. Keep an eye on the next Cursor model release. That’s the big bet.

----------------------------------------

Where’s my data?

tl;dr: Guess what? Evals, Skills and Prompts don’t enforce anything. Tools do.

Five-year-old PocketOS became a new cautionary tale https://x.com/lifeof_jer/status/2048103471019434248 for every CEO. Tell them:

* API Key wasn't given, it was found. The agent found an API key with full permissions (sudo-style).

* “We have evals for this”. Prompt engineering and evals were treated as deterministic assertions. They are not.

* The lazy lesson. They argue providers should add limitations to their CLIs and MCPs. Nope. You don’t outsource accountability.

→ Adopt. It happens in all our workshops: people try to polish prompts, skills and evals. Those are good practice, but enforcement needs to be deterministic at the tool level. Focus on this first.
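
What tool-level enforcement can look like, as a minimal sketch: the database tool the agent calls refuses anything that is not a single read-only statement, whatever the prompt says. The guard_sql function, the ToolPolicyError class and the allowlist are hypothetical and intentionally conservative; they are not taken from the PocketOS incident or from any provider's CLI.

```python
import re

# Allowlist, not denylist: only statements that start with these verbs pass.
READ_ONLY = re.compile(r"^\s*(SELECT|EXPLAIN)\b", re.IGNORECASE)


class ToolPolicyError(PermissionError):
    """Raised when the agent asks the tool for something the policy forbids."""


def guard_sql(statement: str) -> str:
    """Return the statement unchanged if it is allowed, otherwise raise."""
    body = statement.strip().rstrip(";")
    # Reject stacked statements such as "SELECT 1; DROP TABLE users".
    # This also rejects a ';' inside a string literal, which is the safe side.
    if ";" in body:
        raise ToolPolicyError("multiple statements are not allowed")
    if not READ_ONLY.match(body):
        raise ToolPolicyError("only read-only statements are allowed")
    return statement
```

The check runs in code before the statement reaches the database, so a creative prompt or a found credential cannot talk its way past it. Pair it with a database role that only has read permissions, since the guard alone is not a substitute for least privilege.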

----------------------------------------

Google SRE’s got skills

tl;dr: Small but mighty. Skills help you not read documentation. Google knows it.

Google kicked off their official repo https://github.com/google/skills. Don’t waste time reading our commentary. Tell your CLI to check them out:

* BigQuery https://github.com/google/skills/blob/main/skills/cloud/bigquery-basics/SKILL.md.

* K8S https://github.com/google/skills/blob/main/skills/cloud/gke-basics/SKILL.md.

* VertexAI/Gemini https://github.com/google/skills/blob/main/skills/cloud/gemini-api/SKILL.md.

→ Challenge. If you have your own, take your time. If you don’t, give it a spin. Make sure you have safeguards. You don’t want to end up with extra costs or missing a db.

----------------------------------------

Go contribute to an open source library https://github.com/desplega-ai/agent-swarm…

Or keep reading newsletters. Your call.
Best,

----------------------------------------

#10 - Apr 29, 2026

State of AI: A healthy admission to step off the Gas

Last week was all about code health, and scaling with quality.

We shipped https://github.com/desplega-ai/agent-swarm/releases. Here's what else happened.

May 13th Future Of Software Engineering Edition https://luma.com/x6hk7v26.
Together with Innoit, after a great event in Barcelona, with incredible speakers. Join us! https://luma.com/desplega?e=evt-7DT4NcQiPSgQ4Uy

----------------------------------------

Claude Confession

tl;dr: Yes, they admitted it: your harness wasn’t working well.

Three changes that increased https://www.anthropic.com/engineering/april-23-postmortem your Anthropic API bill:

* March 4th: From High to Medium. They changed it silently, and denied it. An effective performance degradation.

* March 26th: clear cache. Clearing your cache after 1 hour of inactivity could be valid, but it really didn’t work that way.

* April 16th: reduce verbosity. A drop in quality, because less verbose code meant worse code.

It does feel like beating https://x.com/trq212/status/2048495545375990245?s=20 a dead horse https://x.com/GergelyOrosz/status/2048520013053632689?s=20.

→ Challenge. Start by assessing your dependence on closed harnesses (Devin & Codex, too), and find ways to make workflows independent of those (skills, AGENTS.md, memory). OpenCode https://github.com/anomalyco/opencode and pi https://github.com/badlogic/pi-mono are the leading open source alternatives.

----------------------------------------

Adam Tornhill is (a throw)back.

tl;dr: The Code Health MCP https://www.linkedin.com/posts/adam-tornhill-71759b48_agenticai-aicoding-technicaldebt-activity-7455158358451531776-z6yx?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAPiJJwB0LmpHdW8bLd4wZ0_Zuy-prVZFNo launch is interesting, and the fundamentals are on point.

The author of Your Code as a Crime Scene https://www.adamtornhill.com/articles/crimescene/codeascrimescene.htm is back, and your agents should be grateful:

* Functions are too long https://adamtornhill.substack.com/p/how-long-should-a-function-be-and. A 2022 study that smells like outdated thoughts on Software Engineering. Declarative naming seems to be the real learning.

* From coding to system design https://adamtornhill.substack.com/p/coding-is-dead-but-it-still-smells. This post earns its place for those who underestimate that the real cost of software development (95% according to the article) comes after release.

* Focus on code-health https://codescene.com/blog/agentic-ai-coding-best-practice-patterns-for-speed-with-quality. Insights on how keeping healthy code improves your AI code generation. This is the piece worth a full reading.

→ Adopt. Do you have scheduled AI audits over your code? You need them weekly or daily to ensure your code health doesn’t derail.

----------------------------------------

Did you read Gas City https://steve-yegge.medium.com/welcome-to-gas-city-57f564bb3607?

tl;dr: If you haven’t, you can just read the three lessons below.

Gas City https://github.com/gastownhall/gascity, a linear improvement over Gas Town, makes the case for enterprise-ready Agentic Operating Systems. These are the key learnings:

* Light Factories. A simple concept, observability for everything that’s going on within agents. Particularly interesting in a day and age where people are starting to use agent factories that they (or their agents) don’t really understand.

* Recursive pack deployments. You can deploy pre-defined agent ‘packs’, and those will eventually interact with each other. Many have this already, the fundamental shift is the explicit declarative process.

* Version control for agent activity. An extension of Gas Town, now you can have recorded traces on each task performed by an agent. Tracked using DOLT https://github.com/dolthub/dolt.

→ Wait. Two bland arguments: SaaS replacement, and Enterprise Ready. The former relies on weird math: ‘a basic tool could be replaced by a team of 3 to 5 ICs’. It doesn’t delve into the financial and organizational implications, and it disregards the benefits of experts bringing innovation to your company. The latter oversimplifies the key problem: it’s not about compliance, it’s about large teams using a Gas Universe.

----------------------------------------

Go contribute to an open source library https://github.com/desplega-ai/agent-swarm…

Or keep reading newsletters. Your call.
Best,

----------------------------------------

#11 - Apr 20, 2026

State of AI: Stop Delving on Vercel without Context

Scarcity in AI, Vercel security incident, Anthropic Design & Opus 4.7

We shipped https://github.com/desplega-ai/agent-swarm/releases. Here's what else happened.

----------------------------------------

Scarcity in AI

tl;dr: Moore’s Law https://en.wikipedia.org/wiki/Moore%27s_law strikes back, and why Tomasz Tunguz may be wrong.

Tomasz Tunguz, author of Winning with Data https://www.goodreads.com/book/show/30305946-winning-with-data, published this post https://www.linkedin.com/pulse/scarcity-ai-tomasz-tunguz-kgtgc/ worth pushing back on. The core:

* Available but Slow. Adopting new models is becoming slower. Switching providers will become harder due to Memory lock-in. Relationship and purchase-power based selling is starting.

* Inflationary Commodity. As subsidizing slows down, AI scaffolding becomes bloated, and a demand for quality materializes, price per task (outcome) is increasing. If you assume limited supply, the picture worsens.

* Corollary: Forced Diversification. Developers are forced to invest in balancing cost-quality-price, from SLMs to on-premise deployments.

→ Challenge. The counterargument is simple: as we learn this technology, we learn how to solve complex problems with cheaper, albeit outdated, models. Software will reduce the cost curve. Overzealously focusing on Moore’s Second Law https://en.wikipedia.org/wiki/Moore%27s_second_law seems outdated.

----------------------------------------

Vercel, Context.ai http://context.ai and Delve in a single incident

tl;dr: Beyond the incident itself, a swift reminder that security starts with your weakest link.

Vercel had a prompt response https://vercel.com/kb/bulletin/vercel-april-2026-security-incident, with Guillermo Rauch https://x.com/rauchg/status/2045995362499076169?s=20 at the forefront of the resolution. Impressive. Here are the takeaways:

* A single breached employee. This is actually an old school hack. Someone took control of an account through a less secure external provider. Darn.

* AI means sophistication. Before, months would pass between an initial breach and data extraction (and detection). Based on the ‘minimum impact’, it seems the time was much shorter for both (~1 month). Encryption at rest and environment isolation saved the day.

* Context.ai https://context.ai/security-update (2024), certified SOC2 by Delve https://techcrunch.com/2026/03/22/delve-accused-of-misleading-customers-with-fake-compliance/. Gergely Orosz https://x.com/GergelyOrosz/status/2046220165080109201?s=20 made a point https://x.com/GergelyOrosz/status/2046216095258861835?s=20 of it. Remember: compliance ensures you follow minimum practices; it simply hopes for good outcomes.

→ Adopt. Breaches are going to happen, so tend your garden. Revisit efforts on zero-trust https://en.wikipedia.org/wiki/Zero_trust_architecture, ABAC, least-privilege access, and other approaches.

----------------------------------------

Claude Opus 4.7 & Claude Design

tl;dr: Anthropic goes after UX, plus a 40% price hike https://www.linkedin.com/pulse/price-precision-tomasz-tunguz-t3uec/ on Opus.

After telling us they wouldn’t release Mythos, two new updates from Anthropic to keep us busy:

* Opus 4.7 https://www.anthropic.com/news/claude-opus-4-7 consumes ~30% more https://tokens.billchambers.me/leaderboard tokens than Opus 4.6. The new tokenizer is improved for text processing. They adapted the usage limits https://x.com/ClaudeDevs/status/2044868953206612154. Be cognizant if you are using API keys. Migration guide https://platform.claude.com/docs/en/about-claude/models/migration-guide#migrating-to-claude-opus-4-7.

* Degradation drama. Complaints have been flooding X regarding Opus getting worse over time. Some benchmarks https://marginlab.ai/trackers/claude-code/ do not support it though…

* Claude Design https://www.anthropic.com/news/claude-design-anthropic-labs came strong, raising the question of whether software like Figma still makes sense.

→ Wait. Start tinkering w/4.7, try out design. Build your own intuition of what works and what doesn't. Beware, your token allowance will run out fast.

----------------------------------------

Go contribute to an open source library https://github.com/desplega-ai/agent-swarm…

Or keep reading newsletters. Your call.
Best,

----------------------------------------

#12 - Apr 13, 2026

State of AI: AIE, Mythos, Memory week

We shipped https://github.com/desplega-ai/agent-swarm/releases. Here's what else happened.

----------------------------------------

AIE EUROPE EDITION

tl;dr: Europe is leading agentic innovation, but nothing on models or infra.

A few takeaways from two days of tech leaders visiting the UK:

* Vercel's CTO https://x.com/cramforce said it on stage: Europe leads agentic innovation https://www.youtube.com/watch?v=O_IMsEg91g8&t=2400s with AI SDK https://ai-sdk.dev/, Pi https://pi.dev/, OpenClaw https://openclaw.ai/ (fastest growing GitHub project ever), etc. No European frontier model in sight.

* MCP crossed 110M monthly downloads https://www.youtube.com/live/_zdroS0Hc74?si=uGgticN7XRWGbUyS&t=2007. 60% of Vercel’s traffic is agents https://www.youtube.com/live/O_IMsEg91g8?si=bxKj5rQ9zvqzZjvN&t=2190. Cursor replaced 15K lines with a 200-line skill https://www.youtube.com/live/_zdroS0Hc74?si=orXAycIJ9H_hHrw9&t=9017. The harness is eating the codebase.

* Mario (Pi) https://www.youtube.com/live/_zdroS0Hc74?si=fuJsjkZGpeEhXO2P&t=4534 and Armin (Flask) https://www.youtube.com/live/_zdroS0Hc74?si=mkF9MqYLljSm6V3D&t=5218 both warned: agents compound errors with zero learning. "Slow the f*** down" https://www.youtube.com/live/_zdroS0Hc74?si=eyZrHOZR1Wt6iiuk&t=4370 was the most applauded line of the conference.

→ Wait. A lot was said, but there’s still uncertainty. For example, David Soria Parra (co-creator of MCP) laid out a roadmap https://www.youtube.com/live/_zdroS0Hc74?si=MJI_MiAeIaYJib_A&t=2272: progressive discovery, programmatic tool calling, skills over MCP. We’ll see.

----------------------------------------

MYTHOS, CLAUDE MANAGED AGENTS, & ADVISORY STRATEGY.

tl;dr: Anthropic’s GTM strategy is becoming attention-grabbing for basic feature releases, so we’ll keep it short.

* Mythos https://www-cdn.anthropic.com/08ab9158070959f88f296514c21b7facce6f52bc.pdf, a powerful model that leaders in the industry got early access to https://www.anthropic.com/glasswing. Or is it a new business model? https://pipnet.substack.com/p/what-if-anthropic-might-have-found

* Claude Managed Agents https://platform.claude.com/docs/en/managed-agents/overview, it’s like Devin https://devin.ai/ for more than code. Agent-swarm https://agent-swarm.dev/ is self-adding support.

* Advisory strategy https://x.com/claudeai/status/2042308622181339453?s=20, ~11% savings https://x.com/claudeai/status/2042308627478773808?s=20; anecdotal evidence indicates results are better when using models from different families.

→ Challenge. Be cautious. Advisory strategy seems straightforward to adopt, managed agents could be unsustainable, i.e. unpredictable costs & reliability https://status.claude.com/ upon scaling.

----------------------------------------

MEMORY WEEK

tl;dr: The backlash against Anthropic https://www.youtube.com/watch?v=3DNkDIVKtK8 continues, now with a focus on memory.

We have been waiting for this! We are moving beyond harnesses to memory as core IP for agents:

* AWS now lets you mount S3 as a filesystem https://aws.amazon.com/s3/features/files/, validating what Archil https://archil.com/ pioneered, and making a collaboration layer on top http://agent-fs.dev inevitable.

* LangChain https://blog.langchain.com/your-harness-your-memory/ & Letta https://x.com/sarahwooders/status/2040121230473457921 AI https://www.letta.com/blog/benchmarking-ai-agent-memory on why memory is the new lock-in with closed harnesses. Here's how we solved it https://github.com/desplega-ai/agent-swarm/pull/327.

* Codex Closed Memory https://tonylee.im/en/blog/codex-compaction-encrypted-summary-session-handover/: Open Source is not enough, how Codex locks-in https://developers.openai.com/api/docs/guides/compaction on compaction.

→ Adopt. Every day, every hour, every minute, your agents are learning something new. Make sure you have full control over your data… remember, there’s no GDPR for agents.

----------------------------------------

Contribute to an open source library https://github.com/desplega-ai/agent-swarm, subscribe https://www.desplega.sh/ a friend, get a free workshop https://tinyurl.com/agent-swarm-workshop...

Or keep reading newsletters. Your call.
Best,

----------------------------------------

#13 - Apr 06, 2026

State of AI: RPI RIP, HyperAgents, and property-based testing

State of AI: Open, Pricy, Tokenmaxxing Claude

We shipped https://github.com/desplega-ai/agent-swarm/releases. Here's what else happened.

→ Next Event: Future of Engineering
May 13th - 630pm - Madrid - Join us! https://luma.com/x6hk7v26

----------------------------------------

LEAKING ANTHROPIC IS COMING FROM YOUR MONIES

tl;dr: Three times this week Claude Code made controversial news: source code leakage, overpricing, and… OpenClaw gets blocked.

Over the past 2 years, consumption has trended upward, and dependency with it. You should start looking to hedge your bets:

* Price++. Even though token pricing keeps dropping, Claude Code is becoming more expensive as a whole. Reddit https://www.reddit.com/r/Anthropic/ is packed https://www.reddit.com/r/ClaudeCode/ with complaints, and they admitted to it https://x.com/lydiahallie/status/2038686571676008625. Be careful, no refunds expected.

* $25 for a flawed security review. Claude Code leaked their source code https://news.ycombinator.com/item?id=47584540, in a classic “dependabot https://github.com/dependabot” flawed update. The cherry on top: they are copying OpenCode https://x.com/ecura/status/2039262788648890534...

* Anthropic vs Open Source. Starting April 5th, at noon https://x.com/bcherny/status/2040206440556826908, OpenClaw and ~400 other third-party apps https://x.com/steipete/status/2040811558427648357 are blocked from subscription use; they can only consume API tokens. Many will move to Codex https://github.com/openai/codex or OpenCode https://opencode.ai/ or pi.dev https://pi.dev/; innovation will stall for Claude Code.

→ Adopt. Depending on a closed provider could seriously damage your business. Time to abstract yourself from the terminal.

----------------------------------------

THE SLM COMEBACK: TOKENMAXXING & GEMMA 4

tl;dr: Performative productivity is eating away the benefits of AI, choose your models wisely.

Check this short video https://www.youtube.com/watch?v=joFNVS1nHh8 and the counter-argument https://tomtunguz.com/tokenmaxxing/ from Winning with Data author, Tunguz https://www.goodreads.com/book/show/30305946-winning-with-data. Main takeaways:

* Once you know the task, choose the model. AT&T slashed their AI budget 90% https://www.forbes.com/sites/johnkoetsier/2026/02/10/att-says-slms-run-at-10-of-the-cost-of-llms-while-being-about-as-accurate/.

* From Tokens to Task. Move from input to output; outcome https://www.whatmatters.com/okrs-explained/input-output-outcome-key-results may be too far away.

* Gemma 4 https://deepmind.google/models/gemma/gemma-4/, for well-scoped tasks. Gemini 2.5 Flash will be deprecated soon https://ai.google.dev/gemini-api/docs/deprecations; Gemma is the open, cheaper and faster alternative. If you consume 30M+ tokens/month, you should deploy your own.

→ Challenge. We are getting into a usage optimization cycle. If you have deep pockets, keep running.

----------------------------------------

OPEN SOURCE STANCE

tl;dr: Amidst the backlash against Anthropic, the Open Source community marks a major win.

Multiple https://www.youtube.com/watch?reload=9&v=08NqrRQArNw voices https://x.com/GergelyOrosz/status/2039282434596864116 last https://x.com/hwchase17/status/2039787730402705653 week https://x.com/_avichawla/status/2039598698548850949 reflected https://x.com/steipete on https://x.com/cramforce/status/2038988255484608740 why OSS should be the default:

* Vendor lock-in is slow & expensive. In the past 2 years we went from ChatGPT, to Copilot, to Devin, to Cursor, to Claude Code. Now Open Models https://x.com/masondrxy/status/2039768211554492420 are becoming the next hot thing. You don’t want to be a hostage...

* When you can patch faster than ever. AI allows you to fix your own blocking issues faster than any provider. Closed-source software, which degrades 10x faster, adds an unnecessary time/cost burden while you watch your tools fail.

* Open Source == innovation. Anthropic explicitly references OpenCode, and you should too. Our thoughts on FOSS https://www.pleasedontdeploy.com/p/next-6-months-the-open-source-revolution, and why everyone should contribute more https://www.library.hbs.edu/working-knowledge/open-source-software-the-nine-trillion-resource-companies-take-for-granted.

→ Adopt. List the business-critical providers that can be abstracted by open source alternatives.

----------------------------------------

Go contribute to an open source library https://github.com/desplega-ai/agent-swarm…

Or keep reading newsletters. Your call.
Best,

----------------------------------------

#14 - Mar 30, 2026

State of AI: RPI RIP, HyperAgents, and property-based testing

Ahoy! We shipped. Here's what else did.

Announcing: Future of Engineering Madrid
The second week of May we will be hosting an event in Madrid. More details to come!

----------------------------------------

RIP RPI: DON’T OUTSOURCE THE THINKING.

tl;dr: Key learnings https://www.youtube.com/watch?v=YwZR6tc7qYg&list=WL&index=2 on how to scale agentic coding across an organization. More here https://thehumansintheloop.substack.com/p/making-agents-mainstream-for-dev-with-dexter-horthy.

The core idea is simple:

1. Do not outsource the thinking. You'll need knowledge https://erikjohannes.no/posts/20260130-outsourcing-thinking/index.html for navigating the world.

2. Read the code. When the stakes are high, aim for 2-3x rather than 10x.

3. Context control continues to be king. Monolithic prompts don’t work, never did. Control the number of instructions per prompt (aim for <40).

→ Challenge. We are torn. On one hand, human review has always been key for us. On the other hand, maturity in the CI/CD pipeline is what we know most teams are lacking.

----------------------------------------

PROPERTY-BASED TESTING

tl;dr: If you barely read the code, you need evolved ways to test your code. Hegel is a reminder of what’s already out there.

Antithesis released a property-testing wrapper https://antithesis.com/blog/2026/hegel/ on Hypothesis https://hypothesis.works/. With Hypothesis, you write tests that should pass for all inputs in whatever range you describe, and Hypothesis randomly chooses which of those inputs to check. Why is this relevant?

1. AI code can be treated as a black box. PBT or fuzzing https://tybug.dev/specs/.

2. Easy to adopt. LLMs can translate unit-tests to PBT ones, somewhat easily.

3. It’s a validation spec, vs simply a spec. Prompts don’t enforce, this does.

→ Wait. You should start experimenting with more advanced verification methods, together with observability tools. We are tinkering with PBT for the web https://github.com/antithesishq/bombadil.
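
To show the shape of a property-based test without any dependencies, here is a hand-rolled sketch using only the standard library. It is not the Hypothesis or Hegel API: Hypothesis does the same job with its @given decorator and strategies, and additionally shrinks failing inputs. The function under test (dedupe_keep_order) and the check_property helper are hypothetical.

```python
import random


def dedupe_keep_order(items):
    """Function under test: drop duplicates, keep first occurrences in order."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def check_property(prop, trials=200, seed=0):
    """Run prop on random integer lists; return a counterexample or None."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]
        if not prop(xs):
            return xs
    return None


def prop_dedupe(xs):
    """Properties that must hold for every input, not just hand-picked cases."""
    out = dedupe_keep_order(xs)
    no_duplicates = len(out) == len(set(out))
    same_elements = set(out) == set(xs)
    idempotent = dedupe_keep_order(out) == out
    return no_duplicates and same_elements and idempotent
```

Calling check_property(prop_dedupe) returns None when no counterexample is found. The test never looks inside dedupe_keep_order, which is the point: the same properties keep validating the function if an agent rewrites its body.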

----------------------------------------

META’S HYPERAGENTS

tl;dr: Your agents get a supervisor that helps them improve.

We’ve seen this trend for a while, and agent-swarm.dev http://agent-swarm.dev itself is an example of it. However, Meta’s paper https://arxiv.org/pdf/2603.19461 shows the impact.

1. Hyperagents monitor agents. A supervisor that modifies their report is key.

2. Hyperagents are self-referential. They can modify their task agents, or themselves.

3. Open Source. Not production ready https://github.com/facebookresearch/hyperagents, but good for learning the concepts.

→ Adopt. For your coding agents, you need to start thinking about loops where your agents compound learnings. The real challenge is how you become more efficient and effective while doing so.

Spoiler: We are working on this, stay tuned in the next edition 👀

----------------------------------------

Agent Swarm is now LLM-agnostic, supports workflows, skill libraries, and agent-fs.dev http://agent-fs.dev.
1 minute to deploy. 7 days free. Simply reply to this email...

Or keep reading newsletters. Your call.
Best,

----------------------------------------

Want to change how you receive these emails? You can unsubscribe here .
Sent with ❤️ by desplega.sh http://desplega.sh/ from Barcelona

#15 - Mar 16, 2026

State of AI: Karpathy’s Brain Fry

State of AI: Compounding is the new MOAT


Aloha!

Announcing: 1-click Agent Swarm.

You can now deploy an agent swarm in under a minute.
Pick your agent templates https://templates.agent-swarm.dev/, and go. 14 days free trial -> email us for beta access.

----------------------------------------

KARPATHY VS THE WORLD

tl;dr: Whoever ships this at scale, wins. Andrej Karpathy’s autoresearch https://github.com/karpathy/autoresearch is an early example of what we called Beast Mode https://www.pleasedontdeploy.com/p/4-leaps-of-agentic-coding-where-do.

The core idea is simple: “any metric you care about that is reasonably efficient to evaluate [...] can be autoresearched by an agent swarm”.

* Extremely scoped. Experiments are limited to editing a single file.

* Trivial learning loop. Logs to keep track of the experiments, and learnings.

* Karpathy's Autoresearch guide for dummies is gold. See here https://x.com/hooeem/status/2030720614752039185.
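
The two ingredients above (a tightly scoped edit surface and a trivial learning loop) can be sketched as a toy experiment loop. This is not Karpathy's code: the metric, the proposal step and every name below are hypothetical, and a random tweak stands in for the agent's edit. It only shows the shape: propose, evaluate, keep if better, log everything.

```python
import random


def evaluate(config):
    """Hypothetical metric that is cheap to evaluate; lower is better."""
    return (config["lr"] - 0.3) ** 2 + (config["width"] - 8) ** 2 / 100


def propose(config, rng):
    """Stand-in for the agent's edit: tweak one field of the single 'file'."""
    candidate = dict(config)
    if rng.random() < 0.5:
        candidate["lr"] = round(candidate["lr"] + rng.choice([-0.05, 0.05]), 2)
    else:
        candidate["width"] = max(1, candidate["width"] + rng.choice([-1, 1]))
    return candidate


def autoresearch(start, steps=200, seed=0):
    """Run `steps` experiments, keeping only changes that improve the metric."""
    rng = random.Random(seed)
    best, best_score = start, evaluate(start)
    log = []  # one entry per experiment, kept or not
    for step in range(steps):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        kept = score < best_score
        log.append({"step": step, "candidate": candidate, "score": score, "kept": kept})
        if kept:
            best, best_score = candidate, score
    return best, best_score, log
```

The log is the learning loop: rejected experiments stay in it, so the next run (or the next agent) can see what was already tried.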

→ Challenge. Try to understand this post https://x.com/karpathy/status/2031135152349524125, and give yourself a reason why this is not relevant for you. Once you have at least 1 valid reason, discuss it with your favorite LLM (we recommend notebooklm).

----------------------------------------

BRAIN FRY IS MULTITASKING

tl;dr: Everyone on your team is re-learning what their job is about. The germane load is simply too much for most.

This HBR study https://hbr.org/2026/03/when-using-ai-leads-to-brain-fry backs the Brain Fry claims with numbers:

* 39% spike in major errors
Either because people lack attention, control, or knowledge over their AI tools.

* Oversight is the trigger
For workflows that require non-trivial oversight (think of reviewing a design document), the Brain Fry indicator is larger. The same is true for those reviewing content or GTM strategies.

* 4+ AI tools lead to chaos
Using more than 3 different AI tools day-to-day leads to drops in productivity.

→ Wait. The roles less affected by this are those that traditionally involved heavy context switching, so there’s hope. Your Engineering, Marketing, etc. teams need time to become more effective AI managers.

----------------------------------------

FILESYSTEMS ARE THE MOAT

tl;dr: Giving agents a bash tool https://github.com/vercel-labs/just-bash makes them smart, giving them files https://www.llamaindex.ai/blog/files-are-all-you-need makes them amazing!

We are starting to see a trend: more and more conversations happen around file systems. Last year already proved that CLIs are better than MCPs most of the time, and brought the standardization of skills https://agentskills.io/home.

* File systems open the door to self-improvement
E.g. OpenClaw’s approach of SOUL.md and IDENTITY.md files.

* File systems are key for context management
Whatever approach your coding agent takes, context management happens outside https://vercel.com/blog/how-to-build-agents-with-filesystems-and-bash#why-filesystems-work-for-context-management the harness context window.

* It’s still greenfield
Even though the trend is growing, no clear solution exists yet. So far some options are: sandbox providers with storage (Modal, Daytona, stripes.dev), Archil offering sharable FUSE disks, GCP workspace CLI (GSuite based).

→ Adopt. Start thinking of how you provide file systems to your agents. Figure out a way to have persistence, visibility and control over them.
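
As a minimal sketch of what persistence can look like, the snippet below keeps an agent's lessons in a plain file that survives between sessions. The file name LEARNINGS.md and the helper names are hypothetical (this is not OpenClaw's or Agent Swarm's format); the point is that the file is readable, diffable and editable by humans, which gives you the visibility and control.

```python
from pathlib import Path
import tempfile


def recall(notes_file: Path) -> list[str]:
    """Load lessons written by earlier sessions (empty if none yet)."""
    if not notes_file.exists():
        return []
    lines = notes_file.read_text(encoding="utf-8").splitlines()
    return [line[2:] for line in lines if line.startswith("- ")]


def remember(notes_file: Path, lesson: str) -> bool:
    """Append a lesson unless it is already recorded. Returns True if added."""
    if lesson in recall(notes_file):
        return False
    with notes_file.open("a", encoding="utf-8") as fh:
        fh.write(f"- {lesson}\n")
    return True


def demo() -> list[str]:
    """Simulate two sessions sharing one notes file in a temp directory."""
    with tempfile.TemporaryDirectory() as workdir:
        notes = Path(workdir) / "LEARNINGS.md"  # hypothetical file name
        remember(notes, "Run the linter before opening a PR")  # session 1
        remember(notes, "Run the linter before opening a PR")  # duplicate, skipped
        remember(notes, "Staging DB is read-only")  # session 2
        return recall(notes)
```

In a real setup the directory would live on whatever storage outlasts the sandbox, rather than in a temporary directory.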

Spoiler: We are working on something related, stay tuned 👀

----------------------------------------

Agent Swarm is live. 1 minute to deploy. 14 days free. Simply reply to this email…

Or keep reading newsletters. Your call.
Best,

----------------------------------------


#16 - Mar 06, 2026

State of AI: Reducing Token consumption


Hello everyone!

March 17th: Future of Software Engineering in Barcelona.

Supersonik CEO, Daniel Carmona, & Enginy CTO, Jaume Puig, join us to discuss what the role of SWEs will be in the next 2 years. Food & drinks will be provided.

Join us! https://luma.com/9cwffm7p (limited spots)

----------------------------------------

QUICK WIN: CONTEXT-MODE

The key insight: MCPs and CLIs dump unnecessary data into your LLM context.

We incorporated context-mode https://github.com/mksglu/context-mode into agent-swarm.dev https://www.agent-swarm.dev/ in 2 straightforward PRs https://github.com/desplega-ai/agent-swarm/pull/125.

Context-mode helps locally with:

* Context saving: sandbox tools keep raw data out of the context window.

* SQLite storage for intermediate results: you can recall them if and when needed.

→ Adopt. We’ve seen a token consumption reduction of ~80%.
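
The underlying pattern is easy to sketch with Python's built-in sqlite3 module. The snippet below is an illustration of the idea only, not context-mode's actual API or schema; the table layout and function names are assumptions. Raw tool output goes into SQLite, the model only sees a short handle, and specific lines are pulled back on demand.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """In-memory store for raw tool output (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, tool TEXT, raw TEXT)")
    return conn


def stash(conn: sqlite3.Connection, tool: str, raw: str, preview_chars: int = 80) -> str:
    """Store the raw output and return only a short handle for the context."""
    cur = conn.execute("INSERT INTO results (tool, raw) VALUES (?, ?)", (tool, raw))
    conn.commit()
    preview = raw[:preview_chars].replace("\n", " ")
    return f"[result #{cur.lastrowid} from {tool}: {len(raw)} chars] {preview}..."


def recall_lines(conn: sqlite3.Connection, result_id: int, needle: str) -> list[str]:
    """Fetch only the stored lines containing `needle`."""
    row = conn.execute("SELECT raw FROM results WHERE id = ?", (result_id,)).fetchone()
    if row is None:
        return []
    return [line for line in row[0].splitlines() if needle in line]
```

A build log of a thousand lines then costs the context one short summary line, and a follow-up such as recall_lines(conn, 1, "ERROR") brings back just the lines that matter.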

----------------------------------------

NEW PLUGIN: GOOGLE WORKSPACE CLI

Google released a CLI to dynamically access all their GSuite APIs. Easy to install, difficult to use.

A few highlights:

* Open source project https://github.com/googleworkspace/cli, full of learnings on how to create scaffolding CLIs for your products.

* It’s official but… Google doesn’t officially support https://github.com/googleworkspace/cli?tab=readme-ov-file#disclaimer this product.

* It’s safe to use locally, and it seems to “do the job”.

* Blocker: authentication continues to be the biggest pain point.

→ Wait. Most of us already have a solution for these scenarios. If that’s the case for you too, don’t waste time learning or migrating to this for now; simply keep an eye on it. Revisit in a few weeks.

----------------------------------------

SUPERHUMAN ADAPTABLE INTELLIGENCE

Yann LeCun shared the latest thoughts on how AI will evolve in this must-read paper https://arxiv.org/abs/2602.23643.

Tl;dr:

* Moravec's Paradox https://en.wikipedia.org/wiki/Moravec%27s_paradox: Neither humanity’s intelligence nor human intelligence are general.

* Specialization is a predictable consequence of limited resources, competing objectives, & environments.

* SAI is measured by how quickly an agent acquires new skills & learns new tasks.

* The major thesis: compounding prediction error makes long-horizon interaction brittle.

A self-learning, scoped agent powered by LLMs seems to be the best current approach, and a competitive advantage. Hence the efforts to build agentic swarms. These swarms will become more powerful as LLMs improve.

You can see early indicators of the adoption and success of this approach in OpenClaw, or our previous newsletter on self-learning agents https://github.com/desplega-ai/agent-swarm/pull/85.

→ Adopt. Start thinking about systems that can not only self-learn new tasks and follow an objective function, but also recognize when a new specialized agent needs to be created.

In the next 6 months, agentic frameworks will start creating their own specialized agents, and multiply.

----------------------------------------

We continue to offer tailored workshops on Agent Swarm https://www.agent-swarm.dev/.
Free. Hands-on. Your team leaves with a working swarm.

Do you have your agent swarm https://github.com/desplega-ai/agent-swarm running?
Best,

----------------------------------------


#17 - Feb 27, 2026

State of AI: Compounding is the new MOAT


Hello everyone!

This is a series of brief notes every couple of weeks to recap what happened in the AI engineering world, and our takes on it. Here's what we learned this week.

AGENT SWARM NOW HAS A SOUL

> We rebuilt Agent Swarm https://www.agent-swarm.dev/ (github https://github.com/desplega-ai/agent-swarm) based on learnings from OpenClaw https://openclaw.ai/.

The key insight: what makes agents (and teams) unique is identity, not capability. Each worker now defines its own character, specialization, and memory.

* SOUL.md gives agents persistent identity across sessions

* Self-evolution means the system improves without you touching it

* Setup takes 5 minutes. Want help? contact@desplega.sh

Adopt / Wait / Challenge → Adopt.

We run it ourselves, daily.

AGENTS.MD: WHAT THE RESEARCH ACTUALLY SAYS

A new paper https://arxiv.org/pdf/2602.11988 benchmarked context files extensively. The numbers:

* LLM context files increase cost ~20-23%

* Steps per task increase by +2.45 (SWE-Bench https://www.swebench.com/) and +3.92 (AgentBench https://github.com/THUDM/AgentBench) avg.

* Reasoning tokens up 14-22% depending on model

The paper's conclusion:

> Context files have marginal effect and only work when written by humans.

Our take:

* Don't skip context files, control them.

* Skip the /init command. Write it yourself.

* If you've corrected the same agent mistake twice, it belongs in AGENTS.md.

* Include SDLC steps like edge cases, manual testing steps, more here https://www.tarasyarema.com/blog/2026-02-18-introducing-semantic-distance.

* This understanding is what motivated the /desplega:research https://skills.sh/desplega-ai/ai-toolbox/researching skill, & claude-code plugin https://github.com/desplega-ai/ai-toolbox/tree/main/cc-plugin/base#agentic-coding-101-with-claude-code.

Adopt / Wait / Challenge → Wait.

Use hand-crafted contexts for now.

CLAUDE CODE REMOTE CONTROL SHIPS

You can now continue local Claude Code sessions from your phone or any browser via remote control https://code.claude.com/docs/en/remote-control.

We've been using it. It works. The laptop-to-phone handoff is the killer use case.

Adopt / Wait / Challenge → Adopt.

Run /remote-control or enable it globally in settings.

ANTHROPIC'S DISTILLATION DRAMA

Anthropic posted https://x.com/AnthropicAI/status/2025997928242811253?s=20 about detecting distillation attacks https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks on their models. The numbers cited (DeepSeek's 150k exchanges) feel small. desplega.sh did 500k LLM exchanges as a bootstrapped startup.

Make of that what you will https://www.youtube.com/watch?v=_k22WAEAfpE.

Adopt / Wait / Challenge → Wait.

Check open-weight distilled models on Hugging Face (TeichAI) https://huggingface.co/TeichAI. Some run locally via Ollama.

----------------------------------------

We ran our company workshop #10 with Orbio.ai https://www.linkedin.com/posts/ecura_stop-trading-quality-for-speed-the-t-shirt-activity-7432088498079264768-6jMt?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAPiJJwB0LmpHdW8bLd4wZ0_Zuy-prVZFNo in Madrid. Free. Hands-on. Your team leaves with a working agent swarm.

What’s holding you back?
Best,

----------------------------------------


#18 - Feb 18, 2026

State of AI: How to measure your AI SDLC maturity

Hello everyone,


If you are receiving this email, it's because you joined one of our events. This is a series of brief notes every couple of weeks to recap what happened in the AI engineering world, and our takes on it.

Announcement!
We are organizing a soirée to discuss how we see the Software Engineer world evolving in the upcoming years. We have 2 amazing guests, and spots are limited (food & drinks will be provided).

As you know, we are fully bootstrapped, and we are very excited to host this first event, so if you say yes, we will expect you there! :D

Join here https://luma.com/9cwffm7p; more details to come.

INTRODUCING SEMANTIC DISTANCE

This past week Moss and Geoffrey Huntley wrote about the “back pressure” https://ghuntley.com/pressure/ concept, bringing it back to attention for most. It expands on this early writing from Moss https://banay.me/dont-waste-your-backpressure/, and for most of us it is a reminder of how important it is to build the right safety mechanisms to keep accelerating our delivery.

There’s one flaw, though: how do you measure progress in this new world?

Our exploration

Taras wrote about Semantic Distance https://www.tarasyarema.com/blog/2026-02-18-introducing-semantic-distance, reflecting on NLP’s understanding of the concept. The key takeaways:

1. Divide it in levels of abstraction: Form, Meaning, Behavior, Intent.

2. Focus first on the verification layers: Form & Meaning, expanding from there.

3. Over-index on ‘layers of boilerplate’ that are now cheap: think of unit tests as append-only, add APIs simply for internal use, etc.

It’s a moment in which we need to re-think what software and infrastructure are; check Guillermo Rauch’s thoughts on APIs https://www.linkedin.com/pulse/apis-guillermo-rauch-aeupf/.

ADOPT / WAIT / CHALLENGE -> ADOPT

You need to be defining more and more of these automated methods at least in the verification layer.

VALIDATION VS VERIFICATION

From the above, the most difficult challenge is in level 4, Intent. It seems most frameworks continue to be focused on allowing you to build faster but not necessarily to give you visibility on what’s been built.

So, how do you start adding ways to control for Intent?

Our exploration

Eze wrote about 3 easy strategies to start adding a strong Validation https://www.pleasedontdeploy.com/p/validation-not-verification-3-strategies layer for your coding agents. In three lines:

1. Tools for your Agents
Tooling for monitoring, observability, reliability, testing, deployment, etc. should be designed to be used by software, i.e. less UX, more APIs/CLIs/etc.

2. Tools for reviewers
Go beyond screenshots, PR descriptions, or inline code documentation: add tools that allow you to review your software’s behavior in real time, record it, and replay it.

3. Objective functions
Whenever possible, define functions that your agents can use deterministically to measure the validity of the generated output. This covers performance, reliability, usability, etc. E.g., you could include a WCAG https://www.w3.org/WAI/standards-guidelines/wcag/ engine to optimize for accessibility.
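
As one small example of such an objective function, here is a sketch of the WCAG 2.x contrast-ratio check for text colors. The formula and the 4.5:1 / 3:1 AA thresholds come from the WCAG definition; the function names are ours, and a full accessibility engine covers far more than this single criterion.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB color, from 0 (black) to 1 (white)."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Contrast ratio between two colors, from 1:1 up to 21:1."""
    lum_fg, lum_bg = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(lum_fg, lum_bg), min(lum_fg, lum_bg)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg, bg, large_text: bool = False) -> bool:
    """WCAG AA: at least 4.5:1 for normal text, 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)
```

Black on white yields the maximum ratio of 21:1. An agent can call passes_aa on every color pair it generates and treat a failure as a hard stop, with no human in the loop.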

ADOPT / WAIT / CHALLENGE -> ADOPT

Tools for your agents can be built in 30 minutes, and the agents can tell you what they need. Don’t wait!

----------------------------------------

We are providing advanced support for teams that want to effectively leverage the latest AI technologies, workflows and tools to solve complex problems with AI.

If you are interested, reach out.
Best,

----------------------------------------


#19 - Feb 10, 2026

State of AI: Is OpenAI catching up?

Hello everyone!


If you are receiving this email, it's because you joined one of our Claude Code events in the past weeks, or will do shortly.

We are launching a series of brief notes every few weeks with a recap of what happened in the AI engineering world, and our takes on it.

These last few weeks have been full of new stuff that you should know if you want to leap first into vibe coder mode. Here are the 3 key takeaways.

TEAMS (SWARMS) BETA IN CLAUDE CODE

Around the time Anthropic launched Opus 4.6, they announced Agent Teams in Claude Code https://code.claude.com/docs/en/agent-teams (v2.1.32 https://code.claude.com/docs/en/changelog#2132). It’s an experimental feature that lets you create a team of agents and delegate tasks to them. It goes beyond sub-agents in terms of context isolation: it creates completely fresh Claude Code processes and sets up a communication bridge between your main session (lead) and the team.

Our exploration

1. Tmux didn’t work properly (if that’s something you care about).

2. No easy way to control what sub-instances are really doing and why.

3. Teams are strictly scoped to a single session.
This last point is extremely underwhelming, as we believe the power of swarms / teams is to actually push the limit of coding across sessions, projects, and even machines.

The only reasonable use case we see as of now (if you really want to try it out) is to perform tasks like review with multiple team members using different perspectives (e.g. security / performance / coverage), where having follow-ups with each member might be beneficial. That said, spinning up sub-agents is good enough & cheaper right now.

ADOPT / WAIT / CHALLENGE → WAIT

Wait till they figure out how they want to expose it properly. Too buggy, with too much hidden magic, to adopt yet.

CODEX 5.3 VS. OPUS 4.6

At the end of last week OpenAI and Anthropic launched upgrades to their flagship models less than one hour apart. It started with Opus 4.6 https://www.anthropic.com/news/claude-opus-4-6 launching, and shortly after Codex 5.3 https://openai.com/index/introducing-gpt-5-3-codex/ followed.

Our exploration
We are power users of Claude Code, so we’ve been trying out Opus 4.6 since its release, without seeing many substantial improvements compared to Opus 4.5. It does follow your commands a bit better and remembers its CLAUDE.md more consistently.

Codex 5.3 is a major leap compared to 5.2, especially in terms of speed (which was one of the most problematic points) and feedback loops. Codex 5.3 is much closer to Opus, i.e. it provides feedback on the operations it’s performing, rather than reading 25 files and then following up.

We are extremely bullish on the real Anthropic response to this huge leap by OpenAI, as they’ve been winning at agentic coding since Opus 4.5 came out. We are eager to see the moves in the next few weeks.

ADOPT / WAIT / CHALLENGE → CHALLENGE

Codex 5.3 feels like a big leap in speed and quality, closing the gap with Opus. It’s time to invest a few cycles in figuring out if it has the growth potential to justify a shift.

OPENAI LAUNCHES CODEX APP

On February 2 OpenAI launched the Codex app https://openai.com/index/introducing-the-codex-app/, an app that lets you use their models with a clean and minimalistic feel. Its main features are: easy project management, in-app git diffs, worktrees, and “automations”.

Apart from the automations, there’s nothing that really differs from already existing apps like Conductor https://www.conductor.build/, Superset https://superset.sh/ or opencode app https://opencode.ai/ (and many others).

Our exploration

1. It feels like a consumer oriented app, more like ChatGPT for code, than a professional agentic coding tool.

2. It misses basic stuff like MCP configuration, project level skills (it only lets you get them from a marketplace) and gives no permission controls.

3. The worst part is that there’s no documentation available either…

ADOPT / WAIT / CHALLENGE → WAIT

Try it if you already use the Codex CLI; skip it if you use anything else.

----------------------------------------

And that was a hectic week!

We will be hosting another session on the 17th of February. If you know anyone who might be interested in joining, here’s the event link: https://luma.com/vcw3cq1b.

For those interested in taking the leap, we are providing advanced support for start-ups and enterprises that want to effectively leverage the latest AI technologies, workflows and tools to solve complex problems with AI. If you are interested, send us an email and we can talk.

Best,

----------------------------------------
