Building Real-World Agentic Software in the Age of Software 3.0

Vipin Chandran

04 Jul 2025

The software world is shifting. Again. With the rise of Large Language Models (LLMs), we’re witnessing the emergence of what Andrej Karpathy, formerly Director of AI at Tesla, calls Software 3.0: a paradigm where agents, not traditional APIs or scripts, are central to how systems are designed and run.

But between the keynote stages and investor decks, there’s a tension: the hype of autonomous agents promising to replace entire workflows overnight vs. the reality of building systems that are stable, repeatable, and safe.

The truth lies in between - and it’s unfolding now.

Software 3.0: From Code to Language, From Functions to Agents

“English is the new programming language.” - Jensen Huang, CEO of NVIDIA

Karpathy's keynote at AI Startup School lays out the skeleton of Software 3.0: programming moves from strict syntax to natural language, and LLMs act not as assistants but as core infrastructure. He describes today’s LLM agents as "stochastic (probabilistic) simulations of people": capable of complex reasoning, but also unpredictable and non-deterministic. It is a framing only Andrej could come up with.

This shift opens up a powerful new dimension: we can “vibe code” systems - experimenting, iterating, and building with creative AI tooling.

But as Karpathy and others note, vibe coding alone is not enough for production systems.

When there is a bug in the application, you cannot tell your client, “I’ve exhausted my tokens for the day, let me fix it up tomorrow.”

The Case for Repeatable, Auditable Agents

In real-world use, you don’t want agents rethinking every execution. You want them to:

Draft, implement, test, and improve repeatable processes
Follow a defined contract with guardrails and fallbacks
Be composable, so you can chain them into larger workflows
Be observable, so decisions can be explained and audited

I don’t want agents to design every individual process execution from scratch. That would make the process unreliable, unauditable, and very expensive.

Repeatability isn’t a constraint. It’s the foundation for governance, reuse, and commercialization - whether you’re chaining agent services in an enterprise flow or exposing them as paid APIs.
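
To make the idea of a defined contract with guardrails and fallbacks concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation: `run_with_contract`, `AgentResult`, and the stand-in functions are hypothetical names, and the "agent" is a plain callable rather than a real LLM client.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgentResult:
    output: str
    used_fallback: bool
    audit_trail: List[str]  # human-readable record of each decision


def run_with_contract(
    task: str,
    agent: Callable[[str], str],       # hypothetical agent call
    guardrail: Callable[[str], bool],  # returns True if the output is acceptable
    fallback: Callable[[str], str],    # deterministic path when the agent fails
    max_attempts: int = 2,
) -> AgentResult:
    """Run an agent under a fixed contract: validate, retry, then fall back."""
    audit: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            candidate = agent(task)
        except Exception as exc:
            audit.append(f"attempt {attempt}: agent raised {type(exc).__name__}")
            continue
        if guardrail(candidate):
            audit.append(f"attempt {attempt}: accepted")
            return AgentResult(candidate, False, audit)
        audit.append(f"attempt {attempt}: rejected by guardrail")
    audit.append("fallback used")
    return AgentResult(fallback(task), True, audit)


# Stand-ins for illustration only.
def flaky_agent(task: str) -> str:
    return ""  # simulates an agent that produces unusable output


def non_empty(output: str) -> bool:
    return bool(output.strip())


def template_fallback(task: str) -> str:
    return f"[manual review required] {task}"


result = run_with_contract(
    "summarise release notes", flaky_agent, non_empty, template_fallback
)
print(result.used_fallback, result.audit_trail)
```

Because every attempt is recorded in `audit_trail`, the same wrapper covers the observability point as well: a reviewer can see why the fallback path was taken.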

Don’t Forget the NFRs: Cost, Duration, Compliance

A critical yet often overlooked requirement is transparency about non-functional requirements (NFRs). If agents are to be used at scale, they need to publish and adhere to metadata around:

Duration (performance)
Cost per execution
Compliance footprints
Failure and fallback handling
Audit logs

This lets another agent or a human decide - intelligently and accountably - whether to invoke a particular agent service in a larger process chain.
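
One way to picture this is a small metadata record that each agent service publishes, plus a selection function that a caller (human or agent) applies before invoking anything. The sketch below is a hedged illustration: the field names, the agent names, and all numbers are made-up assumptions, not values from any real service.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class AgentNFR:
    """Hypothetical non-functional metadata published by an agent service."""
    name: str
    p95_duration_s: float
    cost_per_execution_usd: float
    compliance_tags: FrozenSet[str]
    has_fallback: bool
    emits_audit_log: bool


def select_agent(
    candidates: List[AgentNFR],
    max_duration_s: float,
    max_cost_usd: float,
    required_compliance: FrozenSet[str],
) -> Optional[AgentNFR]:
    """Return the cheapest agent that meets every stated constraint, or None."""
    eligible = [
        a for a in candidates
        if a.p95_duration_s <= max_duration_s
        and a.cost_per_execution_usd <= max_cost_usd
        and required_compliance <= a.compliance_tags  # subset check
        and a.has_fallback
        and a.emits_audit_log
    ]
    return min(eligible, key=lambda a: a.cost_per_execution_usd, default=None)


# Illustrative catalogue; every value here is invented for the example.
catalogue = [
    AgentNFR("invoice-extractor-large", 12.0, 0.08,
             frozenset({"gdpr", "soc2"}), True, True),
    AgentNFR("invoice-extractor-small", 3.0, 0.01,
             frozenset({"gdpr"}), True, True),
    AgentNFR("invoice-extractor-beta", 1.0, 0.005,
             frozenset({"gdpr", "soc2"}), False, True),
]

chosen = select_agent(catalogue, 15.0, 0.10, frozenset({"gdpr", "soc2"}))
print(chosen.name if chosen else "no agent satisfies the constraints")
```

In this example the cheapest candidate is skipped because it declares no fallback, which is the kind of accountable decision the published metadata is meant to enable.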

Michael Truell, CEO of Cursor, made headlines in a recent Y Combinator conversation by consistently level-setting the AI agent narrative. When the interviewer pushed visions of fully autonomous software engineering agents, Truell brought the discussion back to earth:

Agent-based development is real, but only for specific, narrow workflows
Human taste and review remain crucial
Guardrails, observability, and rollback mechanisms are essential
You can’t deploy vibe code without the engineering muscle behind it

Cursor's approach reflects what agentic systems should be: powerful, but bounded.

CI Pipelines: Where Vibe Coding Does Work — With Guardrails

Not all automation needs GPT-4-level cognition. In fact, CI/CD pipelines and DevOps flows are a perfect example where:

“Vibe coding” via agents can be extremely productive
The domain is structured and repeatable
Standardization allows the use of smaller, cheaper models
Prompt engineering + platform engineering enables reuse and compliance

In my experience, CI pipelines can be vibe coded. But you need to create the right systems - platform engineering and prompt engineering as part of the agentic software - and standardization is key.

This represents a realistic middle ground:

Creative LLMs to scaffold and evolve pipelines
Structured platforms to capture, govern, and improve them
Smaller models for performance-efficient execution within bounded use cases
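
The third point, using smaller models inside bounded use cases, can be sketched as a simple routing rule. This is an assumption-laden illustration: the model names, the task kinds, and the size threshold are all hypothetical placeholders, and a real router would likely use richer signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    kind: str    # e.g. "lint_fix"; the kinds below are illustrative
    prompt: str


# Hypothetical model identifiers, not real product names.
SMALL_MODEL = "small-model"
LARGE_MODEL = "large-model"

# Task kinds assumed to be standardised enough for a small model.
STANDARDISED_KINDS = {"ci_pipeline_update", "lint_fix", "changelog_entry"}

# Arbitrary cut-off chosen for the example.
MAX_SMALL_PROMPT_CHARS = 4000


def route(task: Task) -> str:
    """Send standardised, bounded tasks to the small model; everything else to the large one."""
    if task.kind in STANDARDISED_KINDS and len(task.prompt) <= MAX_SMALL_PROMPT_CHARS:
        return SMALL_MODEL
    return LARGE_MODEL


print(route(Task("lint_fix", "Remove the unused import in build.py")))
print(route(Task("architecture_design", "Propose a new deployment topology")))
```

The rule is deliberately boring: the routing decision itself stays deterministic and auditable, even though the models behind it are not.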

Building Blocks for Real Agentic Systems

Here’s what a production-grade agentic platform should look like:

Component	Purpose
Prompt Registry	Version and reuse proven prompts
Agent Templates	Define repeatable, parameterised behaviour
Agent Contracts	Standardise I/O, NFRs, and fallbacks
Observability Layer	Track outcomes, performance, and compliance
Model Router	Assign small vs large models based on context
Human Oversight Hooks	Ensure discretion where it matters
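
As one example of these components, a Prompt Registry can be as small as a versioned store of templates. The sketch below is a minimal in-memory illustration; `PromptRegistry` and its methods are hypothetical names, and a production version would presumably add persistence, review, and access control.

```python
from string import Template
from typing import Dict, List, Optional


class PromptRegistry:
    """In-memory store of named prompt templates, keeping every version."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def register(self, name: str, template: str) -> int:
        """Add a new version of a prompt and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(template)
        return len(versions)

    def render(self, name: str, version: Optional[int] = None, **params: str) -> str:
        """Fill in a prompt; defaults to the latest version."""
        versions = self._versions[name]
        if version is None:
            version = len(versions)
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        # substitute() raises KeyError if a placeholder is left unfilled.
        return Template(versions[version - 1]).substitute(params)


registry = PromptRegistry()
v1 = registry.register("ci_fix", "Fix the failing step $step in $pipeline.")
v2 = registry.register(
    "ci_fix", "Fix the failing step $step in $pipeline. Explain the change."
)

print(registry.render("ci_fix", step="build", pipeline="main"))
print(registry.render("ci_fix", version=1, step="build", pipeline="main"))
```

Pinning a version number in each agent template is what makes a prompt change reviewable and reversible, in the same way a dependency bump is.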

Finally

Software 3.0 isn’t just about generative intelligence. It’s about engineering systems where intelligent behaviour becomes repeatable, composable, and trustworthy.

That means:

Grounding agent systems in standardisation and contracts
Prioritising non-functional requirements
Treating prompt engineering as software engineering
And accepting that platforms, not just models, will define the winners in this new wave
