Why AI Engineering Is Closer to Software Engineering Than Machine Learning
Why it’s less about models and more about building products: A Software Engineer’s Perspective
I just finished Chapter 1 of the “AI Engineering” book, and it completely reframed how I think about building AI applications.
As a software engineer, I thought AI engineering would be mostly about understanding complex ML algorithms.
Turns out, that’s not where most of the work happens anymore.
The workflow inversion that changes everything
Traditional ML followed a familiar pattern: collect labeled data for months, train a model, tune hyperparameters, and eventually ship something useful.
That workflow is largely obsolete for most products.
Today, teams start with the product. They pick a strong foundation model (GPT-4, Claude), wire it through an API, build a demo quickly, and put it in front of users. Only after real feedback do they decide whether custom models or fine-tuning are worth it.
The book calls this workflow inversion. It explains why so many AI demos appear overnight and why production-quality systems take much longer.
This isn’t just faster. It fundamentally changes who can build AI applications and which skills matter most.
Scale: the one word that explains everything
Old ML systems depended on labeled data. Want to detect fraud? You needed humans to tag thousands of transactions. At scale, labeling became the bottleneck. ImageNet alone required labeling a million images, costing tens of thousands of dollars.
Foundation models bypass this using self-supervision. Every sentence provides its own training data.
Instead of a human labeling the data, the data supplies its own labels. Think about how you’d finish this sentence: “The server crashed because...” Your brain already has a shortlist of likely next words - “the memory,” “a bug,” “too many requests.” That intuition came from reading thousands of similar sentences over your lifetime.
Language models learn the same way, just faster and at massive scale. Feed one “The server crashed because” and it predicts the next word. Then the next. Then the next. Every sentence on the internet - billions of them - becomes a free training example. No human labels. No bottleneck. Just pattern recognition at a scale no human team could ever match.
This breakthrough is why we can now have models with 100+ billion parameters trained on a large share of the public internet.
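To make the "every sentence is its own training data" idea concrete, here is a minimal sketch of how one sentence turns into several (context, next word) pairs. It is a simplification under stated assumptions: real models work on tokens rather than whitespace-separated words, and the function name next_word_examples is my own, not something from the book.

```python
def next_word_examples(sentence: str) -> list[tuple[str, str]]:
    """Turn one sentence into (context, next_word) training pairs.

    Assumption: words are split on whitespace. Real language models
    use subword tokens, but the principle is the same.
    """
    words = sentence.split()
    examples = []
    for i in range(1, len(words)):
        context = " ".join(words[:i])  # everything seen so far
        target = words[i]              # the "label" is just the next word
        examples.append((context, target))
    return examples


if __name__ == "__main__":
    sentence = "The server crashed because the memory ran out"
    for context, target in next_word_examples(sentence):
        print(f"{context!r} -> {target!r}")
```

An eight-word sentence yields seven training pairs, and nobody had to label anything.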
Where do AI Engineers actually work?
After reading this chapter, I realized AI engineering differs from traditional ML engineering in three major ways:
1. Application development: you’re adapting models, not building them
The book makes this distinction crystal clear:
Traditional ML engineering = building models, while AI engineering = adapting existing models.
This means less obsessing over model internals you can't control, and more focus on:
Prompt engineering (getting models to do what you want through instructions)
RAG (Retrieval-Augmented Generation - connecting models to your data)
Finetuning (adjusting model weights for your specific needs)
This is where most engineers operate. Prompt design, user interfaces, orchestration, evaluation, and feedback loops live here.
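As a rough illustration of how prompt engineering and RAG fit together at the application layer, here is a toy sketch. Everything in it is an assumption for illustration: the retrieve and build_prompt helpers are hypothetical names, and the keyword-overlap scoring stands in for the embedding search a real system would more likely use. No model is called; the output is just the prompt you would send.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return up to k documents sharing the most words with the question.

    Toy scoring by word overlap; documents with no overlap are dropped.
    """
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps ties in order
    return [doc for _, doc in scored[:k]]


def build_prompt(question: str, documents: list[str], k: int = 2) -> str:
    """Assemble instructions, retrieved context, and the question into one prompt."""
    context = retrieve(question, documents, k)
    context_block = "\n".join(f"- {doc}" for doc in context)
    if not context_block:
        context_block = "- (no relevant documents found)"
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}\n"
    )


if __name__ == "__main__":
    docs = [
        "Refunds are processed within 5 business days.",
        "Our office is closed on public holidays.",
    ]
    print(build_prompt("How are refunds processed?", docs))
```

Notice that none of this touches model internals. It is string handling, search, and interface design, which is the point of the distinction above.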
2. Evaluation becomes the biggest challenge
Traditional software engineering is binary. A test passes or fails. An API returns the correct data, or it doesn’t.
With traditional close-ended tasks like fraud detection, you compare outputs to expected results. Simple.
AI systems don’t behave that way.
For open-ended outputs like summaries, answers, and conversations, there is no single correct response. Evaluation becomes probabilistic and contextual.
You are no longer testing a function. You are testing a system: model, prompt, context, retrieval, and sampling parameters combined. The space of possible responses to any prompt is effectively unbounded. How do you evaluate if a chatbot’s answer is “good”?
Google announced that Gemini outperformed ChatGPT on the MMLU benchmark. Looking closer, the Gemini result used chain-of-thought prompting with 32 samples (CoT@32), while the ChatGPT result used 5-shot prompting, meaning five worked examples in the prompt. When both used the same 5-shot setup, ChatGPT scored higher.
This illustrates the core issue: model quality alone is meaningless without evaluating the full system configuration.
The practical implication is uncomfortable but clear. Evaluation infrastructure must exist from day one.
Not after the demo works. Day one.
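Here is what a day-one evaluation harness could look like in its smallest form. This is a sketch under assumptions, not the book's method: EvalCase, passes, and pass_rate are names I made up, the phrase checks are a crude stand-in for richer scoring, and fake_model is a hypothetical placeholder for a real model call. Because outputs vary between runs, each prompt is sampled several times and the result is a pass rate rather than a single pass/fail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    must_include: list[str]      # phrases a good answer should contain
    must_not_include: list[str]  # phrases that signal a bad answer


def passes(answer: str, case: EvalCase) -> bool:
    """Check one answer against one case, ignoring letter case."""
    text = answer.lower()
    has_required = all(p.lower() in text for p in case.must_include)
    has_forbidden = any(p.lower() in text for p in case.must_not_include)
    return has_required and not has_forbidden


def pass_rate(generate: Callable[[str], str], cases: list[EvalCase], samples: int = 3) -> float:
    """Sample each prompt several times and return the fraction of passing answers."""
    total = 0
    passed = 0
    for case in cases:
        for _ in range(samples):
            total += 1
            if passes(generate(case.prompt), case):
                passed += 1
    return passed / total if total else 0.0


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    if "refund" in prompt.lower():
        return "Refunds take 5 business days."
    return "I am not sure."


if __name__ == "__main__":
    cases = [
        EvalCase("How long do refunds take?", ["5 business days"], ["not sure"]),
        EvalCase("What is the warranty period?", ["12 months"], []),
    ]
    print(f"pass rate: {pass_rate(fake_model, cases):.0%}")
```

Swapping the prompt, the model, or the sampling settings and rerunning the same cases gives a like-for-like comparison, which is exactly what the benchmark story above was missing.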
3. Models are bigger and need more optimization
Foundation models consume significantly more compute and have higher latency than traditional ML models. For autoregressive models (generating tokens sequentially), if it takes 10ms per token, a 100-token response takes 1 second. That’s far from the 100ms latency users expect.
This makes inference optimization, making models faster and cheaper, even more critical than before.
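The latency arithmetic above is simple enough to sketch in a few lines. The function names are my own, and the model is deliberately naive: it assumes a constant per-token time and ignores network overhead and prompt processing.

```python
def response_latency_ms(tokens: int, ms_per_token: float) -> float:
    """Total generation time when tokens are produced one after another."""
    return tokens * ms_per_token


def max_tokens_within_budget(budget_ms: float, ms_per_token: float) -> int:
    """How many tokens fit inside a latency budget at a given per-token speed."""
    return int(budget_ms // ms_per_token)


if __name__ == "__main__":
    # The example from the text: 100 tokens at 10 ms each.
    print(response_latency_ms(100, 10))       # 1000 ms, i.e. 1 second
    # A 100 ms budget at the same speed leaves room for only 10 tokens.
    print(max_tokens_within_budget(100, 10))  # 10
```

Seen this way, hitting a 100ms budget means either generating far fewer tokens or making each token much cheaper, and both are optimization problems.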
The last mile problem nobody warned me about
Here’s the reality check from the book that every developer needs to hear:
Initial success with foundation models is misleadingly easy.
The author shares a sobering example from LinkedIn’s 2024 report: they built 80% of their desired experience in one month.
It’s a success, right? Wrong. It took four more months to push past 95%.
This matches what I’ve observed in demo videos versus production apps. Building a cool demo takes a weekend. Building a reliable product takes months of fighting edge cases, hallucinations, and product kinks.
The 3-Layer Stack you need to understand
The book breaks down the AI application stack into three layers:
Layer 1: Application Development (where most action happens)
Prompt engineering
Evaluation
Building user interfaces
Layer 2: Model Development
Training and finetuning
Dataset engineering
Inference optimization
Layer 3: Infrastructure
Model serving
Compute management
Monitoring
What struck me: the infrastructure layer hasn’t changed much. Resource management and monitoring needs remain the same. The big innovations are in layers 1 and 2.
What This Means for Your Career
The assumption that AI engineering requires an ML background is outdated.
The book’s analysis of 205 open-source AI applications found the most successful engineers weren’t ML researchers. They were software engineers who moved fast. Shipped demos. Got feedback. Iterated. The same loop every good engineer already knows.
LinkedIn’s 2023 survey shows “Generative AI” and “Prompt Engineering” profile additions grew 75% per month. That growth came from across all engineering disciplines, not just ML. Backend engineers wrapping AI into existing APIs. DevOps engineers automating pipelines with LLMs. Mobile developers adding AI features to apps.
The tooling followed. Python libraries, JavaScript SDKs, REST APIs, CLI tools. Whatever stack you work in, there’s already an AI integration path built for it.
What transfers from software engineering: systems thinking, debugging under uncertainty, cost and latency tradeoffs, API design, writing code that fails gracefully. These aren’t ML skills. They’re engineering skills, and they matter more at the application layer than knowing how transformers work internally.
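"Code that fails gracefully" is a good example of a skill that carries over unchanged. Below is a minimal sketch of wrapping an unreliable model call with retries and a fallback message. The helper name call_with_fallback and its parameters are hypothetical, and generate stands for whatever function actually calls your model.

```python
import time
from typing import Callable


def call_with_fallback(
    generate: Callable[[str], str],
    prompt: str,
    fallback: str,
    retries: int = 2,
    delay_s: float = 0.0,
) -> str:
    """Try a model call up to retries + 1 times, then return a safe fallback.

    Assumption: any exception or empty answer counts as a failure.
    Production code would catch specific errors (timeouts, rate limits).
    """
    for attempt in range(retries + 1):
        try:
            answer = generate(prompt)
            if answer.strip():
                return answer
        except Exception:
            pass  # treat the error as a failed attempt and try again
        if attempt < retries and delay_s > 0:
            time.sleep(delay_s * (attempt + 1))  # wait a little longer each time
    return fallback


if __name__ == "__main__":
    def always_down(prompt: str) -> str:
        raise TimeoutError("model endpoint did not respond")

    print(call_with_fallback(always_down, "Summarize this ticket", "Summary unavailable right now."))
```

The user sees a sensible message instead of a stack trace, which is the same discipline you would apply to any flaky downstream service.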
The gap between “software engineer” and “AI engineer” is smaller than the job postings make it look.
The Real Questions You Need to Answer
Before building any AI application, the book recommends asking three critical questions:
1. Why are you building this?
According to a 2023 Gartner study cited in the book, 7% of companies named business continuity as their reason for adopting AI: they said they’d go out of business without it. For them, AI isn’t optional. It’s existential.
For others, it’s about opportunities: boosting profits, improving productivity, staying competitive.
2. What role will AI play?
The book references Apple’s framework:
Critical vs. complementary (can your app work without AI?)
Reactive vs. proactive (responding to requests vs. showing info opportunistically)
Dynamic vs. static (personalized per user vs. shared model)
The more critical AI is to your app, the higher your quality bar needs to be.
3. How defensible is your application?
Here’s the uncomfortable truth the book addresses: if something is easy for you to build with ChatGPT, it’s easy for competitors too.
One VC partner quoted in the book said they’ve “seen many startups whose entire products could be a feature for Google Docs.”
Your moat likely comes from data, not technology. The book suggests: get to market first, gather usage data, and use those insights to continuously improve.
What I’m Taking Away
After reading Chapter 1, here’s what I’m focusing on:
Start with applications, not models. Build a demo with existing models first. Only invest in custom models once you’ve validated the use case.
Evaluation is not optional. According to the book, this is where AI engineering differs most from traditional development. Build evaluation into your workflow from day one.
Think in layers. Understand which layer you’re working in (application, model, or infrastructure) and what techniques apply at each level.
Embrace the learning curve. The book makes it clear: the 0-to-60 is fast, but 60-to-100 is grinding. Plan for it.
Your full-stack skills matter. The ability to iterate quickly and build good interfaces is becoming as important as understanding ML fundamentals.
That’s what I learned from Chapter 1. I’m continuing through the rest of the book to dive deeper into prompt engineering, evaluation, and the practical techniques for building AI applications.
What’s your experience with AI engineering so far? What surprised you most when you started working with foundation models?
If you found this useful, I’ll be sharing more insights as I work through the rest of the book. Let me know what topics you’d like me to focus on!
Happy building!
