Building AI Products in 2025: What Changed From Last Year
We shipped three AI features this year. Each one taught us something different about how the landscape has changed.
Here’s the update for anyone building AI into their products.
The Model Landscape Changed
2024 Reality
- GPT-4 was king
- Claude was the alternative
- Open source was “catching up”
2025 Reality
- Multiple frontier models at near-parity
- Claude 3.5 genuinely competes with GPT-4
- Llama 3 and Mistral are production-ready for many use cases
- Google’s Gemini is actually good now
What this means: You have options. Model switching costs have dropped. Don’t lock yourself to one provider.
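One way to keep switching costs low is to put a thin layer between your feature code and any provider SDK. The sketch below is a minimal illustration, not a description of our production setup: `ModelRegistry`, `fake_provider_a`, and `fake_provider_b` are hypothetical names, and the two provider functions are stand-ins for real SDK wrappers.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A completion function takes a prompt and returns text. Each real provider
# SDK would be wrapped in one of these; the functions below are stand-ins.
CompletionFn = Callable[[str], str]


@dataclass
class ModelRegistry:
    """Maps a logical role (e.g. "summarize") to whichever provider serves it."""
    providers: Dict[str, CompletionFn]
    routes: Dict[str, str]  # role -> provider name

    def complete(self, role: str, prompt: str) -> str:
        provider_name = self.routes[role]
        return self.providers[provider_name](prompt)

    def switch(self, role: str, provider_name: str) -> None:
        # Changing providers is a routing change, not a code change.
        if provider_name not in self.providers:
            raise KeyError(f"unknown provider: {provider_name}")
        self.routes[role] = provider_name


# Hypothetical stand-ins for real SDK calls.
def fake_provider_a(prompt: str) -> str:
    return f"[provider-a] {prompt}"


def fake_provider_b(prompt: str) -> str:
    return f"[provider-b] {prompt}"


registry = ModelRegistry(
    providers={"a": fake_provider_a, "b": fake_provider_b},
    routes={"summarize": "a"},
)
```

Feature code calls `registry.complete("summarize", text)` and never imports a provider SDK directly, so moving a role to another model is a single `switch` call.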
Cost Structures Shifted
2024 Costs
- GPT-4: $30 input / $60 output per 1M tokens
- Fine-tuning: Expensive and slow
- Open source: Cheap but hard to run
2025 Costs
- GPT-4 Turbo: $10 input / $30 output per 1M tokens
- Claude: Similar pricing
- Open source via APIs (Together, Anyscale): $1-5/1M tokens
- Self-hosted: Actually viable for some use cases
Our AI feature costs dropped 60% year-over-year for the same workloads. That changes product economics dramatically.
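To see how list prices translate into a bill, here is a rough worked example. It assumes the GPT-4 figures above are read as $30 per 1M input tokens and $60 per 1M output tokens, and GPT-4 Turbo as $10 and $30. The monthly token volume is made up for illustration and is not our actual usage.

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars, given prices per 1M input and output tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_m
    output_cost = (output_tokens / 1_000_000) * output_price_per_m
    return input_cost + output_cost


# Hypothetical monthly volume (an assumption, not a measured figure).
INPUT_TOKENS = 100_000_000
OUTPUT_TOKENS = 20_000_000

cost_2024 = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, 30.0, 60.0)  # GPT-4
cost_2025 = monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, 10.0, 30.0)  # GPT-4 Turbo
reduction = 1 - cost_2025 / cost_2024

print(f"2024: ${cost_2024:,.0f}  2025: ${cost_2025:,.0f}  reduction: {reduction:.0%}")
# prints: 2024: $4,200  2025: $1,600  reduction: 62%
```

The exact percentage depends on your input/output mix, so run it with your own volumes before drawing conclusions.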
What Actually Works Now
RAG (Retrieval Augmented Generation)
Mature and reliable. If you need AI to work with your data, RAG is the answer.
What’s new: Vector stores got better (Pinecone, Weaviate, and the pgvector extension for Postgres). Chunking strategies are better understood. Hybrid search combining vectors and keywords works better than either alone.
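Hybrid search needs some way to merge a vector ranking with a keyword ranking. One common option is reciprocal rank fusion (RRF), sketched below. This is an illustration of the general technique, not a claim about how any particular vector database implements hybrid search. The document IDs are made up, and `k = 60` is a conventional default that you may want to tune.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector index and a keyword index.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

RRF only uses rank positions, which sidesteps the problem that vector similarity scores and keyword scores are not on the same scale.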
AI in Existing Workflows
Not “AI products” but AI features inside normal products. Summarization, suggestions, automation of tedious tasks.
This is where the real value is. Not chatbots—AI that makes existing software better.
If you’re adding AI to your product, consider working with AI consultants in Melbourne who understand both the technology and product integration. We wasted months learning things that experts know from experience.
Agents (Kind Of)
Simple agent workflows work. Research, multi-step data processing, automation of defined processes.
Complex agents that “do everything”? Still unreliable. The hype exceeded reality.
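The "simple" workflows that hold up tend to be a fixed sequence of steps with a check after each one, rather than a model deciding what to do next. A minimal sketch of that shape follows. `run_workflow` and the example steps are hypothetical, and in a real system each step would wrap a model or tool call instead of a string method.

```python
from typing import Callable, List, Tuple

# A step is a name plus a function from text to text.
Step = Tuple[str, Callable[[str], str]]


def run_workflow(steps: List[Step], initial_input: str) -> Tuple[str, List[str]]:
    """Run a fixed list of steps in order.

    Returns the final output and the names of the steps that completed.
    Stops with an error as soon as a step produces empty output, so a
    failure is caught at the step that caused it.
    """
    completed: List[str] = []
    data = initial_input
    for name, step in steps:
        data = step(data)
        if not data:
            raise ValueError(f"step {name!r} produced empty output")
        completed.append(name)
    return data, completed


# Stand-in steps; real ones would call a model or an internal API.
example_steps: List[Step] = [
    ("trim", str.strip),
    ("normalize", str.upper),
]

result, completed = run_workflow(example_steps, "  quarterly report  ")
print(result, completed)  # QUARTERLY REPORT ['trim', 'normalize']
```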
What Still Doesn’t Work
Autonomous Everything
AI that runs without human oversight still fails too often for most business applications. Plan for human-in-the-loop.
Perfect Accuracy
If you need 99.9% accuracy, AI alone won’t get you there. Build verification and fallback systems.
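One possible verification layer is to sample the model several times and accept an answer only when enough samples agree, sending everything else to a person. The sketch below shows just the agreement check. `verified_answer` and the 0.8 threshold are assumptions for illustration, and agreement between samples is a signal of reliability, not a guarantee of correctness.

```python
from collections import Counter
from typing import Optional, Sequence


def verified_answer(answers: Sequence[str],
                    min_agreement: float = 0.8) -> Optional[str]:
    """Return the majority answer if enough samples agree, else None.

    A None result means "do not trust this automatically": the caller
    should fall back to a rule, a stronger model, or human review.
    """
    if not answers:
        return None
    top_answer, count = Counter(answers).most_common(1)[0]
    if count / len(answers) >= min_agreement:
        return top_answer
    return None


# Hypothetical samples from five calls to the same model.
print(verified_answer(["approve", "approve", "approve", "approve", "reject"]))
# approve
print(verified_answer(["approve", "reject", "approve", "reject", "unsure"]))
# None
```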
Replacing Judgment
AI can gather information and draft outputs. It can’t make decisions that require understanding context, politics, or nuance.
The New Architecture Patterns
Pattern 1: Multi-Model Pipelines
Use cheap, fast models for classification and routing. Use expensive, powerful models only when needed.
Example: Small model determines query type → Routes to appropriate specialist model → Expensive model handles complex cases only
Savings: 40-60% versus using frontier models for everything.
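The routing step can be sketched in a few lines. Everything named here is hypothetical: `classify_query` is a keyword heuristic standing in for a small classifier model, and the three handlers stand in for calls to a cheap model, a specialist model, and a frontier model.

```python
from typing import Callable, Dict

ModelFn = Callable[[str], str]


def classify_query(query: str) -> str:
    """Stand-in for a small, cheap classifier model."""
    text = query.lower()
    if "refund" in text or "invoice" in text:
        return "billing"
    if len(text.split()) > 30:
        return "complex"
    return "simple"


def route(query: str, models: Dict[str, ModelFn], default: str = "complex") -> str:
    """Send the query to the handler for its label.

    Unknown labels go to the default (most capable) handler, so a
    classifier mistake costs money rather than quality.
    """
    label = classify_query(query)
    handler = models.get(label, models[default])
    return handler(query)


# Stand-in handlers; real ones would call different models.
models: Dict[str, ModelFn] = {
    "simple": lambda q: "cheap-model: " + q,
    "billing": lambda q: "billing-specialist: " + q,
    "complex": lambda q: "frontier-model: " + q,
}

print(route("Where is my invoice?", models))
# billing-specialist: Where is my invoice?
```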
Pattern 2: Hybrid AI + Rules
Don’t make AI do what rules can do. Use deterministic logic where possible, AI where necessary.
Example: AI extracts entities → Rules validate and transform → AI handles edge cases
More reliable than pure AI approaches.
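A minimal sketch of that extract, validate, escalate flow, using a monetary amount as the entity. The function names and the amount format are assumptions for illustration. `extract` and `resolve_edge_case` are passed in as callables because they would wrap model calls in a real system.

```python
import re
from typing import Callable, Dict, Optional

# Rule: accept amounts like "$1,234.50" or "1234.5"; reject anything else.
AMOUNT_PATTERN = re.compile(r"\$?(\d{1,3}(?:,\d{3})*|\d+)(\.\d{1,2})?")


def normalize_amount(raw: str) -> Optional[float]:
    """Deterministic validation and transformation of an extracted amount."""
    match = AMOUNT_PATTERN.fullmatch(raw.strip())
    if match is None:
        return None
    digits = match.group(1).replace(",", "")
    return float(digits + (match.group(2) or ""))


def process(document: str,
            extract: Callable[[str], Dict[str, str]],
            resolve_edge_case: Callable[[str, str], float]) -> float:
    entities = extract(document)            # AI: pull out raw entities
    raw_amount = entities["amount"]
    amount = normalize_amount(raw_amount)   # Rules: validate and transform
    if amount is None:
        # AI again, but only for inputs the rules could not handle.
        amount = resolve_edge_case(document, raw_amount)
    return amount


# Stand-ins for model calls.
clean = process("Total due: $1,234.50",
                extract=lambda doc: {"amount": "$1,234.50"},
                resolve_edge_case=lambda doc, raw: 0.0)
messy = process("Total due: twelve hundred",
                extract=lambda doc: {"amount": "twelve hundred"},
                resolve_edge_case=lambda doc, raw: 1200.0)
print(clean, messy)  # 1234.5 1200.0
```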
Pattern 3: Graceful Degradation
Design for AI failure. What happens when the model is slow? Wrong? Unavailable?
Every AI feature should have a fallback. Even if the fallback is “show an error message,” plan for it.
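Here is one way to handle the slow and unavailable cases with a timeout and a fallback value, using only the Python standard library. `call_with_fallback` is a hypothetical helper, and the model calls are simulated with `time.sleep`. Note that the timeout stops the caller from waiting but does not kill the underlying call, which keeps running in its worker thread.

```python
import concurrent.futures
import time
from typing import Callable


def call_with_fallback(model_call: Callable[[], str],
                       fallback: str,
                       timeout_s: float) -> str:
    """Return the model's answer, or the fallback if it is slow or fails."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(model_call)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers both a timeout and any error raised by the call itself.
        return fallback
    finally:
        # Do not block on the worker; a timed-out call finishes in the background.
        pool.shutdown(wait=False)


def slow_model() -> str:
    time.sleep(0.3)  # simulated slow response
    return "late answer"


def broken_model() -> str:
    raise RuntimeError("provider unavailable")  # simulated outage


print(call_with_fallback(lambda: "summary text", "Summary unavailable.", 1.0))
# summary text
print(call_with_fallback(slow_model, "Summary unavailable.", 0.05))
# Summary unavailable.
print(call_with_fallback(broken_model, "Summary unavailable.", 1.0))
# Summary unavailable.
```

The "wrong" case is not covered here; that needs a validation step on the output before it reaches the user.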
The Team You Need
2024: “We need ML engineers to build AI.”
2025: “We need product engineers who understand AI.”
The skillset shifted. Building AI products now is more about integration, product sense, and prompt engineering than traditional ML.
What we actually hire for:
- Can evaluate model outputs critically
- Understands prompt engineering and limitations
- Can build robust systems around unreliable components
- Has product sense for where AI helps vs. hurts
Traditional ML skills matter less unless you’re doing something truly custom.
The Prediction for 2026
AI becomes table stakes. Not a differentiator, just expected.
The competitive advantage moves from “having AI” to “having AI that actually makes the product better.”
Most AI features today are mediocre. The bar will rise. Users will expect AI that genuinely works, not AI theater.
Build accordingly.