Six Ways AI Projects Go Wrong
The hard part of building with generative AI isn't the AI — it's the same things that sink every software project.
We’ve seen enough AI engagements now to notice the patterns. The failures aren’t usually technical. They’re strategic mistakes made before a single prompt is written.
1. Using AI when you don’t need it
Generative AI is general-purpose enough that it can be applied to almost anything. That’s exactly the problem. A team that builds an LLM-based scheduler to cut energy costs may have just reinvented a greedy algorithm with a much higher inference bill.
Ask the harder question first: what’s the simplest thing that could solve this? If the answer isn’t AI, that’s a win, not a failure.
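For the scheduler example above, the simplest thing might be a greedy pass over hourly prices. The sketch below is illustrative only: the function name, the job names, the prices, and the assumption that each job needs some number of one-hour slots that need not be contiguous are all hypothetical.

```python
def greedy_schedule(hourly_prices, jobs):
    """Assign each job's required hours to the cheapest hours still free.

    hourly_prices: list of prices, index = hour of day (hypothetical units).
    jobs: dict mapping job name -> number of one-hour slots it needs.
    Assumes one job per hour and that slots need not be contiguous.
    """
    if sum(jobs.values()) > len(hourly_prices):
        raise ValueError("not enough hours to fit every job")

    # Hours ordered from cheapest to most expensive; ties go to the earlier hour.
    cheapest_first = sorted(
        range(len(hourly_prices)), key=lambda h: (hourly_prices[h], h)
    )

    schedule = {}
    cursor = 0
    for name, hours_needed in jobs.items():
        # Take the next-cheapest free hours for this job.
        schedule[name] = sorted(cheapest_first[cursor:cursor + hours_needed])
        cursor += hours_needed

    total_cost = sum(hourly_prices[h] for hours in schedule.values() for h in hours)
    return schedule, total_cost


if __name__ == "__main__":
    prices = [30, 10, 20, 50, 5, 40]      # made-up prices for six hours
    jobs = {"backup": 2, "report": 1}     # made-up jobs
    print(greedy_schedule(prices, jobs))  # ({'backup': [1, 4], 'report': [2]}, 35)
```

No model call, no inference bill, and the behavior is easy to test. If a baseline like this already meets the goal, the AI version has to justify its extra cost.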
2. Blaming the model for bad product design
When an AI feature underperforms, the instinct is to tune the model. Often the problem is that the feature was designed around what the model does rather than what the user needs.
A meeting summarizer that produces long, comprehensive summaries isn’t failing at AI; it’s failing at product. Users wanted their action items, not a transcript. Since foundation models are largely commoditized, most of the differentiation now lives in the product layer.
3. Reaching for complexity too soon
Agentic frameworks, vector databases, fine-tuning — these tools exist for real problems, but teams adopt them before the simpler version has been tried. Early abstraction hides what’s actually happening, makes debugging harder, and introduces bugs from the frameworks themselves.
Start with API calls and prompts. Add a layer only when you’ve exhausted what the simpler one can do.
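As a rough illustration of how little "API calls and prompts" can mean in practice, the sketch below is one prompt template and one function, borrowing the action-item example from section 2. `call_model` is a hypothetical stand-in for whatever provider client you use, and the template wording and one-item-per-line output format are assumptions.

```python
from typing import Callable, List

# Hypothetical prompt; the wording and output format are assumptions.
PROMPT_TEMPLATE = (
    "List the action items from this meeting transcript, "
    "one per line, each starting with '- '.\n\n"
    "Transcript:\n{transcript}"
)


def extract_action_items(
    transcript: str, call_model: Callable[[str], str]
) -> List[str]:
    """Build the prompt, make one model call, and parse the reply into a list.

    call_model takes a prompt string and returns the model's text reply.
    It is injected so the function can be tested without a network call.
    """
    prompt = PROMPT_TEMPLATE.format(transcript=transcript)
    reply = call_model(prompt)

    items = []
    for line in reply.splitlines():
        # Drop leading bullet characters and surrounding whitespace.
        text = line.strip().lstrip("-*• ").strip()
        if text:
            items.append(text)
    return items


if __name__ == "__main__":
    # A canned reply stands in for a real model here.
    fake_model = lambda prompt: "- Send budget to Ana\n- Book the room\n"
    print(extract_action_items("(transcript text)", fake_model))
```

Everything that happens is visible in about twenty lines, which is the point: when the output is wrong, there is only the prompt and the parser to inspect.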
4. Treating the demo as the destination
A working demo is not a working product. The pattern is consistent: getting to 80% of the desired experience takes a month; getting past 95% takes four more. That final stretch involves accuracy/latency tradeoffs, edge cases, tonal consistency, reliability under API changes, compliance, and abuse vectors.
Plan for the long tail. It’s longer than it looks from the demo.
5. Dropping human evaluation
AI-as-judge evaluation is useful, but it inherits the biases and limitations of whatever model you’re using to judge. The best teams still have humans reviewing 30–1,000 outputs per day — not because automation doesn’t work, but because fifteen minutes looking at actual outputs catches things that automated pipelines miss entirely.
Manual data inspection has the highest value-to-prestige ratio in the field. Do it anyway.
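One way to keep the habit cheap is to pull a small random sample of logged outputs each day and, where an AI judge has also scored them, check how often it agrees with the human reviewer. This is a minimal sketch under assumptions: the function names, the default sample size of 30, and the pass/fail label format are all hypothetical.

```python
import random


def sample_for_review(outputs, k=30, seed=None):
    """Return up to k randomly chosen outputs for a human to read.

    outputs: list of logged model outputs (any objects).
    seed: optional, makes the sample reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(outputs, min(k, len(outputs)))


def judge_agreement(human_labels, judge_labels):
    """Fraction of items where the AI judge matches the human label.

    Both arguments map an output id to a label (for example True/False).
    Only ids labeled by both are compared; returns None if there are none.
    """
    shared = human_labels.keys() & judge_labels.keys()
    if not shared:
        return None
    matches = sum(human_labels[i] == judge_labels[i] for i in shared)
    return matches / len(shared)


if __name__ == "__main__":
    logged = [f"output {i}" for i in range(500)]   # made-up log
    todays_batch = sample_for_review(logged, k=30, seed=7)
    print(len(todays_batch))

    human = {1: True, 2: False, 3: True}           # made-up labels
    judge = {1: True, 2: True, 3: True, 4: False}
    print(judge_agreement(human, judge))
```

A low agreement rate is a signal to re-read the judge prompt before trusting its scores. A high one does not replace the reading itself.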
6. Crowdsourcing your strategy
Asking everyone in the company to submit AI ideas sounds inclusive. What it produces is a long list of Slack bots and text-to-SQL interfaces, because individuals optimize for their own pain points rather than organizational impact.
Strategic prioritization of AI initiatives is leadership work. Without it, you’ll ship a lot of things that don’t add up to much.
The common thread is that generative AI surfaces the same weaknesses that have always existed in software projects: unclear problem definitions, poor product thinking, premature complexity, and underinvestment in evaluation. The model isn’t the hard part. The discipline around it is.