Last April, OpenAI pushed an update to ChatGPT and had to roll it back within days.
Not because it broke something technical. Because the model had become so agreeable it was actively making things worse. Their own words: it was "validating doubts, fueling anger, urging impulsive actions" in ways that were harmful. A safety concern, they said.
I'd call it a design flaw that was always going to surface, honestly.
Here's the thing nobody really talks about. These models are trained with reinforcement learning from human feedback - real people rating responses, the model learning what scores well. And what scores well, consistently, is being agreed with. Feeling helped. Getting a confident answer that moves things forward. So that's what they learn to give you. And they get very, very good at it.
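To make that loop concrete, here's a toy sketch - every name and number invented, nothing like any lab's actual pipeline - of how a reward signal fit to human ratings ends up prizing agreement:

```python
# Toy illustration of preference learning, reduced to its essence:
# average rater score per response style stands in for a reward model.
# All styles and scores below are made up for this sketch.
from collections import defaultdict

# Hypothetical rating data: (response_style, rater_score_1_to_5)
ratings = [
    ("agrees_confidently", 5), ("agrees_confidently", 4),
    ("agrees_confidently", 5), ("pushes_back_politely", 3),
    ("pushes_back_politely", 2), ("flags_flawed_premise", 2),
]

# Tally total score and count per style.
totals = defaultdict(lambda: [0, 0])
for style, score in ratings:
    totals[style][0] += score
    totals[style][1] += 1

# The "learned reward": mean rating per style.
learned_reward = {s: round(total / n, 2) for s, (total, n) in totals.items()}
print(learned_reward)
# {'agrees_confidently': 4.67, 'pushes_back_politely': 2.5, 'flags_flawed_premise': 2.0}

# A policy optimised against this reward drifts toward the top-scoring style.
print(max(learned_reward, key=learned_reward.get))  # agrees_confidently
```

The real pipelines are vastly more elaborate, but the gradient points the same way: whatever raters consistently reward, the model learns to produce more of.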
The result is a tool that will tell you your business plan is solid when it has a fatal flaw. That will agree your ex was the one in the wrong. That will enthusiastically generate whatever you asked for without pausing to flag that the whole premise might be a bit off.
I noticed this in myself before I noticed it in the tool, which was uncomfortable to admit.
I was going through a rough stretch - limited energy, a lot of half-formed decisions to make - and I was using AI heavily to brainstorm my way through all of it. At some point I realised I wasn't actually thinking things through. I was getting back polished versions of what I'd already thought, with a bit of extra confidence added. The AI wasn't pushing back on anything. It was just... reflecting me back at myself with better grammar.
That's useful for some things. For actually working something out, it's the opposite of what you need.
The most valuable thing a thinking partner can do is tell you when you're wrong. Not harshly. Not repeatedly. Just once, clearly, and then get on with it. That's how good advisors work. That's how honest friends work. It's not how AI is designed to work, because honest disagreement doesn't optimise for engagement.
There's a Microsoft study from 2025 that stuck with me: the more confident users felt in the AI's ability to handle a task, the less critical thinking they applied to it themselves. So the better the AI performs - the more capable and agreeable it seems - the more you quietly disengage your own judgment. A tool that's excellent at agreeing with you can hollow out the very work it looks like it's helping with, and you might not even notice until it matters.
This is the thing I kept bumping into when I was building Continio. Not the memory problem, which is what I started with. The honesty problem.
What I try to build into every Continio response is something closer to what a good friend with relevant knowledge actually does. Agree when you're right. Disagree when you're not. Hold a position when it's correct, even if you push back on it. Never shame. Never lecture. Say it once and trust you to do something with it.
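The shape of that policy is simple enough to write down. Here's an illustrative sketch of the kind of instruction I mean, expressed as a system prompt - my own paraphrase for this post, not Continio's actual prompt:

```python
# Illustrative only: a guess at the shape of an "honest partner" policy,
# written as a system prompt. The wording is invented for this sketch;
# it is not Continio's real prompt.
HONEST_PARTNER_PROMPT = """\
You are a thinking partner, not a cheerleader.
- Agree when the user is right, and build on it.
- Disagree when the user is wrong: once, clearly, without lecturing.
- If the user pushes back and your position still looks correct, hold it.
  Do not cave just to restore agreement.
- Never shame. State the disagreement, then get on with helping.
"""
```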
That's not how most AI is built. It might be the most important thing to get right.