Models can be wrong while sounding certain. That combination is worse than obvious errors because reviewers relax.
Why it happens
- Training rewards helpful, complete-sounding answers.
- Large context can bury contradictions.
- No built-in “I don’t know” unless the workflow demands it.
Practical mitigations
| Control | Effect |
|---|---|
| Require citations to approved sources | Traceable claims |
| Confidence thresholds + human route | Blocks auto-send |
| Eval cases for known traps | Catches regressions |
| Separate draft from send | Human accountability |
Workflow pattern
Generate → attach sources used → checker flags unsupported sentences → human edits or rejects → log final.
For customer-facing work, never skip the send gate because the draft “sounds right.”