Not all examples are equal. Identify representative failures, de‑duplicate near‑copies, and separate brittle traps from systemic gaps. Weight by user impact, recency, and confidence. Keep holdout seeds sacred. When results improve on curated slices and broad sets together, you know learning generalized rather than merely memorizing patches.
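A minimal sketch of this curation step, assuming each failure record carries hypothetical `text`, `impact`, `confidence`, and `timestamp` fields: near‑copies are collapsed by hashing normalized text, and survivors are weighted by impact, an exponential recency decay, and labeler confidence.

```python
import hashlib
import time

def normalize(text: str) -> str:
    """Collapse case and whitespace so near-copies hash identically."""
    return " ".join(text.lower().split())

def curate(failures, now=None, half_life_days=30.0):
    """De-duplicate failure examples, then weight each survivor by
    user impact, recency, and confidence (all fields hypothetical)."""
    now = now if now is not None else time.time()
    seen, curated = set(), []
    for f in failures:
        key = hashlib.sha256(normalize(f["text"]).encode()).hexdigest()
        if key in seen:  # near-copy of an example we already kept
            continue
        seen.add(key)
        age_days = (now - f["timestamp"]) / 86400.0
        recency = 0.5 ** (age_days / half_life_days)  # exponential decay
        curated.append(dict(f, weight=f["impact"] * recency * f["confidence"]))
    # Highest-weight examples first; holdout seeds are excluded upstream.
    return sorted(curated, key=lambda f: f["weight"], reverse=True)
```

The half‑life is a tunable assumption: shorter values bias the training mix toward fresh regressions, longer ones toward persistent gaps.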
Choose the right lever for the observed deficiency. Supervised fine‑tuning shapes instruction adherence, while preference methods like DPO or RLHF align trade‑offs between helpfulness and harmlessness. Retrieval upgrades improve grounding. Document costs, stability, and side effects. Favor simple interventions first, and prove value before graduating to more complex ones that expand operational burden.
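To make the preference lever concrete, here is the per‑pair DPO loss in plain Python. The inputs are sequence log‑probabilities of the chosen and rejected responses under the policy and a frozen reference model; names and the default β are illustrative, not tied to any particular library.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))).
    The policy is pushed toward the chosen response and away from the
    rejected one, measured relative to the reference model."""
    margin = ((logp_chosen - ref_logp_chosen)
              - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference, the margin is zero and the loss sits at log 2; widening the chosen-over-rejected margin drives it down, which is the alignment pressure the paragraph describes.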
Use safety classifiers, regex filters, and formatting templates to reduce harm and inconsistency, but keep them transparent. Surface masked failures in dashboards. If guardrails carry the whole load, the core model stagnates. Reserve them for safety and polish, while deeper fixes reshape knowledge, reasoning, and retrieval foundations.
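One way to keep such filters transparent is to count every intervention alongside the filtering itself, so dashboards can surface what the guardrail masked. A sketch under assumed names, with an illustrative pattern rather than a production blocklist:

```python
import re
from collections import Counter

class TransparentGuardrail:
    """Apply regex-based output filters while tallying every
    intervention, so masked failures stay visible on dashboards
    instead of silently papering over model weaknesses."""

    def __init__(self, patterns):
        # patterns: {rule_name: regex_string} (hypothetical schema)
        self.patterns = {name: re.compile(p) for name, p in patterns.items()}
        self.interventions = Counter()  # dashboard-ready tallies

    def filter(self, model_output: str) -> str:
        for name, pattern in self.patterns.items():
            cleaned, n = pattern.subn("[redacted]", model_output)
            if n:
                self.interventions[name] += n  # surface, don't hide
                model_output = cleaned
        return model_output
```

A rising count for any rule is the signal the paragraph calls for: the guardrail is carrying load that a deeper fix to knowledge, reasoning, or retrieval should eventually absorb.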