Ground-truth sampling, confidence thresholds, and continuous QA pipelines.