Large language models require a new form of oversight: capability-based monitoring