References
[1] Qiskit: An open-source framework for quantum computing, 2019.
Neural Information Processing Systems
If the threshold ξ was never reached during an entire episode of placing L gates, a reward of −5 is issued. The extreme reward values ±5 are crucial for the performance of the agent. Given this figure of merit, a circuit with a smaller number of gates yields a higher discounted sum of rewards. This could be achieved, e.g., by using automated postprocessing methods to optimize the circuits (e.g., a Qiskit Terra transpiler [1]). For instance, in all cases we analyzed, the vast majority of rotation gates used by the agent are RY gates.
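The link between circuit length and the discounted sum of rewards can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes a terminal reward of +5 when the threshold ξ is reached and −5 when an episode of L gates ends without reaching it (the signs are an assumption here), sets all intermediate rewards to 0 for simplicity, and uses a hypothetical discount factor `gamma = 0.9`. The names `terminal_reward` and `discounted_return` are illustrative only.

```python
def terminal_reward(threshold_reached, step, max_gates, extreme=5.0):
    """Sketch of the extreme rewards (assumed signs: +5 success, -5 failure).

    threshold_reached: True if the threshold xi was reached at this step.
    step:              number of gates placed so far (1-based).
    max_gates:         episode length L.
    """
    if threshold_reached:
        return extreme       # success: the episode ends with the bonus
    if step >= max_gates:
        return -extreme      # L gates placed, threshold never reached
    return 0.0               # assumption: no intermediate reward in this sketch


def discounted_return(rewards, gamma=0.9):
    """Discounted sum of rewards: sum over t of gamma**t * r_t."""
    return sum(gamma**t * r for t, r in enumerate(rewards))


# Two hypothetical successful episodes: 3 gates versus 10 gates.
short_episode = [terminal_reward(t == 3, t, 40) for t in range(1, 4)]
long_episode = [terminal_reward(t == 10, t, 40) for t in range(1, 11)]

print(discounted_return(short_episode) > discounted_return(long_episode))
```

Under these assumptions the success bonus is discounted less when it arrives after fewer gates, so the shorter circuit obtains the higher return.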
Feb-10-2026, 02:59:28 GMT