Goto

Collaborating Authors

 nh 1


Purest Quantum State Identification

Yu, Yingqi, Chen, Honglin, Wu, Jun, Xie, Wei, Li, Xiangyang

arXiv.org Artificial Intelligence

Precise identification of quantum states under noise constraints is essential for quantum information processing. In this study, we generalize the classical best arm identification problem to quantum domains, designing methods for identifying the purest one within $K$ unknown $n$-qubit quantum states using $N$ samples. %, with direct applications in quantum computation and quantum communication. We propose two distinct algorithms: (1) an algorithm employing incoherent measurements, achieving error $\exp\left(- \Omega\left(\frac{N H_1}{\log(K) 2^n }\right) \right)$, and (2) an algorithm utilizing coherent measurements, achieving error $\exp\left(- \Omega\left(\frac{N H_2}{\log(K) }\right) \right)$, highlighting the power of quantum memory. Furthermore, we establish a lower bound by proving that all strategies with fixed two-outcome incoherent POVM must suffer error probability exceeding $ \exp\left( - O\left(\frac{NH_1}{2^n}\right)\right)$. This framework provides concrete design principles for overcoming sampling bottlenecks in quantum technologies.


Multi-Step Greedy and Approximate Real Time Dynamic Programming

Efroni, Yonathan, Ghavamzadeh, Mohammad, Mannor, Shie

arXiv.org Machine Learning

Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming (DP) based algorithm that combines planning and learning to find an optimal policy for an MDP. It is a planning algorithm because it uses the MDP's model (reward and transition functions) to calculate a 1-step greedy policy w.r.t.~an optimistic value function, by which it acts. It is a learning algorithm because it updates its value function only at the states it visits while interacting with the environment. As a result, unlike DP, RTDP does not require uniform access to the state space in each iteration, which makes it particularly appealing when the state space is large and simultaneously updating all the states is not computationally feasible. In this paper, we study a generalized multi-step greedy version of RTDP, which we call $h$-RTDP, in its exact form, as well as in three approximate settings: approximate model, approximate value updates, and approximate state abstraction. We analyze the sample, computation, and space complexities of $h$-RTDP and establish that increasing $h$ improves sample and space complexity, with the cost of additional offline computational operations. For the approximate cases, we prove that the asymptotic performance of $h$-RTDP is the same as that of a corresponding approximate DP -- the best one can hope for without further assumptions on the approximation errors. $h$-RTDP is the first algorithm with a provably improved sample complexity when increasing the lookahead horizon.