Computerized adaptive testing (CAT), a tool that can efficiently measure students' ability, has been widely used in various standardized tests (e.g., GMAT and
A finite episodic Markov decision process (MDP) is a tuple (S, A, H, α, P, r), where S and A are the finite sets of states and actions with cardinalities S = |S| and A = |A|, H is the (fixed) episode length, and α is the initial state distribution.
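To make the tuple concrete, here is a minimal Python sketch of such an MDP and of sampling one episode from it. The class name, the field names (`init_dist`, `P`, `r`), and the toy instance are illustrative assumptions. The sketch also assumes that transitions and rewards do not depend on the time step and that rewards are deterministic, which the definition above does not specify.

```python
import random
from dataclasses import dataclass


@dataclass
class FiniteEpisodicMDP:
    """Hypothetical container for a finite episodic MDP (names are illustrative)."""
    num_states: int    # S = |S|, states are labeled 0..S-1
    num_actions: int   # A = |A|, actions are labeled 0..A-1
    horizon: int       # H, the fixed episode length
    init_dist: list    # initial state distribution, length S
    P: list            # P[s][a] is a distribution over next states (assumed time-homogeneous)
    r: list            # r[s][a] is the reward (assumed deterministic)

    def rollout(self, policy, rng):
        """Sample one episode of length H and return the sum of rewards.

        `policy(h, s)` maps a step index and a state to an action.
        """
        states = range(self.num_states)
        s = rng.choices(states, weights=self.init_dist)[0]
        total = 0.0
        for h in range(self.horizon):
            a = policy(h, s)
            total += self.r[s][a]
            s = rng.choices(states, weights=self.P[s][a])[0]
        return total


# Toy instance: action 1 always leads to state 1, and state 1 pays reward 1.
mdp = FiniteEpisodicMDP(
    num_states=2,
    num_actions=2,
    horizon=3,
    init_dist=[1.0, 0.0],
    P=[[[1.0, 0.0], [0.0, 1.0]],
       [[1.0, 0.0], [0.0, 1.0]]],
    r=[[0.0, 0.0], [1.0, 1.0]],
)

ret = mdp.rollout(lambda h, s: 1, random.Random(0))
print(ret)
```

In this toy instance every transition is deterministic, so the episode visits states 0, 1, 1 and collects rewards 0, 1, 1 regardless of the random seed.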
We study the problem of learning a linear model to set the reserve price in an auction, given contextual information, in order to maximize expected revenue from the seller's side.
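As a rough illustration of the objective, the sketch below evaluates the average revenue of a linear reserve-price rule on a small set of auctions. It assumes a second-price auction with reserve, which the text does not state. The function names, the weight vector `w`, and the toy data are all hypothetical.

```python
def reserve_price(w, x):
    """Linear model: the reserve is the inner product of weights w and context x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def revenue(reserve, bids):
    """Seller revenue in a second-price auction with a reserve (assumed format).

    The item sells only if the highest bid meets the reserve; the winner then
    pays the larger of the second-highest bid and the reserve.
    """
    if not bids:
        return 0.0
    ranked = sorted(bids, reverse=True)
    highest = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0.0
    if highest < reserve:
        return 0.0  # no sale
    return max(second, reserve)


def average_revenue(w, auctions):
    """Empirical revenue of the linear rule over (context, bids) pairs."""
    total = sum(revenue(reserve_price(w, x), bids) for x, bids in auctions)
    return total / len(auctions)


# Toy data: two auctions, each with a 2-dimensional context and two bids.
auctions = [
    ([1.0, 2.0], [5.0, 3.0]),
    ([1.0, 0.0], [2.0, 1.0]),
]
w = [1.0, 1.5]
avg = average_revenue(w, auctions)
print(avg)
```

Under these assumptions the revenue is discontinuous in the reserve: raising it slightly above the highest bid drops the revenue to zero. That discontinuity is one reason the learning problem is harder than ordinary regression.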