


Further Details

Neural Information Processing Systems

A.1 Dataset Details

The 20 micro-variations of each of the 5 macro-variations of the scene were created by swapping at least two furniture pieces and perturbing the positions of a subset of the remaining furniture pieces. The occurrences of the various furniture objects in these 100 micro-variations are illustrated in Figure 1. Several furniture objects, such as 'Beanbag' and 'Chair', occur more frequently, with multiple instances in some scenes, while others, such as 'Table 03', occur less frequently. We also analyze the object categories of all objects in the original 6 'FRL-apartment' space recreations. We map each of the 92 objects to a semantic category and list the counts per semantic category in a histogram in Figure 1. Since these spaces have a large kitchen area, there is a larger ratio of kitchen objects such as 'Kitchen utensil' and 'Bowl'. Top-down views of the 5 'macro variations' of the scenes are shown in Figure 1. These variations are 5 semantically plausible configurations of furniture in the space, generated by a 3D artist. Each surface is annotated with a bounding box, enabling procedural placement of objects on the surfaces. For each of these 5 variations, we generate 20 additional variations, giving 105 scene layouts in total (5 macro variations plus 100 micro-variations). Objects are procedurally added on furniture and surfaces using the annotated supporting surface and containment volume information provided by ReplicaCAD.
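The micro-variation rule described above (swap at least two furniture pieces, then perturb a subset of the others) can be sketched roughly as follows. This is only an illustrative sketch, not the ReplicaCAD tooling: the layout representation (a dict from furniture name to a 2D position), the function names `make_micro_variation` and `make_all_variations`, and the parameters `jitter` and `perturb_fraction` are all assumptions made for the example.

```python
import random


def make_micro_variation(layout, rng, min_swaps=2, jitter=0.25, perturb_fraction=0.5):
    """Return a new layout derived from `layout` (hypothetical: name -> (x, y)).

    Assumed rule: cyclically exchange the positions of `min_swaps` pieces,
    then add small random offsets to a subset of the remaining pieces.
    """
    names = sorted(layout)
    new_layout = dict(layout)

    # Swap step: rotate the positions of the chosen pieces so each one moves
    # (assuming the chosen pieces start at distinct positions).
    swapped = rng.sample(names, min_swaps)
    positions = [layout[name] for name in swapped]
    rotated = positions[1:] + positions[:1]
    for name, pos in zip(swapped, rotated):
        new_layout[name] = pos

    # Perturbation step: jitter a fraction of the pieces that were not swapped.
    others = [name for name in names if name not in swapped]
    count = int(len(others) * perturb_fraction)
    for name in rng.sample(others, count):
        x, y = new_layout[name]
        new_layout[name] = (x + rng.uniform(-jitter, jitter),
                            y + rng.uniform(-jitter, jitter))
    return new_layout


def make_all_variations(macro_layouts, per_macro=20, seed=0):
    """Keep every macro layout and add `per_macro` micro-variations of each."""
    rng = random.Random(seed)
    scenes = []
    for macro in macro_layouts:
        scenes.append(macro)
        scenes.extend(make_micro_variation(macro, rng) for _ in range(per_macro))
    return scenes
```

With 5 macro layouts and 20 micro-variations each, `make_all_variations` returns 5 + 5 × 20 = 105 layouts, matching the count stated in the text.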







XDO: A Double Oracle Algorithm for Extensive-Form Games

Neural Information Processing Systems

Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games.


bbc92a647199b832ec90d7cf57074e9e-Supplemental.pdf

Neural Information Processing Systems

Before defining our algorithm, at each iteration $t$ we first lighten our notation with the shorthand $b_a(X) = b(\hat{p}^{(t-1)}(X), a)$ (at different iterations $t$, $b_a$ denotes different functions), and $b(X)$ is the vector $(b_1(X), \dots, b_K(X))$. For the intuition of the algorithm, consider the $t$-th iteration, where the current prediction function is $\hat{p}^{(t-1)}$. The statement of the theorem is identical; the proof is also essentially the same except for the use of some new technical tools. Conversely, if $\hat{p}$ is $L_B$ decision calibrated, then $\| \mathbb{E}[p^*(X) - \hat{p}(X) \mid U] \|_1 = 0$ almost surely (because if the expectation of a non-negative random variable is zero, the random variable must be zero almost surely), which implies that $\hat{p}$ is distribution calibrated. For $B^K_a$ we use the VC dimension approach.