A Experimental Details
Neural Information Processing Systems
We dynamically batch model calls onto the GPU to increase inference speed. For OBL, there are dependencies between policy and belief training. The entire inference and training infrastructure for a single policy or belief model runs on a machine with 30 CPU cores and 2 GPUs: one GPU for training and one for simulation. We use the public-lstm architecture design from prior work, with a 3-layer feedforward neural network encoding the entire private observation.
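The dynamic-batching scheme can be sketched as follows. This is a minimal illustration, not the actual implementation: the class name `DynamicBatcher`, the `model_fn` callback, and the `max_batch` parameter are all assumptions introduced for the example. Simulation workers submit single observations; a server loop drains whatever requests are pending, runs one batched model call, and signals each waiting worker when its result is ready.

```python
import threading
import queue

class DynamicBatcher:
    """Hypothetical sketch: batch concurrent single-item inference
    requests into one model call to improve GPU utilization."""

    def __init__(self, model_fn, max_batch=64):
        self.model_fn = model_fn      # batched inference function (assumed)
        self.max_batch = max_batch    # cap on requests per forward pass
        self.requests = queue.Queue() # pending requests from workers

    def infer(self, obs):
        # Called by a simulation worker; blocks until the batch
        # containing this observation has been processed.
        slot = {"obs": obs, "done": threading.Event(), "out": None}
        self.requests.put(slot)
        slot["done"].wait()
        return slot["out"]

    def serve_once(self):
        # Block for at least one request, then drain up to max_batch
        # pending requests and run a single batched model call.
        batch = [self.requests.get()]
        while len(batch) < self.max_batch and not self.requests.empty():
            batch.append(self.requests.get())
        outputs = self.model_fn([s["obs"] for s in batch])
        for slot, out in zip(batch, outputs):
            slot["out"] = out
            slot["done"].set()
        return len(batch)
```

In a real system `serve_once` would run in a dedicated loop on the simulation GPU, and `model_fn` would stack observations into a tensor and execute the network; here a plain Python callable stands in for the model.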