A Additional experimental details
Neural Information Processing Systems
For each function generated, we sample 228 data points, which we split into 100 context points and 128 target points, and train the model using the loss function in (2). Each input x is a 32-dimensional vector, and each dimension is sampled from a uniform distribution U[-3, 3].

We then randomly select 100 samples from the data points with function values lower than the 20th percentile as the few-shot data.

We normalize the score to [0, 1] using the worst and the best values in the large dataset.

A.2 ExPT pretraining details

Architectural details. In all experiments, we use the same ExPT architecture.
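The data-handling steps described in this appendix (sampling 228 points per function, the 100/128 context/target split, few-shot selection below the 20th percentile, and score normalization) can be illustrated with a minimal NumPy sketch. It is not the authors' code. The names `toy_function`, `sample_synthetic_task`, `select_few_shot`, and `normalize_score` are hypothetical, the stand-in function is arbitrary, the sampling interval is assumed to be [-3, 3], and "worst" and "best" are assumed to mean the minimum and maximum score.

```python
import numpy as np


def toy_function(x):
    # Hypothetical stand-in for a generated synthetic function; the source
    # text does not specify how the functions are generated.
    return np.sin(x).sum(axis=1)


def sample_synthetic_task(f, rng, n_context=100, n_target=128, dim=32,
                          low=-3.0, high=3.0):
    # Sample n_context + n_target inputs, each dimension uniform on
    # [low, high) (assumed here to be [-3, 3]), then split them into
    # context and target sets.
    n_total = n_context + n_target  # 228 with the defaults
    x = rng.uniform(low, high, size=(n_total, dim))
    y = f(x)
    context = (x[:n_context], y[:n_context])
    target = (x[n_context:], y[n_context:])
    return context, target


def select_few_shot(x, y, rng, n_few=100, percentile=20.0):
    # Randomly pick n_few points among those whose value is strictly below
    # the given percentile of y (fewer if not enough candidates exist).
    threshold = np.percentile(y, percentile)
    candidates = np.flatnonzero(y < threshold)
    size = min(n_few, candidates.size)
    chosen = rng.choice(candidates, size=size, replace=False)
    return x[chosen], y[chosen]


def normalize_score(y, y_worst, y_best):
    # Min-max normalization; assumes y_worst < y_best (higher is better).
    return (y - y_worst) / (y_best - y_worst)
```

A possible usage: call `sample_synthetic_task(toy_function, np.random.default_rng(0))` to obtain one pretraining task, and apply `select_few_shot` followed by `normalize_score(y, y.min(), y.max())` to a larger labeled dataset to mimic the few-shot setup.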
Mar-27-2025, 12:07:23 GMT