Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
