CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification
Neural Information Processing Systems
Recent Vision-Language-Action (VLA) models built on pre-trained Vision-Language Models (VLMs) require extensive post-training, resulting in high computational overhead that limits scalability and deployment.
Jun-22-2026, 18:03:18 GMT
- Genre:
- Research Report > Experimental Study (1.00)
- Workflow (0.94)
- Industry:
- Health & Medicine (0.46)
- Media (0.46)
- Leisure & Entertainment (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Robots (1.00)
- Machine Learning > Neural Networks (0.67)
- Natural Language > Large Language Model (0.48)
- Representation & Reasoning > Spatial Reasoning (0.46)