Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution

Reddy, N Dinesh, Snyder, Dylan, Kiragu, Lona, Mohin, Mirajul, Amin, Shahrear Bin, Pillai, Sudeep

Nov-21-2025–arXiv.org Artificial Intelligence

We introduce Orion, a visual agent that integrates vision-based reasoning with tool-augmented execution to achieve powerful, precise, multi-step visual intelligence across images, video, and documents. Unlike traditional vision-language models that generate descriptive outputs, Orion orchestrates a suite of specialized computer vision tools, including object detection, keypoint localization, panoptic segmentation, Optical Character Recognition (OCR), and geometric analysis, to execute complex multi-step visual workflows. The system achieves competitive performance across MMMU, MMBench, DocVQA, and MMLongBench while extending monolithic VLM capabilities to production-grade visual intelligence. Through its agentic, tool-augmented approach, Orion enables autonomous visual reasoning that bridges neural perception with symbolic execution, marking the transition from passive visual understanding to active, tool-driven visual intelligence. Try Orion for free at: https://chat.vlm.run Learn more at: https://www.vlm.run/orion

large language model, machine learning, orion, (19 more...)

arXiv.org Artificial Intelligence

Nov-21-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.41)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.74)
  - Machine Learning > Neural Networks
    - Deep Learning (0.74)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found