Goto

Collaborating Authors

 Harpale, Abhay


Lumos : Empowering Multimodal LLMs with Scene Text Recognition

arXiv.org Artificial Intelligence

We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first person point-of-view images, the output of which is used to augment input to a Multimodal Large Language Model (MM-LLM). While building Lumos, we encountered numerous challenges related to STR quality, overall latency, and model inference. In this paper, we delve into those challenges, and discuss the system architecture, design choices, and modeling techniques employed to overcome these obstacles. We also provide a comprehensive evaluation for each component, showcasing high quality and efficiency.


A textual transform of multivariate time-series for prognostics

arXiv.org Machine Learning

Prognostics or early detection of incipient faults is an important industrial challenge for condition-based and preventive maintenance. Physics-based approaches to modeling fault progression are infeasible due to multiple interacting components, uncontrolled environmental factors and observability constraints. Moreover, such approaches to prognostics do not generalize to new domains. Consequently, domain-agnostic data-driven machine learning approaches to prognostics are desirable. Damage progression is a path-dependent process and explicitly modeling the temporal patterns is critical for accurate estimation of both the current damage state and its progression leading to total failure. In this paper, we present a novel data-driven approach to prognostics that employs a novel textual representation of multivariate temporal sensor observations for predicting the future health state of the monitored equipment early in its life. This representation enables us to utilize well-understood concepts from text-mining for modeling, prediction and understanding distress patterns in a domain agnostic way. The approach has been deployed and successfully tested on large scale multivariate time-series data from commercial aircraft engines. We report experiments on well-known publicly available benchmark datasets and simulation datasets. The proposed approach is shown to be superior in terms of prediction accuracy, lead time to prediction and interpretability.