jarvis
Energy-Aware Data-Driven Model Selection in LLM-Orchestrated AI Systems
Smirnova, Daria, Nasiri, Hamid, Adamska, Marta, Yu, Zhengxin, Garraghan, Peter
As modern artificial intelligence (AI) systems become more advanced and capable, they can leverage a wide range of tools and models to perform complex tasks. Today, the task of orchestrating these models is often performed by Large Language Models (LLMs) that rely on qualitative descriptions of models for decision-making. However, the descriptions provided to these LLM-based orchestrators do not reflect true model capabilities and performance characteristics, leading to suboptimal model selection, reduced accuracy, and increased energy costs. In this paper, we conduct an empirical analysis of LLM-based orchestration limitations and propose GUIDE, a new energy-aware model selection framework that accounts for performance-energy trade-offs by incorporating quantitative model performance characteristics in decision-making. Experimental results demonstrate that GUIDE increases accuracy by 0.90%-11.92% across various evaluated tasks, and achieves up to 54% energy efficiency improvement, while reducing orchestrator model selection latency from 4.51 s to 7.2 ms.
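The abstract's idea of selecting a model from quantitative performance characteristics rather than qualitative descriptions can be illustrated with a minimal sketch. This is not GUIDE's actual algorithm, which the abstract does not specify; the `ModelProfile` type, the `select_model` function, the linear scoring rule, the `energy_weight` parameter, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical measured characteristics of one candidate model."""
    name: str
    accuracy: float        # measured task accuracy in [0, 1]
    energy_joules: float   # measured energy per inference


def select_model(profiles, energy_weight=0.5):
    """Return the profile with the best accuracy/energy trade-off.

    Assumed scoring rule (not taken from the paper):
        score = (1 - w) * accuracy - w * (energy / max energy)
    """
    if not profiles:
        raise ValueError("no candidate models")
    max_energy = max(p.energy_joules for p in profiles)

    def score(p):
        normalized = p.energy_joules / max_energy if max_energy > 0 else 0.0
        return (1.0 - energy_weight) * p.accuracy - energy_weight * normalized

    return max(profiles, key=score)


# Illustrative, made-up measurements.
candidates = [
    ModelProfile("large", accuracy=0.92, energy_joules=40.0),
    ModelProfile("medium", accuracy=0.89, energy_joules=12.0),
    ModelProfile("small", accuracy=0.70, energy_joules=3.0),
]

for w in (0.0, 0.2, 0.5):
    print(w, select_model(candidates, energy_weight=w).name)
```

Because the selection here is a lookup over stored measurements rather than an LLM call, it runs in negligible time, which is at least consistent with the latency reduction the abstract reports.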
Jarvis: Towards Personalized AI Assistant via Personal KV-Cache Retrieval
Xu, Binxiao, Feng, Junyu, Lu, Shaolin, Luo, Yulin, Yan, Shilin, Liang, Hao, Lu, Ming, Zhang, Wentao
The rapid development of vision-language models (VLMs) enables open-ended perception and reasoning. Recent works have started to investigate how to adapt general-purpose VLMs into personalized assistants. Even commercial models such as ChatGPT now support model personalization by incorporating user-specific information. However, existing methods either learn a set of concept tokens or train a VLM to utilize user-specific information, and both pipelines struggle to generate accurate answers as personalized assistants. We introduce Jarvis, an innovative framework for a personalized AI assistant through personal KV-Cache retrieval, which stores user-specific information in the KV-Caches of both textual and visual tokens. The textual tokens are created by summarizing user information into metadata, while the visual tokens are produced by extracting distinct image patches from the user's images. When answering a question, Jarvis first retrieves related KV-Caches from personal storage and uses them to ensure accuracy in responses. We also introduce a fine-grained benchmark built with the same distinct image patch mining pipeline, emphasizing accurate question answering based on fine-grained user-specific information. Jarvis is capable of providing more accurate responses, particularly when they depend on specific local details. Jarvis achieves state-of-the-art results in both visual question answering and text-only tasks across multiple datasets, indicating a practical path toward personalized AI assistants. The code and dataset will be released.
LLM-based Question-Answer Framework for Sensor-driven HVAC System Interaction
Lee, Sungmin, Kang, Minju, Lee, Joonhee, Lee, Seungyong, Kim, Dongju, Hong, Jingi, Shin, Jun, Zhang, Pei, Ko, JeongGil
Question-answering (QA) interfaces powered by large language models (LLMs) present a promising direction for improving interactivity with HVAC system insights, particularly for non-expert users. However, enabling accurate, real-time, and context-aware interactions with HVAC systems introduces unique challenges, including the integration of frequently updated sensor data, domain-specific knowledge grounding, and coherent multi-stage reasoning. In this paper, we present JARVIS, a two-stage LLM-based QA framework tailored for sensor data-driven HVAC system interaction. JARVIS employs an Expert-LLM to translate high-level user queries into structured execution instructions, and an Agent that performs SQL-based data retrieval, statistical processing, and final response generation. To address HVAC-specific challenges, JARVIS integrates (1) an adaptive context injection strategy for efficient HVAC and deployment-specific information integration, (2) a parameterized SQL builder and executor to improve data access reliability, and (3) a bottom-up planning scheme to ensure consistency across multi-stage response generation. We evaluate JARVIS using real-world data collected from a commercial HVAC system and a ground truth QA dataset curated by HVAC experts to demonstrate its effectiveness in delivering accurate and interpretable responses across diverse queries. Results show that JARVIS consistently outperforms baseline and ablation variants in both automated and user-centered assessments, achieving high response quality and accuracy.
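A parameterized SQL builder of the kind the abstract mentions could look roughly like the sketch below. It is not JARVIS's implementation: the `sensor_readings` table, its columns, the `build_query` and `run_query` helpers, and the sample rows are all assumptions made for illustration. The point is only that values are passed as bound parameters and the aggregate name is taken from a fixed whitelist, so the model-generated instruction never becomes raw SQL text.

```python
import sqlite3

# Whitelisted aggregates; anything else is rejected before SQL is built.
ALLOWED_AGGREGATES = {"avg": "AVG", "min": "MIN", "max": "MAX"}


def build_query(aggregate, sensor_id, start_ts, end_ts):
    """Return (sql, params) for an aggregate over a time window."""
    if aggregate not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    sql = (
        f"SELECT {ALLOWED_AGGREGATES[aggregate]}(value) "
        "FROM sensor_readings "
        "WHERE sensor_id = ? AND ts >= ? AND ts < ?"
    )
    return sql, (sensor_id, start_ts, end_ts)


def run_query(conn, aggregate, sensor_id, start_ts, end_ts):
    """Build and execute the query, returning the single aggregate value."""
    sql, params = build_query(aggregate, sensor_id, start_ts, end_ts)
    return conn.execute(sql, params).fetchone()[0]


# Hypothetical in-memory data standing in for real HVAC sensor logs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_readings (sensor_id TEXT, ts INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?, ?)",
    [("room1_temp", 0, 21.0), ("room1_temp", 60, 23.0), ("room2_temp", 0, 19.0)],
)

print(run_query(conn, "avg", "room1_temp", 0, 120))
```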
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Jarvis, Google's web-browsing AI, is now officially known as Project Mariner
Earlier today, Google debuted Gemini 2.0. The company says its new machine learning model won't just enhance its existing products and services. It will also power entirely new experiences. To that point, Google previewed Project Mariner, an AI agent that can navigate within a web browser. Mariner is an experimental Chrome extension that is currently available to select "trusted testers."
Google is reportedly developing 'Jarvis' AI that could take over your web browser
Google may be close to unveiling an AI agent that can operate a web browser to help users automate everyday tasks. The Information reports that the company is working on a "computer-using agent" under the codename Project Jarvis, and it may be ready to be previewed as soon as December. According to sources that spoke to The Information, Jarvis "responds to a person's commands by capturing frequent screenshots of what's on their computer screen, and interpreting the shots before taking actions like clicking on a button or typing into a text field." Jarvis is reportedly made to work only with web browsers -- particularly Chrome -- to assist with common tasks like research, shopping and booking flights. It comes as Google continues to expand the capabilities of its Gemini AI, the next-gen model of which is expected to be revealed in December, as reported by The Verge.
- Information Technology > Communications > Web (0.89)
- Information Technology > Artificial Intelligence (0.82)
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling
Lee, Seok Hwan, Son, Taein, Seo, Soo Won, Kim, Jisong, Choi, Jun Won
Video action detection (VAD) is a formidable vision task that involves the localization and classification of actions within the spatial and temporal dimensions of a video clip. Among the myriad VAD architectures, two-stage VAD methods utilize a pre-trained person detector to extract the region of interest features, subsequently employing these features for action detection. However, the performance of two-stage VAD methods has been limited as they depend solely on localized actor features to infer action semantics. In this study, we propose a new two-stage VAD framework called Joint Actor-scene context Relation modeling based on Visual Semantics (JARViS), which effectively consolidates cross-modal action semantics distributed globally across spatial and temporal dimensions using Transformer attention. JARViS employs a person detector to produce densely sampled actor features from a keyframe. Concurrently, it uses a video backbone to create spatio-temporal scene features from a video clip. Finally, the fine-grained interactions between actors and scenes are modeled through a Unified Action-Scene Context Transformer to directly output the final set of actions in parallel. Our experimental results demonstrate that JARViS outperforms existing methods by significant margins and achieves state-of-the-art performance on three popular VAD datasets, including AVA, UCF101-24, and JHMDB51-21.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > South Korea > Seoul > Seoul (0.04)
Imagining a Future of Designing with AI: Dynamic Grounding, Constructive Negotiation, and Sustainable Motivation
Vaithilingam, Priyan, Arawjo, Ian, Glassman, Elena L.
We ideate a future design workflow that involves AI technology. Drawing from activity and communication theory, we attempt to isolate the new value large AI models can provide design compared to past technologies. We arrive at three affordances -- dynamic grounding, constructive negotiation, and sustainable motivation -- that summarize latent qualities of natural language-enabled foundation models that, if explicitly designed for, can support the process of design. Through design fiction, we then imagine a future interface as a diegetic prototype, the story of Squirrel Game, that demonstrates each of our three affordances in a realistic usage scenario. Our design process, terminology, and diagrams aim to contribute to future discussions about the relative affordances of AI technology with regard to collaborating with human designers.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Overview (0.46)
- Research Report (0.40)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Information Technology (0.93)
Congress Wants Tech Companies to Pay Up for AI Training Data
Do AI companies need to pay for the training data that powers their generative AI systems? The question is hotly contested in Silicon Valley and in a wave of lawsuits levied against tech behemoths like Meta, Google, and OpenAI. In Washington, DC, though, there seems to be a growing consensus that the tech giants need to cough up. Today, at a Senate hearing on AI's impact on journalism, lawmakers from both sides of the aisle agreed that OpenAI and others should pay media outlets for using their work in AI projects. "It's not only morally right," said Richard Blumenthal, the Democrat who chairs the Judiciary Subcommittee on Privacy, Technology, and the Law that held the hearing.
- North America > United States > District of Columbia > Washington (0.26)
- North America > United States > California (0.26)
- Law (1.00)
- Information Technology (1.00)
- Media > News (0.77)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.81)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.50)
Measuring Lexical Diversity in Texts: The Twofold Length Problem
The impact of text length on the estimation of lexical diversity has captured the attention of the scientific community for more than a century. Numerous indices have been proposed, and many studies have been conducted to evaluate them, but the problem remains. This methodological review provides a critical analysis not only of the most commonly used indices in language learning studies, but also of the length problem itself, as well as of the methodology for evaluating the proposed solutions. The analysis of three datasets of English language-learners' texts revealed that indices that reduce all texts to the same length using a probabilistic or an algorithmic approach solve the length dependency problem; however, all these indices failed to address the second problem, which is their sensitivity to the parameter that determines the length to which the texts are reduced. The paper concludes with recommendations for optimizing lexical diversity analysis.
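The two problems the abstract describes can be made concrete with a small sketch. The plain type-token ratio (TTR) drops as a text gets longer, while an index that evaluates fixed-length windows removes that dependency but introduces a window parameter. The moving-average TTR used here is just one example of an algorithmic fixed-length index and is not necessarily among those the paper evaluates; the function names and the toy text are assumptions.

```python
def ttr(tokens):
    """Type-token ratio: distinct tokens divided by total tokens."""
    if not tokens:
        raise ValueError("empty token list")
    return len(set(tokens)) / len(tokens)


def moving_average_ttr(tokens, window):
    """Mean TTR over every contiguous window of `window` tokens."""
    if window <= 0 or window > len(tokens):
        raise ValueError("window must be between 1 and len(tokens)")
    scores = [
        ttr(tokens[i:i + window])
        for i in range(len(tokens) - window + 1)
    ]
    return sum(scores) / len(scores)


# Toy data: the long text repeats the short one, so its vocabulary is identical.
short_text = "the cat sat on the mat".split()
long_text = short_text * 10

# First problem: plain TTR falls with length despite identical vocabulary.
print(ttr(short_text), ttr(long_text))

# Fixed-length windows remove the dependence on total text length...
print(moving_average_ttr(short_text, 6), moving_average_ttr(long_text, 6))

# ...but the result now depends on the chosen window size (second problem).
print(moving_average_ttr(long_text, 3), moving_average_ttr(long_text, 6))
```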
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
GitHub - microsoft/JARVIS: JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
This project is under construction and we will have all the code ready soon. Language serves as an interface for LLMs to connect numerous AI models for solving complicated AI tasks! We introduce a collaborative system that consists of an LLM as the controller and numerous expert models as collaborative executors (from HuggingFace Hub). However, it means that Jarvis is restricted to models running stably on HuggingFace Inference Endpoints. You can now access Jarvis' services through the Web API.