Goto

Collaborating Authors: Adamopoulos, George


Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark

arXiv.org Artificial Intelligence

The proliferation of Vision-Language Models (VLMs) in the past several years calls for rigorous and comprehensive evaluation methods and benchmarks. This work analyzes existing VLM evaluation techniques, including automated metrics, AI-based assessments, and human evaluations across diverse tasks. We first introduce Robin - a novel suite of VLMs that we built by combining Large Language Models (LLMs) and Vision Encoders (VEs) at multiple scales - and use Robin to identify shortcomings of current evaluation approaches across scales. Next, to overcome the identified limitations, we introduce CHIRP - a new long-form response benchmark we developed for more robust and complete VLM evaluation. We provide open access to the Robin training code, model suite, and CHIRP benchmark to promote reproducibility and advance VLM research.

Recently, significant advances have been made in Vision-Language Models (VLMs), driven by breakthroughs in computer vision and natural language processing Chen et al. (2022); Li et al. (2023b); Liu et al. (2023b); Sun et al. (2023). However, existing VLM benchmarks, often designed for specific tasks (e.g., VQAv2 Goyal et al. (2017)), struggle to accurately reflect real-world VLM performance and capture nuanced differences between models Hsieh et al. (2024). This is particularly evident when evaluating models with significant architectural variations, where standard benchmark scores remain similar despite noticeable differences in human-perceived model quality. To address this issue, we introduce CHIRP, a hybrid VLM benchmark that combines the scalability of automated metrics with the nuanced judgment of human evaluators. We argue that this approach is crucial for capturing the complexities of VLM behavior, which traditional benchmarks often fail to represent. To demonstrate the limitations of existing benchmarks and the efficacy of our proposed method, we introduce Robin, a suite of VLMs trained at various scales, inspired by the Pythia language model suite Biderman et al. (2023). By systematically varying the Vision Encoder (VE) and Large Language Model (LLM) sizes, we show that while benchmark scores remain largely unaffected, human evaluations reveal significant differences in the quality of the models' outputs. Our findings underscore the need for more robust and human-centric VLM evaluation methodologies. CHIRP paves the way for developing more reliable and informative VLM benchmarks, ultimately leading to the creation of more effective and impactful VLMs.

Our Contributions: We investigate the drawbacks of relying on automatic metrics and show the benefits of AI-based and human-based evaluations of VLMs. We train and release Robin, an open-source collection of VLMs. Robin is a scaling suite based on LLMs and VEs of different sizes.
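For readers unfamiliar with this architecture family, the following is a minimal, hedged sketch of the standard way a Vision Encoder is coupled to an LLM (project patch features into the LLM's token-embedding space and prepend them to the text tokens). It is not the released Robin code, and all module and parameter names here are hypothetical.

import torch
import torch.nn as nn

class SimpleVLM(nn.Module):
    """Illustrative VE + LLM composition (hypothetical names, not Robin's code)."""
    def __init__(self, vision_encoder, llm, vision_dim, llm_dim):
        super().__init__()
        self.vision_encoder = vision_encoder             # e.g. a ViT returning patch features
        self.projector = nn.Linear(vision_dim, llm_dim)  # maps patch features into the LLM embedding space
        self.llm = llm                                   # assumed to accept precomputed input embeddings

    def forward(self, images, text_embeddings):
        patch_feats = self.vision_encoder(images)        # (B, num_patches, vision_dim)
        image_tokens = self.projector(patch_feats)       # (B, num_patches, llm_dim)
        inputs = torch.cat([image_tokens, text_embeddings], dim=1)
        return self.llm(inputs_embeds=inputs)            # HuggingFace-style call, an assumption here

Scaling suites such as Robin vary the sizes of vision_encoder and llm in this kind of pairing while keeping the overall composition fixed.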


Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent

arXiv.org Artificial Intelligence

Understanding the mechanisms behind decisions taken by large foundation models in sequential decision-making tasks is critical to ensuring that such systems operate transparently and safely. In this work, we perform exploratory analysis on the Video PreTraining (VPT) Minecraft-playing agent, one of the largest open-source vision-based agents. We aim to illuminate its reasoning mechanisms by applying various interpretability techniques. First, we analyze the attention mechanism while the agent solves its training task - crafting a diamond pickaxe. The agent pays attention to the last four frames and to several key frames further back in its six-second memory. This is a possible mechanism for maintaining coherence in a task that takes 3-10 minutes, despite the short memory span. Second, we perform various interventions, which help us uncover a worrying case of goal misgeneralization: VPT mistakenly identifies a villager wearing brown clothes as a tree trunk when the villager stands stationary under green tree leaves, and punches it to death.
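As a hedged illustration of the attention analysis described above (not the authors' tooling), the snippet below summarizes how much attention the most recent time step pays to each past frame in the agent's memory, given per-layer attention maps; the function name and tensor layout are assumptions.

import torch

def frame_attention_profile(attn_maps):
    """attn_maps: list of per-layer tensors shaped (num_heads, query_len, key_len).
    Returns the mean attention mass from the latest query position to each past frame."""
    per_layer = []
    for attn in attn_maps:
        last_query = attn[:, -1, :]               # attention paid by the most recent frame
        per_layer.append(last_query.mean(dim=0))  # average over heads
    return torch.stack(per_layer).mean(dim=0)     # average over layers -> (key_len,)

A profile that peaks over the last few positions plus a handful of isolated earlier positions would match the "last four frames plus key frames" pattern reported for VPT's six-second memory.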


Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting

arXiv.org Artificial Intelligence

Over the past few years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-shot and few-shot generalization. However, despite the success of foundation models in modalities such as natural language processing and computer vision, the development of foundation models for time series forecasting has lagged behind. We present Lag-Llama, a general-purpose foundation model for univariate probabilistic time series forecasting based on a decoder-only transformer architecture that uses lags as covariates. Lag-Llama is pretrained on a large corpus of diverse time series data from several domains and demonstrates strong zero-shot generalization capabilities compared to a wide range of forecasting models on downstream datasets across domains. Moreover, when fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance, outperforming prior deep learning approaches and emerging as the best general-purpose model on average. Lag-Llama serves as a strong contender to the current state of the art in time series forecasting and paves the way for future advancements in foundation models tailored to time series data.
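To make the "lags as covariates" idea concrete, here is a small sketch of how lagged values of a univariate series can be assembled into per-step covariates for a decoder-only forecaster; the specific lag set below is illustrative, not Lag-Llama's actual configuration.

import numpy as np

def make_lag_features(series, lag_indices=(1, 2, 3, 7, 14, 28)):
    """series: 1-D array holding a univariate time series.
    Returns (targets, features) where features[t] holds the lagged values used to predict targets[t]."""
    max_lag = max(lag_indices)
    targets = series[max_lag:]
    features = np.stack(
        [series[max_lag - lag : len(series) - lag] for lag in lag_indices],
        axis=-1,
    )
    return targets, features

series = np.arange(100, dtype=float)
targets, feats = make_lag_features(series)
print(targets.shape, feats.shape)  # (72,) (72, 6)

Each row of feats can then be fed as covariates alongside the input token for its target step.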