AITopics | Reilly, Dominick

Collaborating Authors

Reilly, Dominick

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living

Chakraborty, Rajatsubhra, Sinha, Arkaprava, Reilly, Dominick, Govind, Manish Kumar, Wang, Pu, Bremond, Francois, Das, Srijan

arXiv.org Artificial IntelligenceJun-13-2024

Large Language Vision Models (LLVMs) have demonstrated effectiveness in processing internet videos, yet they struggle with the visually perplexing dynamics present in Activities of Daily Living (ADL) due to limited pertinent datasets and models tailored to relevant cues. To this end, we propose a framework for curating ADL multiview datasets to fine-tune LLVMs, resulting in the creation of ADL-X, comprising 100K RGB video-instruction pairs, language descriptions, 3D skeletons, and action-conditioned object trajectories. We introduce LLAVIDAL, an LLVM capable of incorporating 3D poses and relevant object trajectories to understand the intricate spatiotemporal relationships within ADLs. Furthermore, we present a novel benchmark, ADLMCQ, for quantifying LLVM effectiveness in ADL scenarios. When trained on ADL-X, LLAVIDAL consistently achieves state-of-the-art performance across all ADL evaluation metrics. Qualitative analysis reveals LLAVIDAL's temporal reasoning capabilities in understanding ADL. The link to the dataset is provided at: https://adl-x.github.io/

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.0939

Country:

North America > United States (0.14)
Europe > France (0.14)

Genre: Research Report (0.82)

Industry:

Information Technology (0.67)
Health & Medicine > Therapeutic Area (0.46)
Health & Medicine > Health Care Providers & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
(2 more...)

Add feedback

Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders

Das, Srijan, Jain, Tanmay, Reilly, Dominick, Balaji, Pranav, Karmakar, Soumyajit, Marjit, Shyam, Li, Xiang, Das, Abhijit, Ryoo, Michael S.

arXiv.org Artificial IntelligenceDec-27-2023

Vision Transformers (ViTs) have become ubiquitous in computer vision. Despite their success, ViTs lack inductive biases, which can make it difficult to train them with limited data. To address this challenge, prior studies suggest training ViTs with self-supervised learning (SSL) and fine-tuning sequentially. However, we observe that jointly optimizing ViTs for the primary task and a Self-Supervised Auxiliary Task (SSAT) is surprisingly beneficial when the amount of training data is limited. We explore the appropriate SSL tasks that can be optimized alongside the primary task, the training schemes for these tasks, and the data scale at which they can be most effective. Our findings reveal that SSAT is a powerful technique that enables ViTs to leverage the unique characteristics of both the self-supervised and primary tasks, achieving better performance than typical ViTs pre-training with SSL and fine-tuning sequentially. Our experiments, conducted on 10 datasets, demonstrate that SSAT significantly improves ViT performance while reducing carbon footprint. We also confirm the effectiveness of SSAT in the video domain for deepfake detection, showcasing its generalizability. Our code is available at https://github.com/dominickrei/Limited-data-vits.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2310.20704

Country:

North America > United States (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.68)
Energy (0.48)
Information Technology > Security & Privacy (0.36)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback