AITopics | frag

FRAG: Frame Selection Augmented Generation for Long Video and Long Document Understanding

Huang, De-An, Radhakrishnan, Subhashree, Yu, Zhiding, Kautz, Jan

arXiv.org Artificial Intelligence | Apr-25-2025

There has been impressive progress in Large Multimodal Models (LMMs). Recent works extend these models to long inputs, including multi-page documents and long videos. However, the model size and performance of these long context models are still limited due to the computational cost in both training and inference. In this work, we explore an orthogonal direction and process long inputs without long context LMMs. We propose Frame Selection Augmented Generation (FRAG), where the model first selects relevant frames within the input, and then generates the final outputs based only on the selected frames. The core of the selection process is done by scoring each frame independently, which does not require long context processing. The frames with the highest scores are then selected by a simple Top-K selection. We show that this frustratingly simple framework is applicable to both long videos and multi-page documents using existing LMMs without any fine-tuning. We consider two models, LLaVA-OneVision and InternVL2, in our experiments and show that FRAG consistently improves performance and achieves state-of-the-art results for both long video and long document understanding. For videos, FRAG substantially improves InternVL2-76B by 5.8% on MLVU and 3.7% on Video-MME. For documents, FRAG achieves over 20% improvements on MP-DocVQA compared with recent LMMs specialized in long document understanding. Code is available at: https://github.com/NVlabs/FRAG
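The selection step described in this abstract (score each frame independently, then keep the Top-K) can be sketched in a few lines. This is a minimal illustration, not the code from the NVlabs repository: `select_top_k` and `toy_score` are hypothetical names, the word-overlap scorer merely stands in for an LMM relevance score, and returning the kept frames in their original order is an assumption made here for readability.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_top_k(
    frames: Sequence[Frame],
    query: str,
    score_fn: Callable[[Frame, str], float],
    k: int,
) -> List[Frame]:
    """Score every frame on its own, keep the k best, return them in input order."""
    if k <= 0:
        return []
    # Each frame is scored independently of all others, so no long context is needed.
    scores = [score_fn(frame, query) for frame in frames]
    # Rank frame indices by score, highest first (ties keep their input order).
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    # Assumption: restore temporal/page order among the selected frames.
    keep = sorted(ranked[:k])
    return [frames[i] for i in keep]


def toy_score(frame: str, query: str) -> float:
    """Hypothetical stand-in for a model-based relevance score: shared word count."""
    query_words = set(query.lower().split())
    return float(len(query_words & set(frame.lower().split())))


# Captions stand in for video frames or document pages in this toy example.
frames = ["a dog runs", "a red car parks", "the sky", "a red car drives away"]
selected = select_top_k(frames, "what does the red car do", toy_score, k=2)
print(selected)
```

In a real pipeline `score_fn` would call the multimodal model once per frame, and only the selected frames would be passed to the final generation step.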

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2504.17447

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs

Gao, Zengyi, Cao, Yukun, Wang, Hairu, Ke, Ao, Feng, Yuan, Xie, Xike, Zhou, S Kevin

arXiv.org Artificial Intelligence | Jan-22-2025

To mitigate hallucination and knowledge deficiency in large language models (LLMs), Knowledge Graph (KG)-based Retrieval-Augmented Generation (RAG) has shown promising potential by utilizing KGs as an external resource to enhance LLM reasoning. However, existing KG-RAG approaches struggle with a trade-off between flexibility and retrieval quality. Modular methods prioritize flexibility by avoiding the use of KG-fine-tuned models during retrieval, leading to fixed retrieval strategies and suboptimal retrieval quality. Conversely, coupled methods embed KG information within models to improve retrieval quality, but at the expense of flexibility. In this paper, we propose a novel flexible modular KG-RAG framework, termed FRAG, which synergizes the advantages of both approaches. FRAG estimates the hop range of reasoning paths based solely on the query and classifies the query as either simple or complex. To match the complexity of the query, tailored pipelines are applied to ensure efficient and accurate reasoning path retrieval, thus fostering the final reasoning process. By using the query text instead of the KG to infer the structural information of reasoning paths and employing adaptable retrieval strategies, FRAG improves retrieval quality while maintaining flexibility. Moreover, FRAG does not require extra LLM fine-tuning or calls, significantly boosting efficiency and conserving resources. Extensive experiments show that FRAG achieves state-of-the-art performance with high efficiency and low resource consumption.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.09957

Country:

Europe (0.68)
North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fusion with Diffusion for Robust Visual Tracking

Zhou, Yu

Neural Information Processing Systems | Mar-14-2024, 06:13:47 GMT

A weighted graph is used as an underlying structure of many algorithms like semi-supervised learning and spectral clustering. If the edge weights are determined by a single similarity measure, it is hard, if not impossible, to capture all relevant aspects of similarity. In particular, in the case of visual object matching it is beneficial to integrate different similarity measures that focus on different visual representations. In this paper, a novel approach to integrate multiple similarity measures is proposed. First, pairs of similarity measures are combined with a diffusion process on their tensor product graph (TPG).

Neural Information Processing Systems

Country:

Asia > China (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre:

Research Report (0.48)
Overview (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Arnold: An Autonomous Agent to Play FPS Games

Chaplot, Devendra Singh (Carnegie Mellon University) | Lample, Guillaume (Carnegie Mellon University)

AAAI Conferences | Feb-14-2017

Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. In this paper, we present Arnold, a completely autonomous agent that plays First-Person Shooter games using only screen pixel data, and demonstrate its effectiveness on Doom, a classical first-person shooter game. Arnold is trained with deep reinforcement learning using a recent Action-Navigation architecture, which uses separate deep neural networks for exploring the map and fighting enemies. Furthermore, it uses several techniques, such as augmenting high-level game features, reward shaping and sequential updates, for efficient training and effective performance. Arnold outperforms average humans as well as in-built game bots on different variations of the deathmatch. It also obtained the highest kill-to-death ratio in both tracks of the Visual Doom AI Competition and placed second in terms of the number of frags.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States (0.15)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Learning to Act by Predicting the Future

Dosovitskiy, Alexey, Koltun, Vladlen

arXiv.org Artificial Intelligence | Feb-14-2017

We present an approach to sensorimotor control in immersive environments. Our approach utilizes a high-dimensional sensory stream and a lower-dimensional measurement stream. The cotemporal structure of these streams provides a rich supervisory signal, which enables training a sensorimotor control model by interacting with the environment. The model is trained using supervised learning techniques, but without extraneous supervision. It learns to act based on raw sensory input from a complex three-dimensional environment. The presented formulation enables learning without a fixed goal at training time, and pursuing dynamically changing goals at test time. We conduct extensive experiments in three-dimensional simulations based on the classical first-person game Doom. The results demonstrate that the presented approach outperforms sophisticated prior formulations, particularly on challenging tasks. The results also show that trained models successfully generalize across environments and goals. A model trained using the presented approach won the Full Deathmatch track of the Visual Doom AI Competition, which was held in previously unseen environments.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1611.01779

Genre: Research Report > New Finding (0.88)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Facebook and Intel reign supreme in 'Doom' AI deathmatch

#artificialintelligence | Sep-22-2016, 23:25:30 GMT

There were two "tracks" for agents to compete on, offering very different challenges. Track 1 featured a map known to the teams, and rocket launchers were the only weapons. The agents started with a weapon but were able to collect ammo and health kits. Track 2 was a far harder challenge. It featured three maps, unknown to teams, and a full array of weapons and items.

artificial intelligence, machine learning, social media, (18 more...)

#artificialintelligence

Country:

Europe > Switzerland (0.05)
Europe > Finland (0.05)

Genre: Research Report (0.48)

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Communications > Social Media (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

Facebook and Intel reign supreme in 'Doom' AI deathmatch

Engadget | Sep-22-2016, 18:11:40 GMT

On the island of Santorini, Greece, a group of AIs has been facing off in an epic battle of Doom. This is VizDoom, a contest born from one man's idea: To improve the state of artificial intelligence by teaching computers the art of fragging. That simple notion then spiraled into a battle between tech giants, universities and coders. Over the past few months they've all been honing their bots (known as "agents"), building up to one, final death match. Okay, it was a lot more than one match.

artificial intelligence, machine learning, social media, (17 more...)

Engadget

Country:

Europe > Greece (0.24)
Europe > Switzerland (0.04)
Europe > Finland (0.04)

Genre: Research Report (0.48)

Industry:

Information Technology (0.49)
Education (0.49)
Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Communications > Social Media (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

AI will frag each other with rocket launchers in 'Doom'

Engadget | Apr-26-2016, 03:55:18 GMT

The other round mixes it up a bit by allowing multiple weapons and items in full deathmatch on a trio of unknown maps. This is incredibly important for machine learning because rather than typical bots in a game, the controllers here don't have access to the underlying code or map layout -- everything is picked up by visual learning.

artificial intelligence, machine learning, rocket launcher, (2 more...)

Engadget

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Fusion with Diffusion for Robust Visual Tracking

Zhou, Yu, Bai, Xiang, Liu, Wenyu, Latecki, Longin J.

Neural Information Processing Systems | Dec-31-2012

A weighted graph is used as an underlying structure of many algorithms like semi-supervised learning and spectral clustering. The edge weights are usually determined by a single similarity measure, but it is often hard, if not impossible, to capture all relevant aspects of similarity when using a single similarity measure. In particular, in the case of visual object matching it is beneficial to integrate different similarity measures that focus on different visual representations. In this paper, a novel approach to integrate multiple similarity measures is proposed. First, pairs of similarity measures are combined with a diffusion process on their tensor product graph (TPG). Hence the diffused similarity of each pair of objects becomes a function of joint diffusion of the two original similarities, which in turn depends on the neighborhood structure of the TPG. We call this process Fusion with Diffusion (FD). However, a higher order graph like the TPG usually means a significant increase in time complexity. This is not the case in the proposed approach. A key feature of our approach is that the time complexity of the diffusion on the TPG is the same as the diffusion process on each of the original graphs. Moreover, it is not necessary to explicitly construct the TPG in our framework. Finally, all diffused pairs of similarity measures are combined as a weighted sum. We demonstrate the advantages of the proposed approach on the task of visual tracking, where different aspects of the appearance similarity between the target object in frame t and target object candidates in frame t+1 are integrated. The obtained method is tested on several challenging video sequences and the experimental results show that it outperforms state-of-the-art tracking methods.
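The claim that diffusion on the TPG need not construct the TPG can be illustrated with the standard identity vec(A X Bᵀ) = (B ⊗ A) vec(X). The sketch below assumes the update Q ← P_a Q P_bᵀ + I, which is one common way to write diffusion on a tensor product graph and may differ from the paper's exact formulation; the function names and the two small matrices are hypothetical. It runs the update once on n × n matrices and once on the explicit n² × n² Kronecker matrix, then compares the results.

```python
from typing import List

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: entry [i*p + k][j*q + l] equals a[i][j] * b[k][l]."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def vec(x: Matrix) -> List[float]:
    """Stack the columns of x into one vector."""
    return [x[i][j] for j in range(len(x[0])) for i in range(len(x))]


def diffuse_implicit(p_a: Matrix, p_b: Matrix, steps: int) -> Matrix:
    """Assumed update Q <- P_a Q P_b^T + I, using only n x n matrices."""
    n = len(p_a)
    eye = identity(n)
    q = identity(n)
    p_b_t = transpose(p_b)
    for _ in range(steps):
        prod = matmul(matmul(p_a, q), p_b_t)
        q = [[prod[i][j] + eye[i][j] for j in range(n)] for i in range(n)]
    return q


def diffuse_explicit(p_a: Matrix, p_b: Matrix, steps: int) -> List[float]:
    """Same update on the explicit n^2 x n^2 tensor product graph matrix."""
    n = len(p_a)
    big = kron(p_b, p_a)  # vec(P_a Q P_b^T) = (P_b kron P_a) vec(Q)
    v_eye = vec(identity(n))
    v = list(v_eye)
    for _ in range(steps):
        v = [sum(row[j] * v[j] for j in range(n * n)) + v_eye[i]
             for i, row in enumerate(big)]
    return v


# Hypothetical 2-node transition-like matrices for two similarity measures.
p_a = [[0.2, 0.3], [0.1, 0.4]]
p_b = [[0.3, 0.1], [0.2, 0.2]]
implicit = vec(diffuse_implicit(p_a, p_b, steps=20))
explicit = diffuse_explicit(p_a, p_b, steps=20)
max_gap = max(abs(x - y) for x, y in zip(implicit, explicit))
print(max_gap)
```

The two routes agree up to floating-point error, while the implicit one never allocates the n² × n² matrix, which is the property the abstract highlights.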

artificial intelligence, frag, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.35)
