Belluno Province
OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
Yang, Wei, Fu, Jingjing, Wang, Rui, Wang, Jinyu, Song, Lei, Bian, Jiang
Vision-language retrieval-augmented generation (RAG) has become an effective approach for tackling Knowledge-Based Visual Question Answering (KB-VQA), which requires external knowledge beyond the visual content presented in images. The effectiveness of Vision-language RAG systems hinges on multimodal retrieval, which is inherently challenging due to the diverse modalities and knowledge granularities in both queries and knowledge bases. Existing methods have not fully tapped into the potential interplay between these elements. We propose a multimodal RAG system featuring a coarse-to-fine, multi-step retrieval that harmonizes multiple granularities and modalities to enhance efficacy. Our system begins with a broad initial search aligning knowledge granularity for cross-modal retrieval, followed by a multimodal fusion reranking to capture the nuanced multimodal information for top entity selection. A text reranker then filters out the most relevant fine-grained section for augmented generation. Extensive experiments on the InfoSeek and Encyclopedic-VQA benchmarks show our method achieves state-of-the-art retrieval performance and highly competitive answering results, underscoring its effectiveness in advancing KB-VQA systems.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (5 more...)
- Research Report (0.82)
- Workflow (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.87)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing
Ye, Xiaoju, Wang, Zhichun, Wang, Jingyuan
Limited by the context window size of Large Language Models(LLMs), handling various tasks with input tokens exceeding the upper limit has been challenging, whether it is a simple direct retrieval task or a complex multi-hop reasoning task. Although various methods have been proposed to enhance the long-context processing capabilities of LLMs, they either incur substantial post-training costs, or require additional tool modules(e.g.,RAG), or have not shown significant improvement in realistic tasks. Our work observes the correlation between the attention distribution and generated answers across each layer, and establishes the attention allocation aligns with retrieval-augmented capabilities through experiments. Drawing on the above insights, we propose a novel method InfiniRetri that leverages the LLMs's own attention information to enable accurate retrieval across inputs of infinitely length. Our evaluations indicate that InfiniRetri achieves 100% accuracy in the Needle-In-a-Haystack(NIH) test over 1M tokens using a 0.5B parameter model, surpassing other method or larger models and setting a new state-of-the-art(SOTA). Moreover, our method achieves significant performance improvements on real-world benchmarks, with a maximum 288% improvement. In addition, InfiniRetri can be applied to any Transformer-based LLMs without additional training and substantially reduces inference latency and compute overhead in long texts. In summary, our comprehensive studies show InfiniRetri's potential for practical applications and creates a paradigm for retrievaling information using LLMs own capabilities under infinite-length tokens. Code will be released in link.
- Europe > Italy > Veneto > Belluno Province (0.04)
- North America > United States > California (0.04)
- Overview (0.93)
- Research Report > Promising Solution (0.48)
Towards physics-informed neural networks for landslide prediction
For decades, solutions to regional scale landslide prediction have mostly relied on data-driven models, by definition, disconnected from the physics of the failure mechanism. The success and spread of such tools came from the ability to exploit proxy variables rather than explicit geotechnical ones, as the latter are prohibitive to acquire over broad landscapes. Our work implements a Physics Informed Neural Network (PINN) approach, thereby adding to a standard data-driven architecture, an intermediate constraint to solve for the permanent deformation typical of Newmark slope stability methods. This translates into a neural network tasked with explicitly retrieving geotechnical parameters from common proxy variables and then minimize a loss function with respect to the available coseismic landside inventory. The results are very promising, because our model not only produces excellent predictive performance in the form of standard susceptibility output, but in the process, also generates maps of the expected geotechnical properties at a regional scale. Such architecture is therefore framed to tackle coseismic landslide prediction, something that, if confirmed in other studies, could open up towards PINN-based near-real-time predictions.
- Asia > Middle East > Republic of Türkiye (0.14)
- Asia > Nepal (0.05)
- South America > Venezuela (0.04)
- (10 more...)