look-back


Look-Back: Implicit Visual Re-focusing in MLLM Reasoning

Yang, Shuo, Niu, Yuwei, Liu, Yuyang, Ye, Yang, Lin, Bin, Yuan, Li

arXiv.org Artificial Intelligence

Multimodal Large Language Models (MLLMs) have achieved remarkable progress in multimodal reasoning. However, they often excessively rely on textual information during the later stages of inference, neglecting the crucial integration of visual input. Current methods typically address this by explicitly injecting visual information to guide the reasoning process. In this work, through an analysis of MLLM attention patterns, we made an intriguing observation: with appropriate guidance, MLLMs can spontaneously re-focus their attention on visual inputs during the later stages of reasoning, even without explicit visual information injection. This spontaneous shift in focus suggests that MLLMs are intrinsically capable of performing visual fusion reasoning. Building on this insight, we introduce Look-Back, an implicit approach designed to guide MLLMs to "look back" at visual information in a self-directed manner during reasoning. Look-Back empowers the model to autonomously determine when, where, and how to re-focus on visual inputs, eliminating the need for explicit model-structure constraints or additional input. We demonstrate that Look-Back significantly enhances the model's reasoning and perception capabilities, as evidenced by extensive empirical evaluations on multiple multimodal benchmarks.
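The attention analysis described above can be illustrated with a minimal sketch: measure what fraction of a decoding step's attention mass lands on image tokens, so that a rising ratio in later steps would indicate the spontaneous "look back" behavior. The function name and the per-step averaged attention input are assumptions for illustration, not the paper's exact metric.

```python
import numpy as np

def visual_attention_ratio(attn, visual_token_idx):
    """Fraction of one decoding step's attention mass that falls on
    visual (image) tokens. `attn` holds attention weights over the
    full input sequence, averaged across heads and layers.
    Hypothetical diagnostic, not the authors' exact measurement."""
    attn = np.asarray(attn, dtype=float)
    return float(attn[visual_token_idx].sum() / attn.sum())

# Toy sequence: tokens 0-3 are image patches, tokens 4-7 are text.
attn_step = np.array([0.05, 0.05, 0.05, 0.05, 0.3, 0.2, 0.2, 0.1])
ratio = visual_attention_ratio(attn_step, visual_token_idx=slice(0, 4))
```

Tracking this ratio across generation steps is one simple way to see whether attention drifts away from, or returns to, the visual input during reasoning.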


Quantifying Qualitative Insights: Leveraging LLMs to Market Predict

Lee, Hoyoung, Choi, Youngsoo, Kwon, Yuhee

arXiv.org Artificial Intelligence

Recent advancements in Large Language Models (LLMs) have the potential to transform financial analytics by integrating numerical and textual data. However, challenges such as insufficient context when fusing multimodal information and the difficulty in measuring the utility of qualitative outputs, which LLMs generate as text, have limited their effectiveness in tasks such as financial forecasting. This study addresses these challenges by leveraging daily reports from securities firms to create high-quality contextual information. The reports are segmented into text-based key factors and combined with numerical data, such as price information, to form context sets. By dynamically updating few-shot examples based on the query time, the sets incorporate the latest information, forming a highly relevant set closely aligned with the query point. Additionally, a crafted prompt is designed to assign scores to the key factors, converting qualitative insights into quantitative results. The derived scores undergo a scaling process, transforming them into real-world values that are used for prediction. Our experiments demonstrate that LLMs outperform time-series models in market forecasting, though challenges such as imperfect reproducibility and limited explainability remain.
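The scoring-and-scaling step described above can be sketched as follows. This is an assumed, minimal version: the score range, function names, and the mapping of a mean factor score onto a bounded return forecast are illustrative choices, not the authors' exact procedure.

```python
def scale_scores_to_return(factor_scores,
                           score_range=(-5.0, 5.0),
                           max_abs_return=0.03):
    """Map the mean of LLM-assigned key-factor scores onto a bounded
    real-valued return forecast in [-max_abs_return, +max_abs_return].
    Hypothetical sketch of the scaling step, not the paper's method."""
    lo, hi = score_range
    mean_score = sum(factor_scores) / len(factor_scores)
    # Normalize the mean score to [-1, 1], then scale to the return band.
    normalized = (2 * (mean_score - lo) / (hi - lo)) - 1
    return normalized * max_abs_return

# Three key factors scored by the prompt, e.g. on a -5..+5 scale.
forecast = scale_scores_to_return([3.0, 1.0, -2.0])
```

The point of such a transform is that qualitative LLM outputs become comparable, real-valued quantities that a downstream forecasting pipeline can consume.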


Look-back Decoding for Open-Ended Text Generation

Xu, Nan, Zhou, Chunting, Celikyilmaz, Asli, Ma, Xuezhe

arXiv.org Artificial Intelligence

We propose Look-back, an improved decoding algorithm that leverages the Kullback-Leibler divergence to track the distribution distance between current and historical decoding steps. Look-back can thus automatically predict potential repetitive phrases and topic drift, and remove tokens that may cause these failure modes, restricting the next-token probability distribution to within a plausible distance of the history. We perform decoding experiments on document continuation.

[Figure 1: Maximum similarity of hidden states and normalized minimum KL divergence between the current step and history (a) or prefix (b), from GPT2 on 1,000 instances of WikiText-103. Compared with human continuation: (a) repetition has a much smaller minKL but an indistinguishably high maxHidden with the history text; (b) pseudo topic drift, induced by switching to the continuation of another instance, has a much higher minKL but a similarly high maxHidden with the prefix text.]
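The core signal described above, the minimum KL divergence between the current next-token distribution and each historical step's distribution, can be sketched as below. This is a simplified illustration of the tracking signal only (a tiny minKL flags imminent repetition); the token-removal policy and function names are assumptions, not the paper's full algorithm.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two probability vectors, with smoothing."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def min_kl_to_history(current_probs, history_probs):
    """Minimum KL divergence between the current next-token
    distribution and any previous decoding step's distribution.
    A very small value signals the model is about to repeat itself."""
    return min(kl_divergence(current_probs, h) for h in history_probs)

# Toy example: the current distribution nearly matches step 0's,
# so the minimum KL is close to zero (a repetition signal).
history = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.3, 0.6])]
current = np.array([0.69, 0.21, 0.10])
score = min_kl_to_history(current, history)
```

In the full decoding algorithm, a threshold on this score would trigger the filtering of candidate tokens so that generation stays within a plausible distance of the history.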


A Look-back at Tech 2019

#artificialintelligence

Anyone would be forgiven for being heartily sick of hearing the term Artificial Intelligence. But over the whole of the past decade, AI has grown into something that impacts our lives every day. Very soon it will be threaded through everything like DNA and we will stop noticing it, just as we don't notice how electricity is part of daily life. Deep Learning, in which a computer system is fed a large amount of data to recognise patterns much faster than human beings can, also goes on to arm itself with knowledge humans didn't have. We use simpler AI in many ways every day -- it's right on your phone and in your smart speakers, and you're probably part of the datasets being analysed by many algorithms for different purposes. If you want to remind yourself of the speed of AI, play a song and just ask Google or Shazam to identify it.