AITopics | Pacific Ocean

Pacific Ocean

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Gao, Timin, Chen, Peixian, Zhang, Mengdan, Fu, Chaoyou, Shen, Yunhang, Zhang, Yan, Zhang, Shengchuan, Zheng, Xiawu, Sun, Xing, Cao, Liujuan, Ji, Rongrong

arXiv.org Artificial Intelligence, Apr-24-2024

With the advent of large language models (LLMs) enhanced by the chain-of-thought (CoT) methodology, visual reasoning problems are usually decomposed into manageable sub-tasks and tackled sequentially with various external tools. However, such a paradigm faces the challenge of potential "determining hallucinations" in decision-making, due to insufficient visual information and the limitation of low-level perception tools that fail to provide the abstract summaries necessary for comprehensive reasoning. We argue that converging visual context acquisition and logical reasoning is pivotal for tackling visual reasoning tasks. This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models (MLLMs) and their cognitive capability. To this end, we propose an innovative multimodal CoT framework, termed Cantor, characterized by a perception-decision architecture. Cantor first acts as a decision generator and integrates visual inputs to analyze the image and problem, ensuring a closer alignment with the actual context. Furthermore, Cantor leverages the advanced cognitive functions of MLLMs to perform as multifaceted experts for deriving higher-level information, enhancing the CoT generation process. Our extensive experiments demonstrate the efficacy of the proposed framework, showing significant improvements in multimodal CoT performance across two complex visual reasoning datasets, without necessitating fine-tuning or ground-truth rationales. Project Page: https://ggg0919.github.io/cantor/

cantor, information, visual information, (13 more...)

arXiv.org Artificial Intelligence

2404.16033

Country:

Oceania > New Zealand (0.05)
Oceania > Australia (0.05)
South America > Ecuador (0.04)
(9 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)
Health & Medicine > Therapeutic Area > Neurology (0.34)
Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Retrieval Head Mechanistically Explains Long-Context Factuality

Wu, Wenhao, Wang, Yizhong, Xiao, Guangxuan, Peng, Hao, Fu, Yao

arXiv.org Artificial Intelligence, Apr-23-2024

Despite the recent progress in long-context language models, it remains elusive how transformer-based models exhibit the capability to retrieve relevant information from arbitrary locations within the long context. This paper aims to address this question. Our systematic investigation across a wide spectrum of models reveals that a special type of attention head is largely responsible for retrieving information, which we dub retrieval heads. We identify intriguing properties of retrieval heads: (1) universal: all the explored models with long-context capability have a set of retrieval heads; (2) sparse: only a small portion (less than 5%) of the attention heads are retrieval heads; (3) intrinsic: retrieval heads already exist in models pretrained with short context, and when the context length is extended by continual pretraining, it is still the same set of heads that performs information retrieval; (4) dynamically activated: taking Llama-2 7B as an example, 12 retrieval heads always attend to the required information no matter how the context is changed, while the rest of the retrieval heads are activated in different contexts; (5) causal: completely pruning retrieval heads leads to failure in retrieving relevant information and results in hallucination, while pruning random non-retrieval heads does not affect the model's retrieval ability. We further show that retrieval heads strongly influence chain-of-thought (CoT) reasoning, where the model needs to frequently refer back to the question and previously generated context. Conversely, tasks where the model directly generates the answer using its intrinsic knowledge are less impacted by masking out retrieval heads. These observations collectively explain which internal part of the model seeks information from the input tokens. We believe our insights will foster future research on reducing hallucination, improving reasoning, and compressing the KV cache.
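
The head-pruning comparison described in this abstract can be illustrated with a small sketch. It is only an illustration under stated assumptions: the per-head scores are hypothetical numbers, and `top_k_heads` and `mask_heads` are names invented here, not functions from the paper or from any library. The abstract does not say how retrieval heads are scored, so the sketch takes the scores as given.

```python
def top_k_heads(scores, k):
    """Return the indices of the k highest-scoring heads.

    `scores` is a hypothetical list with one retrieval score per head;
    how such scores are measured is not covered by this sketch.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def mask_heads(head_outputs, pruned):
    """Zero the output vector of every head whose index is in `pruned`.

    `head_outputs` holds one output vector per attention head, as it
    would look just before the heads are concatenated.
    """
    return [
        [0.0] * len(out) if i in pruned else list(out)
        for i, out in enumerate(head_outputs)
    ]


# Hypothetical scores for four heads; heads 1 and 3 score highest.
scores = [0.1, 0.9, 0.3, 0.8]
retrieval = top_k_heads(scores, 2)

# Masking the selected heads leaves the other heads untouched.
outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masked = mask_heads(outputs, retrieval)
```

Passing a random set of non-selected head indices to `mask_heads` instead gives the control condition the abstract contrasts against.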

arxiv preprint arxiv, information, retrieval head, (13 more...)

arXiv.org Artificial Intelligence

2404.15574

Country:

North America > United States > California > San Francisco County > San Francisco (0.05)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Unified Replay-based Continuous Learning Framework for Spatio-Temporal Prediction on Streaming Data

Miao, Hao, Zhao, Yan, Guo, Chenjuan, Yang, Bin, Zheng, Kai, Huang, Feiteng, Xie, Jiandong, Jensen, Christian S.

arXiv.org Artificial Intelligence, Apr-23-2024

The widespread deployment of wireless and mobile devices results in a proliferation of spatio-temporal data that is used in applications, e.g., traffic prediction, human mobility mining, and air quality prediction, where spatio-temporal prediction is often essential to enable safety, predictability, or reliability. Many recent proposals that target deep learning for spatio-temporal prediction suffer from so-called catastrophic forgetting, where previously learned knowledge is entirely forgotten when new data arrives. Such proposals may experience deteriorating prediction performance when applied in settings where data streams into the system. To enable spatio-temporal prediction on streaming data, we propose a unified replay-based continuous learning framework. The framework includes a replay buffer of previously learned samples that are fused with training data using a spatio-temporal mixup mechanism in order to preserve historical knowledge effectively, thus avoiding catastrophic forgetting. To enable holistic representation preservation, the framework also integrates a general spatio-temporal autoencoder with a carefully designed spatio-temporal simple siamese (STSimSiam) network that aims to ensure prediction accuracy and avoid holistic feature loss by means of mutual information maximization. The framework further encompasses five spatio-temporal data augmentation methods to enhance the performance of STSimSiam. Extensive experiments on real data offer insight into the effectiveness of the proposed framework.
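
The replay idea in this abstract, fusing stored samples with incoming training data, can be sketched roughly as follows. This is a simplified illustration, not the paper's method: it uses plain mixup (a convex combination) in place of the spatio-temporal mixup mechanism, and reservoir sampling to fill the buffer, which the abstract does not specify. The class and function names are hypothetical.

```python
import random


class ReplayBuffer:
    """Fixed-capacity store of past samples.

    Reservoir sampling is an assumption made for this sketch; the
    abstract does not say how the buffer is populated.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.samples = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            # Replace a stored sample with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = sample

    def draw(self):
        return self.rng.choice(self.samples)


def mixup(new, old, lam):
    """Convex combination of two equally long series."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(new, old)]


def fuse_batch(batch, buffer, lam=0.7):
    """Mix each incoming series with one replayed series.

    When the buffer is empty the batch is returned unchanged.
    """
    if not buffer.samples:
        return [list(x) for x in batch]
    return [mixup(x, buffer.draw(), lam) for x in batch]
```

In a streaming loop one would call `fuse_batch` on each new batch before a training step, then `add` the raw samples to the buffer so later batches can replay them.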

learning, prediction, spatio-temporal prediction, (15 more...)

arXiv.org Artificial Intelligence

2404.14999

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
Asia > China (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Continuing Education (0.71)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Deep Multi-View Channel-Wise Spatio-Temporal Network for Traffic Flow Prediction

Miao, Hao, Wang, Senzhang, Zhang, Meiyue, Guo, Diansheng, Sun, Funing, Yang, Fan

arXiv.org Artificial Intelligence, Apr-23-2024

Accurately forecasting traffic flows is critically important to many real applications including public safety and intelligent transportation systems. The challenges of this problem include both the dynamic mobility patterns of people and the complex spatial-temporal correlations of urban traffic data. Meanwhile, most existing models ignore the diverse impacts of the various traffic observations (e.g., vehicle speed and road occupancy) on traffic flow prediction, even though different traffic observations can be considered as different channels of input features. We argue that analyzing multiple-channel traffic observations might help to better address this problem. In this paper, we study the novel problem of multi-channel traffic flow prediction, and propose a deep Multi-View Channel-wise Spatio-Temporal Network (MVC-STNet) model to effectively address it. Specifically, we first construct the localized and globalized spatial graph where the multi-view fusion module is used to effectively extract the local and global spatial dependencies. Then LSTM is used to learn the temporal correlations. To effectively model the different impacts of various traffic observations on traffic flow prediction, a channel-wise graph convolutional network is also designed. Extensive experiments are conducted over the PEMS04 and PEMS08 datasets. The results demonstrate that the proposed MVC-STNet outperforms state-of-the-art methods by a large margin.

flow prediction, prediction, traffic flow, (15 more...)

arXiv.org Artificial Intelligence

2404.15034

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Consumer Products & Services > Travel (1.00)
Transportation > Ground > Road (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Efficient infusion of self-supervised representations in Automatic Speech Recognition

Prabhu, Darshan, Mirishkar, Sai Ganesh, Wasnik, Pankaj

arXiv.org Artificial Intelligence, Apr-19-2024

Self-supervised learned (SSL) models such as Wav2vec and HuBERT yield state-of-the-art results on speech-related tasks. Given the effectiveness of such models, it is advantageous to use them in conventional ASR systems. While some approaches suggest incorporating these models as a trainable encoder or a learnable frontend, training such systems is extremely slow and requires a lot of computation cycles. In this work, we propose two simple approaches that use (1) framewise addition and (2) cross-attention mechanisms to efficiently incorporate the representations from the SSL model(s) into the ASR architecture, resulting in models that are comparable in size with standard encoder-decoder conformer systems while also avoiding the usage of SSL models during training. Our approach results in faster training and yields significant performance gains on the Librispeech and Tedlium datasets compared to baselines. We further provide detailed analysis and ablation studies that demonstrate the effectiveness of our approach.
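
The two fusion strategies named in this abstract, framewise addition and cross-attention, can be sketched in a few lines. This is a minimal illustration under assumptions not stated in the abstract: the SSL and ASR features are taken to share one feature dimension, the attention is single-head with no learned projections, and the function names are hypothetical.

```python
import math


def framewise_add(asr_frames, ssl_frames):
    """Add SSL features to ASR features frame by frame.

    Assumes both sequences have the same number of frames and the
    same feature dimension.
    """
    if len(asr_frames) != len(ssl_frames):
        raise ValueError("framewise addition needs equal frame counts")
    return [
        [a + s for a, s in zip(fa, fs)]
        for fa, fs in zip(asr_frames, ssl_frames)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(asr_frames, ssl_frames):
    """Scaled dot-product attention from ASR frames over SSL frames.

    Each ASR frame is the query; SSL frames serve as keys and values.
    Learned projections are omitted to keep the sketch short.
    """
    dim = len(ssl_frames[0])
    fused = []
    for query in asr_frames:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in ssl_frames
        ]
        weights = softmax(scores)
        fused.append([
            sum(w * key[d] for w, key in zip(weights, ssl_frames))
            for d in range(dim)
        ])
    return fused
```

Unlike framewise addition, the cross-attention variant does not need the two sequences to have the same number of frames, since every ASR frame attends over all SSL frames.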

architecture, representation, ssl model, (13 more...)

arXiv.org Artificial Intelligence

2404.12628

Country:

Asia > India (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Pacific Ocean (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.89)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)

CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News

Zhu, Mengna, Xu, Zijie, Zeng, Kaisheng, Xiao, Kaiming, Wang, Mao, Ke, Wenjun, Huang, Hongbin

arXiv.org Artificial Intelligence, Apr-18-2024

Extracting structured event knowledge, including event triggers and corresponding arguments, from military texts is fundamental to many applications, such as intelligence analysis and decision assistance. However, event extraction in the military field faces the data scarcity problem, which impedes research on event extraction models in this domain. To alleviate this problem, we propose CMNEE, a large-scale, document-level open-source Chinese Military News Event Extraction dataset. It contains 17,000 documents and 29,223 events, which are all manually annotated based on a pre-defined schema for the military domain including 8 event types and 11 argument role types. We designed a two-stage, multi-turn annotation strategy to ensure the quality of CMNEE and reproduced several state-of-the-art event extraction models with a systematic evaluation. The experimental results on CMNEE fall clearly short of those on datasets from other domains, which demonstrates that event extraction for the military domain poses unique challenges and requires further research efforts. Our code and data can be obtained from https://github.com/Mzzzhu/CMNEE.

argument, dataset, extraction, (13 more...)

arXiv.org Artificial Intelligence

2404.12242

Country:

Asia > Russia (0.14)
North America > United States (0.14)
Asia > Afghanistan (0.05)
(17 more...)

Genre: Research Report (0.40)

Industry:

Government > Military (1.00)
Government > Regional Government > Asia Government > China Government (0.60)

Technology:

Information Technology > Software (0.84)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent

Chen, Wei, Li, Zhiyuan

arXiv.org Artificial Intelligence, Apr-18-2024

A multimodal AI agent is characterized by its ability to process and learn from various types of data, including natural language, visual, and audio inputs, to inform its actions. Despite advancements in large language models that incorporate visual data, such as GPT-4V, effectively translating image-based data into actionable outcomes for AI agents continues to be challenging. In this paper, we introduce a multimodal model that incorporates the concept of a functional token specifically designed for AI agent applications. To ensure compatibility with edge devices, our model is optimized to a compact size of less than 1B parameters. Like GPT-4, our model can process both English and Chinese. We demonstrate that this model is capable of operating efficiently on a wide range of edge devices, including ones as constrained as a Raspberry Pi.

arxiv preprint arxiv, information, language model, (14 more...)

arXiv.org Artificial Intelligence

2404.11459

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.94)
Information Technology > Services (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)

Scientists share world's first 'conversation' between humans and whales - and say it's the first step to understanding aliens

Daily Mail - Science & tech, Apr-17-2024, 16:00:28 GMT

Scientists claim they have had the first one-on-one conversation with a whale. The team from the SETI Institute and the University of California 'spoke' with a 38-year-old humpback whale, named Twain, off the coast of Alaska. They used an underwater microphone to send out whale calls, 'whup/throp' sounds, and received 36 responses suggesting that Twain was actively engaged in a communicative exchange. AI-powered algorithms analyzed the replies, revealing Twain may have shared a greeting call with the team on a boat in the Pacific Ocean. While speaking to a different species has never been done in this manner, researchers hope to use the experience to one day converse with extraterrestrial life.

communication, twain, whale, (14 more...)

Daily Mail - Science & tech

Country:

Pacific Ocean (0.26)
North America > United States > Alaska (0.26)
North America > United States > California (0.25)
North America > United States > Hawaii (0.05)

Genre: Research Report (0.30)

Technology: Information Technology > Artificial Intelligence (0.92)

NASA confirms object that struck Florida home came from pallet of batteries intended to burn up in atmosphere

FOX News, Apr-17-2024, 00:54:58 GMT

NASA confirmed on Monday that an object that crashed into a Naples, Florida, home last month was a piece of hardware from the International Space Station that was supposed to burn up on re-entry before reaching the surface of Earth. Alejandro Otero said a piece of equipment from the International Space Station hit his Naples home, posting photos of the object on X in response to an astronomer who was tracking where and when the equipment would enter the Earth's atmosphere. Otero told the astronomer it looked like one of the pieces had missed Fort Myers, and landed inside his home. "Tore through the roof and went thru 2 floors," he posted on X, adding that it almost hit his son.

battery, nasa, pallet, (12 more...)

FOX News

Country:

North America > United States > Florida > Collier County > Naples (0.38)
Asia > Middle East > UAE (0.25)
Pacific Ocean (0.05)
North America > Central America (0.05)

Industry:

Government > Space Agency (1.00)
Government > Regional Government > North America Government > United States Government (0.97)

Technology: Information Technology > Artificial Intelligence (0.36)

Variational quantization for state space models

David, Etienne, Bellot, Jean, Corff, Sylvain Le

arXiv.org Artificial Intelligence, Apr-17-2024

Forecasting tasks using large datasets gathering thousands of heterogeneous time series are a crucial statistical problem in numerous sectors. The main challenge is to model a rich variety of time series, leverage any available external signals and provide sharp predictions with statistical guarantees. In this work, we propose a new forecasting model that combines discrete state space hidden Markov models with recent neural network architectures and training procedures inspired by vector quantized variational autoencoders. We introduce a variational discrete posterior distribution of the latent states given the observations and a two-stage training procedure to alternately train the parameters of the latent states and of the emission distributions. By learning a collection of emission laws and temporarily activating them depending on the hidden process dynamics, the proposed method makes it possible to explore large datasets and leverage available external signals. We assess the performance of the proposed method using several datasets and show that it outperforms other state-of-the-art solutions.

dataset, sylvain le corff variational quantization, time series, (12 more...)

arXiv.org Artificial Intelligence

2404.11117

Country:

Oceania > Australia (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
Europe > France (0.04)
(11 more...)

Genre: Research Report > Promising Solution (0.48)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
