Goto

Collaborating Authors

 Pacific Ocean


LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) require continual knowledge updates to stay abreast of the ever-changing world facts, prompting the formulation of lifelong model editing task. While recent years have witnessed the development of various techniques for single and batch editing, these methods either fail to apply or perform sub-optimally when faced with lifelong editing. In this paper, we introduce LEMoE, an advanced Mixture of Experts (MoE) adaptor for lifelong model editing. We first analyze the factors influencing the effectiveness of conventional MoE adaptor in lifelong editing, including catastrophic forgetting, inconsistent routing and order sensitivity. Based on these insights, we propose a tailored module insertion method to achieve lifelong editing, incorporating a novel KV anchor routing to enhance routing consistency between training and inference stage, along with a concise yet effective clustering-based editing order planning. Experimental results demonstrate the effectiveness of our method in lifelong editing, surpassing previous model editing techniques while maintaining outstanding performance in batch editing task. Our code will be available.


CLIMATELI: Evaluating Entity Linking on Climate Change Data

arXiv.org Artificial Intelligence

Climate Change (CC) is a pressing topic of global importance, attracting increasing attention across research fields, from social sciences to Natural Language Processing (NLP). CC is also discussed in various settings and communication platforms, from academic publications to social media forums. Understanding who and what is mentioned in such data is a first critical step to gaining new insights into CC. We present CLIMATELI (CLIMATe Entity LInking), the first manually annotated CC dataset that links 3,087 entity spans to Wikipedia. Using CLIMATELI (CLIMATe Entity LInking), we evaluate existing entity linking (EL) systems on the CC topic across various genres and propose automated filtering methods for CC entities. We find that the performance of EL models notably lags behind humans at both token and entity levels. Testing within the scope of retaining or excluding non-nominal and/or non-CC entities particularly impacts the models' performances.


Banishing LLM Hallucinations Requires Rethinking Generalization

arXiv.org Artificial Intelligence

Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional approaches fail to explain why LLMs hallucinate in practice. Specifically, we show that LLMs augmented with a massive Mixture of Memory Experts (MoME) can easily memorize large datasets of random numbers. We corroborate these experimental findings with a theoretical construction showing that simple neural networks trained to predict the next token hallucinate when the training loss is above a threshold as it usually does in practice when training on internet scale data. We interpret our findings by comparing against traditional retrieval methods for mitigating hallucinations. We use our findings to design a first generation model for removing hallucinations -- Lamini-1 -- that stores facts in a massive mixture of millions of memory experts that are retrieved dynamically.


Bosch Street Dataset: A Multi-Modal Dataset with Imaging Radar for Automated Driving

arXiv.org Artificial Intelligence

This paper introduces the Bosch street dataset (BSD), a novel multi-modal large-scale dataset aimed at promoting highly automated driving (HAD) and advanced driver-assistance systems (ADAS) research. Unlike existing datasets, BSD offers a unique integration of high-resolution imaging radar, lidar, and camera sensors, providing unprecedented 360-degree coverage to bridge the current gap in high-resolution radar data availability. Spanning urban, rural, and highway environments, BSD enables detailed exploration into radar-based object detection and sensor fusion techniques. The dataset is aimed at facilitating academic and research collaborations between Bosch and current and future partners. This aims to foster joint efforts in developing cutting-edge HAD and ADAS technologies. The paper describes the dataset's key attributes, including its scalability, radar resolution, and labeling methodology. Key offerings also include initial benchmarks for sensor modalities and a development kit tailored for extensive data analysis and performance evaluation, underscoring our commitment to contributing valuable resources to the HAD and ADAS research community.


Sphere Neural-Networks for Rational Reasoning

arXiv.org Artificial Intelligence

The success of Large Language Models (LLMs), e.g., ChatGPT, is witnessed by their planetary popularity, their capability of human-like communication, and also by their steadily improved reasoning performance. However, it remains unclear whether LLMs reason. It is an open problem how traditional neural networks can be qualitatively extended to go beyond the statistic paradigm and achieve high-level cognition. Here, we present a novel qualitative extension by generalising computational building blocks from vectors to spheres. We propose Sphere Neural Networks (SphNNs) for human-like reasoning through model construction and inspection, and develop SphNN for syllogistic reasoning, a microcosm of human rationality. SphNN is a hierarchical neuro-symbolic Kolmogorov-Arnold geometric GNN, and uses a neuro-symbolic transition map of neighbourhood spatial relations to transform the current sphere configuration towards the target. SphNN is the first neural model that can determine the validity of long-chained syllogistic reasoning in one epoch without training data, with the worst computational complexity of O(N). SphNN can evolve into various types of reasoning, such as spatio-temporal reasoning, logical reasoning with negation and disjunction, event reasoning, neuro-symbolic unification, and humour understanding (the highest level of cognition). All these suggest a new kind of Herbert A. Simon's scissors with two neural blades. SphNNs will tremendously enhance interdisciplinary collaborations to develop the two neural blades and realise deterministic neural reasoning and human-bounded rationality and elevate LLMs to reliable psychological AI. This work suggests that the non-zero radii of spheres are the missing components that prevent traditional deep-learning systems from reaching the realm of rational reasoning and cause LLMs to be trapped in the swamp of hallucination.


Are Language Models Actually Useful for Time Series Forecasting?

arXiv.org Artificial Intelligence

Large language models (LLMs) are being applied to time series tasks, particularly time series forecasting. However, are language models actually useful for time series? After a series of ablation studies on three recent and popular LLM-based time series forecasting methods, we find that removing the LLM component or replacing it with a basic attention layer does not degrade the forecasting results -- in most cases the results even improved. We also find that despite their significant computational cost, pretrained LLMs do no better than models trained from scratch, do not represent the sequential dependencies in time series, and do not assist in few-shot settings. Additionally, we explore time series encoders and reveal that patching and attention structures perform similarly to state-of-the-art LLM-based forecasters.


How underwater drones could shape a potential Taiwan-China conflict

MIT Technology Review

The report's authors detail a number of ways that use of drones in any South China Sea conflict would differ starkly from current practices, most notably in the war in Ukraine, often called the first full-scale drone war. Since Russia invaded Ukraine in 2022, drones have been aiding in what military experts describe as the first three steps of the "kill chain"--finding, targeting, and tracking a target--as well as in delivering explosives. The drones have a short life span, since they are often shot down or made useless by frequency jamming devices that prevent pilots from controlling them. Quadcopters--the commercially available drones often used in the war--last just three flights on average, according to the report. Drones like these would be far less useful in a possible invasion of Taiwan.


Autonomous Decision Making for Air Taxi Networks

arXiv.org Artificial Intelligence

Future urban air mobility systems are expected to be operated by rideshare companies as fleets, which will require fully autonomous air traffic control systems and an order of magnitude increase in airspace capacity. Such a system must not only be safe, but also highly responsive to customer demand. This paper proposes the air traffic network problem (ATNP), which models the optimization problem of future cooperative air taxi networks. We propose a three-phase decision making model that efficiently assigns vehicles to passengers, determines flight levels to reduce collision risk, and resolves aircraft conflicts by selectively applying Monte Carlo tree search. We develop a simulator for the ATNP and show that our approach has increased safety and reduced passenger waiting time compared to greedy and first-dispatch protocols over potential vertiport layouts across the Bay Area and New York City.


A Hopfieldian View-based Interpretation for Chain-of-Thought Reasoning

arXiv.org Artificial Intelligence

Chain-of-Thought (CoT) holds a significant place in augmenting the reasoning performance for large language models (LLMs). While some studies focus on improving CoT accuracy through methods like retrieval enhancement, yet a rigorous explanation for why CoT achieves such success remains unclear. In this paper, we analyze CoT methods under two different settings by asking the following questions: (1) For zero-shot CoT, why does prompting the model with "let's think step by step" significantly impact its outputs? (2) For few-shot CoT, why does providing examples before questioning the model could substantially improve its reasoning ability? To answer these questions, we conduct a top-down explainable analysis from the Hopfieldian view and propose a Read-and-Control approach for controlling the accuracy of CoT. Through extensive experiments on seven datasets for three different tasks, we demonstrate that our framework can decipher the inner workings of CoT, provide reasoning error localization, and control to come up with the correct reasoning path.


TSI-Bench: Benchmarking Time Series Imputation

arXiv.org Artificial Intelligence

Effective imputation is a crucial preprocessing step for time series analysis. Despite the development of numerous deep learning algorithms for time series imputation, the community lacks standardized and comprehensive benchmark platforms to effectively evaluate imputation performance across different settings. Moreover, although many deep learning forecasting algorithms have demonstrated excellent performance, whether their modeling achievements can be transferred to time series imputation tasks remains unexplored. To bridge these gaps, we develop TSI-Bench, the first (to our knowledge) comprehensive benchmark suite for time series imputation utilizing deep learning techniques. The TSI-Bench pipeline standardizes experimental settings to enable fair evaluation of imputation algorithms and identification of meaningful insights into the influence of domain-appropriate missingness ratios and patterns on model performance. Furthermore, TSI-Bench innovatively provides a systematic paradigm to tailor time series forecasting algorithms for imputation purposes. Our extensive study across 34,804 experiments, 28 algorithms, and 8 datasets with diverse missingness scenarios demonstrates TSI-Bench's effectiveness in diverse downstream tasks and potential to unlock future directions in time series imputation research and analysis.