Goto

Collaborating Authors

 South America


Leveraging LLMs for Hypothetical Deduction in Logical Inference: A Neuro-Symbolic Approach

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have exhibited remarkable potential across a wide array of reasoning tasks, including logical reasoning. Although massive efforts have been made to empower the logical reasoning ability of LLMs via external logical symbolic solvers, crucial challenges of the poor generalization ability to questions with different features and inevitable question information loss of symbolic solver-driven approaches remain unresolved. To mitigate these issues, we introduce LINA, a LLM-driven neuro-symbolic approach for faithful logical reasoning. By enabling an LLM to autonomously perform the transition from propositional logic extraction to sophisticated logical reasoning, LINA not only bolsters the resilience of the reasoning process but also eliminates the dependency on external solvers. Additionally, through its adoption of a hypothetical-deductive reasoning paradigm, LINA effectively circumvents the expansive search space challenge that plagues traditional forward reasoning methods. Empirical evaluations demonstrate that LINA substantially outperforms both established propositional logic frameworks and conventional prompting techniques across a spectrum of five logical reasoning tasks. Specifically, LINA achieves an improvement of 24.34% over LINC on the FOLIO dataset, while also surpassing prompting strategies like CoT and CoT-SC by up to 24.02%. Our code is available at https://github.com/wufeiwuwoshihua/nshy.


Paved or unpaved? A Deep Learning derived Road Surface Global Dataset from Mapillary Street-View Imagery

arXiv.org Artificial Intelligence

We have released an open dataset with global coverage on road surface characteristics (paved or unpaved) derived utilising 105 million images from the world's largest crowdsourcing-based street view platform, Mapillary, leveraging state-of-the-art geospatial AI methods. We propose a hybrid deep learning approach which combines SWIN-Transformer based road surface prediction and CLIP-and-DL segmentation based thresholding for filtering of bad quality images. The road surface prediction results have been matched and integrated with OpenStreetMap (OSM) road geometries. This study provides global data insights derived from maps and statistics about spatial distribution of Mapillary coverage and road pavedness on a continent and countries scale, with rural and urban distinction. This dataset expands the availability of global road surface information by over 3 million kilometers, now representing approximately 36% of the total length of the global road network. Most regions showed moderate to high paved road coverage (60-80%), but significant gaps were noted in specific areas of Africa and Asia. Urban areas tend to have near-complete paved coverage, while rural regions display more variability. Model validation against OSM surface data achieved strong performance, with F1 scores for paved roads between 91-97% across continents. Taking forward the work of Mapillary and their contributors and enrichment of OSM road attributes, our work provides valuable insights for applications in urban planning, disaster routing, logistics optimisation and addresses various Sustainable Development Goals (SDGS): especially SDGs 1 (No poverty), 3 (Good health and well-being), 8 (Decent work and economic growth), 9 (Industry, Innovation and Infrastructure), 11 (Sustainable cities and communities), 12 (Responsible consumption and production), and 13 (Climate action).


Can Knowledge Editing Really Correct Hallucinations?

arXiv.org Artificial Intelligence

Large Language Models (LLMs) suffer from hallucinations, referring to the nonfactual information in generated content, despite their superior capacities across tasks. Meanwhile, knowledge editing has been developed as a new popular paradigm to correct the erroneous factual knowledge encoded in LLMs with the advantage of avoiding retraining from scratch. However, one common issue of existing evaluation datasets for knowledge editing is that they do not ensure LLMs actually generate hallucinated answers to the evaluation questions before editing. When LLMs are evaluated on such datasets after being edited by different techniques, it is hard to directly adopt the performance to assess the effectiveness of different knowledge editing methods in correcting hallucinations. Thus, the fundamental question remains insufficiently validated: Can knowledge editing really correct hallucinations in LLMs? We proposed HalluEditBench to holistically benchmark knowledge editing methods in correcting real-world hallucinations. First, we rigorously construct a massive hallucination dataset with 9 domains, 26 topics and more than 6, 000 hallucinations. Then, we assess the performance of knowledge editing methods in a holistic way on five dimensions including Efficacy, Generalization, Portability, Locality, and Robustness. Through HalluEditBench, we have provided new insights into the potentials and limitations of different knowledge editing methods in correcting hallucinations, which could inspire future improvements and facilitate the progress in the field of knowledge editing. Considering Table 1: Performance measured by Accuracy (%) the high cost of retraining LLMs from scratch, of Llama2-7B before editing ("Pre-edit") and after knowledge editing has been designed as a new applying typical knowledge editing methods ("Postedit") paradigm to correct erroneous or outdated factual on common existing evaluation datasets. When such datasets are adopted to evaluate the performance of LLMs after being edited, it is hard to directly use the scores to judge the effectiveness of different knowledge editing techniques in correcting hallucinations, which is the motivation of applying knowledge editing to LLMs. To better illustrate this point, following the evaluation setting in (Zhang et al., 2024e), we conducted a preliminary study to examine the pre-edit and post-edit performances of Llama2-7B on the aforementioned Who is the Chief Scientist of OpenAI? Who is the Chief Scientist of OpenAI? Who is the Chief Scientist of OpenAI?


NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models

arXiv.org Artificial Intelligence

Hallucinations in Large Language Models (LLMs) remain a major obstacle, particularly in high-stakes applications where factual accuracy is critical. While representation editing and reading methods have made strides in reducing hallucinations, their heavy reliance on specialised tools and training on in-domain samples, makes them difficult to scale and prone to overfitting. This limits their accuracy gains and generalizability to diverse datasets. This paper presents a lightweight method, Norm Voting (NoVo), which harnesses the untapped potential of attention head norms to dramatically enhance factual accuracy in zero-shot multiple-choice questions (MCQs). NoVo begins by automatically selecting truth-correlated head norms with an efficient, inference-only algorithm using only 30 random samples, allowing NoVo to effortlessly scale to diverse datasets. Afterwards, selected head norms are employed in a simple voting algorithm, which yields significant gains in prediction accuracy. NoVo demonstrates exceptional generalization to 20 diverse datasets, with significant gains in over 90% of them, far exceeding all current representation editing and reading methods. NoVo also reveals promising gains to finetuning strategies and building textual adversarial defence. NoVo's effectiveness with head norms opens new frontiers in LLM interpretability, robustness and reliability. One of the most significant challenges facing Large Language Models (LLMs) today is their tendency to hallucinate--outputs that are factually incorrect or entirely fabricated (Zhang et al., 2023b). This flaw is particularly serious in high-stakes applications like finance and healthcare, where even small errors can lead to huge losses and compromised patient safety (Kang & Liu, 2023; Pal et al., 2023). Reducing factual hallucinations is a critical research area with major practical benefits, essential for realising the full potential of LLMs to revolutionise these industries by enhancing efficiency and decision-making, and safeguarding against costly and harmful errors (Kaddour et al., 2023). Given these serious risks and the high cost of retraining LLMs, it is crucial to find affordable techniques to reduce factual hallucinations. Although inference techniques such as retrieval augmentation and prompt engineering work well, they come with significant limitations: latency and external dependencies, and the need for user expertise, respectively (Zhao et al., 2024; Sahoo et al., 2024). In response, we turn to representation editing and reading methods (REAR) (Zou et al., 2023), which operate within the model, ensuring rapid response times and eliminating the need for external data or user interaction.


EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

arXiv.org Artificial Intelligence

GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models. However, empowering Large Language Models to perceive and generate images, texts, and speeches end-to-end with publicly available data remains challenging in the open-source community. Existing vision-language models rely on external tools for the speech processing, while speech-language models still suffer from limited or even without vision-understanding abilities. To address this gap, we propose EMOVA (EMotionally Omni-present Voice Assistant), to enable Large Language Models with end-to-end speech capabilities while maintaining the leading vision-language performance. With a semantic-acoustic disentangled speech tokenizer, we notice surprisingly that omni-modal alignment can further enhance vision-language and speech abilities compared with the corresponding bi-modal aligned counterparts. Moreover, a lightweight style module is proposed for flexible speech style controls (e.g., emotions and pitches). For the first time, EMOVA achieves state-of-the-art performance on both the vision-language and speech benchmarks, and meanwhile, supporting omni-modal spoken dialogue with vivid emotions.


Towards Safe Multilingual Frontier AI

arXiv.org Artificial Intelligence

Linguistically inclusive LLMs -- which maintain good performance regardless of the language with which they are prompted -- are necessary for the diffusion of AI benefits around the world. Multilingual jailbreaks that rely on language translation to evade safety measures undermine the safe and inclusive deployment of AI systems. We provide policy recommendations to enhance the multilingual capabilities of AI while mitigating the risks of multilingual jailbreaks. We examine how a language's level of resourcing relates to how vulnerable LLMs are to multilingual jailbreaks in that language. We do this by testing five advanced AI models across 24 official languages of the EU. Building on prior research, we propose policy actions that align with the EU legal landscape and institutional framework to address multilingual jailbreaks, while promoting linguistic inclusivity. These include mandatory assessments of multilingual capabilities and vulnerabilities, public opinion research, and state support for multilingual AI development. The measures aim to improve AI safety and functionality through EU policy initiatives, guiding the implementation of the EU AI Act and informing regulatory efforts of the European AI Office.


Fourier Head: Helping Large Language Models Learn Complex Probability Distributions

arXiv.org Machine Learning

As the quality of large language models has improved, there has been increased interest in using them to model non-linguistic tokens. For example, the Decision Transformer recasts agentic decision making as a sequence modeling problem, using a decoder-only LLM to model the distribution over the discrete action space for an Atari agent. However, when adapting LLMs to non-linguistic domains, it remains unclear if softmax over discrete bins captures the continuous structure of the tokens and the potentially complex distributions needed for high quality token generation. We introduce a neural network layer, constructed using Fourier series, which we can easily substitute for any linear layer if we want the outputs to have a more continuous structure. We perform extensive analysis on synthetic datasets, as well as on large-scale decision making and time series forecasting tasks. We also provide theoretical evidence that this layer can better learn signal from data while ignoring high-frequency noise. All of our results support the effectiveness of our proposed Fourier head in scenarios where the underlying data distribution has a natural continuous structure. For example, the Fourier head improves a Decision Transformer agent's returns by 46% on the Atari Seaquest game, and increases a state-of-the-art times series foundation model's forecasting performance by 3.5% across 20 benchmarks unseen during training.


Meta is reportedly developing a search engine for its chatbot

Engadget

Stung from the hit it took from an Apple privacy feature three years ago, Meta is reportedly looking to decrease its dependence on Google and Microsoft. The Information said on Monday that Meta is developing a search engine for its chatbot. The company also recently partnered with Reuters to help its AI answer news-related questions. Meta has reportedly been working on indexing the web for at least eight months. The company's goal is said to be to integrate the indexes into Meta AI, giving the chatbot an alternative to Google Search and Microsoft Bing.


Deep Recurrent Stochastic Configuration Networks for Modelling Nonlinear Dynamic Systems

arXiv.org Machine Learning

Deep learning techniques have shown promise in many domain applications. This paper proposes a novel deep reservoir computing framework, termed deep recurrent stochastic configuration network (DeepRSCN) for modelling nonlinear dynamic systems. DeepRSCNs are incrementally constructed, with all reservoir nodes directly linked to the final output. The random parameters are assigned in the light of a supervisory mechanism, ensuring the universal approximation property of the built model. The output weights are updated online using the projection algorithm to handle the unknown dynamics. Given a set of training samples, DeepRSCNs can quickly generate learning representations, which consist of random basis functions with cascaded input and readout weights. Experimental results over a time series prediction, a nonlinear system identification problem, and two industrial data predictive analyses demonstrate that the proposed DeepRSCN outperforms the single-layer network in terms of modelling efficiency, learning capability, and generalization performance.


Predicting sub-population specific viral evolution

arXiv.org Artificial Intelligence

Forecasting the change in the distribution of viral variants is crucial for therapeutic design and disease surveillance. This task poses significant modeling challenges due to the sharp differences in virus distributions across sub-populations (e.g., countries) and their dynamic interactions. Existing machine learning approaches that model the variant distribution as a whole are incapable of making location-specific predictions and ignore transmissions that shape the viral landscape. In this paper, we propose a sub-population specific protein evolution model, which predicts the time-resolved distributions of viral proteins in different locations. The algorithm explicitly models the transmission rates between sub-populations and learns their interdependence from data. The change in protein distributions across all sub-populations is defined through a linear ordinary differential equation (ODE) parametrized by transmission rates. Solving this ODE yields the likelihood of a given protein occurring in particular sub-populations. Multi-year evaluation on both SARS-CoV-2 and influenza A/H3N2 demonstrates that our model outperforms baselines in accurately predicting distributions of viral proteins across continents and countries. We also find that the transmission rates learned from data are consistent with the transmission pathways discovered by retrospective phylogenetic analysis.