Goto

Collaborating Authors

 Rule-Based Reasoning


Generating Global and Local Explanations for Tree-Ensemble Learning Methods by Answer Set Programming

arXiv.org Artificial Intelligence

We propose a method for generating rule sets as global and local explanations for tree-ensemble learning methods using Answer Set Programming (ASP). To this end, we adopt a decompositional approach where the split structures of the base decision trees are exploited in the construction of rules, which in turn are assessed using pattern mining methods encoded in ASP to extract explanatory rules. For global explanations, candidate rules are chosen from the entire trained tree-ensemble models, whereas for local explanations, candidate rules are selected by only considering rules that are relevant to the particular predicted instance. We show how user-defined constraints and preferences can be represented declaratively in ASP to allow for transparent and flexible rule set generation, and how rules can be used as explanations to help the user better understand the models. Experimental evaluation with real-world datasets and popular tree-ensemble algorithms demonstrates that our approach is applicable to a wide range of classification tasks.


Improving Legal Entity Recognition Using a Hybrid Transformer Model and Semantic Filtering Approach

arXiv.org Artificial Intelligence

Legal Entity Recognition (LER) involves identifying key entities such as parties, dates, monetary amounts, and legal provisions from legal documents. Automating this process is crucial for improving efficiency in legal workflows, including contract review, compliance monitoring, and litigation support. Traditional Named Entity Recognition (NER) methods, such as rule-based systems and classical machine learning models like Conditional Random Fields (CRFs), require extensive feature engineering and struggle to adapt to new legal terminologies. Transformer-based models, particularly BERT [1], have shown great promise in various NLP tasks, including LER. **Legal-BERT**, a finetuned variant of BERT for legal texts, has demonstrated superior performance


Dual-AEB: Synergizing Rule-Based and Multimodal Large Language Models for Effective Emergency Braking

arXiv.org Artificial Intelligence

Abstract-- Automatic Emergency Braking (AEB) systems are a crucial component in ensuring the safety of passengers in autonomous vehicles. Through extensive experimentation, we have validated the effectiveness of our method. The Autonomous Emergency Braking (AEB) system is a critical safety feature in autonomous vehicles, designed to information, making it impossible to predict an impending mitigate or prevent collisions by automatically activating the collision. Similarly, while end-to-end methods process raw brakes when a potential collision is detected [1]. Numerous sensory data, they often lack the reasoning capacity to studies [1]-[5] have demonstrated the effectiveness of AEB interpret indirect cues--such as the illuminated brake lights systems, with reductions in rear-end collisions ranging from on the vehicle to the left of the ego vehicle--that may 25% to 50%.


Learning Compositional Rules via Neural Program Synthesis

Neural Information Processing Systems

Many aspects of human reasoning, including language, require learning rules from very little data. Humans can do this, often learning systematic rules from very few examples, and combining these rules to form compositional rule-based systems. Current neural architectures, on the other hand, often fail to generalize in a compositional manner, especially when evaluated in ways that vary systematically from training. In this work, we present a neuro-symbolic model which learns entire rule systems from a small set of examples. Instead of directly predicting outputs from inputs, we train our model to induce the explicit system of rules governing a set of previously seen examples, drawing upon techniques from the neural program synthesis literature.


Probabilistic Logic Neural Networks for Reasoning

Neural Information Processing Systems

Knowledge graph reasoning, which aims at predicting missing facts through reasoning with observed facts, is critical for many applications. Such a problem has been widely explored by traditional logic rule-based approaches and recent knowledge graph embedding methods. A principled logic rule-based approach is the Markov Logic Network (MLN), which is able to leverage domain knowledge with first-order logic and meanwhile handle uncertainty. However, the inference in MLNs is usually very difficult due to the complicated graph structures. TransE, DistMult) learn effective entity and relation embeddings for reasoning, which are much more effective and efficient. However, they are unable to leverage domain knowledge.


DRUM: End-To-End Differentiable Rule Mining On Knowledge Graphs

Neural Information Processing Systems

In this paper, we study the problem of learning probabilistic logical rules for inductive and interpretable link prediction. Despite the importance of inductive link prediction, most previous works focused on transductive link prediction and cannot manage previously unseen entities. Moreover, they are black-box models that are not easily explainable for humans. We propose DRUM, a scalable and differentiable approach for mining first-order logical rules from knowledge graphs that resolves these problems. We motivate our method by making a connection between learning confidence scores for each rule and low-rank tensor approximation. DRUM uses bidirectional RNNs to share useful information across the tasks of learning rules for different relations.


Learning Macroscopic Brain Connectomes via Group-Sparse Factorization

Neural Information Processing Systems

Mapping structural brain connectomes for living human brains typically requires expert analysis and rule-based models on diffusion-weighted magnetic resonance imaging. A data-driven approach, however, could overcome limitations in such rule-based approaches and improve precision mappings for individuals. In this work, we explore a framework that facilitates applying learning algorithms to automatically extract brain connectomes. Using a tensor encoding, we design an objective with a group-regularizer that prefers biologically plausible fascicle structure. We show that the objective is convex and has unique solutions, ensuring identifiable connectomes for an individual.


Intelligence at the Edge of Chaos

arXiv.org Artificial Intelligence

We explore the emergence of intelligent behavior in artificial systems by investigating how the complexity of rule-based systems influences the capabilities of models trained to predict these rules. Our study focuses on elementary cellular automata (ECA), simple yet powerful one-dimensional systems that generate behaviors ranging from trivial to highly complex. By training distinct Large Language Models (LLMs) on different ECAs, we evaluated the relationship between the complexity of the rules' behavior and the intelligence exhibited by the LLMs, as reflected in their performance on downstream tasks. Our findings reveal that rules with higher complexity lead to models exhibiting greater intelligence, as demonstrated by their performance on reasoning and chess move prediction tasks. Both uniform and periodic systems, and often also highly chaotic systems, resulted in poorer downstream performance, highlighting a sweet spot of complexity conducive to intelligence. We conjecture that intelligence arises from the ability to predict complexity and that creating intelligence may require only exposure to complexity.


Reviews: Multi-value Rule Sets for Interpretable Classification with Feature-Efficient Representations

Neural Information Processing Systems

The paper proposes learning sets of decision rules that can express the disjunction of feature values in atoms of the rules, for example, IF color yellow OR red, THEN stop. The emphasis is on interpretability, and the paper argues that these multi-value rules are more interpretable than similarly trained decision sets that do not support multi-value rules. Following prior work, the paper proposes placing a prior distribution over the parameters of the decision set, such as the number of rules and the maximum number of atoms in each rule. The paper derives bounds on the resulting distribution to accelerate a simulated annealing learning algorithm. Experiments show that multi-value rule sets are as accurate as other classifiers proposed as interpretable model classes, such as Bayesian rule sets on benchmark decision problems.


SWEb: A Large Web Dataset for the Scandinavian Languages

arXiv.org Artificial Intelligence

This paper presents the hitherto largest pretraining dataset for the Scandinavian languages: the Scandinavian WEb (SWEb), comprising over one trillion tokens. The paper details the collection and processing pipeline, and introduces a novel model-based text extractor that significantly reduces complexity in comparison with rule-based approaches. We also introduce a new cloze-style benchmark for evaluating language models in Swedish, and use this test to compare models trained on the SWEb data to models trained on FineWeb, with competitive results. All data, models and code are shared openly. Large language models have made significant strides in recent years due to their general capabilities in language-processing tasks. This progress has been largely driven by the development of extensive and high-quality pretraining datasets sourced from open web data (Wenzek et al., 2020; Brown et al., 2020; Abadji et al., 2022; Penedo et al., 2023; 2024). However, the majority of research aimed at improving pretraining data focuses on high-resource languages such as English. Our goal is to create a large-scale and high-performing open pretraining dataset specifically for the Scandinavian (north-germanic) languages: Swedish, Danish, Norwegian, and Icelandic. Existing large-scale datasets for these languages primarily include mC4 (Xue et al., 2021), OSCAR (Abadji et al., 2022), and HPLT Datasets 1.2 (de Gibert et al., 2024). The Scandinavian portion of mC4 comprises approximately 100B tokens, 10B tokens for OSCAR 23.01, and 35B tokens for HPLT, which are all relatively small numbers considering that state-of-the-art large language models today are trained on trillions of high-quality tokens.