Rule-Based Reasoning
AutoFR: Automated Filter Rule Generation for Adblocking
Le, Hieu, Elmalaki, Salma, Markopoulou, Athina, Shafiq, Zubair
Adblocking relies on filter lists, which are manually curated and maintained by a community of filter list authors. Filter list curation is a laborious process that does not scale well to a large number of sites or over time. In this paper, we introduce AutoFR, a reinforcement learning framework to fully automate the process of filter rule creation and evaluation for sites of interest. We design an algorithm based on multi-arm bandits to generate filter rules that block ads while controlling the trade-off between blocking ads and avoiding visual breakage. We test AutoFR on thousands of sites and we show that it is efficient: it takes only a few minutes to generate filter rules for a site of interest. AutoFR is effective: it generates filter rules that can block 86% of the ads, as compared to 87% by EasyList, while achieving comparable visual breakage. Furthermore, AutoFR generates filter rules that generalize well to new sites. We envision that AutoFR can assist the adblocking community in filter rule generation at scale.
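The bandit formulation above can be illustrated with a minimal sketch. It uses the standard UCB1 strategy, which is an assumption: the abstract says only that the algorithm is based on multi-arm bandits. The candidate rules, their ad-blocking and breakage numbers, and the reward definition (ads blocked minus a breakage penalty) are all hypothetical stand-ins for measurements that a real system would obtain by loading the site.

```python
import math

# Hypothetical candidate rules: (fraction of ads blocked, fraction of page broken).
CANDIDATE_RULES = {
    "||ads.example.com^": (0.90, 0.00),
    "||cdn.example.com^": (0.95, 0.40),
    "||tracker.example.net^": (0.20, 0.00),
}

def reward(rule, breakage_penalty=2.0):
    """Hypothetical reward: ads blocked, minus a penalty for visual breakage."""
    blocked, broken = CANDIDATE_RULES[rule]
    return blocked - breakage_penalty * broken

def ucb1(arms, reward_fn, n_rounds):
    """Standard UCB1: try each arm once, then pick the highest upper confidence bound."""
    counts = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}
    for t in range(1, n_rounds + 1):
        untried = [a for a in arms if counts[a] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(
                arms,
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        r = reward_fn(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
    return counts, means

counts, means = ucb1(list(CANDIDATE_RULES), reward, n_rounds=300)
best_rule = max(counts, key=counts.get)  # the rule the bandit pulled most often
```

The penalty weight is the knob that trades ad blocking against breakage in this sketch: raising it makes rules that break the page less attractive even when they block more ads.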
Neural Compositional Rule Learning for Knowledge Graph Reasoning
Cheng, Kewei, Ahmed, Nesreen K., Sun, Yizhou
Learning logical rules is critical to improving reasoning in KGs. This is due to their ability to provide logical and interpretable explanations when used for predictions, as well as their ability to generalize to other tasks, domains, and data. While recent methods have been proposed to learn logical rules, the majority of these methods are either restricted by their computational complexity and cannot handle the large search space of large-scale KGs, or show poor generalization when exposed to data outside the training set. In this paper, we propose an end-to-end neural model for learning compositional logical rules called NCRL. By recurrently merging compositions in the rule body with a recurrent attention unit, NCRL finally predicts a single rule head. Experimental results show that NCRL learns high-quality rules, as well as being generalizable. Specifically, we show that NCRL is scalable, efficient, and yields state-of-the-art results for knowledge graph completion on large-scale KGs. Moreover, we test NCRL for systematic generalization by learning to reason on small-scale observed graphs and evaluating on larger unseen ones. Knowledge Graphs (KGs) provide a structured representation of real-world facts (Ji et al., 2021), and they are remarkably useful in various applications (Graupmann et al., 2005; Lukovnikov et al., 2017; Xiong et al., 2017; Yih et al., 2015). Since KGs are usually incomplete, KG reasoning is a crucial problem in KGs, where the goal is to infer the missing knowledge using the observed facts. This paper investigates how to learn logical rules for KG reasoning.
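To make the idea of a logical rule inferring missing facts concrete, here is a minimal sketch of applying one already-known compositional rule to a toy knowledge graph. It does not show how NCRL learns rules; the entities, relation names, and the rule itself are hypothetical.

```python
def apply_chain_rule(triples, body, head):
    """Apply r1(x, y) AND r2(y, z) -> head(x, z) and return only the new facts."""
    r1, r2 = body
    # Index the second body relation by its subject for quick joins.
    successors = {}
    for subj, rel, obj in triples:
        if rel == r2:
            successors.setdefault(subj, set()).add(obj)
    inferred = set()
    for x, rel, y in triples:
        if rel == r1:
            for z in successors.get(y, ()):
                inferred.add((x, head, z))
    return inferred - set(triples)  # drop facts that were already observed

# Toy, incomplete knowledge graph (hypothetical facts).
kg = {
    ("alice", "born_in", "paris"),
    ("paris", "city_of", "france"),
    ("bob", "born_in", "lyon"),
    ("lyon", "city_of", "france"),
    ("bob", "nationality", "france"),
}

# Hypothetical rule: born_in(x, y) AND city_of(y, z) -> nationality(x, z)
new_facts = apply_chain_rule(kg, ("born_in", "city_of"), "nationality")
```

Here the rule body is a two-step path through the graph and the rule head is the single relation it implies, which is the shape of rule the abstract describes as compositional.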
A beginner's guide to machine learning: What it is and is it AI?
So, without further ado, let's get stuck in. Machine learning is a type of artificial intelligence that involves developing algorithms and models that can learn from data and then use what they've learned to make predictions or decisions. It aims to make it possible for computers to improve at a task over time without being told how to do so. In traditional programming, a programmer writes rules or instructions telling the computer how to solve a problem. In machine learning, on the other hand, the computer is fed data and learns to recognize patterns and relationships within that data to make predictions or decisions. This data-driven learning process is called "training", and its result is a machine learning model.
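The contrast between hand-written rules and training can be shown with a tiny sketch. The task (flagging messages by their number of links), the data, and the function names are all made up for illustration, and the "model" is just a single learned threshold.

```python
def hand_written_rule(n_links):
    """Traditional programming: the programmer picks the threshold."""
    return n_links > 3

def train_threshold(xs, ys):
    """Machine learning in miniature: pick the threshold with the fewest training errors."""
    candidates = sorted(set(xs))
    return min(
        candidates,
        key=lambda t: sum((x > t) != y for x, y in zip(xs, ys)),
    )

# Made-up training data: number of links in a message, and whether it was spam.
link_counts = [0, 1, 2, 5, 6, 8]
is_spam = [False, False, False, True, True, True]

learned_threshold = train_threshold(link_counts, is_spam)

def learned_model(n_links):
    """The trained model: same form as the rule, but the threshold came from data."""
    return n_links > learned_threshold
```

Both functions have the same shape; the difference is where the number comes from. If the data changed, retraining would move the learned threshold, while the hand-written rule would stay at 3 until someone edited it.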
RweetMiner: Automatic identification and categorization of help requests on twitter during disasters
Ullah, Irfan, Khan, Sharifullah, Imran, Muhammad, Lee, Young-Koo
Catastrophic events create uncertain situations for humanitarian organizations locating and providing aid to affected people. Many people turn to social media during disasters to request help and/or provide relief to others. However, the majority of social media posts seeking help are not properly detected and remain concealed because they are often noisy and ill-formed. Existing systems lack an effective strategy for preprocessing tweets and grasping their context. This research first formally defines request tweets in the context of social networking sites, hereafter rweets, along with their different primary types and sub-types. Our main contributions are the identification and categorization of rweets. For rweet identification, we employ two approaches, namely a rule-based approach and logistic regression, and show their high precision and F1 scores. Classifying rweets into sub-types such as medical, food, and shelter using logistic regression shows promising results and outperforms existing work. Finally, we introduce an architecture to store intermediate data to accelerate the development process of the machine learning classifiers.
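As a rough illustration of rule-based identification after preprocessing, the sketch below normalizes a tweet and then matches it against a few request patterns. The patterns and the normalization steps are hypothetical and are not the rule set or pipeline from the paper.

```python
import re

# Hypothetical request patterns, chosen only for illustration.
REQUEST_PATTERNS = [
    r"\b(need|needs|needed)\b",
    r"\bplease\s+(help|send)\b",
    r"\burgent(ly)?\b",
]

def normalize(tweet):
    """Lowercase, drop URLs and @mentions, unwrap hashtags, collapse whitespace."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = text.replace("#", " ")
    return re.sub(r"\s+", " ", text).strip()

def is_request(tweet):
    """Flag a tweet as a request if any pattern matches the normalized text."""
    text = normalize(tweet)
    return any(re.search(pattern, text) for pattern in REQUEST_PATTERNS)

examples = [
    "URGENT: we need insulin and water @redcross http://t.co/x",
    "Beautiful sunset over the bay today",
]
flags = [is_request(t) for t in examples]
```

Normalizing before matching is what lets the same small pattern list cope with noisy input such as mixed case, links, and mentions.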
PyReason: Software for Open World Temporal Logic
Aditya, Dyuman, Mukherji, Kaustuv, Balasubramanian, Srikar, Chaudhary, Abhiraj, Shakarian, Paulo
The growing popularity of neuro-symbolic reasoning has led to the adoption of various forms of differentiable (i.e., fuzzy) first-order logic. We introduce PyReason, a software framework based on generalized annotated logic that both captures the current cohort of differentiable logics and temporal extensions to support inference over finite periods of time with capabilities for open world reasoning. Further, PyReason is implemented to directly support reasoning over graphical structures (e.g., knowledge graphs, social networks, biological networks, etc.), produces fully explainable traces of inference, and includes various practical features such as type checking and a memory-efficient implementation. This paper reviews various extensions of generalized annotated logic integrated into our implementation, our modern, efficient Python-based implementation that conducts exact yet scalable deductive inference, and a suite of experiments. PyReason is available at: github.com/lab-v2/pyreason.
Identifying roadway departure crash patterns on rural two-lane highways under different lighting conditions: association knowledge using data mining approach
Hossain, Ahmed, Sun, Xiaoduan, Islam, Shahrin, Alam, Shah, Hossain, Md Mahmud
More than half of all fatalities on U.S. highways occur due to roadway departure (RwD) each year. Previous research has explored various risk factors that contribute to RwD crashes; however, the effect of lighting conditions has not been comprehensively investigated. Using the Louisiana Department of Transportation and Development crash database, fatal and injury RwD crashes occurring on rural two-lane (R2L) highways between 2008 and 2017 were analyzed based on daylight and dark (with/without streetlight). This research employed a safe system approach to explore meaningful complex interactions among multidimensional crash risk factors. To accomplish this, an unsupervised data mining algorithm, association rules mining (ARM), was used. Based on the generated rules, the findings reveal several interesting crash patterns in the daylight, dark-with-streetlight, and dark-no-streetlight conditions, emphasizing the importance of investigating RwD crash patterns depending on the lighting conditions. In daylight, fatal RwD crashes are associated with cloudy weather conditions, distracted drivers, standing water on the roadway, no seat belt use, and construction zones. In dark lighting conditions (with/without streetlight), the majority of the RwD crashes are associated with alcohol/drug involvement, young drivers (15-24 years), driver condition (e.g., inattentive, distracted, illness/fatigued/asleep) and collisions with animal(s). The findings reveal how certain driver behavior patterns are connected to RwD crashes, such as a strong association between alcohol/drug intoxication and no seat belt usage in the dark-no-streetlight condition. Based on the identified crash patterns and behavioral characteristics under different lighting conditions, the findings could aid researchers and safety specialists in developing the most effective RwD crash mitigation strategies.
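Association rules are usually ranked by support, confidence, and lift. The sketch below computes these three standard measures for one rule on a toy set of records. The records and attribute names are invented for illustration and are not drawn from the Louisiana crash database.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    s_ac = support(transactions, set(antecedent) | set(consequent))
    confidence = s_ac / s_a      # P(consequent | antecedent)
    lift = confidence / s_c      # > 1 means a positive association
    return s_ac, confidence, lift

# Invented toy records, one set of attributes per crash.
crashes = [
    {"dark_no_streetlight", "alcohol", "no_seat_belt"},
    {"dark_no_streetlight", "alcohol", "no_seat_belt"},
    {"dark_no_streetlight", "alcohol"},
    {"daylight", "no_seat_belt"},
    {"daylight", "distracted"},
    {"daylight"},
    {"dark_no_streetlight"},
    {"daylight", "distracted"},
]

sup, conf, lift = rule_metrics(
    crashes, {"dark_no_streetlight", "alcohol"}, {"no_seat_belt"}
)
# On this toy data: support = 2/8, confidence = 2/3, lift = 16/9
```

A lift above 1 indicates that the antecedent and consequent occur together more often than they would if they were independent, which is the sense in which a rule expresses an association rather than a cause.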
Automatic Heteronym Resolution Pipeline Using RAD-TTS Aligners
Huang, Jocelyn, Bakhturina, Evelina, Tatanov, Oktai
Grapheme-to-phoneme (G2P) transduction is part of the standard text-to-speech (TTS) pipeline. However, G2P conversion is difficult for languages that contain heteronyms -- words that have one spelling but can be pronounced in multiple ways. G2P datasets with annotated heteronyms are limited in size and expensive to create, as human labeling remains the primary method for heteronym disambiguation. We propose a RAD-TTS Aligner-based pipeline to automatically disambiguate heteronyms in datasets that contain both audio and text transcripts. The best pronunciation can be chosen by generating all possible candidates for each heteronym and scoring them with an Aligner model. The resulting labels can be used to create training datasets for use in both multi-stage and end-to-end G2P systems.
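The generate-and-score step can be sketched as follows. The scoring function here is a toy stand-in that compares each candidate with a phoneme string assumed to have been "heard" in the audio; it is not the RAD-TTS Aligner, and the candidate dictionary is a hypothetical example.

```python
# Hypothetical pronunciation candidates for two English heteronyms.
CANDIDATES = {
    "read": ["R IY D", "R EH D"],
    "lead": ["L IY D", "L EH D"],
}

def disambiguate(words, score_fn):
    """For each heteronym, try every candidate and keep the lowest-scoring one."""
    labels = {}
    for position, word in enumerate(words):
        if word in CANDIDATES:
            labels[position] = min(
                CANDIDATES[word],
                key=lambda phonemes: score_fn(position, phonemes),
            )
    return labels

def make_toy_scorer(heard):
    """Stand-in for an aligner score: count phonemes that differ from what was heard."""
    def score(position, phonemes):
        return sum(
            a != b for a, b in zip(phonemes.split(), heard[position].split())
        )
    return score

sentence = ["i", "read", "it", "yesterday"]
heard = {1: "R EH D"}  # assumed evidence from the audio at word position 1
labels = disambiguate(sentence, make_toy_scorer(heard))
```

In a real pipeline the score would come from how well each candidate transcript aligns with the recorded audio; only the select-the-best-candidate structure is meant to carry over from this sketch.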
A Grammar for the Representation of Unmanned Aerial Vehicles with 3D Topologies
Mallozzi, Piergiuseppe, Sibai, Hussein, Incer, Inigo, Seshia, Sanjit A., Sangiovanni-Vincentelli, Alberto
We propose a context-sensitive grammar for the systematic exploration of the design space of the topology of 3D robots, particularly unmanned aerial vehicles. It defines production rules for adding components to an incomplete design topology modeled over a 3D grid. The rules are local. The grammar is simple, yet capable of modeling most existing UAVs as well as novel ones. It can be easily generalized to other robotic platforms. It can be thought of as a building block for any design exploration and optimization algorithm.
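A minimal sketch of local, context-dependent production rules on a 3D grid is given below. The component symbols and the two rules are hypothetical; the paper's actual grammar is not reproduced here. Each rule looks only at the six face-adjacent neighbors of an empty cell, which is what makes it local.

```python
from itertools import product

# Offsets to the six face-adjacent cells.
NEIGHBOR_OFFSETS = [
    (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1),
]

# Hypothetical production rules: (symbol required in the neighborhood, symbol to place).
RULES = [
    ("fuselage", "connector"),
    ("connector", "rotor"),
]

def neighbor_symbols(grid, cell):
    """Symbols in the six neighboring cells; missing cells count as 'empty'."""
    x, y, z = cell
    return [
        grid.get((x + dx, y + dy, z + dz), "empty")
        for dx, dy, dz in NEIGHBOR_OFFSETS
    ]

def applicable_moves(grid, size):
    """List (cell, symbol) pairs for every rule whose local context matches."""
    moves = []
    for cell in product(range(size), repeat=3):
        if cell in grid:
            continue  # rules only add components to empty cells
        context = neighbor_symbols(grid, cell)
        for required, placed in RULES:
            if required in context:
                moves.append((cell, placed))
    return moves

# Start from an incomplete design: a single fuselage in the middle of a 3x3x3 grid.
design = {(1, 1, 1): "fuselage"}
first_moves = applicable_moves(design, size=3)

# Apply one production, then see which rules become applicable.
design[(2, 1, 1)] = "connector"
next_moves = applicable_moves(design, size=3)
rotor_moves = [m for m in next_moves if m[1] == "rotor"]
```

A design exploration algorithm could repeatedly call a function like `applicable_moves` and choose among the results, which is how a grammar of this kind can serve as a building block for search or optimization.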
A Sea of Words: An In-Depth Analysis of Anchors for Text Data
Lopardo, Gianluigi, Precioso, Frederic, Garreau, Damien
Anchors (Ribeiro et al., 2018) is a post-hoc, rule-based interpretability method. For text data, it proposes to explain a decision by highlighting a small set of words (an anchor) such that the model to explain has similar outputs when they are present in a document. In this paper, we present the first theoretical analysis of Anchors, considering that the search for the best anchor is exhaustive. After formalizing the algorithm for text classification, we present explicit results on different classes of models when the vectorization step is TF-IDF, and words are replaced by a fixed out-of-dictionary token when removed. Our inquiry covers models such as elementary if-then rules and linear classifiers. We then leverage this analysis to gain insights on the behavior of Anchors for any differentiable classifier. For neural networks, we empirically show that the words corresponding to the highest partial derivatives of the model with respect to the input, reweighted by the inverse document frequencies, are selected by Anchors.
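The idea of an exhaustive anchor search can be sketched on a toy if-then classifier. This is a simplification under stated assumptions: precision is computed by enumerating every perturbation rather than by sampling, the model works directly on word lists instead of TF-IDF vectors, and the threshold value and the classifier are hypothetical.

```python
from itertools import combinations, product

UNK = "UNK"  # fixed out-of-dictionary token used when a word is removed

def precision(model, words, anchor):
    """Share of perturbed documents, with anchor words kept, that get the same prediction."""
    free = [i for i in range(len(words)) if i not in anchor]
    target = model(words)
    hits = total = 0
    for keep_mask in product([True, False], repeat=len(free)):
        sample = list(words)
        for i, keep in zip(free, keep_mask):
            if not keep:
                sample[i] = UNK
        hits += model(sample) == target
        total += 1
    return hits / total

def shortest_anchor(model, words, threshold=0.95):
    """Exhaustively search anchors by increasing size; return the first precise enough."""
    for size in range(len(words) + 1):
        for anchor in combinations(range(len(words)), size):
            if precision(model, words, set(anchor)) >= threshold:
                return [words[i] for i in anchor]

def toy_model(words):
    """Hypothetical if-then classifier: positive only if both words are present."""
    return int("not" in words and "bad" in words)

document = ["not", "bad", "at", "all"]
anchor = shortest_anchor(toy_model, document)
```

On this document neither "not" nor "bad" alone keeps the prediction stable, so the search has to reach size two before the precision threshold is met.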
A comprehensive review of visualization methods for association rule mining: Taxonomy, Challenges, Open problems and Future ideas
Fister, Iztok Jr., Fister, Iztok, Fister, Dušan, Podgorelec, Vili, Salcedo-Sanz, Sancho
Association rule mining searches for relationships between attributes in transaction databases. The whole process of rule discovery is complex and involves pre-processing techniques, a rule mining step, and post-processing, in which visualization is carried out. Visualization of discovered association rules is an essential step within the association rule mining pipeline because it helps users understand the results of rule mining. Several association rule mining and visualization methods have been developed during the past decades. This review paper surveys the literature, identifies the main techniques published in peer-reviewed literature, examines each method's main features, and presents the main applications in the field. Defining the future steps of this research area is another goal of this review paper.