Goto

Collaborating Authors

 Rule-Based Reasoning


Combating Insurance Fraud With Machine Learning

#artificialintelligence

Most insurance companies depend on human expertise and business rules-based software to protect themselves from fraud. And the drive for digital transformation and process automation means data and scenarios change faster than you can update the rules. Machine learning has the potential to allow insurers to move from the current state of "detect and react" to "predict and prevent." It excels at automating the process of taking large volumes of data, analysing multiple fraud indicators in parallel โ€“ which taken individually may often be quite normal โ€“ and finding potential fraud. Generally, there are two ways to teach or train a machine learning algorithm, which depend on the available data: supervised and unsupervised learning.


Automated Generation of Test Models from Semi-Structured Requirements

arXiv.org Machine Learning

[Context:] Model-based testing is an instrument for automated generation of test cases. It requires identifying requirements in documents, understanding them syntactically and semantically, and then translating them into a test model. One light-weight language for these test models are Cause-Effect-Graphs (CEG) that can be used to derive test cases. [Problem:] The creation of test models is laborious and we lack an automated solution that covers the entire process from requirement detection to test model creation. In addition, the majority of requirements is expressed in natural language (NL), which is hard to translate to test models automatically. [Principal Idea:] We build on the fact that not all NL requirements are equally unstructured. We found that 14 % of the lines in requirements documents of our industry partner contain "pseudo-code"-like descriptions of business rules. We apply Machine Learning to identify such semi-structured requirements descriptions and propose a rule-based approach for their translation into CEGs. [Contribution:] We make three contributions: (1) an algorithm for the automatic detection of semi-structured requirements descriptions in documents, (2) an algorithm for the automatic translation of the identified requirements into a CEG and (3) a study demonstrating that our proposed solution leads to 86 % time savings for test model creation without loss of quality.


Compliance Change Tracking in Business Process Services

arXiv.org Machine Learning

--Regulatory compliance is an organization's adherence to laws, regulations, guidelines and specifications relevant to its business. Compliance officers responsible for maintaining adherence constantly struggle to keep up with the large amount of changes in regulatory requirements. Keeping up with the changes entail two main tasks: fetching the regulatory announcements that actually contain changes of interest, and incorporating those changes in the business process. In this paper we focus on the first task, and present a Compliance Change Tracking System, that gathers regulatory announcements from government sites, news sites, email subscriptions; classifies their importance i.e Actionability through a hierarchical classifier, and business process applicability through a multi-class classifier . Na ฤฑve Bayes, logistic regression etc.), hierarchical classification method, rule based approach, hybrid approach with various preprocessing and feature selection methods; and show that despite the richness of other models, a simple hierarchical classification with bag-of-words features works the best for Actionability classifier and multi-class logistic regression works the best for Applicability classifier . The system has been deployed in global delivery centers, and has received positive feedback from payroll compliance officers. Organizations are faced with rapidly changing regulatory policies, and ever-growing number of regulations.


Efficient Discovery of Expressive Multi-label Rules using Relaxed Pruning

arXiv.org Artificial Intelligence

Being able to model correlations between labels is considered crucial in multi-label classification. Rule-based models enable to expose such dependencies, e.g., implications, subsumptions, or exclusions, in an interpretable and human-comprehensible manner. Albeit the number of possible label combinations increases exponentially with the number of available labels, it has been shown that rules with multiple labels in their heads, which are a natural form to model local label dependencies, can be induced efficiently by exploiting certain properties of rule evaluation measures and pruning the label search space accordingly. However, experiments have revealed that multi-label heads are unlikely to be learned by existing methods due to their restrictiveness. To overcome this limitation, we propose a plug-in approach that relaxes the search space pruning used by existing methods in order to introduce a bias towards larger multi-label heads resulting in more expressive rules. We further demonstrate the effectiveness of our approach empirically and show that it does not come with drawbacks in terms of training time or predictive performance.


Are You for Real? Detecting Identity Fraud via Dialogue Interactions

arXiv.org Artificial Intelligence

Identity fraud detection is of great importance in many real-world scenarios such as the financial industry. However, few studies addressed this problem before. In this paper, we focus on identity fraud detection in loan applications and propose to solve this problem with a novel interactive dialogue system which consists of two modules. One is the knowledge graph (KG) constructor organizing the personal information for each loan applicant. The other is structured dialogue management that can dynamically generate a series of questions based on the personal KG to ask the applicants and determine their identity states. We also present a heuristic user simulator based on problem analysis to evaluate our method. Experiments have shown that the trainable dialogue system can effectively detect fraudsters, and achieve higher recognition accuracy compared with rule-based systems. Furthermore, our learned dialogue strategies are interpretable and flexible, which can help promote real-world applications.


ED2: Two-stage Active Learning for Error Detection -- Technical Report

arXiv.org Machine Learning

Traditional error detection approaches require user-defined parameters and rules. Thus, the user has to know both the error detection system and the data. However, we can also formulate error detection as a semi-supervised classification problem that only requires domain expertise. The challenges for such an approach are twofold: (1) to represent the data in a way that enables a classification model to identify various kinds of data errors, and (2) to pick the most promising data values for learning. In this paper, we address these challenges with ED2, our new example-driven error detection method. First, we present a new two-dimensional multi-classifier sampling strategy for active learning. Second, we propose novel multi-column features. The combined application of these techniques provides fast convergence of the classification task with high detection accuracy. On several real-world datasets, ED2 requires, on average, less than 1% labels to outperform existing error detection approaches. This report extends the peer-reviewed paper "ED2: A Case for Active Learning in Error Detection". All source code related to this project is available on GitHub.


Artificial Intelligence and Machine Learning Empowering Business Growth for Entrepreneurs Today

#artificialintelligence

In 2018 alone, AI-related startups and ventures have secured $9.3 billion in VC funding, according to a report from PwC and CB Insights. The Indian unicorns PayTm, Swiggy, and Oyo have been investing resources to gain AI capabilities and have acquired at least one AI company. In 2018, VCs have funded Indian AI startups with USD 478.38 million in 111 funding rounds. One of the reasons why AI & machine learning-based technologies on the rise are that the competitive advantages one can develop using them, especially with the customer experience and cost optimization. In e-commerce, for example, understanding consumer behaviour and product demand and making the right offer at the right time can be the difference between winning or losing over the competition. AI in the early stages was mostly based on rule-based systems, whose ability to deliver value is limited by how well the rules are defined, which requires human expertise.


Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review

arXiv.org Artificial Intelligence

Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using ICD-10. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest (n=14). This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which focus more on unstructured data and consequently have seen a stronger focus of NLP. The review has shown that there is a significant increase in the use of machine learning methods compared to rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype with only a handful of papers addressing extraction of comorbidities from the free text or integration of clinical notes with structured data. There is a notable use of relatively simple methods, such as shallow classifiers (or combination with rule-based methods), due to the interpretability of predictions, which still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes. Further efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.


Reasoning-Driven Question-Answering for Natural Language Understanding

arXiv.org Artificial Intelligence

Natural language understanding (NLU) of text is a fundamental challenge in AI, and it has received significant attention throughout the history of NLP research. This primary goal has been studied under different tasks, such as Question Answering (QA) and Textual Entailment (TE). In this thesis, we investigate the NLU problem through the QA task and focus on the aspects that make it a challenge for the current state-of-the-art technology. This thesis is organized into three main parts: In the first part, we explore multiple formalisms to improve existing machine comprehension systems. We propose a formulation for abductive reasoning in natural language and show its effectiveness, especially in domains with limited training data. Additionally, to help reasoning systems cope with irrelevant or redundant information, we create a supervised approach to learn and detect the essential terms in questions. In the second part, we propose two new challenge datasets. In particular, we create two datasets of natural language questions where (i) the first one requires reasoning over multiple sentences; (ii) the second one requires temporal common sense reasoning. We hope that the two proposed datasets will motivate the field to address more complex problems. In the final part, we present the first formal framework for multi-step reasoning algorithms, in the presence of a few important properties of language use, such as incompleteness, ambiguity, etc. We apply this framework to prove fundamental limitations for reasoning algorithms. These theoretical results provide extra intuition into the existing empirical evidence in the field.


LoRMIkA: Local Rule-based Model Interpretability with k-optimal Associations

arXiv.org Artificial Intelligence

--As we rely more and more on machine learning models for real-life decision-making, being able to understand and trust the predictions becomes ever more important. Local explainer models have recently been introduced to explain the predictions of complex machine learning models at the instance level. In this paper, we propose Local Rule-based Model In-terpretability with k-optimal Associations (LoRMIkA), a novel model-agnostic approach that obtains k-optimal association rules from a neighbourhood of the instance to be explained. Compared to other rule-based approaches in the literature, we argue that the most predictive rules are not necessarily the rules that provide the best explanations. Consequently, the LoRMIkA framework provides a flexible way to obtain predictive and interesting rules. It uses an efficient search algorithm guaranteed to find the k-optimal rules with respect to objectives such as strength, lift, leverage, coverage, and support. It also provides multiple rules which explain the decision and counterfactual rules, which give indications for potential changes to obtain different outputs for given instances. We compare our approach to other state-of-the-art approaches in local model interpretability on three different datasets, and achieve competitive results in terms of local accuracy and interpretability. Explainability of machine learning models is becoming ever more important; for example, the European General Data Protection Regulation from 2018 contains a right to explanation concept for any decision provided by predictive models. Recently, model-agnostic interpretability methods have been introduced that offer such explanations for predictions.