Akoka
Domain Knowledge in Artificial Intelligence: Using Conceptual Modeling to Increase Machine Learning Accuracy and Explainability
Storey, V. C., Parsons, J., Castellanos, A., Tremblay, M., Lukyanenko, R., Maass, W., Castillo, A.
Machine learning enables the extraction of useful information from large, diverse datasets. However, despite many successful applications, machine learning continues to suffer from performance and transparency issues. These challenges can be partially attributed to the limited use of domain knowledge by machine learning models. This research proposes using the domain knowledge represented in conceptual models to improve the preparation of the data used to train machine learning models. We develop and demonstrate a method, called the Conceptual Modeling for Machine Learning (CMML), which is comprised of guidelines for data preparation in machine learning and based on conceptual modeling constructs and principles. To assess the impact of CMML on machine learning outcomes, we first applied it to two real-world problems to evaluate its impact on model performance. We then solicited an assessment by data scientists on the applicability of the method. These results demonstrate the value of CMML for improving machine learning outcomes.
CREDAL: Close Reading of Data Models
Fletcher, George, Nahurna, Olha, Prytula, Matvii, Stoyanovich, Julia
Data models are necessary for the birth of data and of any data-driven system. Indeed, every algorithm, every machine learning model, every statistical model, and every database has an underlying data model without which the system would not be usable. Hence, data models are excellent sites for interrogating the (material, social, political, ...) conditions giving rise to a data system. Towards this, drawing inspiration from literary criticism, we propose to closely read data models in the same spirit as we closely read literary artifacts. Close readings of data models reconnect us with, among other things, the materiality, the genealogies, the techne, the closed nature, and the design of technical systems. While recognizing from literary theory that there is no one correct way to read, it is nonetheless critical to have systematic guidance for those unfamiliar with close readings. This is especially true for those trained in the computing and data sciences, who too often are enculturated to set aside the socio-political aspects of data work. A systematic methodology for reading data models currently does not exist. To fill this gap, we present the CREDAL methodology for close readings of data models. We detail our iterative development process and present results of a qualitative evaluation of CREDAL demonstrating its usability, usefulness, and effectiveness in the critical study of data.
Artificial Intelligence (AI) in Legal Data Mining
Deroy, Aniket, Bailung, Naksatra Kumar, Ghosh, Kripabandhu, Ghosh, Saptarshi, Chakraborty, Abhijnan
Despite the availability of vast amounts of data, legal data is often unstructured, making it difficult even for law practitioners to ingest and comprehend the same. It is important to organise the legal information in a way that is useful for practitioners and downstream automation tasks. The word ontology was used by Greek philosophers to discuss concepts of existence, being, becoming and reality. Today, scientists use this term to describe the relation between concepts, data, and entities. A great example for a working ontology was developed by Dhani and Bhatt. This ontology deals with Indian court cases on intellectual property rights (IPR) The future of legal ontologies is likely to be handled by computer experts and legal experts alike.
Empowering remittance management in the digitised landscape: A real-time Data-Driven Decision Support with predictive abilities for financial transactions
Weerawarna, Rashikala, Miah, Shah J
The advent of Blockchain technology (BT) revolutionised the way remittance transactions are recorded. Banks and remittance organisations have shown a growing interest in exploring blockchain's potential advantages over traditional practices. This paper presents a data-driven predictive decision support approach as an innovative artefact designed for the blockchain-oriented remittance industry. Employing a theory-generating Design Science Research (DSR) approach, we have uncovered the emergence of predictive capabilities driven by transactional big data. The artefact integrates predictive analytics and Machine Learning (ML) to enable real-time remittance monitoring, empowering management decision-makers to address challenges in the uncertain digitised landscape of blockchain-oriented remittance companies. Bridging the gap between theory and practice, this research not only enhances the security of the remittance ecosystem but also lays the foundation for future predictive decision support solutions, extending the potential of predictive analytics to other domains. Additionally, the generated theory from the artifact's implementation enriches the DSR approach and fosters grounded and stakeholder theory development in the information systems domain.
Estimating the time-lapse between medical insurance reimbursement with non-parametric regression models
Akinyemi, Mary, Yinka-Banjo, Chika, Ugot, Ogban-Asuquo, Nwachuku, Akwarandu Ugo
Nonparametric supervised learning algorithms represent a succinct class of supervised learning algorithms where the learning parameters are highly flexible and whose values are directly dependent on the size of the training data. In this paper, we comparatively study the properties of four nonparametric algorithms, K-Nearest Neighbours (KNNs), Support Vector Machines (SVMs), Decision trees and Random forests. The supervised learning task is a regression estimate of the time lapse in medical insurance reimbursement. Our study is concerned precisely with how well each of the nonparametric regression models fits the training data. We quantify the goodness of fit using the R-squared metric. The results are presented with a focus on the effect of the size of the training data, the feature space dimension and hyperparameter optimization. The findings suggest k-NN's and SVM's algorithms as better models in predicting welldefined output labels (i.e,