Goto

Collaborating Authors

 Asturias




xpSHACL: Explainable SHACL Validation using Retrieval-Augmented Generation and Large Language Models

Publio, Gustavo Correa, Gayo, José Emilio Labra

arXiv.org Artificial Intelligence

Shapes Constraint Language (SHACL) is a powerful language for validating RDF data. Given the recent industry attention to Knowledge Graphs (KGs), more users need to validate linked data properly. However, traditional SHACL validation engines often provide terse reports in English that are difficult for non-technical users to interpret and act upon. This paper presents xpSHACL, an explainable SHACL validation system that addresses this issue by combining rule-based justification trees with retrieval-augmented generation (RAG) and large language models (LLMs) to produce detailed, multilanguage, human-readable explanations for constraint violations. A key feature of xpSHACL is its usage of a Violation KG to cache and reuse explanations, improving efficiency and consistency.


One Class Restricted Kernel Machines

Quadir, A., Sajid, M., Tanveer, M.

arXiv.org Artificial Intelligence

Restricted kernel machines (RKMs) have demonstrated a significant impact in enhancing generalization ability in the field of machine learning. Recent studies have introduced various methods within the RKM framework, combining kernel functions with the least squares support vector machine (LSSVM) in a manner similar to the energy function of restricted boltzmann machines (RBM), such that a better performance can be achieved. However, RKM's efficacy can be compromised by the presence of outliers and other forms of contamination within the dataset. These anomalies can skew the learning process, leading to less accurate and reliable outcomes. To address this critical issue and to ensure the robustness of the model, we propose the novel one-class RKM (OCRKM). In the framework of OCRKM, we employ an energy function akin to that of the RBM, which integrates both visible and hidden variables in a nonprobabilistic setting. The formulation of the proposed OCRKM facilitates the seamless integration of one-class classification method with the RKM, enhancing its capability to detect outliers and anomalies effectively. The proposed OCRKM model is evaluated over UCI benchmark datasets. Experimental findings and statistical analyses consistently emphasize the superior generalization capabilities of the proposed OCRKM model over baseline models across all scenarios.


Common Foundations for SHACL, ShEx, and PG-Schema

Ahmetaj, S., Boneva, I., Hidders, J., Hose, K., Jakubowski, M., Labra-Gayo, J. E., Martens, W., Mogavero, F., Murlak, F., Okulmus, C., Polleres, A., Savkovic, O., Simkus, M., Tomaszuk, D.

arXiv.org Artificial Intelligence

Graphs have emerged as an important foundation for a variety of applications, including capturing and reasoning over factual knowledge, semantic data integration, social networks, and providing factual knowledge for machine learning algorithms. To formalise certain properties of the data and to ensure data quality, there is a need to describe the schema of such graphs. Because of the breadth of applications and availability of different data models, such as RDF and property graphs, both the Semantic Web and the database community have independently developed graph schema languages: SHACL, ShEx, and PG-Schema. Each language has its unique approach to defining constraints and validating graph data, leaving potential users in the dark about their commonalities and differences. In this paper, we provide formal, concise definitions of the core components of each of these schema languages. We employ a uniform framework to facilitate a comprehensive comparison between the languages and identify a common set of functionalities, shedding light on both overlapping and distinctive features of the three languages.


Quantification via Gaussian Latent Space Representations

Pérez-Mon, Olaya, del Coz, Juan José, González, Pablo

arXiv.org Artificial Intelligence

Quantification, or prevalence estimation, is the task of predicting the prevalence of each class within an unknown bag of examples. Most existing quantification methods in the literature rely on prior probability shift assumptions to create a quantification model that uses the predictions of an underlying classifier to make optimal prevalence estimates. In this work, we present an end-to-end neural network that uses Gaussian distributions in latent spaces to obtain invariant representations of bags of examples. This approach addresses the quantification problem using deep learning, enabling the optimization of specific loss functions relevant to the problem and avoiding the need for an intermediate classifier, tackling the quantification problem as a direct optimization problem. Our method achieves state-of-the-art results, both against traditional quantification methods and other deep learning approaches for quantification. The code needed to reproduce all our experiments is publicly available at https://github.com/AICGijon/gmnet.


The Sets of Power

Marques-Silva, Joao, Mencía, Carlos, Mencía, Raúl

arXiv.org Artificial Intelligence

Measures of voting power have been the subject of extensive research since the mid 1940s. More recently, similar measures of relative importance have been studied in other domains that include inconsistent knowledge bases, intensity of attacks in argumentation, different problems in the analysis of database management, and explainability. This paper demonstrates that all these examples are instantiations of computing measures of importance for a rather more general problem domain. The paper then shows that the best-known measures of importance can be computed for any reference set whenever one is given a monotonically increasing predicate that partitions the subsets of that reference set. As a consequence, the paper also proves that measures of importance can be devised in several domains, for some of which such measures have not yet been studied nor proposed. Furthermore, the paper highlights several research directions related with computing measures of importance.


Detecci\'on Autom\'atica de Patolog\'ias en Notas Cl\'inicas en Espa\~nol Combinando Modelos de Lenguaje y Ontolog\'ias M\'edicos

Torre, Léon-Paul Schaub, Quirós, Pelayo, Mieres, Helena García

arXiv.org Artificial Intelligence

In this paper we present a hybrid method for the automatic detection of dermatological pathologies in medical reports. We use a large language model combined with medical ontologies to predict, given a first appointment or follow-up medical report, the pathology a person may suffer from. The results show that teaching the model to learn the type, severity and location on the body of a dermatological pathology as well as in which order it has to learn these three features significantly increases its accuracy. The article presents the demonstration of state-of-the-art results for classification of medical texts with a precision of 0.84, micro and macro F1-score of 0.82 and 0.75, and makes both the method and the dataset used available to the community.


Reinforcement Learning Approach to Optimizing Profilometric Sensor Trajectories for Surface Inspection

Roos-Hoefgeest, Sara, Roos-Hoefgeest, Mario, Alvarez, Ignacio, González, Rafael C.

arXiv.org Artificial Intelligence

High-precision surface defect detection in manufacturing is essential for ensuring quality control. Laser triangulation profilometric sensors are key to this process, providing detailed and accurate surface measurements over a line. To achieve a complete and precise surface scan, accurate relative motion between the sensor and the workpiece is required. It is crucial to control the sensor pose to maintain optimal distance and relative orientation to the surface. It is also important to ensure uniform profile distribution throughout the scanning process. This paper presents a novel Reinforcement Learning (RL) based approach to optimize robot inspection trajectories for profilometric sensors. Building upon the Boustrophedon scanning method, our technique dynamically adjusts the sensor position and tilt to maintain optimal orientation and distance from the surface, while also ensuring a consistent profile distance for uniform and high-quality scanning. Utilizing a simulated environment based on the CAD model of the part, we replicate real-world scanning conditions, including sensor noise and surface irregularities. This simulation-based approach enables offline trajectory planning based on CAD models. Key contributions include the modeling of the state space, action space, and reward function, specifically designed for inspection applications using profilometric sensors. We use Proximal Policy Optimization (PPO) algorithm to efficiently train the RL agent, demonstrating its capability to optimize inspection trajectories with profilometric sensors. To validate our approach, we conducted several experiments where a model trained on a specific training piece was tested on various parts in simulation. Also, we conducted a real-world experiment by executing the optimized trajectory, generated offline from a CAD model, to inspect a part using a UR3e robotic arm model.


Global-to-Local Support Spectrums for Language Model Explainability

Agussurja, Lucas, Lu, Xinyang, Low, Bryan Kian Hsiang

arXiv.org Artificial Intelligence

Existing sample-based methods, like influence functions and representer points, measure the importance of a training point by approximating the effect of its removal from training. As such, they are skewed towards outliers and points that are very close to the decision boundaries. The explanations provided by these methods are often static and not specific enough for different test points. In this paper, we propose a method to generate an explanation in the form of support spectrums which are based on two main ideas: the support sets and a global-to-local importance measure. The support set is the set of training points, in the predicted class, that ``lie in between'' the test point and training points in the other classes. They indicate how well the test point can be distinguished from the points not in the predicted class. The global-to-local importance measure is obtained by decoupling existing methods into the global and local components which are then used to select the points in the support set. Using this method, we are able to generate explanations that are tailored to specific test points. In the experiments, we show the effectiveness of the method in image classification and text generation tasks.