Time-dependent Probabilistic Generative Models for Disease Progression
Zaballa, Onintze, Pérez, Aritz, Gómez-Inhiesto, Elisa, Acaiturri-Ayesta, Teresa, Lozano, Jose A.
Electronic health records (EHRs) contain valuable information for monitoring patients' health trajectories over time. Disease progression models have been developed to understand the underlying patterns and dynamics of diseases by treating these data as sequences. However, analyzing temporal data from EHRs is challenging due to the variability and irregularities present in medical records. We propose a Markovian generative model of treatments developed to (i) model the irregular time intervals between medical events; (ii) classify treatments into subtypes based on the patient's sequence of medical events and the time intervals between them; and (iii) segment treatments into subsequences of disease progression patterns. We assume that sequences have an associated structure of latent variables: a latent class representing the different subtypes of treatments, and a set of latent stages indicating the phase of progression of the treatments. We learn the model with the Expectation-Maximization algorithm, solved efficiently with a dynamic programming-based method. Several parametric models are employed to model the time intervals between medical events during the learning process, including the geometric, exponential, and Weibull distributions. The results demonstrate the effectiveness of our model in recovering the underlying model from data and in accurately modeling the irregular time intervals between medical actions.
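As a hedged illustration of the time-interval modeling only (a minimal sketch, not the paper's full latent-class and latent-stage model; all parameter values below are made up), the log-likelihood of an event sequence under a first-order Markov chain with Weibull-distributed gaps can be computed as follows:

```python
# Illustrative sketch: log-likelihood of a sequence of medical events under a
# first-order Markov chain whose irregular time gaps follow a Weibull
# distribution. Parameters and states are hypothetical.
import numpy as np
from scipy.stats import weibull_min

def sequence_loglik(events, gaps, init_p, trans_p, shape, scale):
    """events: list of int states; gaps: inter-event times, len(events) - 1."""
    ll = np.log(init_p[events[0]])
    for prev, curr, dt in zip(events[:-1], events[1:], gaps):
        ll += np.log(trans_p[prev, curr])                 # event dynamics
        ll += weibull_min.logpdf(dt, shape, scale=scale)  # irregular time gap
    return ll

init_p = np.array([0.7, 0.3])
trans_p = np.array([[0.6, 0.4], [0.2, 0.8]])
print(sequence_loglik([0, 1, 1], [3.0, 10.5], init_p, trans_p, shape=1.5, scale=7.0))
```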
Speeding-up Evolutionary Algorithms to solve Black-Box Optimization Problems
Echevarrieta, Judith, Arza, Etor, Pérez, Aritz
Population-based evolutionary algorithms are often used to approach computationally expensive black-box optimization problems. They employ a selection mechanism that compares the objective values of the solutions in a population and keeps the best ones, which are then used to generate the next population. This iterative process explores the solution space efficiently, leading to improved solutions over time. However, these algorithms require a large number of evaluations to provide a quality solution, which can be prohibitive when each evaluation is expensive. In some cases, the original objective function can be replaced by a less accurate approximation of lower cost, introducing a trade-off between evaluation cost and accuracy. In this paper, we propose a technique that chooses an appropriate approximation cost during the execution of the optimization algorithm. The proposal finds the minimum evaluation cost at which the solutions are still properly ranked, so that more evaluations can be computed in the same amount of time with minimal loss of accuracy. Experiments on four very different problems show that the proposed approach can reach the same objective value in less than half the time in certain cases.
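The cost-accuracy trade-off can be illustrated with a toy sketch (the Monte Carlo objective and the rank-correlation threshold below are assumptions, not the paper's benchmarks): cheaper evaluations are noisier, and we search for the smallest cost at which the population is still ranked as under the expensive evaluation:

```python
# Toy sketch of the cost-vs-ranking idea with a hypothetical objective: the
# objective is approximated by Monte Carlo with `n_eval` samples, so a larger
# `n_eval` means higher cost and higher accuracy. We look for the cheapest cost
# whose ranking of the population agrees with the expensive reference ranking.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)

def noisy_objective(x, n_eval):
    # True objective is x**2; cheaper evaluations add more Monte Carlo noise.
    return x**2 + rng.normal(0.0, 1.0 / np.sqrt(n_eval))

population = rng.uniform(-3, 3, size=20)
reference = [noisy_objective(x, n_eval=10_000) for x in population]  # "expensive"

for n_eval in (1, 5, 25, 125, 625):
    approx = [noisy_objective(x, n_eval=n_eval) for x in population]
    tau, _ = kendalltau(reference, approx)
    print(f"n_eval={n_eval:4d}  rank correlation tau={tau:.3f}")
    if tau > 0.95:   # solutions still properly ranked at this cost
        break
```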
Structural Restricted Boltzmann Machine for image denoising and classification
Bidaurrazaga, Arkaitz, Pérez, Aritz, Santana, Roberto
Restricted Boltzmann Machines (RBMs) are generative models that consist of a layer of hidden variables connected to a layer of visible units, and they are used to model the distribution over the visible variables. To gain higher representational power, many hidden units are commonly used, which, combined with a large number of visible units, leads to a large number of trainable parameters. In this work we introduce the Structural Restricted Boltzmann Machine, which exploits the structure of the data at hand by constraining the connections of hidden units to subsets of visible units, significantly reducing the number of trainable parameters without compromising performance. As a possible area of application, we focus on image modelling. Based on the nature of images, the structure of the connections is given in terms of spatial neighbourhoods over the pixels of the image that constitute the visible variables of the model. We conduct extensive experiments on various image domains: image denoising is evaluated on corrupted images from the MNIST dataset, while the generative power of our models is compared to vanilla RBMs, as is their classification performance, which is assessed on five different image domains. Results show that our proposed model trains faster and more stably, while also obtaining better results than an RBM with unconstrained connections between its visible and hidden units.
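A minimal sketch of how such a structural constraint could be coded (an assumption for illustration, not the authors' implementation): each hidden unit is connected only to a local pixel patch via a binary mask, and a CD-1 step with mean activations (biases omitted) keeps masked weights at zero:

```python
# Sketch of the structural constraint: one hidden unit per spatial patch,
# enforced by multiplying weights and gradients by a binary mask.
import numpy as np

def patch_mask(img_side, patch, stride):
    """Binary mask of shape (n_visible, n_hidden): one hidden unit per patch."""
    positions = range(0, img_side - patch + 1, stride)
    mask = []
    for r in positions:
        for c in positions:
            m = np.zeros((img_side, img_side))
            m[r:r + patch, c:c + patch] = 1.0
            mask.append(m.ravel())
    return np.stack(mask, axis=1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
mask = patch_mask(img_side=28, patch=7, stride=7)   # 16 local hidden units
W = rng.normal(0, 0.01, size=mask.shape) * mask     # constrained weights
v0 = (rng.random((5, 784)) < 0.5).astype(float)     # toy batch of binary images

# One CD-1 step using mean activations; the mask is re-applied to the gradient.
h0 = sigmoid(v0 @ W)
v1 = sigmoid(h0 @ W.T)
h1 = sigmoid(v1 @ W)
W += 0.01 * ((v0.T @ h0 - v1.T @ h1) / len(v0)) * mask
print(f"trainable weights: {int(mask.sum())} of {mask.size}")
```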
Efficient Learning of Minimax Risk Classifiers in High Dimensions
Bondugula, Kartheek, Mazuelas, Santiago, Pérez, Aritz
High-dimensional data is common in multiple areas, such as health care and genomics, where the number of features can be tens of thousands. In such scenarios, the large number of features often leads to inefficient learning. Constraint generation methods have recently enabled efficient learning of L1-regularized support vector machines (SVMs). In this paper, we leverage such methods to obtain an efficient learning algorithm for the recently proposed minimax risk classifiers (MRCs). The proposed iterative algorithm also provides a sequence of worst-case error probabilities and performs feature selection. Experiments on multiple high-dimensional datasets show that the proposed algorithm is efficient in high-dimensional scenarios. In addition, the worst-case error probability provides useful information about the classifier performance, and the features selected by the algorithm are competitive with the state-of-the-art.
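The following is a hedged sketch of generic column/constraint generation for an L1-regularized model (illustrating the idea, not the MRC-specific algorithm of the paper): each subproblem is solved on a small active set of features, and only features that violate the optimality conditions are added:

```python
# Generic column-generation sketch for L1-regularized logistic regression on
# synthetic data; the active-set loop is the point, the model is a stand-in.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, lam = 200, 5000, 0.1
X = rng.normal(size=(n, d))
y = (X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=n) > 0).astype(int)

active = list(range(10))                      # start with a few features
for it in range(20):
    clf = LogisticRegression(penalty='l1', C=1.0 / (n * lam), solver='liblinear')
    clf.fit(X[:, active], y)
    # Gradient of the logistic loss w.r.t. every feature, active or not.
    p = clf.predict_proba(X[:, active])[:, 1]
    grad = X.T @ (p - y) / n
    # KKT violation for inactive features: |grad_j| must not exceed lambda.
    violated = [j for j in np.argsort(-np.abs(grad))[:5]
                if abs(grad[j]) > lam and j not in active]
    if not violated:                          # optimality conditions satisfied
        break
    active += violated
print(f"iterations: {it + 1}, selected features: {sorted(active)[:10]}...")
```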
Unsupervised Semantic Analysis of a Region from Satellite Image Time Series
Echegoyen, Carlos, Pérez, Aritz, Santafé, Guzmán, Pérez-Goya, Unai, Ugarte, María Dolores
Temporal sequences of satellite images constitute a highly valuable and abundant resource for analyzing a given region. However, the labeled data needed to train most machine learning models are scarce and difficult to obtain. In this context, the present work investigates a fully unsupervised methodology that, given a sequence of images, learns a semantic embedding and then creates a partition of the ground according to its semantic properties and their evolution over time. We illustrate the methodology by conducting the semantic analysis of a sequence of satellite images of a region of Navarre (Spain). The proposed approach reveals a novel broad perspective of the land, in which potentially large areas that share both similar semantics and a similar temporal evolution are connected in a compact and well-structured manner. The results also show a close relationship between the allocation of the clusters in geographic space and their allocation in the embedded spaces. The semantic analysis is completed by obtaining the representative sequence of tiles for each cluster, the linear interpolation between related areas, and a graph of the relationships between the clusters, providing a concise semantic summary of the whole region.
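A minimal unsupervised sketch under stated assumptions (PCA plus K-means stand in for the paper's semantic embedding and partition; the data are synthetic): each tile is described by its pixels across all dates, embedded, clustered, and summarized by a representative tile per cluster:

```python
# Embed-then-cluster sketch for tile time series; tiles with similar semantics
# and similar temporal evolution should land in the same cluster.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_tiles, n_dates, tile_pixels = 400, 12, 64
series = rng.normal(size=(n_tiles, n_dates, tile_pixels))  # synthetic tiles

X = series.reshape(n_tiles, -1)              # one row per tile, all dates stacked
emb = PCA(n_components=8).fit_transform(X)   # stand-in semantic embedding
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(emb)

# Representative tile of each cluster: the one closest to the cluster mean.
for k in range(5):
    members = np.where(labels == k)[0]
    centroid = emb[members].mean(axis=0)
    rep = members[np.argmin(np.linalg.norm(emb[members] - centroid, axis=1))]
    print(f"cluster {k}: {len(members)} tiles, representative tile {rep}")
```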
MRCpy: A Library for Minimax Risk Classifiers
Bondugula, Kartheek, Mazuelas, Santiago, Pérez, Aritz
Existing libraries for supervised classification implement techniques based on empirical risk minimization that utilize surrogate losses. We present the MRCpy library, which implements minimax risk classifiers (MRCs) based on robust risk minimization that can utilize the 0-1 loss. Such techniques give rise to a manifold of classification methods that can provide tight bounds on the expected loss. MRCpy provides a unified interface for different variants of MRCs and follows the standards of popular Python libraries. The library also implements popular techniques that can be seen as MRCs, such as L1-regularized logistic regression, zero-one adversarial, and maximum entropy machines, and offers recent feature mappings such as Fourier, ReLU, and threshold features. MRCpy is designed with an object-oriented approach that facilitates its use and extension by collaborators and users.
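A usage sketch following the interface described in the library's documentation (method names such as get_upper_bound() are taken from that documentation and should be checked against the installed version):

```python
# Fitting an MRC and querying its worst-case error bound with MRCpy.
from MRCpy import MRC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = MRC(loss='0-1', phi='fourier')   # minimax risk classifier, 0-1 loss
clf.fit(X_tr, y_tr)
print("worst-case error bound:", clf.get_upper_bound())
print("test accuracy:", (clf.predict(X_te) == y_te).mean())
```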
Passive Approach for the K-means Problem on Streaming Data
Bidaurrazaga, Arkaitz, Pérez, Aritz, Capó, Marco
The amount of data produced worldwide is growing constantly, so large volumes of unlabeled data must be processed continuously. One of the main unsupervised data analysis tasks is clustering. In streaming data scenarios, the data consist of an increasing sequence of batches of samples in which the concept drift phenomenon may occur. In this paper, we formally define the Streaming $K$-means (S$K$M) problem, which implies a restart of the error function when a concept drift occurs. We propose a surrogate error function that does not rely on concept drift detection, and we prove that the surrogate is a good approximation of the S$K$M error. Hence, we suggest an algorithm that minimizes this alternative error each time a new batch arrives. We also present initialization techniques for streaming data scenarios. Besides the theoretical results, experiments demonstrate an improvement of the converged error for the non-trivial initialization methods.
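The surrogate idea can be illustrated with a sketch (the forgetting-factor construction below is an assumption for illustration, not necessarily the paper's exact surrogate error): each new batch is clustered jointly with the previous centroids, whose weights decay so that outdated concepts fade without explicit drift detection:

```python
# Streaming K-means sketch: summarize the past by weighted centroids, decay
# their weights, and re-cluster them together with each incoming batch.
import numpy as np
from sklearn.cluster import KMeans

def process_batch(batch, centers, weights, k=3, decay=0.5):
    if centers is None:
        points, w = batch, np.ones(len(batch))
    else:
        points = np.vstack([centers, batch])
        w = np.concatenate([decay * weights, np.ones(len(batch))])
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points, sample_weight=w)
    new_w = np.array([w[km.labels_ == j].sum() for j in range(k)])
    return km.cluster_centers_, new_w

rng = np.random.default_rng(0)
centers, weights = None, None
for t in range(5):
    drift = np.array([t, -t])                  # the concept drifts over time
    batch = rng.normal(size=(100, 2)) + drift
    centers, weights = process_batch(batch, centers, weights)
print(np.round(centers, 2))
```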
Generalized Maximum Entropy for Supervised Classification
Mazuelas, Santiago, Shen, Yuan, Pérez, Aritz
The maximum entropy principle advocates evaluating events' probabilities using a distribution that maximizes entropy among those satisfying certain expectation constraints. This principle can be generalized to arbitrary decision problems, where it corresponds to minimax approaches. This paper establishes a framework for supervised classification based on the generalized maximum entropy principle that leads to minimax risk classifiers (MRCs). We develop learning techniques that determine MRCs for general entropy functions and provide performance guarantees by means of convex optimization. In addition, we describe the relationship of the presented techniques with existing classification methods, and quantify the performance of MRCs in comparison with the proposed bounds and with conventional methods.
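In standard notation (ours, not copied from the paper, and stated without the convexity and compactness conditions required for the minimax equality to hold), the generalized maximum entropy problem and its minimax counterpart can be written as:

```latex
% Minimax risk formulation: minimize the worst-case expected loss over an
% uncertainty set $\mathcal{U}$ of distributions consistent with expectation
% constraints; by duality this matches a generalized maximum entropy problem.
\[
  h^{\ast} \in \arg\min_{h}\, \max_{p \in \mathcal{U}} \mathbb{E}_{p}\,\ell(h,(x,y)),
  \qquad
  \max_{p \in \mathcal{U}} H_{\ell}(p)
  \;=\;
  \min_{h}\, \max_{p \in \mathcal{U}} \mathbb{E}_{p}\,\ell(h,(x,y)),
\]
% where $H_{\ell}(p) = \min_{h} \mathbb{E}_{p}\,\ell(h,(x,y))$ is the
% generalized entropy induced by the loss $\ell$.
```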
Weak Labeling for Crowd Learning
Beñaran-Muñoz, Iker, Hernández-González, Jerónimo, Pérez, Aritz
Crowdsourcing has become very popular in the machine learning community as a way to obtain labels from which a ground truth can be estimated for a given dataset. In most approaches that use crowdsourced labels, annotators are asked to provide a single class label for each presented instance. Such a request can be inefficient: given that the labelers may not be experts, requiring a single label may fail to take full advantage of their knowledge. In this paper, the use of weak labeling for crowd learning is proposed, where annotators may provide more than one label per instance so as not to miss the real label. The main hypothesis is that, by allowing weak labeling, knowledge can be extracted from the labelers more efficiently than in the standard crowd learning scenario. Empirical evidence supporting this hypothesis is presented.
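A simple aggregation sketch for weak, set-valued labels (a baseline illustration under the assumption that each annotator's vote is split uniformly over the labels in their set; not the paper's learning method):

```python
# Weak-label aggregation: each annotator supplies a set of candidate labels,
# and each set contributes a fractional vote to every label it contains.
from collections import Counter

def aggregate_weak_labels(annotations, labels):
    """annotations: one label set per annotator, e.g. [{'cat'}, {'cat', 'fox'}]."""
    votes = Counter()
    for label_set in annotations:
        for lab in label_set:
            votes[lab] += 1.0 / len(label_set)   # spread the vote over the set
    return max(labels, key=lambda lab: votes[lab])

labels = ['cat', 'dog', 'fox']
annotations = [{'cat'}, {'cat', 'fox'}, {'dog', 'cat'}]   # three annotators
print(aggregate_weak_labels(annotations, labels))          # -> 'cat'
```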
An efficient K-means clustering algorithm for massive data
Capó, Marco, Pérez, Aritz, Lozano, Jose A.
The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis due to their ease of implementation and relatively low computational cost. Among these algorithms, the K-means algorithm stands out as the most popular approach, despite its high dependence on the initial conditions and the fact that it might not scale well to massive datasets. In this article, we propose a recursive and parallel approximation to the K-means algorithm that scales well in both the number of instances and the dimensionality of the problem, without affecting the quality of the approximation. To achieve this, instead of analyzing the entire dataset, we work on small weighted sets of points that mainly aim to extract information from the regions where it is hardest to determine the correct cluster assignment of the original instances. In addition to different theoretical properties, which justify the reasoning behind the algorithm, experimental results indicate that our method outperforms the state-of-the-art in terms of the trade-off between the number of distance computations and the quality of the solution obtained.
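A divide-and-conquer sketch of working with small weighted sets of points (an illustration only; the paper's recursive criterion for choosing which regions to refine is more elaborate):

```python
# Each block of the data is clustered locally (potentially in parallel), then a
# weighted K-means over the blocks' centroids replaces a pass over all points.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 10))
k, n_blocks = 8, 20

summaries, weights = [], []
for block in np.array_split(X, n_blocks):          # could run in parallel
    km = KMeans(n_clusters=k, n_init=3, random_state=0).fit(block)
    summaries.append(km.cluster_centers_)
    weights.append(np.bincount(km.labels_, minlength=k))

# Final weighted K-means over n_blocks * k summary points, not 100k instances.
S, w = np.vstack(summaries), np.concatenate(weights)
final = KMeans(n_clusters=k, n_init=10, random_state=0).fit(S, sample_weight=w)
print(final.cluster_centers_.shape)
```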