
Collaborating Authors

 Gionis, Aristides


Fair Clustering for Data Summarization: Improved Approximation Algorithms and Complexity Insights

arXiv.org Artificial Intelligence

Data summarization tasks are often modeled as $k$-clustering problems, where the goal is to choose $k$ data points, called cluster centers, that best represent the dataset by minimizing a clustering objective. A popular objective is to minimize the maximum distance between any data point and its nearest center, which is formalized as the $k$-center problem. While in some applications all data points can be chosen as centers, in the general setting, centers must be chosen from a predefined subset of points, referred to as facilities or suppliers; this is known as the $k$-supplier problem. In this work, we focus on fair data summarization modeled as the fair $k$-supplier problem, where the data consists of several groups, and a minimum number of centers must be selected from each group while minimizing the $k$-supplier objective. The groups can be disjoint or overlapping, leading to two distinct problem variants, each with different computational complexity. We present $3$-approximation algorithms for both variants, improving the previously known factor of $5$. For disjoint groups, our algorithm runs in polynomial time, while for overlapping groups, we present a fixed-parameter tractable algorithm, whose exponential runtime depends only on the number of groups and centers. We show that these approximation factors match the theoretical lower bounds, assuming standard complexity theory conjectures. Finally, using an open-source implementation, we demonstrate the scalability of our algorithms on large synthetic datasets and assess the price of fairness on real-world data, comparing solution quality with and without fairness constraints.
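
To make the objective concrete, below is a minimal Python sketch (not the paper's 3-approximation algorithm) of how a candidate solution to the fair $k$-supplier problem would be evaluated: the cost is the maximum distance from any data point to its nearest chosen center, and a fairness check verifies that each group contributes at least its required number of centers. The helper names and the toy data are hypothetical.

```python
# Minimal sketch of evaluating a fair k-supplier solution; illustrative only.
import math

def ksupplier_cost(clients, centers):
    """Maximum distance from any client to its nearest chosen center."""
    return max(min(math.dist(c, s) for s in centers) for c in clients)

def is_fair(chosen_groups, min_per_group):
    """Check that each group contributes at least its required number of centers."""
    counts = {}
    for g in chosen_groups:
        counts[g] = counts.get(g, 0) + 1
    return all(counts.get(g, 0) >= r for g, r in min_per_group.items())

# Toy example: clients in the plane, suppliers annotated with a group label.
clients = [(0.0, 0.0), (1.0, 0.0), (4.0, 3.0)]
suppliers = {(0.5, 0.5): "A", (4.0, 2.5): "B", (2.0, 2.0): "A"}
chosen = [(0.5, 0.5), (4.0, 2.5)]                       # k = 2 centers
print(ksupplier_cost(clients, chosen))                  # objective value
print(is_fair([suppliers[s] for s in chosen], {"A": 1, "B": 1}))
```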


Efficient Exploration of the Rashomon Set of Rule Set Models

arXiv.org Artificial Intelligence

Today, as increasingly complex predictive models are developed, simple rule sets remain a crucial tool to obtain interpretable predictions and drive high-stakes decision making. However, a single rule set provides only a partial representation of a learning task. An emerging paradigm in interpretable machine learning aims at exploring the Rashomon set of all models exhibiting near-optimal performance. Existing work on Rashomon-set exploration focuses on exhaustive search of the Rashomon set for particular classes of models, which can be a computationally challenging task. Moreover, exhaustive enumeration introduces redundancy that is often unnecessary, and a representative sample or an estimate of the size of the Rashomon set suffices for many applications. In this work, we propose, for the first time, efficient methods to explore the Rashomon set of rule set models with or without exhaustive search. Extensive experiments demonstrate the effectiveness of the proposed methods in a variety of scenarios.
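
For intuition, the Rashomon set of a small candidate pool can be stated in a few lines of Python: keep every model whose loss is within a tolerance of the best loss. This brute-force sketch only illustrates the definition; the point of the paper is precisely to avoid such exhaustive enumeration. The names below are hypothetical.

```python
# Illustrative definition of a Rashomon set over an explicit candidate pool.
def rashomon_set(candidates, loss, epsilon):
    """Return all candidate models whose loss is within epsilon of the best."""
    best = min(loss(m) for m in candidates)
    return [m for m in candidates if loss(m) <= best + epsilon]

# Toy usage with three hypothetical rule sets and their losses.
models = ["r1", "r2", "r3"]
losses = {"r1": 0.10, "r2": 0.11, "r3": 0.30}
print(rashomon_set(models, losses.get, epsilon=0.02))   # ['r1', 'r2']
```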


Diversity-aware clustering: Computational Complexity and Approximation Algorithms

arXiv.org Artificial Intelligence

Diversity is an essential design choice across numerous real-world contexts, spanning social environments [1], organizational structures [2], and demographic studies [3]. Embracing diversity entails acknowledging and incorporating multifaceted characteristics within groups. This concept holds profound relevance in addressing real-world challenges, particularly in scenarios where intersectionality -- the interconnected nature of social categorizations such as gender, ethnicity, religion, socio-economic status and sexual orientation -- plays a pivotal role [4, 5]. Consider the task of constituting a representative committee that accurately mirrors the demography of a broader population. In the pursuit of diversity, and recognizing its significance in the context of fairness, it is imperative to ensure representation from various groups based on their gender, ethnicity, and economic status, among others [6]. In reality, individuals belong to multiple social categories; for example, a person could be a woman of a specific ethnic background and economic group.


Learning Cellular Coverage from Real Network Configurations using GNNs

arXiv.org Artificial Intelligence

Cellular coverage quality estimation has been a critical task for self-organized networks. In real-world scenarios, deep-learning-powered coverage quality estimation methods cannot scale up to large areas because little ground truth can be provided during network design & optimization. In addition, they fall short in producing expressive embeddings that adequately capture the variations of the cells' configurations. To deal with this challenge, we cast the task in a graph representation so that we can apply state-of-the-art graph neural networks, which show exemplary performance. We propose a novel training framework that produces high-quality cell-configuration embeddings for estimating multiple KPIs, and we show that it is capable of generalising to large (area-wide) scenarios given very few labeled cells. We show that our framework yields comparable accuracy with models that have been trained using massively labeled samples.
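
As a rough illustration of the graph formulation (not the paper's actual framework or architecture), the sketch below represents each cell by a configuration feature vector, connects neighbouring cells via a dense adjacency matrix, and applies a single mean-aggregation message-passing layer in plain PyTorch to regress several KPIs per cell. Layer sizes, feature dimensions, and the aggregation scheme are arbitrary assumptions.

```python
# Minimal mean-aggregation GNN sketch for per-cell KPI regression; illustrative only.
import torch
import torch.nn as nn

class MeanAggGNN(nn.Module):
    def __init__(self, in_dim, hid_dim, n_kpis):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, hid_dim)
        self.lin_neigh = nn.Linear(in_dim, hid_dim)
        self.head = nn.Linear(hid_dim, n_kpis)       # one output per KPI

    def forward(self, x, adj):
        # adj: dense 0/1 adjacency matrix of the cell graph (n_cells x n_cells)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh_mean = adj @ x / deg                    # average neighbour features
        h = torch.relu(self.lin_self(x) + self.lin_neigh(neigh_mean))
        return self.head(h)                           # predicted KPIs per cell

# Toy usage: 4 cells, 8 configuration features, 3 KPIs.
x = torch.randn(4, 8)
adj = torch.tensor([[0, 1, 0, 0],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [0, 0, 1, 0]], dtype=torch.float32)
print(MeanAggGNN(8, 16, 3)(x, adj).shape)             # torch.Size([4, 3])
```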


Concise and interpretable multi-label rule sets

arXiv.org Artificial Intelligence

Multi-label classification is becoming increasingly ubiquitous, but not much attention has been paid to interpretability. In this paper, we develop a multi-label classifier that can be represented as a concise set of simple "if-then" rules, and thus, it offers better interpretability compared to black-box models. Notably, our method is able to find a small set of relevant patterns that lead to accurate multi-label classification, while existing rule-based classifiers are myopic and wasteful in searching rules, requiring a large number of rules to achieve high accuracy. In particular, we formulate the problem of choosing multi-label rules to maximize a target function, which considers not only discrimination ability with respect to labels, but also diversity. Accounting for diversity helps to avoid redundancy, and thus, to control the number of rules in the solution set. To tackle this maximization problem, we propose a 2-approximation algorithm, which relies on a novel technique to sample high-quality rules. In addition to our theoretical analysis, we provide a thorough experimental evaluation, which indicates that our approach offers a trade-off between predictive performance and interpretability that is unmatched in previous work.
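
The flavour of such a target function can be sketched as follows, under illustrative assumptions that are not the paper's exact formulation: the value of a selected rule set combines the quality of the individual rules with a penalty on the pairwise overlap of the examples they cover. The names and the trade-off weight below are hypothetical.

```python
# Illustrative "discrimination plus diversity" objective over a set of rules.
def overlap(cover_a, cover_b):
    """Jaccard overlap between the example sets covered by two rules."""
    a, b = set(cover_a), set(cover_b)
    return len(a & b) / max(len(a | b), 1)

def rule_set_objective(selected, quality, covers, lam=0.5):
    """Total quality of the selected rules minus lam times their pairwise overlap."""
    q = sum(quality[r] for r in selected)
    o = sum(overlap(covers[a], covers[b])
            for i, a in enumerate(selected) for b in selected[i + 1:])
    return q - lam * o

# Toy usage: three hypothetical rules with their covered example indices.
covers = {"r1": [0, 1, 2], "r2": [2, 3], "r3": [0, 1]}
quality = {"r1": 0.9, "r2": 0.8, "r3": 0.7}
print(rule_set_objective(["r1", "r2"], quality, covers))
```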


Regularized impurity reduction: Accurate decision trees with complexity guarantees

arXiv.org Artificial Intelligence

Decision trees are popular classification models, providing high accuracy and intuitive explanations. However, as the tree size grows the model interpretability deteriorates. Traditional tree-induction algorithms, such as C4.5 and CART, rely on impurity-reduction functions that promote the discriminative power of each split. Thus, although these traditional methods are accurate in practice, there has been no theoretical guarantee that they will produce small trees. In this paper, we justify the use of a general family of impurity functions, including the popular functions of entropy and Gini-index, in scenarios where small trees are desirable, by showing that a simple enhancement can equip them with complexity guarantees. We consider a general setting, where objects to be classified are drawn from an arbitrary probability distribution, classification can be binary or multi-class, and splitting tests are associated with non-uniform costs. As a measure of tree complexity, we adopt the expected cost to classify an object drawn from the input distribution, which, in the uniform-cost case, is the expected number of tests. We propose a tree-induction algorithm that gives a logarithmic approximation guarantee on the tree complexity. This approximation factor is tight up to a constant factor under mild assumptions. The algorithm recursively selects a test that maximizes a greedy criterion defined as a weighted sum of three components. The first two components encourage the selection of tests that improve the balance and the cost-efficiency of the tree, respectively, while the third impurity-reduction component encourages the selection of more discriminative tests. As shown in our empirical evaluation, compared to the original heuristics, the enhanced algorithms strike an excellent balance between predictive accuracy and tree complexity.
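
A minimal sketch of such a greedy criterion is given below, assuming binary splits and entropy as the impurity function; the particular weights and component definitions (balance, cost-efficiency, impurity reduction) are illustrative assumptions and are not taken from the paper.

```python
# Illustrative weighted greedy criterion for choosing a splitting test.
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_score(labels, left_idx, right_idx, cost, w=(1.0, 1.0, 1.0)):
    n = len(labels)
    left = [labels[i] for i in left_idx]
    right = [labels[i] for i in right_idx]
    balance = min(len(left), len(right)) / n              # prefer balanced splits
    cost_eff = 1.0 / cost                                  # prefer cheap tests
    gain = entropy(labels) - (len(left) / n) * entropy(left) \
                           - (len(right) / n) * entropy(right)  # impurity reduction
    return w[0] * balance + w[1] * cost_eff + w[2] * gain

# Toy usage: a binary labelling split into two index sets by a unit-cost test.
labels = ["a", "a", "b", "b", "b", "a"]
print(split_score(labels, left_idx=[0, 1, 5], right_idx=[2, 3, 4], cost=1.0))
```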


Diverse Rule Sets

arXiv.org Artificial Intelligence

While machine-learning models are flourishing and transforming many aspects of everyday life, the inability of humans to understand complex models poses difficulties for these models to be fully trusted and embraced. Thus, interpretability of models has been recognized as an equally important quality as their predictive power. In particular, rule-based systems are experiencing a renaissance owing to their intuitive if-then representation. However, simply being rule-based does not ensure interpretability. For example, overlapping rules spawn ambiguity and hinder interpretation. Here we propose a novel approach of inferring diverse rule sets, by optimizing small overlap among decision rules with a 2-approximation guarantee under the framework of Max-Sum diversification. We formulate the problem as maximizing a weighted sum of discriminative quality and diversity of a rule set. In order to overcome an exponential-size search space of association rules, we investigate several natural options for a small candidate set of high-quality rules, including frequent and accurate rules, and examine their hardness. Leveraging the special structure in our formulation, we then devise an efficient randomized algorithm, which samples rules that are highly discriminative and have small overlap. The proposed sampling algorithm analytically targets a distribution of rules that is tailored to our objective. We demonstrate the superior predictive power and interpretability of our model with a comprehensive empirical study against strong baselines.
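
A simple greedy sketch of Max-Sum diversification over a small candidate pool is shown below: repeatedly add the rule with the largest marginal gain in quality plus pairwise diversity. The paper's method is a randomized sampler over a much larger rule space; this snippet only illustrates the objective being maximized, and all names are hypothetical.

```python
# Greedy selection sketch for a Max-Sum (quality + diversity) objective.
def max_sum_greedy(candidates, quality, diversity, k, lam=1.0):
    """quality: rule -> float; diversity: (rule, rule) -> float; pick k rules."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def gain(r):
            return quality(r) + lam * sum(diversity(r, s) for s in selected)
        best = max(pool, key=gain)
        selected.append(best)
        pool.remove(best)
    return selected

# Toy usage with hypothetical quality and diversity functions.
q = {"r1": 0.9, "r2": 0.8, "r3": 0.7}
d = lambda a, b: 0.0 if a == b else 1.0 - (a == "r1" and b == "r3") * 0.5
print(max_sum_greedy(q.keys(), q.get, d, k=2))
```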


What is the dimension of your binary data?

arXiv.org Machine Learning

Many 0/1 datasets have a very large number of variables; on the other hand, they are sparse and the dependency structure of the variables is simpler than the number of variables would suggest. Defining the effective dimensionality of such a dataset is a nontrivial problem. We consider the problem of defining a robust measure of dimension for 0/1 datasets, and show that the basic idea of fractal dimension can be adapted for binary data. However, the fractal dimension as such is difficult to interpret. Hence we introduce the concept of normalized fractal dimension. For a dataset $D$, its normalized fractal dimension is the number of columns in a dataset $D'$ with independent columns and having the same (unnormalized) fractal dimension as $D$. The normalized fractal dimension measures the degree of dependency structure of the data. We study the properties of the normalized fractal dimension and discuss its computation. We give empirical results on the normalized fractal dimension, comparing it against baseline measures such as PCA. We also study the relationship of the dimension of the whole dataset and the dimensions of subgroups formed by clustering. The results indicate interesting differences between and within datasets.
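
For intuition, a correlation-dimension style estimate for 0/1 data can be sketched as below, assuming Hamming distance between rows and a log-log slope fit over the pair-distance distribution. This illustrates the general idea of a fractal dimension for binary data; it is not the paper's exact definition of the (normalized) fractal dimension.

```python
# Illustrative correlation-dimension estimate for binary rows under Hamming distance.
import itertools, math

def correlation_dimension(rows, radii):
    """Fit the slope of log C(r) versus log r, where C(r) is the fraction
    of row pairs within Hamming distance r."""
    dists = [sum(a != b for a, b in zip(x, y))
             for x, y in itertools.combinations(rows, 2)]
    pts = []
    for r in radii:
        frac = sum(d <= r for d in dists) / len(dists)
        if frac > 0:
            pts.append((math.log(r), math.log(frac)))
    # least-squares slope over the collected (log r, log C(r)) points
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    return sum((x - mx) * (y - my) for x, y in pts) / sum((x - mx) ** 2 for x, _ in pts)

# Toy usage on a tiny 0/1 dataset.
rows = [(0, 0, 1, 1), (0, 1, 1, 1), (1, 0, 0, 1), (1, 1, 0, 0), (0, 0, 0, 1)]
print(correlation_dimension(rows, radii=[1, 2, 3]))
```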


Explainable time series tweaking via irreversible and reversible temporal transformations

arXiv.org Machine Learning

Time series classification has received great attention over the past decade with a wide range of methods focusing on predictive performance by exploiting various types of temporal features. Nonetheless, little emphasis has been placed on interpretability and explainability. In this paper, we formulate the novel problem of explainable time series tweaking, where, given a time series and an opaque classifier that provides a particular classification decision for the time series, we want to find the minimum number of changes to be performed to the given time series so that the classifier changes its decision to another class. We show that the problem is NP-hard, and focus on two instantiations of the problem, which we refer to as reversible and irreversible time series tweaking. The classifier under investigation is the random shapelet forest classifier. Moreover, we propose two algorithmic solutions for the two problems along with simple optimizations, as well as a baseline solution using the nearest neighbor classifier. An extensive experimental evaluation on a variety of real datasets demonstrates the usefulness and effectiveness of our problem formulation and solutions.
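
As a rough sketch of the nearest-neighbour baseline flavour of the problem (not the shapelet-forest algorithms proposed in the paper), one can locate the closest training series of the desired class and propose element-wise changes that move the query towards it. The helper names below are hypothetical.

```python
# Illustrative nearest-neighbour style tweak: move a query series towards the
# closest training series of the desired target class.
import math

def nearest_target_example(series, train, labels, target):
    """Return the training series of class `target` closest in Euclidean distance."""
    candidates = [x for x, y in zip(train, labels) if y == target]
    return min(candidates, key=lambda x: math.dist(series, x))

def tweak_towards(series, target_series, step=1.0):
    """Move each time point a fraction `step` of the way to the target series."""
    return [s + step * (t - s) for s, t in zip(series, target_series)]

# Toy usage with two training series and a query.
train = [[0.0, 0.1, 0.2], [1.0, 1.1, 1.2]]
labels = ["normal", "anomaly"]
query = [0.9, 1.0, 1.0]
target = nearest_target_example(query, train, labels, "anomaly")
print(tweak_towards(query, target, step=0.5))
```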


Semi-supervised Kernel Metric Learning Using Relative Comparisons

arXiv.org Machine Learning

We consider the problem of metric learning subject to a set of constraints on relative-distance comparisons between the data items. Such constraints are meant to reflect side-information that is not expressed directly in the feature vectors of the data items. The relative-distance constraints used in this work are particularly effective in expressing structures at a finer level of detail than must-link (ML) and cannot-link (CL) constraints, which are most commonly used for semi-supervised clustering. Relative-distance constraints are thus useful in settings where providing an ML or a CL constraint is difficult because the granularity of the true clustering is unknown. Our main contribution is an efficient algorithm for learning a kernel matrix using the log determinant divergence --- a variant of the Bregman divergence --- subject to a set of relative-distance constraints. The learned kernel matrix can then be employed by many different kernel methods in a wide range of applications. In our experimental evaluations, we consider a semi-supervised clustering setting and show empirically that kernels found by our algorithm yield clusterings of higher quality than existing approaches that either use ML/CL constraints or a different means to implement the supervision using relative comparisons.
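
To make the constraint format concrete: a relative comparison "$a$ is closer to $b$ than to $c$" translates into an inequality on squared distances induced by the kernel matrix $K$, namely $K_{aa} - 2K_{ab} + K_{bb} < K_{aa} - 2K_{ac} + K_{cc}$. The sketch below only checks such constraints for a given kernel matrix; the LogDet-based optimization itself is not shown, and the function names are hypothetical.

```python
# Checking relative-distance comparison constraints under a kernel matrix K.
def kernel_sq_dist(K, i, j):
    """Squared distance induced by K between items i and j."""
    return K[i][i] - 2 * K[i][j] + K[j][j]

def satisfies(K, triple):
    """triple = (a, b, c) encodes the constraint d(a, b) < d(a, c)."""
    a, b, c = triple
    return kernel_sq_dist(K, a, b) < kernel_sq_dist(K, a, c)

def violated_constraints(K, triples):
    return [t for t in triples if not satisfies(K, t)]

# Toy usage with a small kernel matrix and two comparison triples.
K = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
print(violated_constraints(K, [(0, 1, 2), (2, 0, 1)]))   # [(2, 0, 1)]
```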