AITopics | Domeniconi, Carlotta

Collaborating Authors

Domeniconi, Carlotta

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DispaRisk: Assessing and Interpreting Disparity Risks in Datasets

Vasquez, Jonathan, Domeniconi, Carlotta, Rangwala, Huzefa

arXiv.org Artificial IntelligenceMay-20-2024

Machine Learning algorithms (ML) impact virtually every aspect of human lives and have found use across diverse sectors, including healthcare, finance, and education. Often, ML algorithms have been found to exacerbate societal biases presented in datasets, leading to adversarial impacts on subsets/groups of individuals, in many cases minority groups. To effectively mitigate these untoward effects, it is crucial that disparities/biases are identified and assessed early in a ML pipeline. This proactive approach facilitates timely interventions to prevent bias amplification and reduce complexity at later stages of model development. In this paper, we introduce DispaRisk, a novel framework designed to proactively assess the potential risks of disparities in datasets during the initial stages of the ML pipeline. We evaluate DispaRisk's effectiveness by benchmarking it with commonly used datasets in fairness research. Our findings demonstrate the capabilities of DispaRisk to identify datasets with a high-risk of discrimination, model families prone to biases, and characteristics that heighten discrimination susceptibility in a ML pipeline. The code for our experiments is available in the following repository: https://github.com/jovasque156/disparisk

artificial intelligence, information, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2405.12372

Country: North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Law (0.46)
Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Federated Causality Learning with Explainable Adaptive Optimization

Yang, Dezhi, He, Xintong, Wang, Jun, Yu, Guoxian, Domeniconi, Carlotta, Zhang, Jinglin

arXiv.org Artificial IntelligenceDec-9-2023

Discovering the causality from observational data is a crucial task in various scientific domains. With increasing awareness of privacy, data are not allowed to be exposed, and it is very hard to learn causal graphs from dispersed data, since these data may have different distributions. In this paper, we propose a federated causal discovery strategy (FedCausal) to learn the unified global causal graph from decentralized heterogeneous data. We design a global optimization formula to naturally aggregate the causal graphs from client data and constrain the acyclicity of the global graph without exposing local data. Unlike other federated causal learning algorithms, FedCausal unifies the local and global optimizations into a complete directed acyclic graph (DAG) learning process with a flexible optimization objective. We prove that this optimization objective has a high interpretability and can adaptively handle homogeneous and heterogeneous data. Experimental results on synthetic and real datasets show that FedCausal can effectively deal with non-independently and identically distributed (non-iid) data and has a superior performance.

artificial intelligence, causal graph, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2312.0554

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Long-tail Cross Modal Hashing

Gao, Zijun, Wang, Jun, Yu, Guoxian, Yan, Zhongmin, Domeniconi, Carlotta, Zhang, Jinglin

arXiv.org Artificial IntelligenceNov-28-2022

Existing Cross Modal Hashing (CMH) methods are mainly designed for balanced data, while imbalanced data with long-tail distribution is more general in real-world. Several long-tail hashing methods have been proposed but they can not adapt for multi-modal data, due to the complex interplay between labels and individuality and commonality information of multi-modal data. Furthermore, CMH methods mostly mine the commonality of multi-modal data to learn hash codes, which may override tail labels encoded by the individuality of respective modalities. In this paper, we propose LtCMH (Long-tail CMH) to handle imbalanced multi-modal data. LtCMH firstly adopts auto-encoders to mine the individuality and commonality of different modalities by minimizing the dependency between the individuality of respective modalities and by enhancing the commonality of these modalities. Then it dynamically combines the individuality and commonality with direct features extracted from respective modalities to create meta features that enrich the representation of tail labels, and binaries meta features to generate hash codes. LtCMH significantly outperforms state-of-the-art baselines on long-tail datasets and holds a better (or comparable) performance on datasets with balanced labels.

artificial intelligence, machine learning, modality, (18 more...)

arXiv.org Artificial Intelligence

2211.15162

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

Add feedback

Hub-VAE: Unsupervised Hub-based Regularization of Variational Autoencoders

Mani, Priya, Domeniconi, Carlotta

arXiv.org Artificial IntelligenceNov-18-2022

Exemplar-based methods rely on informative data points or prototypes to guide the optimization of learning algorithms. Such data facilitate interpretable model design and prediction. Of particular interest is the utility of exemplars in learning unsupervised deep representations. In this paper, we leverage hubs, which emerge as frequent neighbors in high-dimensional spaces, as exemplars to regularize a variational autoencoder and to learn a discriminative embedding for unsupervised down-stream tasks. We propose an unsupervised, data-driven regularization of the latent space with a mixture of hub-based priors and a hub-based contrastive loss. Experimental evaluation shows that our algorithm achieves superior cluster separability in the embedding space, and accurate data reconstruction and generation, compared to baselines and state-of-the-art techniques.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2211.10469

Country: North America > United States (0.30)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

MetaMIML: Meta Multi-Instance Multi-Label Learning

Yang, Yuanlin, Yu, Guoxian, Wang, Jun, Liu, Lei, Domeniconi, Carlotta, Guo, Maozu

arXiv.org Artificial IntelligenceNov-7-2021

Multi-Instance Multi-Label learning (MIML) models complex objects (bags), each of which is associated with a set of interrelated labels and composed with a set of instances. Current MIML solutions still focus on a single-type of objects and assumes an IID distribution of training data. But these objects are linked with objects of other types, %(i.e., pictures in Facebook link with various users), which also encode the semantics of target objects. In addition, they generally need abundant labeled data for training. To effectively mine interdependent MIML objects of different types, we propose a network embedding and meta learning based approach (MetaMIML). MetaMIML introduces the context learner with network embedding to capture semantic information of objects of different types, and the task learner to extract the meta knowledge for fast adapting to new tasks. In this way, MetaMIML can naturally deal with MIML objects at data level improving, but also exploit the power of meta-learning at the model enhancing. Experiments on benchmark datasets demonstrate that MetaMIML achieves a significantly better performance than state-of-the-art algorithms.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2111.04112

Country: Asia > China (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

Add feedback

Cross-modal Zero-shot Hashing by Label Attributes Embedding

Wang, Runmin, Yu, Guoxian, Liu, Lei, Cui, Lizhen, Domeniconi, Carlotta, Zhang, Xiangliang

arXiv.org Artificial IntelligenceNov-7-2021

Cross-modal hashing (CMH) is one of the most promising methods in cross-modal approximate nearest neighbor search. Most CMH solutions ideally assume the labels of training and testing set are identical. However, the assumption is often violated, causing a zero-shot CMH problem. Recent efforts to address this issue focus on transferring knowledge from the seen classes to the unseen ones using label attributes. However, the attributes are isolated from the features of multi-modal data. To reduce the information gap, we introduce an approach called LAEH (Label Attributes Embedding for zero-shot cross-modal Hashing). LAEH first gets the initial semantic attribute vectors of labels by word2vec model and then uses a transformation network to transform them into a common subspace. Next, it leverages the hash vectors and the feature similarity matrix to guide the feature extraction network of different modalities. At the same time, LAEH uses the attribute similarity as the supplement of label similarity to rectify the label embedding and common subspace. Experiments show that LAEH outperforms related representative zero-shot and cross-modal hashing methods.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2111.0408

Country:

Asia > China (0.28)
North America > United States > Massachusetts (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Crowdsourcing with Meta-Workers: A New Way to Save the Budget

Han, Guangyang, Yu, Guoxian, Cui, Lizhen, Domeniconi, Carlotta, Zhang, Xiangliang

arXiv.org Artificial IntelligenceNov-7-2021

Due to the unreliability of Internet workers, it's difficult to complete a crowdsourcing project satisfactorily, especially when the tasks are multiple and the budget is limited. Recently, meta learning has brought new vitality to few-shot learning, making it possible to obtain a classifier with a fair performance using only a few training samples. Here we introduce the concept of \emph{meta-worker}, a machine annotator trained by meta learning for types of tasks (i.e., image classification) that are well-fit for AI. Unlike regular crowd workers, meta-workers can be reliable, stable, and more importantly, tireless and free. We first cluster unlabeled data and ask crowd workers to repeatedly annotate the instances nearby the cluster centers; we then leverage the annotated data and meta-training datasets to build a cluster of meta-workers using different meta learning algorithms. Subsequently, meta-workers are asked to annotate the remaining crowdsourced tasks. The Jensen-Shannon divergence is used to measure the disagreement among the annotations provided by the meta-workers, which determines whether or not crowd workers should be invited for further annotation of the same task. Finally, we model meta-workers' preferences and compute the consensus annotation by weighted majority voting. Our empirical study confirms that, by combining machine and human intelligence, we can accomplish a crowdsourcing project with a lower budget than state-of-the-art task assignment methods, while achieving a superior or comparable quality.

machine learning, teaching medhods, teaching method, (19 more...)

arXiv.org Artificial Intelligence

2111.04068

Country:

Asia > China (0.14)
Asia > Middle East (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Open-Set Crowdsourcing using Multiple-Source Transfer Learning

Han, Guangyang, Yu, Guoxian, Liu, Lei, Cui, Lizhen, Domeniconi, Carlotta, Zhang, Xiangliang

arXiv.org Machine LearningNov-7-2021

We raise and define a new crowdsourcing scenario, open set crowdsourcing, where we only know the general theme of an unfamiliar crowdsourcing project, and we don't know its label space, that is, the set of possible labels. This is still a task annotating problem, but the unfamiliarity with the tasks and the label space hampers the modelling of the task and of workers, and also the truth inference. We propose an intuitive solution, OSCrowd. First, OSCrowd integrates crowd theme related datasets into a large source domain to facilitate partial transfer learning to approximate the label space inference of these tasks. Next, it assigns weights to each source domain based on category correlation. After this, it uses multiple-source open set transfer learning to model crowd tasks and assign possible annotations. The label space and annotations given by transfer learning will be used to guide and standardize crowd workers' annotations. We validate OSCrowd in an online scenario, and prove that OSCrowd solves the open set crowdsourcing problem, works better than related crowdsourcing solutions.

artificial intelligence, machine learning, social media, (19 more...)

arXiv.org Machine Learning

2111.04073

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)

Add feedback

Few-Shot Partial-Label Learning

Zhao, Yunfeng, Yu, Guoxian, Liu, Lei, Yan, Zhongmin, Cui, Lizhen, Domeniconi, Carlotta

arXiv.org Artificial IntelligenceJun-2-2021

Partial-label learning (PLL) generally focuses on inducing a noise-tolerant multi-class classifier by training on overly-annotated samples, each of which is annotated with a set of labels, but only one is the valid label. A basic promise of existing PLL solutions is that there are sufficient partial-label (PL) samples for training. However, it is more common than not to have just few PL samples at hand when dealing with new tasks. Furthermore, existing few-shot learning algorithms assume precise labels of the support set; as such, irrelevant labels may seriously mislead the meta-learner and thus lead to a compromised performance. How to enable PLL under a few-shot learning setting is an important problem, but not yet well studied. In this paper, we introduce an approach called FsPLL (Few-shot PLL). FsPLL first performs adaptive distance metric learning by an embedding network and rectifying prototypes on the tasks previously encountered. Next, it calculates the prototype of each class of a new task in the embedding network. An unseen example can then be classified via its distance to each prototype. Experimental results on widely-used few-shot datasets (Omniglot and miniImageNet) demonstrate that our FsPLL can achieve a superior performance than the state-of-the-art methods across different settings, and it needs fewer samples for quickly adapting to new tasks.

artificial intelligence, inductive learning, prototype, (17 more...)

arXiv.org Artificial Intelligence

2106.00984

Country: Asia > China > Shandong Province (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Effects of Model Misspecification on Bayesian Bandits: Case Studies in UX Optimization

Sweeney, Mack, van Adelsberg, Matthew, Laskey, Kathryn, Domeniconi, Carlotta

arXiv.org Artificial IntelligenceOct-7-2020

Bayesian bandits using Thompson Sampling have seen increasing success in recent years. Yet existing value models (of rewards) are misspecified on many real-world problem. We demonstrate this on the User Experience Optimization (UXO) problem, providing a novel formulation as a restless, sleeping bandit with unobserved confounders plus optional stopping. Our case studies show how common misspecifications can lead to sub-optimal rewards, and we provide model extensions to address these, along with a scientific model building process practitioners can adopt or adapt to solve their own unique problems. To our knowledge, this is the first study showing the effects of overdispersion on bandit explore/exploit efficacy, tying the common notions of under- and over-confidence to over- and under-exploration, respectively. We also present the first model to exploit cointegration in a restless bandit, demonstrating that finite regret and fast and consistent optional stopping are possible by moving beyond simpler windowing, discounting, and drift models.

artificial intelligence, bayesian inference, bb-glm, (19 more...)

arXiv.org Artificial Intelligence

2010.0401

Country:

North America > United States > New York (0.14)
North America > United States > Virginia > Fairfax County (0.14)

Genre: Research Report > Experimental Study (0.47)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback