Goto

Collaborating Authors

 constrained


An Exploration of Non-Euclidean Gradient Descent: Muon and its Many Variants

Crawshaw, Michael, Modi, Chirag, Liu, Mingrui, Gower, Robert M.

arXiv.org Machine Learning

To define a steepest descent method over a neural network, we need to choose a norm for each layer, a way to aggregate these norms across layers, and whether to use normalization. We systematically explore different alternatives for aggregating norms across layers, both formalizing existing combinations of Adam and the recently proposed Muon as a type of non-Euclidean gradient descent, and deriving new variants of the Muon optimizer. Through a comprehensive experimental evaluation of the optimizers within our framework, we find that Muon is sensitive to the choice of learning rate, whereas a new variant we call MuonMax is significantly more robust. We then show how to combine any non-Euclidean gradient method with model based momentum (known as Momo). The new Momo variants of Muon are significantly more robust to hyperparameter tuning, and often achieve a better validation score. Thus for new tasks, where the optimal hyperparameters are not known, we advocate for using Momo in combination with MuonMax to save on costly hyperparameter tuning.


Improving Generative Cross-lingual Aspect-Based Sentiment Analysis with Constrained Decoding

Šmíd, Jakub, Přibáň, Pavel, Král, Pavel

arXiv.org Artificial Intelligence

While aspect-based sentiment analysis (ABSA) has made substantial progress, challenges remain for low-resource languages, which are often overlooked in favour of English. Current cross-lingual ABSA approaches focus on limited, less complex tasks and often rely on external translation tools. This paper introduces a novel approach using constrained decoding with sequence-to-sequence models, eliminating the need for unreliable translation tools and improving cross-lingual performance by 5\% on average for the most complex task. The proposed method also supports multi-tasking, which enables solving multiple ABSA tasks with a single model, with constrained decoding boosting results by more than 10\%. We evaluate our approach across seven languages and six ABSA tasks, surpassing state-of-the-art methods and setting new benchmarks for previously unexplored tasks. Additionally, we assess large language models (LLMs) in zero-shot, few-shot, and fine-tuning scenarios. While LLMs perform poorly in zero-shot and few-shot settings, fine-tuning achieves competitive results compared to smaller multilingual models, albeit at the cost of longer training and inference times. We provide practical recommendations for real-world applications, enhancing the understanding of cross-lingual ABSA methodologies. This study offers valuable insights into the strengths and limitations of cross-lingual ABSA approaches, advancing the state-of-the-art in this challenging research domain.


Assessing the Human Likeness of AI-Generated Counterspeech

Song, Xiaoying, Mamidisetty, Sujana, Blanco, Eduardo, Hong, Lingzi

arXiv.org Artificial Intelligence

Counterspeech is a targeted response to counteract and challenge abusive or hateful content. It effectively curbs the spread of hatred and fosters constructive online communication. Previous studies have proposed different strategies for automatically generated counterspeech. Evaluations, however, focus on relevance, surface form, and other shallow linguistic characteristics. This paper investigates the human likeness of AI-generated counterspeech, a critical factor influencing effectiveness. We implement and evaluate several LLM-based generation strategies, and discover that AI-generated and human-written counterspeech can be easily distinguished by both simple classifiers and humans. Further, we reveal differences in linguistic characteristics, politeness, and specificity. The dataset used in this study is publicly available for further research.


Theoretically Grounded Pruning of Large Ground Sets for Constrained, Discrete Optimization

Nath, Ankur, Kuhnle, Alan

arXiv.org Artificial Intelligence

Modern instances of combinatorial optimization problems often exhibit billion-scale ground sets, which have many uninformative or redundant elements. In this work, we develop light-weight pruning algorithms to quickly discard elements that are unlikely to be part of an optimal solution. Under mild assumptions on the instance, we prove theoretical guarantees on the fraction of the optimal value retained and the size of the resulting pruned ground set. Through extensive experiments on real-world datasets for various applications, we demonstrate that our algorithm, QuickPrune, efficiently prunes over 90% of the ground set and outperforms state-of-the-art classical and machine learning heuristics for pruning.


Spectral Clustering in Convex and Constrained Settings

Behera, Swarup Ranjan, Saradhi, Vijaya V.

arXiv.org Artificial Intelligence

Spectral clustering methods have gained widespread recognition for their effectiveness in clustering high-dimensional data. Among these techniques, constrained spectral clustering has emerged as a prominent approach, demonstrating enhanced performance by integrating pairwise constraints. However, the application of such constraints to semidefinite spectral clustering, a variant that leverages semidefinite programming to optimize clustering objectives, remains largely unexplored. In this paper, we introduce a novel framework for seamlessly integrating pairwise constraints into semidefinite spectral clustering. Our methodology systematically extends the capabilities of semidefinite spectral clustering to capture complex data structures, thereby addressing real-world clustering challenges more effectively. Additionally, we extend this framework to encompass both active and self-taught learning scenarios, further enhancing its versatility and applicability. Empirical studies conducted on well-known datasets demonstrate the superiority of our proposed framework over existing spectral clustering methods, showcasing its robustness and scalability across diverse datasets and learning settings. By bridging the gap between constrained learning and semidefinite spectral clustering, our work contributes to the advancement of spectral clustering techniques, offering researchers and practitioners a versatile tool for addressing complex clustering challenges in various real-world applications. Access to the data, code, and experimental results is provided for further exploration (https://github.com/swarupbehera/SCCCS).


Online Learning: Stochastic, Constrained, and Smoothed Adversaries

Neural Information Processing Systems

Learning theory has largely focused on two main learning scenarios: the classical statistical setting where instances are drawn i.i.d. It can be argued that in the real world neither of these assumptions is reasonable. We define the minimax value of a game where the adversary is restricted in his moves, capturing stochastic and non-stochastic assumptions on data. Building on the sequential symmetrization approach, we define a notion of distribution-dependent Rademacher complexity for the spectrum of problems ranging from i.i.d. to worst-case. The bounds let us immediately deduce variation-type bounds. We study a smoothed online learning scenario and show that exponentially small amount of noise can make function classes with infinite Littlestone dimension learnable.


iCub Knows Where You Look: Exploiting Social Cues for Interactive Object Detection Learning

Lombardi, Maria, Maiettini, Elisa, Tikhanoff, Vadim, Natale, Lorenzo

arXiv.org Artificial Intelligence

Performing joint interaction requires constant mutual monitoring of own actions and their effects on the other's behaviour. Such an action-effect monitoring is boosted by social cues and might result in an increasing sense of agency. Joint actions and joint attention are strictly correlated and both of them contribute to the formation of a precise temporal coordination. In human-robot interaction, the robot's ability to establish joint attention with a human partner and exploit various social cues to react accordingly is a crucial step in creating communicative robots. Along the social component, an effective human-robot interaction can be seen as a new method to improve and make the robot's learning process more natural and robust for a given task. In this work we use different social skills, such as mutual gaze, gaze following, speech and human face recognition, to develop an effective teacher-learner scenario tailored to visual object learning in dynamic environments. Experiments on the iCub robot demonstrate that the system allows the robot to learn new objects through a natural interaction with a human teacher in presence of distractors.


A Bibliographic View on Constrained Clustering

Kuncheva, Ludmila, Williams, Francis, Hennessey, Samuel

arXiv.org Artificial Intelligence

A keyword search on constrained clustering on Web-of-Science returned just under 3,000 documents. We ran automatic analyses of those, and compiled our own bibliography of 183 papers which we analysed in more detail based on their topic and experimental study, if any. This paper presents general trends of the area and its sub-topics by Pareto analysis, using citation count and year of publication. We list available software and analyse the experimental sections of our reference collection. We found a notable lack of large comparison experiments. Among the topics we reviewed, applications studies were most abundant recently, alongside deep learning, active learning and ensemble learning.


ML / AI / Human Creativity wants to be Constrained

#artificialintelligence

In a previous article I introduced a concept of a Creative Intelligence (CI) as either a human or AI that produces creative output. I introduced the term Intelligence Director (ID) as the one who directs the CI towards a goal. I introduced the concept of a Constraint Language (CL) as a language used for constraining the CI working on a given task. This article builds on the previous, and focuses more on constraints and why they are so important for creativity. It is well known that constraints and art go hand in hand.


Boolean and $\mathbb{F}_p$-Matrix Factorization: From Theory to Practice

Fomin, Fedor, Panolan, Fahad, Patil, Anurag, Tanveer, Adil

arXiv.org Artificial Intelligence

Boolean Matrix Factorization (BMF) aims to find an approximation of a given binary matrix as the Boolean product of two low-rank binary matrices. Binary data is ubiquitous in many fields, and representing data by binary matrices is common in medicine, natural language processing, bioinformatics, computer graphics, among many others. Unfortunately, BMF is computationally hard and heuristic algorithms are used to compute Boolean factorizations. Very recently, the theoretical breakthrough was obtained independently by two research groups. Ban et al. (SODA 2019) and Fomin et al. (Trans. Algorithms 2020) show that BMF admits an efficient polynomial-time approximation scheme (EPTAS). However, despite the theoretical importance, the high double-exponential dependence of the running times from the rank makes these algorithms unimplementable in practice. The primary research question motivating our work is whether the theoretical advances on BMF could lead to practical algorithms. The main conceptional contribution of our work is the following. While EPTAS for BMF is a purely theoretical advance, the general approach behind these algorithms could serve as the basis in designing better heuristics. We also use this strategy to develop new algorithms for related $\mathbb{F}_p$-Matrix Factorization. Here, given a matrix $A$ over a finite field GF($p$) where $p$ is a prime, and an integer $r$, our objective is to find a matrix $B$ over the same field with GF($p$)-rank at most $r$ minimizing some norm of $A-B$. Our empirical research on synthetic and real-world data demonstrates the advantage of the new algorithms over previous works on BMF and $\mathbb{F}_p$-Matrix Factorization.