 hardness



Bi-Objective Online Matching and Submodular Allocations

Neural Information Processing Systems

Online allocation problems have been widely studied due to their numerous practical applications (particularly to Internet advertising), as well as considerable theoretical interest. The main challenge in such problems is making assignment decisions in the face of uncertainty about future input; effective algorithms need to predict which constraints are most likely to bind, and learn the balance between short-term gain and the value of long-term resource availability. In many important applications, the algorithm designer is faced with multiple objectives to optimize. In particular, in online advertising it is fairly common to optimize multiple metrics, such as clicks, conversions, and impressions, as well as other metrics which may be largely uncorrelated, such as 'share of voice' and 'buyer surplus'. While there has been considerable work on multi-objective offline optimization (when the entire input is known in advance), very little is known about the online case, particularly in the case of adversarial input. In this paper, we give the first results for bi-objective online submodular optimization, providing almost matching upper and lower bounds for allocating items to agents with two submodular value functions. We also study practically relevant special cases of this problem related to Internet advertising, and obtain improved results. All our algorithms are nearly best possible, as well as being efficient and easy to implement in practice.
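To make the bi-objective online setting concrete, the following is a minimal sketch of a naive randomized heuristic. It is not the algorithm from the paper: the function name `mixed_greedy`, the unit capacities, the additive (rather than general submodular) values, and the mixing probability `p` are all assumptions made for illustration. Each arriving item carries two weight vectors, one per objective, and must be assigned irrevocably.

```python
import random


def mixed_greedy(items, capacities, p, seed=0):
    """Assign each arriving item greedily under objective 1 with probability p,
    otherwise greedily under objective 2 (illustrative heuristic only)."""
    rng = random.Random(seed)
    remaining = list(capacities)
    total1 = total2 = 0.0
    assignment = []
    for w1, w2 in items:  # w1[a], w2[a]: value of the item to agent a per objective
        open_agents = [a for a, c in enumerate(remaining) if c > 0]
        if not open_agents:
            assignment.append(None)  # no capacity left, item is dropped
            continue
        weights = w1 if rng.random() < p else w2
        agent = max(open_agents, key=lambda a: weights[a])
        remaining[agent] -= 1
        total1 += w1[agent]
        total2 += w2[agent]
        assignment.append(agent)
    return assignment, total1, total2


# Two items, two agents with capacity 1 each (hypothetical numbers).
items = [([3, 1], [0, 5]), ([2, 4], [1, 0])]
print(mixed_greedy(items, [1, 1], p=1.0))  # favours objective 1 only
print(mixed_greedy(items, [1, 1], p=0.0))  # favours objective 2 only
```

Varying `p` between 0 and 1 trades one objective against the other, which is the kind of tension the abstract describes.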


Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy

Neural Information Processing Systems

Understanding when neural networks can be learned efficiently is a fundamental question in learning theory. Existing hardness results suggest that assumptions on both the input distribution and the network's weights are necessary for obtaining efficient algorithms. Moreover, it was previously shown that depth-2 networks can be efficiently learned under the assumptions that the input distribution is Gaussian, and the weight matrix is non-degenerate. In this work, we study whether such assumptions may suffice for learning deeper networks and prove negative results. We show that learning depth-3 ReLU networks under the Gaussian input distribution is hard even in the smoothed-analysis framework, where random noise is added to the network's parameters. This implies that learning depth-3 ReLU networks under the Gaussian distribution is hard even if the weight matrices are non-degenerate. Moreover, we consider depth-2 networks, and show hardness of learning in the smoothed-analysis framework, where both the network parameters and the input distribution are smoothed. Our hardness results are under a well-studied assumption on the existence of local pseudorandom generators.


Training Neural Networks is NP-Hard in Fixed Dimension

Neural Information Processing Systems

[...] considering ReLU and linear threshold activation functions. [...] complexity of these problems has been studied numerous times in recent years, several questions are still open. We answer questions by Arora et al. [2018, ICLR] and Khalife and Basu [2022, IPCO] showing that both problems are NP-hard for two dimensions, which excludes any polynomial-time algorithm for constant dimension. We also answer a question by Froese et al. [2022, JAIR] proving W[1]-hardness for four ReLUs (or two linear threshold neurons) with zero training error. Finally, in the ReLU case, we show fixed-parameter tractability for the combined parameter number of dimensions and number of ReLUs if the network is assumed to compute [...]. Our results settle the complexity status regarding these parameters [...]


Learning to Think from Multiple Thinkers

arXiv.org Machine Learning

We study learning with Chain-of-Thought (CoT) supervision from multiple thinkers, all of whom provide correct but possibly systematically different solutions, e.g., step-by-step solutions to math problems written by different thinkers, or step-by-step execution traces of different programs solving the same problem. We consider classes that are computationally easy to learn using CoT supervision from a single thinker, but hard to learn with only end-result supervision, i.e., without CoT (Joshi et al. 2025). We establish that, under cryptographic assumptions, learning can be hard from CoT supervision provided by two or a few different thinkers, in passive data-collection settings. On the other hand, we provide a generic computationally efficient active learning algorithm that learns with a small amount of CoT data per thinker that is completely independent of the target accuracy $\varepsilon$, a moderate number of thinkers that scales as $\log \frac{1}{\varepsilon}\log \log \frac{1}{\varepsilon}$, and sufficient passive end-result data that scales as $\frac{1}{\varepsilon}\cdot \mathrm{poly}\log\frac{1}{\varepsilon}$.



Evaluating State-of-the-Art Classification Models Against Bayes Optimality

Neural Information Processing Systems

Evaluating the inherent difficulty of a given data-driven classification problem is important for establishing absolute benchmarks and evaluating progress in the field. To this end, a natural quantity to consider is the Bayes error, which measures the optimal classification error theoretically achievable for a given data distribution. While generally an intractable quantity, we show that we can compute the exact Bayes error of generative models learned using normalizing flows. Our technique relies on a fundamental result, which states that the Bayes error is invariant under invertible transformation. Therefore, we can compute the exact Bayes error of the learned flow models by computing it for Gaussian base distributions, which can be done efficiently using Holmes-Diaconis-Ross integration. Moreover, we show that by varying the temperature of the learned flow models, we can generate synthetic datasets that closely resemble standard benchmark datasets, but with almost any desired Bayes error. We use our approach to conduct a thorough investigation of state-of-the-art classification models, and find that in some -- but not all -- cases, these models are capable of obtaining accuracy very near optimal. Finally, we use our method to evaluate the intrinsic "hardness" of standard benchmark datasets.
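The invariance claim can be checked numerically on a toy case. The sketch below is not the paper's flow-based method and does not use Holmes-Diaconis-Ross integration. It assumes two equally likely one-dimensional classes N(-mu, 1) and N(+mu, 1), whose Bayes error is Phi(-mu) in closed form, and a hand-picked invertible map y = sinh(x). All function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean):
    """Density of a unit-variance Gaussian with the given mean."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exact_bayes_error(mu):
    # With equal priors the optimal rule thresholds x at 0,
    # so each class is misclassified with probability Phi(-mu).
    return normal_cdf(-mu)


def transformed_density(y, mean):
    # Change of variables for y = sinh(x):
    # p_Y(y) = p_X(asinh(y)) * |d asinh(y) / dy| = p_X(asinh(y)) / sqrt(1 + y^2).
    return normal_pdf(math.asinh(y), mean) / math.sqrt(1.0 + y * y)


def estimate_error_after_transform(mu, n=100_000, seed=0):
    """Monte Carlo error of the Bayes rule applied in the transformed space."""
    rng = random.Random(seed)
    mistakes = 0
    for _ in range(n):
        label = rng.randrange(2)
        x = rng.gauss(mu if label == 1 else -mu, 1.0)
        y = math.sinh(x)
        # The Jacobian factor is identical for both classes and cancels,
        # which is why the optimal error is unchanged.
        predicted = 1 if transformed_density(y, mu) > transformed_density(y, -mu) else 0
        mistakes += predicted != label
    return mistakes / n


print(exact_bayes_error(1.0))
print(estimate_error_after_transform(1.0))
```

The two printed values should agree up to Monte Carlo noise, since the class-independent Jacobian term drops out of the likelihood comparison.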



A Related Work

Neural Information Processing Systems

We review related work to clarify where AdvInfoNCE stands and what role it plays in the existing literature. Our work is related to contrastive learning-based collaborative filtering (CL-based CF) methods and to the theoretical understanding of contrastive loss in collaborative filtering.

A.1 Contrastive Learning-based Collaborative Filtering

The latest CL-based CF methods roughly fall into two research lines. The first is augmentation-based: the prevailing paradigm employs user-item bipartite graph augmentations to generate contrasting views, which are then treated as positive instances when applying a contrastive loss, such as InfoNCE, to further enhance collaborative filtering signals. The second category, referred to as "loss-based" approaches, mainly focuses on modifying the contrastive loss. In loss-based CF models, interacted items serve as positive instances.
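For reference, the standard InfoNCE loss mentioned above can be sketched for a single user as follows. This is the plain InfoNCE form, not AdvInfoNCE. The function name `info_nce`, the use of precomputed similarity scores, and the default temperature are assumptions made for illustration.

```python
import math


def info_nce(pos_score, neg_scores, temperature=0.1):
    """InfoNCE for one positive and a list of negatives:
    -log( exp(s+/t) / (exp(s+/t) + sum_j exp(s_j/t)) )."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[0]


# Hypothetical similarity scores: one interacted item and three sampled negatives.
print(info_nce(0.9, [0.1, 0.2, -0.3]))
```

The loss shrinks as the positive score rises above the negative scores. When all scores are equal it reduces to the log of the number of candidates.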