Goto

Collaborating Authors

 definition



Appendix

Neural Information Processing Systems

ThepolytopeP(X,L) is in fact a "twisted sum" of a finite number of lattice polytopes fibering overP(F,L| F) .




Do Neural Networks Need Gradient Descent to Generalize? A Theoretical Study

Alexander, Yotam, Slutzky, Yonatan, Ran-Milo, Yuval, Cohen, Nadav

arXiv.org Machine Learning

Conventional wisdom attributes the mysterious generalization abilities of overparameterized neural networks to gradient descent (and its variants). The recent volume hypothesis challenges this view: it posits that these generalization abilities persist even when gradient descent is replaced by Guess & Check (G&C), i.e., by drawing weight settings until one that fits the training data is found. The validity of the volume hypothesis for wide and deep neural networks remains an open question. In this paper, we theoretically investigate this question for matrix factorization (with linear and non-linear activation)--a common testbed in neural network theory. We first prove that generalization under G&C deteriorates with increasing width, establishing what is, to our knowledge, the first case where G&C is provably inferior to gradient descent. Conversely, we prove that generalization under G&C improves with increasing depth, revealing a stark contrast between wide and deep networks, which we further validate empirically. These findings suggest that even in simple settings, there may not be a simple answer to the question of whether neural networks need gradient descent to generalize well.


A Definition of Continual Reinforcement Learning

Neural Information Processing Systems

In a standard view of the reinforcement learning problem, an agent's goal is to efficiently identify a policy that maximizes long-term reward. However, this perspective is based on a restricted view of learning as finding a solution, rather than treating learning as endless adaptation. In contrast, continual reinforcement learning refers to the setting in which the best agents never stop learning. Despite the importance of continual reinforcement learning, the community lacks a simple definition of the problem that highlights its commitments and makes its primary concepts precise and clear. To this end, this paper is dedicated to carefully defining the continual reinforcement learning problem.


Dialogue Systems for Emotional Support via Value Reinforcement

Kim, Juhee, Mok, Chunghu, Lee, Jisun, Kim, Hyang Sook, Jo, Yohan

arXiv.org Artificial Intelligence

Emotional support dialogue systems aim to reduce help-seekers' distress and help them overcome challenges. While human values$\unicode{x2013}$core beliefs that shape an individual's priorities$\unicode{x2013}$are increasingly emphasized in contemporary psychological therapy for their role in fostering internal transformation and long-term emotional well-being, their integration into emotional support systems remains underexplored. To bridge this gap, we present a value-driven method for training emotional support dialogue systems designed to reinforce positive values in seekers. Our model learns to identify which values to reinforce at each turn and how to do so, by leveraging online support conversations from Reddit. The model demonstrated superior performance in emotional support capabilities, outperforming various baselines. Notably, it more effectively explored and elicited values from seekers. Expert assessments by therapists highlighted two key strengths of our model: its ability to validate users' challenges and its effectiveness in emphasizing positive aspects of their situations$\unicode{x2013}$both crucial elements of value reinforcement. Our work validates the effectiveness of value reinforcement for emotional support systems and establishes a foundation for future research.


Polyatomic Complexes: A topologically-informed learning representation for atomistic systems

Khorana, Rahul, Noack, Marcus, Qian, Jin

arXiv.org Artificial Intelligence

Developing robust representations of chemical structures that enable models to learn topological inductive biases is challenging. In this manuscript, we present a representation of atomistic systems. We begin by proving that our representation satisfies all structural, geometric, efficiency, and generalizability constraints. Afterward, we provide a general algorithm to encode any atomistic system. Finally, we report performance comparable to state-of-the-art methods on numerous tasks.


Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

Neural Information Processing Systems

We develop a general duality between neural networks and compositional kernel Hilbert spaces. We introduce the notion of a computation skeleton, an acyclic graph that succinctly describes both a family of neural networks and a kernel space. Random neural networks are generated from a skeleton through node replication followed by sampling from a normal distribution to assign weights. The kernel space consists of functions that arise by compositions, averaging, and non-linear transformations governed by the skeleton's graph topology and activation functions. We prove that random networks induce representations which approximate the kernel space. In particular, it follows that random weight initialization often yields a favorable starting point for optimization despite the worst-case intractability of training neural networks.


Quota Sample (Statistics) Definition

#artificialintelligence

In probability sampling, scientists need to understand the effects of a phenomena on a specific population. Sometimes, however scientists try and structure their sampling techniques to ensure subgroups of a population have equal chances of being selected for the sample population group. This is a process known as non-probability sampling, and the technique is known as quota sampling.