Erdős Goes Neural: an Unsupervised Learning Framework for Combinatorial Optimization on Graphs

Neural Information Processing Systems

Combinatorial optimization (CO) problems are notoriously challenging for neural networks, especially in the absence of labeled instances. This work proposes an unsupervised learning framework for CO problems on graphs that can provide integral solutions of certified quality. Inspired by Erdős' probabilistic method, we use a neural network to parametrize a probability distribution over sets. Crucially, we show that when the network is optimized w.r.t. a suitably chosen loss, the learned distribution contains, with controlled probability, a low-cost integral solution that obeys the constraints of the combinatorial problem. The probabilistic proof of existence is then derandomized to decode the desired solutions. We demonstrate the efficacy of this approach by obtaining valid solutions to the maximum clique problem and by performing local graph clustering. Our method achieves competitive results on both real datasets and synthetic hard instances.
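The pipeline described above can be pictured with a minimal sketch. The particular loss, the penalty weight `beta`, and the greedy conditional-expectation decoder below are illustrative assumptions standing in for the paper's exact construction for maximum clique, not a reproduction of it.

```python
import numpy as np

def set_loss(p, adj, beta=2.0):
    """Expected cost of the product distribution over node subsets.

    p[i] is the probability that node i is in the set. We reward expected
    edge mass inside the set and penalize expected non-edge pairs, so a low
    loss certifies (in expectation) a dense, clique-like subset.
    """
    pij = np.outer(p, p)
    np.fill_diagonal(pij, 0.0)
    expected_edges = 0.5 * np.sum(pij * adj)
    expected_non_edges = 0.5 * np.sum(pij * (1.0 - adj))
    return -expected_edges + beta * expected_non_edges

def derandomize(p, adj, beta=2.0):
    """Method of conditional expectation: fix nodes one at a time, always
    choosing the value that does not increase the conditional expected loss."""
    x = p.copy()
    for i in np.argsort(-p):            # visit nodes by decreasing probability
        x_in, x_out = x.copy(), x.copy()
        x_in[i], x_out[i] = 1.0, 0.0
        x = x_in if set_loss(x_in, adj, beta) <= set_loss(x_out, adj, beta) else x_out
    return x.astype(int)

# Toy usage: a 4-node graph with a triangle {0, 1, 2} plus a pendant node 3.
adj = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
p = np.array([0.9, 0.8, 0.85, 0.3])    # stand-in for the network's output probabilities
print(derandomize(p, adj))             # [1 1 1 0], i.e. the triangle
```

The decoder never increases the expected loss, so the decoded set is at least as good as the distribution's expectation, which is the derandomization argument the abstract alludes to.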



A Contextualizing our Work

Neural Information Processing Systems

A.1 Additional Recent Works

Variations of Adam have been proposed to improve its speed of convergence, generalization, and stability during training. Reddi et al. (2018) observed that Adam does not keep long-term memory of past gradients, so the effective learning rate can increase in some cases. They therefore propose AMSGrad, which maintains a maximum over the exponential running average of the squared gradients. Zaheer et al. (2018) proposed a more controlled increase in the effective learning rate by switching to additive updates, using a more refined version of AdaGrad (Duchi et al., 2011). Other variations include (a) Nadam (Dozat, 2016), which uses Nesterov momentum; (b) AdamW (Loshchilov and Hutter, 2019), which decouples the weight decay from the optimization step; (c) AdaBound (Luo et al., 2019), which maintains dynamic upper and lower bounds on the step size; (d) AdaBelief (Zhuang et al., 2020), which uses a decaying average of the estimated variance of the gradient in place of the running average of the squared gradients; and (e) QHAdam (Ma and Yarats, 2019), which replaces both momentum estimators in Adam with quasi-hyperbolic terms. LAMB (You et al., 2020) used a layerwise adaptive version of Adam to pretrain large language models efficiently.

A.2 Broader Impact

Our work is primarily theoretical in nature, but we discuss its broader impacts here. Strubell et al. (2020) highlighted the environmental impact of training large language models. Formal scaling rules remove the need for grid searches over hyperparameters; for adaptive algorithms, the grid search is over an even larger space because of the additional adaptivity hyperparameters.
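As an illustration of the AMSGrad modification discussed above, the sketch below shows a single update step; it differs from Adam only in keeping a running maximum of the second-moment estimate. The hyperparameter values and the toy objective are assumptions chosen for illustration.

```python
import numpy as np

def amsgrad_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update. Identical to Adam except that the step divides by
    sqrt(v_hat) with v_hat = max(v_hat, v), so the effective learning rate
    never increases as past squared gradients shrink."""
    m, v, v_hat, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad          # first-moment running average
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment running average
    v_hat = np.maximum(v_hat, v)                # AMSGrad: keep the running maximum
    m_hat = m / (1 - beta1 ** t)                # bias correction (as in Adam; the original AMSGrad omits it)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, v_hat, t)

# Usage on a toy quadratic: minimize 0.5 * ||theta||^2, whose gradient is theta.
theta = np.array([1.0, -2.0])
state = (np.zeros(2), np.zeros(2), np.zeros(2), 0)
for _ in range(5000):
    theta, state = amsgrad_step(theta, theta, state)
print(theta)  # both coordinates shrink toward zero
```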


Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models

Neural Information Processing Systems

Machine unlearning (MU) empowers individuals with the'right to be forgotten' by removing their private or sensitive information encoded in machine learning models. However, it remains uncertain whether MU can be effectively applied to Multimodal Large Language Models (MLLMs), particularly in scenarios of forgetting the leaked visual data of concepts. To overcome the challenge, we propose an efficient method, Single Image Unlearning (SIU), to unlearn the visual recognition of a concept by fine-tuning a single associated image for few steps. SIU consists of two key aspects: (i) Constructing Multifaceted fine-tuning data. We introduce four targets, based on which we construct fine-tuning data for the concepts to be forgotten; (ii) Joint training loss. To synchronously forget the visual recognition of concepts and preserve the utility of MLLMs, we fine-tune MLLMs through a novel Dual Masked KL-divergence Loss combined with Cross Entropy loss. Alongside our method, we establish MMUBench, a new benchmark for MU in MLLMs and introduce a collection of metrics for its evaluation.
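The shape of such a joint objective can be sketched as follows. The masking scheme, the weighting `alpha`, and the tensor layout below are assumptions made purely for illustration; they are not the paper's exact Dual Masked KL-divergence construction.

```python
import torch
import torch.nn.functional as F

def joint_unlearning_loss(student_logits, reference_logits, labels, kl_mask, alpha=1.0):
    """Illustrative joint loss: cross entropy on the constructed forgetting
    targets, plus a masked KL term that keeps the model close to a frozen
    reference on the positions selected by kl_mask (to preserve utility).

    student_logits, reference_logits: (batch, seq, vocab)
    labels: (batch, seq), with -100 marking positions ignored by CE
    kl_mask: (batch, seq) boolean mask of positions constrained by KL
    """
    ce = F.cross_entropy(student_logits.flatten(0, 1), labels.flatten(), ignore_index=-100)
    log_p = F.log_softmax(student_logits, dim=-1)
    log_q = F.log_softmax(reference_logits.detach(), dim=-1)
    kl_per_pos = (log_q.exp() * (log_q - log_p)).sum(-1)   # KL(reference || student) per token
    kl = (kl_per_pos * kl_mask).sum() / kl_mask.sum().clamp(min=1)
    return ce + alpha * kl
```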


A Simulating

Neural Information Processing Systems

Below is an FGG for derivations of a PCFG in Chomsky normal form. The largest right-hand side has 3 variables, so k = 2. The variables range over nonterminals, so m = |N|, where N is the CFG's nonterminal alphabet.

B.1 Plate diagrams

Plate diagrams are extensions of graphs that describe repeated structure in Bayesian networks (Buntine, 1994) or factor graphs (Obermeyer et al., 2019). A plate is a subset of variables/factors together with a count M, indicating that the variables/factors inside the plate are to be replicated M times. However, there cannot be edges between different instances of a plate.
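A minimal sketch of plate semantics, using an assumed lightweight factor-graph representation: replicating a plate M times copies its variables and factors, and no edges are created between distinct copies.

```python
from dataclasses import dataclass, field

@dataclass
class FactorGraph:
    variables: list = field(default_factory=list)   # variable names
    factors: list = field(default_factory=list)     # (factor name, attached variables)

def expand_plate(plate_vars, plate_factors, M):
    """Replicate the variables/factors inside a plate M times. Each copy gets
    its own index, and factors only attach to variables of the same copy, so
    there are no edges across instances."""
    g = FactorGraph()
    for i in range(M):
        rename = {v: f"{v}[{i}]" for v in plate_vars}
        g.variables.extend(rename.values())
        for name, attached in plate_factors:
            g.factors.append((f"{name}[{i}]", [rename[v] for v in attached]))
    return g

# Toy usage: a plate with one variable X and a unary factor f(X), replicated 3 times.
print(expand_plate(["X"], [("f", ["X"])], 3).factors)
# [('f[0]', ['X[0]']), ('f[1]', ['X[1]']), ('f[2]', ['X[2]'])]
```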


Factor Graph Grammars

Neural Information Processing Systems

We propose the use of hyperedge replacement graph grammars for factor graphs, or factor graph grammars (FGGs) for short. FGGs generate sets of factor graphs and can describe a more general class of models than plate notation, dynamic graphical models, case-factor diagrams, and sum-product networks can. Moreover, inference can be done on FGGs without enumerating all the generated factor graphs. For finite variable domains (but possibly infinite sets of graphs), a generalization of variable elimination to FGGs allows exact and tractable inference in many situations. For finite sets of graphs (but possibly infinite variable domains), an FGG can be converted to a single factor graph amenable to standard inference techniques.
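To give a feel for "a grammar whose derivations are factor graphs of unbounded size", here is a rough sketch of deriving HMM-shaped factor graphs from a two-rule grammar. The encoding is a deliberate simplification made for illustration, not the paper's formal hyperedge-replacement machinery.

```python
import random

def derive(t=0, p_stop=0.4, max_steps=20):
    """Derivation from a two-rule, FGG-like grammar:
      S -> emission factor over (X_t, W_t), then continue with S at t+1
      S -> emission factor over (X_t, W_t), then stop
    A transition factor links consecutive state variables X_t and X_{t+1}."""
    variables = [f"X{t}", f"W{t}"]
    factors = [("emission", [f"X{t}", f"W{t}"])]
    if t + 1 < max_steps and random.random() > p_stop:   # pick the "continue" rule
        sub_vars, sub_factors = derive(t + 1, p_stop, max_steps)
        factors.append(("transition", [f"X{t}", f"X{t + 1}"]))
        variables += sub_vars
        factors += sub_factors
    return variables, factors

vars_, facs = derive()
print(len(vars_), "variables;", facs[:3])
```

Each choice of stopping time yields a different factor graph, so the grammar compactly describes the infinite family of HMM factor graphs of every length.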


49ca03822497d26a3943d5084ed59130-AuthorFeedback.pdf

Neural Information Processing Systems

Reviewer 4 (R4) asks for "at least an example of a problem in which it would be more useful than other frameworks."

Formalism. R2 gives an example FGG that appears to generate graphs with arbitrarily high treewidth. We are uncertain how R4's suggestion to draw the external nodes on the left-hand side differs from

Renaming. We agree with R3 and R4 that Def. 7 should formalize how nodes are renamed when copied. But this wouldn't be as flexible as we'd like; for example, we'd like to query an HMM for the

Second, R4 writes that we need node names when matching up nodes during conjunction. But conjunction is an operation on FGGs, not factor graphs, so at the time of conjunction, no renaming has taken place.


Supplementary Material

Neural Information Processing Systems

Optimal Event Executions for Calculating Completion Rate

When the synthesis tree becomes complicated, it is not straightforward to calculate the maximum potential executions for all the events, making it difficult to evaluate performance through the Completion rate metric. We therefore develop an optimization formulation to compute the number of event executions that maximizes the credits obtained by agents. The optimization is formulated in a single-agent setting; since it aims to obtain the maximum potential credits, it also applies to multi-agent cases by taking the set of events to be the union of the agents' skills. All natural resources can eventually be collected. Tab. 13 shows the parameters and variables used in this optimization.

Table 13: Parameters and variables used in credit optimization.

Example Prompt for LLM-C

The following examples illustrate the prompts used in LLM-C for each mini-game. The prompts vary slightly across mini-games and also differ across stages within the same mini-game. Specifically, the prompt for the dynamic scenario in Social Structure is presented in Listing 1. For the contract formation stage in Contract, the prompt is displayed in Listing 2. Similarly, the prompt for the negotiation stage in Negotiation can be found in Listing 3. The physical stage for Contract and that for Negotiation are the same; there are two physical-stage settings, featuring different levels of difficulty, and the corresponding prompts are provided in Listing 4 and Listing 5.

Instructions:
- The AdaSociety game is an open-ended multi-agent environment. The game consists of a complex crafting tree, where the agent needs to obtain as many resources as possible in the limited time and craft tools to mine more advanced resources to maximize its benefit.
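For concreteness, a credit-maximization formulation of the kind described above can be phrased as a small linear program. Everything below (the toy resource/event names, the production/consumption matrices, the continuous relaxation, and the use of scipy.optimize.linprog) is an illustrative assumption, not the exact formulation behind Table 13.

```python
import numpy as np
from scipy.optimize import linprog

# Toy crafting tree: "gather_wood" yields 1 wood; "craft_plank" consumes 2 wood
# and yields 1 plank. Credits value wood at 1 and planks at 5.
prices = np.array([1.0, 5.0])               # credit value of [wood, plank]
produce = np.array([[1, 0],                 # gather_wood -> +1 wood
                    [0, 1]])                # craft_plank -> +1 plank
consume = np.array([[0, 0],                 # gather_wood consumes nothing
                    [2, 0]])                # craft_plank consumes 2 wood
credit_per_event = (produce - consume) @ prices
initial = np.array([0.0, 0.0])              # no resources at the start
max_gather = 10.0                           # natural resource limit on wood gathering

# Variables x[e] = number of executions of event e (relaxed to be continuous).
# Constraint per resource: total consumption <= initial stock + total production.
A_ub = (consume - produce).T
b_ub = initial
res = linprog(-credit_per_event,            # maximize credits == minimize their negative
              A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, max_gather), (0, None)],
              method="highs")
print(res.x, -res.fun)                      # optimal executions and total credit
```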


AdaSociety: An Adaptive Environment with Social Structures for Multi-Agent Decision-Making

Neural Information Processing Systems

Traditional interactive environments limit agents' intelligence growth with fixed tasks. Recent single-agent environments address this by generating new tasks based on agent actions, enhancing task diversity. We consider the decision-making problem in multi-agent settings, where tasks are further influenced by social connections, affecting rewards and information access. However, existing multi-agent environments lack a combination of adaptive physical surroundings and social connections, hindering the learning of intelligent behaviors. To address this, we introduce AdaSociety, a customizable multi-agent environment featuring expanding state and action spaces, alongside explicit and alterable social structures. As agents progress, the environment adaptively generates new tasks with social structures for agents to undertake. In AdaSociety, we develop three mini-games showcasing distinct social structures and tasks. Initial results demonstrate that specific social structures can promote both individual and collective benefits, though current reinforcement learning and LLM-based algorithms show limited effectiveness in leveraging social structures to enhance performance. Overall, AdaSociety serves as a valuable research platform for exploring intelligence in diverse physical and social settings.


Topological obstruction to the training of shallow ReLU neural networks

Neural Information Processing Systems

Studying the interplay between the geometry of the loss landscape and the optimization trajectories of simple neural networks is a fundamental step for understanding their behavior in more complex settings. This paper reveals the presence of a topological obstruction in the loss landscape of shallow ReLU neural networks trained using gradient flow. We discuss how the homogeneous nature of the ReLU activation function constrains the training trajectories to lie on a product of quadric hypersurfaces whose shape depends on the particular initialization of the network's parameters. When the neural network's output is a single scalar, we prove that these quadrics can have multiple connected components, limiting the set of reachable parameters during training. We analytically compute the number of these components and discuss the possibility of mapping one to another through neuron rescaling and permutation. In this simple setting, we find that the non-connectedness results in a topological obstruction, which, depending on the initialization, can make the global optimum unreachable.
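The quadric constraint comes from a conservation law of gradient flow for homogeneous activations: for each hidden neuron, the difference between the squared norms of its incoming and outgoing weights stays constant along the trajectory. The sketch below checks this numerically on a toy shallow ReLU network; the network size, data, and step size are assumptions chosen only to illustrate the (approximate) conservation of ||w_i||^2 - a_i^2 under discretized gradient flow.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 32, 5, 4                           # samples, input dim, hidden width
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
W = rng.normal(size=(h, d)) * 0.5            # incoming weights of the hidden neurons
a = rng.normal(size=h) * 0.5                 # outgoing (output) weights

def grads(W, a):
    """Gradients of (half) the mean squared error of f(x) = sum_i a_i * relu(w_i . x)."""
    pre = X @ W.T                            # (n, h) pre-activations
    act = np.maximum(pre, 0.0)
    resid = act @ a - y                      # (n,) residuals
    dW = ((resid[:, None] * (pre > 0)) * a).T @ X / n
    da = act.T @ resid / n
    return dW, da

def invariant(W, a):
    return np.sum(W ** 2, axis=1) - a ** 2   # one conserved quantity per hidden neuron

start = invariant(W, a)
eta = 1e-3                                   # small step to approximate gradient flow
for _ in range(5000):
    dW, da = grads(W, a)
    W -= eta * dW
    a -= eta * da
print(np.max(np.abs(invariant(W, a) - start)))   # stays small; drift vanishes as eta -> 0
```

The conservation follows from ReLU's positive homogeneity: w_i . dL/dw_i = a_i dL/da_i, so the time derivative of ||w_i||^2 - a_i^2 along gradient flow is zero, which pins the trajectory to the quadric determined by the initialization.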