weight constraint
GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM
Ahn, Kyeongjin, Han, Sungwon, Lee, Seungeon, Ahn, Donghyun, Kim, Hyoshin, Kim, Jungwon, Kim, Jihee, Park, Sangyoon, Cha, Meeyoung
Socio-economic indicators, such as regional GDP, population, and education levels, are crucial to shaping policy decisions and fostering sustainable development. This research introduces GeoReg, a regression model that integrates diverse data sources, including satellite imagery and web-based geospatial information, to estimate these indicators even for data-scarce regions such as developing countries. Our approach leverages the prior knowledge of a large language model (LLM) to address the scarcity of labeled data, with the LLM functioning as a data engineer by extracting informative features to enable effective estimation in few-shot settings. Specifically, our model obtains contextual relationships between data features and the target indicator, categorizing their correlations as positive, negative, mixed, or irrelevant. These features are then fed into a linear estimator with tailored weight constraints for each category. To capture nonlinear patterns, the model also identifies meaningful feature interactions and integrates them, along with nonlinear transformations. Experiments across three countries at different stages of development demonstrate that our model outperforms baselines in estimating socio-economic indicators, even for low-income countries with limited data availability.
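The idea of a linear estimator with per-category weight constraints can be illustrated with a minimal sketch. This is not the GeoReg implementation: the mapping from category to bound (positive means weight >= 0, negative means weight <= 0, mixed means unconstrained, irrelevant means weight fixed at 0), the function name `fit_constrained`, the synthetic data, and the use of projected gradient descent are all assumptions made for illustration.

```python
import numpy as np

# Assumed mapping from correlation category to (lower, upper) weight bounds.
BOUNDS = {
    "positive": (0.0, np.inf),      # weight must be >= 0
    "negative": (-np.inf, 0.0),     # weight must be <= 0
    "mixed": (-np.inf, np.inf),     # no sign restriction
    "irrelevant": (0.0, 0.0),       # weight pinned to zero
}

def fit_constrained(X, y, categories, steps=5000):
    """Least squares with per-feature bounds, solved by projected gradient descent."""
    lo = np.array([BOUNDS[c][0] for c in categories])
    hi = np.array([BOUNDS[c][1] for c in categories])
    n = X.shape[0]
    # Step size 1/L, where L is the largest eigenvalue of X^T X / n.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n
        # Gradient step followed by projection onto the box constraints.
        w = np.clip(w - grad / lipschitz, lo, hi)
    return w

# Synthetic few-shot example: 20 labeled regions, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
true_w = np.array([1.5, -2.0, 0.5, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=20)

categories = ["positive", "negative", "mixed", "irrelevant"]
w = fit_constrained(X, y, categories)
print(dict(zip(categories, np.round(w, 3))))
```

With few labeled samples, the bounds act as a prior: a feature judged irrelevant cannot absorb noise, and a feature judged positive cannot flip sign.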
RedAHD: Reduction-Based End-to-End Automatic Heuristic Design with Large Language Models
Thach, Nguyen, Riahifar, Aida, Huynh, Nathan, Chan, Hau
Solving NP-hard combinatorial optimization problems (COPs) (e.g., traveling salesman problems (TSPs) and capacitated vehicle routing problems (CVRPs)) in practice traditionally involves handcrafting heuristics or specifying a search space for finding effective heuristics. The main challenges with these approaches, however, are the sheer amount of domain knowledge and implementation effort required from human experts. Recently, significant progress has been made to address these challenges, particularly by using large language models (LLMs) to design heuristics within some predetermined generalized algorithmic framework (GAF, e.g., ant colony optimization and guided local search) for building key functions/components (e.g., a priori information on how promising it is to include each edge in a solution for TSP and CVRP). Although existing methods leveraging this idea have been shown to yield impressive optimization performance, they are not fully end-to-end and still require considerable manual intervention. In this paper, we propose a novel end-to-end framework, named RedAHD, that enables these LLM-based heuristic design methods to operate without the need for GAFs. More specifically, RedAHD employs LLMs to automate the process of reduction, i.e., transforming the COP at hand into similar COPs that are better understood, from which LLM-based heuristic design methods can design effective heuristics for directly solving the transformed COPs and, in turn, indirectly solving the original COP. Our experimental results, evaluated on six COPs, show that RedAHD is capable of designing heuristics with competitive or improved results over the state-of-the-art methods with minimal human involvement.
Advancing Constrained Monotonic Neural Networks: Achieving Universal Approximation Beyond Bounded Activations
Sartor, Davide, Sinigaglia, Alberto, Susto, Gian Antonio
Conventional techniques for imposing monotonicity in MLPs by construction involve the use of non-negative weight constraints and bounded activation functions, which pose well-known optimization challenges. In this work, we generalize previous theoretical results, showing that MLPs with non-negative weight constraint and activations that saturate on alternating sides are universal approximators for monotonic functions. Additionally, we show an equivalence between the saturation side in the activations and the sign of the weight constraint. This connection allows us to prove that MLPs with convex monotone activations and non-positive constrained weights also qualify as universal approximators, in contrast to their non-negative constrained counterparts. Our results provide theoretical grounding to the empirical effectiveness observed in previous works while leading to possible architectural simplification. Moreover, to further alleviate the optimization difficulties, we propose an alternative formulation that allows the network to adjust its activations according to the sign of the weights. This eliminates the requirement for weight reparameterization, easing initialization and improving training stability. Experimental evaluation reinforces the validity of the theoretical results, showing that our novel approach compares favourably to traditional monotonic architectures.
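The conventional construction mentioned at the start of this abstract (non-negative weights combined with a bounded monotone activation) can be sketched as follows. This is only the baseline technique, not the alternative formulation the authors propose; the absolute-value reparameterization, the `tanh` activation, the layer sizes, and the function name `monotone_mlp` are assumptions chosen for illustration.

```python
import numpy as np

def monotone_mlp(x, params):
    """Forward pass of an MLP that is non-decreasing in every input by construction."""
    h = x
    for i, (w_raw, b) in enumerate(params):
        w = np.abs(w_raw)        # reparameterize so effective weights are >= 0
        h = h @ w + b            # biases are left unconstrained
        if i < len(params) - 1:
            h = np.tanh(h)       # bounded, monotone activation on hidden layers
    return h

rng = np.random.default_rng(0)
sizes = [3, 8, 8, 1]
params = [
    (rng.normal(size=(m, n)), rng.normal(size=n))
    for m, n in zip(sizes[:-1], sizes[1:])
]

# Empirical check: increasing any input coordinate never decreases the output.
x = rng.normal(size=(100, 3))
delta = rng.uniform(0.0, 1.0, size=(100, 3))
y_lo = monotone_mlp(x, params)
y_hi = monotone_mlp(x + delta, params)
print(bool(np.all(y_hi >= y_lo - 1e-12)))
```

Each layer is a non-negative combination of non-decreasing functions followed by a non-decreasing activation, so the composition is non-decreasing regardless of the raw parameter values.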
R-CONV: An Analytical Approach for Efficient Data Reconstruction via Convolutional Gradients
Eltaras, Tamer Ahmed, Malluhi, Qutaibah, Savino, Alessandro, Di Carlo, Stefano, Qayyum, Adnan, Qadir, Junaid
In the effort to learn from extensive collections of distributed data, federated learning has emerged as a promising approach for preserving privacy by using a gradient-sharing mechanism instead of exchanging raw data. However, recent studies show that private training data can be leaked through many gradient attacks. While previous analytical attacks have successfully reconstructed input data from fully connected layers, their effectiveness diminishes when applied to convolutional layers. This paper introduces an advanced data leakage method to efficiently exploit convolutional layers' gradients. We present a surprising finding: even with non-fully invertible activation functions, such as ReLU, we can analytically reconstruct training samples from the gradients. To the best of our knowledge, this is the first analytical approach that successfully reconstructs convolutional layer inputs directly from the gradients, bypassing the need to reconstruct layers' outputs. Prior research has mainly concentrated on the weight constraints of convolution layers, overlooking the significance of gradient constraints. Our findings demonstrate that existing analytical methods used to estimate the risk of gradient attacks lack accuracy. In some layers, attacks can be launched with less than 5% of the reported constraints.
2 Matching Law
This outcome corresponds to the undermatching phenomenon, which has been observed in behavioral experiments. Our results suggest that when we discuss the learning processes in a decision making network, it may be insufficient to only consider a steady state for individual weight updates, and we should therefore consider the dynamics of the weight distribution and the network architecture. This proceeding is a short version of our original paper [12], with the model modified and new results included. First, let us formulate the matching law. We will consider a case with two alternatives (each denoted as A and B), which has generally been studied in animal experiments.
Complexity and scalability of defeasible reasoning in many-valued weighted knowledge bases with typicality
Alviano, Mario, Giordano, Laura, Dupré, Daniele Theseider
Weighted knowledge bases for description logics with typicality under a "concept-wise" multi-preferential semantics provide a logical interpretation of MultiLayer Perceptrons. In this context, Answer Set Programming (ASP) has been shown to be suitable for addressing defeasible reasoning in the finitely many-valued case, providing a $\Pi^p_2$ upper bound on the complexity of the problem, but leaving the exact complexity unknown and providing only a proof-of-concept implementation. This paper fills this gap by providing a $P^{NP[log]}$-completeness result and new ASP encodings that deal with weighted knowledge bases with large search spaces.
Understanding Training-Data Leakage from Gradients in Neural Networks for Image Classification
Chen, Cangxiong, Campbell, Neill D. F.
In federated learning [6] of deep learning models for supervised tasks such as image classification and segmentation, gradients from each participant are shared either with another participant or are aggregated at a central server. In many applications of federated learning, the privacy of the training data will need to be protected and we want to obtain guarantees that a malicious participant will not be able to fully recover the training data from other participants, given shared gradients and knowledge of the model architecture. The guarantee will be indispensable in removing the barriers to applying federated learning in tasks such as image segmentation in film post-production, where the training data are usually under strict IP protections. In this scenario, it is the training data that needs to be protected, rather than the information we can infer about them. In order to develop protection mechanisms, an appropriate understanding of the source of leakage of the training data is needed. For this work, we are concerned with the following question: for a deep learning model performing image classification, what determines the success of reconstructing the training data given its label, its gradients from training, and the model architecture? We will focus on the case when we aim to reconstruct a single target image with an untrained model. Although our work was inspired by R-GAP [10], our method COPA (combined optimisation attack) provides a more general theoretical framework for training-data reconstruction, particularly for convolutional layers. Compared with DLG [11], COPA provides more insight into the mechanism of training-data leakage through a more informative formulation of the objective function, making the source of constraints clearer.
R-GAP: Recursive Gradient Attack on Privacy
Federated learning frameworks have been regarded as a promising approach to break the dilemma between demands on privacy and the promise of learning from large collections of distributed data. Many such frameworks only ask collaborators to share their local update of a common model, i.e. gradients with respect to locally stored data, instead of exposing their raw data to other collaborators. However, recent optimization-based gradient attacks show that raw data can often be accurately recovered from gradients. It has been shown that minimizing the Euclidean distance between true gradients and those calculated from estimated data is often effective in fully recovering private data. However, there is a fundamental lack of theoretical understanding of how and when gradients can lead to unique recovery of original data. Our research fills this gap by providing a closed-form recursive procedure to recover data from gradients in deep neural networks. We demonstrate that gradient attacks consist of recursively solving a sequence of systems of linear equations. Furthermore, our closed-form approach, which we name Recursive Gradient Attack on Privacy (R-GAP), works as well as or even better than optimization-based approaches at a fraction of the computation. Additionally, we propose a rank analysis method, which can be used to estimate a network architecture's risk of a gradient attack. Experimental results demonstrate the validity of the closed-form attack and rank analysis, while demonstrating its superior computational properties and lack of susceptibility to local optima vis-a-vis optimization-based attacks. Source code is available for download from https://github.com/JunyiZhu-AI/R-GAP.
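To make the "system of linear equations" view concrete, here is a minimal sketch for a single fully connected layer. It is not the R-GAP procedure itself (which recurses through a deep network); it only shows the well-known single-layer case, where the weight gradient is the outer product of the output gradient and the input. The squared-error loss, the layer sizes, and all variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Victim side: one private input passed through a linear layer y = W x + b.
x_true = rng.normal(size=5)
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
target = rng.normal(size=3)

# Assumed loss: L = 0.5 * ||W x + b - target||^2
y = W @ x_true + b
grad_y = y - target                  # dL/dy
grad_W = np.outer(grad_y, x_true)    # dL/dW = (dL/dy) x^T
grad_b = grad_y                      # dL/db = dL/dy

# Attacker side: only grad_W and grad_b are observed.
# Each row i gives the linear equations grad_b[i] * x = grad_W[i, :],
# so x is the least-squares solution of A @ X = grad_W with A = grad_b as a column.
A = grad_b[:, None]
x_rec, *_ = np.linalg.lstsq(A, grad_W, rcond=None)
x_rec = x_rec.ravel()

print(np.allclose(x_rec, x_true))
```

Recovery in this sketch relies on the bias gradient being non-zero and on a single sample per update; with batches or without a bias term the system has fewer usable constraints, which is the kind of question the rank analysis in the abstract addresses.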
How to Train a Progressive Growing GAN in Keras for Synthesizing Faces
Generative adversarial networks, or GANs, are effective at generating high-quality synthetic images. A limitation of GANs is that they are only capable of generating relatively small images, such as 64×64 pixels. The Progressive Growing GAN is an extension to the GAN training procedure that involves training a GAN to generate very small images, such as 4×4, and incrementally increasing the size of the generated images to 8×8, 16×16, and so on until the desired output size is met. This has allowed the progressive GAN to generate photorealistic synthetic faces with 1024×1024 pixel resolution. The key innovation of the progressive growing GAN is the two-phase training procedure that involves the fading-in of new blocks to support higher-resolution images followed by fine-tuning. In this tutorial, you will discover how to implement and train a progressive growing generative adversarial network for generating celebrity faces. GANs are effective at generating crisp synthetic images, although they are typically limited in the size of the images that can be generated. The Progressive Growing GAN is an extension to the GAN that allows the training of generator models capable of generating large high-quality images, such as photorealistic faces with the size 1024×1024 pixels. It was described in the 2017 paper by Tero Karras, et al. from Nvidia titled "Progressive Growing of GANs for Improved Quality, Stability, and Variation."
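The fading-in phase can be illustrated with a small sketch that uses plain NumPy rather than Keras. It assumes the usual weighted-sum formulation, in which the upsampled output of the old low-resolution path is blended with the output of the new higher-resolution block and the blend weight `alpha` grows from 0 to 1 during training. The function names `upsample_nearest` and `fade_in` and the random stand-in images are hypothetical and are not taken from the tutorial's code.

```python
import numpy as np

def upsample_nearest(img, factor=2):
    """Nearest-neighbour upsampling of an (H, W, C) image."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def fade_in(old_low_res, new_high_res, alpha):
    """Blend the upsampled old output with the new block's output."""
    return (1.0 - alpha) * upsample_nearest(old_low_res) + alpha * new_high_res

rng = np.random.default_rng(0)
old_out = rng.uniform(-1, 1, size=(4, 4, 3))   # stand-in for the 4x4 generator output
new_out = rng.uniform(-1, 1, size=(8, 8, 3))   # stand-in for the new 8x8 block output

# alpha ramps linearly from 0 to 1 over the fade-in phase.
blended = [fade_in(old_out, new_out, a) for a in np.linspace(0.0, 1.0, 5)]
print(blended[0].shape)
```

At `alpha = 0` the model behaves exactly like the previous low-resolution network, and at `alpha = 1` only the new block contributes, after which the fine-tuning phase begins.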