joint learning
On Joint Learning for Solving Placement and Routing in Chip Design
For its advantage in GPU acceleration and less dependency on human experts, machine learning has been an emerging tool for solving the placement and routing problems, as two critical steps in modern chip design flow. Being still in its early stage, there are several fundamental issues unresolved: scalability, reward design, and end-to-end learning paradigm etc. To achieve end-to-end placement learning, we first propose a joint learning method for the placement of macros and standard cells, by the integration of reinforcement learning with a gradient based optimization scheme. To further bridge the placement with the subsequent routing task, we also develop a joint learning approach via reinforcement learning. One key design in our (reinforcement) learning paradigm involves a multi-view embedding model to encode both global graph level and local node level information of the input macros. Moreover, the random network distillation is devised to encourage exploration. Experiments on public chip design benchmarks show that our method can effectively learn from experience and also provide high-quality intermediate placement for the post standard cell placement, within few hours for training.
Joint Learning of Label and Environment Causal Independence for Graph Out-of-Distribution Generalization
We tackle the problem of graph out-of-distribution (OOD) generalization. Existing graph OOD algorithms either rely on restricted assumptions or fail to exploit environment information in training data. In this work, we propose to simultaneously incorporate label and environment causal independence (LECI) to fully make use of label and environment information, thereby addressing the challenges faced by prior methods on identifying causal and invariant subgraphs. We further develop an adversarial training strategy to jointly optimize these two properties for casual subgraph discovery with theoretical guarantees. Extensive experiments and analysis show that LECI significantly outperforms prior methods on both synthetic and real-world datasets, establishing LECI as a practical and effective solution for graph OOD generalization.
Expandable and Differentiable Dual Memories with Orthogonal Regularization for Exemplar-free Continual Learning
Moon, Hyung-Jun, Cho, Sung-Bae
Continual learning methods used to force neural networks to process sequential tasks in isolation, preventing them from leveraging useful inter-task relationships and causing them to repeatedly relearn similar features or overly differentiate them. To address this problem, we propose a fully differentiable, exemplar-free expandable method composed of two complementary memories: One learns common features that can be used across all tasks, and the other combines the shared features to learn discriminative characteristics unique to each sample. Both memories are differentiable so that the network can autonomously learn latent representations for each sample. For each task, the memory adjustment module adaptively prunes critical slots and minimally expands capacity to accommodate new concepts, and orthogonal regularization enforces geometric separation between preserved and newly learned memory components to prevent interference. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet show that the proposed method outperforms 14 state-of-the-art methods for class-incremental learning, achieving final accuracies of 55.13\%, 37.24\%, and 30.11\%, respectively. Additional analysis confirms that, through effective integration and utilization of knowledge, the proposed method can increase average performance across sequential tasks, and it produces feature extraction results closest to the upper bound, thus establishing a new milestone in continual learning.
- North America > United States > Hawaii (0.04)
- Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
Supplementary Material: Joint Learning of 2D-3D Weakly Supervised Semantic Segmentation Hyeokjun Kweon KAIST 0327june@kaist.ac.kr Kuk-Jin Y oon KAIST kjyoon@kaist.ac.kr 1 Implementation details
In the first phase, we individually train both 2D and 3D classifiers with the classification loss of each domain. After that, we jointly train them using the proposed 2D-to-3D and 3D-to-2D losses, in addition to the classification loss. Here, in the second phase, please note that we first train our framework without the 3D-to-2D loss for the first few epochs. As we explained in Section 4.1 of the main paper, we augment the images and point clouds before we In this subsection, we provide our augmentation methods in detail. On the other hand, when we use a smaller patch size, fine details of the image could not be preserved.
In-Context Learning for Gradient-Free Receiver Adaptation: Principles, Applications, and Theory
Zecchin, Matteo, Raviv, Tomer, Kalathil, Dileep, Narayanan, Krishna, Shlezinger, Nir, Simeone, Osvaldo
In recent years, deep learning has facilitated the creation of wireless receivers capable of functioning effectively in conditions that challenge traditional model-based designs. Leveraging programmable hardware architectures, deep learning-based receivers offer the potential to dynamically adapt to varying channel environments. However, current adaptation strategies, including joint training, hypernetwork-based methods, and meta-learning, either demonstrate limited flexibility or necessitate explicit optimization through gradient descent. This paper presents gradient-free adaptation techniques rooted in the emerging paradigm of in-context learning (ICL). We review architectural frameworks for ICL based on Transformer models and structured state-space models (SSMs), alongside theoretical insights into how sequence models effectively learn adaptation from contextual information. Further, we explore the application of ICL to cell-free massive MIMO networks, providing both theoretical analyses and empirical evidence. Our findings indicate that ICL represents a principled and efficient approach to real-time receiver adaptation using pilot signals and auxiliary contextual information-without requiring online retraining.
- North America > United States > Texas > Brazos County > College Station (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Joint Learning in the Gaussian Single Index Model
Pillaud-Vivien, Loucas, Schertzer, Adrien
We consider the problem of jointly learning a one-dimensional projection and a univariate function in high-dimensional Gaussian models. Specifically, we study predictors of the form $f(x)=φ^\star(\langle w^\star, x \rangle)$, where both the direction $w^\star \in \mathcal{S}_{d-1}$, the sphere of $\mathbb{R}^d$, and the function $φ^\star: \mathbb{R} \to \mathbb{R}$ are learned from Gaussian data. This setting captures a fundamental non-convex problem at the intersection of representation learning and nonlinear regression. We analyze the gradient flow dynamics of a natural alternating scheme and prove convergence, with a rate controlled by the information exponent reflecting the \textit{Gaussian regularity} of the function $φ^\star$. Strikingly, our analysis shows that convergence still occurs even when the initial direction is negatively correlated with the target. On the practical side, we demonstrate that such joint learning can be effectively implemented using a Reproducing Kernel Hilbert Space (RKHS) adapted to the structure of the problem, enabling efficient and flexible estimation of the univariate function. Our results offer both theoretical insight and practical methodology for learning low-dimensional structure in high-dimensional settings.
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- Europe > Germany > Hesse > Darmstadt Region > Frankfurt (0.04)
- Europe > France (0.04)
ARISE: Iterative Rule Induction and Synthetic Data Generation for Text Classification
M., Yashwanth, Singh, Vaibhav, Maheshwari, Ayush, Krishna, Amrith, Ramakrishnan, Ganesh
We propose ARISE, a framework that iteratively induces rules and generates synthetic data for text classification. We combine synthetic data generation and automatic rule induction, via bootstrapping, to iteratively filter the generated rules and data. We induce rules via inductive generalisation of syntactic n-grams, enabling us to capture a complementary source of supervision. These rules alone lead to performance gains in both, in-context learning (ICL) and fine-tuning (FT) settings. Similarly, use of augmented data from ARISE alone improves the performance for a model, outperforming configurations that rely on complex methods like contrastive learning. Further, our extensive experiments on various datasets covering three full-shot, eight few-shot and seven multilingual variant settings demonstrate that the rules and data we generate lead to performance improvements across these diverse domains and languages.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Singapore (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- (22 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.88)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Joint Learning of Energy-based Models and their Partition Function
Sander, Michael E., Roulet, Vincent, Liu, Tianlin, Blondel, Mathieu
Energy-based models (EBMs) offer a flexible framework for parameterizing probability distributions using neural networks. However, learning EBMs by exact maximum likelihood estimation (MLE) is generally intractable, due to the need to compute the partition function (normalization constant). In this paper, we propose a novel formulation for approximately learning probabilistic EBMs in combinatorially-large discrete spaces, such as sets or permutations. Our key idea is to jointly learn both an energy model and its log-partition, both parameterized as a neural network. Our approach not only provides a novel tractable objective criterion to learn EBMs by stochastic gradient descent (without relying on MCMC), but also a novel means to estimate the log-partition function on unseen data points. On the theoretical side, we show that our approach recovers the optimal MLE solution when optimizing in the space of continuous functions. Furthermore, we show that our approach naturally extends to the broader family of Fenchel-Young losses, allowing us to obtain the first tractable method for optimizing the sparsemax loss in combinatorially-large spaces. We demonstrate our approach on multilabel classification and label ranking.
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.86)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.86)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Review for NeurIPS paper: Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID
Weaknesses: - The main idea of this method is unified contrastive learning. However, the strategy of joint learning of source and target domain is not new although different methods implement with different losses (e.g., in [57,58]). It is also natural that the performance on source domain with joint learning of source and target domains is higher than finetuing with target data only. Besides, the form of non-parametric contrastive learning is widely used in general unsupervised visual representation learning methods (such as MoCo and SimCLR) and is not new in this method. It may meet with the current UDA benchmarks but the generality of this method based on such assumption is limited in those real-world practical application scenarios where no prior knowledge are available on target data. Existing methods which optimize source and target domains separately thus show more advantages in this aspect.
On Joint Learning for Solving Placement and Routing in Chip Design
For its advantage in GPU acceleration and less dependency on human experts, machine learning has been an emerging tool for solving the placement and routing problems, as two critical steps in modern chip design flow. Being still in its early stage, there are several fundamental issues unresolved: scalability, reward design, and end-to-end learning paradigm etc. To achieve end-to-end placement learning, we first propose a joint learning method for the placement of macros and standard cells, by the integration of reinforcement learning with a gradient based optimization scheme. To further bridge the placement with the subsequent routing task, we also develop a joint learning approach via reinforcement learning. One key design in our (reinforcement) learning paradigm involves a multi-view embedding model to encode both global graph level and local node level information of the input macros.