adaptation algorithm
A common assumption in machine learning is that the training set and test set are drawn from the same distribution [25]. However, this assumption often does not hold in practice when models are deployed in the real world [3, 28]. One common type of distribution shift is label shift, where the conditional distribution p(x|y) is fixed but the label distribution p(y) changes over time.
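As a concrete illustration (generic, not tied to any specific paper here), label shift is commonly corrected by importance weighting: since p(x|y) is fixed, reweighting each source example by w(y) = p_target(y) / p_source(y) makes the weighted source risk match the target risk. The class labels and priors below are toy values.

```python
# Minimal sketch of label-shift importance weighting. The target
# prior is assumed known here; in practice it must be estimated.
from collections import Counter

def label_shift_weights(source_labels, target_prior):
    """Per-class importance weights w(y) = p_target(y) / p_source(y)."""
    n = len(source_labels)
    source_prior = {y: c / n for y, c in Counter(source_labels).items()}
    return {y: target_prior[y] / source_prior[y] for y in source_prior}

# Source set is balanced, but at test time class 1 is three times as likely.
weights = label_shift_weights([0, 0, 1, 1], {0: 0.25, 1: 0.75})
print(weights)  # {0: 0.5, 1: 1.5}
```

Training then proceeds on the weighted source loss, which is unbiased for the target risk under the label-shift assumption.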
Analog In-Memory Computing Attention Mechanism for Fast and Energy-Efficient Large Language Models
Leroux, Nathan, Manea, Paul-Philipp, Sudarshan, Chirag, Finkbeiner, Jan, Siegel, Sebastian, Strachan, John Paul, Neftci, Emre
Transformer networks, driven by self-attention, are central to Large Language Models. In generative Transformers, self-attention uses cache memory to store token projections, avoiding recomputation at each time step. However, GPU-stored projections must be loaded into SRAM for each new generation step, causing latency and energy bottlenecks. We present a custom self-attention in-memory computing architecture based on emerging charge-based memories called gain cells, which can be efficiently written to store new tokens during sequence generation and enable the parallel analog dot-product computation required for self-attention. However, the analog gain-cell circuits introduce non-idealities and constraints that prevent the direct mapping of pre-trained models. To circumvent this problem, we design an initialization algorithm achieving text processing performance comparable to GPT-2 without training from scratch. Our architecture reduces attention latency and energy consumption by up to two and five orders of magnitude, respectively, compared to GPUs, marking a significant step toward ultra-fast, low-power generative Transformers.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
- Asia (0.04)
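For context, the cache-bound computation the abstract above describes can be sketched in plain Python. This is a generic single-head attention step over a key/value cache (not the paper's analog gain-cell circuit): at every generation step the entire cache must be read to form the dot products, which is the memory traffic an in-memory architecture avoids.

```python
# Generic sketch: one decoding step of cached self-attention.
import math

def attend(query, k_cache, v_cache):
    """Attention over all cached tokens for one new query vector."""
    scale = 1.0 / math.sqrt(len(query))
    # Dot product of the query against every cached key.
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in k_cache]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Weighted sum of cached values.
    return [sum(p * v[i] for p, v in zip(probs, v_cache))
            for i in range(len(v_cache[0]))]

k_cache, v_cache = [[1.0, 0.0]], [[2.0, 3.0]]
print(attend([1.0, 0.0], k_cache, v_cache))  # [2.0, 3.0] (one cached token)
```

Both loops over the cache grow linearly with sequence length, so cache reads dominate per-token cost in long generations.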
Grasping Force Control and Adaptation for a Cable-Driven Robotic Hand
Mountain, Eric, Weise, Ean, Tian, Sibo, Li, Beiwen, Liang, Xiao, Zheng, Minghui
This paper introduces a unique force control and adaptation algorithm for a lightweight, low-complexity five-fingered robotic hand, the Integrated-Finger Robotic Hand (IFRH). The algorithm is intuitive to design, easy to implement, and automatically improves grasping functionality through feedforward adaptation. Specifically, we extend Youla parameterization, traditionally used in feedback controller design, into a feedforward iterative learning control (ILC) algorithm. The uniqueness of this extension is that both the feedback and feedforward controllers are parameterized over one unified design parameter, which can be easily customized based on the desired closed-loop performance. While Youla parameterization and ILC have each been explored in various applications, our parameterization and computational methods make the design intuitive and easy to implement. This provides both robust and adaptive learning capabilities in an application that rivals the complexity of many robotic hand control systems. Extensive experimental tests have been conducted to validate the effectiveness of our method.
- North America > United States > Texas > Brazos County > College Station (0.14)
- North America > United States > New York > Nassau County > Mineola (0.04)
- North America > United States > Iowa > Story County > Ames (0.04)
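To make the ILC idea above concrete, here is a deliberately simplified sketch of a plain first-order ILC update (not the paper's Youla-parameterized design): after each grasp trial, the feedforward command is corrected by a learning gain times the tracking error, so repeated trials drive the error toward zero. The plant is a toy constant-offset model.

```python
# Illustrative first-order ILC: u_{k+1} = u_k + gain * e_k.
def ilc_update(u, error, gain=0.5):
    """Correct the feedforward command sample-by-sample."""
    return [ui + gain * ei for ui, ei in zip(u, error)]

def run_trials(plant_offset, reference, trials=20):
    u = [0.0] * len(reference)
    for _ in range(trials):
        output = [ui + plant_offset for ui in u]           # toy plant model
        error = [r - y for r, y in zip(reference, output)]  # tracking error
        u = ilc_update(u, error)
    return u

u = run_trials(plant_offset=0.2, reference=[1.0, 1.0, 1.0])
print([round(ui, 3) for ui in u])  # [0.8, 0.8, 0.8] -- converged feedforward
```

With gain 0.5 the command contracts toward the fixed point u = reference - offset at rate 0.5 per trial, illustrating why ILC convergence hinges on the learning gain.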
CoverLib: Classifiers-equipped Experience Library by Iterative Problem Distribution Coverage Maximization for Domain-tuned Motion Planning
Ishida, Hirokazu, Hiraoka, Naoki, Okada, Kei, Inaba, Masayuki
Abstract--Library-based methods are known to be very effective for fast motion planning by adapting an experience retrieved from a precomputed library. This article presents CoverLib, a principled approach for constructing and utilizing such a library. CoverLib iteratively adds an experience-classifier pair to the library, where each classifier corresponds to an adaptable region of the experience within the problem space. This iterative process is an active procedure, as it selects the next experience based on its ability to effectively cover the uncovered region. During the query phase, these classifiers are utilized to select an experience that is expected to be adaptable for a given problem. Experimental results demonstrate that CoverLib effectively mitigates the trade-off between plannability and speed observed in global and local methods. As a result, it achieves both fast planning and high success rates over the problem domain.
Motion planning has been studied from two ends of the spectrum: global and local methods. Global methods, such as sampling-based motion planners (SBMP) like Probabilistic Roadmap (PRM) [1] and Rapidly-exploring Random Tree (RRT) [2], are expected to find a solution if one exists, given enough time. However, these methods often require long and varying amounts of computational time. Similarly, in home service robotics, although the tasks are diverse, the tasks that act as bottlenecks are often known in advance (e.g., reaching into a narrow container). A promising approach to this end is to use a library of experiences [5]-[10], reviewed in Section II-A.
Co-Training for Domain Adaptation
Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor.
- Europe > Czechia > Prague (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Missouri > St. Louis County > St. Louis (0.04)
- (2 more...)
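The confidence-driven growth step that CODA-style self-training relies on can be sketched as follows. This is a hedged, generic sketch: at each round, the unlabeled target instances on which the current predictor is most confident are moved into the training set with their predicted labels. The predictor here is a hypothetical black box, not CODA's joint optimization.

```python
# Generic confident-instance selection for a self-training round.
def select_confident(predict_proba, unlabeled, threshold=0.9):
    """Partition unlabeled data into (pseudo-labeled, still-unlabeled)."""
    pseudo, rest = [], []
    for x in unlabeled:
        probs = predict_proba(x)
        label = max(range(len(probs)), key=probs.__getitem__)
        (pseudo if probs[label] >= threshold else rest).append((x, label))
    return pseudo, [x for x, _ in rest]

# Toy predictor: confident "1" for inputs above 1, unsure otherwise.
toy = lambda x: (0.05, 0.95) if x > 1 else (0.5, 0.5)
pseudo, rest = select_confident(toy, [2.0, 0.1, 3.0])
print(pseudo, rest)  # [(2.0, 1), (3.0, 1)] [0.1]
```

Retraining on the union of labeled and pseudo-labeled data, then repeating, gives the slow source-to-target bridging the abstract describes.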
Differentially Private Domain Adaptation with Theoretical Guarantees
Bassily, Raef, Cortes, Corinna, Mao, Anqi, Mohri, Mehryar
In many applications, the labeled data at the learner's disposal is subject to privacy constraints and is relatively limited. To derive a more accurate predictor for the target domain, it is often beneficial to leverage publicly available labeled data from an alternative domain, somewhat close to the target domain. This is the modern problem of supervised domain adaptation from a public source to a private target domain. We present two $(\epsilon, \delta)$-differentially private adaptation algorithms for supervised adaptation, for which we make use of a general optimization problem, recently shown to benefit from favorable theoretical learning guarantees. Our first algorithm is designed for regression with linear predictors and shown to solve a convex optimization problem. Our second algorithm is a more general solution for loss functions that may be non-convex but Lipschitz and smooth. While our main objective is a theoretical analysis, we also report the results of several experiments first demonstrating that the non-private versions of our algorithms outperform adaptation baselines and next showing that, for larger values of the target sample size or $\epsilon$, the performance of our private algorithms remains close to that of the non-private formulation.
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Ohio (0.04)
- (4 more...)
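For readers unfamiliar with the privacy model above, here is a minimal sketch of the Gaussian mechanism that (epsilon, delta)-differentially private training procedures build on. The sensitivity, epsilon, and delta values are illustrative, and this is not the paper's adaptation algorithm.

```python
# Gaussian mechanism: release a value plus calibrated Gaussian noise.
import math
import random

def gaussian_mechanism(value, sensitivity, epsilon, delta, rng):
    """Standard calibration: sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon."""
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return value + rng.gauss(0.0, sigma)

# Privatize a statistic with per-example sensitivity 0.01.
rng = random.Random(0)
private_mean = gaussian_mechanism(0.42, sensitivity=0.01,
                                  epsilon=1.0, delta=1e-5, rng=rng)
```

The smaller epsilon is, the larger sigma becomes, which is why the abstract notes that private performance approaches the non-private formulation only for larger epsilon or sample sizes.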
A Closer Look at Few-shot Classification Again
Luo, Xu, Wu, Hao, Zhang, Ji, Gao, Lianli, Xu, Jing, Song, Jingkuan
Few-shot classification consists of a training phase, where a model is learned on a relatively large dataset, and an adaptation phase, where the learned model is adapted to previously unseen tasks with limited labeled samples. In this paper, we empirically show that the training algorithm and the adaptation algorithm can be completely disentangled, which allows algorithm analysis and design to be done individually for each phase. Our meta-analysis for each phase reveals several interesting insights that may help better understand key aspects of few-shot classification and connections with other fields such as visual representation learning and transfer learning. We hope the insights and research challenges revealed in this paper can inspire future work in related directions. Code and pre-trained models (in PyTorch) are available at https://github.com/Frankluox/CloserLookAgainFewShot.
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
- Asia > China > Heilongjiang Province > Harbin (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
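The disentangled view above lets one plug in a very simple adaptation algorithm on top of frozen features from the training phase. As an illustration (a common baseline, not necessarily the paper's recommended choice), a nearest-centroid classifier adapts to a new task from a handful of labeled support examples; the features below are toy 2-D vectors.

```python
# Nearest-centroid adaptation on frozen features.
def centroids(support):
    """Mean feature vector per class from labeled support examples."""
    sums, counts = {}, {}
    for feat, label in support:
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, v in enumerate(feat):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def classify(feat, cents):
    """Assign the class whose centroid is nearest in squared distance."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(cents, key=lambda y: dist(feat, cents[y]))

support = [([0.0, 0.0], "a"), ([0.2, 0.0], "a"), ([1.0, 1.0], "b")]
cents = centroids(support)
print(classify([0.9, 0.8], cents))  # b
```

Because adaptation touches only the classifier head, any training-phase representation can be swapped in underneath, which is exactly what the disentanglement argument enables.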
Sequential Adaptation of Radial Basis Function Neural Networks and its Application to Time-series Prediction
We develop a sequential adaptation algorithm for radial basis function (RBF) neural networks of Gaussian nodes, based on the method of successive F-Projections. This method makes use of each observation efficiently in that the network mapping function so obtained is consistent with that information and is also optimal in the least L2-norm sense. The RBF network with the F-Projections adaptation algorithm was used for predicting a chaotic time-series. We compare its performance to an adaptation scheme based on the method of stochastic approximation, and show that the F-Projections algorithm converges to the underlying model much faster.
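To ground the comparison above, here is a sketch of a Gaussian RBF network together with the stochastic-approximation (LMS-style) baseline the abstract compares against; it is not the F-Projections procedure itself, and all parameters are illustrative.

```python
# Gaussian RBF network with a sequential LMS-style weight update.
import math

def rbf_predict(x, centers, widths, weights):
    """f(x) = sum_j w_j * exp(-(x - c_j)^2 / (2 s_j^2))."""
    return sum(w * math.exp(-(x - c) ** 2 / (2 * s * s))
               for c, s, w in zip(centers, widths, weights))

def lms_step(x, target, centers, widths, weights, lr=0.1):
    """One stochastic-approximation update: w_j += lr * error * phi_j(x)."""
    err = target - rbf_predict(x, centers, widths, weights)
    return [w + lr * err * math.exp(-(x - c) ** 2 / (2 * s * s))
            for c, s, w in zip(centers, widths, weights)]

centers, widths, weights = [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]
for _ in range(200):  # many passes over one observation (x=0.5, y=1.0)
    weights = lms_step(0.5, 1.0, centers, widths, weights)
print(round(rbf_predict(0.5, centers, widths, weights), 2))  # 1.0
```

The slow geometric convergence visible here (hundreds of updates for one observation) is the behavior the F-Projections method is reported to improve on, by making each new observation exactly consistent with the network mapping.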
TeST: Test-time Self-Training under Distribution Shift
Sinha, Samarth, Gehler, Peter, Locatello, Francesco, Schiele, Bernt
Despite their recent success, deep neural networks continue to perform poorly when they encounter distribution shifts at test time. Many recently proposed approaches try to counter this by aligning the model to the new distribution prior to inference. With no labels available, this requires unsupervised objectives to adapt the model on the observed test data. In this paper, we propose Test-Time Self-Training (TeST): a technique that takes as input a model trained on some source data and a novel data distribution at test time, and learns invariant and robust representations using a student-teacher framework. We find that models adapted using TeST significantly improve over baseline test-time adaptation algorithms. TeST achieves performance competitive with modern domain adaptation algorithms while accessing 5-10x less data at adaptation time. We thoroughly evaluate a variety of baselines on two tasks, object detection and image segmentation, and find that TeST sets a new state of the art for test-time domain adaptation algorithms.
- North America > Canada > Ontario > Toronto (0.14)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Asia > Middle East > Jordan (0.04)
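A common ingredient of student-teacher test-time adaptation schemes like the one described above is an exponential-moving-average (EMA) teacher that supplies stable targets for the unlabeled test data. The sketch below shows only this generic EMA update with toy scalar parameters; it is not TeST's full objective.

```python
# EMA teacher update: teacher <- m * teacher + (1 - m) * student.
def ema_update(teacher, student, momentum=0.99):
    """Blend the student's current weights into the teacher."""
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher, student)]

teacher, student = [0.0, 0.0], [1.0, 2.0]
for _ in range(3):
    teacher = ema_update(teacher, student)
print([round(t, 4) for t in teacher])  # [0.0297, 0.0594]
```

The high momentum keeps the teacher changing slowly, so its pseudo-targets remain consistent even while the student adapts aggressively to the shifted test distribution.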