Supplementary Document
The pseudo-code for plugging our method into vanilla BO is summarised in Algorithm 1; in the same plug-in manner, our method is applicable to any other variant of BO. In this section, we present the proofs associated with the theoretical assertions from Section 2.

Lemma 1. Assume the GP employs a stationary kernel […]

Lemma 2. Given Lemma 1, determining […]

Proposition 2. Leveraging Lemma 2, suppose […]

Lemma 3. As per Srinivas et al., the optimization process in BO can be conceptualized as a sampling process satisfying

Pr[ |f(x) − µ(x)| > ωσ(x) ] ≤ δ,  (24)

where δ > 0 signifies the confidence level adhered to by the UCB. This lemma is taken directly from Srinivas et al.; the proof can be found therein.

Theorem 1. Leveraging Corollary 1, when employing the termination method proposed in this paper, […]

As discussed in Remark 2 of Section 2.2 in the main manuscript, we suggest initializing L-BFGS […]

Figure caption: different subplots are (a) our proposed method, (b) Naïve method, (c) Nguyen's method, (d) Lorenz's […]
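Since only fragments of Algorithm 1 survive above, the plug-in structure it describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual algorithm: the one-dimensional GP, the RBF length-scale, the UCB weight ω, and the regret-gap stopping tolerance are all assumptions made for the sketch.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # Stationary RBF kernel (cf. the stationarity assumption in Lemma 1).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Standard GP posterior mean and standard deviation on test points Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y
    var = np.clip(1.0 - np.sum(Ks * sol, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def bo_with_termination(f, bounds, n_init=3, budget=30, omega=2.0, tol=1e-2, seed=0):
    # Vanilla BO loop with a plug-in termination check: stop once the
    # UCB-based optimistic value barely exceeds the best observation.
    rng = np.random.default_rng(seed)
    X = rng.uniform(*bounds, size=n_init)
    y = f(X)
    grid = np.linspace(*bounds, 256)
    for _ in range(budget):
        mu, sigma = gp_posterior(X, y, grid)
        ucb = mu + omega * sigma
        if ucb.max() - y.max() < tol:  # plug-in termination criterion
            break
        x_next = grid[np.argmax(ucb)]
        X = np.append(X, x_next)
        y = np.append(y, f(np.array([x_next])))
    return X, y
```

The termination check is the only addition to the vanilla loop, which is what makes the method applicable to other BO variants as well.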
Fungi help turn old mattresses into insulation
Every day, 50,000 mattresses are tossed in the trash in the United States. A relative of penicillin could be the cure. Shopping for a new mattress can be stressful: this is something you plan to sleep on for years to come, after all. But your old one can be its own problem for the environment.
Can we battle the downsides of a rule-based world, asks a new book
Imposing order on the world is seductive, but it flattens out the diversity and rich messiness of human life. Oddly, playing by the rules may help us fight back, argues C. Thi Nguyen in The Score. This time last year, I wrote an article for New Scientist about the perfect way to cook the classic pasta dish cacio e pepe, according to physicists. The meal's smooth, glossy emulsion of black pepper, pecorino cheese and water is hard to make lump-free. Ivan Di Terlizzi at the Max Planck Institute for the Physics of Complex Systems in Germany and his colleagues cooked cacio e pepe hundreds of times until they produced an exacting and foolproof method. The story proved popular with readers.
Mixture of Experts Meets Prompt-Based Continual Learning
Exploiting the power of pre-trained models, prompt-based approaches stand out compared to other continual learning solutions in effectively preventing catastrophic forgetting, even with very few learnable parameters and without the need for a memory buffer. While existing prompt-based continual learning methods excel in leveraging prompts for state-of-the-art performance, they often lack a theoretical explanation for the effectiveness of prompting. This paper conducts a theoretical analysis to unravel how prompts bestow such advantages in continual learning, thus offering a new perspective on prompt design. We first show that the attention block of pre-trained models like Vision Transformers inherently encodes a special mixture of experts architecture, characterized by linear experts and quadratic gating score functions. This realization drives us to provide a novel view on prefix tuning, reframing it as the addition of new task-specific experts, thereby inspiring the design of a novel gating mechanism termed Non-linear Residual Gates (NoRGa). Through the incorporation of non-linear activation and residual connection, NoRGa enhances continual learning performance while preserving parameter efficiency. The effectiveness of NoRGa is substantiated both theoretically and empirically across diverse benchmarks and pretraining paradigms.
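The prefix-as-expert view in the abstract above can be illustrated with a toy single-query attention step. This is a hypothetical sketch, not the paper's NoRGa implementation: the exact gate used here (a tanh activation on the prefix scores plus a residual of the raw score) and all array names are assumptions chosen only to show the "non-linear activation plus residual connection" idea.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def prefix_attention(q, K, V, Kp, Vp, use_norga=True):
    # Scores over the pre-trained keys and the prefix ("new expert") keys.
    s_pre = q @ K.T
    s_new = q @ Kp.T
    if use_norga:
        # Hypothetical NoRGa-style gate: non-linear activation with a
        # residual connection on the prefix-expert scores.
        s_new = np.tanh(s_new) + s_new
    w = softmax(np.concatenate([s_pre, s_new]))
    return w @ np.concatenate([V, Vp], axis=0)
```

Because the gate only reshapes the prefix scores before the shared softmax, the pre-trained weights stay frozen and the parameter count matches plain prefix tuning.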
Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals
Transformers have achieved remarkable success in a wide range of natural language processing and computer vision applications. However, the representation capacity of a deep transformer model is degraded due to the over-smoothing issue in which the token representations become identical when the model's depth grows. In this work, we show that self-attention layers in transformers minimize a functional which promotes smoothness, thereby causing token uniformity. We then propose a novel regularizer that penalizes the norm of the difference between the smooth output tokens from self-attention and the input tokens to preserve the fidelity of the tokens. Minimizing the resulting regularized energy functional, we derive the Neural Transformer with a Regularized Nonlocal Functional (NeuTRENO), a novel class of transformer models that can mitigate the over-smoothing issue. We empirically demonstrate the advantages of NeuTRENO over the baseline transformers and state-of-the-art methods in reducing the over-smoothing of token representations on various practical tasks, including object classification, image segmentation, and language modeling.
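The fidelity idea above can be sketched in a few lines. As a simplification of the paper's derivation, this sketch assumes the regularized functional yields an update that interpolates between the smoothed attention output and the input value tokens; the weight `lam` and all names are assumptions for illustration.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # Plain softmax self-attention, which smooths (averages) the tokens.
    d = Wq.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    S = Q @ K.T / np.sqrt(d)
    A = np.exp(S - S.max(axis=1, keepdims=True))  # row-stable softmax
    A /= A.sum(axis=1, keepdims=True)
    return A @ V

def neutreno_attention(X, Wq, Wk, Wv, lam=0.5):
    # Hypothetical fidelity term: pull the smoothed tokens back toward
    # the input value tokens to counteract over-smoothing.
    out = self_attention(X, Wq, Wk, Wv)
    V = X @ Wv
    return out + lam * (V - out)
```

At `lam = 0` this reduces to plain self-attention, and at `lam = 1` it returns the input value tokens untouched; intermediate values trade smoothing against fidelity.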
IBA: Towards Irreversible Backdoor Attacks in Federated Learning
Federated learning (FL) is a distributed learning approach that enables machine learning models to be trained on decentralized data without compromising end devices' personal, potentially sensitive data. However, its distributed nature and unvetted client data introduce new security vulnerabilities, including backdoor attacks. In this scenario, an adversary implants backdoor functionality into the global model during training, which can be activated to cause the desired misbehavior for any input carrying a specific adversarial pattern. Despite remarkable success in triggering and distorting model behavior, prior backdoor attacks in FL often rest on impractical assumptions and offer limited imperceptibility and durability. Specifically, the adversary needs to control a sufficiently large fraction of clients or know the data distribution of other honest clients.
CYCLO: Cyclic Graph Transformer Approach to Multi-Object Relationship Modeling in Aerial Videos
Video scene graph generation (VidSGG) has emerged as a transformative approach to capturing and interpreting the intricate relationships among objects and their temporal dynamics in video sequences. In this paper, we introduce the new AeroEye dataset that focuses on multi-object relationship modeling in aerial videos. Our AeroEye dataset features various drone scenes and includes a visually comprehensive and precise collection of predicates that capture the intricate relationships and spatial arrangements among objects. On this dataset, we propose the novel Cyclic Graph Transformer (CYCLO) approach, which captures both direct and long-range temporal dependencies by continuously updating the history of interactions in a circular manner. The approach also handles sequences with inherent cyclical patterns and processes object relationships in the correct sequential order, so it can effectively capture periodic and overlapping relationships while minimizing information loss. Extensive experiments on the AeroEye dataset demonstrate the effectiveness of the proposed CYCLO model, showing its potential for scene understanding on drone videos. Finally, CYCLO consistently achieves state-of-the-art (SOTA) results on two in-the-wild scene graph generation benchmarks, PVSG and ASPIRe.
Designing Robust Transformers using Robust Kernel Density Estimation
Transformer-based architectures have recently exhibited remarkable successes across different domains beyond just powering large language models. However, existing approaches typically focus on predictive accuracy and computational cost, largely ignoring certain other practical issues such as robustness to contaminated samples. In this paper, by re-interpreting the self-attention mechanism as a non-parametric kernel density estimator, we adapt classical robust kernel density estimation methods to develop novel classes of transformers that are resistant to adversarial attacks and data contamination. We first propose methods that down-weight outliers in RKHS when computing the self-attention operations. We empirically show that these methods produce improved performance over existing state-of-the-art methods, particularly on image data under adversarial attacks. Then we leverage the median-of-means principle to obtain another efficient approach that results in noticeably enhanced performance and robustness on language modeling and time series classification tasks. Our methods can be combined with existing transformers to augment their robust properties, thus promising to impact a wide variety of applications.
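The median-of-means component mentioned above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the random blocking of keys/values and all parameter names are assumptions, and the point is only that an elementwise median over block-wise attention outputs limits the influence of a few contaminated tokens.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mom_attention(Q, K, V, n_blocks=3, rng=None):
    # Median-of-means sketch: attend within random key/value blocks,
    # then take the elementwise median over the block outputs, so a few
    # contaminated tokens cannot dominate the estimate.
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(K))
    outs = []
    for block in np.array_split(idx, n_blocks):
        w = softmax(Q @ K[block].T / np.sqrt(Q.shape[1]))
        outs.append(w @ V[block])
    return np.median(np.stack(outs), axis=0)
```

A single corrupted value token lands in exactly one block, so the median over blocks discards its effect, whereas plain softmax attention mixes it into every output.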
OpenAssistant Conversations - Democratizing Large Language Model Alignment
Aligning large language models (LLMs) with human preferences has proven to drastically improve usability and has driven rapid adoption, as demonstrated by ChatGPT. Alignment techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) greatly reduce the required skill and domain knowledge to effectively harness the capabilities of LLMs, increasing their accessibility and utility across various domains. However, state-of-the-art alignment techniques like RLHF rely on high-quality human feedback data, which is expensive to create and often remains proprietary. In an effort to democratize research on large-scale alignment, we release OpenAssistant Conversations, a human-generated, human-annotated assistant-style conversation corpus consisting of 161,443 messages in 35 different languages, annotated with 461,292 quality ratings, resulting in over 10,000 complete and fully annotated conversation trees. The corpus is a product of a worldwide crowd-sourcing effort involving over 13,500 volunteers. Models trained on OpenAssistant Conversations show consistent improvements on standard benchmarks over respective base models. We release our code and data under a fully permissive licence.