AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
Motivated by the transformative capabilities of large language models (LLMs) across various natural language tasks, there has been a growing demand to deploy these models effectively across diverse real-world applications and platforms. However, the challenge of efficiently deploying LLMs has become increasingly pronounced due to the varying application-specific performance requirements and the rapid evolution of computational platforms, which feature diverse resource constraints and deployment flows. These varying requirements necessitate LLMs that can adapt their structures (depth and width) for optimal efficiency across different platforms and application specifications. To address this critical gap, we propose AmoebaLLM, a novel framework designed to enable the instant derivation of LLM subnets of arbitrary shapes, which achieve the accuracy-efficiency frontier and can be extracted immediately after a one-time fine-tuning. In this way, AmoebaLLM significantly facilitates rapid deployment tailored to various platforms and applications. Specifically, AmoebaLLM integrates three innovative components: (1) a knowledge-preserving subnet selection strategy that features a dynamic-programming approach for depth shrinking and an importance-driven method for width shrinking; (2) a shape-aware mixture of LoRAs to mitigate gradient conflicts among subnets during fine-tuning; and (3) an in-place distillation scheme with loss-magnitude balancing as the fine-tuning objective. Extensive experiments validate that AmoebaLLM not only sets new standards in LLM adaptability but also successfully delivers subnets that achieve state-of-the-art trade-offs between accuracy and efficiency.
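Since the subnet-selection step is described only at a high level, the following is a minimal sketch of one way a dynamic-programming depth-shrinking step can be formulated (an illustrative formulation, not necessarily AmoebaLLM's exact recipe): the layers plus virtual input/output nodes form a chain, cost[i][j] is an assumed calibration-derived penalty for wiring node i directly to node j, and the DP keeps the layer subset of a given size that minimizes total penalty.

```python
from typing import List, Tuple

def select_layers(cost: List[List[float]], keep: int) -> Tuple[float, List[int]]:
    """cost[i][j]: mismatch penalty of wiring node i directly to node j (i < j).
    Node 0 is a virtual input, node n+1 a virtual output, nodes 1..n the layers.
    Returns (minimum total cost, sorted indices of the `keep` retained layers)."""
    n = len(cost) - 2                              # number of real layers
    INF = float("inf")
    # dp[j][k]: best cost of reaching node j having kept k real layers so far
    dp = [[INF] * (keep + 1) for _ in range(n + 2)]
    parent = [[-1] * (keep + 1) for _ in range(n + 2)]
    dp[0][0] = 0.0
    for j in range(1, n + 2):
        for k in range(keep + 1):
            kk = k if j == n + 1 else k - 1        # the output node is not a kept layer
            if kk < 0:
                continue
            for i in range(j):
                cand = dp[i][kk] + cost[i][j]
                if cand < dp[j][k]:
                    dp[j][k], parent[j][k] = cand, i
    # Backtrack the retained layer indices from the output node.
    kept, j, k = [], n + 1, keep
    while j > 0:
        i = parent[j][k]
        if j != n + 1:
            kept.append(j)
            k -= 1
        j = i
    return dp[n + 1][keep], sorted(kept)

# toy usage: 4 layers, keep 2; the penalty grows with the length of the skipped span
C = [[float(j - i - 1) for j in range(6)] for i in range(6)]
print(select_layers(C, keep=2))   # -> (2.0, [1, 2])
```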
Don't plug these 7 appliances (including AC units) into extension cords - here's why
Extension cords are generally a safe solution for running power to electronics that are too far from the nearest wall outlet. But the operative word here is "electronics," which is not as all-encompassing as some people might think. Appliances (like refrigerators and toaster ovens) are obviously electronic devices, but they're in a different class from most electronics because of the amperage they draw to function. Extension cords are manufactured with a maximum current capacity, which is determined by the size, or gauge, of the wire used in the cord. For instance, a 16-gauge extension cord can handle a maximum of 13 amps, while a 14-gauge cord can handle up to 15 amps (or 1,800 watts), the same as a standard wall outlet in the US.
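As a quick sanity check on those numbers, here is a small illustrative snippet (assuming the US figures quoted above: 120 V mains, roughly 13 A for 16-gauge and 15 A for 14-gauge; always defer to the rating printed on your cord):

```python
# Rough gauge -> max amps figures from the article; P = I * V gives the wattage ceiling.
CORD_MAX_AMPS = {16: 13, 14: 15}

def cord_can_handle(gauge: int, appliance_watts: float, volts: float = 120.0) -> bool:
    max_watts = CORD_MAX_AMPS[gauge] * volts   # e.g. 14-gauge: 15 A * 120 V = 1,800 W
    return appliance_watts <= max_watts

print(cord_can_handle(16, 1800))   # a 1,800 W window AC on a 16-gauge cord -> False
```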
TrajCLIP: Pedestrian Trajectory Prediction Method Using Contrastive Learning and Idempotent Networks
The distribution of pedestrian trajectories is highly complex and influenced by the scene, nearby pedestrians, and subjective intentions. This complexity presents challenges for modeling and generalizing trajectory prediction. Previous methods modeled the feature space of future trajectories based on the high-dimensional feature space of historical trajectories, but this approach is suboptimal because it overlooks the similarity between historical and future trajectories. Our proposed method, TrajCLIP, utilizes contrastive learning and idempotent generative networks to address this issue. By pairing historical and future trajectories and applying contrastive learning on the encoded feature space, we enforce same-space consistency constraints. To manage complex distributions, we use idempotent loss and tightness loss to control over-expansion in the latent space. Additionally, we have developed a trajectory interpolation algorithm and synthetic trajectory data to enhance model capacity and improve generalization. Experimental results on public datasets demonstrate that TrajCLIP achieves state-of-the-art performance and excels in scene-to-scene transfer, few-shot transfer, and online learning tasks.
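As a rough illustration of the contrastive pairing idea (not the authors' code), the sketch below encodes histories and futures into a shared space and trains with a CLIP-style InfoNCE loss so that each history is closest to its own future; the encoder architecture and temperature are placeholder choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajEncoder(nn.Module):
    """Toy encoder mapping an (x, y) trajectory to a unit-norm embedding."""
    def __init__(self, steps: int, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(steps * 2, dim),
                                 nn.ReLU(), nn.Linear(dim, dim))
    def forward(self, traj):                 # traj: (B, steps, 2)
        return F.normalize(self.net(traj), dim=-1)

def contrastive_loss(h_emb, f_emb, temperature: float = 0.1):
    # Matched history/future pairs sit on the diagonal of the similarity matrix.
    logits = h_emb @ f_emb.t() / temperature
    targets = torch.arange(h_emb.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# toy usage: 8 trajectories with 8 observed and 12 future steps
hist, fut = torch.randn(8, 8, 2), torch.randn(8, 12, 2)
h_enc, f_enc = TrajEncoder(steps=8), TrajEncoder(steps=12)
loss = contrastive_loss(h_enc(hist), f_enc(fut))
loss.backward()
```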
Robust Fine-tuning of Zero-shot Models via Variance Reduction
When fine-tuning zero-shot models like CLIP, our desideratum is for the fine-tuned model to excel on both in-distribution (ID) and out-of-distribution (OOD) data. Recently, ensemble-based models (ESMs) have been shown to offer significant robustness improvements while preserving high ID accuracy. However, our study finds that ESMs do not resolve the ID-OOD trade-off: they achieve peak ID and OOD accuracy at different mixing coefficients. When optimized for OOD accuracy, the ensemble exhibits a noticeable decline in ID accuracy, and vice versa. In contrast, we propose a sample-wise ensembling technique that can simultaneously attain the best ID and OOD accuracy without this trade-off.
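To make "sample-wise ensembling" concrete, here is a minimal sketch in which each input receives its own mixing weight when combining the zero-shot and fine-tuned predictions; the entropy-based weighting rule below is an illustrative assumption, not the paper's exact criterion.

```python
import torch
import torch.nn.functional as F

def samplewise_ensemble(logits_zeroshot, logits_finetuned):
    """logits_*: (B, num_classes). Returns per-sample mixed class probabilities."""
    p_zs = F.softmax(logits_zeroshot, dim=-1)
    p_ft = F.softmax(logits_finetuned, dim=-1)
    # Illustrative rule: lean on the fine-tuned model where it is confident
    # (low predictive entropy) and on the zero-shot model otherwise.
    ent = -(p_ft * p_ft.clamp_min(1e-12).log()).sum(-1, keepdim=True)
    alpha = torch.exp(-ent)                  # per-sample weight in (0, 1]
    return alpha * p_ft + (1 - alpha) * p_zs

# toy usage: a batch of 4 samples over 10 classes
mixed = samplewise_ensemble(torch.randn(4, 10), torch.randn(4, 10))
```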
Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Recent works (e.g., Li and Arora, 2020) suggest that the use of popular normalization schemes (including Batch Normalization) in today's deep learning can move it far from a traditional optimization viewpoint, e.g., the use of exponentially increasing learning rates. The current paper highlights other ways in which the behavior of normalized nets departs from traditional viewpoints, and then initiates a formal framework for studying their mathematics via a suitable adaptation of the conventional framework, namely, modeling the SGD-induced training trajectory via a stochastic differential equation (SDE) with a noise term that captures gradient noise. This yields: (a) a new "intrinsic learning rate" parameter that is the product of the normal learning rate η and the weight-decay factor λ. Analysis of the SDE shows how the effective speed of learning varies and equilibrates over time under the control of the intrinsic LR.
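For concreteness, the intrinsic learning rate and a schematic version of the SDE referred to above can be written as follows (time measured in SGD steps, with Σ the gradient-noise covariance; the exact scaling shown here is a common convention rather than a quote from the paper):

```latex
\[
\lambda_e \;:=\; \eta\,\lambda \quad \text{(intrinsic learning rate)},
\qquad
dX_t \;=\; -\eta\bigl(\nabla L(X_t) + \lambda X_t\bigr)\,dt
          \;+\; \eta\,\Sigma(X_t)^{1/2}\,dW_t .
\]
```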
Author Feedback
We thank the reviewers for their thorough reading. We will fix the typos and clarify the unclear points in the next version of our paper.
- Batch size has been an important component of past analyses. When the nets are without BN, e.g., with LN or GN, the magnitude […]. However, this analysis does not hold in the general case where BN is allowed, and thus we treat batch size as a fixed hyper-parameter.
- The fast equilibrium conjecture only partially explains the benefits of BN. Besides this conjecture, there are many other benefits, e.g., BN affects the […].
- If we make the second phase longer, one should expect the ratio to become closer to 10. Figure 10 gives a clearer […].
- However, this is not observed in any of our settings, so it is not clear to us whether the heavy-tail assumption holds for our setting.
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between text prompts and images remains challenging. We break the problem down into two causes: concept ignorance and concept mismapping. To tackle these two challenges, we propose CoMat, an end-to-end diffusion-model fine-tuning strategy with an image-to-text concept-matching mechanism. First, we introduce a novel image-to-text concept activation module to guide the diffusion model in revisiting ignored concepts. Additionally, an attribute concentration module is proposed to correctly map the text conditions of each entity to its corresponding image area. Extensive experimental evaluations, conducted across three distinct text-to-image alignment benchmarks, demonstrate the superior efficacy of our proposed method, CoMat-SDXL, over the baseline model, SDXL [49]. We also show that our method enhances general condition-utilization capability and generalizes to long and complex prompts despite not being specifically trained on them. The code is available at https://github.com/CaraJ7/CoMat.
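As a toy illustration of the image-to-text concept-matching objective (explicitly not SDXL or CoMat's actual modules), the sketch below fine-tunes a stand-in generator so that a frozen stand-in captioner assigns high likelihood to the original prompt given the generated "image"; all module names and shapes are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, IMG_DIM = 100, 32

class ToyGenerator(nn.Module):                    # stands in for the text-to-image model
    def __init__(self):
        super().__init__()
        self.embed = nn.EmbeddingBag(VOCAB, IMG_DIM)
    def forward(self, prompt_tokens):             # (B, T) token ids -> (B, IMG_DIM) "image"
        return self.embed(prompt_tokens)

class ToyCaptioner(nn.Module):                    # frozen image-to-text scorer
    def __init__(self):
        super().__init__()
        self.head = nn.Linear(IMG_DIM, VOCAB)
    def log_prob(self, image, prompt_tokens):     # mean log p(token | image)
        logits = self.head(image).unsqueeze(1).expand(-1, prompt_tokens.size(1), -1)
        return -F.cross_entropy(logits.reshape(-1, VOCAB), prompt_tokens.reshape(-1))

gen, cap = ToyGenerator(), ToyCaptioner()
for p in cap.parameters():                        # the captioner stays frozen
    p.requires_grad_(False)

prompt = torch.randint(0, VOCAB, (4, 6))          # a batch of prompt token ids
loss = -cap.log_prob(gen(prompt), prompt)         # concept-matching objective
loss.backward()                                   # gradients reach only the generator
```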