Understanding Layer Significance in LLM Alignment
Shi, Guangyuan, Lu, Zexin, Dong, Xiaoyu, Zhang, Wenlong, Zhang, Xuanyu, Feng, Yujie, Wu, Xiao-Ming
Aligning large language models (LLMs) through fine-tuning is essential for tailoring them to specific applications, so understanding what LLMs learn during alignment is crucial. Recent studies suggest that alignment primarily adjusts a model's presentation style rather than its foundational knowledge, indicating that only certain components of the model are significantly affected. To delve deeper, we set out to identify which layers within LLMs are most critical to the alignment process, thereby uncovering how alignment influences model behavior at a granular level. To this end, we propose a novel approach for identifying the important layers for LLM alignment (ILA): it learns a binary mask for each incremental weight matrix in the LoRA algorithm, indicating the significance of each layer. ILA consistently identifies important layers across various alignment datasets, with nearly 90% overlap even between substantially different datasets, highlighting fundamental patterns in LLM alignment. Experimental results show that freezing non-essential layers improves overall model performance, while selectively tuning the most critical layers significantly enhances fine-tuning efficiency with minimal performance loss.
- Asia > China > Hong Kong (0.14)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
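The layer-masking idea described in the abstract above can be sketched compactly. This is a minimal illustrative toy, not the paper's implementation: the sigmoid-threshold gating rule and all names (`apply_layer_masks`, `gate_scores`) are assumptions, and real gates would be learned jointly with the task loss.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def apply_layer_masks(base_weights, lora_deltas, gate_scores, threshold=0.5):
    """Add each layer's LoRA increment to its base weight matrix only when
    the layer's gate passes the threshold -- a hard 0/1 importance mask."""
    masked = []
    for W, dW, score in zip(base_weights, lora_deltas, gate_scores):
        keep = 1.0 if sigmoid(score) >= threshold else 0.0
        masked.append([[w + keep * d for w, d in zip(w_row, d_row)]
                       for w_row, d_row in zip(W, dW)])
    return masked
```

Freezing a non-essential layer then corresponds to a gate score driven negative, so that layer's increment never enters the merged weights.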
Allocation Requires Prediction Only if Inequality Is Low
Shirali, Ali, Abebe, Rediet, Hardt, Moritz
Algorithmic predictions are emerging as a promising solution concept for efficiently allocating societal resources, fueled by an underlying assumption that such systems are necessary to identify individuals for interventions. We propose a principled framework for assessing this assumption: using a simple mathematical model, we evaluate the efficacy of prediction-based allocations in settings where individuals belong to larger units such as hospitals, neighborhoods, or schools. We find that prediction-based allocations outperform baseline methods that use aggregate unit-level statistics only when between-unit inequality is low and the intervention budget is high. Our results hold for a wide range of settings for the price of prediction, treatment-effect heterogeneity, and the learnability of unit-level statistics. Combined, our results highlight the potential limits to improving the efficacy of interventions through prediction.
- North America > United States > New York (0.04)
- North America > United States > Utah (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- (6 more...)
- Government (1.00)
- Health & Medicine > Health Care Providers & Services (0.48)
- Education > Educational Setting (0.45)
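The core comparison in the abstract above, individual-level predictions versus aggregate unit-level statistics, can be illustrated with a small deterministic sketch. The two policies below are simplified assumptions, not the authors' model: the aggregate baseline treats units in order of mean score and credits the unit mean for each treated slot (random within-unit selection in expectation).

```python
def prediction_allocation(units, budget):
    """Treat the `budget` individuals with the highest individual scores,
    ignoring unit boundaries (the prediction-based policy)."""
    everyone = sorted((s for unit in units for s in unit), reverse=True)
    return sum(everyone[:budget])

def aggregate_allocation(units, budget):
    """Spend the budget on the highest-mean units first; within a unit,
    individuals are picked at random, so each slot is credited the unit mean."""
    ranked = sorted(units, key=lambda u: sum(u) / len(u), reverse=True)
    total, left = 0.0, budget
    for unit in ranked:
        take = min(left, len(unit))
        total += take * (sum(unit) / len(unit))
        left -= take
        if left == 0:
            break
    return total
```

When one unit is clearly better than the rest (high between-unit inequality), the aggregate policy nearly matches the prediction-based one; when unit means are close but individuals vary, targeting individuals pays off.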
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
Tu, Shuyuan, Dai, Qi, Wu, Zuxuan, Cheng, Zhi-Qi, Hu, Han, Jiang, Yu-Gang
Contrastive language-image pretraining (CLIP) has demonstrated remarkable success on various image tasks. However, how to extend CLIP with effective temporal modeling is still an open and crucial problem. Existing factorized or joint spatial-temporal modeling trades off efficiency against performance. While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already provides the essential information without temporal attention. To this end, we propose a novel Implicit Learnable Alignment (ILA) method, which minimizes the temporal modeling effort while achieving remarkably high performance. Specifically, for each frame pair, an interactive point is predicted in each frame, serving as a mutual-information-rich region. By enhancing the features around the interactive point, the two frames are implicitly aligned. The aligned features are then pooled into a single token, which is leveraged in the subsequent spatial self-attention. Our method eliminates the costly, and often insufficient, temporal self-attention in video. Extensive experiments on benchmarks demonstrate the superiority and generality of our module. In particular, the proposed ILA achieves a top-1 accuracy of 88.7% on Kinetics-400 with far fewer FLOPs than Swin-L and ViViT-H. Code is released at https://github.com/Francis-Rings/ILA .
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Vision > Video Understanding (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.34)
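One way to picture the align-then-pool step from the abstract above is the sketch below. It is a hand-written toy with assumed names and a made-up weighting scheme (features within `radius` of the predicted interactive point get double weight before average pooling), not the actual learnable module:

```python
def align_and_pool(frame_a, frame_b, point, radius=1):
    """Emphasize features near the shared interactive point `point` (y, x),
    average-pool each 2-D feature grid to one scalar token, then fuse the
    pair of tokens by taking their mean."""
    def pooled(frame):
        py, px = point
        num, den = 0.0, 0.0
        for y, row in enumerate(frame):
            for x, v in enumerate(row):
                near = abs(y - py) <= radius and abs(x - px) <= radius
                w = 2.0 if near else 1.0
                num += w * v
                den += w
        return num / den
    return 0.5 * (pooled(frame_a) + pooled(frame_b))
```

Because both frames are pooled around the same point, the fused token implicitly aligns the pair without any temporal attention.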
ILA Says Union Will Not Service Automated Ships Without Crews
The International Longshoremen's Association warns against the use of crewless automated ships at its ports. In its ongoing effort to resist all forms of automation in the maritime world, the powerful U.S. International Longshoremen's Association (ILA) has announced that its members will not service automated vessels operating without crews. Citing safety and security concerns, the ILA, the largest union of maritime workers in North America, has long staged fierce opposition to automation and, even before that, resisted the move to containerization. Responding to recent media reports about advances in shipping automation, and specifically efforts by Yara, NYK, and others to develop automated container ships, ILA president Harold Daggett said, "Don't sail them into ILA ports from Maine to Texas, Puerto Rico, and Eastern Canada – they won't be unloaded or loaded by ILA members."
- North America > Canada (0.62)
- North America > United States > Texas (0.28)
- North America > United States > Maine (0.28)
- (3 more...)
Intermediate Level Adversarial Attack for Enhanced Transferability
Huang, Qian, Gu, Zeqi, Katsman, Isay, He, Horace, Pawakapan, Pian, Lin, Zhiqiu, Belongie, Serge, Lim, Ser-Nam
Neural networks are vulnerable to adversarial examples, malicious inputs crafted to fool trained models. Adversarial examples often exhibit black-box transfer: an adversarial example crafted for one model can fool another. However, adversarial examples may overfit to the particular architecture and feature representation of the source model, resulting in sub-optimal black-box transfer to other target models. This leads us to introduce the Intermediate Level Attack (ILA), which fine-tunes an existing adversarial example for greater black-box transferability by increasing its perturbation at a pre-specified layer of the source model. We show that our method effectively achieves this goal and that a nearly optimal source-model layer to perturb can be chosen without any knowledge of the target models.
- Transportation (0.75)
- Information Technology > Security & Privacy (0.64)
- Government > Military (0.41)
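For a linear toy "intermediate layer" h = W x, the refinement described in the abstract above admits a closed form under an L-infinity budget. The sketch below assumes that simplification (the real attack operates on deep nonlinear features and iterates); the function name and signature are made up:

```python
def ila_finetune(W, delta0, eps):
    """Refine an adversarial perturbation for transfer: maximize the inner
    product of the new feature shift W@delta with the feature-shift direction
    W@delta0 of the existing example, subject to ||delta||_inf <= eps.
    For a linear layer the maximizer is eps * sign(W^T @ W @ delta0)."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    direction = matvec(W, delta0)            # feature shift of the old example
    grad = matvec(list(zip(*W)), direction)  # W^T @ direction
    return [eps if g >= 0 else -eps for g in grad]
```

The refined perturbation spends the whole budget in the direction that most amplifies the original example's effect at the chosen layer, which is the intuition behind the transferability gain.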
Introducing ILA for Educators!
ILA, the Intelligent Learning Assistant, was designed with classrooms in mind and aims to enhance both learning and teaching! Specifically, it gives teachers more time to actually TEACH by doing roll call, collecting assignments, distributing materials, finding and managing resources, and, most importantly, keeping track of pupil progress and participation. Harnessing IBM's cognitive computing system, Watson, ILA collects, analyzes, and converts information into natural language. Not only does this allow educators to access, evaluate, and integrate new teaching material, but it also provides them with analysis and diagnostics on each individual student's verbal skills and literary understanding. This makes it possible for educators to assess and tailor the learning behavior and path of each individual student.
Instance-Based Domain Adaptation in NLP via In-Target-Domain Logistic Approximation
Xia, Rui (Nanjing University of Science and Technology) | Yu, Jianfei (Nanjing University of Science and Technology) | Xu, Feng (Nanjing University of Science and Technology) | Wang, Shumei (Nanjing University of Science and Technology)
In NLP, most existing domain adaptation studies take a feature-based approach, while research on instance-based adaptation is scarce. In this work, we propose a new instance-based adaptation model, called in-target-domain logistic approximation (ILA). In ILA, we adapt the source-domain data to the target domain via a logistic approximation: the normalized in-target-domain probability is assigned as an instance weight to each source-domain training example, and an instance-weighted classification model is then trained for the cross-domain classification problem. Compared with previous techniques, ILA conducts instance adaptation in a dimensionality-reduced linear feature space to ensure efficiency on high-dimensional NLP tasks. The instance weights in ILA are learned by leveraging both the maximum-likelihood and minimum-statistical-distance criteria. Empirical results on two NLP tasks, text categorization and sentiment classification, show that ILA significantly outperforms state-of-the-art instance adaptation methods in cross-domain classification accuracy, parameter stability, and computational efficiency.
- Research Report > Experimental Study (0.69)
- Research Report > New Finding (0.47)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
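The in-target-domain weighting described above can be approximated with a domain classifier: fit logistic regression to distinguish source from target instances and use P(target | x) as each source instance's training weight. The one-feature, plain-gradient-descent sketch below is an assumption-laden illustration, not the paper's estimator:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def domain_weights(source, target, lr=0.5, steps=200):
    """Fit a one-feature logistic domain classifier (source=0, target=1) by
    gradient descent, then return P(target | x) for each source instance."""
    w, b = 0.0, 0.0
    data = [(x, 0.0) for x in source] + [(x, 1.0) for x in target]
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return [sigmoid(w * x + b) for x in source]
```

Source instances that look target-like receive weights near 1 and dominate the downstream instance-weighted classifier, which is the intuition behind instance-based adaptation.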
An Infinite Latent Attribute Model for Network Data
Palla, Konstantina, Knowles, David, Ghahramani, Zoubin
Latent variable models for network data extract a summary of the relational structure underlying an observed network. The simplest models subdivide the nodes of the network into clusters; the probability of a link between any two nodes then depends only on their cluster assignments. Currently available models can be classified by whether clusters are disjoint or allowed to overlap, but either way they explain only a "flat" clustering structure. Hierarchical Bayesian models provide a natural way to capture more complex dependencies. We propose a model in which objects are characterised by a latent feature vector, and each feature is itself partitioned into disjoint groups (subclusters), corresponding to a second layer of hierarchy. In experimental comparisons, the model achieves significantly improved predictive performance on social and biological link prediction tasks. The results indicate that models with a single-layer hierarchy over-simplify real networks.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > California (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Asia > Middle East > Jordan (0.04)
- Telecommunications > Networks (0.62)
- Information Technology > Networks (0.62)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
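A toy version of the two-layer link model sketched in the abstract above can make the structure concrete. The sigmoid link function, the `bias` sparsity term, and all names below are assumptions for illustration, not the paper's exact likelihood:

```python
import math

def link_probability(feats_i, feats_j, subs_i, subs_j, weights, bias=-2.0):
    """Edge probability between nodes i and j: for every latent feature that
    is active in BOTH nodes, add the affinity between the nodes' subclusters
    for that feature, then squash through a sigmoid (bias keeps links sparse).

    feats_*  : 0/1 feature indicators per node
    subs_*   : subcluster index per feature per node
    weights  : weights[f][a][b] = affinity of subclusters a, b under feature f
    """
    score = bias
    for f, (zi, zj) in enumerate(zip(subs_i, subs_j)):
        if feats_i[f] and feats_j[f]:
            score += weights[f][zi][zj]
    return 1.0 / (1.0 + math.exp(-score))
```

Sharing an active feature only raises the link probability when the two nodes' subclusters for that feature have high affinity, which is how the second layer of hierarchy refines a flat feature model.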