
Understanding Layer Significance in LLM Alignment

Shi, Guangyuan, Lu, Zexin, Dong, Xiaoyu, Zhang, Wenlong, Zhang, Xuanyu, Feng, Yujie, Wu, Xiao-Ming

arXiv.org Artificial Intelligence

Aligning large language models (LLMs) through fine-tuning is essential for tailoring them to specific applications, so understanding what LLMs learn during the alignment process is crucial. Recent studies suggest that alignment primarily adjusts a model's presentation style rather than its foundational knowledge, indicating that only certain components of the model are significantly impacted. To delve deeper into LLM alignment, we propose to identify which layers within LLMs are most critical to the alignment process, thereby uncovering how alignment influences model behavior at a granular level. To this end, we introduce ILA, a novel approach for identifying the important layers in LLM alignment: it learns a binary mask for each incremental weight matrix in the LoRA algorithm, indicating the significance of each layer. ILA consistently identifies important layers across various alignment datasets, with nearly 90% overlap even under substantial dataset differences, highlighting fundamental patterns in LLM alignment. Experimental results indicate that freezing non-essential layers improves overall model performance, while selectively tuning the most critical layers significantly enhances fine-tuning efficiency with minimal performance loss.
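The masking idea in the abstract can be illustrated with a minimal sketch. All names, shapes, and the gating scheme below are illustrative assumptions, not the paper's implementation: each layer's effective weight is the frozen base weight plus its LoRA increment, gated by a binary mask entry, so freezing a "non-essential" layer amounts to zeroing its mask.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-layer gating over LoRA increments (names are illustrative).
# Effective weight of layer i: W[i] + mask[i] * (B[i] @ A[i]), mask[i] in {0, 1}.
n_layers, d, r = 4, 8, 2
W = [rng.normal(size=(d, d)) for _ in range(n_layers)]        # frozen base weights
A = [rng.normal(size=(r, d)) * 0.1 for _ in range(n_layers)]  # LoRA down-projection
B = [rng.normal(size=(d, r)) * 0.1 for _ in range(n_layers)]  # LoRA up-projection

def effective_weights(mask):
    """Apply the binary mask: only layers with mask[i] == 1 keep their LoRA update."""
    return [W[i] + mask[i] * (B[i] @ A[i]) for i in range(n_layers)]

# Freezing non-essential layers = zeroing their mask entries.
mask = np.array([1, 0, 1, 0])
weights = effective_weights(mask)
assert np.allclose(weights[1], W[1])      # masked-out layer stays at its base weight
assert not np.allclose(weights[0], W[0])  # selected layer carries its LoRA delta
```

In the paper the mask itself is learned during fine-tuning; here it is fixed only to show how a mask value of 0 reduces a layer to its frozen base weight.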


Allocation Requires Prediction Only if Inequality Is Low

Shirali, Ali, Abebe, Rediet, Hardt, Moritz

arXiv.org Artificial Intelligence

Algorithmic predictions are emerging as a promising solution concept for efficiently allocating societal resources. Fueling their use is an underlying assumption that such systems are necessary to identify individuals for interventions. We propose a principled framework for assessing this assumption: using a simple mathematical model, we evaluate the efficacy of prediction-based allocations in settings where individuals belong to larger units such as hospitals, neighborhoods, or schools. We find that prediction-based allocations outperform baseline methods using aggregate unit-level statistics only when between-unit inequality is low and the intervention budget is high. Our results hold across a wide range of settings for the price of prediction, treatment-effect heterogeneity, and the learnability of unit-level statistics. Taken together, our results highlight potential limits to improving the efficacy of interventions through prediction.
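The comparison the abstract describes can be sketched with a toy simulation. All modelling choices below are illustrative assumptions, not the paper's model: individuals are nested in units, "need" is a unit effect plus individual noise, and we compare targeting individuals via noisy predictions against spending the budget in the highest-mean units using only aggregate statistics.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting (assumed, not the paper's): 10 units, 50 individuals each.
n_units, per_unit, budget = 10, 50, 100

def allocate(unit_spread):
    """Return (prediction-policy value, aggregate-baseline value) for one draw."""
    unit_effect = rng.normal(scale=unit_spread, size=n_units)
    need = unit_effect[:, None] + rng.normal(size=(n_units, per_unit))
    flat = need.ravel()
    # Prediction-based policy: target the top-`budget` individuals by noisy predictions.
    pred = flat + rng.normal(scale=0.5, size=flat.size)
    pred_value = flat[np.argsort(pred)[-budget:]].sum()
    # Aggregate baseline: fill the budget unit by unit, highest sample mean first,
    # picking individuals arbitrarily (only unit-level statistics are observed).
    value, left = 0.0, budget
    for u in np.argsort(need.mean(axis=1))[::-1]:
        take = min(left, per_unit)
        value += need[u, :take].sum()
        left -= take
        if left == 0:
            break
    return pred_value, value

low_ineq = allocate(unit_spread=0.1)   # units nearly identical: prediction helps
high_ineq = allocate(unit_spread=3.0)  # unit means alone capture most of the signal
```

With low between-unit inequality, unit means carry little information and individual-level prediction dominates; with high inequality, simply targeting the neediest units closes most of the gap, echoing the paper's headline finding.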


Implicit Temporal Modeling with Learnable Alignment for Video Recognition

Tu, Shuyuan, Dai, Qi, Wu, Zuxuan, Cheng, Zhi-Qi, Hu, Han, Jiang, Yu-Gang

arXiv.org Artificial Intelligence

Contrastive language-image pretraining (CLIP) has demonstrated remarkable success in various image tasks, but how to extend CLIP with effective temporal modeling is still an open and crucial problem. Existing factorized or joint spatial-temporal modeling trades off between efficiency and performance. While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already provides enough essence without temporal attention. To this end, we propose a novel Implicit Learnable Alignment (ILA) method, which minimizes the temporal modeling effort while achieving high performance. Specifically, for a frame pair, an interactive point is predicted in each frame, serving as a mutual-information-rich region. By enhancing the features around the interactive point, the two frames are implicitly aligned. The aligned features are then pooled into a single token, which is leveraged in the subsequent spatial self-attention. Our method eliminates the costly or insufficient temporal self-attention in video. Extensive experiments on benchmarks demonstrate the superiority and generality of our module. In particular, the proposed ILA achieves a top-1 accuracy of 88.7% on Kinetics-400 with much fewer FLOPs than Swin-L and ViViT-H. Code is released at https://github.com/Francis-Rings/ILA .
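The alignment step can be sketched schematically. The shapes, the Gaussian weighting, and the pooling below are assumptions for illustration only (in the paper the interactive point is predicted by the network): features near a chosen point are emphasized, then the enhanced map is pooled into a single token.

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed toy shapes: a 7x7 grid of 16-dim patch features for one frame.
H = W = 7
d = 16
frame = rng.normal(size=(H, W, d))

def align_token(feat, point, sigma=1.5):
    """Boost features near the interactive point, then pool to one token."""
    ys, xs = np.mgrid[0:H, 0:W]
    w = np.exp(-((ys - point[0]) ** 2 + (xs - point[1]) ** 2) / (2 * sigma ** 2))
    enhanced = feat * (1.0 + w[..., None])  # emphasise the mutual-information region
    return enhanced.reshape(-1, d).mean(axis=0)

token = align_token(frame, point=(3, 3))  # this token joins spatial self-attention
```

The point here is architectural: one pooled token per frame pair replaces full temporal self-attention, which is where the FLOP savings come from.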


ILA Says Union Will Not Service Automated Ships Without Crews

#artificialintelligence

The International Longshoremen's Association warns against the use of crewless automated ships at its ports. In its ongoing efforts to resist all forms of automation in the maritime world, the powerful U.S. International Longshoremen's Association (ILA) has announced that its members would not service automated vessels operating without crews. Citing issues of safety and security, ILA, the largest union of maritime workers in North America, has long fought automation and even before that, resisted the move to containerization. Responding to various recent media reports about advancements in shipping automation and, specifically, efforts by Yara, NYK, and others developing automated container ships, ILA president Harold Daggett said, "Don't sail them into ILA ports from Maine to Texas, Puerto Rico, and Eastern Canada – they won't be unloaded or loaded by ILA members." The ILA staged fierce opposition to all forms of automation.


Intermediate Level Adversarial Attack for Enhanced Transferability

Huang, Qian, Gu, Zeqi, Katsman, Isay, He, Horace, Pawakapan, Pian, Lin, Zhiqiu, Belongie, Serge, Lim, Ser-Nam

arXiv.org Machine Learning

Neural networks are vulnerable to adversarial examples: malicious inputs crafted to fool trained models. Adversarial examples often exhibit black-box transfer, meaning that an adversarial example crafted for one model can fool another. However, adversarial examples may be overfit to exploit the particular architecture and feature representation of a source model, resulting in sub-optimal black-box transfer to other target models. This leads us to introduce the Intermediate Level Attack (ILA), which fine-tunes an existing adversarial example for greater black-box transferability by increasing its perturbation at a pre-specified layer of the source model. We show that our method can effectively achieve this goal and that we can select a nearly optimal layer of the source model to perturb without any knowledge of the target models.
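The core idea can be sketched on a toy linear "network". Everything below is an illustrative assumption (a single linear map standing in for the feature extractor, plain gradient ascent with an L-infinity clip): starting from an existing adversarial perturbation, we push the intermediate-layer perturbation further along the direction that perturbation already induces at the chosen layer.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for the source model up to the chosen intermediate layer.
W_mid = rng.normal(size=(16, 8))
f_mid = lambda x: W_mid @ x

x = rng.normal(size=8)
x_adv = x + 0.1 * rng.normal(size=8)      # a pre-computed adversarial example
direction = f_mid(x_adv) - f_mid(x)       # target direction at the chosen layer
direction /= np.linalg.norm(direction)

delta = (x_adv - x).copy()
lr, eps = 0.05, 0.5
for _ in range(50):
    # Objective: <f_mid(x + delta) - f_mid(x), direction>.
    # For a linear f_mid its gradient w.r.t. delta is W_mid.T @ direction.
    delta += lr * (W_mid.T @ direction)
    delta = np.clip(delta, -eps, eps)     # stay within the L_inf budget

proj_new = (f_mid(x + delta) - f_mid(x)) @ direction
proj_old = (f_mid(x_adv) - f_mid(x)) @ direction
assert proj_new >= proj_old               # intermediate-layer perturbation grew
```

With a real network the gradient would come from backpropagation through the chosen layer, but the fine-tuning loop has the same shape: maximize the projection of the intermediate-layer perturbation onto a fixed direction under a norm constraint.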


Introducing ILA for Educators!

#artificialintelligence

ILA, the Intelligent Learning Assistant, was designed with classrooms in mind and aims to enhance both learning and teaching! Specifically, it gives teachers more time to actually TEACH by doing roll call, collecting assignments, distributing materials, finding and managing resources, and, more importantly, keeping track of pupil progress and participation. Harnessing IBM's cognitive computing system, Watson, ILA collects, analyzes, and converts information into natural language. Not only does this allow educators to access, evaluate, and integrate new teaching material, but it also provides them with analysis and diagnostics on the progress of each individual student's verbal skills and literary understanding. This makes it possible for educators to assess and tailor the learning behavior and path of each individual student.


Instance-Based Domain Adaptation in NLP via In-Target-Domain Logistic Approximation

Xia, Rui (Nanjing University of Science and Technology) | Yu, Jianfei (Nanjing University of Science and Technology) | Xu, Feng (Nanjing University of Science and Technology) | Wang, Shumei (Nanjing University of Science and Technology)

AAAI Conferences

In the field of NLP, most existing domain adaptation studies belong to feature-based adaptation, while research on instance-based adaptation is very scarce. In this work, we propose a new instance-based adaptation model, called in-target-domain logistic approximation (ILA). In ILA, we adapt the source-domain data to the target domain by a logistic approximation. The normalized in-target-domain probability is assigned as an instance weight to each of the source-domain training instances. An instance-weighted classification model is then trained for the cross-domain classification problem. Compared to previous techniques, ILA conducts instance adaptation in a dimensionality-reduced linear feature space to ensure efficiency in high-dimensional NLP tasks. The instance weights in ILA are learnt by leveraging the criteria of both maximum likelihood and minimum statistical distance. Empirical results on two NLP tasks, text categorization and sentiment classification, show that our ILA model beats the state-of-the-art instance adaptation methods significantly in cross-domain classification accuracy, parameter stability, and computational efficiency.
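The in-target-domain weighting idea can be sketched with a common instance-weighting recipe. The estimator below (a logistic domain classifier trained by plain gradient descent on synthetic data) is an assumption for illustration, not the paper's exact procedure: a model is trained to separate source from target instances, and each source instance is weighted by its predicted probability of belonging to the target domain.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic source and target domains with shifted means (assumed setup).
d = 5
src = rng.normal(loc=0.0, size=(200, d))
tgt = rng.normal(loc=1.0, size=(200, d))
X = np.vstack([src, tgt])
y = np.r_[np.zeros(200), np.ones(200)]  # 1 = target domain

# Logistic regression by plain gradient descent.
w, b = np.zeros(d), 0.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
for _ in range(300):
    p = sigmoid(X @ w + b)
    g = p - y
    w -= 0.1 * (X.T @ g) / len(y)
    b -= 0.1 * g.mean()

# In-target-domain probabilities become instance weights for source data.
weights = sigmoid(src @ w + b)
# Source instances that look more "target-like" receive higher weight.
assert weights[src.sum(axis=1).argmax()] > weights[src.sum(axis=1).argmin()]
```

A downstream classifier would then be fit on the source data with these per-instance weights, which is the instance-weighted classification step the abstract describes.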


An Infinite Latent Attribute Model for Network Data

Palla, Konstantina, Knowles, David, Ghahramani, Zoubin

arXiv.org Machine Learning

Latent variable models for network data extract a summary of the relational structure underlying an observed network. The simplest possible models subdivide nodes of the network into clusters; the probability of a link between any two nodes then depends only on their cluster assignment. Currently available models can be classified by whether clusters are disjoint or are allowed to overlap. These models can explain a "flat" clustering structure. Hierarchical Bayesian models provide a natural approach to capture more complex dependencies. We propose a model in which objects are characterised by a latent feature vector. Each feature is itself partitioned into disjoint groups (subclusters), corresponding to a second layer of hierarchy. In experimental comparisons, the model achieves significantly improved predictive performance on social and biological link prediction tasks. The results indicate that models with a single layer hierarchy over-simplify real networks.
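The two-layer structure can be sketched with a toy parameterisation. The shapes and the bilinear scoring below are illustrative assumptions, not the paper's generative model: each node has binary latent features, each active feature carries a subcluster label, and a per-feature affinity matrix over subcluster pairs drives the link probability.

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed toy sizes: 6 nodes, 3 latent features, 2 subclusters per feature.
n, F, S = 6, 3, 2
Z = rng.integers(0, 2, size=(n, F))   # which features each node possesses
C = rng.integers(0, S, size=(n, F))   # subcluster label within each feature
W = rng.normal(size=(F, S, S))        # per-feature subcluster affinities

def link_prob(i, j):
    """Link probability from shared features and their subcluster affinities."""
    score = sum(Z[i, f] * Z[j, f] * W[f, C[i, f], C[j, f]] for f in range(F))
    return 1.0 / (1.0 + np.exp(-score))

p = link_prob(0, 1)
```

The second layer of hierarchy is visible in the indexing: a feature shared by two nodes contributes through the affinity of their subclusters within that feature, which a flat (single-layer) clustering model cannot express.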