MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency
Zhang, Junzhe, Zhang, Huixuan, Yin, Xunjian, Huang, Baizhou, Zhang, Xu, Hu, Xinyu, Wan, Xiaojun
Multimodal large language models (MLLMs) are prone to non-factual or outdated knowledge issues, which can manifest as misreading and misrecognition errors due to the complexity of multimodal knowledge. Previous benchmarks have not systematically analyzed the performance of editing methods in correcting these two error types. To better represent and correct these errors, we decompose multimodal knowledge into its visual and textual components. Different error types correspond to different editing formats, each of which edits a distinct part of the multimodal knowledge. We present MC-MKE, a fine-grained Multimodal Knowledge Editing benchmark emphasizing Modality Consistency. Our benchmark facilitates independent correction of misreading and misrecognition errors by editing the corresponding knowledge component. We evaluate three multimodal knowledge editing methods on MC-MKE, revealing their limitations, particularly in terms of modality consistency. Our work highlights the challenges posed by multimodal knowledge editing and motivates further research into developing effective techniques for this task.
MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning
Li, Yichuan, Ma, Xiyao, Lu, Sixing, Lee, Kyumin, Liu, Xiaohu, Guo, Chenlei
Large language models (LLMs) have demonstrated impressive in-context learning (ICL) capabilities, where an LLM makes predictions for a given test input together with a few input-output pairs (demonstrations). Nevertheless, the inclusion of demonstrations leads to a quadratic increase in the computational overhead of the self-attention mechanism. Existing solutions attempt to distill lengthy demonstrations into compact vectors. However, they often require task-specific retraining or compromise the LLM's in-context learning performance. To mitigate these challenges, we present Meta dEmonstratioN Distillation (MEND), where a language model learns to distill any lengthy demonstrations into vectors without retraining for a new downstream task. We exploit knowledge distillation to enhance alignment between MEND and the LLM, achieving both efficiency and effectiveness simultaneously. MEND is endowed with the meta-knowledge of distilling demonstrations through a two-stage training process, which includes meta-distillation pretraining and fine-tuning. Comprehensive evaluations across seven diverse ICL task partitions using decoder-only (GPT-2) and encoder-decoder (T5) models attest to MEND's prowess. It not only matches but often outperforms Vanilla ICL as well as other state-of-the-art distillation models, while significantly reducing computational demands. This innovation promises enhanced scalability and efficiency for the practical deployment of large language models.
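The efficiency argument in the abstract is concrete enough to sketch: self-attention cost grows quadratically with context length, so replacing hundreds of demonstration tokens with a handful of distilled vectors shrinks the effective context dramatically. The sketch below is not MEND itself (which learns the distiller with a language model); it uses a mean-pooling stand-in and assumed sizes (`d_model=64`, `k=8`) purely to illustrate the budget arithmetic:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64          # hidden size (assumed for illustration)
n_demo_tokens = 512   # tokens spent on raw demonstrations
k = 8                 # compact distillation vectors (assumed budget)
n_test_tokens = 32    # tokens in the test input

# Stand-in for the frozen LLM's embeddings of the demonstration tokens.
demo_embeds = rng.normal(size=(n_demo_tokens, d_model))

def distill(demo_embeds: np.ndarray, k: int) -> np.ndarray:
    """Toy distiller: mean-pool the demonstrations into k slots.
    (MEND learns this mapping; pooling is only a placeholder.)"""
    slots = np.array_split(demo_embeds, k)
    return np.stack([s.mean(axis=0) for s in slots])

distilled = distill(demo_embeds, k)

# Self-attention cost scales with the square of sequence length, so the
# distilled context is far cheaper than the raw demonstrations:
full_len = n_demo_tokens + n_test_tokens   # raw demos + test input
compact_len = k + n_test_tokens            # distilled vectors + test input
```

Prepending `distilled` in place of the demonstration embeddings is what lets inference cost depend on `k` rather than on the demonstration length.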
Cross-lingual Editing in Multilingual Language Models
Beniwal, Himanshu, D, Kowsik Nandagopan, Singh, Mayank
The training of large language models (LLMs) necessitates substantial data and computational resources, and updating outdated LLMs entails significant effort and resources. While numerous model editing techniques (METs) have emerged to efficiently update model outputs without retraining, their effectiveness in multilingual LLMs, where knowledge is stored in diverse languages, remains an underexplored research area. This research paper introduces the cross-lingual model editing (XME) paradigm, wherein a fact is edited in one language and the subsequent update propagation is observed across other languages. To investigate the XME paradigm, we conducted experiments using BLOOM, mBERT, and XLM-RoBERTa across two writing scripts: Latin (English, French, and Spanish) and Indic (Hindi, Gujarati, and Bengali). The results reveal notable performance limitations of state-of-the-art METs under the XME setting, mainly when the languages involved belong to two distinct script families. These findings highlight the need for further research and development of XME techniques to address these challenges. The dataset used in this research and the associated code are publicly available at https://github.com/lingo-iitgn/XME.
Massive Editing for Large Language Models via Meta Learning
Tan, Chenmien, Zhang, Ge, Fu, Jie
While large language models (LLMs) have enabled learning knowledge from the pre-training corpora, the acquired knowledge may be fundamentally incorrect or outdated over time, which necessitates rectifying the knowledge of the language model (LM) after training. A promising approach involves employing a hyper-network to generate parameter shifts, whereas existing hyper-networks suffer from inferior scalability in the number of synchronous editing operations. To mitigate the problem, we propose the MAssive Language Model Editing Network (MALMEN), which formulates the parameter shift aggregation as a least-squares problem, subsequently updating the LM parameters using the normal equation. To accommodate editing multiple facts simultaneously with limited memory budgets, we separate the computation on the hyper-network and the LM, enabling arbitrary batch sizes on both neural networks. Our method is evaluated by editing up to thousands of facts on LMs with different architectures, i.e., BERT-base, GPT-2, T5-XL (2.8B), and GPT-J (6B), across various knowledge-intensive NLP tasks, i.e., closed-book fact-checking and question answering. Remarkably, MALMEN is capable of editing hundreds of times more facts than strong baselines with the identical hyper-network architecture and outperforms editors specifically designed for GPT. Our code is available at https://github.com/ChenmienTan/malmen.
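The aggregation step described above — casting the combination of per-edit parameter shifts as a least-squares problem and solving it with the normal equation — can be illustrated at toy scale. Everything here (the dimensions, the key/value formulation, the small ridge term) is an assumption for illustration, not MALMEN's actual parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16        # parameter dimension of one weight vector (toy scale)
n_edits = 40  # facts edited simultaneously

# Hypothetical per-edit data: each edit i contributes a key k_i and a
# desired output change v_i; we seek one shift S minimizing
# sum_i (k_i^T S - v_i)^2, i.e. the least-squares problem K S ~= V.
K = rng.normal(size=(n_edits, d))
V = rng.normal(size=(n_edits, 1))

# Normal-equation solution: S = (K^T K)^{-1} K^T V.
# (A tiny ridge term keeps K^T K safely invertible at toy scale.)
ridge = 1e-6 * np.eye(d)
S = np.linalg.solve(K.T @ K + ridge, K.T @ V)

# S is the single aggregated shift applied to the LM parameters;
# the residual measures how well one shift serves all edits at once.
residual = np.linalg.norm(K @ S - V)
```

Solving one `d x d` linear system regardless of `n_edits` is what lets this formulation scale the number of simultaneous edits.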
TempAMLSI: Temporal Action Model Learning based on Grammar Induction
Grand, Maxence, Pellier, Damien, Fiorino, Humbert
Hand-encoding PDDL domains is generally accepted as difficult, tedious, and error-prone. The difficulty is even greater when temporal domains have to be encoded: actions have a duration, and their effects are not instantaneous. In this paper, we present TempAMLSI, an algorithm based on the AMLSI approach that is able to learn temporal domains. TempAMLSI builds on the classical assumption made in temporal planning that a non-temporal domain can be converted into a temporal domain. TempAMLSI is the first approach able to learn temporal domains with single hard envelopes and Cushing's intervals. We show experimentally that TempAMLSI learns accurate temporal domains, i.e., temporal domains that can be used directly to solve new planning problems, with different forms of action concurrency.
Fast Model Editing at Scale
Mitchell, Eric, Lin, Charles, Bosselut, Antoine, Finn, Chelsea, Manning, Christopher D.
While large pre-trained models have enabled impressive results on a variety of downstream tasks, the largest existing models still make errors, and even accurate predictions may become outdated over time. Because detecting all such failures at training time is impossible, enabling both developers and end users of such models to correct inaccurate outputs while leaving the model otherwise intact is desirable. However, the distributed, black-box nature of the representations learned by large neural networks makes producing such targeted edits difficult. If presented with only a single problematic input and new desired output, fine-tuning approaches tend to overfit; other editing algorithms are either computationally infeasible or simply ineffective when applied to very large models. To enable easy post-hoc editing at scale, we propose Model Editor Networks with Gradient Decomposition (MEND), a collection of small auxiliary editing networks that use a single desired input-output pair to make fast, local edits to a pre-trained model. MEND learns to transform the gradient obtained by standard fine-tuning, using a low-rank decomposition of the gradient to make the parameterization of this transformation tractable. MEND can be trained on a single GPU in less than a day even for 10 billion parameter models; once trained, MEND enables rapid application of new edits to the pre-trained model. Our experiments with T5, GPT, BERT, and BART models show that MEND is the only approach to model editing that produces effective edits for models with tens of millions to over 10 billion parameters.
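The low-rank structure this MEND exploits can be seen directly: for a linear layer, the fine-tuning gradient from a single input-output pair is a rank-1 outer product, so an editor can operate on the two small factors instead of the full weight-sized matrix. A toy sketch with random stand-ins for MEND's learned transforms (all sizes and the 0.1 scales are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 8, 12

# One desired input-output pair for a toy linear layer y = W @ x.
W = rng.normal(size=(d_out, d_in))
x = rng.normal(size=(d_in,))
delta = rng.normal(size=(d_out,))  # gradient of the loss w.r.t. y

# The fine-tuning gradient for a single example is the rank-1 outer
# product delta x^T: it is fully described by two small vectors.
grad = np.outer(delta, x)

def edit_factors(u: np.ndarray, v: np.ndarray):
    """Stand-in for MEND's editor networks: transform each factor with
    a small matrix (random here; learned in MEND)."""
    U = rng.normal(size=(len(u), len(u))) * 0.1
    V = rng.normal(size=(len(v), len(v))) * 0.1
    return U @ u, V @ v

u_e, v_e = edit_factors(delta, x)
edited_grad = np.outer(u_e, v_e)   # still rank 1, cheap to build
W_edited = W - 0.1 * edited_grad   # fast, local parameter update
```

Parameterizing the editor over the factors (`d_out + d_in` values) rather than the full gradient (`d_out * d_in` values) is what keeps the approach tractable at 10B-parameter scale.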
Marketing intelligence could mend a broken business
Data has become the most valuable currency in business. But without the right tools or intelligence, its true value will not be realised. According to a MiQ survey, 43 per cent of US and UK brand marketers think that the lack of measurement of business impact, such as sales or growth, is the main hurdle to investing more in data analytics. But if marketing metrics are not the same as business goals, why are campaigns measured against them? Marketing should align with the same goals as the rest of the company, in order to measure tangible business results.
The Fashion House Of Artificial Intelligence
We've been relying on computers, their analytics, and their algorithms to give us deeper, broader, and faster knowledge of what people want for a long time now. Surveys, browser cookies, user data, and sales trends all tell us an incredible amount of detail. Why not apply this same logic to fashion? Fashion subscription service Stitch Fix decided to try it last year, and the human-measured results are in: computers are really good designers. Stitch Fix's computers identified shirt cuts, patterns, and sleeve styles popular among the company's subscribers, and mashed them together with some human help to create three brand new shirts.