
Collaborating Authors: Sherborne, Tom


If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs

arXiv.org Artificial Intelligence

Model merging has shown great promise at combining expert models, but the benefit of merging is unclear when merging "generalist" models trained on many tasks. We explore merging in the context of large (~100B-parameter) models, by recycling checkpoints that exhibit tradeoffs among different tasks. Such checkpoints are often created in the process of developing a frontier model, and many suboptimal ones are usually discarded. Given a pool of model checkpoints obtained from different training runs (e.g., different stages, objectives, hyperparameters, and data mixtures), which naturally show tradeoffs across different language capabilities (e.g., instruction following vs. code generation), we investigate whether merging can recycle such suboptimal models into a Pareto-optimal one. Our optimization algorithm tunes the weight of each checkpoint in a linear combination, resulting in a Pareto-optimal model that outperforms both individual models and merge-based baselines. Further analysis shows that good merges tend to include almost all checkpoints with non-zero weights, indicating that even seemingly bad initial checkpoints can contribute to good final merges.
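
The abstract does not spell out the optimizer used to tune the mixing weights, so the following is only a minimal sketch of the linear-combination idea: checkpoints are merged by a weighted average of their parameters, and the weights are chosen by a simple black-box search (here, random Dirichlet sampling, an illustrative stand-in rather than the paper's method). Function names such as `merge_checkpoints` and `tune_merge` are hypothetical.

```python
# Minimal sketch (not the paper's code): merge a pool of checkpoints by a tuned
# linear combination of their parameters.
import torch

def merge_checkpoints(state_dicts, weights):
    """Weighted linear combination of parameter tensors across checkpoints."""
    weights = torch.tensor(weights, dtype=torch.float32)
    weights = weights / weights.sum()  # normalize mixing weights to sum to 1
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(w * sd[name].float() for w, sd in zip(weights, state_dicts))
    return merged

def tune_merge(state_dicts, evaluate, n_trials=50, seed=0):
    """Black-box search over mixing weights; `evaluate` scores a merged state dict
    (e.g., an aggregate over held-out tasks) and returns a scalar."""
    torch.manual_seed(seed)
    best_score, best_weights = float("-inf"), None
    for _ in range(n_trials):
        # Sample from a flat Dirichlet so most checkpoints receive non-zero weight.
        w = torch.distributions.Dirichlet(torch.ones(len(state_dicts))).sample()
        score = evaluate(merge_checkpoints(state_dicts, w.tolist()))
        if score > best_score:
            best_score, best_weights = score, w.tolist()
    return best_weights, best_score
```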


Scalable Data Ablation Approximations for Language Models through Modular Training and Merging

arXiv.org Artificial Intelligence

Training data compositions for Large Language Models (LLMs) can significantly affect their downstream performance. However, a thorough data ablation study exploring large sets of candidate data mixtures is typically prohibitively expensive since the full effect is seen only after training the models; this can lead practitioners to settle for sub-optimal data mixtures. We propose an efficient method for approximating data ablations which trains individual models on subsets of a training corpus and reuses them across evaluations of combinations of subsets. In continued pre-training experiments, we find that, given an arbitrary evaluation set, the perplexity score of a single model trained on a candidate set of data is strongly correlated with perplexity scores of parameter averages of models trained on distinct partitions of that data. From this finding, we posit that researchers and practitioners can conduct inexpensive simulations of data ablations by maintaining a pool of models that were each trained on partitions of a large training corpus, and assessing candidate data mixtures by evaluating parameter averages of combinations of these models. This approach allows for substantial improvements in amortized training efficiency -- scaling only linearly with respect to new data -- by enabling reuse of previous training computation, opening new avenues for improving model performance through rigorous, incremental data assessment and mixing.
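
A minimal sketch of the evaluation-by-parameter-averaging idea described above, assuming that models trained on individual corpus partitions are available as state dicts; the helper names (`average_parameters`, `simulate_ablation`, `eval_perplexity`) are hypothetical, not the paper's API.

```python
# Illustrative sketch: approximate a data-ablation result for a candidate mixture of
# corpus partitions by evaluating the uniform parameter average of models trained on
# those partitions.
import torch

def average_parameters(state_dicts):
    """Uniform parameter average of models trained on distinct data partitions."""
    avg = {}
    for name in state_dicts[0]:
        avg[name] = torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
    return avg

def simulate_ablation(partition_models, candidate_partitions, eval_perplexity):
    """partition_models: dict mapping partition id -> state dict trained on that partition.
    candidate_partitions: iterable of partition ids defining the candidate data mixture.
    eval_perplexity: callable scoring a state dict on the target evaluation set."""
    merged = average_parameters([partition_models[p] for p in candidate_partitions])
    # Per the paper, this proxy perplexity correlates strongly with the perplexity of a
    # single model trained on the union of the candidate partitions.
    return eval_perplexity(merged)
```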


On Leakage of Code Generation Evaluation Datasets

arXiv.org Artificial Intelligence

Code generation has emerged as an important skill for large language models to master. Measuring recent progress in code generation has relied on few, critical benchmarks to judge performance between model families and checkpoints. While many recent sophisticated evaluation datasets have been proposed (Jain et al., 2024; Jimenez et al., 2024), [...] A second possibility is that contamination happens indirectly through the use of synthetic data -- a widespread paradigm used in particular to increase code capabilities by generating additional code training tokens. Finally, we argue that final model selection might have been overly influenced by their performance on these datasets, overfitting to performance on these metrics over general-purpose code-oriented skills.
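
The paper itself is an analysis of contamination rather than a method, but a rough sense of how such leakage can be probed may help. The sketch below is hypothetical and not from the paper: it flags potential direct or synthetic-data contamination by measuring character n-gram overlap between a benchmark solution and training documents.

```python
# Hypothetical leakage probe (not from the paper): character n-gram overlap between a
# benchmark solution and a training document.
def ngrams(text, n=20):
    text = " ".join(text.split())  # normalize whitespace before extracting n-grams
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 0))}

def overlap_ratio(benchmark_solution, training_doc, n=20):
    ref = ngrams(benchmark_solution, n)
    if not ref:
        return 0.0
    return len(ref & ngrams(training_doc, n)) / len(ref)

# Ratios near 1.0 suggest the solution (or a close paraphrase) appears in the training data.
```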


How To Train Your (Compressed) Large Language Model

arXiv.org Artificial Intelligence

With the increase in the size of large language models (LLMs), we need compression methods that can reduce the model size while preserving the generality and zero-shot promptability of the model. This goal is more ambitious than the typical compression setup, which reduces the model's size at the expense of specializing it to a specific end-task. To study this, we develop a task-agnostic compression pipeline with a large-scale evaluation comprising language modeling perplexity and 12 zero-shot end-tasks. Our results show that simple layer-wise pruning followed by continued language model pretraining matches or outperforms three existing state-of-the-art baselines while being 1.5x more computationally efficient. However, unlike typical task-specialized compression, our best compressed model significantly underperforms a similar-sized model trained from scratch. We posit the half-sized pretrained model as an upper bound for task-agnostic compression and call for future work to bridge this gap under a reasonable token budget. Our findings highlight the inadequacy of existing compression methods for LLMs and establish a requirement for new methods that preserve a model's generality and zero-shot promptability under compression. We release our code and evaluation setup to facilitate reproducibility and help iterate on method design.
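
A minimal sketch of the kind of layer-wise pruning described above, assuming a Hugging Face-style decoder that exposes its blocks as `model.model.layers`; the attribute path and the keep-every-other-layer heuristic are assumptions for illustration, not the paper's exact pipeline, and continued pretraining would follow this step.

```python
# Illustrative layer-wise pruning sketch: keep every other transformer block, halving
# depth before continued pretraining. Attribute names are assumptions.
import torch.nn as nn

def prune_alternate_layers(model, keep_every=2):
    layers = model.model.layers  # assumed attribute path for the decoder block list
    kept = nn.ModuleList(layers[i] for i in range(0, len(layers), keep_every))
    model.model.layers = kept
    model.config.num_hidden_layers = len(kept)  # keep config consistent with new depth
    return model
```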


TRAM: Bridging Trust Regions and Sharpness Aware Minimization

arXiv.org Artificial Intelligence

By reducing the curvature of the loss surface in parameter space, sharpness-aware minimization (SAM) yields widespread robustness improvements under domain transfer. Instead of focusing on parameters, however, this work considers the transferability of representations as the optimization target for out-of-domain generalization in a fine-tuning setup. To encourage the retention of transferable representations, we consider trust region-based fine-tuning methods, which exploit task-specific skills without forgetting task-agnostic representations from pre-training. We unify parameter- and representation-space smoothing approaches by using trust region bounds to inform SAM-style regularizers on both of these optimization surfaces. We propose Trust Region Aware Minimization (TRAM), a fine-tuning algorithm that optimizes for flat minima and smooth, informative representations without forgetting pre-trained structure. We find that TRAM outperforms both sharpness-aware and trust region-based optimization methods on cross-domain language modeling and cross-lingual transfer, where robustness to domain transfer and representation generality are critical for success. TRAM establishes a new standard in training generalizable models with minimal additional computation.
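
For readers unfamiliar with the SAM scaffold that TRAM builds on, the sketch below shows a generic ascent-then-descent SAM step in PyTorch. It is only the baseline structure: TRAM's contribution, using trust region bounds to shape the perturbation neighborhood on both parameter and representation surfaces, is not reproduced here, and `loss_fn` and the fixed `rho` are illustrative assumptions.

```python
# Generic SAM-style step (baseline scaffold only, not TRAM itself).
import torch

def sam_step(model, loss_fn, optimizer, rho=0.05):
    # 1) Ascent: perturb parameters toward higher loss within an L2 ball of radius rho.
    loss = loss_fn(model)
    loss.backward()
    with torch.no_grad():
        grad_norm = torch.norm(torch.stack(
            [p.grad.norm(2) for p in model.parameters() if p.grad is not None]), 2)
        eps = {}
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps[p] = e
    optimizer.zero_grad()
    # 2) Descent: gradient at the perturbed point, then restore parameters and update.
    loss_fn(model).backward()
    with torch.no_grad():
        for p, e in eps.items():
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```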


Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation

arXiv.org Artificial Intelligence

Rising computational demands of modern natural language processing (NLP) systems have increased the barrier to entry for cutting-edge research while posing serious environmental concerns. Yet, progress on model efficiency has been impeded by practical challenges in model evaluation and comparison. For example, hardware is challenging to control due to disparate levels of accessibility across different institutions. Moreover, improvements in metrics such as FLOPs often fail to translate to progress in real-world applications. In response, we introduce Pentathlon, a benchmark for holistic and realistic evaluation of model efficiency. Pentathlon focuses on inference, which accounts for a majority of the compute in a model's lifecycle. It offers a strictly controlled hardware platform and is designed to mirror real-world application scenarios. It incorporates a suite of metrics that target different aspects of efficiency, including latency, throughput, memory overhead, and energy consumption. Pentathlon also comes with a software library that can be seamlessly integrated into any codebase to enable evaluation. As a standardized and centralized evaluation platform, Pentathlon can drastically reduce the workload of making fair and reproducible efficiency comparisons. While initially focused on NLP models, Pentathlon is designed to allow flexible extension to other fields. We envision Pentathlon will stimulate algorithmic innovations in building efficient models, and foster an increased awareness of the social and environmental implications of developing future-generation NLP models.
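
As an illustration of the kinds of measurements involved (not Pentathlon's actual library API, which the abstract does not describe), the sketch below times a batched inference callable and reports mean latency, throughput, and peak GPU memory; energy measurement requires hardware counters and is omitted here.

```python
# Illustrative efficiency harness for inference-time metrics (not Pentathlon's API).
import time
import torch

def measure_inference(run_batch, batches):
    """run_batch: callable executing inference on one batch; batches: list of batches
    (each batch must support len())."""
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()
    latencies, n_examples = [], 0
    start = time.perf_counter()
    for batch in batches:
        t0 = time.perf_counter()
        run_batch(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # wait for async GPU work before stopping the timer
        latencies.append(time.perf_counter() - t0)
        n_examples += len(batch)
    wall = time.perf_counter() - start
    return {
        "mean_latency_s": sum(latencies) / len(latencies),
        "throughput_ex_per_s": n_examples / wall,
        "peak_gpu_mem_mb": (torch.cuda.max_memory_allocated() / 2**20
                            if torch.cuda.is_available() else None),
    }
```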


Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing

arXiv.org Artificial Intelligence

Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data. Previous work has primarily considered silver-standard data augmentation or zero-shot methods; however, exploiting few-shot gold data is comparatively unexplored. We propose a new approach to cross-lingual semantic parsing that explicitly minimizes cross-lingual divergence between probabilistic latent variables using Optimal Transport. We demonstrate how this direct guidance improves parsing from natural languages using fewer examples and less training. We evaluate our method on two datasets, MTOP and MultiATIS++SQL, establishing state-of-the-art results under a few-shot cross-lingual regime. Ablation studies further reveal that our method improves performance even without parallel input translations. In addition, we show that our model better captures cross-lingual structure in the latent space to improve semantic representation similarity.
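
A minimal sketch of an entropy-regularized Optimal Transport (Sinkhorn) distance between batches of latent vectors, the kind of quantity that could serve as a cross-lingual alignment loss; this is a generic illustration under assumed inputs, not the paper's exact formulation over probabilistic latent variables.

```python
# Generic Sinkhorn distance between latent batches from two languages (illustrative).
import torch

def sinkhorn_distance(x, y, epsilon=0.1, n_iters=50):
    """x: (n, d) latents for language A; y: (m, d) latents for language B."""
    cost = torch.cdist(x, y, p=2) ** 2                  # pairwise squared-L2 cost matrix
    a = torch.full((x.size(0),), 1.0 / x.size(0), device=x.device)
    b = torch.full((y.size(0),), 1.0 / y.size(0), device=y.device)
    K = torch.exp(-cost / epsilon)                      # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):                            # Sinkhorn scaling iterations
        v = b / (K.t() @ u + 1e-9)
        u = a / (K @ v + 1e-9)
    transport = torch.diag(u) @ K @ torch.diag(v)       # approximate transport plan
    return (transport * cost).sum()
```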


Extrinsic Evaluation of Machine Translation Metrics

arXiv.org Artificial Intelligence

Automatic machine translation (MT) metrics are widely used to distinguish the translation quality of machine translation systems across relatively large test sets (system-level evaluation). However, it is unclear if automatic metrics are reliable at distinguishing good translations from bad translations at the sentence level (segment-level evaluation). In this paper, we investigate how useful MT metrics are at detecting the success of a machine translation component when placed in a larger platform with a downstream task. We evaluate the segment-level performance of the most widely used MT metrics (chrF, COMET, BERTScore, etc.) on three downstream cross-lingual tasks (dialogue state tracking, question answering, and semantic parsing). For each task, we only have access to a monolingual task-specific model. We calculate the correlation between the metric's ability to predict a good/bad translation and success/failure on the final task for the Translate-Test setup. Our experiments demonstrate that all metrics exhibit negligible correlation with the extrinsic evaluation of the downstream outcomes. We also find that the scores provided by neural metrics are not interpretable, largely because their ranges are undefined. We synthesise our analysis into recommendations for future MT metrics to produce labels rather than scores, enabling more informative interaction between machine translation and multilingual language understanding.
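
A minimal sketch of the segment-level analysis described above, under assumed inputs: per-segment metric scores and binary downstream success labels from the Translate-Test pipeline; the function name and input format are hypothetical. With a binary outcome, Pearson correlation computed this way equals the point-biserial coefficient.

```python
# Illustrative correlation between segment-level MT metric scores and downstream success.
import numpy as np

def metric_task_correlation(metric_scores, task_success):
    """metric_scores: per-segment scores from an MT metric (e.g., chrF or COMET);
    task_success: 1 if the downstream model (e.g., a semantic parser) succeeded on the
    translated input, else 0."""
    scores = np.asarray(metric_scores, dtype=float)
    success = np.asarray(task_success, dtype=float)
    return float(np.corrcoef(scores, success)[0, 1])
```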