mtv
Instability in Downstream Task Performance During LLM Pretraining
Nishida, Yuto, Isonuma, Masaru, Oda, Yusuke
When training large language models (LLMs), it is common practice to track downstream task performance throughout the training process and select the checkpoint with the highest validation score. However, downstream metrics often exhibit substantial fluctuations, making it difficult to identify the checkpoint that truly represents the best-performing model. In this study, we empirically analyze the stability of downstream task performance in an LLM trained on diverse web-scale corpora. We find that task scores frequently fluctuate throughout training, both at the aggregate and example levels. To address this instability, we investigate two post-hoc checkpoint integration methods: checkpoint averaging and ensemble, motivated by the hypothesis that aggregating neighboring checkpoints can reduce performance volatility. We demonstrate both empirically and theoretically that these methods improve downstream performance stability without requiring any changes to the training procedure.
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (5 more...)
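A minimal sketch of the two integration methods described in the abstract above, assuming PyTorch-style checkpoints saved as state dicts; the file names and surrounding setup are hypothetical.

import torch

def average_checkpoints(paths):
    # Checkpoint averaging: element-wise mean of neighboring checkpoints' parameters.
    state_dicts = [torch.load(p, map_location="cpu") for p in paths]
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

def ensemble_predict(models, inputs):
    # Ensembling: average the output distributions of neighboring checkpoints.
    with torch.no_grad():
        probs = torch.stack([m(inputs).softmax(dim=-1) for m in models])
    return probs.mean(dim=0)

# Hypothetical usage: smooth over the three checkpoints nearest the selected step.
# model.load_state_dict(average_checkpoints(
#     ["step_9000.pt", "step_10000.pt", "step_11000.pt"]))

Averaging leaves inference cost unchanged because it yields a single set of weights; ensembling instead keeps every checkpoint at inference time and averages their predictions.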
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
Huang, Brandon, Mitra, Chancharik, Arbelle, Assaf, Karlinsky, Leonid, Darrell, Trevor, Herzig, Roei
The recent success of interleaved Large Multimodal Models (LMMs) in few-shot learning suggests that in-context learning (ICL) with many examples can be promising for learning new tasks. However, this many-shot multimodal ICL setting has one crucial problem: it is fundamentally limited by the model's context length set at pretraining. The problem is especially prominent in the multimodal domain, which processes both text and images and therefore requires additional tokens. This motivates the need for a multimodal method to compress many shots into fewer tokens without finetuning. In this work, we enable LMMs to perform multimodal, many-shot in-context learning by leveraging Multimodal Task Vectors (MTV): compact implicit representations of in-context examples compressed in the model's attention heads. Specifically, we first demonstrate the existence of such MTV in LMMs and then leverage these extracted MTV to enable many-shot in-context learning for various vision-and-language tasks. Our experiments suggest that MTV can scale in performance with the number of compressed shots and generalize to similar out-of-domain tasks without additional context length for inference.
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
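A minimal sketch of the extract-then-patch idea behind task vectors, assuming a PyTorch model whose selected attention-head modules emit (batch, seq, dim) tensors; the head selection, hook-based patching, and every name below are illustrative stand-ins, not the authors' implementation.

import torch

def extract_task_vectors(model, shot_batches, head_modules):
    # Run many-shot ICL prompts and record each selected head's mean activation.
    sums, count, hooks = {name: None for name in head_modules}, 0, []

    def make_hook(name):
        def hook(module, inputs, output):
            vec = output.detach().mean(dim=(0, 1))  # assumes (batch, seq, dim) output
            sums[name] = vec if sums[name] is None else sums[name] + vec
        return hook

    for name, module in head_modules.items():
        hooks.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        for batch in shot_batches:
            model(batch)
            count += 1
    for h in hooks:
        h.remove()
    return {name: s / count for name, s in sums.items()}

def patch_task_vectors(head_modules, task_vectors):
    # At inference, overwrite the selected heads' outputs with the stored vectors,
    # so the compressed shots consume no context-window tokens.
    return [
        module.register_forward_hook(
            lambda mod, inp, out, v=task_vectors[name]: torch.zeros_like(out) + v)
        for name, module in head_modules.items()
    ]

Calling remove() on the returned hooks restores normal behavior; the point is only that the many-shot examples end up as a handful of per-head vectors rather than as prompt tokens.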
Optimal Initialization of Batch Bayesian Optimization
Field experiments and computer simulations are effective but time-consuming methods of measuring the quality of engineered systems at different settings. To reduce the total time required, experimenters may employ Bayesian optimization, which is parsimonious with measurements, and take measurements of multiple settings simultaneously, in a batch. In practice, experimenters use very few batches, so it is imperative that each batch be as informative as possible. Typically, the initial batch in Batch Bayesian Optimization (BBO) is constructed from a quasi-random sample of setting values. We propose a batch-design acquisition function, Minimal Terminal Variance (MTV), that designs a batch by optimization rather than random sampling. MTV adapts a design criterion from Design of Experiments, called I-Optimality, which minimizes the variance of the post-evaluation estimates of quality, integrated over the entire space of settings. MTV weights the integral by the probability that a setting is optimal, enabling it to design not only the initial batch but all subsequent batches as well. Applicability to both initialization and subsequent batches is novel among acquisition functions. Numerical experiments on test functions and simulators show that MTV compares favorably to other BBO methods.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
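A minimal sketch of the MTV criterion under stated assumptions: a scikit-learn GP surrogate, a finite grid standing in for the integral over the settings space, and posterior (Thompson) samples to estimate the probability-of-optimality weights. mtv_score, prob_optimal, and the placeholder-y conditioning trick are illustrative, not the paper's implementation.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def prob_optimal(gp, X_grid, n_samples=256, maximize=False):
    # Weight w(x) = P(x is the optimum), estimated from posterior samples.
    samples = gp.sample_y(X_grid, n_samples=n_samples, random_state=0)
    idx = samples.argmax(axis=0) if maximize else samples.argmin(axis=0)
    counts = np.bincount(idx, minlength=len(X_grid)).astype(float)
    return counts / counts.sum()

def mtv_score(gp, X_obs, y_obs, X_batch, X_grid, weights):
    # Weighted I-optimality: posterior variance remaining after the batch is
    # evaluated, summed over the grid with probability-of-optimality weights.
    # GP predictive variance depends only on input locations, so placeholder
    # y-values suffice to condition on the not-yet-evaluated batch.
    kernel = getattr(gp, "kernel_", gp.kernel)  # fitted kernel if available
    X_f = np.vstack([X_obs, X_batch])
    y_f = np.concatenate([y_obs, np.zeros(len(X_batch))])
    gp_f = GaussianProcessRegressor(kernel=kernel, optimizer=None).fit(X_f, y_f)
    _, std = gp_f.predict(X_grid, return_std=True)
    return float(np.sum(weights * std ** 2))

# Hypothetical usage: keep the lowest-scoring of many random candidate batches.
# w = prob_optimal(gp, X_grid)
# best = min(candidate_batches, key=lambda B: mtv_score(gp, X, y, B, X_grid, w))

Before any data has been observed the weights are near-uniform and the score reduces to plain I-optimality, which is consistent with the abstract's claim that a single criterion can design both the initial batch and every subsequent one.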
'Bigger than MTV': how video games are helping the music industry thrive
"Video games have not only helped the music industry survive, but thrive on entirely new levels," Steve Schnur tells me. As the worldwide executive and president of music at game publisher EA, his team – many of whom have been professional musicians and singer/songwriters – work with some of the biggest music acts in the world, licensing music for video game series like Fifa, Madden NFL, Need for Speed and NHL. Since the 90s, when licensed music became prevalent in games, series such as Tony Hawk's Pro Skater, Grand Theft Auto and Wipeout have become just as well-known for their soundtracks as they are for their gameplay. For millions of people, video games have been a way to discover new favourite bands or dive into other musical genres. And because people discover this music while playing a game they love, they develop a strong emotional attachment to it.
- Media > Music (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
Defying Gravity (Broadcasting & Cable)
Network television is at risk of getting caught in a vicious cycle. As the audience fragments in a million different directions, smaller subsets of that audience see promos for new shows. Then, as new shows draw smaller crowds, even fewer viewers see promos for other programs. The reach of television networks (the total number of viewers who watch for a minute or more at least once a day) is down a daunting 12 percent in one year. Yet a six percent larger audience has seen the promos for Viacom's MTV networks, even though they're running fewer spots.
- Media > Television (1.00)
- Leisure & Entertainment (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (0.76)
- Information Technology > Data Science > Data Mining > Big Data (0.58)