sama
Towards Multi Turn Referential Grounded Video Chat with Large Language Models
Achieving fine-grained spatio-temporal understanding in videos remains a major challenge for current Video Large Multimodal Models (Video LMMs). Addressing this challenge requires mastering two core capabilities: video referring understanding, which captures the semantics of video regions, and video grounding, which segments object regions based on natural language descriptions. However, most existing approaches tackle these tasks in isolation, limiting progress toward unified, referentially grounded video interaction. We identify a key bottleneck in the lack of high-quality, unified video instruction data and a comprehensive benchmark for evaluating referentially grounded video chat. To address these challenges, we contribute in three core aspects: dataset, model, and benchmark. First, we introduce SAMA-239K, a large-scale dataset comprising 15K videos specifically curated to enable joint learning of video referring understanding, grounding, and multi-turn video chat. Second, we propose the SAMA model, which incorporates a versatile spatio-temporal context aggregator and a Segment Anything Model to jointly enhance fine-grained video comprehension and precise grounding capabilities. Finally, we establish SAMA-Bench, a meticulously designed benchmark consisting of 5,067 questions from 522 videos, to comprehensively evaluate the integrated capabilities of Video LMMs in multi-turn, spatio-temporal referring understanding and grounded dialogue. Extensive experiments and benchmarking results show that SAMA not only achieves strong performance on SAMA-Bench but also sets a new state-of-the-art on general grounding benchmarks, while maintaining highly competitive performance on standard visual understanding benchmarks.
Meta in row after sacking workers who say they saw smart glasses users having sex
Meta is under pressure to explain why it cancelled a major contract with a company it was using to train AI, shortly after some of its Kenya-based workers alleged they had to view graphic content captured by Meta smart glasses. In February, workers at the company, Sama, told two Swedish newspapers they had witnessed glasses users going to the toilet and having sex . Less than two months later, Meta ended its contract with Sama, which Sama said would result in 1,108 workers being made redundant. Meta says it's because Sama did not meet its standards, a criticism Sama rejects. A Kenyan workers' organisation alleges Meta's decision was caused by the staff speaking out.
Making Scalable Meta Learning Practical
Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e.,\ learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information, and exploiting efficient distributed training techniques implemented for first-order gradients.
Making Scalable Meta Learning Practical
Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e.,\ learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information, and exploiting efficient distributed training techniques implemented for first-order gradients. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains.
Making Scalable Meta Learning Practical
Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e.,\ learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information, and exploiting efficient distributed training techniques implemented for first-order gradients. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains.
Kenya's President Wades Into Meta Lawsuits
Can a Big Tech company be sued in Kenya for alleged abuses at an outsourcing company working on its behalf? That's the question at the heart of two lawsuits that are attempting to set a new precedent in Kenya, which is the prime destination for tech companies looking to farm out digital work to the African continent. The two-year legal battle stems from allegations of human rights violations at an outsourced Meta content moderation facility in Nairobi, where employees hired by a contractor were paid as little as 1.50 per hour to view traumatic content, such as videos of rapes, murders, and war crimes. The suits claim that despite the workers being contracted by an outsourcing company, called Sama, Meta essentially supervised and set the terms for the work, and designed and managed the software required for the task. Both companies deny wrongdoing and Meta has challenged the Kenyan courts' jurisdiction to hear the cases.
OpenAI co-founder and Chief Scientist Ilya Sutskever is leaving the company
Ilya Sutskever has announced on X, formerly known as Twitter, that he's leaving OpenAI almost a decade after he co-founded the company. He's confident that OpenAI "will build [artificial general intelligence] that is both safe and beneficial" under the leadership of CEO Sam Altman, President Greg Brockman and CTO Mira Murati, he continued. In his own post about Sutskever's departure, Altman called him "one of the greatest minds of our generation" and credited him for his work with the company. Jakub Pachocki, OpenAI's previous Director of Research who headed the development of GPT-4 and OpenAI Five, has taken Sutskever's role as Chief Scientist. After almost a decade, I have made the decision to leave OpenAI.
Making Scalable Meta Learning Practical
Choe, Sang Keun, Mehta, Sanket Vaibhav, Ahn, Hwijeen, Neiswanger, Willie, Xie, Pengtao, Strubell, Emma, Xing, Eric
Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e., learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information, and exploiting efficient distributed training techniques implemented for first-order gradients. Evaluated on multiple large-scale meta learning benchmarks, SAMA showcases up to 1.7/4.8x increase in throughput and 2.0/3.8x decrease in memory consumption respectively on single-/multi-GPU setups compared to other baseline meta learning algorithms. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains.