Generative AI
Aug2Search: Enhancing Facebook Marketplace Search with LLM-Generated Synthetic Data Augmentation
Xi, Ruijie, Ba, He, Yuan, Hao, Agrawal, Rishu, Tian, Yuxin, Kong, Ruoyan, Prakash, Arul
Embedding-Based Retrieval (EBR) is an important technique in modern search engines, enabling semantic matching between search queries and relevant results. However, search logging data on platforms like Facebook Marketplace lacks the diversity and detail needed for effective EBR model training, limiting the models' ability to capture nuanced search patterns. To address this challenge, we propose Aug2Search, an EBR-based framework that leverages synthetic data generated by Generative AI (GenAI) models in a multimodal and multitask approach to optimize query-product relevance. This paper investigates the capabilities of GenAI, particularly Large Language Models (LLMs), in generating high-quality synthetic data, and analyzes its impact on enhancing EBR models. We conducted experiments using eight Llama models and 100 million data points from Facebook Marketplace logs. Our synthetic data generation follows three strategies: (1) generate queries, (2) enhance product listings, and (3) generate queries from enhanced listings. We train EBR models on three different datasets: sampled engagement data, or original data (e.g., "Click" and "Listing Interactions"); synthetic data; and a mixture of both engagement and synthetic data, to compare model performance across training sets. Our findings underscore the robustness of Llama models in producing synthetic queries and listings with high coherence, relevance, and diversity, while maintaining low levels of hallucination. Aug2Search achieves an improvement of up to 4% in ROC_AUC with 100 million synthetic data samples, demonstrating the effectiveness of our approach. Moreover, our experiments reveal that with the same volume of training data, models trained exclusively on synthetic data often outperform those trained on original data only or on a mixture of original and synthetic data.
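The abstract reports its gains in ROC_AUC for an EBR model. As a minimal sketch, assuming a two-tower setup in which queries and listings have already been embedded as vectors, relevance can be scored by cosine similarity, and ROC AUC computed as the probability that a relevant listing outranks an irrelevant one. The function names, embeddings, and labels below are hypothetical illustrations, not Aug2Search's implementation.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between a query embedding and a listing embedding."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat zero vectors as having no similarity
    return dot / (norm_u * norm_v)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random relevant item (label 1)
    scores higher than a random irrelevant item (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: one query and four listings with relevance labels.
query = [1.0, 0.0]
listings = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
labels = [1, 0, 1, 0]
scores = [cosine(query, item) for item in listings]
print(roc_auc(scores, labels))  # 1.0: every relevant listing outranks every irrelevant one
```

The pairwise form is quadratic in the number of items and is shown only for clarity; at the scale of 100 million samples one would use a sort-based computation instead.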
The End of Publishing as We Know It
When tech companies first rolled out generative-AI products, some critics immediately feared a media collapse. Every bit of writing, imagery, and video became suspect. But for news publishers and journalists, another calamity was on the horizon. Chatbots have proved adept at keeping users locked into conversations. They do so by answering every question, often through summarizing articles from news publishers.
Meta boss praises new US army division enlisting tech execs as lieutenant colonels
Meta's chief technology officer has called it "the great honor of my life" to be enlisted in a new US army corps that defence chiefs set up to better integrate military and tech industry expertise, and that also includes senior figures from other top tech firms such as Palantir and OpenAI. Andrew Bosworth, a long-term lieutenant to Mark Zuckerberg known widely as "Boz", is one of several senior Silicon Valley executives commissioned to the rank of lieutenant colonel in the corps, called Detachment 201, which the US army says will "fuse cutting-edge tech expertise with military innovation". Bosworth, who joined Facebook in 2006, was sworn into the army reserves earlier this month alongside Shyam Sankar, the chief technology officer of Palantir, a technology firm with extensive defence contracts; Kevin Weil, chief product officer of OpenAI; and Bob McGrew, an adviser at Thinking Machines Lab, a $10bn AI company. They wore military fatigues at the swearing-in ceremony but will not be full-time soldiers. The recruitment is a sign of the increasing importance of technology in modern warfare and of growing commercial and research links between some of the largest tech firms and the military.
Ring harnesses generative AI to power Ring Video Descriptions
Ring is bringing generative AI to its family of home security cameras and video doorbells with a new feature called Video Descriptions. Once this feature is enabled, the motion alerts triggered by Ring cameras will be accompanied by an AI-generated description of the motion that caused the camera to record. In a blog post earlier today, Ring founder Jamie Siminoff described how the push notifications Ring users receive on their smartphones when motion is detected will be enhanced with text descriptions of what that motion was. "This new generative AI feature," Siminoff said, "helps you quickly distinguish between urgent and everyday activity with a quick glance at your phone." Ring will use generative AI to deliver descriptions of the events its security cameras and video doorbells capture on video.
The AI Hype Index: AI-powered toys are coming
That's why we've created the AI Hype Index, a simple, at-a-glance summary of everything you need to know about the state of the industry. AI agents might be the toast of the AI industry, but they're still not that reliable. That's why Yoshua Bengio, one of the world's leading AI experts, is creating his own nonprofit dedicated to guarding against deceptive agents. Not only can they mislead you, but new research suggests that the weaker the AI model powering an agent is, the less likely it is to negotiate you a good deal online. Elsewhere, OpenAI has inked a deal with toymaker Mattel to develop "age-appropriate" AI-infused products.
Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders
Bohacek, Matyas, Fel, Thomas, Agrawala, Maneesh, Lubana, Ekdeep Singh
Despite their impressive performance, generative image models trained on large-scale datasets frequently fail to produce images with seemingly simple concepts -- e.g., human hands or objects appearing in groups of four -- that are reasonably expected to appear in the training data. These failure modes have largely been documented anecdotally, leaving open the question of whether they reflect idiosyncratic anomalies or more structural limitations of these models. To address this, we introduce a systematic approach for identifying and characterizing "conceptual blindspots" -- concepts present in the training data but absent or misrepresented in a model's generations. Our method leverages sparse autoencoders (SAEs) to extract interpretable concept embeddings, enabling a quantitative comparison of concept prevalence between real and generated images. We train an archetypal SAE (RA-SAE) on DINOv2 features with 32,000 concepts -- the largest such SAE to date -- enabling fine-grained analysis of conceptual disparities. Applied to four popular generative models (Stable Diffusion 1.5/2.1, PixArt, and Kandinsky), our approach reveals specific suppressed blindspots (e.g., bird feeders, DVD discs, and whitespaces on documents) and exaggerated blindspots (e.g., wood background texture and palm trees). At the individual datapoint level, we further isolate memorization artifacts -- instances where models reproduce highly specific visual templates seen during training. Overall, we propose a theoretically grounded framework for systematically identifying conceptual blindspots in generative models by assessing their conceptual fidelity with respect to the underlying data-generating process.
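The core comparison, concept prevalence in real versus generated images, can be sketched in a few lines. This is a simplified illustration under stated assumptions: SAE concept activations (for example from an RA-SAE on DINOv2 features) are assumed to be precomputed as one row per image, a concept counts as "present" when its activation exceeds a threshold, and a plain prevalence ratio with a hypothetical cutoff of 4x flags suppressed or exaggerated concepts. The paper's actual statistic may differ; the names here are hypothetical.

```python
from typing import Dict, List, Sequence


def prevalence(codes: Sequence[Sequence[float]], threshold: float = 0.0) -> List[float]:
    """Fraction of images in which each concept's activation exceeds `threshold`."""
    if not codes:
        raise ValueError("need at least one image")
    counts = [0] * len(codes[0])
    for row in codes:  # one row of SAE activations per image
        for j, activation in enumerate(row):
            if activation > threshold:
                counts[j] += 1
    return [c / len(codes) for c in counts]


def blindspots(real_codes: Sequence[Sequence[float]],
               gen_codes: Sequence[Sequence[float]],
               ratio: float = 4.0, eps: float = 1e-6) -> Dict[str, List[int]]:
    """Flag concepts whose generated/real prevalence ratio is far from 1.

    suppressed: much rarer in generations than in real data.
    exaggerated: much more common in generations than in real data.
    `eps` avoids division by zero for concepts absent from one side.
    """
    p_real = prevalence(real_codes)
    p_gen = prevalence(gen_codes)
    suppressed, exaggerated = [], []
    for j, (pr, pg) in enumerate(zip(p_real, p_gen)):
        r = (pg + eps) / (pr + eps)
        if r <= 1.0 / ratio:
            suppressed.append(j)
        elif r >= ratio:
            exaggerated.append(j)
    return {"suppressed": suppressed, "exaggerated": exaggerated}


# Hypothetical activations: 4 images x 3 concepts for each side.
real = [[1.0, 0.0, 0.0], [0.5, 0.0, 0.2], [0.3, 0.0, 0.0], [0.8, 0.0, 0.0]]
gen = [[0.0, 0.9, 0.0], [0.0, 0.7, 0.3], [0.0, 0.4, 0.0], [0.0, 0.2, 0.0]]
print(blindspots(real, gen))  # {'suppressed': [0], 'exaggerated': [1]}
```

Concept 0 appears in every real image but no generated one, concept 1 the reverse, and concept 2 has equal prevalence on both sides, so only the first two are flagged.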
Towards an Introspective Dynamic Model of Globally Distributed Computing Infrastructures
Kilic, Ozgur O., Park, David K., Ren, Yihui, Korchuganova, Tatiana, Vatsavai, Sairam Sri, Boudreau, Joseph, Chowdhury, Tasnuva, Feng, Shengyu, Khan, Raees, Kim, Jaehyung, Klasky, Scott, Maeno, Tadashi, Nilsson, Paul, Outschoorn, Verena Ingrid Martinez, Podhorszki, Norbert, Suter, Frédéric, Yang, Wei, Yang, Yiming, Yoo, Shinjae, Klimentov, Alexei, Hoisie, Adolfy
Large-scale scientific collaborations like ATLAS, Belle II, CMS, DUNE, and others involve hundreds of research institutes and thousands of researchers spread across the globe. These experiments generate petabytes of data, with volumes soon expected to reach exabytes. Consequently, there is a growing need for computation, including structured data processing from raw data to consumer-ready derived data, extensive Monte Carlo simulation campaigns, and a wide range of end-user analysis. To manage these computational and storage demands, centralized workflow and data management systems are used. However, decisions regarding data placement and payload allocation are often made disjointly and via heuristic means. A significant obstacle to adopting more effective heuristic or AI-driven solutions is the absence of a quick and reliable introspective dynamic model to evaluate and refine alternative approaches. In this study, we aim to develop such an interactive system using real-world data. By examining job execution records from the PanDA workflow management system, we have pinpointed key performance indicators such as queuing time, error rate, and the extent of remote data access. The dataset includes five months of activity. Additionally, we are creating a generative AI model to simulate time series of payloads, which incorporate visible features like category, event count, and submitting group, as well as hidden features like the total computational load, derived from existing PanDA records and computing site capabilities. These hidden features, which are not visible to job allocators, whether heuristic or AI-driven, influence factors such as queuing times and data movement.
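The three key performance indicators named above (queuing time, error rate, and the extent of remote data access) can be aggregated per computing site from job execution records. The following is a minimal sketch under explicit assumptions: the `JobRecord` fields are hypothetical stand-ins, not PanDA's actual job schema, and "remote fraction" is taken here to mean the share of bytes read from remote storage.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class JobRecord:
    # Hypothetical fields for illustration; real PanDA records use their own schema.
    site: str
    submit_time: float   # seconds since epoch when the job was queued
    start_time: float    # seconds since epoch when the job started running
    failed: bool
    bytes_local: int     # input bytes read from storage at the site
    bytes_remote: int    # input bytes read from other sites


def site_kpis(jobs: Sequence[JobRecord]) -> Dict[str, Dict[str, float]]:
    """Mean queuing time, error rate, and remote-read fraction per site."""
    by_site: Dict[str, List[JobRecord]] = {}
    for job in jobs:
        by_site.setdefault(job.site, []).append(job)
    kpis: Dict[str, Dict[str, float]] = {}
    for site, site_jobs in by_site.items():
        n = len(site_jobs)
        mean_queue = sum(j.start_time - j.submit_time for j in site_jobs) / n
        error_rate = sum(1 for j in site_jobs if j.failed) / n
        total = sum(j.bytes_local + j.bytes_remote for j in site_jobs)
        remote = sum(j.bytes_remote for j in site_jobs) / total if total else 0.0
        kpis[site] = {"mean_queue_s": mean_queue,
                      "error_rate": error_rate,
                      "remote_fraction": remote}
    return kpis


# Hypothetical records for two sites.
jobs = [
    JobRecord("A", 0.0, 60.0, False, 300, 100),
    JobRecord("A", 10.0, 130.0, True, 0, 100),
    JobRecord("B", 0.0, 30.0, False, 50, 0),
]
print(site_kpis(jobs)["A"])  # {'mean_queue_s': 90.0, 'error_rate': 0.5, 'remote_fraction': 0.4}
```

Only the indicator extraction is sketched here; the generative model of payload time series, including the hidden computational-load features, is not.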
What do professional software developers need to know to succeed in an age of Artificial Intelligence?
Kam, Matthew, Miller, Cody, Wang, Miaoxin, Tidwell, Abey, Lee, Irene A., Malyn-Smith, Joyce, Perez, Beatriz, Tiwari, Vikram, Kenitzer, Joshua, Macvean, Andrew, Barrar, Erin
Generative AI is showing early evidence of productivity gains for software developers, but concerns persist regarding workforce disruption and deskilling. We describe our research with 21 developers at the cutting edge of using AI, summarizing 12 of their work goals we uncovered, together with 75 associated tasks and the skills & knowledge for each, illustrating how developers use AI at work. From all of these, we distilled our findings in the form of 5 insights. We found that the skills & knowledge needed to be a successful AI-enhanced developer are organized into four domains (using Generative AI effectively, core software engineering, adjacent engineering, and adjacent non-engineering) deployed at critical junctures throughout a 6-step task workflow. In order to "future proof" developers for this age of AI, on-the-job learning initiatives and computer science degree programs will need to target both "soft" skills and the technical skills & knowledge in all four domains to reskill, upskill and safeguard against deskilling.
Japan aims to regulate social media monetization in disasters
An internal affairs ministry working group on Monday unveiled a draft interim report underlining the need for the government to consider a legal system aimed at regulating social media monetization during natural disasters. The report calls on social media service providers to introduce voluntary regulations designed to suspend monetization in the event of a disaster, to curb the spread of disinformation. The working group plans to ask industry groups to draw up a code of conduct by the end of the year to prevent the spread of disinformation in such situations. It also urged businesses to take measures such as attaching labels to images created by generative artificial intelligence. Social media posts are rewarded based on the number of viewers.
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices
Gu, Yile, Kadekodi, Rohan, Nguyen, Hoang, Kamahori, Keisuke, Liu, Yiyu, Kasikci, Baris
The recent shift in Generative AI (GenAI) applications from cloud-only environments to end-user devices introduces new challenges in resource management, system efficiency, and user experience. This paper presents ConsumerBench, a comprehensive benchmarking framework designed to evaluate the system efficiency and response time of GenAI models running on end-user devices. Unlike existing benchmarks that assume exclusive model access on dedicated GPUs, ConsumerBench simulates realistic multi-application scenarios executing concurrently on constrained hardware. Furthermore, ConsumerBench supports customizable workflows that simulate complex tasks requiring coordination among multiple applications. ConsumerBench captures both application-level metrics, including latency and Service Level Objective (SLO) attainment, and system-level metrics like CPU/GPU utilization and memory bandwidth. Through extensive experiments, ConsumerBench reveals inefficiencies in resource sharing, unfair scheduling under greedy allocation, and performance pitfalls of static model server configurations. The paper also provides practical insights for model developers and system designers, highlighting the benefits of custom kernels tailored to consumer-grade GPU architectures and the value of implementing SLO-aware scheduling strategies.
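Two of the application-level metrics mentioned above, latency and SLO attainment, can be illustrated simply. This sketch assumes SLO attainment means the fraction of requests that complete within a fixed latency target, and it uses the nearest-rank definition for latency percentiles; ConsumerBench's own definitions may differ, and the function names and numbers are hypothetical.

```python
import math
from typing import Sequence


def slo_attainment(latencies_ms: Sequence[float], slo_ms: float) -> float:
    """Fraction of requests whose latency is within the SLO target."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    return sum(1 for x in latencies_ms if x <= slo_ms) / len(latencies_ms)


def percentile(latencies_ms: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(latencies_ms)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical per-request latencies for one application sharing a GPU.
latencies = [120.0, 80.0, 300.0, 95.0, 110.0]
print(slo_attainment(latencies, slo_ms=150.0))  # 0.8
print(percentile(latencies, 50))                # 110.0
```

Reporting attainment alongside a tail percentile is useful under contention: a single slow outlier (300 ms here) barely moves the median but is exactly what a concurrent application competing for the same GPU would experience as an SLO violation.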