prometheus
Jeff Bezos' New AI Venture Quietly Acquired an Agentic Computing Startup
Project Prometheus has raised over $6 billion in funding and hired over 100 employees, a handful of whom joined through its acquisition of General Agents, according to records and sources. In early June, tech entrepreneur Vik Bajaj took over Saison, a two-Michelin-star restaurant in San Francisco, for an off-the-record dinner to talk about AI with journalists and a handful of scientists. In attendance was Sherjil Ozair, a late addition who had previously held senior research roles at DeepMind and Tesla. The following day, Bajaj and Ozair were on their way to making a deal, public records show. Bajaj didn't mention it at the dinner, but earlier this year he had begun working with Amazon executive chairman Jeff Bezos on a new AI venture called Project Prometheus.
- North America > United States > California > San Francisco County > San Francisco (0.26)
- Asia > Nepal (0.15)
- Asia > Myanmar (0.05)
- Retail > Online (1.00)
- Information Technology > Services (1.00)
Netflix's Frankenstein Departs From the Book in a Major Way
Netflix's Frankenstein offers a different spin on one of literature's all-time assholes.
- Europe > Germany > Bavaria > Upper Bavaria > Ingolstadt (0.05)
- Europe > Switzerland (0.05)
- Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.05)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Information Technology > Services (0.73)
Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories
Majgaonkar, Oorja, Fei, Zhiwei, Li, Xiang, Sarro, Federica, Ye, He
The increasing deployment of Large Language Model (LLM) agents for complex software engineering tasks has created a need to understand their problem-solving behaviours beyond simple success metrics. While these agents demonstrate impressive capabilities in automated issue resolution, their decision-making processes remain largely opaque. This paper presents an empirical study of agent trajectories, namely the execution traces capturing the steps agents take when attempting to resolve software issues. We analyse trajectories from three state-of-the-art code agents (OpenHands, SWE-agent, and Prometheus) on the SWE-Bench benchmark, examining both successful and failed attempts. Our investigation reveals several key insights into agent behaviour. First, we identify how distinct problem-solving strategies, such as defensive programming and context gathering, enable success in different scenarios. Second, we find that failed trajectories are consistently longer and exhibit higher variance than successful ones, with failure patterns differing significantly between agents. Third, our fault localisation analysis shows that while most trajectories correctly identify problematic files (72-81% even in failures), success depends more on achieving approximate rather than exact code modifications. These and other findings from our study provide a foundation for understanding agent behaviour through trajectory analysis, contributing to the development of more robust and interpretable autonomous software engineering systems.
- Europe > United Kingdom > England > Greater London > London (0.41)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Research Report > Experimental Study (0.66)
- Research Report > New Finding (0.46)
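The abstract's second finding (failed trajectories are longer and more variable than successful ones) suggests a simple length comparison. The following is a minimal sketch of such a comparison, not the authors' analysis code: the record layout (`resolved`, `steps`) and the toy data are assumptions made purely for illustration and do not reproduce any result from the paper.

```python
from statistics import mean, pvariance


def length_stats(trajectories):
    """Group trajectories by outcome and summarise their step counts.

    Each trajectory is a dict with two hypothetical fields:
    "resolved" (bool) and "steps" (list of agent actions).
    """
    lengths = {"success": [], "failure": []}
    for traj in trajectories:
        outcome = "success" if traj["resolved"] else "failure"
        lengths[outcome].append(len(traj["steps"]))
    # Only report outcomes that actually occur in the input.
    return {
        outcome: {"mean": mean(values), "variance": pvariance(values)}
        for outcome, values in lengths.items()
        if values
    }


# Made-up toy trajectories, not data from the study.
toy = [
    {"resolved": True, "steps": ["search", "edit", "test"]},
    {"resolved": True, "steps": ["search", "read", "edit", "test"]},
    {"resolved": False, "steps": ["search"] * 6},
    {"resolved": False, "steps": ["search"] * 10},
]
stats = length_stats(toy)
print(stats)
```

Population variance (`pvariance`) is used here so that a group with a single trajectory still yields a value; a real analysis might prefer the sample variance or a robust spread measure.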
Terra: Explorable Native 3D World Model with Point Latents
Huang, Yuanhui, Chen, Weiliang, Zheng, Wenzhao, Tao, Xin, Wan, Pengfei, Zhou, Jie, Lu, Jiwen
World models have garnered increasing attention for comprehensive modeling of the real world. However, most existing methods still rely on pixel-aligned representations as the basis for world evolution, neglecting the inherent 3D nature of the physical world. This could undermine the 3D consistency and diminish the modeling efficiency of world models. In this paper, we present Terra, a native 3D world model that represents and generates explorable environments in an intrinsic 3D latent space. Specifically, we propose a novel point-to-Gaussian variational autoencoder (P2G-VAE) that encodes 3D inputs into a latent point representation, which is subsequently decoded as 3D Gaussian primitives to jointly model geometry and appearance. We then introduce a sparse point flow matching network (SPFlow) for generating the latent point representation, which simultaneously denoises the positions and features of the point latents. Terra enables exact multi-view consistency through its native 3D representation and architecture, and supports flexible rendering from any viewpoint with only a single generation process. Furthermore, Terra achieves explorable world modeling through progressive generation in the point latent space. We conduct extensive experiments on the challenging indoor scenes from ScanNet v2. Terra achieves state-of-the-art performance in both reconstruction and generation with high 3D consistency.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
Prometheus: Universal, Open-Source Mocap-Based Teleoperation System with Force Feedback for Dataset Collection in Robot Learning
Satsevich, S., Bazhenov, A., Egorov, S., Erkhov, A., Gromakov, M., Fedoseev, A., Tsetserukou, D.
This paper presents a novel teleoperation system with force feedback, utilizing consumer-grade HTC Vive Trackers 2.0. The system integrates a custom-built controller, a UR3 robotic arm, and a Robotiq gripper equipped with custom-designed fingers to ensure uniform pressure distribution on an embedded force sensor. Real-time compression force data is transmitted to the controller, enabling operators to perceive the gripping force applied to objects. Experimental results demonstrate that the system enhances task success rates and provides a low-cost solution for large-scale imitation learning data collection.
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- Europe > Spain > Aragón (0.04)
Reference-Free Rating of LLM Responses via Latent Information
Girrbach, Leander, Su, Chi-Ping, Saanum, Tankred, Socher, Richard, Schulz, Eric, Akata, Zeynep
How reliable are single-response LLM-as-a-judge ratings without references, and can we obtain fine-grained, deterministic scores in this setting? We study the common practice of asking a judge model to assign Likert-scale scores to free-text responses and show two systematic issues: scores are unstable under sampling and poorly calibrated, leading to compression near the top of the scale and frequent ties. We then propose and evaluate Latent Judges, which derive scalar ratings from internal model signals: (i) probability-weighted scores over integer ratings, (ii) verifier-style probabilities of "yes", and (iii) linear probes trained on model activations at the rating position. Across a broad suite of pairwise and single-rating benchmarks, latent methods match or surpass standard prompting, with consistent gains on pairwise accuracy and listwise ranking relevant to Best-of-N selection. Probability-weighted scores achieve the strongest single-rating correlations, while probes recover useful signals when output logits are miscalibrated. These results indicate that latent information provides deterministic and more discriminative signals for reference-free evaluation, and can improve selection and training approaches like Best-of-N, multi-teacher distillation, and routing.
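To make the first latent signal concrete, a probability-weighted score can be read as the expected rating under the judge's distribution over the integer rating tokens. The sketch below assumes the judge's logits for the tokens "1" through "5" are already available as a plain dict; the function name and the logit values are hypothetical and are not taken from the paper.

```python
import math


def probability_weighted_score(rating_logits):
    """Return the expected rating under a softmax over the rating logits.

    rating_logits maps each integer rating (e.g. 1..5) to the judge
    model's logit for that rating token at the rating position.
    """
    # Subtract the maximum logit for numerical stability.
    max_logit = max(rating_logits.values())
    weights = {r: math.exp(l - max_logit) for r, l in rating_logits.items()}
    total = sum(weights.values())
    return sum(r * w / total for r, w in weights.items())


# Two hypothetical responses whose most likely rating is 4 in both cases.
response_a = {1: -3.0, 2: -2.0, 3: 0.0, 4: 2.0, 5: 1.8}
response_b = {1: -3.0, 2: -1.0, 3: 1.5, 4: 2.0, 5: -1.0}

score_a = probability_weighted_score(response_a)
score_b = probability_weighted_score(response_b)
print(score_a, score_b)
```

Both responses would receive the same discrete rating under greedy decoding, but their expected ratings differ, which is how a continuous score of this kind can break ties. Because it is computed from logits rather than sampled text, the score is also deterministic for a fixed model and prompt.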
Silicon Valley trades researchers like football teams poach players
The tech industry is in a high-flying war over who can dole out more millions to attract artificial intelligence specialists. Individual researchers, most equipped with PhDs in computer science, are commanding giant salaries and mammoth signing bonuses in hiring negotiations. You might call them talent. The Washington Post called them Olympians in a recent headline: "Why AI superathletes could be winning $100 million bonuses in Silicon Valley." These are the most sought-after employees in the world.
- Europe > United Kingdom (0.05)
- North America > United States > Pennsylvania (0.05)
- North America > United States > Hawaii (0.05)
- Information Technology (1.00)
- Leisure & Entertainment > Sports > Soccer (0.40)
- Leisure & Entertainment > Sports > Football (0.40)
Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best?
This article surveys evaluation models that automatically detect hallucinations in Retrieval-Augmented Generation (RAG) and presents a comprehensive benchmark of their performance across six RAG applications. The methods studied are LLM-as-a-Judge, Prometheus, Lynx, the Hughes Hallucination Evaluation Model (HHEM), and the Trustworthy Language Model (TLM). These approaches are all reference-free, requiring no ground-truth answers or labels to catch incorrect LLM responses. Our study reveals that, across diverse RAG applications, some of these approaches consistently detect incorrect RAG responses with high precision and recall.
- Health & Medicine (0.47)
- Food & Agriculture > Agriculture (0.46)
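The precision and recall mentioned in the teaser above can be computed for any detector that emits a per-response score. The sketch below assumes each evaluation model returns a trustworthiness score in [0, 1], where lower means more likely incorrect, and that a response is flagged when its score falls below a threshold; the scores, labels, and threshold are invented for illustration and are not from the benchmark.

```python
def precision_recall(scores, is_incorrect, threshold):
    """Precision and recall of flagging responses as incorrect.

    scores: detector scores, lower = less trustworthy (assumed convention).
    is_incorrect: ground-truth labels, True if the response is wrong.
    threshold: responses scoring below this value are flagged.
    """
    flagged = [s < threshold for s in scores]
    tp = sum(f and y for f, y in zip(flagged, is_incorrect))
    fp = sum(f and not y for f, y in zip(flagged, is_incorrect))
    fn = sum((not f) and y for f, y in zip(flagged, is_incorrect))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented scores and labels for five RAG responses.
scores = [0.9, 0.2, 0.4, 0.8, 0.1]
labels = [False, True, False, False, True]
precision, recall = precision_recall(scores, labels, threshold=0.5)
print(precision, recall)
```

Ground-truth labels are needed only to score the detectors in a benchmark; the detectors themselves remain reference-free at inference time.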
Read an extract from Michel Nieva's science fiction novel Dengue Boy
Michel Nieva's Dengue Boy is set on a drowned future Earth.
Spread-eagle on that strange white surface which lay beneath the inclement Antarctic sun, Dengue Destroyed saw everything flash by in no more than a second. What of life is there to look back on in the space of a few instants when a boy, a girl, a destroyed void, believes it is about to die? Might it think of its dear mother, lament the father it never knew, or perhaps recall some humorous or traumatic anecdote involving its classmates? Truthfully, not much else had happened during her brief time on Earth. However (for the mind works in mysterious and unpredictable ways, especially the mind of a mutant mosquito), Dengue Destroyed did not think about any of these people, but rather about a story her mother used to read her at bedtime, the story of Snow White and the Seven Dwarfs.
- North America > United States > New York (0.05)
- South America > Argentina > Pampas (0.05)
Enabling Autonomic Microservice Management through Self-Learning Agents
Yu, Fenglin, Yang, Fangkai, Qin, Xiaoting, Zhang, Zhiyang, Zhang, Jue, Lin, Qingwei, Zhang, Hongyu, Dang, Yingnong, Rajmohan, Saravan, Zhang, Dongmei, Zhang, Qi
The increasing complexity of modern software systems necessitates robust autonomic self-management capabilities. While Large Language Models (LLMs) demonstrate potential in this domain, they often face challenges in adapting their general knowledge to specific service contexts. To address this limitation, we propose ServiceOdyssey, a self-learning agent system that autonomously manages microservices without requiring prior knowledge of service-specific configurations. By leveraging curriculum learning principles and iterative exploration, ServiceOdyssey progressively develops a deep understanding of operational environments, reducing dependence on human input or static documentation. A prototype built with the Sock Shop microservice demonstrates the potential of this approach for autonomic microservice management.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- South America > Brazil (0.04)
- Workflow (0.98)
- Research Report (0.82)