AITopics | prometheus

prometheus

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Jeff Bezos' New AI Venture Quietly Acquired an Agentic Computing Startup

WIRED | Nov-26-2025, 20:02:13 GMT

Project Prometheus has raised over $6 billion in funding and hired over 100 employees, a handful of whom joined through its acquisition of General Agents, according to records and sources. In early June, tech entrepreneur Vik Bajaj took over Saison, a two-Michelin-star restaurant in San Francisco, for an off-the-record dinner to talk about AI with journalists and a handful of scientists. In attendance was Sherjil Ozair, a late addition who had previously held senior research roles at DeepMind and Tesla. The following day, Bajaj and Ozair were on their way to making a deal, public records show. Bajaj didn't mention it at the dinner, but earlier this year he had begun working with Amazon executive chairman Jeff Bezos on a new AI venture called Project Prometheus.

large language model, machine learning, natural language, (18 more...)

WIRED

Country:

North America > United States > California > San Francisco County > San Francisco (0.26)
Asia > Nepal (0.15)
Asia > Myanmar (0.05)
(6 more...)

Genre: Financial News (0.85)

Industry:

Retail > Online (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (0.74)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Netflix's Frankenstein Departs From the Book in a Major Way

Slate | Nov-10-2025, 21:54:08 GMT

Netflix's Frankenstein offers a different spin on one of literature's all-time assholes.

artificial intelligence, science fiction, shelley, (15 more...)

Slate

Country: Europe (0.15)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Information Technology > Services (0.73)

Technology:

Information Technology > Communications (0.72)
Information Technology > Artificial Intelligence > Science Fiction (0.52)

Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories

Majgaonkar, Oorja, Fei, Zhiwei, Li, Xiang, Sarro, Federica, Ye, He

arXiv.org Artificial Intelligence | Nov-4-2025

The increasing deployment of Large Language Model (LLM) agents for complex software engineering tasks has created a need to understand their problem-solving behaviours beyond simple success metrics. While these agents demonstrate impressive capabilities in automated issue resolution, their decision-making processes remain largely opaque. This paper presents an empirical study of agent trajectories, namely the execution traces capturing the steps agents take when attempting to resolve software issues. We analyse trajectories from three state-of-the-art code agents (OpenHands, SWE-agent, and Prometheus) on the SWE-Bench benchmark, examining both successful and failed attempts. Our investigation reveals several key insights into agent behaviour. First, we identify how distinct problem-solving strategies, such as defensive programming and context gathering, enable success in different scenarios. Second, we find that failed trajectories are consistently longer and exhibit higher variance than successful ones, with failure patterns differing significantly between agents. Third, our fault localisation analysis shows that while most trajectories correctly identify problematic files (72-81% even in failures), success depends more on achieving approximate rather than exact code modifications. These and other findings from our study provide a foundation for understanding agent behaviour through trajectory analysis, contributing to the development of more robust and interpretable autonomous software engineering systems.

large language model, natural language, trajectory, (15 more...)

arXiv.org Artificial Intelligence

2511.00197

Country:

North America > United States (0.94)
South America > Brazil > Rio de Janeiro (0.16)

Genre:

Research Report > Experimental Study (0.66)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)

Terra: Explorable Native 3D World Model with Point Latents

Huang, Yuanhui, Chen, Weiliang, Zheng, Wenzhao, Tao, Xin, Wan, Pengfei, Zhou, Jie, Lu, Jiwen

arXiv.org Artificial Intelligence | Oct-17-2025

World models have garnered increasing attention for comprehensive modeling of the real world. However, most existing methods still rely on pixel-aligned representations as the basis for world evolution, neglecting the inherent 3D nature of the physical world. This could undermine the 3D consistency and diminish the modeling efficiency of world models. In this paper, we present Terra, a native 3D world model that represents and generates explorable environments in an intrinsic 3D latent space. Specifically, we propose a novel point-to-Gaussian variational autoencoder (P2G-VAE) that encodes 3D inputs into a latent point representation, which is subsequently decoded as 3D Gaussian primitives to jointly model geometry and appearance. We then introduce a sparse point flow matching network (SPFlow) for generating the latent point representation, which simultaneously denoises the positions and features of the point latents. Terra enables exact multi-view consistency through its native 3D representation and architecture, and supports flexible rendering from any viewpoint with only a single generation process. Furthermore, Terra achieves explorable world modeling through progressive generation in the point latent space. We conduct extensive experiments on challenging indoor scenes from ScanNet v2. Terra achieves state-of-the-art performance in both reconstruction and generation with high 3D consistency.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.14977

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Prometheus: Universal, Open-Source Mocap-Based Teleoperation System with Force Feedback for Dataset Collection in Robot Learning

Satsevich, S., Bazhenov, A., Egorov, S., Erkhov, A., Gromakov, M., Fedoseev, A., Tsetserukou, D.

arXiv.org Artificial Intelligence | Oct-2-2025

This paper presents a novel teleoperation system with force feedback, utilizing consumer-grade HTC Vive Trackers 2.0. The system integrates a custom-built controller, a UR3 robotic arm, and a Robotiq gripper equipped with custom-designed fingers to ensure uniform pressure distribution on an embedded force sensor. Real-time compression force data is transmitted to the controller, enabling operators to perceive the gripping force applied to objects. Experimental results demonstrate that the system enhances task success rates and provides a low-cost solution for large-scale imitation learning data collection.

artificial intelligence, arxiv preprint arxiv, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2510.01023

Country:

Europe (0.68)
North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Hardware (0.35)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Reference-Free Rating of LLM Responses via Latent Information

Girrbach, Leander, Su, Chi-Ping, Saanum, Tankred, Socher, Richard, Schulz, Eric, Akata, Zeynep

arXiv.org Artificial Intelligence | Sep-30-2025

How reliable are single-response LLM-as-a-judge ratings without references, and can we obtain fine-grained, deterministic scores in this setting? We study the common practice of asking a judge model to assign Likert-scale scores to free-text responses and show two systematic issues: scores are unstable under sampling and poorly calibrated, leading to compression near the top of the scale and frequent ties. We then propose and evaluate Latent Judges, which derive scalar ratings from internal model signals: (i) probability-weighted scores over integer ratings, (ii) verifier-style probabilities of "yes", and (iii) linear probes trained on model activations at the rating position. Across a broad suite of pairwise and single-rating benchmarks, latent methods match or surpass standard prompting, with consistent gains on pairwise accuracy and listwise ranking relevant to Best-of-N selection. Probability-weighted scores achieve the strongest single-rating correlations, while probes recover useful signals when output logits are miscalibrated. These results indicate that latent information provides deterministic and more discriminative signals for reference-free evaluation, and can improve selection and training approaches like Best-of-N, multi-teacher distillation, and routing.
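
As a rough illustration of item (i) in the abstract above, a probability-weighted score can be read as the expected rating under the judge's distribution over the integer rating tokens. The sketch below is an assumption-laden reading of that idea, not the paper's implementation: the function name `probability_weighted_score` and the logit values are hypothetical, and a real judge would supply the logits at the rating position.

```python
import math


def probability_weighted_score(rating_logits):
    """Return the expected rating under a softmax over per-rating logits.

    rating_logits maps each integer rating (e.g. 1..5) to the judge model's
    logit for that rating token. The values used below are made up.
    """
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(rating_logits.values())
    weights = {r: math.exp(l - max_logit) for r, l in rating_logits.items()}
    total = sum(weights.values())
    # Expected value: sum over ratings of rating * probability.
    return sum(r * w / total for r, w in weights.items())


# Hypothetical logits for a 5-point Likert scale.
example_logits = {1: -2.0, 2: -1.0, 3: 0.5, 4: 2.0, 5: 1.5}
score = probability_weighted_score(example_logits)
print(round(score, 3))
```

Unlike a sampled integer rating, this value is deterministic for fixed logits and can separate two responses that would both be sampled as a "4", which matches the abstract's point about ties and compression near the top of the scale.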

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.24678

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Silicon Valley trades researchers like football teams poach players

The Guardian | Jul-22-2025, 13:13:43 GMT

The tech industry is in a high-flying war over who can dole out more millions to attract artificial intelligence specialists. Individual researchers, most equipped with PhDs in computer science, are commanding giant salaries and mammoth signing bonuses in hiring negotiations. You might call them talent. The Washington Post called them Olympians in a recent headline: "Why AI superathletes could be winning $100 million bonuses in Silicon Valley." These are the most sought-after employees in the world.

artificial intelligence, meta, social media, (13 more...)

The Guardian

Country:

Europe > United Kingdom (0.05)
North America > United States > Pennsylvania (0.05)
North America > United States > Hawaii (0.05)
(4 more...)

Industry:

Information Technology (1.00)
Leisure & Entertainment > Sports > Soccer (0.40)
Leisure & Entertainment > Sports > Football (0.40)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.73)

Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best?

Sardana, Ashish

arXiv.org Artificial Intelligence | Apr-7-2025

This article surveys evaluation models that automatically detect hallucinations in Retrieval-Augmented Generation (RAG), and presents a comprehensive benchmark of their performance across six RAG applications. The methods in our study are LLM-as-a-Judge, Prometheus, Lynx, the Hughes Hallucination Evaluation Model (HHEM), and the Trustworthy Language Model (TLM). These approaches are all reference-free, requiring no ground-truth answers/labels to catch incorrect LLM responses. Our study reveals that, across diverse RAG applications, some of these approaches consistently detect incorrect RAG responses with high precision/recall.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.21157

Country: North America > United States > California > Santa Clara County > San Jose (0.04)

Genre: Research Report > New Finding (0.49)

Industry:

Health & Medicine (0.47)
Food & Agriculture > Agriculture (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Read an extract from Michel Nieva's science fiction novel Dengue Boy

New Scientist | Feb-28-2025, 10:13:09 GMT

Michel Nieva's Dengue Boy is set on a drowned future Earth. Spread-eagle on that strange white surface which lay beneath the inclement Antarctic sun, Dengue Destroyed saw everything flash by in no more than a second. What of life is there to look back on in the space of a few instants when a boy, a girl, a destroyed void, believes it is about to die? Might it think of its dear mother, lament the father it never knew, or perhaps recall some humorous or traumatic anecdote involving its classmates? Truthfully, not much else had happened during her brief time on Earth. However (for the mind works in mysterious and unpredictable ways, especially the mind of a mutant mosquito), Dengue Destroyed did not think about any of these people, but rather about a story her mother used to read her at bedtime, the story of Snow White and the Seven Dwarfs.

artificial intelligence, science fiction, snow, (14 more...)

New Scientist

Country:

North America > United States > New York (0.05)
South America > Argentina > Pampas (0.05)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology: Information Technology > Artificial Intelligence > Science Fiction (0.41)

Enabling Autonomic Microservice Management through Self-Learning Agents

Yu, Fenglin, Yang, Fangkai, Qin, Xiaoting, Zhang, Zhiyang, Zhang, Jue, Lin, Qingwei, Zhang, Hongyu, Dang, Yingnong, Rajmohan, Saravan, Zhang, Dongmei, Zhang, Qi

arXiv.org Artificial Intelligence | Jan-31-2025

The increasing complexity of modern software systems necessitates robust autonomic self-management capabilities. While Large Language Models (LLMs) demonstrate potential in this domain, they often face challenges in adapting their general knowledge to specific service contexts. To address this limitation, we propose ServiceOdyssey, a self-learning agent system that autonomously manages microservices without requiring prior knowledge of service-specific configurations. By leveraging curriculum learning principles and iterative exploration, ServiceOdyssey progressively develops a deep understanding of operational environments, reducing dependence on human input or static documentation. A prototype built with the Sock Shop microservice demonstrates the potential of this approach for autonomic microservice management.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.19056

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
South America > Brazil (0.04)
(8 more...)

Genre:

Workflow (0.98)
Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
