hype
HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models
Generative models often use human evaluations to measure the perceived quality of their outputs. Automated metrics are noisy indirect proxies, because they rely on heuristics or pretrained embeddings. However, up until now, direct human evaluation strategies have been ad-hoc, neither standardized nor validated. Our work establishes a gold standard human benchmark for generative realism. We construct Human eYe Perceptual Evaluation (HYPE), a human benchmark that is (1) grounded in psychophysics research in perception, (2) reliable across different sets of randomly sampled outputs from a model, (3) able to produce separable model performances, and (4) efficient in cost and time. We introduce two variants: one that measures visual perception under adaptive time constraints to determine the threshold at which a model's outputs appear real (e.g.
Can AI really help us discover new materials?
Judging from headlines and social media posts in recent years, one might reasonably assume that AI is going to fix the power grid, cure the world's diseases, and finish my holiday shopping for me. This week, we published a new package called Hype Correction. The collection of stories takes a look at how the world is starting to reckon with the reality of what AI can do, and what's just fluff. One of my favorite stories in that package comes from my colleague David Rotman, who took a hard look at AI for materials research. AI could transform the process of discovering new materials, innovation that could be especially useful in the world of climate tech, which needs new batteries, semiconductors, magnets, and more.
- Energy (0.50)
- Retail (0.35)
- Health & Medicine (0.35)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada (0.04)
- Research Report > Experimental Study (0.48)
- Research Report > New Finding (0.47)
HYPE: Hybrid Planning with Ego Proposal-Conditioned Predictions
Yu, Hang, Jordan, Julian, Schmidt, Julian, Lindner, Silvan, Canevaro, Alessandro, Stork, Wilhelm
Safe and interpretable motion planning in complex urban environments needs to reason about bidirectional multi-agent interactions. This reasoning requires estimating the costs of potential ego driving maneuvers. Many existing planners generate initial trajectories with sampling-based methods and refine them by optimizing on learned predictions of future environment states, which requires a cost function that encodes the desired vehicle behavior. Designing such a cost function can be very challenging, especially if a wide range of complex urban scenarios has to be considered. We propose HYPE: HYbrid Planning with Ego proposal-conditioned predictions, a planner that integrates multimodal trajectory proposals from a learned proposal model as heuristic priors into a Monte Carlo Tree Search (MCTS) refinement. To model bidirectional interactions, we introduce an ego-conditioned occupancy prediction model, enabling consistent, scene-aware reasoning. Our design significantly simplifies cost function design in refinement by considering proposal-driven guidance, requiring only minimalistic grid-based cost terms. Evaluations on the large-scale real-world benchmarks nuPlan and DeepUrban show that HYPE achieves state-of-the-art performance, especially in safety and adaptability.
- Asia > Middle East > Jordan (0.40)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada (0.04)
- Research Report > Experimental Study (0.48)
- Research Report > New Finding (0.47)
Hype or not? Formalizing Automatic Promotional Language Detection in Biomedical Research
Batalo, Bojan, Shimomoto, Erica K., Millar, Neil
In science, promotional language ('hype') is increasing and can undermine objective evaluation of evidence, impede research development, and erode trust in science. In this paper, we introduce the task of automatic detection of hype, which we define as hyperbolic or subjective language that authors use to glamorize, promote, embellish, or exaggerate aspects of their research. We propose formalized guidelines for identifying hype language and apply them to annotate a portion of the National Institutes of Health (NIH) grant application corpus. We then evaluate traditional text classifiers and language models on this task, comparing their performance with a human baseline. Our experiments show that formalizing annotation guidelines can help humans reliably annotate candidate hype adjectives and that using our annotated dataset to train machine learning models yields promising results. Our findings highlight the linguistic complexity of the task, and the potential need for domain knowledge and temporal awareness of the facts. While some linguistic works address hype detection, to the best of our knowledge, we are the first to approach it as a natural language processing task.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.66)
- Media > News (0.68)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
- Health & Medicine > Therapeutic Area > Immunology (0.46)
- Government > Regional Government > North America Government > United States Government (0.34)
Vivaldi rejects AI browsing: 'Humans over hype'
If you're concerned that your favorite browser may be subsumed by the growing wave of AI, Vivaldi would like you to know they plan to resist. Vivaldi, the small Norwegian-made browser which I use as an alternative to more mainstream browsers like Microsoft Edge and Google Chrome, said it plans to "choose humans over hype," in the words of Jon von Tetzchner, the company's chief executive. "We're taking a stand, choosing humans over hype, and we will not turn the joy of exploring into inactive spectatorship," von Tetzchner said in a statement shared by the company. "Without exploration, the web becomes far less interesting. Our curiosity loses oxygen and the diversity of the web dies."
AI for the ancient world: how a new machine learning system can help make sense of Latin inscriptions
A fragment of a bronze military diploma from Sardinia, issued by the emperor Trajan to a sailor on a warship, as restored by Aeneas. If you believe the hype, generative artificial intelligence (AI) is the future. However, new research suggests the technology may also improve our understanding of the past. A team of computer scientists from Google DeepMind, working with classicists and archaeologists from universities in the United Kingdom and Greece, described a new machine-learning system designed to help experts to understand ancient Latin inscriptions. Named Aeneas (after the mythical hero of Rome's foundation epic), the system is a generative neural network designed to provide context for Latin inscriptions written between the 7th century BCE and the 8th century CE.
Anchoring AI Capabilities in Market Valuations: The Capability Realization Rate Model and Valuation Misalignment Risk
Fang, Xinmin, Tao, Lingfeng, Li, Zhengxiong
Recent breakthroughs in artificial intelligence (AI) have triggered surges in market valuations for AI-related companies, often outpacing the realization of underlying capabilities. We examine the anchoring effect of AI capabilities on equity valuations and propose a Capability Realization Rate (CRR) model to quantify the gap between AI potential and realized performance. Using data from the 2023--2025 generative AI boom, we analyze sector-level sensitivity and conduct case studies (OpenAI, Adobe, NVIDIA, Meta, Microsoft, Goldman Sachs) to illustrate patterns of valuation premium and misalignment. Our findings indicate that AI-native firms commanded outsized valuation premiums anchored to future potential, while traditional companies integrating AI experienced re-ratings subject to proof of tangible returns. We argue that CRR can help identify valuation misalignment risk, where market prices diverge from realized AI-driven value. We conclude with policy recommendations to improve transparency, mitigate speculative bubbles, and align AI innovation with sustainable market value.
- North America > United States > Colorado > Denver County > Denver (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Asia > China (0.04)
- Financial News (0.93)
- Research Report (0.70)
- Information Technology > Services (1.00)
- Banking & Finance > Trading (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.59)
Don't let hype about AI agents get ahead of reality
Let's start with the term "agent" itself. Right now, it's being slapped on everything from simple scripts to sophisticated AI workflows. There's no shared definition, which leaves plenty of room for companies to market basic automation as something much more advanced. That kind of "agentwashing" doesn't just confuse customers; it invites disappointment. We don't necessarily need a rigid standard, but we do need clearer expectations about what these systems are supposed to do, how autonomously they operate, and how reliably they perform. And reliability is the next big challenge.