hamlet
HAMLET: Switch your Vision-Language-Action Model into a History-Aware Policy
Koo, Myungkyu, Choi, Daewon, Kim, Taeyoung, Lee, Kyungmin, Kim, Changyeon, Seo, Younggyo, Shin, Jinwoo
Inherently, robotic manipulation tasks are history-dependent: leveraging past context could be beneficial. However, most existing Vision-Language-Action models (VLAs) have been designed without considering this aspect, i.e., they rely solely on the current observation, ignoring preceding context. In this paper, we propose HAMLET, a scalable framework to adapt VLAs to attend to the historical context during action prediction. Specifically, we introduce moment tokens that compactly encode perceptual information at each timestep. Their representations are initialized with time-contrastive learning, allowing them to better capture temporally distinctive aspects. Next, we employ a lightweight memory module that integrates the moment tokens across past timesteps into memory features, which are then leveraged for action prediction. Through empirical evaluation, we show that HAMLET successfully transforms a state-of-the-art VLA into a history-aware policy, especially demonstrating significant improvements on long-horizon tasks that require historical context. In particular, on top of GR00T N1.5, HAMLET achieves an average success rate of 76.4% on history-dependent real-world tasks, surpassing the baseline performance by 47.2%. Furthermore, HAMLET pushes prior art performance from 64.1% to 66.4% on RoboCasa Kitchen (100-demo setup) and from 95.6% to 97.7% on LIBERO, highlighting its effectiveness even under generic robot-manipulation benchmarks.
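HAMLET's two ingredients can be illustrated with a minimal, framework-free sketch. It makes three assumptions. First, a moment token is a plain vector. Second, time-contrastive learning is taken to mean an InfoNCE objective in which two views of the same timestep are positives and other timesteps are negatives; this is a common formulation, not necessarily the paper's exact loss. Third, the learned memory module is approximated by mean pooling over a fixed window of recent moment tokens. The names `time_contrastive_loss` and `MomentMemory` are hypothetical.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def time_contrastive_loss(anchors, positives, temperature=0.1):
    """InfoNCE over timesteps: anchors[t] and positives[t] are two views of
    timestep t; positives from every other timestep act as negatives."""
    total = 0.0
    for t, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[t]  # -log softmax probability of the match
    return total / len(anchors)


class MomentMemory:
    """Hypothetical stand-in for the memory module: keeps the last
    `max_steps` moment tokens and mean-pools them into one feature."""

    def __init__(self, max_steps):
        self.buffer = deque(maxlen=max_steps)  # oldest tokens drop out first

    def push(self, moment_token):
        self.buffer.append(list(moment_token))

    def read(self):
        if not self.buffer:
            raise ValueError("memory is empty")
        n = len(self.buffer)
        return [sum(col) / n for col in zip(*self.buffer)]


anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = time_contrastive_loss(anchors, [[0.9, 0.1], [0.1, 0.9]])
mismatched = time_contrastive_loss(anchors, [[0.1, 0.9], [0.9, 0.1]])
print(matched < mismatched)  # True: aligned timesteps give a lower loss

memory = MomentMemory(max_steps=2)
for token in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    memory.push(token)
print(memory.read())  # [0.5, 1.0]: only the two most recent tokens remain
```

The contrastive term pushes tokens from different timesteps apart, which is what lets a moment token carry "when" as well as "what". A learned attention-based memory would replace the mean pooling in a real policy.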
Topic Identification in LLM Input-Output Pairs through the Lens of Information Bottleneck
Large Language Models (LLMs) are prone to critical failure modes, including intrinsic faithfulness hallucinations (also known as confabulations), where a response deviates semantically from the provided context. Frameworks designed to detect this, such as Semantic Divergence Metrics (SDM), rely on identifying latent topics shared between prompts and responses, typically by applying geometric clustering to their sentence embeddings. This creates a disconnect, as the topics are optimized for spatial proximity, not for the downstream information-theoretic analysis. In this paper, we bridge this gap by developing a principled topic identification method grounded in the Deterministic Information Bottleneck (DIB) for geometric clustering. Our key contribution is to transform the DIB method into a practical algorithm for high-dimensional data by substituting its intractable KL divergence term with a computationally efficient upper bound. The resulting method, which we dub UDIB, can be interpreted as an entropy-regularized and robustified version of K-means that inherently favors a parsimonious number of informative clusters. By applying UDIB to the joint clustering of LLM prompt and response embeddings, we generate a shared topic representation that is not merely spatially coherent but is fundamentally structured to be maximally informative about the prompt-response relationship. This provides a superior foundation for the SDM framework and offers a novel, more sensitive tool for detecting confabulations.
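The "entropy-regularized K-means" reading can be made concrete with a sketch of the generic DIB geometric-clustering loop. Each point goes to the cluster minimizing β·d(x, μ_c) − log q(c), where q(c) is that cluster's share of the points, so small clusters pay a penalty and can disappear. As an assumption for illustration, squared Euclidean distance stands in for the KL term. This is neither UDIB's upper bound nor its robustified variant, and `dib_kmeans` is a hypothetical name.

```python
import math


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def dib_kmeans(points, init_centroids, beta, n_iter=100):
    """DIB-style hard clustering with squared Euclidean distance standing in
    for the KL term. Empty clusters are dropped, so K can shrink."""
    n, dim = len(points), len(points[0])
    centroids = [list(c) for c in init_centroids]
    q = [1.0 / len(centroids)] * len(centroids)  # uniform initial cluster prior
    labels = None
    for _ in range(n_iter):
        # Assignment: distance cost versus log cluster mass (entropy term).
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: beta * sq_dist(x, centroids[c]) - math.log(q[c]))
            for x in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        # Keep only clusters that received points and re-index them 0..K-1.
        used = sorted(set(new_labels))
        remap = {old: new for new, old in enumerate(used)}
        labels = [remap[l] for l in new_labels]
        # Update: centroid = member mean, q(c) = fraction of points in c.
        centroids, q = [], []
        for k in range(len(used)):
            members = [x for x, l in zip(points, labels) if l == k]
            centroids.append([sum(m[i] for m in members) / len(members) for i in range(dim)])
            q.append(len(members) / n)
    return labels, centroids


points = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11]]
init = [[0, 0], [1, 1], [10, 10.5]]
labels, centroids = dib_kmeans(points, init, beta=0.1)
print(labels)     # [0, 0, 0, 0, 1, 1]
print(centroids)  # [[0.5, 0.5], [10.0, 10.5]]
```

Started from the same three centroids, plain K-means keeps the lone point [1, 1] as its own cluster. The −log q(c) term instead lets the larger neighbouring cluster absorb it, which is the parsimony the abstract describes.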
- North America > United States (0.14)
- Europe > Denmark (0.04)
- Government (0.46)
- Banking & Finance (0.40)
Towards a Holistic and Automated Evaluation Framework for Multi-Level Comprehension of LLMs in Book-Length Contexts
Deng, Jiaqi, Lee, Yuho, Kim, Nicole Hee-Yeon, Min, Hyangsuk, Yun, Taewon, Ban, Minjeong, Yul, Kim, Song, Hwanjun
We introduce HAMLET, a holistic and automated framework for evaluating the long-context comprehension of large language models (LLMs). HAMLET structures source texts into a three-level key-fact hierarchy at root-, branch-, and leaf-levels, and employs query-focused summarization to evaluate how well models recall and faithfully represent information at each level. To validate the reliability of our fully automated pipeline, we conduct a systematic human study, showing that our automatic evaluation achieves over 90% agreement with expert human judgments while reducing the cost by up to 25 times. HAMLET reveals that LLMs struggle with fine-grained comprehension, especially at the leaf level, and are sensitive to positional effects such as the lost-in-the-middle phenomenon. Analytical queries pose greater challenges than narrative ones, and consistent performance gaps emerge between open-source and proprietary models, as well as across model scales. Our code and dataset are publicly available at https://github.com/DISL-Lab/HAMLET.
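The three-level key-fact hierarchy and per-level scoring can be pictured with a small sketch. It rests on two assumptions. First, recall at a level is taken to be the fraction of that level's key facts a summary covers; this is an illustrative definition, not necessarily HAMLET's exact metric. Second, a toy substring judge stands in for the framework's automated judge. `KeyFact` and `recall_by_level` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class KeyFact:
    text: str
    level: str  # "root", "branch", or "leaf"
    children: list = field(default_factory=list)


def iter_facts(fact):
    """Depth-first walk over a key-fact tree."""
    yield fact
    for child in fact.children:
        yield from iter_facts(child)


def recall_by_level(root, is_covered):
    """Fraction of key facts at each level that the judge marks as covered."""
    hits, totals = {}, {}
    for fact in iter_facts(root):
        totals[fact.level] = totals.get(fact.level, 0) + 1
        hits[fact.level] = hits.get(fact.level, 0) + int(is_covered(fact))
    return {level: hits[level] / totals[level] for level in totals}


tree = KeyFact("Hamlet seeks revenge", "root", [
    KeyFact("The ghost accuses Claudius", "branch",
            [KeyFact("Claudius poured poison in the king's ear", "leaf")]),
    KeyFact("Hamlet stages a play", "branch",
            [KeyFact("The play is called The Mousetrap", "leaf")]),
])
summary = "Hamlet seeks revenge after the ghost accuses Claudius; Hamlet stages a play."


def judge(fact):
    # Toy judge: exact substring match, standing in for the automated judge.
    return fact.text.lower() in summary.lower()


print(recall_by_level(tree, judge))  # {'root': 1.0, 'branch': 1.0, 'leaf': 0.0}
```

The toy summary keeps the plot outline but drops the details, which mirrors the leaf-level weakness the abstract reports.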
Prompt-Response Semantic Divergence Metrics for Faithfulness Hallucination and Misalignment Detection in Large Language Models
The proliferation of Large Language Models (LLMs) is challenged by hallucinations, critical failure modes where models generate non-factual, nonsensical or unfaithful text. This paper introduces Semantic Divergence Metrics (SDM), a novel lightweight framework for detecting Faithfulness Hallucinations: events in which LLM responses deviate severely from the input context. We focus on a specific manifestation of these LLM errors, confabulations, defined as responses that are arbitrary and semantically misaligned with the user's query. Existing methods like Semantic Entropy test for arbitrariness by measuring the diversity of answers to a single, fixed prompt. Our SDM framework improves upon this by being more prompt-aware: we test for a deeper form of arbitrariness by measuring response consistency not only across multiple answers but also across multiple, semantically equivalent paraphrases of the original prompt. Methodologically, our approach uses joint clustering on sentence embeddings to create a shared topic space for prompts and answers. A heatmap of topic co-occurrences between prompts and responses can be viewed as a quantified two-dimensional visualization of the user-machine dialogue. We then compute a suite of information-theoretic metrics to measure the semantic divergence between prompts and responses. Our practical score, $\mathcal{S}_H$, combines the Jensen-Shannon divergence and Wasserstein distance to quantify this divergence, with a high score indicating a Faithfulness Hallucination. Furthermore, we identify the KL divergence KL(Answer || Prompt) as a powerful indicator of Semantic Exploration, a key signal for distinguishing different generative behaviors. These metrics are further combined into the Semantic Box, a diagnostic framework for classifying LLM response types, including the dangerous, confident confabulation.
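Once prompt and answer sentences share topic labels, the divergence terms reduce to comparisons of topic histograms. The sketch below computes a smoothed topic distribution, KL(Answer || Prompt), the Jensen-Shannon divergence and a co-occurrence matrix. Several pieces are assumptions: the topic labels are hypothetical, co-occurrence counts every (prompt sentence, answer sentence) pair, and the Wasserstein term and the exact weighting inside $\mathcal{S}_H$ are not reproduced.

```python
import math
from collections import Counter


def topic_distribution(labels, n_topics, eps=1e-9):
    """Smoothed topic histogram; eps keeps every probability positive."""
    counts = Counter(labels)
    total = len(labels) + n_topics * eps
    return [(counts[k] + eps) / total for k in range(n_topics)]


def kl(p, q):
    """KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def js(p, q):
    """Jensen-Shannon divergence in nats (bounded by ln 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cooccurrence(prompt_labels, answer_labels, n_topics):
    """Counts of (prompt topic, answer topic) over all sentence pairs."""
    mat = [[0] * n_topics for _ in range(n_topics)]
    for a in prompt_labels:
        for b in answer_labels:
            mat[a][b] += 1
    return mat


# Topic labels from a (hypothetical) joint clustering of sentence embeddings.
prompt_topics = [0, 0, 1]
answer_topics = [0, 1, 1]
p = topic_distribution(prompt_topics, 2)
a = topic_distribution(answer_topics, 2)
print(cooccurrence(prompt_topics, answer_topics, 2))  # [[2, 4], [1, 2]]
print(js(p, a) > 0, kl(a, p) > 0)  # True True
```

The co-occurrence matrix is what the heatmap visualizes. The asymmetric KL(Answer || Prompt) grows when the answer puts mass on topics the prompt rarely touches, which matches its role as a signal of semantic exploration.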
Violent and lewd! Not Grand Theft Auto, Shakespeare's Macbeth
Last week, the Guardian spoke to the team behind Lili, a video game retelling of Macbeth shown at the Cannes Film Festival. The headline quote from the piece was "Shakespeare would be writing for games today", which I have heard many times, and which does make a lot of sense. Shakespeare worked in Elizabethan theatre, at a time when plays were considered popular entertainment hardly worthy of analysis or preservation – just like video games today! The authorities were also concerned about the lewd and violent nature of plays and the effect they might have on the impressionable masses – ditto! But if we agree that a 21st-century Shakespeare would be making games, what sort would he be making?
HAMLET: Healthcare-focused Adaptive Multilingual Learning Embedding-based Topic Modeling
Traditional topic models often struggle with contextual nuances and fail to adequately handle polysemy and rare words. This limitation typically results in topics that lack coherence and quality. Large Language Models (LLMs) can mitigate this issue by generating an initial set of topics. However, these raw topics frequently lack refinement and representativeness, which leads to redundancy without lexical similarity and reduced interpretability. This paper introduces HAMLET, a graph-driven architecture for cross-lingual healthcare topic modeling that uses LLMs. The proposed approach leverages neural-enhanced semantic fusion to refine the embeddings of topics generated by the LLM. Instead of relying solely on statistical co-occurrence or human interpretation to extract topics from a document corpus, this method introduces a topic embedding refinement that uses Bidirectional Encoder Representations from Transformers (BERT) and Graph Neural Networks (GNN). After topic generation, a hybrid technique that involves BERT and Sentence-BERT (SBERT) is employed for embedding. The topic representations are further refined using a GNN, which establishes connections between documents, topics, words, similar topics, and similar words. A novel method is introduced to compute similarities. Consequently, the topic embeddings are refined, and the top k topics are extracted. Experiments were conducted using two healthcare datasets, one in English and one in French, from which six sets were derived. The results demonstrate the effectiveness of HAMLET.
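The graph-refinement step can be sketched as message passing over a heterogeneous graph of documents, topics and words, followed by top-k extraction. The sketch is heavily simplified. It uses fixed toy vectors instead of BERT/SBERT embeddings, a single parameter-free mean-aggregation round instead of a trained GNN, and cosine similarity to a query vector (for example a corpus centroid) in place of the paper's own similarity method. Node names, `propagate` and `top_k_topics` are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def propagate(embeddings, edges, alpha=0.5):
    """One parameter-free round of mean-neighbour message passing on an
    undirected graph: keep `alpha` of each node's own vector and take
    `1 - alpha` from the mean of its neighbours. Isolated nodes are unchanged."""
    neighbours = {node: [] for node in embeddings}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    refined = {}
    for node, vec in embeddings.items():
        nbrs = neighbours[node]
        if not nbrs:
            refined[node] = list(vec)
            continue
        mean = [sum(embeddings[n][i] for n in nbrs) / len(nbrs) for i in range(len(vec))]
        refined[node] = [alpha * v + (1 - alpha) * m for v, m in zip(vec, mean)]
    return refined


def top_k_topics(refined, topic_ids, query, k):
    """Rank topic nodes by cosine similarity to a query vector."""
    return sorted(topic_ids, key=lambda t: cosine(refined[t], query), reverse=True)[:k]


embeddings = {
    "doc:1": [1.0, 0.0],
    "topic:diabetes": [0.6, 0.4],
    "word:insulin": [0.9, 0.1],
    "topic:billing": [0.0, 1.0],
}
edges = [("doc:1", "topic:diabetes"), ("topic:diabetes", "word:insulin")]
refined = propagate(embeddings, edges)
print(top_k_topics(refined, ["topic:billing", "topic:diabetes"], [1.0, 0.0], k=1))
# ['topic:diabetes']
```

Mixing a topic's own vector with those of its linked documents and words pulls it toward the content it actually describes. A trained GNN would learn that mixing instead of fixing `alpha`.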
- North America > United States > New York > Broome County > Binghamton (0.04)
- North America > Martinique (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.69)
- Media > News (0.67)
- Health & Medicine > Health Care Technology > Medical Record (0.46)
How Hamlet found a virtual stage in Grand Theft Auto
Young cast member Nora has benefited from the project. She openly thanks those in the game for giving her the chance to act and express herself freely, particularly as someone going through a gender transition. "It's amazing that her first production experience of Shakespeare, beyond studying in school, was in Grand Theft Auto," Grylls says. "That's what kept us going really, the fact people kept coming back because they wanted to." Grylls, Crane and Oosterveen's committed madness has paid off.
- Leisure & Entertainment > Games > Computer Games (0.65)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.65)
- Media (0.64)
The Morning After: This is Tesla's robotaxi, the Cybercab
At Tesla's We, Robot event at Warner Bros. Discovery's studio in California, the company finally unveiled its robotaxi. The car is expected to go into production before 2027, but even Musk caveated that, saying he was "highly optimistic with timeframes." The Cybercab doesn't have a steering wheel and, according to Elon Musk (so pinch of salt!), could be very cheap to run. The Tesla boss said the operating cost of the robotaxi would be 20 cents a mile, or 30 to 40 cents with taxes. He also confirmed people can buy one and that Tesla expects to sell the Cybercab for below $30,000.
- North America > United States > California (0.26)
- Asia > China (0.06)
- Transportation > Ground > Road (1.00)
- Information Technology > Robotics & Automation (0.99)
- Automobiles & Trucks (0.99)
- Leisure & Entertainment (0.73)
Mash-up of Grand Theft Auto and Hamlet is coming to theaters in the US
Mubi has secured the US rights and global SVOD rights to Grand Theft Hamlet. In this documentary, two out-of-work actors attempt to stage an entire production of William Shakespeare's tragedy Hamlet within the game world of Grand Theft Auto Online during the Covid-19 pandemic. According to The Hollywood Reporter, Mubi plans to give the film a release in early 2025, and Mubi's own posts on X say that it will be in "US theaters and streaming globally." The movie is composed of more than 300 hours of GTA footage. Sam Crane and Mark Oosterveen might be the main drivers of making the play the thing, but they looped in other random players through in-game auditions to fill out the cast.
- Media > Film (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.89)
- Leisure & Entertainment > Games > Computer Games (0.78)
Royal Reveals: LiDAR Mapping of Kronborg Castle, Echoes of Hamlet's Halls
This paper presents a large-scale dataset from a meticulous 360-degree LiDAR (Light Detection and Ranging) scan of Kronborg Castle, a renowned Renaissance fortress located in Elsinore (Helsingør), Denmark, famously associated with Shakespeare's "Hamlet." The scan was captured with a vertically mounted, gimbal-stabilised, 16-channel, 360-degree Velodyne VLP-16 LiDAR scanner paired with an Intel RealSense L515 depth camera. This research offers an unparalleled digital representation of the castle's intricate architectural details and structural nuances, enabling fellow researchers to conduct experiments utilising the data for SLAM (Simultaneous Localisation and Mapping) as well as floorplan generation.
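One common route from a registered point cloud to a floorplan is to slice out a height band and rasterise it onto a 2D grid, as the sketch below does. It assumes the points are already in a world frame with z pointing up. Raw returns from a vertically mounted, gimbal-stabilised scanner would first need a sensor-to-world transform, for example from a SLAM pose estimate. The height band, cell size and the functions `floorplan_cells` and `render_ascii` are illustrative assumptions, not the dataset's own tooling.

```python
import math
from collections import Counter


def floorplan_cells(points, z_min, z_max, cell_size, min_points=1):
    """Occupied (col, row) grid cells from points whose height lies in
    [z_min, z_max]; cells with fewer than `min_points` hits count as noise."""
    counts = Counter()
    for x, y, z in points:
        if z_min <= z <= z_max:
            counts[(math.floor(x / cell_size), math.floor(y / cell_size))] += 1
    return {cell for cell, hits in counts.items() if hits >= min_points}


def render_ascii(cells):
    """Draw occupied cells as '#', free cells as '.', with +y at the top."""
    if not cells:
        return ""
    cols = [c for c, _ in cells]
    rows = [r for _, r in cells]
    lines = []
    for row in range(max(rows), min(rows) - 1, -1):
        lines.append("".join("#" if (col, row) in cells else "."
                             for col in range(min(cols), max(cols) + 1)))
    return "\n".join(lines)


points = [
    (0.1, 0.1, 1.0), (0.2, 0.1, 1.2),   # wall returns in cell (0, 0)
    (1.1, 0.1, 1.0), (0.1, 1.1, 1.0),   # single returns in two more cells
    (0.5, 0.5, 0.05), (0.5, 0.5, 2.9),  # floor and ceiling, filtered out
]
cells = floorplan_cells(points, z_min=0.5, z_max=2.0, cell_size=1.0)
print(render_ascii(cells))
# #.
# ##
```

Excluding the floor and ceiling keeps only wall-height structure, and raising `min_points` trades completeness for robustness to stray returns.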
- Europe > Sweden (0.05)
- Europe > Northern Europe (0.05)
- Europe > Denmark > Capital Region > Copenhagen (0.05)
- Atlantic Ocean > North Atlantic Ocean > Baltic Sea (0.05)