 circumstance


Self-Explaining Deviations for Coordination

Neural Information Processing Systems

Fully cooperative, partially observable multi-agent problems are ubiquitous in the real world. In this paper, we focus on a specific subclass of coordination problems in which humans are able to discover self-explaining deviations (SEDs). SEDs are actions that deviate from the common understanding of what reasonable behavior would be in normal circumstances. They are taken with the intention of causing another agent or other agents to realize, using theory of mind, that the circumstance must be abnormal. We motivate this idea with a real-world example and formalize its definition. Next, we introduce an algorithm for improvement maximizing SEDs (IMPROVISED). Lastly, we evaluate IMPROVISED both in an illustrative toy setting and in the popular benchmark setting Hanabi, where we show that it can produce so-called finesse plays.


"Sirāt" Is a Harrowing, Exhilarating Dance of Death

The New Yorker

At one point, Luis assumes that he and Esteban have been abandoned, only to realize, with a start, that their newfound friends are actually circling back to help. In such moments, we grasp the source of the story's mysterious power: a tough-minded understanding that kindness is rare yet persistent, and quite possibly an affront to the laws of nature. "Sirāt" is a chain of defiantly compassionate acts--noble human improbabilities that take on, in retrospect, an air of fatalistic inevitability. Laxe, a restless wanderer himself, knows Morocco well. He shot his first feature, "You All Are Captains" (2011), in Tangier, where he'd spent several years working at a shelter for disadvantaged children. Several of these children appeared in the movie--a formally playful collision of fiction and documentary in which Laxe, also making an appearance, slyly interrogated his European outsider-artist role. Next came "Mimosas" (2016), an elusive, arrestingly gorgeous drama about a caravan bearing a dying sheikh across Morocco's Atlas Mountains to his homeland. The film had the beauty of a travelogue and the opacity of a parable. Its most dynamic character was a fiery Muslim preacher who warned his fellow-travellers not to stray, geographically or morally.


"I Sweated So Much I Never Needed to Pee": Life in China's Relentless Gig Economy

WIRED

In his newly translated memoir, Hu Anyan captures the brutal labor and quiet grace of life at the edge of China's booming ecommerce industry. "Often, sweat was dripping down my back within the first two hours of a shift and would not stop dripping until the next morning," writes Hu Anyan in the new English translation of his bestselling book. "I sweated so much I never once needed to pee." This passage was on my mind as I read his book in Tianjin during one hot, Labubu brainrot summer, during which yet another unprecedented annual heat wave had forced almost everyone inside--except for the tireless couriers and delivery workers, whose services are in higher demand when temperatures soar. Hu's writing first went viral in China five years ago, and he's now a prolific, established author in the country.


Soppia: A Structured Prompting Framework for the Proportional Assessment of Non-Pecuniary Damages in Personal Injury Cases

Araujo, Jorge Alberto

arXiv.org Artificial Intelligence

Applying complex legal rules characterized by multiple, heterogeneously weighted criteria presents a fundamental challenge in judicial decision-making, often hindering the consistent realization of legislative intent. This challenge is particularly evident in the quantification of non-pecuniary damages in personal injury cases. This paper introduces Soppia, a structured prompting framework designed to assist legal professionals in navigating this complexity. By leveraging advanced AI, the system ensures a comprehensive and balanced analysis of all stipulated criteria, fulfilling the legislator's intent that compensation be determined through a holistic assessment of each case. Using the twelve criteria for non-pecuniary damages established in the Brazilian CLT (Art. 223-G) as a case study, we demonstrate how Soppia (System for Ordered Proportional and Pondered Intelligent Assessment) operationalizes nuanced legal commands into a practical, replicable, and transparent methodology. The framework enhances consistency and predictability while providing a versatile and explainable tool adaptable across multi-criteria legal contexts, bridging normative interpretation and computational reasoning toward auditable legal AI.


MARCUS: An Event-Centric NLP Pipeline that generates Character Arcs from Narratives

Bhyravajjula, Sriharsh, Narayan, Ujwal, Shrivastava, Manish

arXiv.org Artificial Intelligence

Character arcs are important theoretical devices employed in literary studies to understand character journeys, identify tropes across literary genres, and establish similarities between narratives. This work addresses the novel task of computationally generating event-centric, relation-based character arcs from narratives. Providing a quantitative representation for arcs brings tangibility to a theoretical concept and paves the way for subsequent applications. We present MARCUS (Modelling Arcs for Understanding Stories), an NLP pipeline that extracts events, participant characters, implied emotion, and sentiment to model inter-character relations. MARCUS tracks and aggregates these relations across the narrative to generate character arcs as graphical plots. We generate character arcs from two extended fantasy series, Harry Potter and Lord of the Rings. We evaluate our approach before outlining existing challenges, suggesting applications of our pipeline, and discussing future work.


Mark Cuban Would Still Have Dinner With Donald Trump

WIRED

The billionaire investor campaigned for Kamala Harris, but thinks tech execs have a "moral imperative" to play nice with the president. Back in May, Mark Cuban appeared in his last episode of ABC's after spending more than a decade on the show investing in--or deprecating--entrepreneurs' big ideas. But that doesn't mean the billionaire is going away. Yes, Cuban loves to talk--about ideas, about the future, about what it takes to actually make America healthy again. Or, at least, to get Americans more affordable drugs, which Cuban is endeavoring to do with his startup, Cost Plus Drug Company. Nor does Cuban, like many billionaire businessmen, shy away from talking politics: Does he like President Trump? But would he join the president for dinner like so many of his peers have in recent months? With enthusiasm, according to a conversation we had for this week's episode of . Keep reading to find out why. Just so you know--well it's too late now--we always start these conversations with some rapid-fire questions. What is the smartest investment you ever made? What's the dumbest purchase you ever made? Alright, one word to describe the startup pitches that you hate. Would you rather invest in passion or in numbers? Tell me a little bit about why.


Formalizing Style in Personal Narratives

Cortal, Gustave, Finkel, Alain

arXiv.org Artificial Intelligence

Personal narratives are stories authors construct to make meaning of their experiences. Style, the distinctive way authors use language to express themselves, is fundamental to how these narratives convey subjective experiences. Yet there is a lack of a formal framework for systematically analyzing these stylistic choices. We present a novel approach that formalizes style in personal narratives as patterns in the linguistic choices authors make when communicating subjective experiences. Our framework integrates three domains: functional linguistics establishes language as a system of meaningful choices, computer science provides methods for automatically extracting and analyzing sequential patterns, and these patterns are linked to psychological observations. Using language models, we automatically extract linguistic features such as processes, participants, and circumstances. We apply our framework to hundreds of dream narratives, including a case study on a war veteran with post-traumatic stress disorder. Analysis of his narratives uncovers distinctive patterns, particularly how verbal processes dominate over mental ones, illustrating the relationship between linguistic choices and psychological states.


Nobody Cares If Music Is Real Anymore

The Atlantic - Technology

The traffic receded as Chicago withdrew into the distance behind me on Interstate 90. The speakers in my rental car, playing Spotify from my smartphone, put out the opening riff of a laid-back psychedelic-rock song. When the lyrics came, delivered in a folksy vibrato, they matched my mood: "Smoke in the sky / No peace found," the band's vocalist sang. Except perhaps he didn't really sing, because he doesn't exist. By all appearances, neither does the band, called the Velvet Sundown.


Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

Neural Information Processing Systems

AI assistants such as ChatGPT are trained to respond to users by saying, "I am a large language model". This raises questions. Do such models "know" that they are LLMs and reliably act on this knowledge? Are they "aware" of their current circumstances, such as being deployed to the public? We refer to a model's knowledge of itself and its circumstances as situational awareness. To quantify situational awareness in LLMs, we introduce a range of behavioral tests, based on question answering and instruction following. These tests form the Situational Awareness Dataset (SAD), a benchmark comprising 7 task categories and over 13,000 questions. The benchmark tests numerous abilities, including the capacity of LLMs to (i) recognize their own generated text, (ii) predict their own behavior, (iii) determine whether a prompt is from internal evaluation or real-world deployment, and (iv) follow instructions that depend on self-knowledge. We evaluate 16 LLMs on SAD, including both base (pretrained) and chat models. While all models perform better than chance, even the highest-scoring model (Claude 3 Opus) is far from a human baseline on certain tasks. We also observe that performance on SAD is only partially predicted by metrics of general knowledge.