basketball
- North America > United States > California > San Francisco County > San Francisco (0.15)
- Europe > Austria > Vienna (0.14)
- Europe > Sweden > Stockholm > Stockholm (0.06)
- (23 more...)
- Health & Medicine (0.94)
- Transportation > Ground > Rail (0.93)
- Information Technology > Information Management > Search (0.69)
- Information Technology > Artificial Intelligence > Natural Language (0.69)
- Information Technology > Sensing and Signal Processing > Image Processing (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.47)
Inferring Event Descriptions from Time Series with Language Models
Tan, Mingtian, Merrill, Mike A., Gottesman, Zack, Althoff, Tim, Evans, David, Hartvigsen, Tom
Time series data measure how environments change over time and drive decision-making in critical domains like finance and healthcare. When analyzing time series, we often seek to understand the underlying events occurring in the measured environment. For example, one might ask: What caused a sharp drop in the stock price? Events are often described with natural language, so we conduct the first study of whether Large Language Models (LLMs) can infer natural language events from time series. We curate a new benchmark featuring win probabilities collected from 4,200 basketball and American football games, comprising 1.7M timesteps of real-valued data and corresponding natural language events. Building on the recent wave of using LLMs on time series, we evaluate 16 LLMs and find that they demonstrate promising abilities to infer events from time series data. The open-weights DeepSeek-R1 32B model outperforms proprietary models like GPT-4o. Despite this impressive initial performance, we also find clear avenues to improve recent models, as we identify failures when altering the provided context, event sequence lengths, and evaluation strategy. (All resources needed to reproduce our work are available: https://github.com/BennyTMT/GAMETime)
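As a concrete illustration of the setup this abstract describes, here is a minimal sketch of how a win-probability series might be serialized into a multiple-choice prompt for an LLM. The helper name, field format, and candidate events are illustrative assumptions, not the benchmark's actual prompt.

```python
def format_win_prob_prompt(timestamps, win_probs, candidate_events):
    """Serialize a win-probability series into a plain-text prompt
    asking an LLM which event best explains the largest change."""
    series = "\n".join(
        f"t={t}s  home_win_prob={p:.2f}" for t, p in zip(timestamps, win_probs)
    )
    options = "\n".join(
        f"({chr(65 + i)}) {e}" for i, e in enumerate(candidate_events)
    )
    return (
        "The following is a home-team win probability series from a basketball game:\n"
        f"{series}\n"
        "Which event most plausibly caused the largest change?\n"
        f"{options}"
    )

prompt = format_win_prob_prompt(
    [0, 24, 48],
    [0.50, 0.51, 0.78],
    ["Home team hits a three-pointer", "Timeout called"],
)
```

The jump from 0.51 to 0.78 is the kind of sharp change the LLM is asked to explain with a natural-language event.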
- North America > United States > Virginia (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Leisure & Entertainment > Sports > Football (1.00)
- Leisure & Entertainment > Sports > Basketball (1.00)
- Health & Medicine (1.00)
- (2 more...)
The study of short texts in digital politics: Document aggregation for topic modeling
Nakka, Nitheesha, Yalcin, Omer F., Desmarais, Bruce A., Rajtmajer, Sarah, Monroe, Burt
Statistical topic modeling is widely used in political science to study text. Researchers examine documents of varying lengths, from tweets to speeches. There is ongoing debate on how document length affects the interpretability of topic models. We investigate the effects of aggregating short documents into larger ones based on natural units that partition the corpus. In our study, we analyze one million tweets by U.S. state legislators from April 2016 to September 2020. We find that for documents aggregated at the account level, topics are more associated with individual states than when using individual tweets. This finding is replicated with Wikipedia pages aggregated by birth cities, showing how document definitions can impact topic modeling results.
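The aggregation step described above can be sketched as follows: group tweets into account-level pseudo-documents before fitting a topic model. The account handles and tweet text are invented for illustration; the paper's actual pipeline is not shown.

```python
from collections import defaultdict

def aggregate_by_account(tweets):
    """Concatenate each account's tweets into one pseudo-document,
    so the topic model sees account-level rather than tweet-level text."""
    docs = defaultdict(list)
    for account, text in tweets:
        docs[account].append(text)
    return {account: " ".join(texts) for account, texts in docs.items()}

tweets = [
    ("@legislatorA", "Budget hearing today."),
    ("@legislatorB", "Visited a school in Austin."),
    ("@legislatorA", "Proud to vote yes on SB 12."),
]
docs = aggregate_by_account(tweets)
```

Fitting the topic model on `docs.values()` instead of the raw tweets is what changes the document definition, which is the variable the study manipulates.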
- North America > United States > Florida (0.93)
- North America > United States > Texas (0.46)
- North America > United States > Louisiana (0.46)
- (35 more...)
- Media > News (1.00)
- Media > Music (1.00)
- Leisure & Entertainment > Sports > Soccer (1.00)
- (15 more...)
Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation
Wang, Yiping, He, Xuehai, Wang, Kuan, Ma, Luyao, Yang, Jianwei, Wang, Shuohang, Du, Simon Shaolei, Shen, Yelong
The current state-of-the-art video generative models can produce commercial-grade videos with highly realistic details. However, they still struggle to coherently present multiple sequential events in the stories specified by the prompts, which is foreseeably an essential capability for future long video generation scenarios. For example, top T2V generative models still fail to generate a video of the short simple story 'how to put an elephant into a refrigerator.' While existing detail-oriented benchmarks primarily focus on fine-grained metrics like aesthetic quality and spatial-temporal consistency, they fall short of evaluating models' abilities to handle event-level story presentation. To address this gap, we introduce StoryEval, a story-oriented benchmark specifically designed to assess text-to-video (T2V) models' story-completion capabilities. StoryEval features 423 prompts spanning 7 classes, each representing short stories composed of 2-4 consecutive events. We employ advanced vision-language models, such as GPT-4V and LLaVA-OV-Chat-72B, to verify the completion of each event in the generated videos, applying a unanimous voting method to enhance reliability. Our methods ensure high alignment with human evaluations, and our evaluation of 11 models reveals the benchmark's difficulty, with none exceeding an average story-completion rate of 50%. StoryEval provides a new benchmark for advancing T2V models and highlights the challenges and opportunities in developing next-generation solutions for coherent story-driven video generation.
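The unanimous-voting scheme the abstract mentions can be sketched as follows, assuming each verifier model returns a boolean judgment per event. This is a simplification for illustration, not StoryEval's actual aggregation code.

```python
def story_completion(event_votes):
    """event_votes: one entry per event in the story; each entry is the
    list of per-verifier booleans for that event. An event counts as
    completed only if all verifiers agree (unanimous voting); the
    story's completion rate is the fraction of completed events."""
    completed = [all(votes) for votes in event_votes]
    return sum(completed) / len(completed)

# Three events, two verifiers each: only the first event is unanimous.
rate = story_completion([[True, True], [True, False], [False, False]])
```

Requiring unanimity makes a "completed" verdict conservative: a single dissenting verifier is enough to mark the event as missed.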
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
Adversarial Circuit Evaluation
de Bos, Niels uit, Garriga-Alonso, Adrià
Circuits are supposed to accurately describe how a neural network performs a specific task, but do they really? We evaluate three circuits found in the literature (IOI, greater-than, and docstring) in an adversarial manner, considering inputs where the circuit's behavior maximally diverges from the full model. Concretely, we measure the KL divergence between the full model's output and the circuit's output, calculated through resample ablation, and we analyze the worst-performing inputs. Our results show that the circuits for the IOI and docstring tasks fail to behave similarly to the full model even on completely benign inputs from the original task, indicating that more robust circuits are needed for safety-critical applications.
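A minimal sketch of the evaluation quantity described above: the KL divergence between the full model's and the circuit's next-token distributions, used to rank inputs by divergence. Function names and the toy distributions are illustrative; the paper obtains the circuit's output via resample ablation, which is not reproduced here.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete next-token distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def worst_inputs(full_probs, circuit_probs, k=1):
    """Rank inputs by how far the circuit's output diverges
    from the full model's output."""
    scored = [
        (kl_divergence(p, q), i)
        for i, (p, q) in enumerate(zip(full_probs, circuit_probs))
    ]
    return sorted(scored, reverse=True)[:k]

full = [[0.5, 0.5], [0.9, 0.1]]  # two inputs, two-token toy vocab
circ = [[0.5, 0.5], [0.1, 0.9]]  # circuit matches the model on input 0 only
worst = worst_inputs(full, circ)
```

Inspecting the top-ranked inputs is the adversarial step: they are the cases where the circuit's description of the model's behavior breaks down.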
- North America > United States > Texas (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
Mathematical models for off-ball scoring prediction in basketball
In professional basketball, the accurate prediction of scoring opportunities based on strategic decision-making is crucial for space and player evaluations. However, traditional models often face challenges in accounting for the complexities of off-ball movements, which are essential for accurate predictive performance. In this study, we propose two mathematical models to predict off-ball scoring opportunities in basketball, considering both pass-to-score and dribble-to-score movements: the Ball Movement for Off-ball Scoring (BMOS) and the Ball Intercept and Movement for Off-ball Scoring (BIMOS) models. The BMOS adapts principles from the Off-Ball Scoring Opportunities (OBSO) model, originally designed for soccer, to basketball, whereas the BIMOS also incorporates the likelihood of interception during ball movements. We evaluated these models using player tracking data from 630 NBA games in the 2015-2016 regular season, demonstrating that the BIMOS outperforms the BMOS in terms of scoring prediction accuracy. Thus, our models provide valuable insights for tactical analysis and player evaluation in basketball.
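The pass-to-score decomposition can be illustrated with a toy probability product. This is an assumption-laden sketch and not the BMOS/BIMOS equations; the interception discount merely stands in for BIMOS's interception-likelihood term.

```python
def off_ball_score_prob(p_pass, p_shot, p_intercept=0.0):
    """Toy decomposition of an off-ball scoring opportunity:
    the pass must succeed (BMOS-style), optionally survive
    interception (BIMOS-style), and the ensuing shot must score."""
    return p_pass * (1.0 - p_intercept) * p_shot

bmos_like = off_ball_score_prob(0.8, 0.5)
bimos_like = off_ball_score_prob(0.8, 0.5, p_intercept=0.25)
```

As the toy values show, folding in an interception probability can only lower the estimated scoring opportunity, which is the qualitative difference between the two models.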
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
- Asia > Japan > Honshū > Kantō > Saitama Prefecture > Saitama (0.04)
- Asia > Japan > Honshū > Chūbu > Aichi Prefecture > Nagoya (0.04)
Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation
Varshney, Neeraj, Raj, Satyam, Mishra, Venkatesh, Chatterjee, Agneet, Sarkar, Ritika, Saeidi, Amir, Baral, Chitta
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks. However, they have been shown to suffer from a critical limitation pertaining to 'hallucination' in their output. Recent research has focused on investigating and addressing this problem for a variety of tasks such as biography generation, question answering, abstractive summarization, and dialogue generation. However, the crucial aspect pertaining to 'negation' has remained considerably underexplored. Negation is important because it adds depth and nuance to the understanding of language and is also crucial for logical reasoning and inference. In this work, we address the above limitation and particularly focus on studying the impact of negation in LLM hallucinations. Specifically, we study four tasks with negation: 'false premise completion', 'constrained fact generation', 'multiple choice question answering', and 'fact generation'. We show that open-source state-of-the-art LLMs such as LLaMA-2-chat, Vicuna, and Orca-2 hallucinate considerably on all these tasks involving negation, which underlines a critical shortcoming of these models. Addressing this problem, we further study numerous strategies to mitigate these hallucinations and demonstrate their impact.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China (0.14)
- Oceania > Australia (0.05)
- (30 more...)
- Personal > Honors (1.00)
- Questionnaire & Opinion Survey (0.66)
- Media > Music (1.00)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Leisure & Entertainment > Sports > Cricket (1.00)
- (4 more...)
Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs
Chughtai, Bilal, Cooney, Alan, Nanda, Neel
How do transformer-based large language models (LLMs) store and retrieve knowledge? We focus on the most basic form of this task -- factual recall, where the model is tasked with explicitly surfacing stored facts in prompts of form `Fact: The Colosseum is in the country of'. We find that the mechanistic story behind factual recall is more complex than previously thought. It comprises several distinct, independent, and qualitatively different mechanisms that additively combine, constructively interfering on the correct attribute. We term this generic phenomenon the additive motif: models compute by summing up multiple independent contributions. Each mechanism's contribution may be insufficient alone, but summing them yields constructive interference on the correct answer. In addition, we extend the method of direct logit attribution to attribute an attention head's output to individual source tokens. We use this technique to unpack what we call `mixed heads' -- which are themselves a pair of two separate additive updates from different source tokens.
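The additive motif can be illustrated with a toy example. The contribution vectors below are invented; real contributions would come from decomposing the residual stream, which is not shown here.

```python
def combined_logits(contributions):
    """Sum per-mechanism logit contributions (the 'additive motif'):
    each mechanism alone may not pick out the right answer, but the
    contributions interfere constructively on the correct attribute."""
    n = len(contributions[0])
    return [sum(c[i] for c in contributions) for i in range(n)]

# Two mechanisms scoring three candidate answers; index 1 is correct.
mech_a = [1.0, 0.75, 0.25]   # alone, argmax is index 0 (wrong)
mech_b = [0.0, 0.75, 1.0]    # alone, argmax is index 2 (wrong)
logits = combined_logits([mech_a, mech_b])
```

Neither mechanism ranks the correct answer first on its own, but their sum does, which is the constructive interference the abstract describes.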
- South America > Brazil (0.14)
- Europe > Italy (0.05)
- Europe > Germany (0.05)
- (96 more...)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Leisure & Entertainment > Sports > Basketball (0.68)
Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models
Yousefi, Safoora, Betthauser, Leo, Hasanbeig, Hosein, Millière, Raphaël, Momennejad, Ida
Large language models (LLMs) exhibit remarkable performance improvement through in-context learning (ICL) by leveraging task-specific examples in the input. However, the mechanisms behind this improvement remain elusive. In this work, we investigate how LLM embeddings and attention representations change following in-context learning, and how these changes mediate improvement in behavior. We employ neuroscience-inspired techniques such as representational similarity analysis (RSA) and propose novel methods for parameterized probing and for measuring the ratio of attention to relevant vs. irrelevant information in Llama-2 70B and Vicuna 13B. We designed two tasks with a priori relationships among their conditions: linear regression and reading comprehension. We formed hypotheses about expected similarities in task representations and measured hypothesis alignment of LLM representations before and after ICL as well as changes in attention. Our analyses revealed a meaningful correlation between improvements in behavior after ICL and changes in both embeddings and attention weights across LLM layers. This empirical framework empowers a nuanced understanding of how latent representations shape LLM behavior, offering valuable tools and insights for future research and practical applications.
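Representational similarity analysis, as used above, can be sketched in a few lines: build each representation's pairwise-distance vector over the same conditions, then correlate the two vectors. This is a bare-bones RSA assuming Euclidean distance and Pearson correlation; the paper's parameterized probing is not shown.

```python
import math

def pairwise_dists(reps):
    """Upper-triangle vector of pairwise Euclidean distances."""
    n = len(reps)
    return [
        math.dist(reps[i], reps[j])
        for i in range(n) for j in range(i + 1, n)
    ]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rsa(reps_a, reps_b):
    """Correlation between two representations' pairwise-distance
    vectors: 1.0 means identical relational geometry."""
    return pearson(pairwise_dists(reps_a), pairwise_dists(reps_b))

# Two representations of the same three conditions; B is a scaled
# copy of A, so their relational geometry is identical.
reps_a = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
reps_b = [[0.0, 0.0], [2.0, 0.0], [4.0, 0.0]]
score = rsa(reps_a, reps_b)
```

Comparing `rsa(before_icl, hypothesis)` against `rsa(after_icl, hypothesis)` is the kind of before/after alignment measurement the study performs.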
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Washington > King County > Redmond (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)