Chess Grandmaster Magnus Carlsen Beats ChatGPT Without Losing a Single Piece
The world's top chess player defeated ChatGPT in an online match in just 53 moves. Magnus Carlsen won the game without losing a single piece, while ChatGPT lost all of its pawns, according to screenshots the Norwegian grandmaster shared on X on July 10. "I sometimes get bored while travelling," Carlsen captioned the post. "That was methodical, clean, and sharp. Well played!" ChatGPT told him, according to the screenshots.
Jim Harbaugh added to lawsuit about former assistant's alleged hacking to obtain photos of athletes
Los Angeles Chargers head coach Jim Harbaugh was added Friday to a lawsuit against his former employer, the University of Michigan, and a former assistant football coach accused of hacking into computer systems to acquire photos of college athletes. Attorneys claim Harbaugh allowed Matt Weiss to continue working as co-offensive coordinator in a national playoff game after Weiss was seen viewing private information on a computer in December 2022. "The university's delay in taking meaningful protective action until after a high-stakes game sends a clear message: Student welfare was secondary," said Parker Stinar, the lead lawyer in a class-action lawsuit arising from a criminal investigation of Weiss. "Had Harbaugh implemented basic oversight of his staff, plaintiffs and the class would have been protected against predators such as Weiss," the updated lawsuit states.
- North America > United States > California > Los Angeles County > Los Angeles (0.47)
- North America > United States > Michigan (0.47)
- Leisure & Entertainment > Sports > Football (1.00)
- Law > Litigation (1.00)
LLM-Symbolic Integration for Robust Temporal Tabular Reasoning
Kulkarni, Atharv, Dixit, Kushagra, Srikumar, Vivek, Roth, Dan, Gupta, Vivek
Temporal tabular question answering presents a significant challenge for Large Language Models (LLMs), requiring robust reasoning over structured data, which is a task where traditional prompting methods often fall short. These methods face challenges such as memorization, sensitivity to table size, and reduced performance on complex queries. To overcome these limitations, we introduce TempTabQA-C, a synthetic dataset designed for systematic and controlled evaluations, alongside a symbolic intermediate representation that transforms tables into database schemas. This structured approach allows LLMs to generate and execute SQL queries, enhancing generalization and mitigating biases. By incorporating adaptive few-shot prompting with contextually tailored examples, our method achieves superior robustness, scalability, and performance. Experimental results consistently highlight improvements across key challenges, setting a new benchmark for robust temporal reasoning with LLMs.
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > Utah (0.04)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- (15 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)
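The core move in the abstract above, transforming a table into a database schema so the model can answer temporal questions by emitting SQL rather than reading raw text, can be sketched in a few lines. The table contents and the question below are invented for illustration; the actual TempTabQA-C schemas and prompting setup are described in the paper.

```python
import sqlite3

# Toy temporal table: positions held over time, loaded into an in-memory
# SQLite database. In the paper's setting, an LLM would generate the SQL
# query from the question; here the query is written by hand.
rows = [
    ("CEO", 2010, 2015),
    ("Chairman", 2015, 2020),
    ("Advisor", 2020, 2023),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE career (title TEXT, start_year INTEGER, end_year INTEGER)"
)
conn.executemany("INSERT INTO career VALUES (?, ?, ?)", rows)

# A temporal question like "What position was held in 2017?" becomes a
# range query over the schema instead of free-form reading of the table.
sql = "SELECT title FROM career WHERE start_year <= ? AND end_year > ?"
answer = conn.execute(sql, (2017, 2017)).fetchone()[0]
print(answer)  # Chairman
```

Executing the query symbolically is what gives the robustness the abstract claims: the answer depends on the database engine's comparison semantics, not on the model's ability to scan a large serialized table.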
CORG: Generating Answers from Complex, Interrelated Contexts
Lee, Hyunji, Dernoncourt, Franck, Bui, Trung, Yoon, Seunghyun
In a real-world corpus, knowledge frequently recurs across documents but often contains inconsistencies due to ambiguous naming, outdated information, or errors, leading to complex interrelationships between contexts. Previous research has shown that language models struggle with these complexities, typically focusing on single factors in isolation. We classify these relationships into four types: distracting, ambiguous, counterfactual, and duplicated. Our analysis reveals that no single approach effectively addresses all these interrelationships simultaneously. Therefore, we introduce Context Organizer (CORG), a framework that organizes multiple contexts into independently processed groups. This design allows the model to efficiently find all relevant answers while ensuring disambiguation. CORG consists of three key components: a graph constructor, a reranker, and an aggregator. Our results demonstrate that CORG balances performance and efficiency effectively, outperforming existing grouping methods and achieving comparable results to more computationally intensive, single-context approaches.
- North America > Canada > British Columbia > Vancouver Island > Capital Regional District > Victoria (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- Europe > Slovakia > Košice > Košice (0.04)
- (3 more...)
- Media > Television (0.68)
- Media > Film (0.46)
- Leisure & Entertainment > Sports > Hockey (0.31)
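The grouping idea behind CORG can be illustrated with a toy version of the graph-constructor step. The contexts, entity names, and the "topic" field used as a disambiguation signal below are all invented; the real framework builds its groups from richer relationship types (distracting, ambiguous, counterfactual, duplicated) than this sketch models.

```python
from collections import defaultdict

# Toy contexts: two different "Michael Jordan"s (ambiguous), a duplicate
# mention of the basketball player, and an unrelated entity. Grouping keeps
# duplicates together while separating ambiguous readings, so each group
# can be processed independently and yield its own answer.
contexts = [
    {"id": 0, "entity": "Michael Jordan", "topic": "basketball"},
    {"id": 1, "entity": "Michael Jordan", "topic": "machine learning"},
    {"id": 2, "entity": "Michael Jordan", "topic": "basketball"},
    {"id": 3, "entity": "Serena Williams", "topic": "tennis"},
]

groups = defaultdict(list)
for c in contexts:
    # Same mention + same disambiguating signal -> same group.
    groups[(c["entity"], c["topic"])].append(c["id"])

print(sorted(groups.values()))  # [[0, 2], [1], [3]]
```

Answering within each group independently is what lets a reader model return both valid answers for an ambiguous name instead of collapsing them into one.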
Memory-augmented Query Reconstruction for LLM-based Knowledge Graph Reasoning
Xu, Mufan, Liang, Gewen, Chen, Kehai, Wang, Wei, Zhou, Xun, Yang, Muyun, Zhao, Tiejun, Zhang, Min
Large language models (LLMs) have achieved remarkable performance on knowledge graph question answering (KGQA) tasks by planning and interacting with knowledge graphs. However, existing methods often conflate tool utilization with knowledge reasoning, harming the readability of model outputs and giving rise to hallucinatory tool invocations, which hinder the advancement of KGQA. To address this issue, we propose Memory-augmented Query Reconstruction for LLM-based Knowledge Graph Reasoning (MemQ), which decouples the LLM from tool invocation tasks using an LLM-built query memory. By establishing a memory module with explicit descriptions of query statements, MemQ facilitates the KGQA process through natural language reasoning and memory-augmented query reconstruction. We also design an effective and readable reasoning strategy to enhance the LLM's reasoning capability in KGQA. Experimental results show that MemQ achieves state-of-the-art performance on the widely used benchmarks WebQSP and CWQ.
- North America > United States (0.46)
- Europe > Belgium (0.14)
- Asia > Thailand (0.14)
- Asia > China (0.14)
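The memory module described in the MemQ abstract can be pictured as a store that maps natural-language descriptions to query templates, so the reasoning step stays in natural language and the final query is reconstructed rather than invented on the fly. The template strings, slot names, and retrieval-by-exact-description below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical query memory: each entry pairs a natural-language
# description with a parameterized query statement.
query_memory = {
    "find entities related to X by relation R":
        "SELECT ?y WHERE {{ <{x}> <{r}> ?y }}",
    "find relations of entity X":
        "SELECT ?r WHERE {{ <{x}> ?r ?o }}",
}

def reconstruct(description: str, **slots) -> str:
    """Retrieve the template matching the description and fill its slots."""
    template = query_memory[description]
    return template.format(**slots)

# The reasoning step produces a description plus slot values; the memory
# turns that into an executable query.
q = reconstruct("find entities related to X by relation R",
                x="Barack_Obama", r="spouse")
print(q)  # SELECT ?y WHERE { <Barack_Obama> <spouse> ?y }
```

Because the query text comes from memory rather than free generation, a malformed or hallucinated tool call cannot arise at this step, which is the decoupling the abstract argues for.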
The study of short texts in digital politics: Document aggregation for topic modeling
Nakka, Nitheesha, Yalcin, Omer F., Desmarais, Bruce A., Rajtmajer, Sarah, Monroe, Burt
Statistical topic modeling is widely used in political science to study text. Researchers examine documents of varying lengths, from tweets to speeches. There is ongoing debate on how document length affects the interpretability of topic models. We investigate the effects of aggregating short documents into larger ones based on natural units that partition the corpus. In our study, we analyze one million tweets by U.S. state legislators from April 2016 to September 2020. We find that for documents aggregated at the account level, topics are more associated with individual states than when using individual tweets. This finding is replicated with Wikipedia pages aggregated by birth cities, showing how document definitions can impact topic modeling results.
- North America > United States > Florida (0.93)
- North America > United States > Texas (0.46)
- North America > United States > Louisiana (0.46)
- (35 more...)
- Media > News (1.00)
- Media > Music (1.00)
- Leisure & Entertainment > Sports > Soccer (1.00)
- (15 more...)
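The aggregation manipulation studied above is simple to state in code: short documents are merged into one pseudo-document per natural unit (here, the posting account) before any topic model is fit. The accounts and tweet texts below are invented; the study's corpus is one million U.S. state legislator tweets.

```python
from collections import defaultdict

# Toy corpus of (account, tweet) pairs.
tweets = [
    ("@legislatorA", "school funding bill passes committee"),
    ("@legislatorA", "proud of our teachers today"),
    ("@legislatorB", "highway repair budget approved"),
]

# Account-level aggregation: concatenate each account's tweets into a
# single document, which is then fed to the topic model in place of the
# individual tweets.
by_account = defaultdict(list)
for account, text in tweets:
    by_account[account].append(text)

docs = {account: " ".join(texts) for account, texts in by_account.items()}
print(len(docs))  # 2 pseudo-documents instead of 3 tweets
```

The study's finding is that this change of document definition alone shifts which topics the model recovers, so the aggregation unit is a modeling choice, not a preprocessing detail.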
\llinstruct: An Instruction-tuned model for English Language Proficiency Assessments
We present \llinstruct: an 8B instruction-tuned model designed to generate content for English Language Proficiency Assessments (ELPA) and related applications. Our work involves creating a new dataset of 70K instructions and explanations in the ELPA domain and using subsets of it to fine-tune Llama-3 8B models (SFT-17K, SFT-50K, and SFT-70K). Human evaluations over unseen instructions compare these SFT models against SOTA models (Dolly-2, Mistral, the Llama-3 base model, and GPT-3.5). The findings show that although all three SFT models perform comparably, the model trained on the largest instruction set, SFT-70K, produces the most valid outputs ready for assessments. However, while the SFT models outperform larger models such as GPT-3.5 at explaining their outputs, many outputs still require human intervention to be truly ready for real-world assessments.
- Oceania > Australia (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (5 more...)
- Leisure & Entertainment > Sports (1.00)
- Education > Curriculum > Subject-Specific Education (0.71)
Wimbledon to replace tennis line judges with electronic system from 2025
Wimbledon will break with tradition and replace line judges with electronic line calling from next year's championships, the All England Club confirmed. The sight of immaculately dressed line judges standing or crouching at the side and back of the grass courts has been a feature at the Grand Slam for 147 years. Electronic line calling was first used as an experiment at the ATP Next Gen Finals in Milan in 2017 and was adopted more widely during the COVID-19 pandemic. It will be used on all courts across ATP Tour events from 2025. The Australian Open and US Open have already replaced line judges with electronic calling although the French Open still relies on the human eye.
Generating Tables from the Parametric Knowledge of Language Models
Berkovitch, Yevgeni, Glickman, Oren, Somech, Amit, Wolfson, Tomer
We explore generating factual and accurate tables from the parametric knowledge of large language models (LLMs). While LLMs have demonstrated impressive capabilities in recreating knowledge bases and generating free-form text, we focus on generating structured tabular data, which is crucial in domains like finance and healthcare. We examine the table generation abilities of four state-of-the-art LLMs: GPT-3.5, GPT-4, Llama2-13B, and Llama2-70B, using three prompting methods for table generation: (a) full-table, (b) row-by-row, and (c) cell-by-cell. For evaluation, we introduce a novel benchmark, WikiTabGen, which contains 100 curated Wikipedia tables. Tables are further processed to ensure their factual correctness and manually annotated with short natural language descriptions. Our findings reveal that table generation remains a challenge, with GPT-4 reaching the highest accuracy at 19.6%. Our detailed analysis sheds light on how various table properties, such as size, table popularity, and numerical content, influence generation performance. This work highlights the unique challenges in LLM-based table generation and provides a solid evaluation framework for future research. Our code, prompts and data are all publicly available: https://github.com/analysis-bots/WikiTabGen
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (10 more...)
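The three prompting granularities named in the abstract (full-table, row-by-row, cell-by-cell) differ in how many model calls they make per table. The prompt wording below is invented for illustration, not the paper's actual prompts; only the call structure is the point.

```python
# One call for the whole table.
def full_table_prompt(desc: str) -> str:
    return f"Generate the complete table: {desc}"

# One call per row.
def row_prompts(desc: str, n_rows: int) -> list[str]:
    return [f"{desc}\nGenerate row {i + 1} of the table."
            for i in range(n_rows)]

# One call per cell: the cost grows with rows x columns, but each call
# asks for a single value, which can be easier to verify.
def cell_prompts(desc: str, rows: list[str], cols: list[str]) -> list[str]:
    return [f"{desc}\nGenerate the value for row '{r}', column '{c}'."
            for r in rows for c in cols]

desc = "Olympic host cities, with columns Year and City"
print(len(row_prompts(desc, 3)))                            # 3 calls
print(len(cell_prompts(desc, ["1996", "2000"], ["Year", "City"])))  # 4 calls
```

The trade-off the paper measures is between coherence (a full-table call sees all its own output) and cost/controllability (cell-by-cell makes many small, checkable requests).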
Aligning Language Models to Explicitly Handle Ambiguity
Kim, Hyuhng Joon, Kim, Youna, Park, Cheonbok, Kim, Junyeob, Park, Choonghyun, Yoo, Kang Min, Lee, Sang-goo, Kim, Taeuk
In interactions between users and language model agents, user utterances frequently exhibit ellipsis (omission of words or phrases) or imprecision (lack of exactness) to prioritize efficiency. This can lead to varying interpretations of the same input based on different assumptions or background knowledge. It is thus crucial for agents to adeptly handle the inherent ambiguity in queries to ensure reliability. However, even state-of-the-art large language models (LLMs) still face challenges in such scenarios, primarily due to the following hurdles: (1) LLMs are not explicitly trained to deal with ambiguous utterances; (2) the degree of ambiguity perceived by the LLMs may vary depending on the possessed knowledge. To address these issues, we propose Alignment with Perceived Ambiguity (APA), a novel pipeline that aligns LLMs to manage ambiguous queries by leveraging their own assessment of ambiguity (i.e., perceived ambiguity). Experimental results on question-answering datasets demonstrate that APA empowers LLMs to explicitly detect and manage ambiguous queries while retaining the ability to answer clear questions. Furthermore, our finding proves that APA excels beyond training with gold-standard labels, especially in out-of-distribution scenarios.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > India (0.14)
- Asia > Singapore (0.04)
- (11 more...)
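The APA pipeline's control flow, score the query's perceived ambiguity, then either answer or ask for clarification, can be sketched with a stub in place of the model. The keyword heuristic and threshold below are placeholders for the LLM's own ambiguity assessment, which is the actual signal the paper aligns against.

```python
# Stub scorer: fraction of vague words stands in for the model's
# self-assessed ambiguity. Purely illustrative.
def perceived_ambiguity(query: str) -> float:
    vague = {"it", "that", "there", "best"}
    words = query.lower().split()
    return sum(w in vague for w in words) / max(len(words), 1)

def respond(query: str, threshold: float = 0.2) -> str:
    # Above the threshold, the agent asks for clarification instead of
    # guessing one interpretation; below it, the agent answers directly.
    if perceived_ambiguity(query) >= threshold:
        return "Could you clarify what you mean?"
    return f"Answering: {query}"

print(respond("What is the best there?"))
print(respond("What year did Wimbledon adopt electronic line calling?"))
```

The property the experiments test is exactly this split: ambiguous queries get detected and deferred, while clear questions are still answered directly.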