firework
Twelve killed in China fireworks shop blast during Lunar New Year
An explosion at a fireworks shop in central China's Hubei province has killed at least 12 people, state media reported, marking the second deadly blast linked to fireworks as the country celebrates the Lunar New Year. The explosion tore through the shop in Xiangyang on Wednesday afternoon. Officials said five children and seven adults died in the explosion. The victims included the shop owner and customers who had been buying fireworks for holiday celebrations. Some had travelled from other areas to visit relatives during the festive period.
- North America > United States (0.53)
- South America (0.42)
- North America > Central America (0.42)
- (9 more...)
- Government (0.54)
- Media (0.34)
A Property Proofs
Section 3, which are analogous to those of [35]: Proposition. Hanabi is a cooperative card game that can be played with 2 to 5 people. In Hanabi, players can see all other players' hands but not their own. Hanabi (花火) means 'fireworks' in Japanese. Hint - The active agent chooses another player to grant a hint to.
STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models
Chiang, Cheng-Han, Wang, Xiaofei, Li, Linjie, Lin, Chung-Ching, Lin, Kevin, Liu, Shujie, Wang, Zhendong, Yang, Zhengyuan, Lee, Hung-yi, Wang, Lijuan
Spoken Language Models (SLMs) are designed to take speech inputs and produce spoken responses. However, current SLMs lack the ability to perform an internal, unspoken thinking process before responding. In contrast, humans typically engage in complex mental reasoning internally, enabling them to communicate ideas clearly and concisely. Thus, integrating an unspoken thought process into SLMs is highly desirable. While naively generating a complete chain-of-thought (CoT) reasoning before starting to talk can enable thinking for SLMs, this induces additional latency for the speech response, as the CoT reasoning can be arbitrarily long. To solve this issue, we propose Stitch, a novel generation method that alternates between the generation of unspoken reasoning chunks and spoken response chunks. Since the audio duration of a chunk of spoken response is much longer than the time to generate the tokens in a chunk of spoken response, we use the remaining free time to generate the unspoken reasoning tokens. When a chunk of audio is played to the user, the model continues to generate the next unspoken reasoning chunk, achieving simultaneous thinking and talking. Remarkably, Stitch matches the latency of baselines that cannot generate unspoken CoT by design while outperforming those baselines by 15% on math reasoning datasets; Stitch also performs equally well on non-reasoning datasets as those baseline models. Some animations and demonstrations are on the project page: https://d223302.github.io/STITCH.
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (5 more...)
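The latency argument in the STITCH abstract above can be sketched with back-of-envelope arithmetic: while one chunk of audio plays, decoding the next speech chunk takes far less time than the playback, and the slack can be spent on unspoken reasoning tokens. All numbers below (chunk size, audio duration, decoding speed) are illustrative assumptions, not figures from the paper.

```python
from dataclasses import dataclass

@dataclass
class ChunkBudget:
    """Illustrative numbers only; real chunk sizes and decoding
    throughput are model- and hardware-dependent."""
    speech_tokens: int = 25        # tokens in one spoken-response chunk
    audio_seconds: float = 2.0     # playback duration of that chunk's audio
    tokens_per_second: float = 50  # model decoding throughput

def free_reasoning_tokens(b: ChunkBudget) -> int:
    """While one chunk's audio plays, the model must generate the *next*
    speech chunk; whatever playback time is left over can be spent
    decoding unspoken reasoning tokens -- the core observation behind
    simultaneous thinking and talking."""
    time_for_next_speech_chunk = b.speech_tokens / b.tokens_per_second
    slack = max(0.0, b.audio_seconds - time_for_next_speech_chunk)
    return int(slack * b.tokens_per_second)

def total_free_reasoning(n_chunks: int, b: ChunkBudget) -> int:
    """Reasoning tokens available across a whole response at no extra
    latency cost relative to a baseline that never thinks."""
    return n_chunks * free_reasoning_tokens(b)

budget = ChunkBudget()
per_chunk = free_reasoning_tokens(budget)  # 1.5 s of slack -> 75 tokens
```

With these assumed numbers, generating the next 25-token speech chunk takes 0.5 s of a 2.0 s playback window, leaving room for 75 reasoning tokens per chunk; a full CoT generated up front would instead delay the first audio by its entire length.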
WhisperKit: On-device Real-time ASR with Billion-Scale Transformers
Orhon, Atila, Okan, Arda, Durmus, Berkin, Nagengast, Zach, Pacheco, Eduardo
Real-time Automatic Speech Recognition (ASR) is a fundamental building block for many commercial applications of ML, including live captioning, dictation, meeting transcriptions, and medical scribes. Accuracy and latency are the most important factors when companies select a system to deploy. We present WhisperKit, an optimized on-device inference system for real-time ASR that significantly outperforms leading cloud-based systems. We benchmark against server-side systems that deploy a diverse set of models, including a frontier model (OpenAI gpt-4o-transcribe), a proprietary model (Deepgram nova-3), and an open-source model (Fireworks large-v3-turbo). Our results show that WhisperKit matches the lowest latency at 0.46s while achieving the highest accuracy (2.2% WER). The optimizations behind the WhisperKit system are described in detail in this paper.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada (0.04)
No more fireworks? Big change coming to 4th of July at Pasadena's Rose Bowl
Marking the end of a longtime tradition, the Fourth of July celebration at the Rose Bowl in Pasadena will not feature a fireworks show this year. Instead, there will be a drone show. The move comes as some venues have switched from fireworks to drone shows -- in which a fleet of drones performs a choreographed light show -- to celebrate the 4th of July. But drone shows have fallen flat for some. Notably Redondo Beach and Laguna Beach switched back to fireworks after trying out drone shows, and some promoters of fireworks shows have voiced criticism over efforts to transition to drone shows.
- North America > United States > California > Los Angeles County > Redondo Beach (0.26)
- North America > United States > California > San Diego County > San Diego (0.07)
- North America > United States > California > San Francisco County > San Francisco (0.05)
Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs
Kim, Jaemin, Chang, Hangeol, Hwang, Hyunmin, Kim, Choonghan, Ye, Jong Chul
Large Language Models (LLMs) have demonstrated remarkable general capabilities, but enhancing skills such as reasoning often demands substantial computational resources and may compromise their generalization. While Parameter-Efficient Fine-Tuning (PEFT) methods offer a more resource-conscious alternative, they typically require retraining for each LLM backbone due to architectural dependencies. To address these challenges, here we propose Universal Reasoner (UniR) - a single, lightweight, composable, and plug-and-play reasoning module that can be used with any frozen LLM to endow it with specialized reasoning capabilities. Specifically, UniR decomposes the reward into a standalone reasoning module that is trained independently using predefined rewards, effectively translating trajectory-level signals into token-level guidance. Once trained, UniR can be combined with any frozen LLM at inference time by simply adding its output logits to those of the LLM backbone. This additive structure naturally enables modular composition: multiple UniR modules trained for different tasks can be jointly applied by summing their logits, enabling complex reasoning via composition. Experimental results on mathematical reasoning and machine translation tasks show that UniR significantly outperforms existing baseline fine-tuning methods using the Llama3.2 model. Furthermore, UniR demonstrates strong weak-to-strong generalization: reasoning modules trained on smaller models effectively guide much larger LLMs. This makes UniR a cost-efficient, adaptable, and robust solution for enhancing reasoning in LLMs without compromising their core capabilities. Code is open-sourced at https://github.com/hangeol/UniR
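The additive composition described in the abstract above is simple enough to sketch directly: at each decoding step, the module's logits are added to the frozen backbone's logits before sampling. The following minimal NumPy illustration uses a toy 4-token vocabulary with invented logit values; it shows only the logit-summing mechanism, not the reward-based training of the module itself.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def compose_logits(backbone_logits, module_logits_list, weights=None):
    """Combine a frozen backbone's next-token logits with one or more
    reasoning-module logits by simple addition, as the abstract describes.
    `weights` (optional) scales each module's contribution."""
    combined = np.asarray(backbone_logits, dtype=float).copy()
    if weights is None:
        weights = [1.0] * len(module_logits_list)
    for w, m in zip(weights, module_logits_list):
        combined += w * np.asarray(m, dtype=float)
    return combined

# Toy vocabulary of 4 tokens: the backbone slightly prefers token 0,
# while a (hypothetical) math-reasoning module strongly prefers token 2.
backbone = np.array([2.0, 1.0, 1.5, 0.5])
math_module = np.array([0.0, 0.0, 3.0, 0.0])

probs = softmax(compose_logits(backbone, [math_module]))
```

Because composition happens purely in logit space, multiple modules can be stacked by extending `module_logits_list`, and the backbone's weights are never touched.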
Model Equality Testing: Which Model Is This API Serving?
Gao, Irena, Liang, Percy, Guestrin, Carlos
Users often interact with large language models through black-box inference APIs, both for closed- and open-weight models (e.g., Llama models are popularly accessed via Amazon Bedrock and Azure AI Studio). In order to cut costs or add functionality, API providers may quantize, watermark, or finetune the underlying model, changing the output distribution -- often without notifying users. We formalize detecting such distortions as Model Equality Testing, a two-sample testing problem, where the user collects samples from the API and a reference distribution and conducts a statistical test to see if the two distributions are the same. We find that tests based on the Maximum Mean Discrepancy between distributions are powerful for this task: a test built on a simple string kernel achieves a median of 77.4% power against a range of distortions, using an average of just 10 samples per prompt. We then apply this test to commercial inference APIs for four Llama models, finding that 11 out of 31 endpoints serve different distributions than reference weights released by Meta.
- Africa > Middle East > Libya (0.14)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Greater London > London > City of London (0.04)
- (17 more...)
- Transportation (1.00)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- (7 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
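The Model Equality Testing abstract above describes a kernel two-sample test on API outputs. A minimal sketch of that idea follows, using a character n-gram cosine similarity as an assumed stand-in for the paper's string kernel, a biased MMD estimate, and a permutation test to calibrate the p-value; the sample strings are invented.

```python
import random
from collections import Counter

def ngram_kernel(s, t, n=3):
    """Cosine similarity between character n-gram count vectors.
    (An illustrative stand-in for the paper's string kernel.)"""
    a = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    b = Counter(t[i:i + n] for i in range(len(t) - n + 1))
    dot = sum(a[g] * b[g] for g in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def mmd2(xs, ys, kernel):
    """Biased estimate of squared Maximum Mean Discrepancy."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy

def permutation_test(xs, ys, kernel, n_perm=200, seed=0):
    """p-value: fraction of random relabelings whose MMD is at least
    as large as the observed one."""
    rng = random.Random(seed)
    observed = mmd2(xs, ys, kernel)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2(pooled[:len(xs)], pooled[len(xs):], kernel) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Invented samples: a 'reference' set, an API serving the same distribution,
# and an API whose outputs have drifted (e.g., after quantization).
ref = [f"fireworks show {i} lit up the harbor sky" for i in range(0, 12, 2)]
api_same = [f"fireworks show {i} lit up the harbor sky" for i in range(1, 12, 2)]
api_drift = [f"quantized model reply {i} reads very differently" for i in range(6)]

p_same = permutation_test(ref, api_same, ngram_kernel)
p_drift = permutation_test(ref, api_drift, ngram_kernel)
```

A small p-value flags the API's output distribution as different from the reference; in practice the paper draws multiple samples per prompt, which this toy example does not model.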
Drones carrying fireworks: why the world's most famous gunpowder artist is collaborating with AI
For decades, Cai Guo-Qiang has been the world's foremost fine artist of explosions. He is famous for his massive fireworks displays, from his glowing footsteps in the sky at the opening of the 2008 Beijing Olympics, to his 2015 Sky Ladder, a 1,650-foot flaming ladder to heaven featured in a Netflix documentary. Recently, the gunpowder artist has become obsessed with a new threatening technology: artificial intelligence. AI "brings me more anxiety, but also, freshness", the 66-year-old Chinese artist told me last week at the historic Los Angeles Memorial Coliseum, where he was preparing for his newest "explosion event", which would be the kickoff of a major arts festival opening in southern California this month. "It's similar to why I use gunpowder," Cai told me.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Asia > China > Beijing > Beijing (0.25)
LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments
Chen, Junzhe, Hu, Xuming, Liu, Shuodi, Huang, Shiyu, Tu, Wei-Wei, He, Zhaofeng, Wen, Lijie
Recent advancements in large language models (LLMs) have revealed their potential for achieving autonomous agents possessing human-level intelligence. However, existing benchmarks for evaluating LLM agents either use static datasets, potentially leading to data leakage, or focus only on single-agent scenarios, overlooking the complexities of multi-agent interactions. There is a lack of a benchmark that evaluates the diverse capabilities of LLM agents in multi-agent, dynamic environments. To this end, we introduce LLMArena, a novel and easily extensible framework for evaluating the diverse capabilities of LLMs in multi-agent dynamic environments. LLMArena encompasses seven distinct gaming environments, employing TrueSkill scoring to assess crucial abilities in LLM agents, including spatial reasoning, strategic planning, numerical reasoning, risk assessment, communication, opponent modeling, and team collaboration. We conduct an extensive experiment and human evaluation among different sizes and types of LLMs, showing that LLMs still have a significant journey ahead in their development towards becoming fully autonomous agents, especially in opponent modeling and team collaboration. We hope LLMArena could guide future research towards enhancing these capabilities in LLMs, ultimately leading to more sophisticated and practical applications in dynamic, multi-agent settings. The code and data will be available.
- Europe > Austria > Vienna (0.14)
- North America > United States > Texas (0.05)
- Europe > Middle East (0.04)
- (9 more...)
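The LLMArena abstract above scores agents from pairwise match outcomes with TrueSkill. The sketch below uses the simpler Elo update as an assumed stand-in (TrueSkill additionally tracks a per-agent uncertainty that Elo lacks); the agent names and match results are invented.

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One pairwise rating update using the Elo rule, a simplified
    stand-in for TrueSkill. score_a is 1.0 for a win by agent A,
    0.5 for a draw, and 0.0 for a loss."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

def run_league(results, start=1000.0):
    """Fold a sequence of (agent_a, agent_b, score_a) outcomes from
    repeated games into a rating per agent."""
    ratings = {}
    for a, b, s in results:
        ra = ratings.setdefault(a, start)
        rb = ratings.setdefault(b, start)
        ratings[a], ratings[b] = elo_update(ra, rb, s)
    return ratings

# Invented outcomes: agent_a wins twice, then draws.
matches = [
    ("agent_a", "agent_b", 1.0),
    ("agent_a", "agent_b", 1.0),
    ("agent_a", "agent_b", 0.5),
]
ratings = run_league(matches)
```

One property worth noting: each update moves the two ratings by equal and opposite amounts, so the total rating mass in the league is conserved, which is what makes the resulting numbers comparable across agents.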