AITopics

2508.05011

Country: Asia > China (0.68)

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Aug-8-2025

"Set It Up": Functional Object Arrangement with Compositional Generative Models (Journal Version)

Xu, Yiqing, Mao, Jiayuan, Li, Linfeng, Du, Yilun, Lozano-Pérez, Tomás, Kaelbling, Leslie Pack, Hsu, David

Functional object arrangement (FORM) is the task of arranging objects to fulfill a function, e.g., "set up a dining table for two". One key challenge here is that the instructions for FORM are often under-specified and do not explicitly specify the desired object goal poses. This paper presents SetItUp, a neuro-symbolic framework that learns to specify the goal poses of objects from a few training examples and a structured natural-language task specification. SetItUp uses a grounding graph, which is composed of abstract spatial relations among objects (e.g., left-of), as its intermediate representation. This decomposes the FORM problem into two stages: (i) predicting this graph among objects and (ii) predicting object poses given the grounding graph. For (i), SetItUp leverages large language models (LLMs) to induce Python programs from a task specification and a few training examples. These programs can be executed to generate grounding graphs in novel scenarios. For (ii), SetItUp pre-trains a collection of diffusion models to capture primitive spatial relations and composes these models online to predict object poses based on the grounding graph. We evaluated SetItUp on a dataset spanning three distinct task families: arranging tableware on a dining table, organizing items on a bookshelf, and laying out furniture in a bedroom. Experiments show that SetItUp outperforms existing models in generating functional, physically feasible, and aesthetically pleasing object arrangements. This article extends our conference paper published at Robotics: Science and Systems (RSS) 2024.
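
The two-stage decomposition can be illustrated with a toy Python sketch. It rests on assumptions that are not taken from the paper: each spatial relation is a hand-written quadratic energy (the hypothetical `left_of` and `in_front_of`) standing in for a pre-trained diffusion model; composition is modeled as summing per-edge energies and minimizing that sum by finite-difference gradient descent, a simplification rather than SetItUp's actual sampling procedure; and the grounding graph is written by hand instead of being produced by an LLM-induced program.

```python
from typing import Callable, Dict, List, Tuple

Pose = List[float]            # [x, y] on the table plane; units are arbitrary here
Edge = Tuple[str, str, str]   # (relation, subject object, reference object)


def left_of(a: Pose, b: Pose, gap: float = 0.3) -> float:
    """Zero energy when `a` sits `gap` units left of `b` on the same row."""
    return (b[0] - a[0] - gap) ** 2 + (a[1] - b[1]) ** 2


def in_front_of(a: Pose, b: Pose, gap: float = 0.2) -> float:
    """Zero energy when `a` sits `gap` units in front of `b` in the same column."""
    return (b[1] - a[1] - gap) ** 2 + (a[0] - b[0]) ** 2


# Hypothetical stand-ins for the learned per-relation models.
RELATIONS: Dict[str, Callable[[Pose, Pose], float]] = {
    "left-of": left_of,
    "in-front-of": in_front_of,
}


def total_energy(poses: Dict[str, Pose], graph: List[Edge]) -> float:
    # Composition: the whole arrangement's energy is the sum of per-edge terms.
    return sum(RELATIONS[rel](poses[a], poses[b]) for rel, a, b in graph)


def arrange(graph: List[Edge], init: Dict[str, Pose], fixed: set,
            steps: int = 500, lr: float = 0.1, eps: float = 1e-5) -> Dict[str, Pose]:
    """Minimize the composed energy over all non-fixed object poses."""
    poses = {name: list(p) for name, p in init.items()}
    for _ in range(steps):
        grads: Dict[str, List[float]] = {}
        for name, p in poses.items():
            if name in fixed:
                continue
            g = []
            for i in range(2):
                # Central finite difference along coordinate i.
                p[i] += eps
                up = total_energy(poses, graph)
                p[i] -= 2 * eps
                down = total_energy(poses, graph)
                p[i] += eps  # restore the coordinate
                g.append((up - down) / (2 * eps))
            grads[name] = g
        for name, g in grads.items():
            poses[name][0] -= lr * g[0]
            poses[name][1] -= lr * g[1]
    return poses


graph = [
    ("left-of", "fork", "plate"),
    ("left-of", "plate", "knife"),
    ("in-front-of", "plate", "cup"),
]
init = {name: [0.0, 0.0] for name in ("plate", "fork", "knife", "cup")}
result = arrange(graph, init, fixed={"plate"})
```

Here the plate is held fixed as an anchor, so the fork should settle about 0.3 units to its left, the knife 0.3 units to its right, and the cup 0.2 units behind it. Because each relation contributes an independent term, adding an edge to the graph only adds one more term to the objective, which is what makes composing per-relation models attractive for unseen scenes.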

large language model, machine learning, natural language

2508.02068

Country: North America > United States (0.27)

Genre: Research Report > New Finding (0.45)

Industry:

Media (0.67)
Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.92)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)

FOX News Aug-7-2025, 22:07:24 GMT

Forget SEO: How to get found by AI tools in 2025

Three years ago, I said Google was going the way of the dial-up modem. People called me crazy with a capital K. Well, I was spot on. ChatGPT now has over 180 million users and powers more than 800 million sessions each week. Google's own AI Overviews appear in over 60% of search results.

ai tool, chatgpt, forget seo

FOX News

Industry:

Information Technology (0.37)
Media > News (0.33)

Technology:

Information Technology > Communications > Social Media (0.36)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.33)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.33)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

FOX News Aug-7-2025, 10:00:54 GMT

$5,900 Unitree R1 robot is surprisingly affordable

Industries can rethink how work gets done, raising the bar for productivity and workplace safety. Unitree just dropped its latest creation, the R1 humanoid robot, and people are talking. At only $5,900, it's the most affordable bipedal robot we've seen so far. The low price has taken the tech world by surprise and kicked off a wave of excitement.

humanoid robot, ultimate scam survival guide, unitree

FOX News

Country: Asia > China (0.05)

Industry: Media > News (0.32)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Neural Information Processing Systems Aug-7-2025, 00:25:44 GMT

Supplementary Materials for GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

Small errors are inevitable in annotations.

dataset, please describe, please provide

Neural Information Processing Systems

Genre: Research Report (0.48)

Industry:

Law (0.69)
Media > Music (0.66)
Leisure & Entertainment (0.66)
Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.47)

Neural Information Processing Systems Aug-7-2025, 00:25:42 GMT

GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

Zhang, Yu

To tackle these problems, we present GTSinger, a large Global, multi-Technique, free-to-use, high-quality singing corpus with realistic music scores, designed for all singing tasks, along with its benchmarks. Particularly, (1) we collect 80.59 hours of high-quality singing voices, forming the largest recorded singing dataset; (2) 20 professional singers across nine widely spoken languages offer diverse timbres and styles; (3) we provide controlled comparison and phoneme-level annotations of six commonly used singing techniques, helping technique modeling and control; (4) GTSinger offers realistic music scores, assisting real-world musical composition; (5) singing

gtsinger, phoneme, realistic music score

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Japan > Honshū > Tōhoku (0.04)
Asia > China (0.04)

Genre: Research Report (0.95)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark

Wang, He, Ma, Linhan, Guo, Dake, Wang, Xiong, Xie, Lei, Xu, Jin, Lin, Junyang

Automatic Speech Recognition (ASR) has been extensively investigated, yet prior benchmarks have largely focused on assessing the acoustic robustness of ASR models, leaving evaluations of their linguistic capabilities relatively underexplored. This largely stems from the limited parameter sizes and training corpora of conventional ASR models, which leave them with insufficient world knowledge, a capability crucial for accurately recognizing named entities across diverse domains, for instance, drug and treatment names in medicine or specialized technical terms in engineering. Recent breakthroughs in Large Language Models (LLMs) and corresponding Large Audio Language Models (LALMs) have markedly enhanced the visibility of advanced context modeling and general artificial intelligence capabilities. Leveraging LLMs, we envision a unified system capable of robust speech recognition across diverse real-world domains, yet existing benchmarks are inadequate for evaluating this objective. To address this gap, we propose ContextASR-Bench: a comprehensive, large-scale benchmark designed to assess the linguistic competence of ASR systems using corpora that feature numerous named entities across multiple domains. It encompasses up to 40,000 data entries with more than 300,000 named entities across over 10 domains. Beyond the audio and its transcription, each sample provides the domain it belongs to and a list of named entities it contains, which together are referred to as the context. Based on this, we introduce three evaluation modes to assess how effectively models can exploit such context to improve ASR accuracy. Extensive evaluation on ContextASR-Bench shows that LALMs outperform conventional ASR models by a large margin thanks to the strong world knowledge and context modeling of LLMs, yet there remains ample room for further improvement. The dataset and evaluation code have been released.
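
As a rough illustration of scoring an ASR hypothesis against a sample's context, the sketch below computes word error rate together with a simple named-entity recall over the context entity list. The metrics and their definitions are assumptions for illustration (the excerpt does not specify which metrics ContextASR-Bench reports or how its three evaluation modes differ), entity matching is a case-insensitive substring check, and the example sentence and entities are hypothetical.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete reference word r
                         cur[j - 1] + 1,          # insert hypothesis word h
                         prev[j - 1] + (r != h))  # substitute, or match for free
        prev = cur
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: edit distance divided by the reference length."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


def entity_recall(hyp: str, entities: List[str]) -> float:
    """Fraction of context entities found (case-insensitively) in the hypothesis."""
    if not entities:
        return 1.0
    text = hyp.lower()
    return sum(e.lower() in text for e in entities) / len(entities)


ref = "take metformin with lisinopril daily"
hyp = "take met forming with lisinopril daily"
context = ["metformin", "lisinopril"]  # hypothetical entity list for one sample
print(wer(ref, hyp), entity_recall(hyp, context))  # 0.4 0.5
```

The example shows why word error rate alone can understate the problem the benchmark targets: the hypothesis incurs only two word errors out of five reference words, yet half of the context entities are lost.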

large language model, machine learning, natural language

2507.05727

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)
Oceania > Australia (0.67)

Genre: Research Report (0.50)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations

Liao, Huan, Ni, Qinke, Wang, Yuancheng, Lu, Yiheng, Zhan, Haoyue, Xie, Pengyuan, Zhang, Qiang, Wu, Zhizheng

Paralinguistic vocalizations, including non-verbal sounds like laughter and breathing as well as lexicalized interjections such as "uhm" and "oh", are integral to natural spoken communication. Despite their importance in conveying affect, intent, and interactional cues, they remain largely overlooked in conventional automatic speech recognition (ASR) and text-to-speech (TTS) systems. We present NVSpeech, an integrated and scalable pipeline that bridges the recognition and synthesis of paralinguistic vocalizations, encompassing dataset construction, ASR modeling, and controllable TTS. (1) We introduce a manually annotated dataset of 48,430 human-spoken utterances with 18 word-level paralinguistic categories. (2) We develop a paralinguistic-aware ASR model, which treats paralinguistic cues as inline decodable tokens (e.g., "You're so funny [Laughter]"), enabling joint lexical and non-verbal transcription. This model is then used to automatically annotate a large corpus, the first large-scale Chinese dataset of 174,179 utterances (573 hours) with word-level alignment and paralinguistic cues. (3) We finetune zero-shot TTS models on both human- and auto-labeled data to enable explicit control over paralinguistic vocalizations, allowing context-aware insertion at arbitrary token positions for human-like speech synthesis. By unifying the recognition and generation of paralinguistic vocalizations, NVSpeech offers the first open, large-scale, word-level annotated pipeline for expressive speech modeling in Mandarin, integrating recognition and synthesis in a scalable and controllable manner. Dataset and audio demos are available at https://nvspeech170k.github.io/.
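
Treating paralinguistic cues as inline tokens means a transcript can be split back into its lexical words plus a list of positioned tags, and a tag can be inserted at any word position for synthesis. The sketch below assumes space-separated tokens and tags written as `[Name]`, following the "You're so funny [Laughter]" example; the helper names `split_transcript` and `insert_tag` are hypothetical and not part of NVSpeech's released code.

```python
import re
from typing import List, Tuple

# Assumed tag format: a bracketed alphabetic label such as "[Laughter]".
TAG = re.compile(r"\[([A-Za-z]+)\]")


def split_transcript(text: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Separate lexical words from inline tags.

    Each tag is returned with the number of lexical words that precede it,
    i.e. the word position at which it was decoded.
    """
    words: List[str] = []
    tags: List[Tuple[int, str]] = []
    for token in text.split():
        match = TAG.fullmatch(token)
        if match:
            tags.append((len(words), match.group(1)))
        else:
            words.append(token)
    return " ".join(words), tags


def insert_tag(text: str, position: int, tag: str) -> str:
    """Insert `[tag]` before the word at `position`; len(words) appends it at the end."""
    words = text.split()
    words.insert(position, f"[{tag}]")
    return " ".join(words)


lexical, tags = split_transcript("You're so funny [Laughter]")
print(lexical, tags)  # You're so funny [(3, 'Laughter')]
```

The two helpers are inverses for a single tag, which mirrors the pipeline's pairing of recognition (decoding tags inline) with controllable synthesis (placing tags at chosen positions).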

large language model, machine learning, natural language

2508.04195

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (0.46)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)

Characterizing Deep Research: A Benchmark and Formal Definition

Java, Abhinav, Khandelwal, Ashmit, Midigeshi, Sukruta, Halfaker, Aaron, Deshpande, Amit, Goyal, Navin, Gupta, Ankur, Natarajan, Nagarajan, Sharma, Amit

Information tasks such as writing surveys or analytical reports require complex search and reasoning, and have recently been grouped under the umbrella of "deep research", a term also adopted by recent models targeting these capabilities. Despite growing interest, the scope of the deep research task remains underdefined and its distinction from other reasoning-intensive problems is poorly understood. In this paper, we propose a formal characterization of the deep research (DR) task and introduce a benchmark to evaluate the performance of DR systems. We argue that the core defining feature of deep research is not the production of lengthy report-style outputs, but rather the high fan-out over concepts required during the search process, i.e., broad and reasoning-intensive exploration. To enable objective evaluation, we define DR using an intermediate output representation that encodes key claims uncovered during search, separating the reasoning challenge from surface-level report generation. Based on this formulation, we propose LiveDRBench, a diverse benchmark with 100 challenging tasks over scientific topics (e.g., datasets, materials discovery, prior art search) and public interest events (e.g., flight incidents, movie awards). Across state-of-the-art DR systems, F1 scores range between 0.02 and 0.72 for any sub-category. OpenAI's model performs best, with an overall F1 score of 0.55. Analysis of reasoning traces reveals the distribution over the number of referenced sources, branching, and backtracking events executed by current DR systems, motivating future directions for improving their search mechanisms and grounding capabilities. The benchmark is available at https://github.com/microsoft/LiveDRBench.
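
To make the F1 figures concrete, the sketch below scores a set of predicted claims against gold claims with precision, recall and F1 over set overlap. The matching rule (exact match after lower-casing and collapsing whitespace) and the placeholder claim strings are assumptions; the excerpt does not describe how LiveDRBench decides that a predicted claim matches a gold one.

```python
from typing import Iterable, Tuple


def normalize(claim: str) -> str:
    """Lower-case and collapse whitespace so trivially different strings match."""
    return " ".join(claim.lower().split())


def claim_prf(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted claims against gold claims."""
    pred = {normalize(c) for c in predicted}
    ref = {normalize(c) for c in gold}
    tp = len(pred & ref)
    if tp == 0:
        # Also covers empty prediction or gold sets, avoiding division by zero.
        return 0.0, 0.0, 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = ["claim A", "claim B", "claim C", "claim D"]   # hypothetical gold claims
predicted = ["Claim A", "claim  B", "claim E"]        # two hits, one spurious
p, r, f1 = claim_prf(predicted, gold)
```

With two of three predictions correct and two of four gold claims found, precision is 2/3, recall is 1/2, and F1 is 4/7 (about 0.57). Scoring against an intermediate claim set in this way is what lets the benchmark separate search quality from how well a final report is written.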

information retrieval, large language model, machine learning

2508.04183

Country: North America > United States (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Government (0.67)
Leisure & Entertainment (0.46)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)

Latent Knowledge Scalpel: Precise and Massive Knowledge Editing for Large Language Models

Liu, Xin, Song, Qiyang, Xu, Shaowen, Zhou, Kerou, Jiang, Wenbo, Jia, Xiaoqi, Zhang, Weijuan, Huang, Heqing, Li, Yakai

Large Language Models (LLMs) often retain inaccurate or outdated information from pre-training, leading to incorrect predictions or biased outputs during inference. While existing model editing methods can address this challenge, they struggle to edit large amounts of factual information simultaneously and may compromise the general capabilities of the models. In this paper, our empirical study demonstrates that it is feasible to edit the internal representations of LLMs and replace the entities in a manner similar to editing natural language inputs. Based on this insight, we introduce the Latent Knowledge Scalpel (LKS), an LLM editor that manipulates the latent knowledge of specific entities via a lightweight hypernetwork to enable precise and large-scale editing. Experiments conducted on Llama-2 and Mistral show that even with the number of simultaneous edits reaching 10,000, LKS effectively performs knowledge editing while preserving the general abilities of the edited LLMs. Code is available at: https://github.com/Linuxin-xxx/LKS.
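
The idea of editing internal representations "in a manner similar to editing natural language inputs" can be illustrated with a toy, list-based sketch. Everything here is a hypothetical stand-in: hidden states are short Python lists rather than LLM activations, each entity occupies exactly one token, and the "hypernetwork" is a single linear map that produces a replacement vector for an edited entity. It is not LKS's architecture; it only shows that edits act at the entity's positions, leave other positions untouched, and can be held in one lookup table so that many edits apply at once.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def hypernetwork(entity_vec: Vector, W: Matrix, b: Vector) -> Vector:
    """Hypothetical linear hypernetwork: returns W @ entity_vec + b."""
    return [sum(w * x for w, x in zip(row, entity_vec)) + bias
            for row, bias in zip(W, b)]


def build_edits(entity_vecs: Dict[str, Vector], W: Matrix, b: Vector) -> Dict[str, Vector]:
    """Precompute one replacement vector per edited entity (a batch of edits)."""
    return {name: hypernetwork(vec, W, b) for name, vec in entity_vecs.items()}


def apply_edits(tokens: List[str], hidden: List[Vector],
                edits: Dict[str, Vector]) -> List[Vector]:
    """Swap in the edited vector wherever an edited entity appears; keep all others."""
    return [edits.get(tok, h) for tok, h in zip(tokens, hidden)]


tokens = ["Paris", "is", "in"]
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy per-token hidden states
W = [[0.0, 1.0], [1.0, 0.0]]                   # toy weights: swap the two coordinates
b = [0.0, 0.0]
edits = build_edits({"Paris": hidden[0]}, W, b)
edited = apply_edits(tokens, hidden, edits)
```

Only the "Paris" position changes (to `[0.0, 1.0]`); the states for "is" and "in" pass through unchanged, which is the locality property an entity-level editor aims to preserve.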

large language model, machine learning, natural language

2508.03741

Country:

North America > United States (0.68)
Europe (0.68)
North America > Canada > Quebec (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (1.00)
Media > Music (0.93)
Leisure & Entertainment > Sports > Soccer (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)