SABR: A Stable Adaptive Bitrate Framework Using Behavior Cloning Pretraining and Reinforcement Learning Fine-Tuning

arXiv.org Artificial Intelligence

With the advent of 5G, the internet has entered a new video-centric era. From short-video platforms like TikTok to long-video platforms like Bilibili, online video services are reshaping user consumption habits. Adaptive Bitrate (ABR) control is widely recognized as a critical factor influencing Quality of Experience (QoE), and learning-based ABR methods have recently attracted increasing attention. However, most of them rely on limited network trace sets during training and overlook the wide distribution of real-world network conditions, resulting in poor generalization in out-of-distribution (OOD) scenarios. To address this limitation, we propose SABR, a training framework that combines behavior cloning (BC) pretraining with reinforcement learning (RL) fine-tuning. We also introduce two benchmarks, ABRBench-3G and ABRBench-4G+, which provide wide-coverage training traces and dedicated OOD test sets for assessing robustness to unseen network conditions. Experimental results demonstrate that SABR achieves the best average rank compared with Pensieve, Comyco, and NetLLM across the proposed benchmarks. These results indicate that SABR enables more stable learning across wide distributions and improves generalization to unseen network conditions.
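The comparison above is reported as an average rank across benchmarks. As a minimal sketch of how such a ranking can be computed, assuming each method has one scalar score per benchmark where higher is better (for example, mean QoE) and that tied methods share the mean of their positions, the hypothetical helpers `rank_scores` and `average_ranks` below rank methods on every benchmark and then average those ranks. The function names and the tie rule are assumptions for illustration, not the paper's evaluation code.

```python
from statistics import mean


def rank_scores(scores: dict[str, float]) -> dict[str, float]:
    """Rank methods on one benchmark (1 = best; a higher score is better).

    Tied methods share the mean of the 1-based positions they occupy.
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks: dict[str, float] = {}
    i = 0
    while i < len(ordered):
        # Extend j over the run of methods tied with ordered[i].
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared
        i = j + 1
    return ranks


def average_ranks(results: dict[str, dict[str, float]]) -> dict[str, float]:
    """results[benchmark][method] -> score; returns method -> average rank."""
    per_method: dict[str, list[float]] = {}
    for scores in results.values():
        for method, r in rank_scores(scores).items():
            per_method.setdefault(method, []).append(r)
    return {m: mean(rs) for m, rs in per_method.items()}
```

Averaging ranks rather than raw scores keeps a benchmark whose QoE values happen to span a larger numeric range from dominating the overall comparison.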


Spectral and Rhythm Feature Performance Evaluation for Category and Class Level Audio Classification with Deep Convolutional Neural Networks

arXiv.org Artificial Intelligence

Alongside decision-tree and k-nearest-neighbours algorithms, deep convolutional neural networks (CNNs) are widely used to classify audio data in many domains such as music, speech or environmental sounds. To train a specific CNN, various spectral and rhythm features, such as mel-scaled spectrograms, mel-frequency cepstral coefficients (MFCC), cyclic tempograms, short-time Fourier transform (STFT) chromagrams, constant-Q transform (CQT) chromagrams and chroma energy normalized statistics (CENS) chromagrams, can be used as digital image input data for the neural network. The performance of these spectral and rhythm features for audio category-level as well as audio class-level classification is investigated in detail with a deep CNN and the ESC-50 dataset of 2,000 labeled environmental audio recordings, using an end-to-end deep learning pipeline. The evaluated multiclass classification metrics (accuracy, precision, recall and F1 score) clearly show that mel-scaled spectrograms and MFCC perform significantly better than the other spectral and rhythm features investigated in this research for audio classification tasks using deep CNNs.
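The four multiclass metrics named above can be computed from predicted and true labels alone. The abstract does not say how precision, recall and F1 are averaged over classes; the minimal sketch below assumes macro averaging (an unweighted mean over classes), and `multiclass_metrics` is a hypothetical helper rather than the authors' pipeline.

```python
def multiclass_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1.

    Classes are the union of true and predicted labels; a class with no
    predictions (or no true samples) contributes 0 to that metric.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        # One-vs-rest counts for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

ESC-50 groups its 50 classes into 5 major categories, so the same function covers both levels: pass category labels for category-level evaluation and class labels for class-level evaluation.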


Transplant Then Regenerate: A New Paradigm for Text Data Augmentation

arXiv.org Artificial Intelligence

Data augmentation is a critical technique in deep learning. Traditional methods such as back-translation typically focus on lexical-level rephrasing, which primarily produces variations with the same semantics. While large language models (LLMs) have enhanced text augmentation through their "knowledge emergence" capability, controlling the style and structure of their outputs remains challenging and requires meticulous prompt engineering. In this paper, we propose LMTransplant, a novel text augmentation paradigm leveraging LLMs. The core idea of LMTransplant is transplant-then-regenerate: incorporating the seed text into a context expanded by the LLM, and then asking the LLM to regenerate a variant based on the expanded context. This strategy allows the model to create more diverse and creative content-level variants by fully leveraging the knowledge embedded in LLMs, while preserving the core attributes of the original text. We evaluate LMTransplant across various text-related tasks, demonstrating its superior performance over existing text augmentation methods. Moreover, LMTransplant demonstrates exceptional scalability as the amount of augmented data grows.
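The two-step transplant-then-regenerate idea can be sketched as two rounds of prompting. This is a minimal sketch under stated assumptions: the `LLM` callable, the prompt wording, and the choice to withhold the seed during regeneration are all hypothetical, since the abstract does not give the actual prompts or say exactly what the model sees in the second step.

```python
from typing import Callable

# Hypothetical LLM interface: takes a prompt string, returns generated text.
LLM = Callable[[str], str]


def transplant(seed: str, llm: LLM) -> tuple[str, str]:
    """Step 1 (transplant): have the LLM write context around the seed text."""
    before = llm(
        "Write a short passage that could naturally come right BEFORE this text:\n"
        + seed
    )
    after = llm(
        "Write a short passage that could naturally come right AFTER this text:\n"
        + seed
    )
    return before, after


def regenerate(before: str, after: str, llm: LLM) -> str:
    """Step 2 (regenerate): ask for a new middle that fits the expanded context."""
    prompt = (
        "Fill in the missing middle so the passage reads coherently.\n"
        f"BEFORE: {before}\n"
        "MIDDLE: [missing]\n"
        f"AFTER: {after}"
    )
    return llm(prompt)


def transplant_then_regenerate(seed: str, llm: LLM) -> str:
    before, after = transplant(seed, llm)
    # In this sketch the seed is withheld here, so the LLM must write a new
    # variant constrained only by the context that was built around it.
    return regenerate(before, after, llm)
```

Because the surrounding context was written to fit the seed, it implicitly carries the seed's topic and attributes, which is one plausible way the regenerated variant can stay on-label while differing in content.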


Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation

arXiv.org Artificial Intelligence

Large language models (LLMs) combined with Retrieval-Augmented Generation (RAG) have demonstrated powerful capabilities in generating counterspeech against misinformation. However, current studies rely on limited evidence and offer little control over the final outputs. To address these challenges, we propose a Multi-agent Retrieval-Augmented Framework to generate counterspeech against health misinformation, incorporating multiple LLMs to optimize knowledge retrieval, evidence enhancement, and response refinement. Our approach integrates both static and dynamic evidence, ensuring that the generated counterspeech is relevant, well-grounded, and up-to-date. Our method outperforms baseline approaches in politeness, relevance, informativeness, and factual accuracy, demonstrating its effectiveness in generating high-quality counterspeech. To further validate our approach, we conduct ablation studies to verify the necessity of each component in our framework. Furthermore, cross-evaluations show that our system generalizes well across diverse health misinformation topics and datasets. Finally, human evaluations reveal that refinement significantly enhances counterspeech quality and is preferred by human evaluators.
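The framework's stages (retrieval over static and dynamic evidence, evidence enhancement, response generation, refinement) can be pictured as a chain of agents. The minimal sketch below is not the authors' system: each agent is a hypothetical `prompt -> text` callable, `search` stands in for a dynamic source such as a live web search, and the word-overlap retriever is a toy placeholder for a real retriever over a curated corpus.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical agent interface: each agent wraps an LLM call as prompt -> text.
Agent = Callable[[str], str]


@dataclass
class Evidence:
    source: str  # "static" (curated corpus) or "dynamic" (fresh search results)
    text: str


def retrieve_static(claim: str, corpus: list[str], k: int = 2) -> list[Evidence]:
    """Toy lexical retriever: rank passages by word overlap with the claim."""
    words = set(claim.lower().split())
    scored = sorted(
        corpus, key=lambda p: len(words & set(p.lower().split())), reverse=True
    )
    return [Evidence("static", p) for p in scored[:k]]


def counterspeech_pipeline(
    claim: str,
    corpus: list[str],
    search: Callable[[str], list[str]],
    enhancer: Agent,
    writer: Agent,
    refiner: Agent,
) -> str:
    # Pool static and dynamic evidence, tagging each item with its origin.
    evidence = retrieve_static(claim, corpus)
    evidence += [Evidence("dynamic", t) for t in search(claim)]
    block = "\n".join(f"[{e.source}] {e.text}" for e in evidence)
    # Evidence-enhancement agent condenses the pooled evidence.
    summary = enhancer(f"Summarize evidence relevant to: {claim}\n{block}")
    # Writer agent drafts a response grounded in the summary.
    draft = writer(f"Write polite, factual counterspeech to: {claim}\nEvidence: {summary}")
    # Refinement agent revises the draft before it is returned.
    return refiner(f"Improve politeness, relevance and accuracy:\n{draft}")
```

Separating the agents keeps each prompt narrow, which is one way to regain the control over final outputs that the abstract says single-pass RAG lacks.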


Assessing LLMs in Art Contexts: Critique Generation and Theory of Mind Evaluation

arXiv.org Artificial Intelligence

This study explored how large language models (LLMs) perform in two areas related to art: writing critiques of artworks and reasoning about mental states (Theory of Mind, or ToM) in art-related situations. For the critique generation part, we built a system that combines Noel Carroll's evaluative framework with a broad selection of art criticism theories. The model was prompted to first write a full-length critique and then shorter, more coherent versions using a step-by-step prompting process. These AI-generated critiques were then compared with those written by human experts in a Turing test-style evaluation. In many cases, human subjects had difficulty telling which was which, and the results suggest that LLMs can produce critiques that are not only plausible in style but also rich in interpretation, as long as they are carefully guided. In the second part, we introduced new, simple ToM tasks based on situations involving interpretation, emotion, and moral tension, which can arise in the context of art. These go beyond standard false-belief tests and allow for more complex, socially embedded forms of reasoning. We tested 41 recent LLMs and found that their performance varied across tasks and models. In particular, tasks that involved affective or ambiguous situations tended to reveal clearer differences. Taken together, these results help clarify how LLMs respond to complex interpretative challenges, revealing both their cognitive limitations and their potential. While our findings do not directly contradict the so-called Generative AI Paradox (the idea that LLMs can produce expert-like output without genuine understanding), they suggest that, depending on how LLMs are instructed, for example through carefully designed prompts, these models may begin to show behaviors that resemble understanding more closely than we might assume.
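"Difficulty telling which was which" amounts to identification accuracy near chance. One common way to check that, assuming each judgment is an independent binary guess (human-written vs. AI-written) with a 50% chance rate, is an exact binomial test; this is a minimal sketch of that standard test, not the analysis reported in the study, and `binomial_two_sided_p` is a hypothetical helper.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k correct identifications out of n at chance (p = 0.5).

    The null distribution is symmetric, so the two-sided p-value is twice the
    smaller tail probability, capped at 1.
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    tail = min(k, n - k)
    lower = sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, 2 * lower)
```

A large p-value means the observed accuracy is consistent with guessing; note that it does not by itself prove the two sources are indistinguishable, only that the test found no evidence of a difference.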


Tool-as-Interface: Learning Robot Policies from Observing Human Tool Use

arXiv.org Artificial Intelligence

Tool use is essential for enabling robots to perform complex real-world tasks, but learning such skills requires extensive datasets. While teleoperation is widely used, it is slow, delay-sensitive, and poorly suited for dynamic tasks. In contrast, human videos provide a natural way to collect data without specialized hardware, though they pose challenges for robot learning due to viewpoint variations and embodiment gaps. To address these challenges, we propose a framework that transfers tool-use knowledge from humans to robots. To improve the policy's robustness to viewpoint variations, we use two RGB cameras to reconstruct 3D scenes and apply Gaussian splatting for novel view synthesis. We reduce the embodiment gap using segmented observations and tool-centric, task-space actions to achieve embodiment-invariant visuomotor policy learning. We demonstrate our framework's effectiveness across a diverse suite of tool-use tasks, where our learned policy shows strong generalization and robustness to human perturbations, camera motion, and robot base movement. Our method achieves a 71% improvement in task success over teleoperation-based diffusion policies and dramatically reduces data collection time by 77% and 41% compared to teleoperation and the state-of-the-art interface, respectively.
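Tool-centric, task-space actions make the policy output independent of who holds the tool; only the final conversion to a robot command depends on the embodiment. A minimal sketch, assuming the policy predicts the tool's pose in the world frame as a 4x4 homogeneous transform and that each robot has a fixed, known grasp transform from its end-effector to the tool (both assumptions; all names are hypothetical), recovers the end-effector target as world_T_ee = world_T_tool · inv(ee_T_tool).

```python
Mat4 = list[list[float]]


def matmul4(a: Mat4, b: Mat4) -> Mat4:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(t: Mat4) -> Mat4:
    """Invert a rigid transform [R p; 0 1] as [R^T, -R^T p; 0 1]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]  # R transposed
    p = [t[i][3] for i in range(3)]
    neg = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[0] + [neg[0]], r_t[1] + [neg[1]], r_t[2] + [neg[2]], [0.0, 0.0, 0.0, 1.0]]


def ee_target_from_tool_action(world_T_tool: Mat4, ee_T_tool: Mat4) -> Mat4:
    """Convert a tool-pose action into an end-effector target for one robot.

    world_T_tool: tool pose predicted by the (embodiment-agnostic) policy.
    ee_T_tool: fixed, robot-specific grasp transform from end-effector to tool.
    """
    return matmul4(world_T_tool, invert_rigid(ee_T_tool))
```

Swapping robots in this sketch only changes `ee_T_tool`; the learned policy and its tool-pose outputs stay the same.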


130-year-old butter bacteria discovered in Danish basement

Popular Science

For over a century, simple lactic acid bacteria have been among the most reliable additives for keeping food and drinks safe. They go into butter, cheese, and other dairy products to help extend their shelf life. Now, a team in Denmark has uncovered some of the preservation aid's earliest examples. Their findings, published in the, only come after a chance discovery hidden away in the bowels of a university basement.


Looted mummy skull mislabeled for over a century

Popular Science

The roughly 350-year-old elongated skull is not Incan at all. While re-evaluating a centuries-old, elongated skull originally theorized to belong to an Incan "tribal chief," a team of archaeologists in Switzerland came across a startling discovery. The mummified cranium instead likely belonged to a respected member of a separate indigenous community that lived as subjects within the Incan empire. Their findings are published in the and are now helping recontextualize the remains while also addressing the field's longstanding ethical issues.


USA Today Enters Its Gen AI Era With a Chatbot

WIRED

DeeperDive, a new tool that converses with readers, is an effort to beat the AI industry at its own game. The publishing company behind USA Today and 220 other publications is today rolling out a chatbot-like tool called DeeperDive that can converse with readers, summarize insights from its journalism, and suggest new content from across its sites. "Visitors now have a trusted AI answer engine on our platform for anything they want to engage with, anything they want to ask," Mike Reed, CEO of Gannett and the USA Today Network, said at the WIRED AI Power Summit in New York, an event that brought together voices from the tech industry, politics, and the world of media. "and it is performing really great." Most publishers have a fraught relationship with AI, as chatbots trained on their content are now summarizing it and eating into the traffic that search engines used to send them.


The Download: computing's bright young minds, and cleaning up satellite streaks

MIT Technology Review

Each year, MIT Technology Review honors 35 outstanding people under the age of 35 who are driving scientific progress and solving tough problems in their fields. Today we want to introduce you to the computing innovators on the list who are coming up with new AI chips and specialized datasets, along with smart ideas about how to assess advanced systems for safety. Earlier this year, the $800 million Vera Rubin Observatory commenced its decade-long quest to create an extremely detailed time-lapse movie of the universe. Rubin is capable of capturing many more stars than any other astronomical observatory ever built; it also sees many more satellites. Up to 40% of images captured by the observatory within its first 10 years of operation will be marred by the satellites' sunlight-reflecting streaks. Meredith Rawls, a research scientist at the telescope's flagship observation project, Vera Rubin's Legacy Survey of Space and Time, is one of the experts tasked with protecting Rubin's science mission from the satellite blight.