Technology
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
This paper presents a computational model for universal video temporal grounding, which accurately localizes temporal moments in videos based on natural language queries (e.g., questions or descriptions). Unlike existing methods that are often limited to specific video domains or durations, we propose UniTime, a robust and universal video grounding model leveraging the strong vision-language understanding capabilities of generative Multi-modal Large Language Models (MLLMs). Our model effectively handles videos of diverse views, genres, and lengths while comprehending complex language queries. The key contributions include: (i) We consider steering strong MLLMs for temporal grounding in videos. To enable precise timestamp outputs, we incorporate temporal information by interleaving timestamp tokens with video tokens.
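The interleaving of timestamp tokens with video tokens can be illustrated with a minimal sketch. It assumes each sampled frame is already a short list of visual tokens, and it uses a hypothetical `<t=...s>` text format for the timestamp token; the function name and the format are illustrative and are not taken from the paper.

```python
from typing import List


def interleave_timestamps(frame_tokens: List[List[str]], fps: float) -> List[str]:
    """Place a textual timestamp token before each frame's visual tokens.

    frame_tokens: one list of (placeholder) visual tokens per sampled frame.
    fps: sampling rate of the frames, in frames per second.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    sequence: List[str] = []
    for index, tokens in enumerate(frame_tokens):
        seconds = index / fps
        # Hypothetical timestamp token format, chosen only for illustration.
        sequence.append(f"<t={seconds:.1f}s>")
        sequence.extend(tokens)
    return sequence


if __name__ == "__main__":
    frames = [["v0a", "v0b"], ["v1a"], ["v2a"]]
    print(interleave_timestamps(frames, fps=2.0))
```

With the timestamps present in the input sequence, a model could in principle answer a query by emitting the timestamp values it has already seen next to the relevant frames.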
Windows Media Player update still can't beat the old version
PCWorld examines the latest Windows Media Player update, comparing its functionality and performance against the classic legacy version that many users still prefer. The updated media player remains inferior to its predecessor, lacking key features and polish that made the original version reliable for audio and video playback. Despite Microsoft's efforts to modernize the application, users may find better value sticking with the older Windows Media Player for their multimedia needs. Windows Insider members have been given access to a new version of Windows Media Player. In taking a closer look at the new version, Windows Latest notes that it offers a number of improvements, not least in terms of stability and the handling of subtitles.
AI is making journalistic language more repetitive and predictable - and it's a problem for all of us
What happens to language when a growing amount of text published in the press, online and on social media is written by machines? This question is not just important for the profession of journalism - it also has an impact on the richness of the language we all use to comprehend, describe and discuss reality itself. Historically, the press has been a space where public language grows and becomes richer. It is not, of course, the only driver of linguistic change, but it is one of the fields where new or emerging words, turns of phrase and ways of describing facts begin to circulate within society. Studies on journalistic language and neologisms clearly demonstrate that newspapers are platforms for the creation and dissemination of new vocabulary, especially when it is needed to report on events, technology and social changes for a broad audience.
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Improving Large Language Model (LLM) agents for sequential decision-making tasks typically requires extensive task-specific knowledge engineering: custom prompts, curated examples, and specialized observation/action spaces. We investigate a different approach where agents automatically improve by learning from their own successful experiences without human intervention. Our method constructs and refines a database of self-generated trajectories that serve as in-context examples for future tasks.
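A database of self-generated trajectories might look like the following minimal sketch. It keeps only successful trajectories and retrieves examples by word overlap with the new task description. The class names, the Jaccard-overlap retrieval, and the parameter `k` are assumptions made for illustration; the abstract does not say how the paper stores, refines, or retrieves trajectories.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trajectory:
    task: str          # natural-language task description
    steps: List[str]   # actions the agent took
    success: bool      # whether the episode reached its goal


@dataclass
class TrajectoryDB:
    entries: List[Trajectory] = field(default_factory=list)

    def add(self, trajectory: Trajectory) -> None:
        """Store a trajectory only if the agent succeeded."""
        if trajectory.success:
            self.entries.append(trajectory)

    def retrieve(self, task: str, k: int = 2) -> List[Trajectory]:
        """Return up to k stored trajectories most similar to the task.

        Similarity is Jaccard overlap of lowercase words (an assumption,
        standing in for whatever retrieval the real system uses).
        """
        query = set(task.lower().split())

        def overlap(trajectory: Trajectory) -> float:
            words = set(trajectory.task.lower().split())
            union = query | words
            return len(query & words) / len(union) if union else 0.0

        return sorted(self.entries, key=overlap, reverse=True)[:k]


if __name__ == "__main__":
    db = TrajectoryDB()
    db.add(Trajectory("put the apple in the fridge", ["take apple", "open fridge"], True))
    db.add(Trajectory("heat the mug in microwave", ["take mug", "open microwave"], True))
    db.add(Trajectory("clean the plate", ["take plate"], False))  # failed, not stored
    for example in db.retrieve("put apple in fridge", k=1):
        print(example.task, example.steps)
```

The retrieved trajectories would then be formatted into the prompt as in-context examples for the new task.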
124 million passwords added to breach database. Yours may be in there, too
PCWorld reports that Have I Been Pwned added 56 million email addresses and 124 million passwords from infostealer malware targeting Windows PCs. These credentials were stolen directly from infected devices rather than corporate breaches, with users often unaware of the ongoing data theft. Immediate password changes, two-factor authentication, and unique passwords for each service are essential to protect against these prevalent cybercriminal tools. The data breach notification service Have I Been Pwned (HIBP) has added a large number of compromised login credentials to its database. In total, 56.3 million email addresses and 124 million passwords have been added. What makes this dataset notable is its origin. Unlike many previous entries, it does not stem from a single cyberattack on an online service. Instead, HIBP says the information was extracted directly from infected computers and devices.
Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence
Jailbreak attacks against large language models (LLMs) aim to induce harmful behaviors in LLMs through carefully crafted adversarial prompts. To mitigate attacks, one way is to perform adversarial training (AT)-based alignment, i.e., training LLMs on some of the most adversarial prompts to help them learn how to behave safely under attacks. During AT, the length of adversarial prompts plays a critical role in the robustness of aligned LLMs. While long-length adversarial prompts during AT might lead to strong LLM robustness, their synthesis is very resource-consuming, which may limit the application of LLM AT. This paper focuses on adversarial suffix jailbreak attacks and unveils that to defend against a jailbreak attack with an adversarial suffix of length Θ(M), it is enough to align LLMs on prompts with adversarial suffixes of length Θ(√M).
Optimization Guided Rectified Flow For Appearance Transfer
Transferring appearance to 3D assets using different representations of the appearance object, such as images or text, has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results. Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance.
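The idea of interacting with the sampling process by periodically adding guidance can be sketched generically. The example below integrates a velocity field with Euler steps, as in rectified flow sampling, and every few steps nudges the sample down the gradient of a loss. The function name, the parameters `guide_every` and `guide_scale`, and the choice to apply the gradient directly to the current sample are all assumptions; the abstract does not specify the paper's guidance objectives or schedule.

```python
from typing import Callable, List

Vector = List[float]


def sample_with_guidance(
    x0: Vector,
    velocity: Callable[[Vector, float], Vector],
    loss_grad: Callable[[Vector], Vector],
    num_steps: int = 10,
    guide_every: int = 2,
    guide_scale: float = 0.1,
) -> Vector:
    """Euler-integrate dx/dt = velocity(x, t) from t=0 to t=1.

    After every `guide_every` steps, take one gradient step on a
    user-supplied loss (a stand-in for a guidance objective).
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        if (step + 1) % guide_every == 0:
            g = loss_grad(x)
            x = [xi - guide_scale * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    target = [2.0, -1.0]
    # Toy pretrained "model": a constant velocity field.
    velocity = lambda x, t: [1.0, 0.0]
    # Gradient of 0.5 * ||x - target||^2, pulling the sample toward target.
    loss_grad = lambda x: [xi - ti for xi, ti in zip(x, target)]
    print(sample_with_guidance([0.0, 0.0], velocity, loss_grad))
```

Because the guidance only touches the sampling loop, the pretrained model's weights stay untouched, which is what makes such an approach training-free.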
DAA: Amplifying Unknown Discrepancy for Test-Time Discovery
Test-Time Discovery (TTD) addresses the critical challenge of identifying and adapting to novel classes during inference while maintaining performance on known classes, which is a capability essential for dynamic real-world environments such as healthcare and autonomous driving. Recent TTD methods adopt training-free, memory-based strategies but rely on frozen models and static representations, resulting in poor generalization. In this paper, we propose a Discrepancy-Amplifying Adapter (DAA), a trainable module that enables real-time adaptation by amplifying feature-level discrepancies between known and unknown classes. During training, DAA is optimized using simulated unknowns and a novel warmup strategy to enhance its discriminative capacity. To ensure continual adaptation at test time, we introduce a Short-Term Memory Renewal (STMR) mechanism, which maintains a queue-based memory for unknown classes and selectively refreshes prototypes using recent, reliable samples. DAA is further updated through self-supervised learning, promoting knowledge retention for known classes while improving discrimination of emerging categories. Extensive experiments show that our method maintains high adaptability and stability, and significantly improves novel class discovery performance.
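A queue-based memory that refreshes a prototype from recent, reliable samples could be sketched as follows. The class name, the confidence threshold used to decide what counts as reliable, and the use of a plain mean as the prototype are assumptions for illustration; the abstract does not describe how STMR actually selects samples or computes prototypes.

```python
from collections import deque
from typing import Deque, List, Optional


class ShortTermMemory:
    """Fixed-size queue of recent features for one unknown class."""

    def __init__(self, capacity: int = 4, min_confidence: float = 0.8) -> None:
        # deque with maxlen drops the oldest feature once capacity is reached.
        self.queue: Deque[List[float]] = deque(maxlen=capacity)
        self.min_confidence = min_confidence  # hypothetical reliability threshold

    def update(self, feature: List[float], confidence: float) -> bool:
        """Store the feature only if it is considered reliable."""
        if confidence < self.min_confidence:
            return False
        self.queue.append(list(feature))
        return True

    def prototype(self) -> Optional[List[float]]:
        """Mean of the stored features, or None if the memory is empty."""
        if not self.queue:
            return None
        count = len(self.queue)
        dim = len(self.queue[0])
        return [sum(f[d] for f in self.queue) / count for d in range(dim)]


if __name__ == "__main__":
    memory = ShortTermMemory(capacity=2)
    memory.update([0.0, 0.0], confidence=0.9)
    memory.update([2.0, 2.0], confidence=0.95)
    memory.update([9.0, 9.0], confidence=0.1)   # rejected as unreliable
    print(memory.prototype())
    memory.update([4.0, 4.0], confidence=0.9)   # evicts the oldest feature
    print(memory.prototype())
```

Bounding the queue means the prototype follows recent samples rather than the full history, which is one simple way to keep a memory from going stale.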