

A Supplementary Material

Neural Information Processing Systems

These challenges have spawned the new task of 'Subject-Driven Text-to-Image Generation', which is the core task our paper aims to solve. Though the mined clusters already contain (image, alt-text) information, the alt-text's noise level is high. For example, the generation model believes a 'teapot' should contain a [...] in-context generation that demonstrates its skill set. Results are generated from a single model. Subject (image, text) pairs and editing keywords are annotated, with the detailed template given in the Appendix. Such a manual modification process is time-consuming.




Our Favorite Travel and Outdoor Gear Is on Sale at Huckberry

WIRED

Huckberry's eclectic curation of travel clothing, coffee gear, and backpacks is all on sale right now. Huckberry, purveyor of finely curated clothing and gear for the sort of person equally at home in the woods and the city, is having one of its rare site-wide sales this week, or pretty close to site-wide. We've tested and love quite a bit of Huckberry's stuff, especially the Proof 72-hour merino T-shirt. If you buy nothing else this year, buy that. Check out the other deals, which we've rounded up below.


Beyond Parameters: Exploring Virtual Logic Depth for Scaling Laws

Zhu, Ruike, Zhang, Hanwen, Li, Kevin, Shi, Tianyu, Duan, Yiqun, Wang, Chi, Zhou, Tianyi, Banerjee, Arindam, Qin, Zengyi

arXiv.org Artificial Intelligence

Scaling large language models typically involves three dimensions: depth, width, and parameter count. In this work, we explore a fourth dimension, virtual logical depth (VLD), which increases effective algorithmic depth without changing parameter count by reusing weights. While parameter reuse is not new, its role in scaling has been underexplored. Unlike recent test-time methods that scale token-wise, VLD alters the internal computation graph during training and inference. Through controlled experiments, we obtain three key insights. (1) Knowledge capacity vs. parameters: at fixed parameter count, VLD leaves knowledge capacity nearly unchanged, while across models capacity still scales with parameters. (2) Reasoning vs. reuse: properly implemented VLD substantially improves reasoning ability without more parameters, decoupling reasoning from size. This suggests a new scaling path beyond token-wise test-time methods. (3) Robustness and generality: reasoning gains persist across architectures and reuse schedules, showing VLD captures a general scaling behavior. These results provide insight into future scaling strategies and raise a deeper question: does superintelligence require ever-larger models, or can it be achieved by reusing parameters and increasing logical depth? We argue many unknown dynamics in scaling remain to be explored. Code is available at https://anonymous.4open.science/r/virtual_logical_depth-8024/.
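The weight-reuse idea behind VLD can be sketched in a few lines. This is a minimal illustrative sketch under assumptions, not the paper's implementation: the toy residual blocks, the cyclic reuse schedule, and every name below are stand-ins chosen only to show how computation depth can grow while parameter count stays fixed.

```python
# Minimal sketch of "virtual logical depth" (VLD) via weight reuse.
# Assumption: a simple cyclic schedule over toy residual blocks; the
# actual paper's architecture and schedules may differ.
import numpy as np

rng = np.random.default_rng(0)

class Block:
    """One 'layer' owning its own parameters (a single weight matrix here)."""
    def __init__(self, dim):
        self.W = rng.standard_normal((dim, dim)) * 0.1

    def __call__(self, x):
        # Residual update with a simple nonlinearity.
        return x + np.tanh(x @ self.W)

def forward(x, blocks, virtual_depth):
    """Run `virtual_depth` steps by cycling through `blocks`.
    When virtual_depth > len(blocks), weights are reused: the computation
    graph gets deeper, but the parameter count does not change."""
    for step in range(virtual_depth):
        x = blocks[step % len(blocks)](x)
    return x

dim = 8
blocks = [Block(dim) for _ in range(2)]        # only 2 distinct parameter blocks
x = rng.standard_normal(dim)

shallow = forward(x, blocks, virtual_depth=2)  # each block applied once
deep = forward(x, blocks, virtual_depth=8)     # same parameters, 4x reuse

# Parameter count is identical in both runs; only effective depth differs.
n_params = sum(b.W.size for b in blocks)
print(n_params)  # 128
```

The key property the abstract claims is visible here: `shallow` and `deep` are different functions of the input, yet both are computed from exactly the same 128 parameters.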



Europe's biggest bat captures and devours birds while flying

Popular Science

It's hard to think of a spookier wildlife scenario: a songbird is flying through the air when it's suddenly intercepted from above at breakneck speed by a large, fanged bat. After a brief struggle, the attacker disappears into the gloom with its bloody prey in tow. But for over two decades, biologists have suspected that these events are even darker than that scary scenario suggests. And thanks to tiny bat "backpacks," experts have now confirmed their nightmarish theory.




CAViAR: Critic-Augmented Video Agentic Reasoning

Menon, Sachit, Iscen, Ahmet, Nagrani, Arsha, Weyand, Tobias, Vondrick, Carl, Schmid, Cordelia

arXiv.org Artificial Intelligence

Video understanding has seen significant progress in recent years, with models' performance on perception from short clips continuing to rise. Yet multiple recent benchmarks, such as LVBench, Neptune, and ActivityNet-RTL, show that performance wanes on tasks requiring complex reasoning over videos as queries grow more complex and videos grow longer. In this work, we ask: can existing perception capabilities be leveraged to successfully perform more complex video reasoning? In particular, we develop a large language model agent given access to video modules as subagents or tools. Rather than following a fixed procedure to solve queries, as in previous work such as Visual Programming, ViperGPT, and MoReVQA, the agent uses the results of each call to a module to determine subsequent steps. Inspired by work in the textual reasoning domain, we introduce a critic to distinguish between instances of successful and unsuccessful sequences from the agent. We show that the combination of our agent and critic achieves strong performance on the previously mentioned datasets.
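The agent-plus-critic pattern the abstract describes can be sketched as follows. Everything here is an illustrative assumption: the module names, the canned outputs, and the keyword-overlap critic are stand-ins. In the real system an LLM chooses each next module from the trajectory so far (here the plans are fixed for brevity), and the critic is learned rather than hand-written.

```python
# Hedged sketch of a critic-augmented agent loop in the spirit of the
# paper: an agent calls perception modules and records a trajectory, and
# a critic ranks candidate trajectories. All components are toy stand-ins.

def detect_objects(query):
    # Stand-in for a video object-detection subagent.
    return "objects: person, dog"

def caption_clip(query):
    # Stand-in for a clip-captioning subagent.
    return "caption: a dog chases a ball thrown by a person"

MODULES = {"detect": detect_objects, "caption": caption_clip}

def agent(query, plan):
    """Run a sequence of module calls, appending each result to the
    trajectory. A real agent would pick the next module based on the
    trajectory so far instead of following a fixed plan."""
    trajectory = [("query", query)]
    for tool in plan:
        trajectory.append((tool, MODULES[tool](query)))
    return trajectory

def critic(trajectory):
    """Toy critic: score a trajectory by how much of the query's wording
    is covered by the gathered evidence. A learned critic replaces this
    heuristic in the actual system."""
    query = trajectory[0][1].lower()
    evidence = " ".join(output for _, output in trajectory[1:])
    return sum(word in evidence for word in query.split())

query = "what does the dog chase"
candidates = [agent(query, ["detect"]),
              agent(query, ["detect", "caption"])]
best = max(candidates, key=critic)  # critic selects the stronger sequence
print(len(best))  # 3: query plus two module calls
```

The design point is the separation of roles: the agent only proposes sequences of module calls, while the critic supplies the success signal used to choose among them.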