- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Law (0.67)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)
- Health & Medicine > Therapeutic Area > Neurology (0.34)
How China Caught Up on AI--and May Now Win the Future
He Xiaopeng launches Xpeng's next-gen Iron humanoid robot during a press conference at the company's headquarters in Guangzhou on November 5, 2025. It was a controversy laced with pride for He Xiaopeng. In November, He, the founder and CEO of Chinese physical AI firm XPeng, had just debuted his new humanoid robot, IRON, whose balance, posture shifts, and coquettish swagger mirrored human motion with such eerie precision that a slew of netizens accused him of faking the demonstration by putting a human in a bodysuit. To silence the naysayers, He boldly cut open the robot's leg live on stage to reveal the intricate mechanical systems that allow it to adapt to uneven surfaces and maintain stability just like the human body. "At first, it made me sad," He tells TIME in his Guangzhou headquarters.
- Asia > China > Guangdong Province > Guangzhou (0.65)
- Asia > Russia (0.14)
- Asia > North Korea (0.14)
- (17 more...)
- Law (1.00)
- Information Technology (1.00)
- Government > Military (0.94)
- (4 more...)
Artificial intelligence research has a slop problem, academics say: 'It's a mess'
The author, Kevin Zhu, now runs Algoverse, an AI research and mentoring company for high schoolers. AI research in question as author claims to have written over 100 papers on AI that one expert calls a 'disaster'. A single person claims to have authored 113 academic papers on artificial intelligence this year, 89 of which will be presented this week at one of the world's leading conferences on AI and machine learning, a claim that has raised questions among computer scientists about the state of AI research. Zhu himself graduated from high school in 2018. Papers he has put out in the past two years cover subjects like using AI to locate nomadic pastoralists in sub-Saharan Africa, to evaluate skin lesions, and to translate Indonesian dialects.
- Africa > Sub-Saharan Africa (0.25)
- Oceania > Australia (0.05)
- North America > United States > Virginia (0.05)
- (3 more...)
- Leisure & Entertainment > Sports (0.70)
- Education > Educational Setting > K-12 Education > Secondary School (0.35)
DocLens : A Tool-Augmented Multi-Agent Framework for Long Visual Document Understanding
Zhu, Dawei, Meng, Rui, Chen, Jiefeng, Li, Sujian, Pfister, Tomas, Yoon, Jinsung
Comprehending long visual documents, where information is distributed across extensive pages of text and visual elements, is a critical but challenging task for modern Vision-Language Models (VLMs). Existing approaches falter on a fundamental challenge: evidence localization. They struggle to retrieve relevant pages and overlook fine-grained details within visual elements, leading to limited performance and model hallucination. To address this, we propose DocLens, a tool-augmented multi-agent framework that effectively "zooms in" on evidence like a lens. It first navigates from the full document to specific visual elements on relevant pages, then employs a sampling-adjudication mechanism to generate a single, reliable answer. Paired with Gemini-2.5-Pro, DocLens achieves state-of-the-art performance on MMLongBench-Doc and FinRAGBench-V, surpassing even human experts. The framework's superiority is particularly evident on vision-centric and unanswerable queries, demonstrating the power of its enhanced localization capabilities.
- North America > United States > Texas > Schleicher County (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
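The two-stage DocLens flow described in the abstract, navigate to relevant evidence, then sample and adjudicate candidate answers, can be sketched roughly as below. All function names and the toy retrieval/adjudication logic are illustrative assumptions, not the paper's actual implementation; a real system would make stochastic VLM calls where placeholders appear.

```python
from collections import Counter

def sample_answers(query, evidence, n_samples=5):
    """Stand-in for n stochastic VLM calls over the zoomed-in evidence."""
    # A real system would query a VLM here; we fake the sampled outputs.
    return [f"answer derived from {evidence}" for _ in range(n_samples)]

def adjudicate(candidates):
    """Pick the majority answer; a real adjudicator might be another agent."""
    answer, _ = Counter(candidates).most_common(1)[0]
    return answer

def answer_query(query, pages):
    # Stage 1 (navigation): locate the pages relevant to the query.
    relevant = [p for p in pages if query.lower() in p.lower()]
    evidence = relevant[0] if relevant else pages[0]
    # Stage 2 (sampling-adjudication): sample candidates, then adjudicate.
    return adjudicate(sample_answers(query, evidence))
```

The point of the pattern is that disagreement among samples is resolved explicitly rather than trusting a single generation, which is what makes the final answer "reliable" in the abstract's terms.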
Spatially Sparse Inference for Generative Image Editing, Supplementary Material: A. Additional Implementation Details
We omit the element-wise operations for simplicity and follow the notations in Section 3. As mentioned in Section 3.2, we fuse …. Note that the pre-computation is cheap and only needs to be done once for each resolution. We give more details on how we build the synthetic editing dataset. Figure 7(a) shows some examples of our synthetic editing on LSUN Church. The detailed distribution is shown in Figure 8a.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Sweden > Skåne County > Malmö (0.04)
- Asia > Middle East > Jordan (0.04)
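The once-per-resolution pre-computation mentioned in the supplementary text above amounts to a memoized cache keyed by resolution. The sketch below shows the pattern under that assumption; `precompute_indices` and its contents are hypothetical stand-ins, not the paper's code.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def precompute_indices(height, width):
    """Runs once per (height, width); later calls hit the cache."""
    # Placeholder for e.g. gather/scatter index tables used by sparse inference.
    return [(y, x) for y in range(height) for x in range(width)]

def edit_image(image_resolution):
    h, w = image_resolution
    indices = precompute_indices(h, w)  # cached after the first call
    return len(indices)
```

Because editing workloads repeatedly process images at the same few resolutions, paying the pre-computation cost once and reusing the result keeps its amortized cost near zero.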
Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
Zhang, Yi, Ni, Bolin, Chen, Xin-Sheng, Zhang, Heng-Rui, Rao, Yongming, Peng, Houwen, Lu, Qinglin, Hu, Han, Guo, Meng-Hao, Hu, Shi-Min
Fully open multimodal large language models (MLLMs) currently lag behind proprietary counterparts, primarily due to a significant gap in data quality for supervised fine-tuning (SFT). Existing open-source datasets are often plagued by widespread noise and a critical deficit in complex reasoning data, such as Chain-of-Thought (CoT), which hinders the development of advanced model capabilities. Addressing these challenges, our work makes three primary contributions. First, we introduce Honey-Data-15M, a new SFT dataset comprising approximately 15 million QA pairs, processed through multiple cleaning techniques and enhanced with a novel dual-level (short and long) CoT enrichment strategy. Second, we introduce HoneyPipe, the data curation pipeline, and its underlying framework DataStudio, providing the community with a transparent and adaptable methodology for data curation that moves beyond static dataset releases. Finally, to validate our dataset and pipeline, we train Bee-8B, an 8B model on Honey-Data-15M. Experiments show that Bee-8B establishes a new state-of-the-art (SOTA) for fully open MLLMs, achieving performance that is competitive with, and in some cases surpasses, recent semi-open models such as InternVL3.5-8B. Our work delivers to the community a suite of foundational resources, including: the Honey-Data-15M corpus; the full-stack suite comprising HoneyPipe and DataStudio; training recipes; an evaluation harness; and the model weights. This effort demonstrates that a principled focus on data quality is a key pathway to developing fully open MLLMs that are highly competitive with their semi-open counterparts.
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- (41 more...)
- Information Technology (0.92)
- Transportation (0.67)
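The curation flow the Bee abstract describes, filtering noisy QA pairs and then enriching survivors with dual-level (short and long) chain-of-thought, can be sketched as below. The filter and enrichment logic are toy stand-ins, not HoneyPipe itself; the field names are assumptions for illustration.

```python
def is_clean(pair):
    """Toy noise filter: drop QA pairs with an empty question or answer."""
    return bool(pair.get("question")) and bool(pair.get("answer"))

def enrich_with_cot(pair):
    """Attach dual-level (short and long) CoT placeholder fields."""
    pair["cot_short"] = f"Brief reasoning for: {pair['question']}"
    pair["cot_long"] = f"Step-by-step reasoning for: {pair['question']}"
    return pair

def curate(pairs):
    # Clean first, then enrich: enrichment is wasted on pairs that get dropped.
    return [enrich_with_cot(p) for p in pairs if is_clean(p)]
```

Ordering matters here: running the cheap filter before the (in practice, model-driven and expensive) CoT enrichment is what makes a 15M-pair pipeline tractable.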
NVIDIA Nemotron Nano V2 VL
NVIDIA: Deshmukh, Amala Sanjay, Chumachenko, Kateryna, Rintamaki, Tuomas, Le, Matthieu, Poon, Tyler, Taheri, Danial Mohseni, Karmanov, Ilia, Liu, Guilin, Seppanen, Jarno, Chen, Guo, Sapra, Karan, Yu, Zhiding, Renduchintala, Adi, Wang, Charles, Jin, Peter, Goel, Arushi, Ranzinger, Mike, Voegtle, Lukas, Fischer, Philipp, Roman, Timo, Ping, Wei, Wang, Boxin, Yang, Zhuolin, Lee, Nayeon, Zhang, Shaokun, Liu, Fuxiao, Li, Zhiqi, Zhang, Di, Heinrich, Greg, Yin, Hongxu, Han, Song, Molchanov, Pavlo, Mannan, Parth, Xu, Yao, Scowcroft, Jane Polak, Balough, Tom, Radhakrishnan, Subhashree, Zhang, Paris, Cha, Sean, Kumar, Ratnesh, Bhat, Zaid Pervaiz, Zhang, Jian, Hanley, Darragh, Biswas, Pritam, Oliver, Jesse, Vasques, Kevin, Waleffe, Roger, Riach, Duncan, Olabiyi, Oluwatobi, Mahabaleshwarkar, Ameya Sunil, Kartal, Bilal, Gundecha, Pritam, Nguyen, Khanh, Milesi, Alexandre, Khvedchenia, Eugene, Zilberstein, Ran, Masad, Ofri, Bagrov, Natan, Assaf, Nave, Asida, Tomer, Afrimi, Daniel, Zuker, Amit, Haber, Netanel, Cheng, Zhiyu, Xin, Jingyu, Wu, Di, Spirin, Nik, Moosaei, Maryam, Ageev, Roman, Shah, Vanshil Atul, Wu, Yuting, Korzekwa, Daniel, Sreekumar, Unnikrishnan Kizhakkemadam, Jiang, Wanli, Subramanian, Padmavathy, Rico, Alejandra, Bhaskar, Sandip, Motiian, Saeid, Wu, Kedi, Surla, Annie, Chen, Chia-Chih, Wolff, Hayden, Feinberg, Matthew, Corpuz, Melissa, Wawrzos, Marek, Long, Eileen, Jhunjhunwala, Aastha, Hendricks, Paul, Memarian, Farzan, Hall, Benika, Wang, Xin-Yu, Mosallanezhad, David, Singhal, Soumye, Vega, Luis, Cheung, Katherine, Pawelec, Krzysztof, Evans, Michael, Luna, Katherine, Lou, Jie, Galinkin, Erick, Hazare, Akshay, Purandare, Kaustubh, Guan, Ann, Warno, Anna, Cui, Chen, Suhara, Yoshi, Likhite, Shibani, Mard, Seph, Price, Meredith, Sleiman, Laya, Kaji, Saori, Karpas, Udi, Briski, Kari, Conway, Joey, Lightstone, Michael, Kautz, Jan, Shoeybi, Mohammad, Patwary, Mostofa, Cohen, Jonathen, Kuchaiev, Oleksii, Tao, Andrew, Catanzaro, Bryan
We introduce Nemotron Nano V2 VL, the latest model of the Nemotron vision-language series designed for strong real-world document understanding, long video comprehension, and reasoning tasks. Nemotron Nano V2 VL delivers significant improvements over our previous model, Llama-3.1-Nemotron-Nano-VL-8B, across all vision and text domains through major enhancements in model architecture, datasets, and training recipes. Nemotron Nano V2 VL builds on Nemotron Nano V2, a hybrid Mamba-Transformer LLM, and innovative token reduction techniques to achieve higher inference throughput in long document and video scenarios. We are releasing model checkpoints in BF16, FP8, and FP4 formats and sharing large parts of our datasets, recipes and training code.
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
- Asia > China (0.04)
- Research Report (0.58)
- Instructional Material (0.46)