Tuning into the future of collaboration

MIT Technology Review

Intelligent audio and intuitive tools are transforming collaboration from simple connection to creativity, say Sam Sabet, chief technology officer at Shure, and Brendan Ittelson, chief ecosystem officer at Zoom. When work went remote, the sound of business changed. What began as a scramble to make home offices functional has evolved into a revolution in how people hear and are heard. From education to the enterprise, companies across industries have reimagined what clear, reliable communication can mean in a hybrid world. For major audio and communications companies like Shure and Zoom, that transformation has been powered by artificial intelligence, new acoustic technologies, and a shared mission: making connection effortless. Necessity during the pandemic compressed years of innovation into months. Audio and video that simply work are now the baseline for collaboration, says Ittelson. The expectation has shifted from connecting people to enhancing productivity and creativity across the entire ecosystem. Audio is a foundation for trust, understanding, and collaboration.


Inside the App Where Queer Gooners Run Free

WIRED

In light of Zoom crackdowns and Skype shutting down, Batemates has emerged as an alternative for "bators" who like masturbating together online. One night not long ago, Jaxon Roman sat naked in front of his laptop wearing only a pup hood as he masturbated with single-minded zeal to the attention of eight other men watching onscreen. It was a typical weekday for the 33-year-old Arlington, Virginia, program analyst. "When bros praise me and say they're enjoying [me], I get to that edge point so fast," Roman says. His favorite instances are "when they all come to what I'm doing." Sometimes, when he's feeling especially kinky, Roman, who is bisexual, likes to ask for permission before climaxing.



Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding

Jiang, Zhiyuan, Xie, Shenghao, Li, Wenyi, Zu, Wenqiang, Li, Peihang, Qiu, Jiahao, Pei, Siqi, Ma, Lei, Huang, Tiejun, Wang, Mengdi, Liu, Shilong

arXiv.org Artificial Intelligence

Grounding is a fundamental capability for building graphical user interface (GUI) agents. Although existing approaches rely on large-scale bounding box supervision, they still face various challenges, such as cross-platform generalization, complex layout analysis, and fine-grained element localization. In this paper, we investigate zoom as a strong yet underexplored prior for GUI grounding, and propose a training-free method, ZoomClick. By characterizing four key properties of zoom (i.e., pre-zoom, depth, shrink size, minimal crop size), we unlock its full capabilities for dynamic spatial focusing and adaptive context switching. Experiments demonstrate that our method significantly boosts the performance of both general vision-language and specialized GUI grounding models, achieving state-of-the-art results on several mainstream benchmarks; for example, UI-Venus-72B attains a 73.1% success rate on ScreenSpot-Pro. Furthermore, we present GUIZoom-Bench, a benchmark for evaluating model adaptability to zoom, aiming to inspire future research on improving zoom for further training and test-time scaling in GUI grounding tasks.
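The coarse-to-fine idea behind a zoom prior can be illustrated with a minimal sketch. The loop below is not the paper's ZoomClick algorithm; the `depth`, `shrink`, and `min_crop` parameters are illustrative stand-ins for the zoom properties the abstract names, and `predict` is a hypothetical placeholder for any grounding model that returns a click point in full-image coordinates.

```python
# Hypothetical sketch of iterative zoom for GUI grounding: crop around the
# model's current prediction, shrink the view, and re-query, so the model
# sees progressively higher-resolution context around the target element.

def zoom_ground(predict, width, height, depth=3, shrink=0.5, min_crop=200):
    """Return an (x, y) click point in full-image coordinates."""
    left, top, w, h = 0, 0, width, height
    x, y = predict(left, top, w, h)  # coarse guess on the full screenshot
    for _ in range(depth):
        new_w = max(int(w * shrink), min_crop)
        new_h = max(int(h * shrink), min_crop)
        if new_w >= w and new_h >= h:
            break  # crop can no longer shrink; stop zooming
        # Center the next crop on the current prediction, clamped to bounds.
        left = min(max(x - new_w // 2, 0), width - new_w)
        top = min(max(y - new_h // 2, 0), height - new_h)
        w, h = new_w, new_h
        x, y = predict(left, top, w, h)  # refined guess on the zoomed view
    return x, y
```

In practice the crop would be resized back up before re-querying the model; the sketch only tracks the window geometry and the coordinate mapping.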



Look, Zoom, Understand: The Robotic Eyeball for Embodied Perception

Yang, Jiashu, Han, Yifan, Xie, Yucheng, Guo, Ning, Lian, Wenzhao

arXiv.org Artificial Intelligence

In embodied AI perception systems, visual perception should be active: the goal is not to passively process static images, but to actively acquire more informative data within pixel and spatial budget constraints. Existing vision models and fixed RGB-D camera systems fundamentally fail to reconcile wide-area coverage with fine-grained detail acquisition, severely limiting their efficacy in open-world robotic applications. To address this issue, we propose EyeVLA, a robotic eyeball for active visual perception that can take proactive actions based on instructions, enabling clear observation of fine-grained target objects and detailed information across a wide spatial extent. EyeVLA discretizes action behaviors into action tokens and integrates them with vision-language models (VLMs) that possess strong open-world understanding capabilities, enabling joint modeling of vision, language, and actions within a single autoregressive sequence. By using 2D bounding box coordinates to guide the reasoning chain and applying reinforcement learning to refine the viewpoint selection policy, we transfer the open-world scene understanding capability of the VLM to a vision-language-action (VLA) policy using only minimal real-world data. Experiments show that EyeVLA can effectively understand scenes in real-world environments and actively acquire more accurate visual information through instruction-driven rotation and zoom actions, achieving strong environmental perception capabilities. EyeVLA introduces a novel robotic vision paradigm: it dynamically acquires highly informative visual data within given pixel and spatial budgets for environmental perception in multimodal autonomous systems.
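The action-token idea the abstract describes can be sketched concretely: continuous camera commands are quantized into discrete token ids so a VLM can emit them in the same autoregressive sequence as text. The bin counts and ranges below are illustrative assumptions, not the paper's values.

```python
# Hypothetical sketch of action tokenization: quantize continuous pan/tilt/zoom
# commands into discrete bins (action tokens) and decode bin centers back.

PAN_BINS, TILT_BINS, ZOOM_BINS = 64, 32, 16           # assumed vocab sizes
PAN_RANGE, TILT_RANGE, ZOOM_RANGE = (-180.0, 180.0), (-90.0, 90.0), (1.0, 8.0)

def _quantize(value, lo, hi, bins):
    value = min(max(value, lo), hi)                    # clamp to valid range
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)

def _dequantize(idx, lo, hi, bins):
    return lo + (idx + 0.5) / bins * (hi - lo)         # center of the bin

def encode_action(pan, tilt, zoom):
    """Map a continuous camera action to three discrete action tokens."""
    return (_quantize(pan, *PAN_RANGE, PAN_BINS),
            _quantize(tilt, *TILT_RANGE, TILT_BINS),
            _quantize(zoom, *ZOOM_RANGE, ZOOM_BINS))

def decode_action(tokens):
    """Recover an approximate continuous action from action tokens."""
    p, t, z = tokens
    return (_dequantize(p, *PAN_RANGE, PAN_BINS),
            _dequantize(t, *TILT_RANGE, TILT_BINS),
            _dequantize(z, *ZOOM_RANGE, ZOOM_BINS))
```

Quantization loses at most half a bin width per dimension, which is the usual trade-off when folding continuous control into a discrete token vocabulary.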



172fd0d638b3282151bd8f3d652cb640-AuthorFeedback.pdf

Neural Information Processing Systems

We first thank all reviewers for the valuable feedback. The number of parameters is calculated for the CUB dataset. As shown in Table 1, our model outperforms ResNet152 by 3.6% (71.8%. We will add more detailed analysis in the final version of the paper. Besides, we observe that more maps introduce attention redundancy, i.e., maps attend to the same region.


Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Tong, Jingqi, Mou, Yurong, Li, Hangcheng, Li, Mingzhe, Yang, Yongzhuo, Zhang, Ming, Chen, Qiguang, Liang, Tianyi, Hu, Xiaomeng, Zheng, Yining, Chen, Xinchi, Zhao, Jun, Huang, Xuanjing, Qiu, Xipeng

arXiv.org Artificial Intelligence

The "Thinking with Text" and "Thinking with Images" paradigms significantly improve the reasoning ability of large language models (LLMs) and vision-language models (VLMs). However, these paradigms have inherent limitations: (1) images capture only single moments and fail to represent dynamic processes or continuous changes, and (2) text and vision remain separate modalities, hindering unified multimodal understanding and generation. To overcome these limitations, we introduce "Thinking with Video", a new paradigm that leverages video generation models, such as Sora-2, to bridge visual and textual reasoning in a unified temporal framework. To support this exploration, we developed the Video Thinking Benchmark (VideoThinkBench). VideoThinkBench encompasses two task categories: (1) vision-centric tasks (e.g., Eyeballing Puzzles) and (2) text-centric tasks (e.g., subsets of GSM8K and MMMU). Our evaluation establishes Sora-2 as a capable reasoner. On vision-centric tasks, Sora-2 is generally comparable to state-of-the-art (SOTA) VLMs, and even surpasses them on several tasks, such as Eyeballing Games. On text-centric tasks, Sora-2 achieves 92% accuracy on MATH and 75.53% accuracy on MMMU. Furthermore, we systematically analyze the source of these abilities. We also find that self-consistency and in-context learning can improve Sora-2's performance. In summary, our findings suggest that video generation models are potential unified multimodal understanding and generation models, positioning "thinking with video" as a unified multimodal reasoning paradigm.
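The self-consistency technique the abstract mentions is simple enough to sketch: sample several candidate answers from the model and keep the majority vote. The `sample` callable below is a hypothetical stand-in for any stochastic model query.

```python
# Minimal sketch of self-consistency: draw n samples from a stochastic model
# and return the most frequent answer (majority vote over candidates).

from collections import Counter

def self_consistent_answer(sample, n=5):
    """Sample n candidate answers and return the most common one."""
    answers = [sample() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

For open-ended outputs, answers are usually normalized (e.g., extracting a final number) before voting; this sketch assumes directly comparable answers.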
