Goto

Collaborating Authors

 cursor


Cursor Launches an AI Coding Tool For Designers

WIRED

The 300-person startup hopes bringing designers aboard will give it an edge in an increasingly competitive AI software market. Cursor, the wildly popular AI coding startup, is launching a new feature that lets people design the look and feel of web applications with AI. The tool, Visual Editor, is essentially a vibe-coding product for designers, giving them access to the same fine-grained controls they'd expect from professional design software. But in addition to making changes manually, the tool lets them request edits from Cursor's AI agent using natural language. Cursor is best known for its AI coding platform, but with Visual Editor, the startup wants to capture other parts of the software creation process.



Taught by the Flawed: How Dataset Insecurity Breeds Vulnerable AI Code

Xia, Catherine, Alalfi, Manar H.

arXiv.org Artificial Intelligence

AI programming assistants have demonstrated a tendency to generate code containing basic security vulnerabilities. While developers are ultimately responsible for validating and reviewing such outputs, improving the inherent quality of these generated code snippets remains essential. A key contributing factor to insecure outputs is the presence of vulnerabilities in the training datasets used to build large language models (LLMs). To address this issue, we propose curating training data to include only code that is free from detectable vulnerabilities. In this study, we constructed a secure dataset by filtering an existing Python corpus using a static analysis tool to retain only vulnerability-free functions. We then trained two transformer-based models: one on the curated dataset and one on the original, unfiltered dataset. The models were evaluated on both the correctness and security of the code they generated in response to natural language function descriptions. Our results show that the model trained on the curated dataset produced outputs with fewer security issues, while maintaining comparable functional correctness. These findings highlight the importance of secure training data in improving the reliability of AI-based programming assistants, though further enhancements to model architecture and evaluation are needed to reinforce these outcomes.


MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols

Yang, Yixuan, Wu, Daoyuan, Chen, Yufan

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly integrated into real-world applications via the Model Context Protocol (MCP), a universal, open standard for connecting AI agents with data sources and external tools. While MCP enhances the capabilities of LLM-based agents, it also introduces new security risks and expands their attack surfaces. In this paper, we present the first systematic taxonomy of MCP security, identifying 17 attack types across 4 primary attack surfaces. Our benchmark is modular and extensible, allowing researchers to incorporate custom implementations of clients, servers, and transport protocols for systematic security assessment. Experimental results show that over 85% of the identified attacks successfully compromise at least one platform, with core vulnerabilities universally affecting Claude, OpenAI, and Cursor, while prompt-based and tool-centric attacks exhibit considerable variability across different hosts and models. In addition, current protection mechanisms have little effect against these attacks. Large language models (LLMs) are transforming the landscape of intelligent systems, enabling powerful language understanding, reasoning, and generative capabilities. To further unlock their potential in real-world applications, there is an increasing demand for LLMs to interact with external data, tools, and services (Lin et al., 2025; Hasan et al., 2025). The Model Context Protocol (MCP) has emerged as a universal, open standard for connecting AI agents to diverse resources, facilitating richer and more dynamic task-solving. However, this integration also introduces a broader attack surface: vulnerabilities may arise not only from user prompts (such as prompt injection (Shi et al., 2024)), but also from insecure clients, transport protocols, and malicious or misconfigured servers (Hasan et al., 2025). As MCP-powered agents increasingly interact with sensitive enterprise systems and even physical infrastructure, securing the entire MCP stack becomes critical to prevent data breaches, unauthorized actions, and real-world harm (Narajala & Habler, 2025).


Paper2Video: Automatic Video Generation from Scientific Papers

Zhu, Zeyu, Lin, Kevin Qinghong, Shou, Mike Zheng

arXiv.org Artificial Intelligence

Academic presentation videos have become an essential medium for research communication, yet producing them remains highly labor-intensive, often requiring hours of slide design, recording, and editing for a short 2 to 10 minutes video. Unlike natural video, presentation video generation involves distinctive challenges: inputs from research papers, dense multi-modal information (text, figures, tables), and the need to coordinate multiple aligned channels such as slides, subtitles, speech, and human talker. To address these challenges, we introduce Paper2Video, the first benchmark of 101 research papers paired with author-created presentation videos, slides, and speaker metadata. We further design four tailored evaluation metrics--Meta Similarity, PresentArena, PresentQuiz, and IP Memory--to measure how videos convey the paper's information to the audience. Building on this foundation, we propose PaperTalker, the first multi-agent framework for academic presentation video generation. It integrates slide generation with effective layout refinement by a novel effective tree search visual choice, cursor grounding, subtitling, speech synthesis, and talking-head rendering, while parallelizing slide-wise generation for efficiency. Experiments on Paper2Video demonstrate that the presentation videos produced by our approach are more faithful and informative than existing baselines, establishing a practical step toward automated and ready-to-use academic video generation. Our dataset, agent, and code are available at https://github.com/showlab/Paper2Video.


Learning GUI Grounding with Spatial Reasoning from Visual Feedback

Zhao, Yu, Chen, Wei-Ning, Inan, Huseyin Atahan, Kessler, Samuel, Wang, Lu, Wutschitz, Lukas, Yang, Fangkai, Zhang, Chaoyun, Minervini, Pasquale, Rajmohan, Saravan, Sim, Robert

arXiv.org Artificial Intelligence

Graphical User Interface (GUI) grounding is commonly framed as a coordinate prediction task -- given a natural language instruction, generate on-screen coordinates for actions such as clicks and keystrokes. However, recent Vision Language Models (VLMs) often fail to predict accurate numeric coordinates when processing high-resolution GUI images with complex layouts. To address this issue, we reframe GUI grounding as an \emph{interactive search task}, where the VLM generates actions to move a cursor in the GUI to locate UI elements. At each step, the model determines the target object, evaluates the spatial relations between the cursor and the target, and moves the cursor closer to the target conditioned on the movement history. In this interactive process, the rendered cursor provides visual feedback to help the model align its predictions with the corresponding on-screen locations. We train our GUI grounding model, GUI-Cursor, using multi-step online reinforcement learning with a dense trajectory-based reward function. Our experimental results show that GUI-Cursor, based on Qwen2.5-VL-7B, improves the GUI grounding accuracy and achieves state-of-the-art results on ScreenSpot-v2 ($88.8\% \rightarrow 93.9\%$) and ScreenSpot-Pro ($26.8\% \rightarrow 56.5\%$). Moreover, we observe that GUI-Cursor learns to solve the problem within two steps for 95\% of instances and can adaptively conduct more steps on more difficult examples.


Researchers created a soft squeezable computer mouse

Popular Science

'The mouse is long overdue for reinvention.' Breakthroughs, discoveries, and DIY tips sent every weekday. Many of us subscribe to the old adage, "If it ain't broke, don't fix it." But what if that something was actually broken all along and we just didn't realize it? That's the argument presented in an upcoming issue of the journal by researchers from Nazarbayev University in Kazakhstan.


BiND: A Neural Discriminator-Decoder for Accurate Bimanual Trajectory Prediction in Brain-Computer Interfaces

Robert, Timothee, Shaeri, MohammadAli, Shoaran, Mahsa

arXiv.org Artificial Intelligence

-- Decoding bimanual hand movements from in-tracortical recordings remains a critical challenge for brain-computer interfaces (BCIs), due to overlapping neural representations and nonlinear interlimb interactions. We introduce BiND (Bimanual Neural Discriminator-Decoder), a two-stage model that first classifies motion type (unimanual left, unimanual right, or bimanual) and then uses specialized GRU-based decoders--augmented with a trial-relative time index--to predict continuous 2D hand velocities. It also demonstrates greater robustness to session variability than all other benchmarked models, with accuracy improvements of up to 4% compared to GRU in cross-session analyses. This highlights the effectiveness of task-aware discrimination and temporal modeling in enhancing bimanual decoding. According to the World Health Organization (WHO), neurological conditions such as stroke and brain injuries affect over one-third of the global population and represent a leading cause of disability [1], [2]. Around 2% of people worldwide require rehabilitation or assistive technologies [3], [4], often due to motor impairments from spinal cord injuries, stroke, or related disorders, which can lead to partial or complete paralysis and severely impact quality of life.


Hell or High Water: Evaluating Agentic Recovery from External Failures

Wang, Andrew, Hager, Sophia, Asija, Adi, Khashabi, Daniel, Andrews, Nicholas

arXiv.org Artificial Intelligence

As language model agents are applied to real world problems of increasing complexity, they will be expected to formulate plans across large search spaces. If those plans fail for reasons beyond their control, how well do language agents search for alternative ways to achieve their goals? We devise a specialized agentic planning benchmark to study this question. Each planning problem is solved via combinations of function calls. The agent searches for relevant functions from a set of over four thousand possibilities, and observes environmental feedback in the form of function outputs or error messages. Our benchmark confronts the agent with external failures in its workflow, such as functions that suddenly become unavailable. At the same time, even with the introduction of these failures, we guarantee that the task remains solvable. Ideally, an agent's performance on the planning task should not be affected by the presence of external failures. Overall, we find that language agents struggle to formulate and execute backup plans in response to environment feedback. While state-of-the-art models are often able to identify the correct function to use in the right context, they struggle to adapt to feedback from the environment and often fail to pursue alternate courses of action, even when the search space is artificially restricted. We provide a systematic analysis of the failures of both open-source and commercial models, examining the effects of search space size, as well as the benefits of scaling model size in our setting. Our analysis identifies key challenges for current generative models as well as promising directions for future work.


Neuralink's first female patient reveals shocking effect of brain chip

Daily Mail - Science & tech

A woman who has been fully paralyzed for the last 20 years has regained the ability to use a computer, marking a world-first for Elon Musk's company, Neuralink. Thanks to Neuralink's revolutionary implant, Audrey Crews revealed on X how she was able to write her name on a computer screen. 'I tried writing my name for the first time in 20 years. Lol,' Crews posted on X while showing the world her first attempt at a signature since 2005. Using the brain-computer interface (BCI), the implant recipient chose a purple-colored cursor pen to write the name'Audrey' on the screen in cursive script.