Goto

Collaborating Authors

 slider


FouRA: Fourier Low Rank Adaptation

Neural Information Processing Systems

While Low-Rank Adaptation (LoRA) has proven beneficial for efficiently fine-tuning large models, LoRA fine-tuned text-to-image diffusion models lack diversity in the generated images, as the model tends to copy data from the observed training samples.




Semantic Document Derendering: SVG Reconstruction via Vision-Language Modeling

Hazimeh, Adam, Wang, Ke, Collier, Mark, Baechler, Gilles, Kokiopoulou, Efi, Frossard, Pascal

arXiv.org Artificial Intelligence

Multimedia documents such as slide presentations and posters are designed to be interactive and easy to modify. Yet, they are often distributed in a static raster format, which limits editing and customization. Restoring their editability requires converting these raster images back into structured vector formats. However, existing geometric raster-vectorization methods, which rely on low-level primitives like curves and polygons, fall short at this task. Specifically, when applied to complex documents like slides, they fail to preserve the high-level structure, resulting in a flat collection of shapes where the semantic distinction between image and text elements is lost. To overcome this limitation, we address the problem of semantic document derendering by introducing SliDer, a novel framework that uses Vision-Language Models (VLMs) to derender slide images as compact and editable Scalable Vector Graphic (SVG) representations. SliDer detects and extracts attributes from individual image and text elements in a raster input and organizes them into a coherent SVG format. Crucially, the model iteratively refines its predictions during inference in a process analogous to human design, generating SVG code that more faithfully reconstructs the original raster upon rendering. Furthermore, we introduce Slide2SVG, a novel dataset comprising raster-SVG pairs of slide documents curated from real-world scientific presentations, to facilitate future research in this domain. Our results demonstrate that SliDer achieves a reconstruction LPIPS of 0.069 and is favored by human evaluators in 82.9% of cases compared to the strongest zero-shot VLM baseline.


A Two-Layer Electrostatic Film Actuator with High Actuation Stress and Integrated Brake

Wang, Huacen, Wang, Hongqiang

arXiv.org Artificial Intelligence

Robotic systems driven by conventional motors often suffer from challenges such as large mass, complex control algorithms, and the need for additional braking mechanisms, which limit their applications in lightweight and compact robotic platforms. Electrostatic film actuators offer several advantages, including thinness, flexibility, lightweight construction, and high open-loop positioning accuracy. However, the actuation stress exhibited by conventional actuators in air still needs improvement, particularly for the widely used three-phase electrode design. To enhance the output performance of actuators, this paper presents a two-layer electrostatic film actuator with an integrated brake. By alternately distributing electrodes on both the top and bottom layers, a smaller effective electrode pitch is achieved under the same fabrication constraints, resulting in an actuation stress of approximately 241~N/m$^2$, representing a 90.5\% improvement over previous three-phase actuators operating in air. Furthermore, its integrated electrostatic adhesion mechanism enables load retention under braking mode. Several demonstrations, including a tug-of-war between a conventional single-layer actuator and the proposed two-layer actuator, a payload operation, a one-degree-of-freedom robotic arm, and a dual-mode gripper, were conducted to validate the actuator's advantageous capabilities in both actuation and braking modes.


How AI Fails: An Interactive Pedagogical Tool for Demonstrating Dialectal Bias in Automated Toxicity Models

Ghimire, Subhojit

arXiv.org Artificial Intelligence

Now that AI-driven moderation has become pervasive in everyday life, we often hear claims that "the AI is biased". While this is often said jokingly, the light-hearted remark reflects a deeper concern. How can we be certain that an online post flagged as "inappropriate" was not simply the victim of a biased algorithm? This paper investigates this problem using a dual approach. First, I conduct a quantitative benchmark of a widely used toxicity model (unitary/toxic-bert) to measure performance disparity between text in African-American English (AAE) and Standard American English (SAE). The benchmark reveals a clear, systematic bias: on average, the model scores AAE text as 1.8 times more toxic and 8.8 times higher for "identity hate". Second, I introduce an interactive pedagogical tool that makes these abstract biases tangible. The tool's core mechanic, a user-controlled "sensitivity threshold," demonstrates that the biased score itself is not the only harm; instead, the more-concerning harm is the human-set, seemingly neutral policy that ultimately operationalises discrimination. This work provides both statistical evidence of disparate impact and a public-facing tool designed to foster critical AI literacy.


FreeSliders: Training-Free, Modality-Agnostic Concept Sliders for Fine-Grained Diffusion Control in Images, Audio, and Video

Ezra, Rotem, Zisling, Hedi, Berman, Nimrod, Naiman, Ilan, Gorkor, Alexey, Nochumsohn, Liran, Nachmani, Eliya, Azencot, Omri

arXiv.org Artificial Intelligence

Diffusion models have become state-of-the-art generative models for images, audio, and video, yet enabling fine-grained controllable generation, i.e., continuously steering specific concepts without disturbing unrelated content, remains challenging. Concept Sliders (CS) offer a promising direction by discovering semantic directions through textual contrasts, but they require per-concept training and architecture-specific fine-tuning (e.g., LoRA), limiting scalability to new modalities. In this work we introduce FreeSliders, a simple yet effective approach that is fully training-free and modality-agnostic, achieved by partially estimating the CS formula during inference. To support modality-agnostic evaluation, we extend the CS benchmark to include both video and audio, establishing the first suite for fine-grained concept generation control with multiple modalities. We further propose three evaluation properties along with new metrics to improve evaluation quality. Finally, we identify an open problem of scale selection and non-linear traversals and introduce a two-stage procedure that automatically detects saturation points and reparameterizes traversal for perceptually uniform, semantically meaningful edits. Extensive experiments demonstrate that our method enables plug-and-play, training-free concept control across modalities, improves over existing baselines, and establishes new tools for principled controllable generation. An interactive presentation of our benchmark and method is available at: https://azencot-group.github.io/FreeSliders/. Diffusion models have emerged as state-of-the-art generative models, capable of producing realistic and diverse outputs across images, audio, and video (Rombach et al., 2022; Ho et al., 2022; Shi et al., 2023). Beyond generating high-quality samples, a central task is controllable generation, the ability to steer the generative process along user-specified signals (Liu et al., 2023; Ho et al., 2022). In particular, text-to-x, where x is a certain modality, has emerged as a powerful control signal for generative models, offering an intuitive human interface and enabling semantically aligned control (Zhang et al., 2023a;b). This text-guided capability plays a central role in creative applications, allowing users to produce high-quality content without requiring technical knowledge or professional design skills.


Sora Has Lost Its App Store Crown to Drake and Free Chicken

WIRED

Dave's Hot Chicken is the top app in the iOS App Store, ending Sora's weeks-long reign. On Friday, its reign came to an end. Your new champion is Dave's Hot Chicken. Dave's Hot Chicken now rules over the App Store, where its slack-beaked, bug-eyed mascot icon expresses appropriate surprise at its ascent. How did it break the grasp of OpenAI's golem TikTok?