

Regression is all you need for medical image translation

Rassmann, Sebastian, Kügler, David, Ewert, Christian, Reuter, Martin

arXiv.org Artificial Intelligence

While Generative Adversarial Networks (GANs) and Diffusion Models (DMs) have achieved impressive results in natural image synthesis, their core strengths, creativity and realism, can be detrimental in medical applications, where accuracy and fidelity are paramount: these models risk introducing hallucinations and replicating unwanted acquisition noise. Here, we propose YODA (You Only Denoise once - or Average), a 2.5D diffusion-based framework for medical image translation (MIT). Consistent with DM theory, we find that conventional diffusion sampling stochastically replicates noise. To mitigate this, we draw and average multiple samples, akin to physical signal averaging. As this effectively approximates the DM's expected value, we term it Expectation-Approximation (ExpA) sampling. We additionally propose regression sampling for YODA, which retains the initial DM prediction and omits iterative refinement, producing noise-free images in a single step. Across five diverse multi-modal datasets, including multi-contrast brain MRI and pelvic MRI-CT, we demonstrate that regression sampling is not only substantially more efficient but also matches or exceeds the image quality of full diffusion sampling, even with ExpA. Our results reveal that iterative refinement solely enhances perceptual realism without benefiting information translation, which we confirm in relevant downstream tasks. YODA outperforms eight state-of-the-art DMs and GANs and challenges the presumed superiority of DMs and GANs over computationally cheap regression models for high-quality MIT. Furthermore, we show that YODA-translated images are interchangeable with, or even superior to, physical acquisitions for several medical applications.
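The two sampling strategies in the abstract can be illustrated with a minimal numerical sketch. The denoiser below is a hypothetical stand-in (not the paper's model): stochastic sampling returns the clean target plus sampler noise, ExpA averages many such draws, and regression sampling keeps a single deterministic prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_sample(x_cond, rng, noise_scale=0.1):
    """Hypothetical stochastic sampler: returns the (assumed) clean
    translation plus sampler noise, mimicking how conventional
    diffusion sampling replicates acquisition noise."""
    clean = np.tanh(x_cond)  # stand-in for the true target image
    return clean + noise_scale * rng.standard_normal(x_cond.shape)

def expa_sample(x_cond, rng, k=64):
    """Expectation-Approximation (ExpA): average k independent samples,
    akin to physical signal averaging."""
    return np.mean([diffusion_sample(x_cond, rng) for _ in range(k)], axis=0)

def regression_sample(x_cond):
    """Regression sampling: keep the initial expected-value prediction
    and skip iterative refinement entirely (one forward pass)."""
    return np.tanh(x_cond)

x = rng.standard_normal((8, 8))       # toy "source modality" image
single = diffusion_sample(x, rng)
avg = expa_sample(x, rng, k=64)
reg = regression_sample(x)

# Averaging suppresses sampler noise toward the expectation,
# which the regression prediction reaches in a single step:
assert np.abs(avg - reg).mean() < np.abs(single - reg).mean()
```

The averaged sample converges on the same expectation that the regression pass computes directly, which is the abstract's efficiency argument in miniature.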


Disney and Universal Sue AI Company Midjourney for Copyright Infringement

WIRED

Disney and Universal have filed a lawsuit against Midjourney, alleging that the San Francisco–based AI image generation startup is a "bottomless pit of plagiarism" that generates "endless unauthorized copies" of the studios' work. The complaint includes dozens of images that purportedly demonstrate how Midjourney can conjure images featuring the studios' intellectual property. One image depicts Yoda from Star Wars holding a lightsaber, which it says was made by inputting the prompt "Yoda with lightsaber, IMAX." Another shows that typing "The Boss Baby" as a prompt allegedly resulted in an image of an animated child in a tuxedo closely resembling the protagonist of Universal's The Boss Baby franchise. "This is an extremely significant development," says IP lawyer Chad Hummel, who sees the compilation of images in the complaint as compelling evidence that "the output is not sufficiently transformative."


Loquacious Set: 25,000 Hours of Transcribed and Diverse English Speech Recognition Data for Research and Commercial Use

Parcollet, Titouan, Tseng, Yuan, Zhang, Shucong, van Dalen, Rogier

arXiv.org Artificial Intelligence

Automatic speech recognition (ASR) research is driven by the availability of common datasets shared between industrial researchers and academics, encouraging comparisons and evaluations. LibriSpeech, despite its long success as an ASR benchmark, is now limited by its size and its focus on clean, read speech, leading to near-zero word error rates. More recent datasets, including MOSEL, YODAS, Gigaspeech, OWSM, Libriheavy, and People's Speech, suffer from major limitations: licenses that industry researchers cannot use, unreliable transcriptions, incorrect audio data, or the lack of evaluation sets. This work presents the Loquacious Set, a 25,000-hour curated collection of commercially usable English speech. Featuring hundreds of thousands of speakers with diverse accents and a wide range of speech types (read, spontaneous, talks, clean, noisy), the Loquacious Set is designed to let academics and industry researchers build ASR systems for real-world scenarios.


YODAS: Youtube-Oriented Dataset for Audio and Speech

Li, Xinjian, Takamichi, Shinnosuke, Saeki, Takaaki, Chen, William, Shiota, Sayaka, Watanabe, Shinji

arXiv.org Artificial Intelligence

In this study, we introduce YODAS (YouTube-Oriented Dataset for Audio and Speech), a large-scale, multilingual dataset comprising currently over 500k hours of speech data in more than 100 languages, sourced from both labeled and unlabeled YouTube speech datasets. The labeled subsets, including manual or automatic subtitles, facilitate supervised model training. Conversely, the unlabeled subsets are apt for self-supervised learning applications. YODAS is distinctive as the first publicly available dataset of its scale, and it is distributed under a Creative Commons license. We introduce the collection methodology utilized for YODAS, which contributes to large-scale speech dataset construction. Subsequently, we provide a comprehensive analysis of the speech and text contained within the dataset. Finally, we describe the speech recognition baselines over the top-15 languages.


Dynamic Attention-Guided Diffusion for Image Super-Resolution

Moser, Brian B., Frolov, Stanislav, Raue, Federico, Palacio, Sebastian, Dengel, Andreas

arXiv.org Artificial Intelligence

Diffusion models in image Super-Resolution (SR) apply diffusion with uniform intensity across all image regions, which risks compromising the overall image quality. To address this, we introduce "You Only Diffuse Areas" (YODA), a dynamic attention-guided diffusion method for image SR. YODA selectively focuses on spatial regions using attention maps derived from the low-resolution image and the current time step in the diffusion process. This time-dependent targeting enables a more efficient conversion to high-resolution outputs by focusing on the areas that benefit most from iterative refinement, i.e., detail-rich objects. We empirically validate YODA by extending the leading diffusion-based methods SR3 and SRDiff. Our experiments demonstrate new state-of-the-art performance in face and general SR across PSNR, SSIM, and LPIPS metrics. A notable finding is YODA's stabilizing effect: it reduces color shifts, especially when training with small batch sizes.
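The time-dependent masking idea can be sketched as follows. Everything here is a toy stand-in for illustration, not the paper's model: `attention_mask` uses image gradients as a hypothetical saliency proxy, and `denoise_step` is a placeholder for one reverse-diffusion update; the key point is that only masked pixels are refined, and the mask shrinks as the process approaches t = 0.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_mask(lr_img, t, T):
    """Hypothetical time-dependent mask: high-gradient (detail-rich)
    regions stay active longer; flat regions stop diffusing early."""
    gy, gx = np.gradient(lr_img)
    saliency = np.hypot(gx, gy)
    # The threshold rises as t -> 0, so fewer pixels are refined late.
    thresh = np.quantile(saliency, 1.0 - t / T)
    return saliency >= thresh

def denoise_step(x, t):
    """Placeholder for one reverse-diffusion update."""
    return 0.9 * x  # toy contraction toward a clean estimate

def yoda_style_sr(lr_img, T=10):
    x = lr_img + 0.5 * rng.standard_normal(lr_img.shape)  # noisy init
    for t in range(T, 0, -1):
        mask = attention_mask(lr_img, t, T)
        x_new = denoise_step(x, t)
        x = np.where(mask, x_new, x)  # only diffuse the masked areas
    return x

lr = rng.standard_normal((16, 16))
out = yoda_style_sr(lr)
```

At the first step every pixel is active; by the last step only the most detail-rich fraction is still being refined, which is where the claimed efficiency comes from.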


YODA: Teacher-Student Progressive Learning for Language Models

Lu, Jianqiao, Zhong, Wanjun, Wang, Yufei, Guo, Zhijiang, Zhu, Qi, Huang, Wenyong, Wang, Yanlin, Mi, Fei, Wang, Baojun, Wang, Yasheng, Shang, Lifeng, Jiang, Xin, Liu, Qun

arXiv.org Artificial Intelligence

Although large language models (LLMs) have demonstrated adeptness in a range of tasks, they still lag behind human learning efficiency. This disparity is often linked to the inherent human capacity to learn from basic examples, gradually generalize to more complex problems, and refine skills with continuous feedback. Inspired by this, this paper introduces YODA, a novel teacher-student progressive learning framework that emulates the teacher-student education process to improve the efficacy of model fine-tuning. The framework operates on an interactive basic-generalized-harder loop: the teacher agent provides tailored feedback on the student's answers and systematically organizes the education process. This process unfolds by teaching the student basic examples, reinforcing understanding through generalized questions, and then posing questions of progressively greater complexity. With the teacher's guidance, the student learns to iteratively refine its answers with feedback and forms a robust, comprehensive understanding of the posed questions. The resulting procedural data, which reflects the progressive learning process of humans, is then used for model training. Taking math reasoning as a testbed, experiments show that training LLaMA2 with data from YODA yields significant gains over SFT (+17.01% on GSM8K and +9.98% on MATH). In addition, we find that training with curriculum learning further improves learning robustness.
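The basic-generalized-harder loop can be sketched as plain control flow. The `teacher_*` and `student_answer` functions below are hypothetical stand-ins for LLM agent calls (the paper's actual prompts and agents are not shown here); the sketch only illustrates how the loop assembles procedural training data.

```python
# Hypothetical stand-ins for LLM agent calls.
def student_answer(question, feedback=None):
    return f"answer({question}, hint={feedback})"

def teacher_feedback(question, answer):
    return f"feedback on {answer}"

def teacher_generalize(question):
    return [f"{question}-variant-{i}" for i in range(2)]

def teacher_harder(question):
    return f"{question}-harder"

def progressive_round(basic_q, refine_steps=2):
    """One teacher-student round: answer the basic example, refine it
    with feedback, then collect generalized and harder follow-ups."""
    transcript = []
    ans = student_answer(basic_q)
    for _ in range(refine_steps):            # iterative refinement
        fb = teacher_feedback(basic_q, ans)
        ans = student_answer(basic_q, feedback=fb)
    transcript.append((basic_q, ans))
    for q in teacher_generalize(basic_q) + [teacher_harder(basic_q)]:
        transcript.append((q, student_answer(q)))
    return transcript  # procedural data later used for fine-tuning

data = progressive_round("2+2")
```

Each round yields the basic question with its refined answer plus the generalized and harder follow-ups, and these transcripts form the fine-tuning corpus.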


Learn the Force We Can: Multi-Object Video Generation from Pixel-Level Interactions

Davtyan, Aram, Favaro, Paolo

arXiv.org Artificial Intelligence

We propose a novel unsupervised method to autoregressively generate videos from a single frame and a sparse motion input. Our trained model can generate realistic object-to-object interactions and separate the dynamics and extents of multiple objects despite only observing them under correlated motion activities. Key components of our method are the randomized conditioning scheme, the encoding of the input motion control, and randomized, sparse sampling to break correlations. Our model, which we call YODA, has the ability to move objects without physically touching them. We show both qualitatively and quantitatively that YODA accurately follows the user control while yielding video quality on par with or better than prior state-of-the-art video generation work on several datasets. For videos, visit our project website https://araachie.github.io/yoda.


The "Emodiversity" of Star Wars - Facts So Romantic

Nautilus

This past "Star Wars Day," May 4, I watched some of the original trilogy a bit mournfully: Peter Mayhew, who played Chewbacca, passed away the day before. When The Empire Strikes Back took us to the Yoda-dwelling Dagobah, I recalled what the exiled Jedi Master had told premonition-plagued Anakin Skywalker decades earlier, about how to deal with the fear of losing loved ones. "Death is a natural part of life," he tells Anakin. "Rejoice for those around you who transform into the Force. Attachment leads to jealousy--the shadow of greed, that is." Yoda is often held up as an avatar or icon of sagacity.


The Yoda of Silicon Valley

#artificialintelligence

For half a century, the Stanford computer scientist Donald Knuth, who bears a slight resemblance to Yoda -- albeit standing 6-foot-4 and wearing glasses -- has reigned as the spirit-guide of the algorithmic realm. He is the author of "The Art of Computer Programming," a continuing four-volume opus that is his life's work. The first volume debuted in 1968, and the collected volumes (sold as a boxed set for about $250) were included by American Scientist in 2013 on its list of books that shaped the last century of science -- alongside a special edition of "The Autobiography of Charles Darwin," Tom Wolfe's "The Right Stuff," Rachel Carson's "Silent Spring" and monographs by Albert Einstein, John von Neumann and Richard Feynman. With more than one million copies in print, "The Art of Computer Programming" is the Bible of its field. "Like an actual bible, it is long and comprehensive; no other book is as comprehensive," said Peter Norvig, a director of research at Google.



AI Magazine

We stare intensely at the robot with one eye, keeping the other out for any surprises. It looks for the door and slowly starts moving into the room. Our minds seem to share the same thought: "YODA, don't fail us now." We decided that the Office Navigation event in the robot competition would be our first milestone in working toward this goal. It would provide a context in which to direct our efforts.