

Enhancing multimodal analogical reasoning with Logic Augmented Generation

arXiv.org Artificial Intelligence

Recent advances in Large Language Models have demonstrated their capabilities across a variety of tasks. However, automatically extracting implicit knowledge from natural language remains a significant challenge, as machines lack active experience with the physical world. Given this scenario, semantic knowledge graphs can serve as conceptual spaces that guide the automated text generation reasoning process to achieve more efficient and explainable results. In this paper, we apply a logic-augmented generation (LAG) framework that leverages the explicit representation of a text through a semantic knowledge graph and applies it in combination with prompt heuristics to elicit implicit analogical connections. This method generates extended knowledge graph triples representing implicit meaning, enabling systems to reason on unlabeled multimodal data regardless of the domain. We validate our work through three metaphor detection and understanding tasks across four datasets, as they require deep analogical reasoning capabilities. The results show that this integrated approach surpasses current baselines, performs better than humans in understanding visual metaphors, and enables more explainable reasoning processes, though it still has inherent limitations in metaphor understanding, especially for domain-specific metaphors. Furthermore, we provide a thorough error analysis, discussing issues with metaphorical annotations and current evaluation methods.


Reimagining Dance: Real-time Music Co-creation between Dancers and AI

arXiv.org Artificial Intelligence

Dance performance traditionally follows a unidirectional relationship where movement responds to music. While AI has advanced in various creative domains, its application in dance has primarily focused on generating choreography from musical input. We present a system that enables dancers to dynamically shape musical environments through their movements. Our multi-modal architecture creates a coherent musical composition by intelligently combining pre-recorded musical clips in response to dance movements, establishing a bidirectional creative partnership where dancers function as both performers and composers. Through correlation analysis of performance data, we demonstrate emergent communication patterns between movement qualities and audio features. This approach reconceptualizes the role of AI in performing arts as a responsive collaborator that expands possibilities for both professional dance performance and improvisational artistic expression across broader populations.


Enabling automatic transcription of child-centered audio recordings from real-world environments

arXiv.org Artificial Intelligence

Longform audio recordings obtained with microphones worn by children, also known as child-centered daylong recordings, have become a standard method for studying children's language experiences and their impact on subsequent language development. Transcripts of longform speech audio would enable rich analyses at various linguistic levels, yet the massive scale of typical longform corpora prohibits comprehensive manual annotation. At the same time, automatic speech recognition (ASR)-based transcription faces significant challenges due to the noisy, unconstrained nature of real-world audio, and no existing study has successfully applied ASR to transcribe such data. However, previous attempts have assumed that ASR must process each longform recording in its entirety. In this work, we present an approach to automatically detect those utterances in longform audio that can be reliably transcribed with modern ASR systems, allowing automatic and relatively accurate transcription of a notable proportion of all speech in typical longform data. We validate the approach on four English longform audio corpora, showing that it achieves a median word error rate (WER) of 0% and a mean WER of 18% when transcribing 13% of the total speech in the dataset. In contrast, transcribing all speech without any filtering yields a median WER of 52% and a mean WER of 51%. We also compare word log-frequencies derived from the automatic transcripts with those from manual annotations and show that the frequencies correlate at r = 0.92 (Pearson) for all transcribed words and r = 0.98 for words that appear at least five times in the automatic transcripts. Overall, the work provides a concrete step toward increasingly detailed automated linguistic analyses of child-centered longform audio.
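The abstract above reports a median WER of 0% alongside a mean WER of 18%. The sketch below shows how such a gap can arise. It assumes WER is computed per utterance as word-level edit distance divided by reference length, which is the standard definition but is not confirmed by the abstract. The function name `word_error_rate` and the utterance pairs are hypothetical illustrations, not data from the paper.

```python
from statistics import mean, median


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical (reference, ASR output) pairs: most are perfect, one is poor.
pairs = [
    ("look at the dog", "look at the dog"),
    ("do you want more", "do you want more"),
    ("where is the ball", "where is the ball"),
    ("put it on the table", "put in on a cable"),
]
wers = [word_error_rate(ref, hyp) for ref, hyp in pairs]
print("median WER:", median(wers))
print("mean WER:", mean(wers))
```

When most selected utterances are transcribed perfectly, the median sits at zero while a few badly recognized utterances pull the mean upward.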


A correlation-permutation approach for speech-music encoders model merging

arXiv.org Artificial Intelligence

Simply permuting shallower layers or attempting to permute all transformer components indiscriminately leads to suboptimal outcomes.

C. Layer-wise Permutation Analysis

To understand how structural alignment varies across the depth of the models, we examined the percentage of channels permuted on each layer considered for permutation on MERT when using the CNN + "fnn+atnn" setup. Results are shown in Table II, where interestingly, it can be seen that most layers are completely permuted with the exception of the first CNN layer of MERT, where only 30.86% of channels were reordered. This suggests that the initial feature representations learned by both MERT and HuBERT at this shallow depth share considerable similarity. It is plausible that this first layer in both models learns to extract fundamental, low-level acoustic features, akin to filterbank-like representations. Therefore, the internal channel ordering for these basic features might already be substantially aligned between the two independently trained models, necessitating fewer permutations.
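The "percentage of channels permuted" statistic discussed above can be illustrated with a toy sketch. It assumes that channels of two models are matched by maximizing the summed Pearson correlation of their activations on shared inputs. The excerpt does not specify the paper's actual matching procedure, so the brute-force search here is a stand-in that only works for a handful of channels. All names (`pearson`, `best_permutation`, `percent_permuted`) and the activation values are hypothetical.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def best_permutation(acts_a, acts_b):
    """Return perm such that channel perm[i] of B is matched to channel i of A.

    acts_a[i] and acts_b[j] are per-channel activations over the same inputs.
    Exhaustive search over all permutations; feasible only for tiny layers.
    """
    n = len(acts_a)
    corr = [[pearson(a, b) for b in acts_b] for a in acts_a]
    return max(
        permutations(range(n)),
        key=lambda p: sum(corr[i][p[i]] for i in range(n)),
    )


def percent_permuted(perm):
    """Share of channels whose matched index differs from their own index."""
    moved = sum(1 for i, j in enumerate(perm) if i != j)
    return 100.0 * moved / len(perm)


# Toy activations: model B carries the same features as model A,
# rescaled or shifted, with channels 1 and 2 stored in swapped order.
acts_a = [
    [1, 2, 3, 4, 5],
    [5, 3, 1, 3, 5],
    [1, -1, 1, -1, 1],
    [0, 1, 0, 2, 0],
]
acts_b = [
    [2, 4, 6, 8, 10],
    [3, 1, 3, 1, 3],
    [2.5, 1.5, 0.5, 1.5, 2.5],
    [0, 3, 0, 6, 0],
]
perm = best_permutation(acts_a, acts_b)
print(perm, percent_permuted(perm))
```

A low percentage, like the 30.86% reported for the first CNN layer of MERT, would mean that most channels are already best matched to the channel at the same index in the other model.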


A Step-by-Step Guide to Creating a Robust Autonomous Drone Testing Pipeline

arXiv.org Artificial Intelligence

Autonomous drones are rapidly reshaping industries ranging from aerial delivery and infrastructure inspection to environmental monitoring and disaster response. Ensuring the safety, reliability, and efficiency of these systems is paramount as they transition from research prototypes to mission-critical platforms. This paper presents a step-by-step guide to establishing a robust autonomous drone testing pipeline, covering each critical stage: Software-in-the-Loop (SIL) Simulation Testing, Hardware-in-the-Loop (HIL) Testing, Controlled Real-World Testing, and In-Field Testing. Using practical examples, including a marker-based autonomous landing system, we demonstrate how to systematically verify drone system behaviors, identify integration issues, and optimize performance. Furthermore, we highlight emerging trends shaping the future of drone testing, including the integration of neurosymbolic AI and LLMs, the creation of co-simulation environments, and Digital Twin-enabled simulation-based testing techniques. By following this pipeline, developers and researchers can achieve comprehensive validation, minimize deployment risks, and prepare autonomous drones for safe and reliable real-world operations.


Control Architecture and Design for a Multi-robotic Visual Servoing System in Automated Manufacturing Environment

arXiv.org Artificial Intelligence

The use of robotic technology has drastically increased in manufacturing in the 21st century. But by utilizing their sensory cues, humans still outperform machines, especially in micro scale manufacturing, which requires high-precision robot manipulators. These sensory cues naturally compensate for high levels of uncertainties that exist in the manufacturing environment. Uncertainties in performing manufacturing tasks may come from measurement noise, model inaccuracy, joint compliance (e.g., elasticity), etc. Although advanced metrology sensors and high precision microprocessors, which are utilized in modern robots, have compensated for many structural and dynamic errors in robot positioning, a well-designed control algorithm still works as a comparable and cheaper alternative to reduce uncertainties in automated manufacturing. Our work illustrates that a multi-robot control system that simulates the positioning process for fastening and unfastening applications can reduce various uncertainties, which may occur in this process, to a great extent. In addition, most research papers in visual servoing mainly focus on developing control and observation architectures in various scenarios, but few have discussed the importance of the camera's location in the configuration. In a manufacturing environment, the quality of camera estimations may vary significantly from one observation location to another, as the combined effects of environmental conditions result in different noise levels of a single image shot at different locations. Therefore, in this paper, we also propose a novel algorithm for the camera's moving policy so that it explores the camera workspace and searches for the optimal location where the image noise level is minimized.


Google's AI video creator gets major upgrade. How to use it.

Popular Science

With every passing month, AI-generated content gets harder to distinguish from material made by human beings. Google's latest video maker is a case in point: The newly launched Veo 3 model is a step up in terms of realism, while also adding audio for the first time, so synced dialog, natural sound, and other audio effects can be added in. Google promises the new Veo 3 model has a better understanding of real world physics, and is smarter at turning your text prompts into video clips. Those clips are capped at eight seconds for now, and at a resolution of 720p, presumably because of the high computing (and environmental) demands of generating these videos.


Fox News AI Newsletter: Hollywood studios sue 'bottomless pit of plagiarism'

FOX News

The Minions pose during the world premiere of the film "Despicable Me 4" in New York City, June 9, 2024. The website of Midjourney, an artificial intelligence (AI) capable of creating AI art, is seen on a smartphone on April 3, 2023, in Berlin, Germany. 'PIRACY IS PIRACY': Two major Hollywood studios are suing Midjourney, a popular AI image generator, over its use and distribution of intellectual property. AI RACE: Meta CEO Mark Zuckerberg is reportedly building a team of experts to develop artificial general intelligence (AGI) that can meet or exceed human capabilities. TECH HUB: New York is poised to play a central role in the development of artificial intelligence (AI), OpenAI executives told key business and civic leaders on Tuesday.


Jennie Garth claims ex-husband Peter Facinelli's dating app age range matched their daughter's

FOX News

Jennie Garth told Fox News Digital that once her youngest child graduates from high school, she will be moving out of California. Jennie Garth is bringing up ex-husband Peter Facinelli's dating past, specifically his time on the exclusive celebrity dating app Raya. During a podcast interview, Garth, 53, claimed that the actor's preferred age range included the age of their eldest daughter, Luca, who is now 27. "My ex-husband Peter, I was told, was on Raya, and his age, whatever range, that he was looking for was also the age range of his oldest daughter," Garth shared on the "I Do, Part 2" podcast with Jana Kramer and guest J.P. Rosenbaum. "So, she came across him on her thing."


Empowering Users to Make Sustainability-Forward Decisions for Computing Services

Communications of the ACM

"Green consumerism" is the idea that consumers' decision to purchase or not to purchase an item is mediated at least partly by considering the item's impact on environmental or social conditions [9]. In response to a general upswell of awareness around sustainable development goals (SDGs) and climate change, large corporations such as Google and Amazon have provided users with tangible sustainable choices. For example, when checking out on Amazon, if a customer has more than one item in an order, they can opt into Amazon Day, which delays shipping until all items are available to be placed in the same shipping box. Additionally, Amazon allows users to filter a search to only show items containing a climate pledge certification. Similarly, Google's search engine allows users to filter for low-emission options when searching for flights, and to search for eco-certified hotels when booking travel plans.