 groove


Group Contrastive Learning for Weakly Paired Multimodal Data

Gorla, Aditya, Van Assel, Hugues, Huetter, Jan-Christian, Yao, Heming, Cho, Kyunghyun, Regev, Aviv, Littman, Russell

arXiv.org Machine Learning

We present GROOVE, a semi-supervised multi-modal representation learning approach for high-content perturbation data where samples across modalities are weakly paired through shared perturbation labels but lack direct correspondence. Our primary contribution is GroupCLIP, a novel group-level contrastive loss that bridges CLIP for paired cross-modal data and SupCon for uni-modal supervised contrastive learning, addressing a fundamental gap in contrastive learning for weakly-paired settings. We integrate GroupCLIP with an on-the-fly backtranslating autoencoder framework to encourage cross-modally entangled representations while maintaining group-level coherence within a shared latent space. Critically, we introduce a comprehensive combinatorial evaluation framework that systematically assesses representation learners across multiple optimal transport aligners, addressing key limitations in existing evaluation strategies. This framework includes novel simulations that systematically vary shared versus modality-specific perturbation effects, enabling principled assessment of method robustness. Our combinatorial benchmarking reveals that there is not yet an aligner that uniformly dominates across settings or modality pairs. Across simulations and two real single-cell genetic perturbation datasets, GROOVE performs on par with or outperforms existing approaches for downstream cross-modal matching and imputation tasks. Our ablation studies demonstrate that GroupCLIP is the key component driving performance gains. These results highlight the importance of leveraging group-level constraints for effective multi-modal representation learning in scenarios where only weak pairing is available.




Visio-Verbal Teleimpedance Interface: Enabling Semi-Autonomous Control of Physical Interaction via Eye Tracking and Speech

Jekel, Henk H. A., Rosales, Alejandro Díaz, Peternel, Luka

arXiv.org Artificial Intelligence

The paper presents a visio-verbal teleimpedance interface for commanding 3D stiffness ellipsoids to the remote robot with a combination of the operator's gaze and verbal interaction. The gaze is detected by an eye-tracker, allowing the system to understand the context in terms of what the operator is currently looking at in the scene. Along with verbal interaction, a Visual Language Model (VLM) processes this information, enabling the operator to communicate their intended action or provide corrections. Based on these inputs, the interface can then generate appropriate stiffness matrices for different physical interaction actions. To validate the proposed visio-verbal teleimpedance interface, we conducted a series of experiments on a setup including a Force Dimension Sigma.7 haptic device to control the motion of the remote Kuka LBR iiwa robotic arm. The human operator's gaze is tracked by Tobii Pro Glasses 2, while human verbal commands are processed by a VLM using GPT-4o. The first experiment explored the optimal prompt configuration for the interface. The second and third experiments demonstrated different functionalities of the interface on a slide-in-the-groove task.


A Strawberry Harvesting Tool with Minimal Footprint

Sorour, Mohamed, Heshmat, Mohamed, Elgeneidy, Khaled, From, Pål Johan

arXiv.org Artificial Intelligence

This paper presents a novel prototype for harvesting table-top grown strawberries that has a minimal footprint when interacting with the fruit. In our methodology, a smooth trapper manipulates the stem into a precise groove location at which a distant laser beam is focused. The tool reaches temperatures as high as 188° Celsius, thereby killing germs and preventing the spread of local plant diseases. The burnt stem wound preserves water content and, in turn, the fruit's shelf life. Cycle and cut times of 5.56 and 2.88 seconds, respectively, are achieved in a successful indoor harvesting demonstration. Extensive experiments are performed to optimize the laser spot diameter and lateral speed against the cutting time.


Not that Groove: Zero-Shot Symbolic Music Editing

Zhang, Li

arXiv.org Artificial Intelligence

Most work in AI music generation has focused on audio, which has seen limited use in the music production industry due to its rigidity. To maximize flexibility while assuming only textual instructions from producers, we are among the first to tackle symbolic music editing. We circumvent the known challenge of lack of labeled data by proving that LLMs with zero-shot prompting can effectively edit drum grooves. The recipe for success is a creatively designed format that interfaces LLMs and music, while we facilitate evaluation by providing an evaluation dataset with annotated unit tests that aligns closely with musicians' judgment.


Laboratory Automation: Precision Insertion with Adaptive Fingers utilizing Contact through Sliding with Tactile-based Pose Estimation

Pai, Sameer, Takahashi, Kuniyuki, Masuda, Shimpei, Fukaya, Naoki, Yamane, Koki, Ummadisingu, Avinash

arXiv.org Artificial Intelligence

Micro well-plates, a commonly used apparatus in chemical and biological experiments, are a few centimeters thick and contain wells. The task we aim to solve is to place (insert) them onto a well-plate holder with grooves a few millimeters in height. Our insertion task has the following facets: 1) there is uncertainty in the detection of the position and pose of the well-plate and well-plate holder, 2) the accuracy required is on the order of millimeter to sub-millimeter, 3) the well-plate holder is not fastened and moves under external force, 4) the groove is shallow, and 5) the width of the groove is small. Addressing these challenges, we developed a) an adaptive finger gripper with accurate detection of finger position (for (1)), b) grasped object pose estimation using tactile sensors (for (1)), c) a method to insert the well-plate into the target holder by sliding the well-plate while maintaining contact with the edge of the holder (for (2-4)), and d) estimating the orientation of the edge and aligning the well-plate so that the holder does not move when maintaining contact with the edge (for (5)). We show a significantly high success rate on the well-plate insertion task, even under added noise. An accompanying video is available at the following link: https://drive.google.com/file/d/1UxyJ3XIxqXPnHcpfw-PYs5T5oYQxoc6i/view?usp=sharing


Language Models are Drummers: Drum Composition with Natural Language Pre-Training

Zhang, Li, Callison-Burch, Chris

arXiv.org Artificial Intelligence

Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves, which has little precedent in the literature, is even more so. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.


FIRST LOOK: Cleveland's Launcher XL irons get the Artificial Intelligence treatment

#artificialintelligence

Don't be deceived by the sleek look: Cleveland Golf's Launcher XL irons are designed for the game-improvement golfer who needs an abundance of forgiveness and technology. The hollow cavity construction features a new variable-thickness Mainframe face that was created using Artificial Intelligence. In recent years, AI has played a larger role in the club design process as manufacturers have continued to push the boundaries, particularly when it comes to face construction. With AI taking the lead on face design, Cleveland's engineering team focused on improving the common high-toe mishit for mid-to-high handicap golfers. Compared to the previous generation, Launcher XL offers a 15 percent increase in MOI (a measure of forgiveness) on high-toe strikes in an effort to tighten the distance loss delta.


Meet Macaulay Culkin, retro video game nerd

Engadget

Macaulay Culkin is roughly 10 years behind when it comes to video games. The most up-to-date console he owns is an Xbox 360, which he plugs in mostly to beat Mass Effect 2 again or blast through swarms of zombies in Left 4 Dead with his younger brother. The most modern game in his rotation right now is 2014's South Park: The Stick of Truth, which he's about 15 percent of the way through. "I play more old-school kind of things," Culkin said. "I play a lot more Nintendo and Super Nintendo games than I do probably anything else."