ruler


A Practical, Progressively-Expressive GNN

Neural Information Processing Systems

Message passing neural networks (MPNNs) have become a dominant flavor of graph neural networks (GNNs) in recent years. Yet, MPNNs come with notable limitations; namely, they are at most as powerful as the 1-dimensional Weisfeiler-Leman (1-WL) test in distinguishing graphs in a graph isomorphism testing framework. To this end, researchers have drawn inspiration from the k-WL hierarchy to develop more expressive GNNs. However, current k-WL-equivalent GNNs are not practical for even small values of k, as k-WL becomes combinatorially more complex as k grows. At the same time, several works have found great empirical success in graph learning tasks without highly expressive models, implying that chasing expressiveness with a "coarse-grained ruler" of expressivity like k-WL is often unneeded in practical tasks.
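The 1-WL bound mentioned above can be made concrete with a minimal color-refinement sketch (my own illustration, not code from the paper): nodes repeatedly hash their color together with the multiset of neighbor colors, and two graphs are 1-WL-distinguishable exactly when their final color histograms differ.

```python
from collections import Counter

def one_wl_colors(adj, rounds=3):
    """Run 1-WL color refinement on a graph given as an adjacency list
    and return the histogram of final node colors."""
    n = len(adj)
    colors = [0] * n  # every node starts with the same color
    for _ in range(rounds):
        # New color = (own color, sorted multiset of neighbor colors).
        sigs = [(colors[v], tuple(sorted(colors[u] for u in adj[v])))
                for v in range(n)]
        # Compress signatures back to small integer labels.
        table = {s: i for i, s in enumerate(sorted(set(sigs)))}
        colors = [table[s] for s in sigs]
    return Counter(colors)

# 1-WL separates a triangle from a 3-node path...
triangle = [[1, 2], [0, 2], [0, 1]]
path = [[1], [0, 2], [1]]
print(one_wl_colors(triangle) != one_wl_colors(path))   # → True

# ...but not a 6-cycle from two disjoint triangles (both 2-regular),
# so an MPNN bounded by 1-WL cannot tell them apart either.
hexagon = [[1, 5], [0, 2], [1, 3], [2, 4], [3, 5], [0, 4]]
two_triangles = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]
print(one_wl_colors(hexagon) == one_wl_colors(two_triangles))  # → True
```

The second check is the classic failure case motivating the k-WL hierarchy: the two graphs are non-isomorphic, yet 1-WL assigns identical color histograms.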


Detection and Measurement of Hailstones with Multimodal Large Language Models

Alker, Moritz, Schedl, David C., Stöckl, Andreas

arXiv.org Artificial Intelligence

This study examines the use of social media and news images to detect and measure hailstones, utilizing pre-trained multimodal large language models. The dataset for this study comprises 474 crowdsourced images of hailstones from documented hail events in Austria that occurred between January 2022 and September 2024. These hailstones have maximum diameters ranging from 2 to 11 cm. We estimate the hail diameters and compare four different models utilizing one-stage and two-stage prompting strategies. The latter utilizes additional size cues from reference objects, such as human hands, within the image. Our results show that pre-trained models already have the potential to measure hailstone diameters from images, with an average mean absolute error of 1.12 cm for the best model. In comparison to a single-stage prompt, two-stage prompting improves the reliability of most models. Our study suggests that these off-the-shelf models, even without fine-tuning, can complement traditional hail sensors by extracting meaningful and spatially dense information from social media imagery, enabling faster and more detailed assessments of severe weather events. Automated real-time image harvesting from social media and other sources remains an open task, but it would make our approach directly applicable to future hail events.
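The two-stage strategy can be sketched as plain prompt construction plus the evaluation metric the abstract reports. This is a hypothetical illustration of the idea, not the paper's actual prompts; the wording and the example numbers are invented, and only the MAE definition is standard.

```python
def stage_one_prompt() -> str:
    # Stage 1: ask the model to find reference objects with known real-world sizes.
    return ("List any objects in this hailstone photo whose real-world size is "
            "roughly known (e.g. a human hand, a coin).")

def stage_two_prompt(reference: str, ref_size_cm: float) -> str:
    # Stage 2: feed the detected reference back as an explicit size cue.
    return (f"The {reference} in the photo is about {ref_size_cm:.1f} cm across. "
            "Using it as a scale, estimate the hailstone's maximum diameter in cm.")

def mean_absolute_error(pred_cm, true_cm):
    """Evaluation metric from the abstract: average absolute error in cm."""
    return sum(abs(p - t) for p, t in zip(pred_cm, true_cm)) / len(pred_cm)

# Invented example values, just to show the metric's shape:
print(mean_absolute_error([4.0, 6.5, 3.0], [5.0, 6.0, 3.5]))  # ≈ 0.67
```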


Mol-CADiff: Causality-Aware Autoregressive Diffusion for Molecule Generation

Ahamed, Md Atik, Ye, Qiang, Cheng, Qiang

arXiv.org Artificial Intelligence

The design of novel molecules with desired properties is a key challenge in drug discovery and materials science. Traditional methods rely on trial-and-error, while recent deep learning approaches have accelerated molecular generation. However, existing models struggle with generating molecules based on specific textual descriptions. We introduce Mol-CADiff, a novel diffusion-based framework that uses causal attention mechanisms for text-conditional molecular generation. Our approach explicitly models the causal relationship between textual prompts and molecular structures, overcoming key limitations in existing methods. We enhance dependency modeling both within and across modalities, enabling precise control over the generation process. Our extensive experiments demonstrate that Mol-CADiff outperforms state-of-the-art methods in generating diverse, novel, and chemically valid molecules, with better alignment to specified properties, enabling more intuitive language-driven molecular design.
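One plausible reading of "causal attention between textual prompts and molecular structures" is a mask in which prompt tokens attend freely among themselves while molecule tokens condition on the full prompt but only on earlier molecule tokens. This is a minimal sketch under that assumption, not the paper's actual architecture.

```python
import numpy as np

def cross_modal_causal_mask(n_text: int, n_mol: int) -> np.ndarray:
    """Attention mask (1 = may attend, 0 = masked): text tokens attend
    bidirectionally among themselves; molecule tokens attend to the whole
    prompt plus earlier molecule tokens only (autoregressive generation)."""
    n = n_text + n_mol
    mask = np.zeros((n, n), dtype=int)
    mask[:n_text, :n_text] = 1                                  # text <-> text
    mask[n_text:, :n_text] = 1                                  # molecule -> text
    mask[n_text:, n_text:] = np.tril(np.ones((n_mol, n_mol), dtype=int))
    return mask

m = cross_modal_causal_mask(2, 3)
```

The lower-triangular block is what makes molecule generation autoregressive; zeroing the text-to-molecule block keeps the prompt encoding independent of what gets generated.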


Star Attention: Efficient LLM Inference over Long Sequences

Acharya, Shantanu, Jia, Fei, Ginsburg, Boris

arXiv.org Artificial Intelligence

Inference with Transformer-based Large Language Models (LLMs) on long sequences is both costly and slow due to the quadratic complexity of the self-attention mechanism. We introduce Star Attention, a two-phase block-sparse approximation that improves computational efficiency by sharding attention across multiple hosts while minimizing communication overhead. In the first phase, the context is processed using blockwise-local attention across hosts, in parallel. In the second phase, query and response tokens attend to all prior cached tokens through sequence-global attention. Star Attention integrates seamlessly with most Transformer-based LLMs trained with global attention, reducing memory requirements and inference time by up to 11x while preserving 95-100% of accuracy. Recent Large Language Models (LLMs) can support contexts up to millions of tokens in length (Gemini-Team, 2024; Anthropic, 2024; Meta-AI, 2024), unlocking applications such as repository-level code analysis, multi-document summarization, and large corpus retrieval. However, processing such long sequences with LLMs requires substantial computational and memory resources due to the quadratic complexity of the self-attention mechanism. To address these challenges, various techniques have been proposed to reduce memory usage and increase inference speed. For example, Flash Attention introduces an efficient GPU block-wise implementation of the global attention, achieving significant reductions in memory overhead and runtime (Dao et al., 2022; Dao, 2024).
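The two phases can be sketched in a toy single-head form. This is my own simplification, assuming raw embeddings stand in for Q, K and V and ignoring the anchor-block and distributed-softmax details a real implementation would need; it only shows the local-then-global structure.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def star_attention(ctx, query, block=4):
    """Toy single-head sketch of the two-phase scheme.
    Phase 1: context tokens attend only within their own block, so each
    block can be processed on a different host with no communication.
    Phase 2: query tokens attend globally over all cached context tokens."""
    d = ctx.shape[1]
    cache = np.concatenate([
        softmax(b @ b.T / np.sqrt(d)) @ b            # blockwise-local attention
        for b in (ctx[s:s + block] for s in range(0, len(ctx), block))
    ])
    kv = np.concatenate([cache, query])              # queries also see themselves
    return softmax(query @ kv.T / np.sqrt(d)) @ kv   # sequence-global attention

rng = np.random.default_rng(0)
ctx = rng.normal(size=(8, 4))       # 8 context tokens, 2 blocks of 4
query = rng.normal(size=(2, 4))     # 2 query tokens
out = star_attention(ctx, query)
```

Phase 1 costs O(block²) per block instead of O(n²) over the whole context, which is where the memory and latency savings come from.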


MaskMedPaint: Masked Medical Image Inpainting with Diffusion Models for Mitigation of Spurious Correlations

Jin, Qixuan, Gerych, Walter, Ghassemi, Marzyeh

arXiv.org Artificial Intelligence

Spurious features associated with class labels can lead image classifiers to rely on shortcuts that don't generalize well to new domains. This is especially problematic in medical settings, where biased models fail when applied to different hospitals or systems. In such cases, data-driven methods to reduce spurious correlations are preferred, as clinicians can directly validate the modified images. While Denoising Diffusion Probabilistic Models (Diffusion Models) show promise for natural images, they are impractical for medical use due to the difficulty of describing spurious medical features. To address this, we propose Masked Medical Image Inpainting (MaskMedPaint), which uses text-to-image diffusion models to augment training images by inpainting areas outside key classification regions to match the target domain. We demonstrate that MaskMedPaint enhances generalization to target domains across both natural (Waterbirds, iWildCam) and medical (ISIC 2018, Chest X-ray) datasets, given limited unlabeled target images.
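The core compositing step ("inpaint areas outside key classification regions") reduces to a masked blend. A minimal sketch of that step follows, assuming the diffusion model has already produced a target-domain inpainted image; the function name and shapes are my own, not the paper's code.

```python
import numpy as np

def inpaint_outside_region(original, inpainted, keep_mask):
    """Preserve the key classification region from the original image and
    take everything outside it from the diffusion-inpainted, target-domain
    image. keep_mask is 1 inside the region to protect, 0 elsewhere."""
    keep = keep_mask[..., None]          # broadcast the H x W mask over channels
    return keep * original + (1 - keep) * inpainted

orig = np.ones((2, 2, 3))                # stands in for the source-domain image
targ = np.zeros((2, 2, 3))               # stands in for the inpainted background
mask = np.array([[1, 0], [0, 0]])        # protect the top-left pixel only
out = inpaint_outside_region(orig, targ, mask)
```

Keeping the classification region pixel-identical is what lets clinicians validate that only spurious background content was replaced.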


Learning secondary tool affordances of human partners using iCub robot's egocentric data

Ding, Bosong, Oztop, Erhan, Spigler, Giacomo, Kirtay, Murat

arXiv.org Artificial Intelligence

Objects, in particular tools, provide several action possibilities to the agents that can act on them, which are generally associated with the term affordances. A tool is typically designed for a specific purpose, such as driving a nail in the case of a hammer, which we call the primary affordance. A tool can also be used beyond its primary purpose, in which case we associate this auxiliary use with the term secondary affordance. Previous work on affordance perception and learning has mostly focused on primary affordances. Here, we address the less explored problem of learning the secondary tool affordances of human partners. To do this, we use the iCub robot to observe human partners with three cameras while they perform actions on twenty objects using four different tools. In our experiments, human partners utilize tools to perform actions that do not correspond to their primary affordances. For example, the iCub robot observes a human partner using a ruler for pushing, pulling, and moving objects instead of measuring their lengths. In this setting, we constructed a dataset by taking images of objects before and after each action is executed. We then model learning secondary affordances by training three neural networks (ResNet-18, ResNet-50, and ResNet-101), each on three tasks, using raw images showing the `initial' and `final' position of objects as input: (1) predicting the tool used to move an object, (2) predicting the tool used with an additional categorical input encoding the action performed, and (3) jointly predicting both the tool used and the action performed. Our results indicate that deep learning architectures enable the iCub robot to predict secondary tool affordances, paving the way for human-robot collaborative object manipulation involving complex affordances.
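The input encoding for the three tasks can be sketched as stacking the before/after frames channel-wise and, for task 2, adding a one-hot action code. This is my own guess at a natural encoding consistent with the description, not the authors' released pipeline.

```python
import numpy as np

def build_input(initial_img, final_img, action_id=None, n_actions=4):
    """Stack the 'initial' and 'final' RGB frames channel-wise to form the
    network input (task 1); for task 2, also return a one-hot action code
    to be fed as the additional categorical input."""
    x = np.concatenate([initial_img, final_img], axis=-1)   # H x W x 6
    if action_id is None:
        return x, None
    return x, np.eye(n_actions)[action_id]

x, a = build_input(np.zeros((224, 224, 3)), np.ones((224, 224, 3)), action_id=2)
```

A 6-channel input lets a standard ResNet (with a widened first convolution) see the scene change directly rather than inferring it from a single frame.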


A Modular End-to-End Multimodal Learning Method for Structured and Unstructured Data

Alessandro, Marco D, Calabrés, Enrique, Elkano, Mikel

arXiv.org Artificial Intelligence

Multimodal learning is a rapidly growing research field that has revolutionized multitasking and generative modeling in AI. While much of the research has focused on dealing with unstructured data (e.g., language, images, audio, or video), structured data (e.g., tabular data, time series, or signals) has received less attention. However, many industry-relevant use cases involve, or can benefit from, both types of data. In this work, we propose a modular, end-to-end multimodal learning method called MAGNUM, which can natively handle both structured and unstructured data. MAGNUM is flexible enough to employ any specialized unimodal module to extract, compress, and fuse information from all available modalities.


New survey reveals AI could drive humans to extinction - and top researchers say it would happen by dangerous groups engineering viruses, rulers controlling populations, or worsening economic inequality

Daily Mail - Science & tech

Many tech experts have warned that AI is on a path of destruction, but a new survey of top researchers has quantified the chances of it causing human extinction. A team of international scientists asked 2,778 AI experts about the future of the systems, with five percent reporting the tech will lead to collapse. But a far more frightening estimation came from one in 10 researchers, who said there is a shocking 25 percent chance that AI will destroy the human race. The experts cited three possible causes: AI allowing threatening groups to make powerful tools, like engineered viruses; 'authoritarian rulers using AI to control their populations'; and 'AI systems worsening economic inequality by disproportionately benefiting certain individuals.' Regulation, the researchers said, is the only answer to protecting humans; without it, they estimated a 10 percent chance that machines will outperform humans in all tasks by 2027, rising to a 50 percent chance by 2047.


K-12 curriculum 'socially engineering' millions into enraged young 'social justice warriors,' parents warn

FOX News

Fox News contributor Jonathan Turley reacts to a dean at Stanford Law School joining students in heckling a conservative judge on 'America Reports.' EXCLUSIVE – A curriculum developed under Yale Medical School is using emotional persuasion tactics to trigger children attending thousands of public schools to become angry about social justice causes and aid them in developing an "intersectional identity," parents worry. Fox News Digital reviewed the tightly guarded curriculum, created by the Center for Emotional Intelligence at the medical school's Child Study Center. Yale's clients are forbidden from sharing its contents with anyone who is not employed by the district, according to the contract it has signed with partners. The lessons probed deeply, and oftentimes intrusively, into students' emotions, personal relationships, traumas, beliefs and triggers. "Conversations around triggers and Meta-Moments are an excellent way to discuss power and privilege in who, in our society, is required to regulate more strictly in public spaces. Consider examining stereotypes in the context of emotional regulation as they relate to race, gender, sexuality, religion, and other forms of difference," the curriculum said.