Goto

Collaborating Authors

 grit


NuBench: An Open Benchmark for Deep Learning-Based Event Reconstruction in Neutrino Telescopes

Orsoe, Rasmus F., Meighen-Berger, Stephan, Lazar, Jeffrey, Prado, Jorge, Mozun-Mateo, Ivan, Rosted, Aske, Weigel, Philip, Anaya, Arturo Llorente

arXiv.org Artificial Intelligence

Neutrino telescopes are large-scale detectors designed to observe Cherenkov radiation produced from neutrino interactions in water or ice. They exist to identify extraterrestrial neutrino sources and to probe fundamental questions pertaining to the elusive neutrino itself. A central challenge common across neutrino telescopes is to solve a series of inverse problems known as event reconstruction, which seeks to resolve properties of the incident neutrino, based on the detected Cherenkov light. In recent times, significant efforts have been made in adapting advances from deep learning research to event reconstruction, as such techniques provide several benefits over traditional methods. While a large degree of similarity in reconstruction needs and low-level data exists, cross-experimental collaboration has been hindered by a lack of diverse open-source datasets for comparing methods. We present NuBench, an open benchmark for deep learning-based event reconstruction in neutrino telescopes. NuBench comprises seven large-scale simulated datasets containing nearly 130 million charged- and neutral-current muon-neutrino interactions spanning 10 GeV to 100 TeV, generated across six detector geometries inspired by existing and proposed experiments. These datasets provide pulse- and event-level information suitable for developing and comparing machine-learning reconstruction methods in both water and ice environments. Using NuBench, we evaluate four reconstruction algorithms - ParticleNeT and DynEdge, both actively used within the KM3NeT and IceCube collaborations, respectively, along with GRIT and DeepIce - on up to five core tasks: energy and direction reconstruction, topology classification, interaction vertex prediction, and inelasticity estimation.


I built this 'AI aunt' for women after family tragedy in South Africa

BBC News

I built this'AI aunt' for women after family tragedy in South Africa A gruesome killing in her own family inspired South African Leonora Tima to create a digital platform where people, mostly women, can talk about and track abuse. Leonora's relative was just 19 years old, and nine months pregnant, when she was killed, her body dumped on the side of a highway near Cape Town in 2020. I work in the development sector, so I've seen violence, Leonora says. But what stood out for me was that my family member's violent death was seen as so normal in South African society. Her death wasn't published by any news outlet because the sheer volume of these cases in our country is such that it doesn't qualify as news.


GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks

Tai, Yen-Ling, Yang, Yi-Ru, Yu, Kuan-Ting, Chao, Yu-Wei, Chen, Yi-Ting

arXiv.org Artificial Intelligence

Robotic food scooping is a critical manipulation skill for food preparation and service robots. However, existing robot learning algorithms, especially learn-from-demonstration methods, still struggle to handle diverse and dynamic food states, which often results in spillage and reduced reliability. In this work, we introduce GRITS: A Spillage-Aware Guided Diffusion Policy for Robot Food Scooping Tasks. This framework leverages guided diffusion policy to minimize food spillage during scooping and to ensure reliable transfer of food items from the initial to the target location. Specifically, we design a spillage predictor that estimates the probability of spillage given current observation and action rollout. The predictor is trained on a simulated dataset with food spillage scenarios, constructed from four primitive shapes (spheres, cubes, cones, and cylinders) with varied physical properties such as mass, friction, and particle size. At inference time, the predictor serves as a differentiable guidance signal, steering the diffusion sampling process toward safer trajectories while preserving task success. We validate GRITS on a real-world robotic food scooping platform. GRITS is trained on six food categories and evaluated on ten unseen categories with different shapes and quantities. GRITS achieves an 82% task success rate and a 4% spillage rate, reducing spillage by over 40% compared to baselines without guidance, thereby demonstrating its effectiveness.


GRIT: Graph-Regularized Logit Refinement for Zero-shot Cell Type Annotation

Hu, Tianxiang, Zhou, Chenyi, Liu, Jiaxiang, Wang, Jiongxin, Chen, Ruizhe, Xia, Haoxiang, Wang, Gaoang, Wu, Jian, Liu, Zuozhu

arXiv.org Artificial Intelligence

Cell type annotation is a fundamental step in the analysis of single-cell RNA sequencing (scRNA-seq) data. In practice, human experts often rely on the structure revealed by principal component analysis (PCA) followed by $k$-nearest neighbor ($k$-NN) graph construction to guide annotation. While effective, this process is labor-intensive and does not scale to large datasets. Recent advances in CLIP-style models offer a promising path toward automating cell type annotation. By aligning scRNA-seq profiles with natural language descriptions, models like LangCell enable zero-shot annotation. While LangCell demonstrates decent zero-shot performance, its predictions remain suboptimal, particularly in achieving consistent accuracy across all cell types. In this paper, we propose to refine the zero-shot logits produced by LangCell through a graph-regularized optimization framework. By enforcing local consistency over the task-specific PCA-based k-NN graph, our method combines the scalability of the pre-trained models with the structural robustness relied upon in expert annotation. We evaluate our approach on 14 annotated human scRNA-seq datasets from 4 distinct studies, spanning 11 organs and over 200,000 single cells. Our method consistently improves zero-shot annotation accuracy, achieving accuracy gains of up to 10%. Further analysis showcase the mechanism by which GRIT effectively propagates correct signals through the graph, pulling back mislabeled cells toward more accurate predictions. The method is training-free, model-agnostic, and serves as a simple yet effective plug-in for enhancing automated cell type annotation in practice.


GRIT: Graph Transformer For Internal Ice Layer Thickness Prediction

Liu, Zesheng, Rahnemoonfar, Maryam

arXiv.org Artificial Intelligence

Gaining a deeper understanding of the thickness and variability of internal ice layers in Radar imagery is essential in monitoring the snow accumulation, better evaluating ice dynamics processes, and minimizing uncertainties in climate models. Radar sensors, capable of penetrating ice, capture detailed radargram images of internal ice layers. In this work, we introduce GRIT, graph transformer for ice layer thickness. GRIT integrates an inductive geometric graph learning framework with an attention mechanism, designed to map the relationships between shallow and deeper ice layers. Compared to baseline graph neural networks, GRIT demonstrates consistently lower prediction errors. These results highlight the attention mechanism's effectiveness in capturing temporal changes across ice layers, while the graph transformer combines the strengths of transformers for learning long-range dependencies with graph neural networks for capturing spatial patterns, enabling robust modeling of complex spatiotemporal dynamics.


GRIT: Teaching MLLMs to Think with Images

Fan, Yue, He, Xuehai, Yang, Diji, Zheng, Kaizhi, Kuo, Ching-Chen, Zheng, Yuting, Narayanaraju, Sravana Jyothi, Guan, Xinze, Wang, Xin Eric

arXiv.org Artificial Intelligence

Recent studies have demonstrated the efficacy of using Reinforcement Learning (RL) in building reasoning models that articulate chains of thoughts prior to producing final answers. However, despite ongoing advances that aim at enabling reasoning for vision-language tasks, existing open-source visual reasoning models typically generate reasoning content with pure natural language, lacking explicit integration of visual information. This limits their ability to produce clearly articulated and visually grounded reasoning chains. To this end, we propose Grounded Reasoning with Images and Texts (GRIT), a novel method for training MLLMs to think with images. GRIT introduces a grounded reasoning paradigm, in which models generate reasoning chains that interleave natural language and explicit bounding box coordinates. These coordinates point to regions of the input image that the model consults during its reasoning process. Additionally, GRIT is equipped with a reinforcement learning approach, GRPO-GR, built upon the GRPO algorithm. GRPO-GR employs robust rewards focused on the final answer accuracy and format of the grounded reasoning output, which eliminates the need for data with reasoning chain annotations or explicit bounding box labels. As a result, GRIT achieves exceptional data efficiency, requiring as few as 20 image-question-answer triplets from existing datasets. Comprehensive evaluations demonstrate that GRIT effectively trains MLLMs to produce coherent and visually grounded reasoning chains, showing a successful unification of reasoning and grounding abilities.


EDEN: Empathetic Dialogues for English learning

Siyan, Li, Shao, Teresa, Yu, Zhou, Hirschberg, Julia

arXiv.org Artificial Intelligence

Dialogue systems have been used as conversation partners in English learning, but few have studied whether these systems improve learning outcomes. Student passion and perseverance, or grit, has been associated with language learning success. Recent work establishes that as students perceive their English teachers to be more supportive, their grit improves. Hypothesizing that the same pattern applies to English-teaching chatbots, we create EDEN, a robust open-domain chatbot for spoken conversation practice that provides empathetic feedback. To construct EDEN, we first train a specialized spoken utterance grammar correction model and a high-quality social chit-chat conversation model. We then conduct a preliminary user study with a variety of strategies for empathetic feedback. Our experiment suggests that using adaptive empathetic feedback leads to higher perceived affective support, which, in turn, predicts increased student grit.


A Dynamical View of the Question of Why

Fatemi, Mehdi, Gowda, Sindhu

arXiv.org Artificial Intelligence

We address causal reasoning in multivariate time series data generated by stochastic processes. Existing approaches are largely restricted to static settings, ignoring the continuity and emission of variations across time. In contrast, we propose a learning paradigm that directly establishes causation between events in the course of time. We present two key lemmas to compute causal contributions and frame them as reinforcement learning problems. Our approach offers formal and computational tools for uncovering and quantifying causal relationships in diffusion processes, subsuming various important settings such as discrete-time Markov decision processes. Finally, in fairly intricate experiments and through sheer learning, our framework reveals and quantifies causal links, which otherwise seem inexplicable.


TechScape: AI's dark arts come into their own

The Guardian

Programming a computer is, if you squint, a bit like magic. You have to learn the words to the spell to convince a carefully crafted lump of sand to do what you want. If you understand the rules deeply enough, you can chain together the spells to force the sand to do ever more complicated tasks. If your spell is long and well-crafted enough, you can even give the sand the illusion of sentience. That illusion of sentience is nowhere more strong than in the world of machine learning, where text generation engines like GPT-3 and LaMDA are able to hold convincing conversations, answer detailed questions, and perform moderately complex tasks based on just a written request.


GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features

Nguyen, Van-Quang, Suganuma, Masanori, Okatani, Takayuki

arXiv.org Artificial Intelligence

Current state-of-the-art methods for image captioning employ region-based features, as they provide object-level information that is essential to describe the content of images; they are usually extracted by an object detector such as Faster R-CNN. However, they have several issues, such as lack of contextual information, the risk of inaccurate detection, and the high computational cost. The first two could be resolved by additionally using grid-based features. However, how to extract and fuse these two types of features is uncharted. This paper proposes a Transformer-only neural architecture, dubbed GRIT (Grid- and Region-based Image captioning Transformer), that effectively utilizes the two visual features to generate better captions. GRIT replaces the CNN-based detector employed in previous methods with a DETR-based one, making it computationally faster. Moreover, its monolithic design consisting only of Transformers enables end-to-end training of the model. This innovative design and the integration of the dual visual features bring about significant performance improvement. The experimental results on several image captioning benchmarks show that GRIT outperforms previous methods in inference accuracy and speed.