Learning Egocentric In-Hand Object Segmentation through Weak Supervision from Human Narrations

Messina, Nicola, Leonardi, Rosario, Ciampi, Luca, Carrara, Fabio, Farinella, Giovanni Maria, Falchi, Fabrizio, Furnari, Antonino

arXiv.org Artificial Intelligence

Pixel-level recognition of objects manipulated by the user from egocentric images enables key applications spanning assistive technologies, industrial safety, and activity monitoring. However, progress in this area is currently hindered by the scarcity of annotated datasets, as existing approaches rely on costly manual labels. In this paper, we propose to learn human-object interaction detection leveraging narrations, natural language descriptions of the actions performed by the camera wearer that contain clues about manipulated objects. We introduce Narration-Supervised in-Hand Object Segmentation (NS-iHOS), a novel task where models have to learn to segment in-hand objects by learning from natural-language narrations in a weakly-supervised regime. Narrations are not employed at inference time. We showcase the potential of the task by proposing Weakly-Supervised In-hand Object Segmentation from Human Narrations (WISH), an end-to-end model distilling knowledge from narrations to learn plausible hand-object associations and enable in-hand object segmentation without using narrations at test time. We benchmark WISH against different baselines based on open-vocabulary object detectors and vision-language models. Experiments on EPIC-Kitchens and Ego4D show that WISH surpasses all baselines, recovering more than 50% of the performance of fully supervised methods, without employing fine-grained pixel-wise annotations. Code and data can be found at https://fpv-iplab.github.io/WISH.
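The core idea of weak supervision from narrations can be illustrated with a minimal sketch: mining candidate manipulated-object labels from Ego4D-style narration strings such as "#C C opens the fridge". The verb list and parsing heuristic below are illustrative assumptions, not the actual pipeline used by WISH.

```python
import re

# Hypothetical verb list and stopwords; WISH's real supervision signal
# is learned end-to-end, not rule-based like this sketch.
MANIPULATION_VERBS = {"opens", "takes", "picks", "cuts", "holds", "puts", "closes"}
STOPWORDS = {"the", "a", "an", "his", "her", "up", "down"}

def weak_object_labels(narration: str) -> list[str]:
    """Return candidate manipulated-object nouns mentioned after an action verb."""
    # Drop an Ego4D-style camera-wearer prefix ("#C C") if present.
    text = re.sub(r"^#\w\s+\w\s+", "", narration.strip().lower())
    tokens = re.findall(r"[a-z]+", text)
    for i, tok in enumerate(tokens):
        if tok in MANIPULATION_VERBS:
            # Treat the remaining non-stopword tokens as object candidates.
            return [t for t in tokens[i + 1:] if t not in STOPWORDS]
    return []
```

Such noisy labels would pair a frame's narration with hand regions during training only; at test time the model segments in-hand objects from pixels alone.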




Initially, based on the Up-Down model, we attempted to implement Constant Prophet Attention

Neural Information Processing Systems

We thank all the reviewers for their helpful comments. We will revise the paper to address your concerns. R1-Q1: The implementation is straightforward; regarding the ablation analysis on the loss function, we kept using the L1 norm in the rest of the experiments. We will conduct a systematic comparison of various loss functions in the next revision.


LINK-KG: LLM-Driven Coreference-Resolved Knowledge Graphs for Human Smuggling Networks

Meher, Dipak, Domeniconi, Carlotta, Correa-Cabrera, Guadalupe

arXiv.org Artificial Intelligence

Human smuggling networks are complex and constantly evolving, making them difficult to analyze comprehensively. Legal case documents offer rich factual and procedural insights into these networks but are often long, unstructured, and filled with ambiguous or shifting references, posing significant challenges for automated knowledge graph (KG) construction. Existing methods either overlook coreference resolution or fail to scale beyond short text spans, leading to fragmented graphs and inconsistent entity linking. We propose LINK-KG, a modular framework that integrates a three-stage, LLM-guided coreference resolution pipeline with downstream KG extraction. At the core of our approach is a type-specific Prompt Cache, which consistently tracks and resolves references across document chunks, enabling clean and disambiguated narratives for structured knowledge graph construction from both short and long legal texts. LINK-KG reduces average node duplication by 45.21% and noisy nodes by 32.22% compared to baseline methods, resulting in cleaner and more coherent graph structures.

Human smuggling networks represent highly adaptive and organized systems involving a web of actors, routes, vehicles, and intermediaries, often operating under the radar of restrictive immigration policies [1]. These networks exploit legal loopholes, adjust swiftly to enforcement changes, and frequently intersect with transnational criminal organizations. Effectively analyzing their structure and behavior is critical for informing policy, enhancing security, and preventing exploitation. However, much of the actionable insight remains embedded in lengthy, unstructured legal documents, such as court rulings, field reports, and case transcripts, making automated analysis both essential and challenging.
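The idea of a type-specific prompt cache can be sketched as a per-entity-type alias table that persists across document chunks and is serialized into each chunk's prompt. The class and method names below are illustrative, not LINK-KG's actual API.

```python
from collections import defaultdict

class PromptCache:
    """Hypothetical sketch: track canonical entity names per type across chunks."""

    def __init__(self):
        # entity type -> {lowercased alias -> canonical surface form}
        self._cache: dict[str, dict[str, str]] = defaultdict(dict)

    def resolve(self, entity_type: str, mention: str) -> str:
        """Return the canonical name for a mention, registering it if unseen."""
        aliases = self._cache[entity_type]
        key = mention.strip().lower()
        if key not in aliases:
            aliases[key] = mention.strip()  # first surface form becomes canonical
        return aliases[key]

    def add_alias(self, entity_type: str, alias: str, canonical: str) -> None:
        """Record that an alias (e.g. 'the defendant') refers to a known entity."""
        self._cache[entity_type][alias.strip().lower()] = canonical

    def context_for_prompt(self, entity_type: str) -> str:
        """Serialize known aliases of a type for the next chunk's LLM prompt."""
        aliases = self._cache[entity_type]
        return "; ".join(f"{a} -> {c}" for a, c in aliases.items())
```

Keeping the cache keyed by entity type prevents, for example, a vehicle alias from colliding with a person alias, which is one plausible way chunk-level resolutions stay consistent over long legal texts.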


BioCoref: Benchmarking Biomedical Coreference Resolution with LLMs

Salem, Nourah M, White, Elizabeth, Bada, Michael, Hunter, Lawrence

arXiv.org Artificial Intelligence

Coreference resolution in biomedical texts presents unique challenges due to complex domain-specific terminology, high ambiguity in mention forms, and long-distance dependencies between coreferring expressions. In this work, we present a comprehensive evaluation of generative large language models (LLMs) for coreference resolution in the biomedical domain. Using the CRAFT corpus as our benchmark, we assess the LLMs' performance with four prompting experiments that vary in their use of local context, contextual enrichment, and domain-specific cues such as abbreviations and entity dictionaries. We benchmark these approaches against a discriminative span-based encoder, SpanBERT, to compare the efficacy of generative versus discriminative methods. Our results demonstrate that while LLMs exhibit strong surface-level coreference capabilities, especially when supplemented with domain-grounding prompts, their performance remains sensitive to long-range context and mention ambiguity. Notably, the LLaMA 8B and 17B models show superior precision and F1 scores under entity-augmented prompting, highlighting the potential of lightweight prompt engineering for enhancing LLM utility in biomedical NLP tasks.
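Entity-augmented prompting of the kind described here can be sketched as prepending a small abbreviation/entity glossary to the coreference query. The prompt wording below is an assumption for illustration, not the exact template evaluated in the paper.

```python
def build_coref_prompt(passage: str, mention: str, entity_dict: dict[str, str]) -> str:
    """Hypothetical entity-augmented prompt: glossary + passage + coreference question."""
    glossary = "\n".join(f"- {abbr}: {full}" for abbr, full in sorted(entity_dict.items()))
    return (
        "Domain glossary:\n"
        f"{glossary}\n\n"
        f"Passage:\n{passage}\n\n"
        f"Question: Which earlier expression does '{mention}' corefer with? "
        "Answer with the exact span from the passage."
    )
```

The glossary grounds ambiguous biomedical mentions (e.g. an abbreviation and its expansion) so the model can link surface forms it would otherwise treat as distinct entities.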


Comparing human and LLM proofreading in L2 writing: Impact on lexical and syntactic features

Sung, Hakyung, Csuros, Karla, Sung, Min-Chang

arXiv.org Artificial Intelligence

This study examines the lexical and syntactic interventions of human and LLM proofreading aimed at improving overall intelligibility in identical second language writings, and evaluates the consistency of outcomes across three LLMs (ChatGPT-4o, Llama3.1-8b, Deepseek-r1-8b). Findings show that both human and LLM proofreading enhance bigram lexical features, which may contribute to better coherence and contextual connectedness between adjacent words. However, LLM proofreading exhibits a more generative approach, extensively reworking vocabulary and sentence structures, such as employing more diverse and sophisticated vocabulary and incorporating a greater number of adjective modifiers in noun phrases. The proofreading outcomes are highly consistent in major lexical and syntactic features across the three models.
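One simple member of the family of bigram lexical features mentioned above is a bigram type-token ratio over adjacent word pairs; the sketch below uses it purely as an illustration, since the study's actual feature set (e.g. corpus-derived bigram association strength) is richer than this.

```python
def bigram_ttr(text: str) -> float:
    """Illustrative bigram diversity measure: unique adjacent-word bigrams / total bigrams."""
    tokens = text.lower().split()
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return 0.0  # fewer than two tokens: no bigrams to measure
    return len(set(bigrams)) / len(bigrams)
```

Comparing such a score on a learner text before and after human versus LLM proofreading gives a rough, automatable view of how much each intervention reshapes word-to-word combinations.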

