Goto

Collaborating Authors

 Anguilla


Optimal swimming with body compliance in an overdamped medium

Lin, Jianfeng, Wang, Tianyu, Chong, Baxi, Fernandez, Matthew, Xu, Zhaochen, Goldman, Daniel I.

arXiv.org Artificial Intelligence

Elongate animals and robots use undulatory body waves to locomote through diverse environments. Geometric mechanics provides a framework to model and optimize such systems in highly damped environments, connecting a prescribed shape change pattern (gait) with locomotion displacement. However, the practical applicability of controlling compliant physical robots remains to be demonstrated. In this work, we develop a framework based on geometric mechanics to predict locomotor performance and search for optimal swimming strategies of compliant swimmers. We introduce a compliant extension of Purcell's three-link swimmer by incorporating series-connected springs at the joints. Body dynamics are derived using resistive force theory. Geometric mechanics is incorporated into movement prediction and into an optimization framework that identifies strategies for controlling compliant swimmers to achieve maximal displacement. We validate our framework on a physical cable-driven three-link limbless robot and demonstrate accurate prediction and optimization of locomotor performance under varied programmed, state-dependent compliance in a granular medium. Our results establish a systematic, physics-based approach for modeling and controlling compliant swimming locomotion, highlighting compliance as a design feature that can be exploited for robust movement in both homogeneous and heterogeneous environments.


Evaluating Large Language Models for IUCN Red List Species Information

Uryu, Shinya

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are rapidly being adopted in conservation to address the biodiversity crisis, yet their reliability for species evaluation is uncertain. This study systematically validates five leading models on 21,955 species across four core IUCN Red List assessment components: taxonomy, conservation status, distribution, and threats. A critical paradox was revealed: models excelled at taxonomic classification (94.9%) but consistently failed at conservation reasoning (27.2% for status assessment). This knowledge-reasoning gap, evident across all models, suggests inherent architectural constraints, not just data limitations. Furthermore, models exhibited systematic biases favoring charismatic vertebrates, potentially amplifying existing conservation inequities. These findings delineate clear boundaries for responsible LLM deployment: they are powerful tools for information retrieval but require human oversight for judgment-based decisions. A hybrid approach is recommended, where LLMs augment expert capacity while human experts retain sole authority over risk assessment and policy.


Explainable Fraud Detection with GNNExplainer and Shapley Values

Dao, Ngoc Hieu

arXiv.org Artificial Intelligence

The risk of financial fraud is increasing as digital payments are used more and more frequently. Although the use of artificial intelligence systems for fraud detection is widespread, society and regulators have raised the standards for these systems' transparency for reliability verification purposes. To increase their effectiveness in conducting fraud investigations, fraud analysts also profit from having concise and understandable explanations. To solve these challenges, the paper will concentrate on developing an explainable fraud detector.


WikiVideo: Article Generation from Multiple Videos

Martin, Alexander, Kriz, Reno, Walden, William Gantt, Sanders, Kate, Recknor, Hannah, Yang, Eugene, Ferraro, Francis, Van Durme, Benjamin

arXiv.org Artificial Intelligence

We present the challenging task of automatically creating a high-level Wikipedia-style article that aggregates information from multiple diverse videos about real-world events, such as natural disasters or political elections. Videos are intuitive sources for retrieval-augmented generation (RAG), but most contemporary RAG workflows focus heavily on text and existing methods for video-based summarization focus on low-level scene understanding rather than high-level event semantics. To close this gap, we introduce WikiVideo, a benchmark consisting of expert-written articles and densely annotated videos that provide evidence for articles' claims, facilitating the integration of video into RAG pipelines and enabling the creation of in-depth content that is grounded in multimodal sources. We further propose Collaborative Article Generation (CAG), a novel interactive method for article creation from multiple videos. CAG leverages an iterative interaction between an r1-style reasoning model and a VideoLLM to draw higher level inferences about the target event than is possible with VideoLLMs alone, which fixate on low-level visual features. We benchmark state-of-the-art VideoLLMs and CAG in both oracle retrieval and RAG settings and find that CAG consistently outperforms alternative methods, while suggesting intriguing avenues for future work.


Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding

Cho, Sukmin, Choi, Sangjin, Hwang, Taeho, Seo, Jeongyeon, Jeong, Soyeong, Lee, Huije, Song, Hoyun, Park, Jong C., Kwon, Youngjin

arXiv.org Artificial Intelligence

Accelerating inference in Large Language Models (LLMs) is critical for real-time interactions, as they have been widely incorporated into real-world services. Speculative decoding, a fully algorithmic solution, has gained attention for improving inference speed by drafting and verifying tokens, thereby generating multiple tokens in a single forward pass. However, current drafting strategies usually require significant fine-tuning or have inconsistent performance across tasks. To address these challenges, we propose Hierarchy Drafting (HD), a novel lossless drafting approach that organizes various token sources into multiple databases in a hierarchical framework based on temporal locality. In the drafting step, HD sequentially accesses multiple databases to obtain draft tokens from the highest to the lowest locality, ensuring consistent acceleration across diverse tasks and minimizing drafting latency. Our experiments on Spec-Bench using LLMs with 7B and 13B parameters demonstrate that HD outperforms existing database drafting methods, achieving robust inference speedups across model sizes, tasks, and temperatures.


MINTQA: A Multi-Hop Question Answering Benchmark for Evaluating LLMs on New and Tail Knowledge

He, Jie, Hu, Nan, Long, Wanqiu, Chen, Jiaoyan, Pan, Jeff Z.

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated impressive capabilities in various reasoning tasks but face significant challenges with complex, knowledge-intensive multi-hop queries, particularly those involving new or long-tail knowledge. Existing benchmarks often fail to fully address these challenges. To bridge this gap, we introduce MINTQA (Multi-hop Question Answering on New and Tail Knowledge), a comprehensive benchmark to evaluate LLMs' capabilities in multi-hop reasoning across four critical dimensions: question handling strategy, sub-question generation, retrieval-augmented generation, and iterative or dynamic decomposition and retrieval. MINTQA comprises 10,479 question-answer pairs for evaluating new knowledge and 17,887 pairs for assessing long-tail knowledge, with each question equipped with corresponding sub-questions and answers. Our systematic evaluation of 22 state-of-the-art LLMs on MINTQA reveals significant limitations in their ability to handle complex knowledge base queries, particularly in handling new or unpopular knowledge. Our findings highlight critical challenges and offer insights for advancing multi-hop reasoning capabilities. The MINTQA benchmark is available at https://github.com/probe2/multi-hop/.


WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines

Winata, Genta Indra, Hudi, Frederikus, Irawan, Patrick Amadeus, Anugraha, David, Putri, Rifki Afina, Wang, Yutong, Nohejl, Adam, Prathama, Ubaidillah Ariq, Ousidhoum, Nedjma, Amriani, Afifa, Rzayev, Anar, Das, Anirban, Pramodya, Ashmari, Adila, Aulia, Wilie, Bryan, Mawalim, Candy Olivia, Cheng, Ching Lam, Abolade, Daud, Chersoni, Emmanuele, Santus, Enrico, Ikhwantri, Fariz, Kuwanto, Garry, Zhao, Hanyang, Wibowo, Haryo Akbarianto, Lovenia, Holy, Cruz, Jan Christian Blaise, Putra, Jan Wira Gotama, Myung, Junho, Susanto, Lucky, Machin, Maria Angelica Riera, Zhukova, Marina, Anugraha, Michael, Adilazuarda, Muhammad Farid, Santosa, Natasha, Limkonchotiwat, Peerat, Dabre, Raj, Audino, Rio Alexander, Cahyawijaya, Samuel, Zhang, Shi-Xiong, Salim, Stephanie Yulia, Zhou, Yi, Gui, Yinxuan, Adelani, David Ifeoluwa, Lee, En-Shiun Annie, Okada, Shogo, Purwarianti, Ayu, Aji, Alham Fikri, Watanabe, Taro, Wijaya, Derry Tanti, Oh, Alice, Ngo, Chong-Wah

arXiv.org Artificial Intelligence

Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding. This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the largest multicultural VQA benchmark to date. It includes tasks for identifying dish names and their origins. We provide evaluation datasets in two sizes (12k and 60k instances) alongside a training dataset (1 million instances). Our findings show that while VLMs perform better with correct location context, they struggle with adversarial contexts and predicting specific regional cuisines and languages. To support future research, we release a knowledge base with annotated food entries and images along with the VQA data.


Grounding Natural Language to SQL Translation with Data-Based Self-Explanations

Fan, Yuankai, Ren, Tonghui, Huang, Can, He, Zhenying, Wang, X. Sean

arXiv.org Artificial Intelligence

Natural Language Interfaces for Databases empower non-technical users to interact with data using natural language (NL). Advanced approaches, utilizing either neural sequence-to-sequence or more recent sophisticated large-scale language models, typically implement NL to SQL (NL2SQL) translation in an end-to-end fashion. However, like humans, these end-to-end translation models may not always generate the best SQL output on their first try. In this paper, we propose CycleSQL, an iterative framework designed for end-to-end translation models to autonomously generate the best output through self-evaluation. The main idea of CycleSQL is to introduce data-grounded NL explanations of query results as self-provided feedback, and use the feedback to validate the correctness of the translation iteratively, hence improving the overall translation accuracy. Extensive experiments, including quantitative and qualitative evaluations, are conducted to study CycleSQL by applying it to seven existing translation models on five widely used benchmarks. The results show that 1) the feedback loop introduced in CycleSQL can consistently improve the performance of existing models, and in particular, by applying CycleSQL to RESDSQL, obtains a translation accuracy of 82.0% (+2.6%) on the validation set, and 81.6% (+3.2%) on the test set of Spider benchmark; 2) the generated NL explanations can also provide insightful information for users, aiding in the comprehension of translation results and consequently enhancing the interpretability of NL2SQL translation.


VLEIBot: A New 45-mg Swimming Microrobot Driven by a Bioinspired Anguilliform Propulsor

Blankenship, Elijah K., Trygstad, Conor K., Gonçalves, Francisco M. F. R., Pérez-Arancibia, Néstor O.

arXiv.org Artificial Intelligence

This paper presents the VLEIBot^* (Very Little Eel-Inspired roBot), a 45-mg/23-mm^3 microrobotic swimmer that is propelled by a bioinspired anguilliform propulsor. The propulsor is excited by a single 6-mg high-work-density (HWD) microactuator and undulates periodically due to wave propagation phenomena generated by fluid-structure interaction (FSI) during swimming. The microactuator is composed of a carbon-fiber beam, which functions as a leaf spring, and shape-memory alloy (SMA) wires, which deform cyclically when excited periodically using Joule heating. The VLEIBot can swim at speeds as high as 15.1mm * s^{-1} (0.33 Bl * s^{-1}}) when driven with a heuristically-optimized propulsor. To improve maneuverability, we evolved the VLEIBot design into the 90-mg/47-mm^3 VLEIBot^+, which is driven by two propulsors and fully controllable in the two-dimensional (2D) space. The VLEIBot^+ can swim at speeds as high as 16.1mm * s^{-1} (0.35 Bl * s^{-1}), when driven with heuristically-optimized propulsors, and achieves turning rates as high as 0.28 rad * s^{-1}, when tracking path references. The measured root-mean-square (RMS) values of the tracking errors are as low as 4 mm.


Morphing median fin enhances untethered bionic robotic tuna's linear acceleration and turning maneuverability

Huang, Hongbin, Lin, Zhonglu, Zheng, Wei, Zhang, Jinhu, Liu, Zhibin, Zhou, Wei, Zhang, Yu

arXiv.org Artificial Intelligence

Median fins of fish-like swimmers play a crucial role in linear acceleration and maneuvering processes. However, few research focused on untethered robotic fish experiments. Imitating the behaviour of real tuna, we developed a free-swimming bionic tuna with a foldable dorsal fin. The erection of dorsal fin, at proper conditions, can reduce head heave by 50%, enhance linear acceleration by 15.7%, increase turning angular velocity by 32.78%, and turning radius decreasing by 33.13%. Conversely, erecting the dorsal fin increases the wetted surface area, resulting in decreased maximum speed and efficiency during steady swimming phase. This finding partially explains why tuna erect their median fins during maneuvers or acceleration and fold them afterward to reduce drag. In addition, we verified that folding the median fins after acceleration does not significantly affect locomotion efficiency. This study supports the application of morphing median fins in undulating underwater robots and helps to further understand the impact of median fins on fish locomotion.

  Country:
  Genre: Research Report (1.00)
  Industry: Energy (0.68)