 ccd


Description of Corner Cases in Automated Driving: Goals and Challenges

Bogdoll, Daniel, Breitenstein, Jasmin, Heidecker, Florian, Bieshaar, Maarten, Sick, Bernhard, Fingscheidt, Tim, Zöllner, J. Marius

arXiv.org Artificial Intelligence

Scaling the distribution of automated vehicles requires handling various unexpected and possibly dangerous situations, termed corner cases (CC). Since many modules of automated driving systems are based on machine learning (ML), CC are an essential part of the data for their development. However, there is only a limited amount of CC data in large-scale data collections, which makes them challenging in the context of ML. A better understanding of CC can benefit both offline applications, e.g., dataset analysis, and online methods, e.g., those that improve the performance of automated driving systems. While there are knowledge-based descriptions and taxonomies for CC, there is little research on machine-interpretable descriptions. In this extended abstract, we will give a brief overview of the challenges and goals of such a description.




CCD: Mitigating Hallucinations in Radiology MLLMs via Clinical Contrastive Decoding

Zhang, Xi, Meng, Zaiqiao, Lever, Jake, Ho, Edmond S. L.

arXiv.org Artificial Intelligence

Multimodal large language models (MLLMs) have recently achieved remarkable progress in radiology by integrating visual perception with natural language understanding. However, they often generate clinically unsupported descriptions, known as medical hallucinations, which pose serious risks in medical applications that demand accuracy and image-grounded outputs. Through empirical analysis, we find that prompt-induced hallucinations remain prevalent in radiology MLLMs, largely due to over-sensitivity to clinical sections. To address this, we introduce Clinical Contrastive Decoding (CCD), a training-free and retrieval-free inference framework that integrates structured clinical signals from task-specific radiology expert models. CCD introduces a dual-stage contrastive mechanism to refine token-level logits during generation, thereby enhancing clinical fidelity without modifying the base MLLM. Experiments on three datasets and multiple models demonstrate that CCD consistently improves overall performance on radiology report generation (RRG). On the MIMIC-CXR dataset, it yields up to a 17% improvement in RadGraph-F1 when applied to state-of-the-art RRG models. Our approach provides a lightweight and generalisable solution for mitigating medical hallucinations, effectively bridging expert models and MLLMs in radiology.
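As a rough illustration of what refining token-level logits by contrast can look like, the sketch below applies a generic contrastive adjustment to one decoding step. It is not the dual-stage CCD mechanism, whose details the abstract does not give; the function names, the `alpha` weight, and the toy logits are all assumptions made for illustration.

```python
import math


def contrastive_logits(guided, unguided, alpha=1.0):
    """Generic contrastive adjustment (an assumption, not the paper's formula).

    Amplifies whatever the guided pass adds over the unguided pass:
    (1 + alpha) * guided - alpha * unguided, per vocabulary entry.
    """
    return [(1.0 + alpha) * g - alpha * u for g, u in zip(guided, unguided)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy next-token logits over a 3-token vocabulary (hypothetical values).
guided = [2.0, 1.0, 0.5]    # pass conditioned on an extra clinical signal
unguided = [2.0, 0.0, 0.5]  # pass without that signal
adjusted = contrastive_logits(guided, unguided, alpha=1.0)
probs = softmax(adjusted)
```

In this toy case only the second token's logit differs between the two passes, so it is the only one the adjustment moves. The base model's weights are untouched, which matches the training-free setting the abstract describes.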


DiaCBT: A Long-Periodic Dialogue Corpus Guided by Cognitive Conceptualization Diagram for CBT-based Psychological Counseling

Zhou, Yougen, Zhou, Ningning, Chen, Qin, Zhou, Jie, Zhou, Aimin, He, Liang

arXiv.org Artificial Intelligence

Psychotherapy reaches only a small fraction of individuals suffering from mental disorders due to social stigma and the limited availability of therapists. Large language models (LLMs), when equipped with professional psychotherapeutic skills, offer a promising solution to expand access to mental health services. However, the lack of psychological conversation datasets presents significant challenges in developing effective psychotherapy-guided conversational agents. In this paper, we construct a long-periodic dialogue corpus for counseling based on cognitive behavioral therapy (CBT). Our curated dataset includes multiple sessions for each counseling and incorporates cognitive conceptualization diagrams (CCDs) to guide client simulation across diverse scenarios. To evaluate the utility of our dataset, we train an in-depth counseling model and present a comprehensive evaluation framework to benchmark it against established psychological criteria for CBT-based counseling. Results demonstrate that DiaCBT effectively enhances LLMs' ability to emulate psychologists with CBT expertise, underscoring its potential for training more professional counseling agents.


A Evaluation Information

Neural Information Processing Systems

To evaluate the effect that image corruptions have on face detection systems, we measure precision on the corrupted images while using the detections from the clean image as ground truth. This approach obviates the need for real ground truth bounding boxes, and it is also a principled measurement strategy for our main research question. Because we are primarily interested in isolating the change in a system under a corruption, which is exactly what this method measures, this metric is preferable to using real ground truth bounding boxes. To compute precision, we first observe the face detections on each clean image.
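A minimal sketch of this measurement is given below: detections on the clean image act as the reference set, and each detection on the corrupted image counts as a true positive if it overlaps an unmatched reference box. The excerpt does not state the matching rule, so the greedy one-to-one matching, the IoU threshold of 0.5, and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_vs_clean(clean_boxes, corrupted_boxes, iou_threshold=0.5):
    """Precision of corrupted-image detections against clean-image detections.

    The threshold and the greedy matching are assumptions, not from the source.
    Returns None when there are no corrupted-image detections, because
    precision is undefined in that case.
    """
    if not corrupted_boxes:
        return None
    unmatched = list(clean_boxes)  # each reference box may be matched once
    true_positives = 0
    for box in corrupted_boxes:
        best = max(unmatched, key=lambda ref: iou(box, ref), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            true_positives += 1
            unmatched.remove(best)
    return true_positives / len(corrupted_boxes)


# Hypothetical detections: one face survives the corruption, one is spurious.
clean = [(0, 0, 10, 10), (20, 20, 30, 30)]
corrupted = [(1, 1, 10, 10), (50, 50, 60, 60)]
score = precision_vs_clean(clean, corrupted)
```

Because the clean detections are the reference, a detector that was already wrong on the clean image is not penalized again; the score reflects only what the corruption changed.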


When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent

Gürbüzbalaban, Mert, Parrilo, Pablo A.

Neural Information Processing Systems

The coordinate descent (CD) method is a classical optimization algorithm that has seen a revival of interest because of its competitive performance in machine learning applications. A number of recent papers provided convergence rate estimates for its deterministic (cyclic) and randomized variants, which differ in how the update coordinates are selected. These estimates suggest randomized coordinate descent (RCD) performs better than cyclic coordinate descent (CCD), although numerical experiments do not provide clear justification for this comparison. In this paper, we provide examples and, more generally, problem classes for which CCD (or CD with any deterministic order) is faster than RCD in terms of asymptotic worst-case convergence. Furthermore, we provide lower and upper bounds on the amount of improvement in the rate of CCD relative to RCD, which depends on the deterministic order used. We also provide a characterization of the best deterministic order (that leads to the maximum improvement in convergence rate) in terms of the combinatorial properties of the Hessian matrix of the objective function.
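To make the two variants concrete, the sketch below runs exact coordinate minimization on a small quadratic f(x) = 0.5 xᵀAx − bᵀx, once with a fixed cyclic order and once with coordinates drawn uniformly at random. The matrix, the right-hand side, the epoch count, and all function names are assumptions chosen for illustration; the example is not taken from the paper and is not meant to reproduce its bounds.

```python
import random


def coordinate_step(A, b, x, i):
    """Exactly minimize f over coordinate i, holding the others fixed (in place)."""
    s = sum(A[i][j] * x[j] for j in range(len(x)) if j != i)
    x[i] = (b[i] - s) / A[i][i]


def objective(A, b, x):
    """f(x) = 0.5 * x^T A x - b^T x."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(n))


def run_cd(A, b, epochs, order="cyclic", seed=0):
    """Run epochs * n coordinate updates starting from x = 0.

    order="cyclic" sweeps coordinates 0..n-1 in a fixed order (CCD);
    any other value samples a coordinate uniformly at random (RCD).
    """
    n = len(b)
    rng = random.Random(seed)
    x = [0.0] * n
    for _ in range(epochs):
        for k in range(n):
            i = k if order == "cyclic" else rng.randrange(n)
            coordinate_step(A, b, x, i)
    return x


# Small symmetric positive definite system whose minimizer is (1, 1, 1).
A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
x_ccd = run_cd(A, b, epochs=50, order="cyclic")
x_rcd = run_cd(A, b, epochs=50, order="random")
```

Both runs use the same number of coordinate updates, so comparing `objective(A, b, x_ccd)` with `objective(A, b, x_rcd)` at smaller epoch counts gives a like-for-like view of the two selection rules on this one problem.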


InfiniPot: Infinite Context Processing on Memory-Constrained LLMs

Kim, Minsoo, Shim, Kyuhong, Choi, Jungwook, Chang, Simyung

arXiv.org Artificial Intelligence

Handling long input contexts remains a significant challenge for Large Language Models (LLMs), particularly in resource-constrained environments such as mobile devices. Our work aims to address this limitation by introducing InfiniPot, a novel KV cache control framework designed to enable pre-trained LLMs to manage extensive sequences within fixed memory constraints efficiently, without requiring additional training. InfiniPot leverages Continual Context Distillation (CCD), an iterative process that compresses and retains essential information through novel importance metrics, effectively maintaining critical data even without access to future context. Our comprehensive evaluations indicate that InfiniPot significantly outperforms models trained for long contexts in various NLP tasks, establishing its efficacy and versatility. This work represents a substantial advancement toward making LLMs applicable to a broader range of real-world scenarios.


Semantic Prototypes: Enhancing Transparency Without Black Boxes

Menis-Mastromichalakis, Orfeas, Filandrianos, Giorgos, Liartis, Jason, Dervakos, Edmund, Stamou, Giorgos

arXiv.org Artificial Intelligence

As machine learning (ML) models and datasets increase in complexity, the demand for methods that enhance explainability and interpretability becomes paramount. Prototypes, by encapsulating essential characteristics within data, offer insights that enable tactical decision-making and enhance transparency. Traditional prototype methods often rely on sub-symbolic raw data and opaque latent spaces, reducing explainability and increasing the risk of misinterpretations. This paper presents a novel framework that utilizes semantic descriptions to define prototypes and provide clear explanations, effectively addressing the shortcomings of conventional methods. Our approach leverages concept-based descriptions to cluster data on the semantic level, ensuring that prototypes not only represent underlying properties intuitively but are also straightforward to interpret. Our method simplifies the interpretative process and effectively bridges the gap between complex data structures and human cognitive processes, thereby enhancing transparency and fostering trust. Our approach outperforms existing widely-used prototype methods in facilitating human understanding and informativeness, as validated through a user survey.


Cooperative Cognitive Dynamic System in UAV Swarms: Reconfigurable Mechanism and Framework

Jia, Ziye, You, Jiahao, Dong, Chao, Wu, Qihui, Zhou, Fuhui, Niyato, Dusit, Han, Zhu

arXiv.org Artificial Intelligence

As the demand for immediate and effective responses increases in both civilian and military domains, unmanned aerial vehicle (UAV) swarms have emerged as an effective solution, in which multiple cooperative UAVs work together to achieve specific goals. However, how to manage such complex systems to ensure real-time adaptability has not been sufficiently studied. Hence, in this paper, we propose the cooperative cognitive dynamic system (CCDS) to optimize the management of UAV swarms. CCDS leverages a hierarchical and cooperative control structure that enables real-time data processing and decision-making. Accordingly, CCDS optimizes UAV swarm management via dynamic reconfigurability and adaptive intelligent optimization. In addition, CCDS can be integrated with a biomimetic mechanism to efficiently allocate tasks for UAV swarms. Further, the distributed coordination of CCDS ensures reliable and resilient control, thus enhancing adaptability and robustness. Finally, the potential challenges and future directions are analyzed to provide insights into managing UAV swarms in dynamic heterogeneous networking.