Plotting

 Haehn, Daniel


Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts

arXiv.org Artificial Intelligence

Long-context large language models (LLMs) are prone to distraction by irrelevant contexts, and the reasons for this distraction remain poorly understood. In this paper, we first identify contextual heads, a special group of attention heads that control the overall attention of the LLM. We then demonstrate that distraction arises when contextual heads fail to allocate sufficient attention to relevant contexts, and that it can be mitigated by increasing attention to these contexts. We further identify focus directions, located at the key and query activations of these heads, which enable them to allocate more attention to relevant contexts without explicitly specifying which context is relevant. We comprehensively evaluate the effect of focus directions on various long-context tasks and find that they help mitigate the poor task alignment of long-context LLMs. We believe our findings can promote further research on long-context LLM alignment.
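
A toy sketch of the steering idea may help: add a direction vector to one head's query activation and observe attention shifting toward chosen tokens. This is not the paper's method of obtaining focus directions; the head dimension, steering strength, and the choice of direction (here simply the mean of the "relevant" keys) are all illustrative assumptions.

```python
# Toy illustration (NOT the paper's exact method): steering one
# attention head by adding a "focus direction" to its query activation.
import numpy as np

rng = np.random.default_rng(0)
d = 64                      # head dimension (assumed)
n = 10                      # number of context tokens
relevant = [2, 3]           # indices of the "relevant" context (assumed)

K = rng.normal(size=(n, d))          # key activations for one head
q = rng.normal(size=d)               # query activation for one head

def attention(q, K):
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    return w / w.sum()

# Hypothetical focus direction: the mean of the relevant keys. The
# paper identifies such directions rather than computing them this way.
v = K[relevant].mean(axis=0)
alpha = 2.0                          # steering strength (assumed)

before = attention(q, K)
after = attention(q + alpha * v, K)  # add the direction to the query

print("attention on relevant tokens before:", before[relevant].sum())
print("attention on relevant tokens after: ", after[relevant].sum())
```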


Generalization of CNNs on Relational Reasoning with Bar Charts

arXiv.org Artificial Intelligence

This paper presents a systematic study of the generalization of convolutional neural networks (CNNs) and humans on relational reasoning tasks with bar charts. We first revisit previous experiments on graphical perception and update the benchmark performance of CNNs. We then test the generalization performance of CNNs on a classic relational reasoning task, estimating bar length ratios in a bar chart, by progressively perturbing the standard visualizations. We further conduct a user study to compare the performance of CNNs and humans. Our results show that CNNs outperform humans only when the training and test data have the same visual encodings; otherwise, they may perform worse. We also find that CNNs are sensitive to perturbations in various visual encodings, regardless of their relevance to the target bars, whereas humans are mainly influenced by bar lengths. Our study suggests that robust relational reasoning with visualizations is challenging for CNNs, and that improving their generalization performance may require training them to better recognize task-related visual properties.

Deep neural networks, especially convolutional neural networks (CNNs), are increasingly being adopted in the visualization community for many tasks, such as visual question answering [33], [34], automatic visualization design [3], and chart captioning [35], [44]. Despite their widespread use, the crucial question of how well these models generalize to previously unseen visualizations remains underexplored. Understanding and enhancing this generalization ability is crucial for the real-world deployment of CNNs. Graphical perception [5] refers to the human ability to decode visually encoded quantities in visualizations. It plays a foundational role in understanding the relations between visual elements, such as the bar length ratios in bar charts.

Zhenxing Cui is with the School of Computer Science and Technology, Shandong University, China. Lu Chen is with the State Key Lab of CAD&CG, Zhejiang University, China. Yunhai Wang is with the School of Information, Renmin University of China, China. Daniel Haehn is with the College of Science and Mathematics, University of Massachusetts Boston, USA.
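
To make the ratio-estimation task concrete, here is a minimal, assumption-laden sketch of a Cleveland-McGill-style stimulus generator: two bars rasterized into a small grayscale image, labeled with the ratio of the shorter to the longer bar. Image size, bar width, and bar positions are placeholders, not the paper's rendering parameters.

```python
# Rasterize a two-bar stimulus and compute its ratio label.
import numpy as np

def make_stimulus(size=100, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    img = np.zeros((size, size), dtype=np.float32)
    h1, h2 = rng.integers(10, size - 10, size=2)   # bar heights in pixels
    for x0, h in ((25, h1), (65, h2)):             # fixed bar positions
        img[size - h:, x0:x0 + 10] = 1.0           # draw a filled bar
    label = min(h1, h2) / max(h1, h2)              # ratio in (0, 1]
    return img, label

img, ratio = make_stimulus()
print(img.shape, f"ratio={ratio:.3f}")
```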


Web-based Melanoma Detection

arXiv.org Artificial Intelligence

Melanoma is the most aggressive form of skin cancer, and early detection can significantly increase survival rates and prevent the cancer from spreading. However, developing reliable automated detection techniques is difficult due to the lack of standardized datasets and evaluation methods. This study introduces a unified melanoma classification approach that supports 54 combinations of 11 datasets and 24 state-of-the-art deep learning architectures. It enables a fair comparison across 1,296 experiments and yields Mela-D, a lightweight model based on the web-deployable MeshNet architecture. Mela-D runs up to 33x faster and reduces parameters by 24x while achieving 88.8% accuracy, comparable to ResNet50, on previously unseen images. This enables efficient and accurate melanoma detection in real-world settings on consumer-level hardware.
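
The benchmarking protocol amounts to a grid over datasets and architectures evaluated under one fixed procedure. A hedged sketch of that grid follows; the dataset and model names are stand-ins, and evaluate() is a placeholder, not the study's training code.

```python
# Exhaustive (dataset, model) benchmark grid under a shared protocol.
from itertools import product

datasets = ["HAM10000", "ISIC2019"]   # stand-ins for the 11 datasets
models = ["ResNet50", "MeshNet"]      # stand-ins for the 24 architectures

def evaluate(dataset: str, model: str) -> float:
    # Placeholder for train-then-test under a fixed protocol.
    return 0.0

results = {(d, m): evaluate(d, m) for d, m in product(datasets, models)}
for (d, m), acc in results.items():
    print(f"{m} on {d}: accuracy={acc:.3f}")
```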


MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision

arXiv.org Artificial Intelligence

Prior to the deep learning era, shape was commonly used to describe objects. Nowadays, state-of-the-art (SOTA) algorithms in medical imaging predominantly diverge from computer vision, where voxel grids, meshes, point clouds, and implicit surface models are used. This is evident from the numerous shape-related publications at premier vision conferences as well as the growing popularity of ShapeNet (about 51,300 models) and Princeton ModelNet (127,915 models). For the medical domain, we present MedShapeNet, a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D models of surgical instruments, created to facilitate the translation of data-driven vision algorithms to medical applications and to adapt SOTA vision algorithms to medical problems. As a unique feature, we directly model the majority of shapes on the imaging data of real patients. As of today, MedShapeNet includes 23 datasets with more than 100,000 shapes paired with annotations (ground truth). Our data is freely accessible via a web interface and a Python application programming interface (API) and can be used for discriminative, reconstructive, and variational benchmarks as well as various applications in virtual, augmented, or mixed reality and 3D printing. As examples, we present use cases in the classification of brain tumors, facial and skull reconstruction, multi-class anatomy completion, education, and 3D printing. In the future, we will extend the data and improve the interfaces. The project pages are: https://medshapenet.ikim.nrw/ and https://github.com/Jianningli/medshapenet-feedback
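
As a minimal sketch of working with such a shape once downloaded, assuming it is an STL mesh on disk (the filename is hypothetical, and this does not use the MedShapeNet API itself), the `trimesh` package can load it and convert it to a point cloud for point-based vision models.

```python
# Load a mesh and sample a point cloud from its surface.
import trimesh

mesh = trimesh.load("liver_patient_001.stl")    # hypothetical file
print("vertices:", mesh.vertices.shape)         # (V, 3) array
print("faces:", mesh.faces.shape)               # (F, 3) array
print("bounding box extents:", mesh.extents)

points = mesh.sample(2048)                      # 2048 surface samples
print("point cloud:", points.shape)
```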


Lesion Search with Self-supervised Learning

arXiv.org Artificial Intelligence

Content-based image retrieval (CBIR) with self-supervised learning (SSL) accelerates clinicians' interpretation of similar images without manual annotations. We develop a CBIR system based on the contrastive learning framework SimCLR and incorporate generalized-mean (GeM) pooling followed by L2 normalization to classify lesion types and retrieve similar images before clinicians' analysis. Results show improved retrieval performance. We additionally build an open-source application for image analysis and retrieval. The application is easy to integrate, relieves manual effort, and suggests the potential to support clinicians' everyday activities.
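
For reference, GeM pooling raises feature maps to a power p, averages spatially, and takes the p-th root, interpolating between average (p=1) and max (p→∞) pooling. A short sketch of GeM followed by L2 normalization is below; the exponent p=3 and epsilon are common defaults, not values taken from the paper.

```python
# GeM pooling + L2 normalization over encoder feature maps.
import torch
import torch.nn.functional as F

def gem_pool(x: torch.Tensor, p: float = 3.0, eps: float = 1e-6) -> torch.Tensor:
    # x: (batch, channels, H, W) feature maps from the encoder.
    pooled = x.clamp(min=eps).pow(p).mean(dim=(-2, -1)).pow(1.0 / p)
    return F.normalize(pooled, p=2, dim=1)        # unit-length descriptors

features = torch.randn(4, 512, 7, 7)              # dummy encoder output
descriptors = gem_pool(features)
print(descriptors.shape, descriptors.norm(dim=1))  # (4, 512), all ones
```

Unit-length descriptors make retrieval a simple inner-product (cosine) nearest-neighbor search.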


AutoDOViz: Human-Centered Automation for Decision Optimization

arXiv.org Artificial Intelligence

We present AutoDOViz, an interactive user interface for automated decision optimization (AutoDO) using reinforcement learning (RL). Decision optimization (DO) has classically been practiced by dedicated DO researchers, where experts spend long periods of time fine-tuning a solution through trial and error. AutoML pipeline search has sought to make it easier for a data scientist to find the best machine learning pipeline by leveraging automation to search and tune the solution. More recently, these advances have been applied to the domain of AutoDO, with the similar goal of finding the best reinforcement learning pipeline through algorithm selection and parameter tuning. However, decision optimization requires significantly more complex problem specification than an ML problem. AutoDOViz seeks to lower the barrier of entry for data scientists in problem specification for reinforcement learning problems, leverage the benefits of AutoDO algorithms for RL pipeline search, and, finally, create visualizations and policy insights to facilitate the typically interactive communication of problem formulations and solution proposals between DO experts and domain experts. In this paper, we report our findings from semi-structured expert interviews with DO practitioners as well as business consultants, leading to design requirements for human-centered automation for DO with RL. We evaluate a system implementation with data scientists and find that they are significantly more open to engaging in DO after using our proposed solution. AutoDOViz further increases trust in RL agent models and makes the automated training and evaluation process more comprehensible. As shown for other automation in ML tasks, we also conclude that automation of RL for DO can benefit from the user, and vice versa, when the interface promotes human-in-the-loop interaction.
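
The RL pipeline search that AutoDO performs can be pictured as a joint search over algorithm choice and hyperparameters. The sketch below shows the idea as plain random search; the search space and the score() stub are illustrative placeholders, not AutoDOViz's actual search procedure.

```python
# Random search over (algorithm, hyperparameter) RL pipelines.
import random

search_space = {
    "algorithm": ["DQN", "PPO", "A2C"],
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "discount": [0.95, 0.99],
}

def score(pipeline: dict) -> float:
    # Placeholder for training the agent and evaluating its return.
    return random.random()

best, best_score = None, float("-inf")
for _ in range(20):                               # search budget (assumed)
    pipeline = {k: random.choice(v) for k, v in search_space.items()}
    s = score(pipeline)
    if s > best_score:
        best, best_score = pipeline, s
print("best pipeline:", best, "score:", round(best_score, 3))
```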


FiberStars: Visual Comparison of Diffusion Tractography Data between Multiple Subjects

arXiv.org Artificial Intelligence

Tractography from high-dimensional diffusion magnetic resonance imaging (dMRI) data allows analysis of the brain's structural connectivity. Recent dMRI studies aim to compare connectivity patterns across thousands of subjects to understand subtle abnormalities in the brain's white matter connectivity across disease populations. Besides connectivity differences, researchers are also interested in investigating distributions of biologically sensitive dMRI-derived metrics across subject groups. Existing software products focus solely on the anatomy, or are not intuitive and restrict the comparison of multiple subjects. In this paper, we present the design and implementation of FiberStars, a visual analysis tool for tractography data that allows the interactive and scalable visualization of brain fiber clusters in 2D and 3D. With FiberStars, researchers can analyze and compare multiple subjects in large collections of brain fibers. To evaluate the usability of our software, we performed a quantitative user study: we asked non-experts to find patterns in a large tractography dataset with either FiberStars or AFQ-Browser, an existing dMRI exploration tool. Our results show that participants using FiberStars navigate extensive collections of tractography faster and more accurately. We discuss our findings and provide an analysis of the requirements for comparative visualizations of tractography data. All our research, software, and results are openly available.
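
As a minimal sketch of the kind of data such tools summarize, a fiber cluster can be represented as a list of 3D polylines (streamlines) and reduced to a per-cluster metric distribution. Here fiber length stands in for a dMRI-derived metric, and the coordinates are random placeholders rather than real tractography.

```python
# Reduce a fiber cluster (list of 3D polylines) to a metric distribution.
import numpy as np

rng = np.random.default_rng(1)
cluster = [rng.normal(size=(rng.integers(20, 50), 3)) for _ in range(100)]

def fiber_length(points: np.ndarray) -> float:
    # Sum of distances between consecutive points along the streamline.
    return float(np.linalg.norm(np.diff(points, axis=0), axis=1).sum())

lengths = np.array([fiber_length(f) for f in cluster])
print(f"fibers={len(lengths)} mean={lengths.mean():.1f} std={lengths.std():.1f}")
```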