AITopics

2410.02268

Country:

Asia > China (0.28)
North America > United States > Wisconsin (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Oct-9-2023

Foundation Models Meet Visualizations: Challenges and Opportunities

Yang, Weikai, Liu, Mengchen, Wang, Zheng, Liu, Shixia

Recent studies have indicated that foundation models, such as BERT and GPT, excel in adapting to a variety of downstream tasks. This adaptability has established them as the dominant force in building artificial intelligence (AI) systems. As visualization techniques intersect with these models, a new research paradigm emerges. This paper divides these intersections into two main areas: visualizations for foundation models (VIS4FM) and foundation models for visualizations (FM4VIS). In VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate models. This addresses the pressing need for transparency, explainability, fairness, and robustness. Conversely, within FM4VIS, we highlight how foundation models can be utilized to advance the visualization field itself. The confluence of foundation models and visualizations holds great promise, but it also comes with its own set of challenges. By highlighting these challenges and the growing opportunities, this paper seeks to provide a starting point for continued exploration in this promising avenue.

data mining, large language model, machine learning, (19 more...)

2310.05771

Country: Asia > China (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (0.68)
Information Technology (0.46)
Education > Educational Setting (0.46)

Technology:

Information Technology > Visualization (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
(6 more...)

arXiv.org Artificial Intelligence, Jul-15-2023

Visual Analytics For Machine Learning: A Data Perspective Survey

Wang, Junpeng, Liu, Shixia, Zhang, Wei

The past decade has witnessed a plethora of works that leverage the power of visualization (VIS) to interpret machine learning (ML) models. The corresponding research topic, VIS4ML, keeps growing at a fast pace. To better organize the enormous works and shed light on the developing trend of VIS4ML, we provide a systematic review of these works through this survey. Since data quality greatly impacts the performance of ML models, our survey focuses specifically on summarizing VIS4ML works from the data perspective. First, we categorize the common data handled by ML models into five types, explain the unique features of each type, and highlight the corresponding ML models that are good at learning from them. Second, from the large number of VIS4ML works, we tease out six tasks that operate on these types of data (i.e., data-centric tasks) at different stages of the ML pipeline to understand, diagnose, and refine ML models. Lastly, by studying the distribution of 143 surveyed papers across the five data types, six data-centric tasks, and their intersections, we analyze the prospective research directions and envision future research trends.

artificial intelligence, comput, machine learning, (17 more...)

2307.07712

Country:

North America > United States (0.28)
Asia > China (0.28)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.48)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment > Games > Computer Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Feb-18-2023

Visual Analysis of Discrimination in Machine Learning

Wang, Qianwen, Xu, Zhenhua, Chen, Zhutian, Wang, Yong, Liu, Shixia, Qu, Huamin

The growing use of automated decision-making in critical applications, such as crime prediction and college admission, has raised questions about fairness in machine learning. How can we decide whether different treatments are reasonable or discriminatory? In this paper, we investigate discrimination in machine learning from a visual analytics perspective and propose an interactive visualization tool, DiscriLens, to support a more comprehensive analysis. To reveal detailed information on algorithmic discrimination, DiscriLens identifies a collection of potentially discriminatory itemsets based on causal modeling and classification rules mining. By combining an extended Euler diagram with a matrix-based visualization, we develop a novel set visualization to facilitate the exploration and interpretation of discriminatory itemsets. A user study shows that users can interpret the visually encoded information in DiscriLens quickly and accurately. Use cases demonstrate that DiscriLens provides informative guidance in understanding and reducing algorithmic discrimination.
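As a rough numerical illustration of what a potentially discriminatory itemset could look like, the sketch below computes a plain risk difference inside the subgroup selected by an itemset. This is not DiscriLens's causal modeling or rule mining, which the abstract does not detail. The record fields (`gender`, `degree`, `approved`), the function names, and the toy data are all hypothetical.

```python
def matches(record, itemset):
    """True when the record satisfies every attribute=value condition in the itemset."""
    return all(record.get(key) == value for key, value in itemset.items())


def positive_rate(records):
    """Fraction of records that received the favorable decision."""
    return sum(r["approved"] for r in records) / len(records)


def risk_difference(records, itemset, protected_attr, protected_value):
    """Favorable-rate gap (others minus protected group) within the itemset's subgroup.

    Returns None when either side of the comparison is empty.
    """
    subgroup = [r for r in records if matches(r, itemset)]
    protected = [r for r in subgroup if r[protected_attr] == protected_value]
    others = [r for r in subgroup if r[protected_attr] != protected_value]
    if not protected or not others:
        return None
    return positive_rate(others) - positive_rate(protected)


# Hypothetical toy decisions: 1 = admitted, 0 = rejected.
records = [
    {"gender": "F", "degree": "bachelor", "approved": 0},
    {"gender": "F", "degree": "bachelor", "approved": 1},
    {"gender": "M", "degree": "bachelor", "approved": 1},
    {"gender": "M", "degree": "bachelor", "approved": 1},
    {"gender": "F", "degree": "master", "approved": 1},
    {"gender": "M", "degree": "master", "approved": 1},
]

gap_bachelor = risk_difference(records, {"degree": "bachelor"}, "gender", "F")
gap_master = risk_difference(records, {"degree": "master"}, "gender", "F")
print(gap_bachelor, gap_master)
```

A large gap for one itemset and none for another is the kind of contrast a set visualization would need to surface, although a raw rate gap alone does not establish discrimination.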

artificial intelligence, discrimination, machine learning, (17 more...)

doi: 10.1109/TVCG.2020.3030471

2007.15182

Country:

North America > Canada (0.68)
Europe > United Kingdom (0.67)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry:

Banking & Finance (0.93)
Law > Labor & Employment Law (0.46)
Law > Civil Rights & Constitutional Law (0.46)
Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

arXiv.org Machine Learning, Sep-21-2020

Interactive Steering of Hierarchical Clustering

Yang, Weikai, Wang, Xiting, Lu, Jie, Dou, Wenwen, Liu, Shixia

Hierarchical clustering is an important technique to organize big data for exploratory data analysis. However, existing one-size-fits-all hierarchical clustering methods often fail to meet the diverse needs of different users. To address this challenge, we present an interactive steering method to visually supervise constrained hierarchical clustering by utilizing both public knowledge (e.g., Wikipedia) and private knowledge from users. The novelty of our approach includes 1) automatically constructing constraints for hierarchical clustering using knowledge (knowledge-driven) and intrinsic data distribution (data-driven), and 2) enabling the interactive steering of clustering through a visual interface (user-driven). Our method first maps each data item to the most relevant items in a knowledge base. An initial constraint tree is then extracted using the ant colony optimization algorithm. The algorithm balances the tree width and depth and covers the data items with high confidence. Given the constraint tree, the data items are hierarchically clustered using evolutionary Bayesian rose tree. To clearly convey the hierarchical clustering results, an uncertainty-aware tree visualization has been developed to enable users to quickly locate the most uncertain sub-hierarchies and interactively improve them. The quantitative evaluation and case study demonstrate that the proposed approach facilitates the building of customized clustering trees in an efficient and effective manner.

artificial intelligence, health & medicine, hierarchy, (16 more...)

doi: 10.1109/TVCG.2020.2995100

2009.09618

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Education > Educational Setting > Higher Education (0.67)
Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning, Sep-15-2020

Diagnosing Concept Drift with Visual Analytics

Yang, Weikai, Li, Zhen, Liu, Mengchen, Lu, Yafeng, Cao, Kelei, Maciejewski, Ross, Liu, Shixia

Concept drift is a phenomenon in which the distribution of a data stream changes over time in unforeseen ways, causing prediction models built on historical data to become inaccurate. While a variety of automated methods have been developed to identify when concept drift occurs, there is limited support for analysts who need to understand and correct their models when drift is detected. In this paper, we present a visual analytics method, DriftVis, to support model builders and analysts in the identification and correction of concept drift in streaming data. DriftVis combines a distribution-based drift detection method with a streaming scatterplot to support the analysis of drift caused by the distribution changes of data streams and to explore the impact of these changes on the model's accuracy. A quantitative experiment and two case studies on weather prediction and text classification have been conducted to demonstrate our proposed tool and illustrate how visual analytics can be used to support the detection, examination, and correction of concept drift.
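To make "distribution-based drift detection" concrete, here is a minimal sketch that compares a historical sample with a recent window. The abstract does not name the statistic DriftVis uses, so the choice of energy distance on 1-D values, the threshold, and all function names are assumptions made for illustration.

```python
from itertools import product


def mean_abs_diff(xs, ys):
    """Average |x - y| over all pairs drawn from xs and ys."""
    return sum(abs(x - y) for x, y in product(xs, ys)) / (len(xs) * len(ys))


def energy_distance(reference, window):
    """Squared energy distance between two 1-D samples (0 when they coincide)."""
    return (2 * mean_abs_diff(reference, window)
            - mean_abs_diff(reference, reference)
            - mean_abs_diff(window, window))


def detect_drift(reference, window, threshold=0.5):
    """Flag drift when the distance exceeds a hypothetical, hand-picked threshold."""
    return energy_distance(reference, window) > threshold


# Hypothetical toy streams: one window resembles history, the other has shifted.
historical = [0.1, 0.2, 0.0, -0.1, 0.3]
stable = [0.0, 0.2, 0.1, 0.1, -0.2]
shifted = [2.1, 1.9, 2.3, 2.0, 1.8]

print(detect_drift(historical, stable))
print(detect_drift(historical, shifted))
```

In practice the threshold would have to be calibrated on the data, and a detector like this only says that the distribution moved, not why. That gap is what the visual analysis is meant to fill.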

comp, deep learning, neural network, (18 more...)

2007.14372

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.67)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

arXiv.org Machine Learning, Nov-11-2018

Recent Research Advances on Interactive Machine Learning

Jiang, Liu, Liu, Shixia, Chen, Changjian

Interactive Machine Learning (IML) is an iterative learning process that tightly couples a human with a machine learner, which is widely used by researchers and practitioners to effectively solve a wide variety of real-world application problems. Although recent years have witnessed the proliferation of IML in the field of visual analytics, most recent surveys either focus on a specific area of IML or aim to summarize a visualization field that is too generic for IML. In this paper, we systematically review the recent literature on IML and classify them into a task-oriented taxonomy built by us. We conclude the survey with a discussion of open challenges and research opportunities that we believe are inspiring for future work in IML.

deep learning, ieee transaction, neural network, (19 more...)

1811.04548

Genre:

Overview (1.00)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine (0.93)
Transportation > Infrastructure & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

arXiv.org Machine Learning, Oct-9-2018

Analyzing the Noise Robustness of Deep Neural Networks

Liu, Mengchen, Liu, Shixia, Su, Hang, Cao, Kelei, Zhu, Jun

[Figure: The root cause is that the CNN cannot detect panda's ears in the adversarial examples. As a result, these adversarial examples are misclassified: (a) input images; (b) datapath visualization at the layer and feature map levels; (c) neuron visualization.]

Deep neural networks (DNNs) are vulnerable to maliciously generated adversarial examples. These examples are intentionally designed by making imperceptible perturbations and often mislead a DNN into making an incorrect prediction. This phenomenon means that there is significant risk in applying DNNs to safety-critical applications, such as driverless cars. To address this issue, we present a visual analytics approach to explain the primary cause of the wrong predictions introduced by adversarial examples. The key is to analyze the datapaths of the adversarial examples and compare them with those of the normal examples. A datapath is a group of critical neurons and their connections. To this end, we formulate the datapath extraction as a subset selection problem and approximately solve it based on back-propagation. A multilevel visualization consisting of a segmented DAG (layer level), an Euler diagram (feature map level), and a heat map (neuron level), has been designed to help experts investigate datapaths from the high-level layers to the detailed neuron activations. Two case studies are conducted that demonstrate the promise of our approach in support of explaining the working mechanism of adversarial examples. S. Liu is the corresponding author.

Deep neural networks (DNNs) have evolved to become state-of-the-art in a torrent of artificial intelligence applications, such as image classification and language translation [26, 29, 59, 60]. However, researchers have recently found that DNNs are generally vulnerable to maliciously generated adversarial examples, which are intentionally designed to mislead a DNN into making incorrect predictions [34, 37, 53, 63]. This phenomenon brings high risk in applying DNNs to safety- and security-critical applications, such as driverless cars, facial recognition ATMs, and Face ID security on mobile phones [1]. Hence, there is a growing need to understand the inner workings of adversarial examples and identify the root cause of the incorrect predictions. There are two technical challenges to understanding and analyzing adversarial examples, which are derived from the discussions with machine learning experts (Sec. …). The first challenge is how to extract the datapath for adversarial examples.

adversarial example, deep learning, neural network, (20 more...)

1810.03913

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation (0.95)
Information Technology (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, Apr-7-2018

Visual Analytics for Explainable Deep Learning

Choo, Jaegul, Liu, Shixia

Recently, deep learning has been advancing the state of the art in artificial intelligence to a new level, and humans rely on artificial intelligence techniques more than ever. However, even with such unprecedented advancements, the lack of explanation regarding the decisions made by deep learning models and absence of control over their internal processes act as major drawbacks in critical decision-making processes, such as precision medicine and law enforcement. In response, efforts are being made to make deep learning interpretable and controllable by humans. In this paper, we review visual analytics, information visualization, and machine learning perspectives relevant to this aim, and discuss potential challenges and future research directions.

deep learning, neural network, survey article, (20 more...)

1804.02527

Country: North America > United States (0.46)

Genre:

Overview (0.66)
Research Report (0.50)

Industry:

Government (0.47)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, Feb-22-2017

Scalable Inference for Nested Chinese Restaurant Process Topic Models

Chen, Jianfei, Zhu, Jun, Lu, Jie, Liu, Shixia

Nested Chinese Restaurant Process (nCRP) topic models are powerful nonparametric Bayesian methods to extract a topic hierarchy from a given text corpus, where the hierarchical structure is automatically determined by the data. Hierarchical Latent Dirichlet Allocation (hLDA) is a popular instance of nCRP topic models. However, hLDA has only been evaluated at small scale, because the existing collapsed Gibbs sampling and instantiated weight variational inference algorithms either are not scalable or sacrifice inference quality with mean-field assumptions. Moreover, an efficient distributed implementation of the data structures, such as dynamically growing count matrices and trees, is challenging. In this paper, we propose a novel partially collapsed Gibbs sampling (PCGS) algorithm, which combines the advantages of collapsed and instantiated weight algorithms to achieve good scalability as well as high model quality. An initialization strategy is presented to further improve the model quality. Finally, we propose an efficient distributed implementation of PCGS through vectorization, pre-processing, and a careful design of the concurrent data structures and communication strategy. Empirical studies show that our algorithm is 111 times more efficient than the previous open-source implementation for hLDA, with comparable or even better model quality. Our distributed implementation can extract 1,722 topics from a 131-million-document corpus with 28 billion tokens, which is 4-5 orders of magnitude larger than the previous largest corpus, with 50 machines in 7 hours.
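For readers unfamiliar with how an nCRP lets the data determine the hierarchy, the sketch below draws document paths from the nCRP prior alone: at each level an existing child is chosen in proportion to how many documents already pass through it, and a new child is opened in proportion to a concentration parameter. This illustrates the prior only, not the PCGS inference algorithm the paper proposes. The `Node` class, `sample_path`, and the parameter name `gamma` are hypothetical.

```python
import random


class Node:
    """A tree node; `count` is the number of documents whose path passes through it."""

    def __init__(self):
        self.count = 0
        self.children = []


def sample_path(root, depth, gamma, rng):
    """Draw one root-to-leaf path of `depth` nodes from the nCRP prior and update counts."""
    path = [root]
    root.count += 1
    node = root
    for _ in range(depth - 1):
        # An existing child is chosen with weight equal to its count;
        # a brand-new child is opened with weight gamma.
        weights = [child.count for child in node.children] + [gamma]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(node.children):
            node.children.append(Node())
        node = node.children[choice]
        node.count += 1
        path.append(node)
    return path


rng = random.Random(0)  # fixed seed so repeated runs build the same tree
root = Node()
paths = [sample_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
print(len(root.children), [child.count for child in root.children])
```

Because popular branches attract more documents while `gamma` keeps a nonzero chance of opening a new branch, the tree's width is not fixed in advance. Posterior inference would additionally weigh how well each path's topics explain the document's words.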

algorithm, artificial intelligence, bayesian inference, (18 more...)

1702.07083

Country:

Europe (1.00)
North America > Canada > Nova Scotia (0.15)
Asia > India > Rajasthan (0.14)

Genre: Research Report (1.00)

Industry: Consumer Products & Services > Restaurants (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)