AITopics | Wang, Zhiwei

Plotting

Wang, Zhiwei

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Anchor function: a type of benchmark functions for studying language models

Zhang, Zhongwang, Wang, Zhiwei, Yao, Junjie, Zhou, Zhangchen, Li, Xiaolong, E, Weinan, Xu, Zhi-Qin John

arXiv.org Artificial IntelligenceJan-16-2024

Understanding transformer-based language models is becoming increasingly crucial, particularly as they play pivotal roles in advancing towards artificial general intelligence. However, language model research faces significant challenges, especially for academic research groups with constrained resources. These challenges include complex data structures, unknown target functions, high computational costs and memory requirements, and a lack of interpretability in the inference process, etc. Drawing a parallel to the use of simple models in scientific research, we propose the concept of an anchor function. This is a type of benchmark function designed for studying language models in learning tasks that follow an "anchor-key" pattern. By utilizing the concept of an anchor function, we can construct a series of functions to simulate various language tasks. The anchor function plays a role analogous to that of mice in diabetes research, particularly suitable for academic research. We demonstrate the utility of the anchor function with an example, revealing two basic operations by attention structures in language models: shifting tokens and broadcasting one token from one position to many positions. These operations are also commonly observed in large language models. The anchor function framework, therefore, opens up a series of valuable and accessible research questions for further exploration, especially for theoretical study.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2401.08309

Country:

North America > United States (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Dual-view Correlation Hybrid Attention Network for Robust Holistic Mammogram Classification

Wang, Zhiwei, Xian, Junlin, Liu, Kangyi, Li, Xin, Li, Qiang, Yang, Xin

arXiv.org Artificial IntelligenceJun-18-2023

Mammogram image is important for breast cancer screening, and typically obtained in a dual-view form, i.e., cranio-caudal (CC) and mediolateral oblique (MLO), to provide complementary information. However, previous methods mostly learn features from the two views independently, which violates the clinical knowledge and ignores the importance of dual-view correlation. In this paper, we propose a dual-view correlation hybrid attention network (DCHA-Net) for robust holistic mammogram classification. Specifically, DCHA-Net is carefully designed to extract and reinvent deep features for the two views, and meanwhile to maximize the underlying correlations between them. A hybrid attention module, consisting of local relation and non-local attention blocks, is proposed to alleviate the spatial misalignment of the paired views in the correlation maximization. A dual-view correlation loss is introduced to maximize the feature similarity between corresponding strip-like regions with equal distance to the chest wall, motivated by the fact that their features represent the same breast tissues, and thus should be highly-correlated. Experimental results on two public datasets, i.e., INbreast and CBIS-DDSM, demonstrate that DCHA-Net can well preserve and maximize feature correlations across views, and thus outperforms the state-of-the-arts for classifying a whole mammogram as malignant or not.

artificial intelligence, classification, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2306.10676

Country:

Europe > France (0.14)
Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Data Science (0.68)

Add feedback

MFAI: A Scalable Bayesian Matrix Factorization Approach to Leveraging Auxiliary Information

Wang, Zhiwei, Zhang, Fa, Zheng, Cong, Hu, Xianghong, Cai, Mingxuan, Yang, Can

arXiv.org Artificial IntelligenceMar-4-2023

In various practical situations, matrix factorization methods suffer from poor data quality, such as high data sparsity and low signal-to-noise ratio (SNR). Here we consider a matrix factorization problem by utilizing auxiliary information, which is massively available in real applications, to overcome the challenges caused by poor data quality. Unlike existing methods that mainly rely on simple linear models to combine auxiliary information with the main data matrix, we propose to integrate gradient boosted trees in the probabilistic matrix factorization framework to effectively leverage auxiliary information (MFAI). Thus, MFAI naturally inherits several salient features of gradient boosted trees, such as the capability of flexibly modeling nonlinear relationships, and robustness to irrelevant features and missing values in auxiliary information. The parameters in MAFI can be automatically determined under the empirical Bayes framework, making it adaptive to the utilization of auxiliary information and immune to overfitting. Moreover, MFAI is computationally efficient and scalable to large-scale datasets by exploiting variational inference. We demonstrate the advantages of MFAI through comprehensive numerical results from simulation studies and real data analysis. Our approach is implemented in the R package mfair available at https://github.com/YangLabHKUST/mfair.

artificial intelligence, information, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2303.02566

Country:

North America > United States (0.46)
North America > Canada (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Media > Film (0.93)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Add feedback

A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics

Wang, Zhiwei, Zhang, Yaoyu, Zhao, Enhan, Ju, Yiguang, E, Weinan, Xu, Zhi-Qin John, Zhang, Tianhan

arXiv.org Artificial IntelligenceSep-8-2022

A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics is proposed and validated using high-temperature auto-ignitions, perfectly stirred reactors (PSR), and one-dimensional freely propagating flames of n-heptane/air mixtures. The mechanism reduction is modeled as an optimization problem on Boolean space, where a Boolean vector, each entry corresponding to a species, represents a reduced mechanism. The optimization goal is to minimize the reduced mechanism size given the error tolerance of a group of pre-selected benchmark quantities. The key idea of the DeePMR is to employ a deep neural network (DNN) to formulate the objective function in the optimization problem. In order to explore high dimensional Boolean space efficiently, an iterative DNN-assisted data sampling and DNN training procedure are implemented. The results show that DNN-assistance improves sampling efficiency significantly, selecting only $10^5$ samples out of $10^{34}$ possible samples for DNN to achieve sufficient accuracy. The results demonstrate the capability of the DNN to recognize key species and reasonably predict reduced mechanism performance. The well-trained DNN guarantees the optimal reduced mechanism by solving an inverse optimization problem. By comparing ignition delay times, laminar flame speeds, temperatures in PSRs, the resulting skeletal mechanism has fewer species (45 species) but the same level of accuracy as the skeletal mechanism (56 species) obtained by the Path Flux Analysis (PFA) method. In addition, the skeletal mechanism can be further reduced to 28 species if only considering atmospheric, near-stoichiometric conditions (equivalence ratio between 0.6 and 1.2). The DeePMR provides an innovative way to perform model reduction and demonstrates the great potential of data-driven methods in the combustion area.

artificial intelligence, machine learning, mechanism, (17 more...)

arXiv.org Artificial Intelligence

2201.02025

Country: Asia > China (0.31)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

An Upper Limit of Decaying Rate with Respect to Frequency in Deep Neural Network

Luo, Tao, Ma, Zheng, Wang, Zhiwei, Xu, Zhi-Qin John, Zhang, Yaoyu

arXiv.org Artificial IntelligenceAug-21-2022

Deep neural network (DNN) usually learns the target function from low to high frequency, which is called frequency principle or spectral bias. This frequency principle sheds light on a high-frequency curse of DNNs -- difficult to learn high-frequency information. Inspired by the frequency principle, a series of works are devoted to develop algorithms for overcoming the high-frequency curse. A natural question arises: what is the upper limit of the decaying rate w.r.t. frequency when one trains a DNN? In this work, our theory, confirmed by numerical experiments, suggests that there is a critical decaying rate w.r.t. frequency in DNN training. Below the upper limit of the decaying rate, the DNN interpolates the training data by a function with a certain regularity. However, above the upper limit, the DNN interpolates the training data by a trivial function, i.e., a function is only non-zero at training data points. Our results indicate a better way to overcome the high-frequency curse is to design a proper pre-condition approach to shift high-frequency information to low-frequency one, which coincides with several previous developed algorithms for fast learning high-frequency information. More importantly, this work rigorously proves that the high-frequency curse is an intrinsic difficulty of DNNs.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Artificial Intelligence

2105.11675

Country: Asia > China (0.29)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Multi-Scale One-Class Recurrent Neural Networks for Discrete Event Sequence Anomaly Detection

Wang, Zhiwei, Chen, Zhengzhang, Ni, Jingchao, Liu, Hui, Chen, Haifeng, Tang, Jiliang

arXiv.org Machine LearningAug-31-2020

Discrete event sequences are ubiquitous, such as an ordered event series of process interactions in Information and Communication Technology systems. Recent years have witnessed increasing efforts in detecting anomalies with discrete-event sequences. However, it still remains an extremely difficult task due to several intrinsic challenges including data imbalance issues, the discrete property of the events, and sequential nature of the data. To address these challenges, in this paper, we propose OC4Seq, a multi-scale one-class recurrent neural network for detecting anomalies in discrete event sequences. Specifically, OC4Seq integrates the anomaly detection objective with recurrent neural networks (RNNs) to embed the discrete event sequences into latent spaces, where anomalies can be easily detected. In addition, given that an anomalous sequence could be caused by either individual events, subsequences of events, or the whole sequence, we design a multi-scale RNN framework to capture different levels of sequential patterns simultaneously. Experimental results on three benchmark datasets show that OC4Seq consistently outperforms various representative baselines by a large margin. Moreover, through both quantitative and qualitative analysis, the importance of capturing multi-scale sequential patterns for event anomaly detection is verified.

deep learning, neural network, sequence, (18 more...)

arXiv.org Machine Learning

2008.13361

Country: North America > United States (0.93)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

Chat as Expected: Learning to Manipulate Black-box Neural Dialogue Models

Liu, Haochen, Wang, Zhiwei, Derr, Tyler, Tang, Jiliang

arXiv.org Artificial IntelligenceMay-27-2020

Recently, neural network based dialogue systems have become ubiquitous in our increasingly digitalized society. However, due to their inherent opaqueness, some recently raised concerns about using neural models are starting to be taken seriously. In fact, intentional or unintentional behaviors could lead to a dialogue system to generate inappropriate responses. Thus, in this paper, we investigate whether we can learn to craft input sentences that result in a black-box neural dialogue model being manipulated into having its outputs contain target words or match target sentences. We propose a reinforcement learning based model that can generate such desired inputs automatically. Extensive experiments on a popular well-trained state-of-the-art neural dialogue model show that our method can successfully seek out desired inputs that lead to the target outputs in a considerable portion of cases. Consequently, our work reveals the potential of neural dialogue models to be manipulated, which inspires and opens the door towards developing strategies to defend them.

air transportation, deep learning, dialogue model, (21 more...)

arXiv.org Artificial Intelligence

2005.1317

Country: North America (0.28)

Genre: Research Report (0.50)

Industry: Transportation > Air (0.62)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Automatic Short Answer Grading via Multiway Attention Networks

Liu, Tiaoqiao, Ding, Wenbiao, Wang, Zhiwei, Tang, Jiliang, Huang, Gale Yan, Liu, Zitao

arXiv.org Artificial IntelligenceSep-23-2019

Automatic short answer grading (ASAG), which autonomously score student answers according to reference answers, provides a cost-effective and consistent approach to teaching professionals and can reduce their monotonous and tedious grading workloads. However, ASAG is a very challenging task due to two reasons: (1) student answers are made up of free text which requires a deep semantic understanding; and (2) the questions are usually open-ended and across many domains in K-12 scenarios. In this paper, we propose a generalized end-to-end ASAG learning framework which aims to (1) autonomously extract linguistic information from both student and reference answers; and (2) accurately model the semantic relations between free-text student and reference answers in open-ended domain. The proposed ASAG model is evaluated on a large real-world K-12 dataset and can outperform the state-of-the-art baselines in terms of various evaluation metrics. 1 Introduction Assessing the knowledge acquired by students is one of the most important aspects of the learning process as it provides feedback to help students correct their misunderstanding of knowledge and improves their overall learning performance. Traditionally, the assessing paradigm is often conducted by instructors or teachers.

computer based training, deep learning, educational technology, (21 more...)

arXiv.org Artificial Intelligence

1909.10166

Country:

North America > United States (0.14)
Asia > China (0.14)

Genre: Research Report (0.50)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.69)
Education > Educational Setting > Online (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

Deep Knowledge Tracing with Side Information

Wang, Zhiwei, Feng, Xiaoqin, Tang, Jiliang, Huang, Gale Yan, Liu, Zitao

arXiv.org Artificial IntelligenceSep-1-2019

Monitoring student knowledge states or skill acquisition levels known as knowledge tracing, is a fundamental part of intelligent tutoring systems. Despite its inherent challenges, recent deep neural networks based knowledge tracing models have achieved great success, which is largely from models' ability to learn sequential dependencies of questions in student exercise data. However, in addition to sequential information, questions inherently exhibit side relations, which can enrich our understandings about student knowledge states and has great potentials to advance knowledge tracing. Thus, in this paper, we exploit side relations to improve knowledge tracing and design a novel framework DTKS. The experimental results on real education data validate the effectiveness of the proposed framework and demonstrate the importance of side information in knowledge tracing. 1 Introduction Knowledge tracing - where machine monitors students' knowledge states and their skill acquisition levels - is essential for personalized education and a fundamental part of intelligent tutoring systems [15,7,1,12].

computer based training, deep learning, knowledge, (20 more...)

arXiv.org Artificial Intelligence

1909.00372

Country: North America > United States (0.29)

Genre: Research Report (0.65)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

Recommender Systems with Heterogeneous Side Information

Liu, Tianqiao, Wang, Zhiwei, Tang, Jiliang, Yang, Songfan, Huang, Gale Yan, Liu, Zitao

arXiv.org Machine LearningJul-17-2019

In modern recommender systems, both users and items are associated with rich side information, which can help understand users and items. Such information is typically heterogeneous and can be roughly categorized into flat and hierarchical side information. While side information has been proved to be valuable, the majority of existing systems have exploited either only flat side information or only hierarchical side information due to the challenges brought by the heterogeneity. In this paper, we investigate the problem of exploiting heterogeneous side information for recommendations. Specifically, we propose a novel framework jointly captures flat and hierarchical side information with mathematical coherence. We demonstrate the effectiveness of the proposed framework via extensive experiments on various real-world datasets. Empirical results show that our approach is able to lead a significant performance gain over the state-of-the-art methods.

artificial intelligence, information, machine learning, (19 more...)

arXiv.org Machine Learning

1907.08679

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (1.00)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback