If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
With the development of computer-aided diagnosis (CAD) and image scanning technology, Whole-slide Image (WSI) scanners are widely used in the field of pathological diagnosis. Therefore, WSI analysis has become the key to modern digital pathology. Since 2004, WSI has been used more and more in CAD. Since machine vision methods are usually based on semi-automatic or fully automatic computers, they are highly efficient and labor-saving. The combination of WSI and CAD technologies for segmentation, classification, and detection helps histopathologists obtain more stable and quantitative analysis results, save labor costs and improve diagnosis objectivity. This paper reviews the methods of WSI analysis based on machine learning. Firstly, the development status of WSI and CAD methods are introduced. Secondly, we discuss publicly available WSI datasets and evaluation metrics for segmentation, classification, and detection tasks. Then, the latest development of machine learning in WSI segmentation, classification, and detection are reviewed continuously. Finally, the existing methods are studied, the applicabilities of the analysis methods are analyzed, and the application prospects of the analysis methods in this field are forecasted.
Recently, graph-based algorithms have drawn much attention because of their impressive success in semi-supervised setups. For better model performance, previous studies learn to transform the topology of the input graph. However, these works only focus on optimizing the original nodes and edges, leaving the direction of augmenting existing data unexplored. In this paper, by simulating the generation process of graph signals, we propose a novel heuristic pre-processing technique, namely ELectoral COllege (ELCO), which automatically expands new nodes and edges to refine the label similarity within a dense subgraph. Substantially enlarging the original training set with high-quality generated labeled data, our framework can effectively benefit downstream models. To justify the generality and practicality of ELCO, we couple it with the popular Graph Convolution Network and Graph Attention Network to perform extensive evaluations on three standard datasets. In all setups tested, our method boosts the average score of base models by a large margin of 4.7 points, as well as consistently outperforms the state-of-the-art. We release our code and data on https://github.com/RingBDStack/ELCO to guarantee reproducibility.
Achieving transparency in black-box deep learning algorithms is still an open challenge. High dimensional features and decisions given by deep neural networks (NN) require new algorithms and methods to expose its mechanisms. Current state-of-the-art NN interpretation methods (e.g. Saliency maps, DeepLIFT, LIME, etc.) focus more on the direct relationship between NN outputs and inputs rather than the NN structure and operations itself. In current deep NN operations, there is uncertainty over the exact role played by neurons with fixed activation functions. In this paper, we achieve partially explainable learning model by symbolically explaining the role of activation functions (AF) under a scalable topology. This is carried out by modeling the AFs as adaptive Gaussian Processes (GP), which sit within a novel scalable NN topology, based on the Kolmogorov-Arnold Superposition Theorem (KST). In this scalable NN architecture, the AFs are generated by GP interpolation between control points and can thus be tuned during the back-propagation procedure via gradient descent. The control points act as the core enabler to both local and global adjustability of AF, where the GP interpolation constrains the intrinsic autocorrelation to avoid over-fitting. We show that there exists a trade-off between the NN's expressive power and interpretation complexity, under linear KST topology scaling. To demonstrate this, we perform a case study on a binary classification dataset of banknote authentication. By quantitatively and qualitatively investigating the mapping relationship between inputs and output, our explainable model can provide interpretation over each of the one-dimensional attributes. These early results suggest that our model has the potential to act as the final interpretation layer for deep neural networks.
Generalization is an important feature of neural network, and there have been many studies on it. Recently, with the development of quantum compu-ting, it brings new opportunities. In this paper, we studied a class of quantum neural network constructed by quantum gate. In this model, we mapped the feature data to a quantum state in Hilbert space firstly, and then implement unitary evolution on it, in the end, we can get the classification result by im-plement measurement on the quantum state. Since all the operations in quan-tum neural networks are unitary, the parameters constitute a hypersphere of Hilbert space. Compared with traditional neural network, the parameter space is flatter. Therefore, it is not easy to fall into local optimum, which means the quantum neural networks have better generalization. In order to validate our proposal, we evaluated our model on three public datasets, the results demonstrated that our model has better generalization than the classical neu-ral network with the same structure.
In this paper, We propose a general Riemannian proximal optimization algorithm with guaranteed convergence to solve Markov decision process (MDP) problems. To model policy functions in MDP, we employ Gaussian mixture model (GMM) and formulate it as a nonconvex optimization problem in the Riemannian space of positive semidefinite matrices. For two given policy functions, we also provide its lower bound on policy improvement by using bounds derived from the Wasserstein distance of GMMs. Preliminary experiments show the efficacy of our proposed Riemannian proximal policy optimization algorithm.
With the information explosion of news articles, personalized news recommendation has become important for users to quickly find news that they are interested in. Existing methods on news recommendation mainly include collaborative filtering methods which rely on direct user-item interactions and content based methods which characterize the content of user reading history. Although these methods have achieved good performances, they still suffer from data sparse problem, since most of them fail to extensively exploit high-order structure information (similar users tend to read similar news articles) in news recommendation systems. In this paper, we propose to build a heterogeneous graph to explicitly model the interactions among users, news and latent topics. The incorporated topic information would help indicate a user's interest and alleviate the sparsity of user-item interactions. Then we take advantage of graph neural networks to learn user and news representations that encode high-order structure information by propagating embeddings over the graph. The learned user embeddings with complete historic user clicks capture the users' long-term interests. We also consider a user's short-term interest using the recent reading history with an attention based LSTM model. Experimental results on real-world datasets show that our proposed model significantly outperforms state-of-the-art methods on news recommendation.
Knowledge Base (KB) completion, which aims to determine missing relation between entities, has raised increasing attention in recent years. Most existing methods either focus on the positional relationship between entity pair and single relation (1-hop path) in semantic space or concentrate on the joint probability of Random Walks on multi-hop paths among entities. However, they do not fully consider the intrinsic relationships of all the links among entities. By observing that the single relation and multi-hop paths between the same entity pair generally contain shared/similar semantic information, this paper proposes a novel method to capture the shared features between them as the basis for inferring missing relations. To capture the shared features jointly, we develop Hierarchical Attention Networks (HANs) to automatically encode the inputs into low-dimensional vectors, and exploit two partial parameter-shared components, one for feature source discrimination and the other for determining missing relations. By joint Adversarial Training (AT) the entire model, our method minimizes the classification error of missing relations, and ensures the source of shared features are difficult to discriminate in the meantime. The AT mechanism encourages our model to extract features that are both discriminative for missing relation prediction and shareable between single relation and multi-hop paths. We extensively evaluate our method on several large-scale KBs for relation completion. Experimental results show that our method consistently outperforms the baseline approaches. In addition, the hierarchical attention mechanism and the feature extractor in our model can be well interpreted and utilized in the related downstream tasks.
Word embedding has been widely used in many natural language processing tasks. In this paper, we focus on learning word embeddings through selective higher-order relationships in sentences to improve the embeddings to be less sensitive to local context and more accurate in capturing semantic compositionality. We present a novel multi-order dependency-based strategy to composite and represent the context under several essential constraints. In order to realize selective learning from the word contexts, we automatically assign the strengths of different dependencies between co-occurred words in the stochastic gradient descent process. We evaluate and analyze our proposed approach using several direct and indirect tasks for word embeddings. Experimental results demonstrate that our embeddings are competitive to or better than state-of-the-art methods and significantly outperform other methods in terms of context stability. The output weights and representations of dependencies obtained in our embedding model conform to most of the linguistic characteristics and are valuable for many downstream tasks.
Text normalization and part-of-speech (POS) tagging for social media data have been investigated recently, however, prior work has treated them separately. In this paper, we propose a joint Viterbi decoding process to determine each token’s POS tag and non-standard token’s correct form at the same time. In order to evaluate our approach, we create two new data sets with POS tag labels and non-standard tokens' correct forms. This is the first data set with such annotation. The experiment results demonstrate the effect of non-standard words on POS tagging, and also show that our proposed methods perform better than the state-of-the-art systems in both POS tagging and normalization
Chinese Pinyin input methods are very important for Chinese language processing. In many cases, users may make typing errors. For example, a user wants to type in "shenme" (什么, meaning "what" in English) but may type in "shenem" instead. Existing Pinyin input methods fail in converting such a Pinyin sequence with errors to the right Chinese words. To solve this problem, we developed an efficient error-tolerant Pinyin input method called "CHIME'' that can handle typing errors. By incorporating state-of-the-art techniques and language-specific features, the method achieves a better performance than state-of-the-art input methods. It can efficiently find relevant words in milliseconds for an input Pinyin sequence.