Jiang, Hui
From Histopathology Images to Cell Clouds: Learning Slide Representations with Hierarchical Cell Transformer
Yang, Zijiang, Qiu, Zhongwei, Lin, Tiancheng, Chao, Hanqing, Chang, Wanxing, Yang, Yelin, Zhang, Yunshuo, Jiao, Wenpei, Shen, Yixuan, Liu, Wenbin, Fu, Dongmei, Jin, Dakai, Yan, Ke, Lu, Le, Jiang, Hui, Bian, Yun
It is clinically crucial and potentially highly beneficial to analyze and model the spatial distributions of cells directly in histopathology whole slide images (WSI). However, most existing WSI datasets lack cell-level annotations, owing to the extremely high annotation cost of giga-pixel images. Thus, it remains an open question whether deep learning models can directly and effectively analyze WSIs from the semantic aspect of cell distributions. In this work, we construct a large-scale WSI dataset with more than 5 billion cell-level annotations, termed WSI-Cell5B, and propose a novel hierarchical Cell Cloud Transformer (CCFormer) to tackle these challenges. WSI-Cell5B is based on 6,998 WSIs of 11 cancers from The Cancer Genome Atlas Program, and every cell in each WSI is annotated with its coordinates and type. To the best of our knowledge, WSI-Cell5B is the first WSI-level large-scale dataset integrating cell-level annotations. CCFormer, in turn, formulates the collection of cells in each WSI as a cell cloud and models their spatial distribution. Specifically, Neighboring Information Embedding (NIE) is proposed to characterize the distribution of cells within the neighborhood of each cell, and a novel Hierarchical Spatial Perception (HSP) module learns the spatial relationships among cells in a bottom-up manner. Clinical analysis indicates that WSI-Cell5B can be used to design cell-counting-based clinical evaluation metrics that effectively assess the survival risk of patients. Extensive experiments on survival prediction and cancer staging show that learning from cell spatial distribution alone can already achieve state-of-the-art (SOTA) performance, i.e., CCFormer strongly outperforms other competing methods.
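As a rough illustration of the neighborhood-distribution idea behind NIE (a minimal sketch of one plausible featurization, not the authors' implementation; the neighborhood size k and the histogram layout are assumptions), each cell in the cell cloud can be described by the type composition of its k nearest neighbors:

    # Sketch: per-cell neighbourhood cell-type histograms over a "cell cloud".
    import numpy as np
    from scipy.spatial import cKDTree

    def neighborhood_type_histograms(coords, types, num_types, k=16):
        """coords: (N, 2) cell centroids; types: (N,) integer cell-type labels."""
        tree = cKDTree(coords)
        # query k+1 neighbours because the nearest neighbour of each cell is itself
        _, idx = tree.query(coords, k=k + 1)
        neighbor_types = types[idx[:, 1:]]                   # (N, k)
        hist = np.zeros((len(coords), num_types), dtype=np.float32)
        for t in range(num_types):
            hist[:, t] = (neighbor_types == t).mean(axis=1)  # fraction of type t nearby
        return hist                                          # one local distribution per cell

    # toy usage: 1000 cells of 5 types scattered over a slide region
    coords = np.random.rand(1000, 2) * 10_000
    types = np.random.randint(0, 5, size=1000)
    print(neighborhood_type_histograms(coords, types, num_types=5).shape)  # (1000, 5)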
A Latent Space Theory for Emergent Abilities in Large Language Models
Jiang, Hui
Languages are not created randomly but rather to communicate information. There is a strong association between languages and their underlying meanings, resulting in a sparse joint distribution that is heavily peaked according to their correlations. Moreover, owing to this sparsity, these peak values happen to match the marginal distribution of languages. With the advent of LLMs trained on big data with large models, we can now precisely assess the marginal distribution of languages, providing a convenient means of exploring the sparse structures in the joint distribution for effective inference. In this paper, we categorize languages as either unambiguous or ε-ambiguous and present quantitative results to demonstrate that the emergent abilities of LLMs, such as language understanding, in-context learning, chain-of-thought prompting, and effective instruction fine-tuning, can all be attributed to Bayesian inference on the sparse joint distribution of languages.
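A toy numerical illustration of this argument (constructed here for exposition, not taken from the paper): when the joint distribution over (text, meaning) pairs is sparse and heavily peaked, the peak value nearly coincides with the marginal probability of the text, and Bayesian inference recovers the intended meaning almost deterministically.

    # Toy sparse, heavily peaked joint distribution p(text, meaning)
    joint = {
        ("the bank raised interest rates", "finance"): 0.30,
        ("the bank raised interest rates", "river"):   0.001,   # near-zero off-peak mass
        ("we walked along the river bank", "river"):   0.25,
        ("we walked along the river bank", "finance"): 0.001,
    }

    text = "the bank raised interest rates"
    p_text = sum(p for (t, _), p in joint.items() if t == text)          # marginal p(text)
    posterior = {m: joint[(text, m)] / p_text for (t, m) in joint if t == text}

    print(round(p_text, 3))   # ~0.301, close to the 0.30 peak of the joint
    print(max(posterior, key=posterior.get), round(max(posterior.values()), 3))
    # -> 'finance' with posterior ~0.997: the meaning is recoverable from the text alone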
FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout
Xue, Jingjing, Liu, Min, Sun, Sheng, Wang, Yuwei, Jiang, Hui, Jiang, Xuefeng
Federated Learning (FL) emerges as a distributed machine learning paradigm without end-user data transmission, effectively avoiding privacy leakage. Participating devices in FL are usually bandwidth-constrained, and the uplink is much slower than the downlink in wireless networks, which causes a severe uplink communication bottleneck. A prominent direction for alleviating this problem is federated dropout, which drops a fraction of the weights of local models. However, existing federated dropout studies focus on random or ordered dropout and lack theoretical support, resulting in unguaranteed performance. In this paper, we propose Federated learning with Bayesian Inference-based Adaptive Dropout (FedBIAD), which regards the weight rows of local models as probability distributions and adaptively drops a subset of weight rows based on importance indicators correlated with the trend of the local training loss. By applying FedBIAD, each client adaptively selects a high-quality dropping pattern with accurate approximations and only transmits the parameters of non-dropped weight rows to mitigate uplink costs while improving accuracy. Theoretical analysis demonstrates that the convergence rate of the average generalization error of FedBIAD is minimax optimal up to a squared logarithmic factor. Extensive experiments on image classification and next-word prediction show that, compared with status quo approaches, FedBIAD provides a 2x uplink reduction with an accuracy increase of up to 2.41%, even on non-Independent and Identically Distributed (non-IID) data, which brings up to a 72% decrease in training time.
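A minimal sketch of the row-wise dropping mechanism (simplified; in FedBIAD the importance indicator is derived from the trend of the local training loss, whereas the score below is only a placeholder, and the function names are mine):

    import torch

    def compress_layer(weight: torch.Tensor, row_scores: torch.Tensor, keep_ratio: float = 0.5):
        num_keep = max(1, int(weight.shape[0] * keep_ratio))
        kept = torch.topk(row_scores, num_keep).indices      # most important rows
        return kept, weight[kept]                            # payload actually uploaded

    def decompress_layer(global_weight: torch.Tensor, kept, rows):
        updated = global_weight.clone()
        updated[kept] = rows                                 # dropped rows fall back to the global model
        return updated

    # toy usage: a 256x128 layer uploaded at half the usual cost
    w_local  = torch.randn(256, 128)
    w_global = torch.randn(256, 128)
    scores   = w_local.abs().mean(dim=1)                     # placeholder importance score
    kept, rows = compress_layer(w_local, scores, keep_ratio=0.5)
    w_server = decompress_layer(w_global, kept, rows)
    print(rows.numel() / w_local.numel())                    # ~0.5 of the parameters sent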
Bandlimiting Neural Networks Against Adversarial Attacks
Lin, Yuping, A., Kasra Ahmadi K., Jiang, Hui
In this paper, we study the adversarial attack and defence problem in deep learning from the perspective of Fourier analysis. We first explicitly compute the Fourier transform of deep ReLU neural networks and show that there exist decaying but non-zero high-frequency components in the Fourier spectrum of neural networks. We demonstrate that the vulnerability of neural networks to adversarial samples can be attributed to these insignificant but non-zero high-frequency components. Based on this analysis, we propose to use a simple post-averaging technique to smooth out these high-frequency components and improve the robustness of neural networks against adversarial attacks. Experimental results on the ImageNet dataset show that our proposed method is universally effective in defending against many existing adversarial attack methods proposed in the literature, including FGSM, PGD, DeepFool and C&W attacks. Our post-averaging method is simple since it does not require any re-training, and meanwhile it can successfully defend against over 95% of the adversarial samples generated by these methods without introducing any significant performance degradation (less than 1%) on the original clean images.
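A minimal sketch of the post-averaging idea (the number of samples and the perturbation radius below are illustrative assumptions, not the paper's exact settings): at test time, the model's outputs are averaged over a few small random perturbations of the input, without any retraining.

    import torch

    @torch.no_grad()
    def post_averaged_predict(model, x, num_samples: int = 16, radius: float = 0.02):
        """x: a batch of images in [0, 1]; returns averaged class probabilities."""
        probs = torch.softmax(model(x), dim=-1)
        for _ in range(num_samples):
            noise = torch.empty_like(x).uniform_(-radius, radius)   # small neighbourhood around x
            probs = probs + torch.softmax(model((x + noise).clamp(0, 1)), dim=-1)
        return probs / (num_samples + 1)

    # usage with any pretrained classifier, e.g. a torchvision ResNet:
    # preds = post_averaged_predict(model, images).argmax(dim=-1)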
Content based News Recommendation via Shortest Entity Distance over Knowledge Graphs
Joseph, Kevin, Jiang, Hui
Content-based news recommendation systems need to recommend news articles based on the topics and content of articles without using user-specific information. Many news articles describe the occurrence of specific events and named entities, including people, places or objects. In this paper, we propose a graph traversal algorithm as well as a novel weighting scheme for cold-start content-based news recommendation utilizing these named entities. Seeking to create a higher degree of user-specific relevance, our algorithm computes the shortest distance between named entities, across news articles, over a large knowledge graph. Moreover, we have created a new human-annotated dataset for evaluating content-based news recommendation systems. Experimental results show that our method is suitable for tackling the hard cold-start problem and produces stronger Pearson correlation with human similarity scores than other cold-start methods. Our method is also complementary to conventional cold-start recommendation methods, and combining them may yield significant performance gains. The dataset, CNRec, is available at: https://github.com/kevinj22/CNRec
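A minimal sketch of the core primitive (the aggregation into an article-level score is my own simplification; the paper's weighting scheme is more involved): compute shortest entity-to-entity distances over the knowledge graph and turn them into a relatedness score between two articles' entity sets.

    from collections import deque

    def shortest_distance(graph, source, target, max_hops=4):
        """graph: dict mapping an entity to the set of entities it links to."""
        if source == target:
            return 0
        seen, frontier = {source}, deque([(source, 0)])
        while frontier:
            node, d = frontier.popleft()
            if d >= max_hops:
                continue
            for nxt in graph.get(node, ()):
                if nxt == target:
                    return d + 1
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
        return max_hops + 1                      # treat unreachable entities as maximally distant

    def article_relatedness(graph, entities_a, entities_b):
        dists = [shortest_distance(graph, a, b) for a in entities_a for b in entities_b]
        return 1.0 / (1.0 + sum(dists) / len(dists))   # shorter average distance -> higher score

    kg = {"Ottawa": {"Canada"}, "Canada": {"Ottawa", "Toronto"}, "Toronto": {"Canada"}}
    print(article_relatedness(kg, {"Ottawa"}, {"Toronto"}))   # entities two hops apart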
A General FOFE-net Framework for Simple and Effective Question Answering over Knowledge Bases
Wu, Dekun, Nosirova, Nana, Jiang, Hui, Xu, Mingbin
Question answering over knowledge bases (KB-QA) has recently become a popular research topic in NLP. One popular way to solve the KB-QA problem is to use a pipeline of several NLP modules, including entity discovery and linking (EDL) and relation detection. Recent successes on the KB-QA task usually involve complex network structures with sophisticated heuristics. Inspired by a previous work that builds a strong KB-QA baseline, we propose a simple but general neural model composed of fixed-size ordinally forgetting encoding (FOFE) and deep neural networks, called FOFE-net, to solve the KB-QA problem at its different stages. For evaluation, we use two popular KB-QA datasets, SimpleQuestions and WebQSP, and a newly created dataset, FreebaseQA. The experimental results show that FOFE-net performs well on the KB-QA subtasks of entity discovery and linking and relation detection, in turn pushing the overall KB-QA system to strong results on all datasets.
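For reference, the FOFE building block underlying FOFE-net can be sketched in a few lines (the decay factor alpha=0.7 is illustrative; FOFE is unique for alpha in (0, 0.5] and almost everywhere unique for alpha in (0.5, 1)):

    import numpy as np

    def fofe_encode(word_ids, vocab_size, alpha=0.7):
        z = np.zeros(vocab_size)
        for w in word_ids:        # z_t = alpha * z_{t-1} + e_{w_t}, with e_{w_t} a one-hot vector
            z = alpha * z
            z[w] += 1.0
        return z                  # fixed-size code of the whole sequence, order preserved

    vocab = {"who": 0, "wrote": 1, "hamlet": 2}
    print(fofe_encode([vocab["who"], vocab["wrote"], vocab["hamlet"]], len(vocab)))
    # earlier words decay (0.49, 0.7) while the most recent word keeps weight 1.0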
Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization
Jiang, Hui
In this paper, we present theoretical work to explain why simple gradient descent methods are so successful in solving non-convex optimization problems when learning large-scale neural networks (NN). After introducing a mathematical tool called the canonical space, we prove that the objective functions in learning NNs are convex in the canonical model space. We further elucidate that the gradients between the original NN model space and the canonical space are related by a pointwise linear transformation, represented by the so-called disparity matrix. Furthermore, we prove that gradient descent methods are guaranteed to converge to a global minimum of zero loss provided that the disparity matrices maintain full rank. If this full-rank condition holds, the learning of NNs behaves in the same way as normal convex optimization. Finally, we show that the chance of encountering singular disparity matrices is extremely slim in large NNs. In particular, when over-parameterized NNs are randomly initialized, gradient descent algorithms converge to a global minimum of zero loss in probability.
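One way to read the gradient relation described above (the notation here is mine and only a sketch of the abstract's claim): write u = φ(θ) for the map from the NN parameters θ into the canonical space, where the loss L̃(u) is convex, and let L(θ) = L̃(φ(θ)). The chain rule then gives the pointwise linear relation

    \nabla_{\theta} L(\theta) \;=\; D(\theta)^{\top}\, \nabla_{u} \tilde{L}(u)\Big|_{u=\phi(\theta)},
    \qquad
    D(\theta) \;=\; \frac{\partial \phi(\theta)}{\partial \theta},

so that whenever the disparity matrix D(θ) keeps full rank along the training trajectory (i.e., D(θ)ᵀv = 0 only for v = 0), a stationary point in θ can only occur where ∇_u L̃ = 0, that is, at a global minimum of the convex canonical objective.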
Fixed-Size Ordinally Forgetting Encoding Based Word Sense Disambiguation
Zhu, Xi, Xu, Mingbin, Jiang, Hui
In this paper, we present our method of using fixed-size ordinally forgetting encoding (FOFE) to solve the word sense disambiguation (WSD) problem. FOFE enables us to encode a variable-length sequence of words into a theoretically unique fixed-size representation that can be fed into a feedforward neural network (FFNN), while keeping the positional information between words. In our method, a FOFE-based FFNN is used to train a pseudo language model over an unlabelled corpus; the pre-trained language model is then capable of abstracting the surrounding context of polyseme instances in a labelled corpus into context embeddings. Next, we use these context embeddings for WSD classification. We conducted experiments on several WSD datasets, which demonstrate that our proposed method can achieve performance comparable to that of the state-of-the-art approach at a much lower computational cost.
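A simplified stand-in (my own construction, not the paper's pipeline) for turning the context around a polysemous word into a fixed-size vector: concatenate the FOFE codes of the left context (read left-to-right) and the right context (read right-to-left). In the paper this role is played by the hidden layer of the FOFE-based pseudo language model pretrained on unlabelled text.

    import numpy as np

    def fofe(word_ids, vocab_size, alpha=0.7):          # same recursion as sketched earlier
        z = np.zeros(vocab_size)
        for w in word_ids:
            z = alpha * z
            z[w] += 1.0
        return z

    def context_embedding(sentence_ids, target_pos, vocab_size, alpha=0.7):
        left  = fofe(sentence_ids[:target_pos], vocab_size, alpha)            # words before the target
        right = fofe(sentence_ids[target_pos + 1:][::-1], vocab_size, alpha)  # words after, reversed
        return np.concatenate([left, right])             # fed to a classifier over senses

    # toy usage: disambiguate "bank" (the last token) in "deposit money at the bank"
    vocab = {"deposit": 0, "money": 1, "at": 2, "the": 3, "bank": 4}
    ids = [0, 1, 2, 3, 4]
    print(context_embedding(ids, target_pos=4, vocab_size=len(vocab)).shape)  # (10,)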
A New Perspective on Machine Learning: How to do Perfect Supervised Learning
Jiang, Hui
In this work, we introduce the concept of bandlimiting into the theory of machine learning because all physical processes are bandlimited by nature, including real-world machine learning tasks. After the bandlimiting constraint is taken into account, our theoretical analysis shows that all practical machine learning tasks are asymptotically solvable in a perfect sense. Furthermore, the key to this solvability relies almost solely on two factors: i) a sufficiently large number of training samples, beyond a threshold determined by a difficulty measurement of the underlying task; ii) a sufficiently complex model that is properly bandlimited. Moreover, for some special cases, we derive new error bounds for perfect learning, which can quantify the difficulty of learning. These case-specific bounds are much tighter than the uniform bounds in conventional learning theory. Our results provide a new perspective to explain the recent successes of large-scale supervised learning using complex models like neural networks.
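As a concrete analogue (not from the paper) of why bandlimiting can make perfect learning possible once enough samples are available: a periodic signal containing only harmonics up to a bandlimit K is exactly determined by any N >= 2K+1 equally spaced samples, so an unseen input can be predicted perfectly from the samples alone.

    import numpy as np

    K, N = 3, 16                                  # bandlimit K, sample count N >= 2K+1
    f = lambda t: 1.0 + np.sin(2*np.pi*t) + 0.5*np.cos(2*np.pi*3*t)   # harmonics <= K

    grid = np.arange(N) / N
    coeffs = np.fft.fft(f(grid)) / N              # Fourier coefficients from the samples only

    def predict(t):                               # evaluate the recovered trigonometric polynomial
        freqs = np.fft.fftfreq(N, d=1.0/N)        # integer frequencies 0, 1, ..., -1
        return np.real(np.sum(coeffs * np.exp(2j*np.pi*freqs*t)))

    t_new = 0.1234                                # an unseen "test" input
    print(abs(predict(t_new) - f(t_new)))         # ~1e-15: exact up to floating point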
DropFilter: A Novel Regularization Method for Learning Convolutional Neural Networks
Pan, Hengyue, Jiang, Hui, Niu, Xin, Dou, Yong
The past few years have witnessed the fast development of different regularization methods for deep learning models such as fully-connected deep neural networks (DNNs) and Convolutional Neural Networks (CNNs). Most of previous methods mainly consider to drop features from input data and hidden layers, such as Dropout, Cutout and DropBlocks. DropConnect select to drop connections between fully-connected layers. By randomly discard some features or connections, the above mentioned methods control the overfitting problem and improve the performance of neural networks. In this paper, we proposed two novel regularization methods, namely DropFilter and DropFilter-PLUS, for the learning of CNNs. Different from the previous methods, DropFilter and DropFilter-PLUS selects to modify the convolution filters. For DropFilter-PLUS, we find a suitable way to accelerate the learning process based on theoretical analysis. Experimental results on MNIST show that using DropFilter and DropFilter-PLUS may improve performance on image classification tasks.