
 Wang, Di


GARI: Graph Attention for Relative Isomorphism of Arabic Word Embeddings

arXiv.org Artificial Intelligence

Bilingual Lexical Induction (BLI) is a core challenge in NLP; it relies on the relative isomorphism of individual embedding spaces. Existing attempts at controlling the relative isomorphism of different embedding spaces fail to incorporate the impact of semantically related words in the model training objective. To address this, we propose GARI, which combines distributional training objectives with multiple isomorphism losses guided by a graph attention network. GARI accounts for the semantic variations of words when defining the relative isomorphism of the embedding spaces. Experimental evaluation on an Arabic-language dataset shows that GARI outperforms existing work, improving the average P@1 by a relative score of up to 40.95% and 76.80% in the in-domain and domain-mismatch settings, respectively. We release the code for GARI at https://github.com/asif6827/GARI.
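
As background for how a graph attention network aggregates information from semantically related words, the following is a minimal numpy sketch of a single graph-attention layer in the style of Velickovic et al.; the shapes, the LeakyReLU slope, and the tanh output are illustrative assumptions, not GARI's implementation.

    import numpy as np

    def graph_attention_layer(X, A, W, a):
        """One graph-attention layer over word embeddings.
        X: (n, d) node embeddings, A: (n, n) adjacency of semantically related words,
        W: (d, h) projection, a: (2h,) attention vector. Returns (n, h) aggregated features."""
        H = X @ W                                              # project embeddings
        n = H.shape[0]
        logits = np.empty((n, n))
        for i in range(n):                                     # e_ij = LeakyReLU(a^T [h_i || h_j])
            for j in range(n):
                logits[i, j] = np.concatenate([H[i], H[j]]) @ a
        logits = np.where(logits > 0, logits, 0.2 * logits)    # LeakyReLU
        logits = np.where(A > 0, logits, -1e9)                 # attend only to related words
        alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
        alpha /= alpha.sum(axis=1, keepdims=True)              # softmax over each word's neighbors
        return np.tanh(alpha @ H)                              # attention-weighted aggregation

    # Example: 5 words, 16-dim embeddings, 8-dim output, random related-word graph with self-loops
    rng = np.random.default_rng(0)
    X, A = rng.normal(size=(5, 16)), (rng.random((5, 5)) < 0.4).astype(float) + np.eye(5)
    out = graph_attention_layer(X, A, rng.normal(size=(16, 8)), rng.normal(size=(16,)))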


GRI: Graph-based Relative Isomorphism of Word Embedding Spaces

arXiv.org Artificial Intelligence

Automated construction of bilingual dictionaries using monolingual embedding spaces is a core challenge in machine translation. The end performance of these dictionaries depends on the geometric similarity of the individual spaces, i.e., their degree of isomorphism. Existing attempts at controlling the relative isomorphism of different spaces fail to incorporate the impact of semantically related words in the training objective. To address this, we propose GRI, which combines distributional training objectives with attentive graph convolutions to jointly account for the semantically similar words needed to define and compute the relative isomorphism of multiple spaces. Experimental evaluation shows that GRI outperforms existing work, improving the average P@1 by a relative score of up to 63.6%. We release the code for GRI at https://github.com/asif6827/GRI.
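
Since both GARI and GRI revolve around the degree of isomorphism between embedding spaces, it may help to see one common proxy for that quantity from the BLI literature: eigenvector similarity between the Laplacian spectra of the spaces' nearest-neighbor graphs. The sketch below is a simplified illustration of that proxy, not the isomorphism losses used in GRI's training objective.

    import numpy as np

    def eigenvector_similarity(X, Y, k=10):
        """Isomorphism proxy: compare Laplacian spectra of the k-NN graphs of two embedding
        spaces over the same vocabulary. Smaller values = more nearly isomorphic spaces."""
        def laplacian_spectrum(Z):
            norms = np.linalg.norm(Z, axis=1, keepdims=True)
            sims = (Z @ Z.T) / (norms * norms.T)                # cosine similarities
            A = np.zeros_like(sims)
            for i, js in enumerate(np.argsort(-sims, axis=1)[:, 1:k + 1]):
                A[i, js] = 1.0                                  # connect each word to its k neighbors
            A = np.maximum(A, A.T)                              # symmetrize the graph
            L = np.diag(A.sum(axis=1)) - A                      # unnormalized graph Laplacian
            return np.sort(np.linalg.eigvalsh(L))
        s1, s2 = laplacian_spectrum(X), laplacian_spectrum(Y)
        return float(np.sum((s1 - s2) ** 2))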


Reinforcement Learning-based Non-Autoregressive Solver for Traveling Salesman Problems

arXiv.org Artificial Intelligence

The Traveling Salesman Problem (TSP) is a well-known combinatorial optimization problem with broad real-world applications. Recently, neural networks have gained popularity in this research area because they provide strong heuristic solutions to TSPs. Compared to autoregressive neural approaches, non-autoregressive (NAR) networks exploit inference parallelism to achieve higher inference speed, but suffer from comparatively low solution quality. In this paper, we propose a novel NAR model named NAR4TSP, which incorporates a specially designed architecture and an enhanced reinforcement learning strategy. To the best of our knowledge, NAR4TSP is the first TSP solver to successfully combine RL and NAR networks. The key lies in incorporating the decoding of the NAR network's output into the training process: NAR4TSP efficiently represents the encoded TSP information as a reward and seamlessly integrates it into the reinforcement learning strategy, while maintaining consistent TSP sequence constraints during both training and testing. Experimental results on both synthetic and real-world TSP instances demonstrate that NAR4TSP outperforms four state-of-the-art models in terms of solution quality, inference speed, and generalization to unseen scenarios.
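
To make the non-autoregressive idea concrete: such a solver scores all edges in one forward pass, and a separate decoding step turns the scores into a feasible tour. The greedy decoder below is a generic illustration over an assumed (n, n) edge-score matrix, not NAR4TSP's actual decoding procedure.

    import numpy as np

    def greedy_decode(scores, start=0):
        """Decode an (n, n) edge-score matrix from a NAR network into a valid TSP tour by
        repeatedly moving to the highest-scoring unvisited city (each city visited once)."""
        n = scores.shape[0]
        tour, visited = [start], {start}
        for _ in range(n - 1):
            masked = np.where([j in visited for j in range(n)], -np.inf, scores[tour[-1]])
            nxt = int(np.argmax(masked))
            tour.append(nxt)
            visited.add(nxt)
        return tour  # the closing edge tour[-1] -> tour[0] completes the cycle

    # Example: decode random scores for 6 cities
    print(greedy_decode(np.random.default_rng(1).random((6, 6))))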


Worst-Case Analysis is Maximum-A-Posteriori Estimation

arXiv.org Artificial Intelligence

The worst-case resource usage of a program can provide useful information for many software-engineering tasks, such as performance optimization and algorithmic-complexity-vulnerability discovery. This paper presents a generic, adaptive, and sound fuzzing framework, called DSE-SMC, for estimating worst-case resource usage. DSE-SMC is generic because it is black-box as long as the user provides an interface for retrieving resource-usage information on a given input; adaptive because it automatically balances between exploration and exploitation of candidate inputs; and sound because it is guaranteed to converge to the true resource-usage distribution of the analyzed program. DSE-SMC is built upon a key observation: resource accumulation in a program is isomorphic to the soft-conditioning mechanism in Bayesian probabilistic programming; thus, worst-case resource analysis is isomorphic to the maximum-a-posteriori-estimation problem of Bayesian statistics. DSE-SMC combines sequential Monte Carlo (SMC) -- a generic framework for Bayesian inference -- with adaptive evolutionary fuzzing algorithms in a sound manner, i.e., it asymptotically converges to the posterior distribution induced by the resource-usage behavior of the analyzed program. Experimental evaluation on Java applications demonstrates that DSE-SMC is significantly more effective than existing black-box fuzzing methods for worst-case analysis.
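
A small worked example of the key observation, under the simplifying assumption of a uniform prior over program inputs $x$: if the program issues a soft-conditioning statement that multiplies the unnormalized density by $e^{c}$ whenever it consumes $c$ units of the resource, then

\[
p(x \mid \text{scores}) \;\propto\; p(x)\, e^{\mathrm{cost}(x)},
\qquad
x^{\star} \;=\; \arg\max_{x}\, p(x)\, e^{\mathrm{cost}(x)} \;=\; \arg\max_{x}\, \mathrm{cost}(x),
\]

so the worst-case input coincides with the maximum-a-posteriori point of the soft-conditioned posterior, and SMC machinery for posterior inference can be repurposed to search for it.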


Differentially Private Non-convex Learning for Multi-layer Neural Networks

arXiv.org Machine Learning

This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node. In the first part, we examine the case with no hidden nodes, specifically focusing on Generalized Linear Models (GLMs). We investigate the well-specified model, where the random noise has zero mean and the link function is both bounded and Lipschitz continuous. We propose several algorithms, and our analysis demonstrates the feasibility of achieving an excess population risk that remains invariant to the data dimension. We also consider the ReLU link function, and our findings mirror those for the bounded link function. We conclude this part by contrasting well-specified and misspecified models, using ReLU regression as a representative example. In the second part of the paper, we extend these ideas to two-layer neural networks with sigmoid or ReLU activation functions in the well-specified model. In the third part, we study the theoretical guarantees of DP-SGD in Abadi et al. (2016) for fully connected multi-layer neural networks. Utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk bound for the regime where both the sample size and the network width are sufficiently large. Additionally, we discuss, both theoretically and empirically, how several DP-SGD parameters affect utility.
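
For reference, the DP-SGD procedure of Abadi et al. (2016) analyzed in the third part clips every per-sample gradient to norm $C$ and adds Gaussian noise before averaging. A minimal numpy sketch of one update step follows; the learning rate, clipping norm, and noise multiplier are placeholder values.

    import numpy as np

    def dp_sgd_step(theta, per_sample_grads, lr=0.1, clip_norm=1.0, noise_mult=1.0, rng=None):
        """One DP-SGD update: clip each per-sample gradient to norm C, add Gaussian noise
        with standard deviation noise_mult * C, then average and take a gradient step."""
        rng = rng or np.random.default_rng()
        clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12)) for g in per_sample_grads]
        noise = rng.normal(0.0, noise_mult * clip_norm, size=theta.shape)
        noisy_mean = (np.sum(clipped, axis=0) + noise) / len(per_sample_grads)
        return theta - lr * noisy_mean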


Improved Analysis of Sparse Linear Regression in Local Differential Privacy Model

arXiv.org Artificial Intelligence

In this paper, we revisit the problem of sparse linear regression in the local differential privacy (LDP) model. Existing research in the non-interactive and sequentially interactive local models has focused on obtaining lower bounds for the case where the underlying parameter is $1$-sparse, and extending such bounds to the more general $k$-sparse case has proven challenging. Moreover, it is unclear whether efficient non-interactive LDP (NLDP) algorithms exist. To address these issues, we first consider the problem in the $\epsilon$ non-interactive LDP model and provide a lower bound of $\Omega(\frac{\sqrt{dk\log d}}{\sqrt{n}\epsilon})$ on the $\ell_2$-norm estimation error for sub-Gaussian data, where $n$ is the sample size and $d$ is the dimension of the space. We then propose an NLDP algorithm, the first of its kind for this problem; as a by-product, it also yields a novel and highly efficient estimator. Our algorithm achieves an upper bound of $\tilde{O}(\frac{d\sqrt{k}}{\sqrt{n}\epsilon})$ on the estimation error when the data is sub-Gaussian, which can be further improved by a factor of $O(\sqrt{d})$ if the server has additional public but unlabeled data. For the sequentially interactive LDP model, we show a similar lower bound of $\Omega(\frac{\sqrt{dk}}{\sqrt{n}\epsilon})$. As for the upper bound, we rectify a previous method and show that it is possible to achieve a bound of $\tilde{O}(\frac{k\sqrt{d}}{\sqrt{n}\epsilon})$. Our findings reveal fundamental differences between the non-private case, the central DP model, and the local DP model in the sparse linear regression problem.
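
To make the non-interactive LDP setting concrete, here is a generic one-round private estimator (a sketch for intuition, not the paper's algorithm): each user privatizes the local statistic $x_i y_i$ with the Gaussian mechanism, and the server averages the reports and hard-thresholds to a $k$-sparse vector.

    import numpy as np

    def nldp_sparse_regression(X, y, k, eps, delta=1e-5, bound=1.0, rng=None):
        """Generic one-round (non-interactive) LDP sketch for k-sparse linear regression.
        Assumes each local statistic satisfies ||x_i * y_i||_2 <= bound; with isotropic
        sub-Gaussian designs E[x*y] equals the true parameter, so the noisy mean plus
        hard-thresholding gives a k-sparse estimate."""
        rng = rng or np.random.default_rng()
        n, d = X.shape
        # Gaussian-mechanism scale; the sensitivity of each report is at most 2 * bound
        sigma = 2.0 * bound * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
        reports = X * y[:, None] + rng.normal(0.0, sigma, size=(n, d))  # each user's private report
        mean = reports.mean(axis=0)                                     # server-side aggregation
        theta = np.zeros(d)
        top_k = np.argsort(-np.abs(mean))[:k]
        theta[top_k] = mean[top_k]                                      # keep only the k largest entries
        return theta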


Theoretical Analysis of Robust Overfitting for Wide DNNs: An NTK Approach

arXiv.org Machine Learning

Adversarial training (AT) is a canonical method for enhancing the robustness of deep neural networks (DNNs). However, recent studies have empirically demonstrated that it suffers from robust overfitting, i.e., prolonged AT can be detrimental to the robustness of DNNs. This paper presents a theoretical explanation of robust overfitting for DNNs. Specifically, we non-trivially extend neural tangent kernel (NTK) theory to AT and prove that an adversarially trained wide DNN can be well approximated by a linearized DNN. Moreover, for squared loss, closed-form AT dynamics for the linearized DNN can be derived, revealing a new AT degeneration phenomenon: long-term AT causes a wide DNN to degenerate to the network obtained without AT, and thus causes robust overfitting. Based on our theoretical results, we further design Adv-NTK, the first AT algorithm for infinite-width DNNs. Experiments on real-world datasets show that Adv-NTK can help infinite-width DNNs attain robustness comparable to that of their finite-width counterparts, which in turn justifies our theoretical findings. The code is available at https://github.com/fshp971/adv-ntk.
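
The linearization at the heart of this analysis is the standard NTK expansion around the initialization $\theta_0$, which the paper shows remains valid when the training inputs are adversarially perturbed:

\[
f_{\mathrm{lin}}(x;\theta) \;=\; f(x;\theta_0) \;+\; \nabla_{\theta} f(x;\theta_0)^{\top}(\theta-\theta_0),
\qquad
\Theta(x,x') \;=\; \nabla_{\theta} f(x;\theta_0)^{\top}\,\nabla_{\theta} f(x';\theta_0),
\]

where $\Theta$ is the neural tangent kernel; because $f_{\mathrm{lin}}$ is linear in $\theta$, squared-loss training dynamics become linear ODEs in function space, which is what makes the closed-form AT dynamics derivable.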


Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks

arXiv.org Machine Learning

Recently, significant progress has been made in understanding the generalization of neural networks (NNs) trained by gradient descent (GD) using the algorithmic stability approach. However, most of the existing research has focused on one-hidden-layer NNs and has not addressed the impact of different network scaling parameters. In this paper, we greatly extend the previous work \cite{lei2022stability,richards2021stability} by conducting a comprehensive stability and generalization analysis of GD for multi-layer NNs. For two-layer NNs, our results are established under general network scaling parameters, relaxing previous conditions. For three-layer NNs, our technical contribution lies in demonstrating a nearly co-coercive property by utilizing a novel induction strategy that thoroughly explores the effects of over-parameterization. As a direct application of our general findings, we derive the excess risk rate of $O(1/\sqrt{n})$ for GD in both two-layer and three-layer NNs. This sheds light on sufficient or necessary conditions for under-parameterized and over-parameterized NNs trained by GD to attain the desired risk rate of $O(1/\sqrt{n})$. Moreover, we demonstrate that as the scaling parameter increases or the network complexity decreases, less over-parameterization is required for GD to achieve the desired error rates. Additionally, under a low-noise condition, we obtain a fast risk rate of $O(1/n)$ for GD in both two-layer and three-layer NNs.
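
For context, the algorithmic-stability route to such bounds rests on the classical fact that uniform stability controls the expected generalization gap (Bousquet and Elisseeff, 2002):

\[
\Bigl|\,\mathbb{E}_{S,A}\bigl[R(A(S)) - R_S(A(S))\bigr]\,\Bigr| \;\le\; \epsilon_{\mathrm{stab}},
\]

where $A(S)$ is the model returned by GD on a sample $S$ of size $n$, $R$ and $R_S$ are the population and empirical risks, and $\epsilon_{\mathrm{stab}}$ bounds the change in loss when one training example is replaced. Bounding $\epsilon_{\mathrm{stab}}$ for GD on two- and three-layer networks under general scaling is the technical core behind the quoted $O(1/\sqrt{n})$ and $O(1/n)$ rates.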


Fake News Detectors are Biased against Texts Generated by Large Language Models

arXiv.org Artificial Intelligence

The spread of fake news has emerged as a critical challenge, undermining trust and posing threats to society. In the era of Large Language Models (LLMs), the capability to generate believable fake content has intensified these concerns. In this study, we present a novel paradigm for evaluating fake news detectors in scenarios involving both human-written and LLM-generated misinformation. Intriguingly, our findings reveal a significant bias in many existing detectors: they are more prone to flagging LLM-generated content as fake news while often misclassifying human-written fake news as genuine. This unexpected bias appears to arise from distinct linguistic patterns inherent to LLM outputs. To address this, we introduce a mitigation strategy that leverages adversarial training with LLM-paraphrased genuine news. The resulting model yields marked improvements in detection accuracy for both human-written and LLM-generated news. To further catalyze research in this domain, we release two comprehensive datasets, \texttt{GossipCop++} and \texttt{PolitiFact++}, which combine human-validated articles with LLM-generated fake and real news.
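
In spirit, the mitigation amounts to augmenting the training set with LLM-paraphrased genuine articles that keep their "real" label, so the detector cannot use LLM-specific style as a proxy for "fake". The helper below is a schematic sketch; the paraphrase function is a placeholder for whichever LLM is used, and this is not the paper's exact pipeline.

    def build_augmented_training_set(real_articles, fake_articles, paraphrase):
        """Pair each article with a label; add LLM-paraphrased genuine news labeled as real,
        so LLM-style phrasing stops being a reliable signal of fakeness."""
        data = [(text, "fake") for text in fake_articles]
        data += [(text, "real") for text in real_articles]
        data += [(paraphrase(text), "real") for text in real_articles]   # adversarial augmentation
        return data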


Adaptive Distributed Kernel Ridge Regression: A Feasible Distributed Learning Scheme for Data Silos

arXiv.org Machine Learning

Data silos, mainly caused by privacy concerns and limited interoperability, significantly constrain collaboration among different organizations with similar data for the same purpose. Distributed learning based on divide-and-conquer provides a promising way to address data silos, but it faces several challenges, including autonomy, privacy guarantees, and the necessity of collaboration. This paper focuses on developing an adaptive distributed kernel ridge regression (AdaDKRR) that takes into account autonomy in parameter selection, privacy through communicating only non-sensitive information, and the necessity of collaboration for performance improvement. We provide both solid theoretical verification and comprehensive experiments for AdaDKRR to demonstrate its feasibility and effectiveness. Theoretically, we prove that under some mild conditions, AdaDKRR performs similarly to running the optimal learning algorithms on the whole data, verifying the necessity of collaboration and showing that no other distributed learning scheme can essentially beat AdaDKRR under the same conditions. Numerically, we test AdaDKRR on both toy simulations and two real-world applications, showing that AdaDKRR is superior to other existing distributed learning schemes. All these results show that AdaDKRR is a feasible scheme for coping with data silos, which is highly desirable in numerous application domains such as intelligent decision-making, price forecasting, and performance prediction for products.
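
For reference, the divide-and-conquer scheme that AdaDKRR builds on fits kernel ridge regression locally in each silo and lets the server average the local predictions. The numpy sketch below uses a Gaussian kernel and fixed hyperparameters; AdaDKRR's adaptive parameter selection and its communication of non-sensitive information are not shown.

    import numpy as np

    def gaussian_kernel(A, B, gamma=1.0):
        return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))

    def local_krr_fit(X, y, lam=1e-2, gamma=1.0):
        """Each silo solves (K + n*lam*I) alpha = y on its own data only."""
        K = gaussian_kernel(X, X, gamma)
        alpha = np.linalg.solve(K + len(X) * lam * np.eye(len(X)), y)
        return X, alpha, gamma

    def dkrr_predict(local_models, X_test):
        """Server-side synthesis: average the silos' local predictions."""
        preds = [gaussian_kernel(X_test, X, g) @ a for X, a, g in local_models]
        return np.mean(preds, axis=0)

    # Example: three silos, each holding 50 points of the same 1-D regression task
    rng = np.random.default_rng(0)
    models = []
    for _ in range(3):
        X = rng.uniform(-3, 3, size=(50, 1))
        models.append(local_krr_fit(X, np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)))
    print(dkrr_predict(models, np.array([[0.0], [1.5]])))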