AITopics

2302.06529

Country:

North America > United States (0.14)
Europe > Spain > Galicia > Madrid (0.04)
Africa > Kenya (0.04)
(5 more...)

Genre:

Research Report > Promising Solution (0.70)
Research Report > New Finding (0.68)
Overview > Innovation (0.60)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

arXiv.org Artificial IntelligenceJul-6-2023

A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond

Xiao, Yisheng, Wu, Lijun, Guo, Junliang, Li, Juntao, Zhang, Min, Qin, Tao, Liu, Tie-yan

Non-autoregressive (NAR) generation, which is first proposed in neural machine translation (NMT) to speed up inference, has attracted much attention in both machine learning and natural language processing communities. While NAR generation can significantly accelerate inference speed for machine translation, the speedup comes at the cost of sacrificed translation accuracy compared to its counterpart, autoregressive (AR) generation. In recent years, many new models and algorithms have been designed/proposed to bridge the accuracy gap between NAR generation and AR generation. In this paper, we conduct a systematic survey with comparisons and discussions of various non-autoregressive translation (NAT) models from different aspects. Specifically, we categorize the efforts of NAT into several groups, including data manipulation, modeling methods, training criterion, decoding algorithms, and the benefit from pre-trained models. Furthermore, we briefly review other applications of NAR models beyond machine translation, such as grammatical error correction, text summarization, text style transfer, dialogue, semantic parsing, automatic speech recognition, and so on. In addition, we also discuss potential directions for future exploration, including releasing the dependency of KD, reasonable training objectives, pre-training for NAR, and wider applications, etc. We hope this survey can help researchers capture the latest progress in NAR generation, inspire the design of advanced NAR models and algorithms, and enable industry practitioners to choose appropriate solutions for their applications. The web page of this survey is at \url{https://github.com/LitterBrother-Xiao/Overview-of-Non-autoregressive-Applications}.

machine learning, natural language, translation, (20 more...)

2204.09269

Country:

Asia > China > Beijing > Beijing (0.04)
North America > Mexico > Puebla (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Genre: Overview (1.00)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJul-6-2023

Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach

Ding, Xiucai, Ma, Rong

With rapid technological advancements in data collection and processing, massive large-scale and high-dimensional data sets are widely available nowadays in diverse research fields such as astronomy, business analytics, human genetics and microbiology. A common feature of these data sets is that their statistical and geometric properties can be well understood via a meaningful low-rank representation of reduced dimensionality. Learning low-dimensional structures from these high-dimensional noisy data is one of the central topics in statistics and data science. Moreover, nonlinear structures have been found predominant and intrinsic in many real-world data sets, which may not be easily captured or preserved in commonly used linear or quasi-linear methods such as principal component analysis (PCA), singular value decomposition (SVD) [53] and multidimensional scaling (MDS) [18]. As a longstanding and well-recognized technique for analyzing data sets with possibly nonlinear structures, kernel methods have been shown effective in various applications ranging from clustering, data visualization to classification and prediction [75, 47, 59]. On the other hand, spectral methods [23], as a fundamental tool for dimension reduction, are oftentimes applied in combination with kernel methods to better capture the underlying low-dimensional nonlinear structure in the data. These approaches are commonly referred to as nonlinear dimension reduction techniques; see Section 1.1 below for a brief overview.

artificial intelligence, machine learning, matrix, (14 more...)

2203.00126

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Overview (0.87)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Pahune, Saurabh, Chandrasekharan, Manoj

Several categories of Large Language Models (LLMs): A Short Survey

Large Language Models(LLMs)have become effective tools for natural language processing and have been used in many different fields. This essay offers a succinct summary of various LLM subcategories. The survey emphasizes recent developments and efforts made for various LLM kinds, including task-based financial LLMs, multilingual language LLMs, biomedical and clinical LLMs, vision language LLMs, and code language models. The survey gives a general summary of the methods, attributes, datasets, transformer models, and comparison metrics applied in each category of LLMs. Furthermore, it highlights unresolved problems in the field of developing chatbots and virtual assistants, such as boosting natural language processing, enhancing chatbot intelligence, and resolving moral and legal dilemmas. The purpose of this study is to provide readers, developers, academics, and users interested in LLM-based chatbots and virtual intelligent assistant technologies with useful information and future directions.

large language model, machine learning, natural language, (17 more...)

doi: 10.22214/ijraset.2023.54677

2307.10188

Country:

North America > Dominican Republic (0.04)
North America > United States > Ohio > Franklin County > Dublin (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(4 more...)

Genre:

Research Report (1.00)
Overview (0.93)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.93)
Banking & Finance (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Hartmann, Carsten, Richter, Lorenz

Transgressing the boundaries: towards a rigorous understanding of deep learning and its (non-)robustness

The recent advances in machine learning in various fields of applications can be largely attributed to the rise of deep learning (DL) methods and architectures. Despite being a key technology behind autonomous cars, image processing, speech recognition, etc., a notorious problem remains the lack of theoretical understanding of DL and related interpretability and (adversarial) robustness issues. Understanding the specifics of DL, as compared to, say, other forms of nonlinear regression methods or statistical learning, is interesting from a mathematical perspective, but at the same time it is of crucial importance in practice: treating neural networks as mere black boxes might be sufficient in certain cases, but many applications require waterproof performance guarantees and a deeper understanding of what could go wrong and why it could go wrong. It is probably fair to say that, despite being mathematically well founded as a method to approximate complicated functions, DL is mostly still more like modern alchemy that is firmly in the hands of engineers and computer scientists. Nevertheless, it is evident that certain specifics of DL that could explain its success in applications demands systematic mathematical approaches. In this work, we review robustness issues of DL and particularly bridge concerns and attempts from approximation theory to statistical learning theory. Further, we review Bayesian Deep Learning as a means for uncertainty quantification and rigorous explainability.

artificial intelligence, machine learning, neural network, (14 more...)

2307.02454

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > New York (0.04)
(6 more...)

Genre:

Overview (0.86)
Research Report (0.63)

Industry:

Transportation (0.74)
Health & Medicine (0.67)
Information Technology (0.67)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Hort, Max, Grishina, Anastasiia, Moonen, Leon

An Exploratory Literature Study on Sharing and Energy Use of Language Models for Source Code

Large language models trained on source code can support a variety of software development tasks, such as code recommendation and program repair. Large amounts of data for training such models benefit the models' performance. However, the size of the data and models results in long training times and high energy consumption. While publishing source code allows for replicability, users need to repeat the expensive training process if models are not shared. The main goal of the study is to investigate if publications that trained language models for software engineering (SE) tasks share source code and trained artifacts. The second goal is to analyze the transparency on training energy usage. We perform a snowballing-based literature search to find publications on language models for source code, and analyze their reusability from a sustainability standpoint. From 494 unique publications, we identified 293 relevant publications that use language models to address code-related tasks. Among them, 27% (79 out of 293) make artifacts available for reuse. This can be in the form of tools or IDE plugins designed for specific tasks or task-agnostic models that can be fine-tuned for a variety of downstream tasks. Moreover, we collect insights on the hardware used for model training, as well as training time, which together determine the energy consumption of the development process. We find that there are deficiencies in the sharing of information and artifacts for current studies on source code models for software engineering tasks, with 40% of the surveyed papers not sharing source code or trained artifacts. We recommend the sharing of source code as well as trained artifacts, to enable sustainable reproducibility. Moreover, comprehensive information on training times and hardware configurations should be shared for transparency on a model's carbon footprint.

artificial intelligence, machine learning, natural language, (16 more...)

2307.02443

Country:

Europe > Norway > Eastern Norway > Oslo (0.04)
North America > United States (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Information Technology > Services (0.67)
Energy > Power Industry (0.67)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Towards Open Federated Learning Platforms: Survey and Vision from Technical and Legal Perspectives

Duan, Moming

In recent years, the barriers to the development of Artificial Intelligence (AI) have been broken down with the rapid progress of ABC technologies in computing: AI, Big Data, and Cloud Computing, as well as the emergence of cost-effective specialized hardware [213] and software [98]. This has led to the world entering the third wave of AI development: Deep Learning [117]. The success of current data-driven AI relies on massive amounts of training data and follows a gather-and-analyze paradigm [233], which confronts with challenges of complying with rigorous data protection regulations such as OECD Privacy Guidelines [217] and General and Data Protection Regulation (GDPR) [223]. So although data-centric AI is currently mainstream paradigm, Federated Learning [132], a novel model-centric distributed collaborative training framework, is gaining popularity in both academia and industry for its advantages in complying with privacy regulations [219]. According to the definitions of IEEE Standard for Federated Machine Learning (FML, aka FL) [205], FL is a framework or system that enables multiple participants to collaboratively build and use machine learning models without disclosing the raw and private data owned by the participants while achieving good performance. For example, a typical workflow of FL systems is that the entity with modeling demand (aka FL server) first deploys the FL services and initializes the model training task, and then distributing this task to participants with training data (aka FL clients) for modeling [13]. Based on this workflow pattern, many FL frameworks have been derived with specialized improvements in communication [111, 161, 240], optimizaiton [107, 129, 133], robustness [44, 124, 198] and privacy [14, 32, 62]. While these fascinating improvements greatly enhance the utility of FL, they all follow a task-based interaction paradigm, in which an FL server dominates the cooperation between FL participants. In this narrow interpretation of FL, the data owner is treated more like a worker than a collaborator and performs training primarily for the benefit of the server's goals.

large language model, machine learning, natural language, (17 more...)

2307.0214

Country:

North America > United States > Virginia (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
Africa (0.04)
(5 more...)

Genre:

Workflow (1.00)
Overview (1.00)
Research Report > Promising Solution (0.34)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.45)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Make A Long Image Short: Adaptive Token Length for Vision Transformers

Zhou, Qiqi, Zhu, Yichen

The vision transformer is a model that breaks down each image into a sequence of tokens with a fixed length and processes them similarly to words in natural language processing. Although increasing the number of tokens typically results in better performance, it also leads to a considerable increase in computational cost. Motivated by the saying "A picture is worth a thousand words," we propose an innovative approach to accelerate the ViT model by shortening long images. Specifically, we introduce a method for adaptively assigning token length for each image at test time to accelerate inference speed. First, we train a Resizable-ViT (ReViT) model capable of processing input with diverse token lengths. Next, we extract token-length labels from ReViT that indicate the minimum number of tokens required to achieve accurate predictions. We then use these labels to train a lightweight Token-Length Assigner (TLA) that allocates the optimal token length for each image during inference. The TLA enables ReViT to process images with the minimum sufficient number of tokens, reducing token numbers in the ViT model and improving inference speed. Our approach is general and compatible with modern vision transformer architectures, significantly reducing computational costs. We verified the effectiveness of our methods on multiple representative ViT models on image classification and action recognition.

artificial intelligence, deep learning, machine learning, (16 more...)

2307.02092

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre:

Research Report (1.00)
Overview (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Laskar, Md Tahmid Rahman, Bari, M Saiful, Rahman, Mizanur, Bhuiyan, Md Amran Hossen, Joty, Shafiq, Huang, Jimmy Xiangji

A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets

The development of large language models (LLMs) such as ChatGPT has brought a lot of attention recently. However, their evaluation in the benchmark academic datasets remains under-explored due to the difficulty of evaluating the generative outputs produced by this model against the ground truth. In this paper, we aim to present a thorough evaluation of ChatGPT's performance on diverse academic datasets, covering tasks like question-answering, text summarization, code generation, commonsense reasoning, mathematical problem-solving, machine translation, bias detection, and ethical considerations. Specifically, we evaluate ChatGPT across 140 tasks and analyze 255K responses it generates in these datasets. This makes our work the largest evaluation of ChatGPT in NLP benchmarks. In short, our study aims to validate the strengths and weaknesses of ChatGPT in various tasks and provide insights for future research using LLMs. We also report a new emergent ability to follow multi-query instructions that we mostly found in ChatGPT and other instruction-tuned models. Our extensive evaluation shows that even though ChatGPT is capable of performing a wide variety of tasks, and may obtain impressive performance in several benchmark datasets, it is still far from achieving the ability to reliably solve many challenging tasks. By providing a thorough assessment of ChatGPT's performance across diverse NLP tasks, this paper sets the stage for a targeted deployment of ChatGPT-like LLMs in real-world applications.

large language model, machine learning, natural language, (17 more...)

2305.18486

Country:

Asia > India (0.14)
North America > United States > Washington > King County > Seattle (0.14)
Europe > United Kingdom > England (0.04)
(20 more...)

Genre:

Overview (1.00)
Personal > Interview (0.46)
Research Report > New Finding (0.46)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Health & Medicine (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJul-4-2023

Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review

Wong, Man Fai, Guo, Shangxin, Hang, Ching Nam, Ho, Siu Wai, Tan, Chee Wei

This paper provides a comprehensive review of the literature concerning the utilization of Natural Language Processing (NLP) techniques, with a particular focus on transformer-based large language models (LLMs) trained using Big Code, within the domain of AI-assisted programming tasks. LLMs, augmented with software naturalness, have played a crucial role in facilitating AI-assisted programming applications, including code generation, code completion, code translation, code refinement, code summarization, defect detection, and clone detection. Notable examples of such applications include the GitHub Copilot powered by OpenAI's Codex and DeepMind AlphaCode. This paper presents an overview of the major LLMs and their applications in downstream tasks related to AI-assisted programming. Furthermore, it explores the challenges and opportunities associated with incorporating NLP techniques with software naturalness in these applications, with a discussion on extending AI-assisted programming capabilities to Apple's Xcode for mobile software development. This paper also presents the challenges of and opportunities for incorporating NLP techniques with software naturalness, empowering developers with advanced coding assistance and streamlining the software development process.

large language model, machine learning, natural language, (16 more...)

doi: 10.3390/e25060888

2307.02503

Country:

Asia > China > Hong Kong (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Berlin (0.04)
(3 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Law > Intellectual Property & Technology Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)