AITopics | Wu, Di

Wu, Di

Switch EMA: A Free Lunch for Better Flatness and Sharpness

Li, Siyuan, Liu, Zicheng, Tian, Juanxi, Wang, Ge, Wang, Zedong, Jin, Weiyang, Wu, Di, Tan, Cheng, Lin, Tao, Liu, Yang, Sun, Baigui, Li, Stan Z.

arXiv.org Artificial Intelligence, Feb-14-2024

From both theoretical and empirical aspects, we demonstrate that SEMA can help DNNs to reach generalization optima that better trade-off between flatness and sharpness. To verify the effectiveness of SEMA, we conduct comparison experiments with discriminative, generative, and regression tasks on vision and language datasets, including image classification, self-supervised learning, object detection and segmentation, image generation, video prediction, attribute regression, and language modeling. Comprehensive results with popular optimizers and networks show that SEMA is a free lunch for DNN training by improving performances and boosting convergence speeds.

The complexity and high-dimensional parameter space of modern DNNs has posed great challenges in optimization, such as gradient vanishing or exploding, overfitting, and degeneration of large batch size (You et al., 2020). To address these obstacles, two branches of research have been conducted: improving optimizers or enhancing optimization by regularization techniques. According to their characteristics in Tab. 1, the improved optimizers (Kingma & Ba, 2014; Loshchilov & Hutter, 2019; Ginsburg et al., 2018; Zhang et al., 2019; Foret et al., 2021) tend to be more expensive and focus on sharpness (deeper optimal) by refining the gradient, while the popular regularizations (Srivastava et al., 2014; Cubuk et al., 2019; Zhang et al., 2018; Izmailov et al., …
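For orientation, the exponential moving average (EMA) named in the title can be sketched as below. This is a minimal illustration over plain Python lists, not the authors' implementation: the function names, the decay value, and the toy weights are all hypothetical, and the `switch_to_ema` step is an assumption inferred from the title, since the text above does not describe the mechanism.

```python
def ema_update(ema_weights, model_weights, decay=0.99):
    """Return the exponential moving average of the weights after one step."""
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_weights, model_weights)]


def switch_to_ema(ema_weights):
    """Hypothetical switch step: copy the averaged weights back into the model."""
    return list(ema_weights)


model = [1.0, -2.0]
ema = list(model)

# Pretend an optimizer moved the model weights over three steps.
for step_weights in ([1.2, -1.8], [1.4, -1.6], [1.6, -1.4]):
    model = step_weights
    ema = ema_update(ema, model, decay=0.5)

# Assumed switch: continue training from the averaged weights.
model = switch_to_ema(ema)
```

The averaged weights lag behind the raw weights, which is the smoothing effect usually associated with weight averaging.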

artificial intelligence, machine learning, natural language, (18 more...)

2402.0924

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.63)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Robust Path Planning via Learning from Demonstrations for Robotic Catheters in Deformable Environments

Li, Zhen, Lambranzi, Chiara, Wu, Di, Segato, Alice, De Marco, Federico, Poorten, Emmanuel Vander, Dankelman, Jenny, De Momi, Elena

arXiv.org Artificial Intelligence, Feb-1-2024

Navigation through tortuous and deformable vessels using catheters with limited steering capability underscores the need for reliable path planning. State-of-the-art path planners do not fully account for the deformable nature of the environment. This work proposes a robust path planner via a learning from demonstrations method, named Curriculum Generative Adversarial Imitation Learning (C-GAIL). This path planning framework takes into account the interaction between steerable catheters and vessel walls and the deformable property of vessels. In-silico comparative experiments show that the proposed network achieves smaller targeting errors and a higher success rate compared to a state-of-the-art approach based on GAIL. The in-vitro validation experiments demonstrate that the path generated by the proposed C-GAIL path planner aligns better with the actual steering capability of the pneumatic artificial muscle-driven catheter utilized in this study. Therefore, the proposed approach can provide enhanced support to the user in navigating the catheter towards the target with greater precision, in contrast to the conventional centerline-following technique. The targeting and tracking errors are 1.26 ± 0.55 mm and 5.18 ± 3.48 mm, respectively. The proposed path planning framework exhibits superior performance in managing uncertainty associated with vessel deformation, thereby resulting in lower tracking errors.

catheter, machine learning, reinforcement learning, (18 more...)

2402.00537

Country:

Europe > Italy (0.15)
Europe > Belgium (0.14)
Europe > Netherlands (0.14)
(2 more...)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Equipment & Supplies (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Graph Attention-based Reinforcement Learning for Trajectory Design and Resource Assignment in Multi-UAV Assisted Communication

Feng, Zikai, Wu, Di, Huang, Mengxing, Yuen, Chau

arXiv.org Artificial Intelligence, Jan-31-2024

In multiple unmanned aerial vehicle (UAV)-assisted downlink communication, it is challenging for UAV base stations (UAV BSs) to realize trajectory design and resource assignment in unknown environments. The cooperation and competition between UAV BSs in the communication network lead to a Markov game problem. Multi-agent reinforcement learning is a significant solution for the above decision-making. However, there are still many common issues, such as the instability of the system and low utilization of historical data, that limit its application. In this paper, a novel graph-attention multi-agent trust region (GA-MATR) reinforcement learning framework is proposed to solve the multi-UAV assisted communication problem. A graph recurrent network is introduced to process and analyze the complex topology of the communication network, so as to extract useful information and patterns from observational information. The attention mechanism provides additional weighting for conveyed information, so that the critic network can accurately evaluate the value of behavior for UAV BSs. This provides more reliable feedback signals and helps the actor network update the strategy more effectively. Ablation simulations indicate that the proposed approach attains improved convergence over the baselines. UAV BSs learn the optimal communication strategies to achieve their maximum cumulative rewards. Additionally, the multi-agent trust region method with monotonic convergence provides an estimated Nash equilibrium for the multi-UAV assisted communication Markov game.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

2401.1788

Country:

Asia > China (0.14)
Asia > Taiwan (0.14)

Genre: Research Report (0.50)

Industry:

Information Technology (0.48)
Telecommunications (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Analysis of Knowledge Tracing performance on synthesised student data

Pagonis, Panagiotis, Hartung, Kai, Wu, Di, Georges, Munir, Gröttrup, Sören

arXiv.org Artificial Intelligence, Jan-30-2024

Knowledge Tracing (KT) aims to predict the future performance of students by tracking the development of their knowledge states. Despite all the recent progress made in this field, the application of KT models in education systems is still restricted from the data perspective: 1) limited access to real-life data due to data protection concerns, 2) lack of diversity in public datasets, 3) noise in benchmark datasets, such as duplicate records. To resolve these problems, we simulated student data with three statistical strategies based on public datasets and tested their performance on two KT baselines. While we observe only minor performance improvement with additional synthetic data, our work shows that using only synthetic data for training can lead to performance similar to that obtained with real data.

artificial intelligence, dataset, machine learning, (16 more...)

2401.16832

Country:

Europe > Germany (0.14)
North America > United States (0.14)

Genre: Research Report (0.50)

Industry:

Education > Educational Setting > Online (0.93)
Education > Educational Technology > Educational Software > Computer Based Training (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
Information Technology > Data Science (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)

A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research

Cao, Sicong, Sun, Xiaobing, Widyasari, Ratnadira, Lo, David, Wu, Xiaoxue, Bo, Lili, Zhang, Jiale, Li, Bin, Liu, Wei, Wu, Di, Chen, Yixin

arXiv.org Artificial Intelligence, Jan-25-2024

The remarkable achievements of Artificial Intelligence (AI) algorithms, particularly in Machine Learning (ML) and Deep Learning (DL), have fueled their extensive deployment across multiple sectors, including Software Engineering (SE). However, due to their black-box nature, these promising AI-driven SE models are still far from being deployed in practice. This lack of explainability poses unwanted risks for their applications in critical tasks, such as vulnerability detection, where decision-making transparency is of paramount importance. This paper endeavors to elucidate this interdisciplinary domain by presenting a systematic literature review of approaches that aim to improve the explainability of AI models within the context of SE. The review canvasses work appearing in the most prominent SE & AI conferences and journals, and spans 63 papers across 21 unique SE tasks. Based on three key Research Questions (RQs), we aim to (1) summarize the SE tasks where XAI techniques have shown success to date; (2) classify and analyze different XAI techniques; and (3) investigate existing evaluation approaches. Based on our findings, we identified a set of challenges remaining to be addressed in existing studies, together with a roadmap highlighting potential opportunities we deemed appropriate and important for future work.

artificial intelligence, explanation, machine learning, (17 more...)

2401.14617

Country:

Asia (0.46)
Oceania > Australia (0.14)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Education (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

How Far Can 100 Samples Go? Unlocking Overall Zero-Shot Multilingual Translation via Tiny Multi-Parallel Data

Wu, Di, Tan, Shaomu, Meng, Yan, Stap, David, Monz, Christof

arXiv.org Artificial Intelligence, Jan-22-2024

Zero-shot translation is an open problem, aiming to translate between language pairs unseen during training in Multilingual Machine Translation (MMT). A common, albeit resource-consuming, solution is to mine as many translation directions as possible to add to the parallel corpus. In this paper, we show that the zero-shot capability of an English-centric model can be easily enhanced by fine-tuning with a very small amount of multi-parallel data. For example, on the EC30 dataset, we show that overall non-English improvements of up to +21.7 ChrF (870 directions) can be achieved by using only 100 multi-parallel samples, while preserving capability in English-centric directions. We further study the size effect of fine-tuning data and its transfer capabilities. Surprisingly, our empirical analysis shows that comparable overall improvements can be achieved even through fine-tuning on a small, randomly sampled direction set (10%). Also, the resulting non-English performance is quite close to the upper bound (complete translation). Due to its high efficiency and practicality, we encourage the community 1) to consider the use of the fine-tuning method as a strong baseline for zero-shot translation and 2) to construct more comprehensive and high-quality multi-parallel data to cover real-world demand.
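The figure of 870 directions is consistent with counting ordered source-target pairs among 30 non-English languages. That language count is an assumption suggested by the dataset name EC30 and is not stated in the abstract; the language codes below are hypothetical placeholders. A short sketch of the arithmetic:

```python
from itertools import permutations


def count_directions(languages):
    """Count ordered (source, target) pairs with distinct languages."""
    return sum(1 for _ in permutations(languages, 2))


# Hypothetical placeholder codes for 30 non-English languages.
languages = [f"lang{i:02d}" for i in range(30)]
n_directions = count_directions(languages)  # 30 * 29 ordered pairs
```

Under the same assumption, a 10% direction set would correspond to 87 of these ordered pairs.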

artificial intelligence, large language model, natural language, (17 more...)

2401.12413

Country:

Europe (1.00)
Asia > Middle East > UAE (0.13)
Asia > Middle East > Republic of Türkiye (0.13)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.88)

Beyond Shared Vocabulary: Increasing Representational Word Similarities across Languages for Multilingual Machine Translation

Wu, Di, Monz, Christof

arXiv.org Artificial Intelligence, Jan-20-2024

Using a vocabulary that is shared across languages is common practice in Multilingual Neural Machine Translation (MNMT). In addition to its simple design, shared tokens play an important role in positive knowledge transfer, assuming that shared tokens refer to similar meanings across languages. However, when word overlap is small, especially due to different writing systems, transfer is inhibited. In this paper, we define word-level information transfer pathways via word equivalence classes and rely on graph networks to fuse word embeddings across languages. Our experiments demonstrate the advantages of our approach: 1) embeddings of words with similar meanings are better aligned across languages, 2) our method achieves consistent BLEU improvements of up to 2.3 points for high- and low-resource MNMT, and 3) less than 1.0% additional trainable parameters are required with a limited increase in computational costs, while inference time remains identical to the baseline. We release the codebase to the community.
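The idea of fusing embeddings along word equivalence classes can be illustrated with a toy sketch. This is not the authors' graph network: a single mean-aggregation step stands in for it, and the function name, the example words, the two-dimensional vectors, and the 50/50 mixing weight are all hypothetical choices made for illustration.

```python
def fuse_embeddings(embeddings, equivalence_classes):
    """Pull each word's vector halfway toward the mean of its equivalence class.

    A single averaging step stands in for a learned graph network here.
    """
    fused = {word: list(vec) for word, vec in embeddings.items()}
    for word_class in equivalence_classes:
        members = [w for w in word_class if w in embeddings]
        if not members:
            continue
        dim = len(embeddings[members[0]])
        class_mean = [sum(embeddings[w][d] for w in members) / len(members)
                      for d in range(dim)]
        for w in members:
            fused[w] = [(own + mean) / 2.0
                        for own, mean in zip(embeddings[w], class_mean)]
    return fused


# Toy vocabulary: "dog" and "Hund" are treated as equivalent; "cat" has no class.
embeddings = {"dog": [1.0, 0.0], "Hund": [0.0, 1.0], "cat": [2.0, 2.0]}
fused = fuse_embeddings(embeddings, [["dog", "Hund"]])
```

After fusion the two equivalent words sit closer together, while words outside any class keep their original vectors.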

artificial intelligence, machine learning, natural language, (18 more...)

2305.14189

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Hallucination Detection and Hallucination Mitigation: An Investigation

Luo, Junliang, Li, Tianyu, Wu, Di, Jenkin, Michael, Liu, Steve, Dudek, Gregory

arXiv.org Artificial Intelligence, Jan-16-2024

Large language models (LLMs), including ChatGPT, Bard, and Llama, have achieved remarkable successes over the last two years in a range of different applications. In spite of these successes, there exist concerns that limit the wide application of LLMs. A key problem is hallucination: in addition to correct responses, LLMs can also generate seemingly correct but factually incorrect responses. This report aims to present a comprehensive review of the current literature on both hallucination detection and hallucination mitigation. We hope that this report can serve as a good reference for both engineers and researchers who are interested in LLMs and in applying them to real-world tasks.

large language model, machine learning, natural language, (20 more...)

2401.08358

Country:

Europe (1.00)
North America > Canada > Quebec (0.14)
North America > United States > Texas (0.14)
North America > United States > Michigan (0.14)

Genre:

Overview (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Masked Modeling for Self-supervised Representation Learning on Vision and Beyond

Li, Siyuan, Zhang, Luyuan, Wang, Zedong, Wu, Di, Wu, Lirong, Liu, Zicheng, Xia, Jun, Tan, Cheng, Liu, Yang, Sun, Baigui, Li, Stan Z.

arXiv.org Artificial Intelligence, Jan-9-2024

As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and the low dependence on labeled data. Among these varied self-supervised techniques, masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training. This paradigm enables deep models to learn robust representations and has demonstrated exceptional performance in the context of computer vision, natural language processing, and other modalities. In this survey, we present a comprehensive review of the masked modeling framework and its methodology. We elaborate on the details of techniques within masked modeling, including diverse masking strategies, recovering targets, network architectures, and more. Then, we systematically investigate its wide-ranging applications across domains. Furthermore, we also explore the commonalities and differences between masked modeling methods in different fields. Toward the end of this paper, we conclude by discussing the limitations of current techniques and point out several potential avenues for advancing masked modeling research. A paper list project with this survey is available at https://github.com/Lupin1998/Awesome-MIM.
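The core data-preparation step of masked modeling, hiding a fixed proportion of the input and keeping the hidden parts as prediction targets, can be sketched as follows. This is a minimal illustration with uniform random masking over a token list; the function name, the `[MASK]` placeholder, the mask ratio, and the seed are hypothetical choices and do not correspond to any specific method in the survey.

```python
import random


def mask_tokens(tokens, mask_ratio=0.5, mask_token="[MASK]", seed=0):
    """Mask a fixed proportion of positions; return corrupted input and targets."""
    rng = random.Random(seed)
    n_mask = round(len(tokens) * mask_ratio)
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))
    corrupted = [mask_token if i in masked_positions else tok
                 for i, tok in enumerate(tokens)]
    # The model would be trained to predict these original values.
    targets = {i: tokens[i] for i in sorted(masked_positions)}
    return corrupted, targets


tokens = ["the", "cat", "sat", "on", "the", "mat", "last", "night"]
corrupted, targets = mask_tokens(tokens, mask_ratio=0.5)
```

For images the same pattern applies with patches in place of tokens; only the unit being masked changes.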

large language model, machine learning, natural language, (20 more...)

2401.00897

Country: Asia > China > Zhejiang Province (0.14)

Genre: Overview (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Split-and-Privatize Framework for Large Language Model Fine-Tuning

Shen, Xicong, Liu, Yang, Liu, Huiqi, Hong, Jue, Duan, Bing, Huang, Zirui, Mao, Yunlong, Wu, Ye, Wu, Di

arXiv.org Artificial Intelligence, Dec-24-2023

Fine-tuning is a prominent technique to adapt a pre-trained language model to downstream scenarios. In parameter-efficient fine-tuning, only a small subset of modules is trained on the downstream datasets, while the rest of the pre-trained model is left frozen to save computation resources. In recent years, Model-as-a-Service (MaaS) has arisen as a popular form of productization, in which vendors provide abundant pre-trained language models, server resources and core functions, and customers can fine-tune, deploy and invoke their customized model by accessing the one-stop MaaS with their own private dataset. In this paper, we identify the model and data privacy leakage risks in MaaS fine-tuning, and propose a Split-and-Privatize (SAP) framework, which manages to mitigate the privacy issues by adapting the existing split learning architecture. The proposed SAP framework is thoroughly investigated by experiments, and the results indicate that it can enhance the empirical privacy by 62% at the cost of 1% model performance degradation on the Stanford Sentiment Treebank dataset.
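The parameter-efficient fine-tuning setup described in the abstract, where only a small subset of modules is updated and the rest stays frozen, can be sketched with plain Python dictionaries. This is a toy illustration and not the SAP framework: the parameter names, the `adapter` keyword used to select trainable modules, the gradients, and the learning rate are all hypothetical.

```python
def select_trainable(param_names, trainable_keywords=("adapter",)):
    """Flag a parameter as trainable if its name contains any keyword."""
    return {name: any(key in name for key in trainable_keywords)
            for name in param_names}


def sgd_step(params, grads, trainable, lr=0.1):
    """Update only the trainable parameters; frozen ones are returned unchanged."""
    return {name: (value - lr * grads[name] if trainable[name] else value)
            for name, value in params.items()}


# Toy scalar "parameters" standing in for weight tensors.
params = {
    "encoder.layer0.weight": 1.0,
    "encoder.layer0.adapter.weight": 0.5,
}
grads = {name: 1.0 for name in params}

trainable = select_trainable(params)
updated = sgd_step(params, grads, trainable, lr=0.1)
```

Only the adapter entry moves after the step, which is what keeps the computation and the number of stored gradients small.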

large language model, machine learning, natural language, (21 more...)

2312.15603

Country: North America > United States (0.47)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.85)
