Sun, Yu
Only Pick Once -- Multi-Object Picking Algorithms for Picking Exact Number of Objects Efficiently
Ye, Zihe, Sun, Yu
Abstract--Picking up multiple objects at once is a grasping skill that makes a human worker efficient in many domains. This paper presents a system that picks a requested number of objects by picking only once (OPO). The proposed Only-Pick-Once System (OPOS) contains several graph-based algorithms that convert the layout of objects into a graph, cluster the nodes of the graph, and rank and select candidate clusters based on their topology. OPOS also has a multi-object picking predictor based on a convolutional neural network that estimates how many objects would be picked up with a given gripper location and orientation. This paper presents four evaluation metrics and three protocols to evaluate the proposed OPOS. The results show OPOS has very high success rates for two and three objects when picking only once, and that using OPOS can outperform single-object picking by two to three times in terms of efficiency. The results also show OPOS can generalize to objects of unseen sizes and shapes.
Figure 1: Example scenes of batch picking for four shapes: cube, cylinder, cuboid, hexagon.
I. INTRODUCTION -- In warehouses, workers usually perform batch picking, also called multi-order picking, to improve efficiency. For instance, a worker could be instructed to pick four boxes of toothpaste or three jars of a cosmetic product from a bin. ... investigations on the mechanism of holding multiple objects ... Nevertheless, none of these works studied how to pick multiple ...
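The graph-based candidate selection summarized in the abstract can be illustrated with a rough sketch: object centroids become nodes, nearby objects are connected, connected clusters are extracted, and clusters large enough for the requested count are ranked by compactness. This is a minimal Python illustration, not the paper's implementation; the proximity radius, the compactness ranking, and all function names are assumptions made for illustration only.

    # Illustrative sketch of graph-based candidate selection (not the OPOS code):
    # centroids -> proximity graph -> clusters -> rank clusters that can yield
    # at least the requested number of objects. Radius and ranking are assumptions.
    from itertools import combinations
    from math import dist

    def build_proximity_graph(centroids, radius=0.05):
        """Connect objects whose centroids lie within `radius` of each other."""
        edges = {i: set() for i in range(len(centroids))}
        for i, j in combinations(range(len(centroids)), 2):
            if dist(centroids[i], centroids[j]) <= radius:
                edges[i].add(j)
                edges[j].add(i)
        return edges

    def connected_clusters(edges):
        """Group nodes into clusters (connected components) via BFS."""
        seen, clusters = set(), []
        for start in edges:
            if start in seen:
                continue
            stack, comp = [start], set()
            while stack:
                n = stack.pop()
                if n in comp:
                    continue
                comp.add(n)
                stack.extend(edges[n] - comp)
            seen |= comp
            clusters.append(comp)
        return clusters

    def rank_candidates(clusters, centroids, requested):
        """Keep clusters with at least `requested` objects, tighter clusters first."""
        def spread(c):
            pts = [centroids[i] for i in c]
            return max(dist(a, b) for a, b in combinations(pts, 2)) if len(pts) > 1 else 0.0
        feasible = [c for c in clusters if len(c) >= requested]
        return sorted(feasible, key=spread)

    centroids = [(0.10, 0.20), (0.12, 0.21), (0.11, 0.24), (0.40, 0.40)]
    graph = build_proximity_graph(centroids)
    print(rank_candidates(connected_clusters(graph), centroids, requested=2))  # [{0, 1, 2}]

In the actual system, a learned multi-object picking predictor would then estimate how many objects a given gripper pose over the top-ranked cluster would pick up.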
Test-Time Training on Nearest Neighbors for Large Language Models
Hardt, Moritz, Sun, Yu
Many recent efforts aim to augment language models with relevant information retrieved from a database at test time. We avoid the need for prompt engineering by directly fine-tuning the model on data retrieved at test time using its standard training setup. For this purpose, we build a large-scale distributed nearest neighbor index based on text embeddings of the Pile dataset. Given a query to a language model, our system retrieves the neighbors of the query and fine-tunes the model on the text data corresponding to those neighbors. Surprisingly, retrieving and training on as few as 20 neighbors, each for only one gradient iteration, drastically improves performance across more than twenty language modeling tasks in the Pile benchmark. For example, test-time training significantly narrows the performance gap between a small GPT2 model and a GPTNeo model, more than ten times larger, that was specifically trained to convergence on the Pile. Sufficient index quality and size, however, are important. Our work establishes a valuable first baseline for implementing test-time training in the context of large language models, opening the door to numerous promising research avenues.
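A minimal sketch of the test-time-training loop described above, assuming a GPT-2 checkpoint from Hugging Face Transformers and a placeholder retrieve_neighbors function that stands in for the distributed nearest-neighbor index over the Pile; hyperparameters are illustrative, not the paper's configuration.

    # Hedged sketch: retrieve k neighbors of the query, take one gradient step on
    # each with the standard language-modeling loss, then answer the query.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def retrieve_neighbors(query: str, k: int = 20) -> list[str]:
        """Placeholder: return the k texts nearest to `query` in embedding space."""
        raise NotImplementedError("plug in your own text-embedding index here")

    def test_time_train(model, tokenizer, query, k=20, lr=2e-5):
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        model.train()
        for neighbor_text in retrieve_neighbors(query, k):
            batch = tokenizer(neighbor_text, return_tensors="pt",
                              truncation=True, max_length=1024)
            loss = model(**batch, labels=batch["input_ids"]).loss  # standard LM loss
            loss.backward()            # one gradient iteration per retrieved neighbor
            optimizer.step()
            optimizer.zero_grad()
        model.eval()
        return model

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    # model = test_time_train(model, tokenizer, "an example test prompt")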
Spectral Heterogeneous Graph Convolutions via Positive Noncommutative Polynomials
He, Mingguo, Wei, Zhewei, Feng, Shikun, Huang, Zhengjie, Li, Weibin, Sun, Yu, Yu, Dianhai
Heterogeneous Graph Neural Networks (HGNNs) have gained significant popularity in various heterogeneous graph learning tasks. However, most HGNNs rely on spatial domain-based message passing and attention modules for information propagation and aggregation. These spatial-based HGNNs neglect the utilization of spectral graph convolutions, which are the foundation of Graph Convolutional Networks (GCN) on homogeneous graphs. Inspired by the effectiveness and scalability of spectral-based GNNs on homogeneous graphs, this paper explores the extension of spectral-based GNNs to heterogeneous graphs. We propose PSHGCN, a novel heterogeneous convolutional network based on positive noncommutative polynomials. PSHGCN provides a simple yet effective approach for learning spectral graph convolutions on heterogeneous graphs. Moreover, we demonstrate the rationale of PSHGCN in graph optimization. We conducted an extensive experimental study to show that PSHGCN can learn diverse spectral heterogeneous graph convolutions and achieve superior performance in node classification tasks. Our code is available at https://github.com/ivam-he/PSHGCN.
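One standard way to guarantee that a learned noncommutative polynomial filter over relation matrices is positive semidefinite is a sum-of-squares construction, which conveys the spirit of the "positive noncommutative polynomials" in the abstract; the notation below is illustrative rather than the paper's exact formulation.

    % Illustrative sum-of-squares form over relation matrices A_1, ..., A_R:
    h(A_1, \dots, A_R) = \sum_{|w| \le K} \theta_w \, A_{w_1} A_{w_2} \cdots A_{w_{|w|}},
    \qquad
    g(A_1, \dots, A_R) = h(A_1, \dots, A_R)^{\top} \, h(A_1, \dots, A_R) \succeq 0,

where the sum runs over words $w$ on the relation indices $\{1, \dots, R\}$ of length at most $K$, and $g \succeq 0$ holds for any real coefficients $\theta_w$ because $M^{\top} M$ is positive semidefinite for every real matrix $M$.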
ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
Chai, Yekun, Wang, Shuohuan, Pang, Chao, Sun, Yu, Tian, Hao, Wu, Hua
Software engineers working with the same programming language (PL) may speak different natural languages (NLs) and vice versa, erecting huge barriers to communication and working efficiency. Recent studies have demonstrated the effectiveness of generative pre-training in computer programs, yet they are always English-centric. In this work, we step towards bridging the gap between multilingual NLs and multilingual PLs for large language models (LLMs). We release ERNIE-Code, a unified pre-trained language model for 116 NLs and 6 PLs. We employ two methods for universal cross-lingual pre-training: span-corruption language modeling that learns patterns from monolingual NL or PL; and pivot-based translation language modeling that relies on parallel data of many NLs and PLs. Extensive results show that ERNIE-Code outperforms previous multilingual LLMs for PL or NL across a wide range of end tasks of code intelligence, including multilingual code-to-text, text-to-code, code-to-code, and text-to-text generation. We further show its advantage of zero-shot prompting on multilingual code summarization and text-to-text translation. We release our code and pre-trained checkpoints.
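The span-corruption objective mentioned above can be illustrated with a small T5-style sketch: random spans of the (NL or PL) token sequence are replaced by sentinel tokens in the source, and the target lists each sentinel followed by the dropped span. The masking rate, span length, and sentinel names below are assumptions, not the exact ERNIE-Code recipe.

    # Hedged sketch of span-corruption denoising on a code snippet.
    import random

    def span_corrupt(tokens, corrupt_rate=0.3, span_len=3, seed=0):
        """Replace random spans with sentinels; the target lists the dropped spans."""
        rng = random.Random(seed)
        budget = max(1, int(len(tokens) * corrupt_rate))   # total tokens to corrupt
        source, target, i, sent = [], [], 0, 0
        while i < len(tokens):
            if budget > 0 and rng.random() < corrupt_rate:
                span = min(span_len, budget, len(tokens) - i)
                sentinel = f"<extra_id_{sent}>"
                source.append(sentinel)                    # span replaced by one sentinel
                target.append(sentinel)
                target.extend(tokens[i:i + span])          # model must reconstruct the span
                i += span
                budget -= span
                sent += 1
            else:
                source.append(tokens[i])
                i += 1
        return source, target

    src, tgt = span_corrupt("def add ( a , b ) : return a + b".split())
    print(src)   # e.g. ['def', 'add', '(', '<extra_id_0>', ')', ':', 'return', 'a', '+', 'b']
    print(tgt)   # e.g. ['<extra_id_0>', 'a', ',', 'b']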
ERNIE-ViLG 2.0: Improving Text-to-Image Diffusion Model with Knowledge-Enhanced Mixture-of-Denoising-Experts
Feng, Zhida, Zhang, Zhenyu, Yu, Xintong, Fang, Yewei, Li, Lanxin, Chen, Xuyi, Lu, Yuxiang, Liu, Jiaxiang, Yin, Weichong, Feng, Shikun, Sun, Yu, Chen, Li, Tian, Hao, Wu, Hua, Wang, Haifeng
Recent progress in diffusion models has revolutionized the popular technology of text-to-image generation. While existing approaches could produce photorealistic high-resolution images with text conditions, there are still several open problems to be solved, which limits the further improvement of image fidelity and text relevancy. In this paper, we propose ERNIE-ViLG 2.0, a large-scale Chinese text-to-image diffusion model, to progressively upgrade the quality of generated images by: (1) incorporating fine-grained textual and visual knowledge of key elements in the scene, and (2) utilizing different denoising experts at different denoising stages. With the proposed mechanisms, ERNIE-ViLG 2.0 not only achieves a new state-of-the-art on MS-COCO with zero-shot FID score of 6.75, but also significantly outperforms recent models in terms of image fidelity and image-text alignment, with side-by-side human evaluation on the bilingual prompt set ViLG-300.
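The mixture-of-denoising-experts idea can be sketched as routing each diffusion timestep to one of several denoisers, each owning a contiguous range of steps. In the hedged PyTorch sketch below, tiny MLPs stand in for the real denoising networks, and the number of experts and the bucketing scheme are illustrative assumptions.

    # Hedged sketch: one denoising "expert" per timestep bucket.
    import torch
    import torch.nn as nn

    class MixtureOfDenoisingExperts(nn.Module):
        def __init__(self, n_experts=10, total_steps=1000, dim=64):
            super().__init__()
            self.total_steps = total_steps
            self.n_experts = n_experts
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
                for _ in range(n_experts)
            )

        def forward(self, x, t):
            # Route each sample to the expert that owns its timestep bucket.
            bucket = ((t * self.n_experts) // self.total_steps).clamp(max=self.n_experts - 1)
            out = torch.empty_like(x)
            for e in range(self.n_experts):
                sel = bucket == e
                if sel.any():
                    out[sel] = self.experts[e](x[sel])
            return out

    model = MixtureOfDenoisingExperts()
    noise_pred = model(torch.randn(8, 64), torch.randint(0, 1000, (8,)))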
Label Information Enhanced Fraud Detection against Low Homophily in Graphs
Wang, Yuchen, Zhang, Jinghui, Huang, Zhengjie, Li, Weibin, Feng, Shikun, Ma, Ziheng, Sun, Yu, Yu, Dianhai, Dong, Fang, Jin, Jiahui, Wang, Beilun, Luo, Junzhou
Node classification is a substantial problem in graph-based fraud detection. Many existing works adopt Graph Neural Networks (GNNs) to enhance fraud detectors. While promising, currently most GNN-based fraud detectors fail to generalize to the low homophily setting. Besides, label utilization has been proved to be a significant factor for the node classification problem, but we find such techniques are less effective in fraud detection tasks due to the low homophily in graphs. In this work, we propose GAGA, a novel Group AGgregation enhanced TrAnsformer, to tackle the above challenges. Specifically, the group aggregation provides a portable method to cope with the low homophily issue. Such an aggregation explicitly integrates the label information to generate distinguishable neighborhood information. Along with group aggregation, an end-to-end trainable group encoding is proposed which augments the original feature space with the class labels. Meanwhile, we devise two additional learnable encodings to recognize the structural and relational context. Then, we combine the group aggregation and the learnable encodings into a Transformer encoder to capture the semantic information. Experimental results clearly show that GAGA outperforms other competitive graph-based fraud detectors by up to 24.39% on two trending public datasets and a real-world industrial dataset from Anonymous. Moreover, the group aggregation is demonstrated to outperform other label utilization methods (e.g., C&S, BoT/UniMP) in the low homophily setting.
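The group-aggregation step can be illustrated as follows: for a target node, neighbor features are averaged per (relation, label group), with unlabeled neighbors forming their own group, and the resulting sequence of group vectors is what a Transformer encoder would consume. This numpy sketch uses an illustrative group layout and toy data; it is not the released GAGA code.

    # Hedged sketch of label-aware group aggregation for one target node.
    import numpy as np

    def group_aggregate(x, neighbors_by_relation, labels, node, n_classes=2):
        """Return one averaged feature vector per (relation, label group)."""
        groups = []
        for nbrs in neighbors_by_relation:            # one entry per relation/edge type
            for g in list(range(n_classes)) + [-1]:   # -1 = unlabeled neighbors
                members = [v for v in nbrs[node] if labels[v] == g]
                groups.append(x[members].mean(axis=0) if members
                              else np.zeros(x.shape[1]))
        return np.stack(groups)                       # shape: (n_relations*(n_classes+1), d)

    x = np.random.rand(6, 4)                          # 6 nodes, 4-dim features
    labels = np.array([0, 1, -1, 0, 1, -1])           # -1 marks unlabeled nodes
    neighbors = [{0: [1, 2, 3], 1: [], 2: [], 3: [], 4: [], 5: []}]  # one relation
    print(group_aggregate(x, neighbors, labels, node=0).shape)       # (3, 4)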
ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization
Liu, Weixin, Chen, Xuyi, Liu, Jiaxiang, Feng, Shikun, Sun, Yu, Tian, Hao, Wu, Hua
Task-agnostic knowledge distillation attempts to address the problem of deploying large pretrained language models in resource-constrained scenarios by compressing a large pretrained model, called the teacher, into a smaller one, called the student, such that the student can be directly finetuned on downstream tasks and retains comparable performance. However, we empirically find that there is a generalization gap between the student and the teacher in existing methods. In this work, we show that we can leverage multi-task learning in task-agnostic distillation to advance the generalization of the resulting student. In particular, we propose Multi-task Infused Task-agnostic Knowledge Distillation (MITKD). We first enhance the teacher by multi-task training it on multiple downstream tasks and then perform distillation to produce the student. Experimental results demonstrate that our method yields a student with much better generalization that significantly outperforms existing baselines and establishes new state-of-the-art results on in-domain, out-of-domain, and low-resource datasets in the setting of task-agnostic distillation. Moreover, our method even exceeds an 8x larger BERT$_{\text{Base}}$ on SQuAD and four GLUE tasks. In addition, when combined with ERNIE 3.0, our method achieves state-of-the-art results on 10 Chinese datasets.
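A minimal sketch of the distillation objective once the teacher has been multi-task trained: the student mimics the teacher's temperature-scaled output distribution and, optionally, its hidden states on unlabeled text. The temperature, loss weighting, and tensor shapes below are illustrative assumptions, not the exact MITKD configuration.

    # Hedged sketch of a teacher-to-student distillation loss in PyTorch.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits,
                          student_hidden, teacher_hidden, temperature=2.0):
        # Soft-label loss: KL between temperature-scaled distributions.
        kd = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        # Hidden-state matching (assumes a projection already aligned the widths).
        hid = F.mse_loss(student_hidden, teacher_hidden)
        return kd + hid

    s_logits, t_logits = torch.randn(8, 30522), torch.randn(8, 30522)
    s_hid, t_hid = torch.randn(8, 128, 768), torch.randn(8, 128, 768)
    print(distillation_loss(s_logits, t_logits, s_hid, t_hid))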
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech
Fan, Xiaoran, Pang, Chao, Yuan, Tian, Bai, He, Zheng, Renjie, Zhu, Pengfei, Wang, Shuohuan, Chen, Junkun, Chen, Zeyu, Huang, Liang, Sun, Yu, Wu, Hua
Speech representation learning has improved both speech understanding and speech synthesis tasks for a single language. However, its ability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method to cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing. We propose a speech-text joint pretraining framework, where we randomly mask the spectrogram and the phonemes given a speech example and its transcription. By learning to reconstruct the masked parts of the input in different languages, our model shows great improvements over speaker-embedding-based multi-speaker TTS methods. Moreover, our framework is end-to-end for both training and inference, without any finetuning effort. In cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing tasks, our experiments show that our model outperforms speaker-embedding-based multi-speaker TTS methods.
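The joint masking step can be sketched as follows: given a mel spectrogram and its phoneme transcription, random spans of frames and random phonemes are masked, and the model is trained to reconstruct the masked parts. The mask rates, span length, and mask id in this PyTorch sketch are illustrative assumptions, not the paper's settings.

    # Hedged sketch of joint spectrogram/phoneme masking.
    import torch

    def mask_speech_and_text(mel, phonemes, frame_mask_rate=0.3, span=10,
                             phone_mask_rate=0.15, mask_id=0):
        mel = mel.clone()
        frame_mask = torch.zeros(mel.shape[0], dtype=torch.bool)
        n_spans = max(1, int(mel.shape[0] * frame_mask_rate / span))
        for _ in range(n_spans):
            start = torch.randint(0, max(1, mel.shape[0] - span), (1,)).item()
            frame_mask[start:start + span] = True
        mel[frame_mask] = 0.0                              # zero out masked frames

        phonemes = phonemes.clone()
        phone_mask = torch.rand(phonemes.shape[0]) < phone_mask_rate
        phonemes[phone_mask] = mask_id                     # replace with a [MASK] id
        return mel, phonemes, frame_mask, phone_mask

    mel = torch.randn(200, 80)             # 200 frames, 80 mel bins
    phones = torch.randint(1, 100, (50,))  # 50 phoneme ids
    masked_mel, masked_phones, fmask, pmask = mask_speech_and_text(mel, phones)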
X-PuDu at SemEval-2022 Task 6: Multilingual Learning for English and Arabic Sarcasm Detection
Han, Yaqian, Chai, Yekun, Wang, Shuohuan, Sun, Yu, Huang, Hongyi, Chen, Guanghao, Xu, Yitong, Yang, Yang
Detecting sarcasm and verbal irony from people's subjective statements is crucial to understanding their intended meanings and real sentiments and positions in social scenarios. This paper describes the X-PuDu system that participated in SemEval-2022 Task 6, iSarcasmEval - Intended Sarcasm Detection in English and Arabic, which aims at detecting intended sarcasm in various settings of natural language understanding. Our solution finetunes pre-trained language models, such as ERNIE-M and DeBERTa, under multilingual settings to recognize irony in Arabic and English texts. Our system ranked second out of 43 (English) and ninth out of 32 (Arabic) in Task A, one-sentence detection; fifth out of 22 in Task B, binary multi-label classification in English; and first out of 16 (English) and fifth out of 13 (Arabic) in Task C, sentence-pair detection.
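The fine-tuning setup described above can be sketched as a multilingual pretrained encoder with a binary classification head. The checkpoint name (an mDeBERTa-v3 base model), hyperparameters, and toy examples below are illustrative placeholders rather than the exact X-PuDu configuration.

    # Hedged sketch of multilingual fine-tuning for sarcasm detection.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "microsoft/mdeberta-v3-base", num_labels=2)   # 0 = not sarcastic, 1 = sarcastic
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    texts = ["Oh great, another Monday.", "The weather is nice today."]
    labels = torch.tensor([1, 0])
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    model.train()
    loss = model(**batch, labels=labels).loss   # standard cross-entropy fine-tuning step
    loss.backward()
    optimizer.step()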
X-PuDu at SemEval-2022 Task 7: A Replaced Token Detection Task Pre-trained Model with Pattern-aware Ensembling for Identifying Plausible Clarifications
Shang, Junyuan, Wang, Shuohuan, Sun, Yu, Yu, Yanjun, Zhou, Yue, Xiang, Li, Yang, Guixiu
This paper describes our winning system for SemEval 2022 Task 7: Identifying Plausible Clarifications of Implicit and Underspecified Phrases in Instructional Texts. A replaced-token-detection pre-trained model is utilized with slightly different task-specific heads for SubTask-A: Multi-class Classification and SubTask-B: Ranking. Incorporating a pattern-aware ensemble method, our system achieves a 68.90% accuracy score and a 0.8070 Spearman's rank correlation score, surpassing the second-place system by large margins of 2.7 and 2.2 percentage points on SubTask-A and SubTask-B, respectively. Our approach is simple and easy to implement, and we conducted ablation studies as well as qualitative and quantitative analyses of the working strategies used in our system.
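Pattern-aware ensembling can be illustrated, under the assumption of simple averaging, as scoring each input under several prompt patterns, averaging scores within a pattern across model checkpoints, and then combining patterns with (possibly uniform) weights; the exact scheme used in the winning system may differ from this sketch.

    # Hedged sketch of averaging scores per pattern, then across patterns.
    import numpy as np

    def pattern_aware_ensemble(scores, pattern_weights=None):
        """scores: dict mapping pattern name -> array of shape (n_models, n_classes)."""
        per_pattern = {p: s.mean(axis=0) for p, s in scores.items()}  # average checkpoints
        if pattern_weights is None:
            pattern_weights = {p: 1.0 / len(per_pattern) for p in per_pattern}
        final = sum(pattern_weights[p] * v for p, v in per_pattern.items())
        return int(np.argmax(final)), final

    scores = {
        "pattern_a": np.array([[0.2, 0.5, 0.3], [0.1, 0.6, 0.3]]),
        "pattern_b": np.array([[0.3, 0.4, 0.3], [0.2, 0.5, 0.3]]),
    }
    label, probs = pattern_aware_ensemble(scores)   # label = 1 for this toy input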