AITopics | Xu, Linli

Collaborating Authors

Xu, Linli

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA

Zhu, Yongxin, Liu, Zhen, Liang, Yukang, Li, Xin, Liu, Hao, Bao, Changcun, Xu, Linli

arXiv.org Artificial IntelligenceApr-4-2023

In this paper, we propose a novel multi-modal framework for Scene Text Visual Question Answering (STVQA), which requires models to read scene text in images for question answering. Apart from text or visual objects, which could exist independently, scene text naturally links text and visual modalities together by conveying linguistic semantics while being a visual object in an image simultaneously. Different to conventional STVQA models which take the linguistic semantics and visual semantics in scene text as two separate features, in this paper, we propose a paradigm of "Locate Then Generate" (LTG), which explicitly unifies this two semantics with the spatial bounding box as a bridge connecting them. Specifically, at first, LTG locates the region in an image that may contain the answer words with an answer location module (ALM) consisting of a region proposal network and a language refinement network, both of which can transform to each other with one-to-one mapping via the scene text bounding box. Next, given the answer words selected by ALM, LTG generates a readable answer sequence with an answer generation module (AGM) based on a pre-trained language model. As a benefit of the explicit alignment of the visual and linguistic semantics, even without any scene text based pre-training tasks, LTG can boost the absolute accuracy by +6.06% and +6.92% on the TextVQA dataset and the ST-VQA dataset respectively, compared with a non-pre-training baseline. We further demonstrate that LTG effectively unifies visual and text modalities through the spatial bounding box connection, which is underappreciated in previous methods.

machine learning, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2304.01603

Country: North America > United States (0.68)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Difformer: Empowering Diffusion Models on the Embedding Space for Text Generation

Gao, Zhujin, Guo, Junliang, Tan, Xu, Zhu, Yongxin, Zhang, Fang, Bian, Jiang, Xu, Linli

arXiv.org Artificial IntelligenceJan-31-2023

Diffusion models have achieved state-of-the-art synthesis quality on both visual and audio tasks, and recent works further adapt them to textual data by diffusing on the embedding space. In this paper, we conduct systematic studies and analyze the challenges between the continuous data space and the embedding space which have not been carefully explored. Firstly, the data distribution is learnable for embeddings, which may lead to the collapse of the loss function. Secondly, as the norm of embeddings varies between popular and rare words, adding the same noise scale will lead to sub-optimal results. In addition, we find the normal level of noise causes insufficient training of the model. To address the above challenges, we propose Difformer, an embedding diffusion model based on Transformer, which consists of three essential modules including an additional anchor loss function, a layer normalization module for embeddings, and a noise factor to the Gaussian noise. Experiments on two seminal text generation tasks including machine translation and text summarization show the superiority of Difformer over compared embedding diffusion baselines.

artificial intelligence, machine translation, natural language, (15 more...)

arXiv.org Artificial Intelligence

2212.09412

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

STL-SGD: Speeding Up Local SGD with Stagewise Communication Period

Shen, Shuheng, Cheng, Yifei, Liu, Jingchang, Xu, Linli

arXiv.org Machine LearningJun-11-2020

Distributed parallel stochastic gradient descent algorithms are workhorses for large scale machine learning tasks. Among them, local stochastic gradient descent (Local SGD) has attracted significant attention due to its low communication complexity. Previous studies prove that the communication complexity of Local SGD with a fixed or an adaptive communication period is in the order of $O (N^{\frac{3}{2}} T^{\frac{1}{2}})$ and $O (N^{\frac{3}{4}} T^{\frac{3}{4}})$ when the data distributions on clients are identical (IID) or otherwise (Non-IID). In this paper, to accelerate the convergence by reducing the communication complexity, we propose \textit{ST}agewise \textit{L}ocal \textit{SGD} (STL-SGD), which increases the communication period gradually along with decreasing learning rate. We prove that STL-SGD can keep the same convergence rate and linear speedup as mini-batch SGD. In addition, as the benefit of increasing the communication period, when the objective is strongly convex or satisfies the Polyak-\L ojasiewicz condition, the communication complexity of STL-SGD is $O (N \log{T})$ and $O (N^{\frac{1}{2}} T^{\frac{1}{2}})$ for the IID case and the Non-IID case respectively, achieving significant improvements over Local SGD. Experiments on both convex and non-convex problems demonstrate the superior performance of STL-SGD.

artificial intelligence, local sgd, machine learning, (15 more...)

arXiv.org Machine Learning

2006.06377

Country: Asia > China (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)

Add feedback

Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent

Shen, Shuheng, Xu, Linli, Liu, Jingchang, Liang, Xianfeng, Cheng, Yifei

arXiv.org Machine LearningJun-28-2019

With the increase in the amount of data and the expansion of model scale, distributed parallel training becomes an important and successful technique to address the optimization challenges. Nevertheless, although distributed stochastic gradient descent (SGD) algorithms can achieve a linear iteration speedup, they are limited significantly in practice by the communication cost, making it difficult to achieve a linear time speedup. In this paper, we propose a computation and communication decoupled stochastic gradient descent (CoCoD-SGD) algorithm to run computation and communication in parallel to reduce the communication cost. We prove that CoCoD-SGD has a linear iteration speedup with respect to the total computation capability of the hardware resources. In addition, it has a lower communication complexity and better time speedup comparing with traditional distributed SGD algorithms. Experiments on deep neural network training demonstrate the significant improvements of CoCoD-SGD: when training ResNet18 and VGG16 with 16 Geforce GTX 1080Ti GPUs, CoCoD-SGD is up to 2-3$\times$ faster than traditional synchronous SGD.

artificial intelligence, communication, neural network, (17 more...)

arXiv.org Machine Learning

1906.12043

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Accelerating Stochastic Gradient Descent Using Antithetic Sampling

Liu, Jingchang, Xu, Linli

arXiv.org Machine LearningOct-7-2018

But a rather high variance introduced by the stochastic gradient in each step may slow down the convergence. In this paper, we propose the antithetic sampling strategy to reduce the variance by taking advantage of the internal structure in dataset. Under this new strategy, stochastic gradients in a mini-batch are no longer independent but negatively correlated as much as possible, while the mini-batch stochastic gradient is still an unbiased estimator of full gradient. For the binary classification problems, we just need to calculate the antithetic samples in advance, and reuse the result in each iteration, which is practical. Experiments are provided to confirm the effectiveness of the proposed method.

artificial intelligence, machine learning, variance, (16 more...)

arXiv.org Machine Learning

1810.03124

Country: North America > United States (0.14)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Enhancing Network Embedding with Auxiliary Information: An Explicit Matrix Factorization Perspective

Guo, Junliang, Xu, Linli, Huang, Xunpeng, Chen, Enhong

arXiv.org Machine LearningMar-4-2018

Recent advances in the field of network embedding have shown the low-dimensional network representation is playing a critical role in network analysis. However, most of the existing principles of network embedding do not incorporate auxiliary information such as content and labels of nodes flexibly. In this paper, we take a matrix factorization perspective of network embedding, and incorporate structure, content and label information of the network simultaneously. For structure, we validate that the matrix we construct preserves high-order proximities of the network. Label information can be further integrated into the matrix via the process of random walk sampling to enhance the quality of embedding in an unsupervised manner, i.e., without leveraging downstream classifiers. In addition, we generalize the Skip-Gram Negative Sampling model to integrate the content of the network in a matrix factorization framework. As a consequence, network embedding can be learned in a unified framework integrating network structure and node content as well as label information simultaneously. We demonstrate the efficacy of the proposed model with the tasks of semi-supervised node classification and link prediction on a variety of real-world benchmark network datasets.

artificial intelligence, information, survey article, (18 more...)

arXiv.org Machine Learning

1711.04094

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

How Images Inspire Poems: Generating Classical Chinese Poetry from Images with Memory Networks

Xu, Linli (University of Science and Technology of China) | Jiang, Liang ( University of Science and Technology of China ) | Qin, Chuan (University of Science and Technology of China) | Wang, Zhe (Ant Financial Services Group) | Du, Dongfang (University of Science and Technology of China)

AAAI ConferencesFeb-8-2018

With the recent advances of neural models and natural language processing, automatic generation of classical Chinese poetry has drawn significant attention due to its artistic and cultural value. Previous works mainly focus on generating poetry given keywords or other text information, while visual inspirations for poetry have been rarely explored. Generating poetry from images is much more challenging than generating poetry from text, since images contain very rich visual information which cannot be described completely using several keywords, and a good poem should convey the image accurately. In this paper, we propose a memory based neural model which exploits images to generate poems. Specifically, an Encoder-Decoder model with a topic memory network is proposed to generate classical Chinese poetry from images. To the best of our knowledge, this is the first work attempting to generate classical Chinese poetry from images with neural networks. A comprehensive experimental investigation with both human evaluation and quantitative analysis demonstrates that the proposed model can generate poems which convey images accurately.

deep learning, keyword, neural network, (21 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Bridging Video Content and Comments: Synchronized Video Description with Temporal Summarization of Crowdsourced Time-Sync Comments

Xu, Linli (University of Science and Technology of China) | Zhang, Chao ( University of Science and Technology of China )

AAAI ConferencesFeb-14-2017

With the rapid growth of online sharing media, we are facing a huge collection of videos. In the meantime, due to the volume and complexity of video data, it can be tedious and time consuming to index or annotate videos. In this paper, we propose to generate temporal descriptions of videos by exploiting the information of crowdsourced time-sync comments which are receiving increasing popularity on many video sharing websites. In this framework, representative and interesting comments of a video are selected and highlighted along the timeline, which provide an informative description of the video in a time-sync manner. The challenge of the proposed application comes from the extremely informal and noisy nature of the comments, which are usually short sentences and on very different topics. To resolve these issues, we propose a novel temporal summarization model based on the data reconstruction principle, where representative comments are selected in order to best reconstruct the original corpus at the text level as well as the topic level while incorporating the temporal correlations of the comments. Experimental results on real-world data demonstrate the effectiveness of the proposed framework and justify the idea of exploiting crowdsourced time-sync comments as a bridge to describe videos.

artificial intelligence, optimization problem, summarization, (20 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia (0.14)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Communications > Social Media > Crowdsourcing (0.82)

Add feedback

Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent

Li, Yitan, Xu, Linli, Zhong, Xiaowei, Ling, Qing

arXiv.org Machine LearningMay-21-2016

Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have drawn significant attention from academia to industry recently. This paper proposes a novel algorithm, decoupled asynchronous proximal stochastic gradient descent (DAP-SGD), to minimize an objective function that is the composite of the average of multiple empirical losses and a regularization term. Unlike the traditional asynchronous proximal stochastic gradient descent (TAP-SGD) in which the master carries much of the computation load, the proposed algorithm off-loads the majority of computation tasks from the master to workers, and leaves the master to conduct simple addition operations. This strategy yields an easy-to-parallelize algorithm, whose performance is justified by theoretical convergence analyses. To be specific, DAP-SGD achieves an $O(\log T/T)$ rate when the step-size is diminishing and an ergodic $O(1/\sqrt{T})$ rate when the step-size is constant, where $T$ is the number of total iterations.

artificial intelligence, machine learning, proximal operator, (14 more...)

arXiv.org Machine Learning

1605.06619

Country:

Europe (0.46)
North America > Canada > Quebec (0.14)

Genre: Research Report (0.82)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Temporally Adaptive Restricted Boltzmann Machine for Background Modeling

Xu, Linli (University of Science and Technology of China) | Li, Yitan (University of Science and Technology of China) | Wang, Yubo (University of Science and Technology of China) | Chen, Enhong (University of Science and Technology of China)

AAAI ConferencesMar-6-2015

We examine the fundamental problem of background modeling which is to model the background scenes in video sequences and segment the moving objects from the background. A novel approach is proposed based on the Restricted Boltzmann Machine (RBM) while exploiting the temporal nature of the problem. In particular, we augment the standard RBM to take a window of sequential video frames as input and generate the background model while enforcing the background smoothly adapting to the temporal changes. As a result, the augmented temporally adaptive model can generate stable background given noisy inputs and adapt quickly to the changes in background while keeping all the advantages of RBMs including exact inference and effective learning procedure. Experimental results demonstrate the effectiveness of the proposed method in modeling the temporal nature in background.

background, deep learning, neural network, (19 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: Asia (0.14)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.62)

Add feedback