Xu, Ying
Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators
Herzog, Alexander, Rao, Kanishka, Hausman, Karol, Lu, Yao, Wohlhart, Paul, Yan, Mengyuan, Lin, Jessica, Arenas, Montserrat Gonzalez, Xiao, Ted, Kappler, Daniel, Ho, Daniel, Rettinghouse, Jarek, Chebotar, Yevgen, Lee, Kuang-Huei, Gopalakrishnan, Keerthana, Julian, Ryan, Li, Adrian, Fu, Chuyuan Kelly, Wei, Bob, Ramesh, Sangeetha, Holden, Khem, Kleiven, Kim, Rendleman, David, Kirmani, Sean, Bingham, Jeff, Weisz, Jon, Xu, Ying, Lu, Wenlong, Bennice, Matthew, Fong, Cody, Do, David, Lam, Jessica, Bai, Yunfei, Holson, Benjie, Quinlan, Michael, Brown, Noah, Kalakrishnan, Mrinal, Ibarz, Julian, Pastor, Peter, Levine, Sergey
We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of deep RL policies requires not only effective training algorithms, but also the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL from real-world data with bootstrapping from training in simulation, and incorporates auxiliary inputs from existing computer vision systems to boost generalization to novel objects while retaining the benefits of end-to-end training. We analyze the tradeoffs of different design decisions in our system, and present a large-scale empirical validation that includes training on real-world data gathered over 24 months of experimentation, across a fleet of 23 robots in three office buildings, with a total training set of 9527 hours of robotic experience. Our final validation also comprises 4800 evaluation trials across 240 waste station configurations, in order to evaluate in detail the impact of the design decisions in our system, the scaling effects of including more real-world data, and the performance of the method on novel objects. The project's website and videos can be found at rl-at-scale.github.io.
StoryBuddy: A Human-AI Collaborative Chatbot for Parent-Child Interactive Storytelling with Flexible Parental Involvement
Zhang, Zheng, Xu, Ying, Wang, Yanhao, Yao, Bingsheng, Ritchie, Daniel, Wu, Tongshuang, Yu, Mo, Wang, Dakuo, Li, Toby Jia-Jun
Despite the benefits of interactive storytelling for children's skill development and parent-child bonding, many parents rarely engage in story-related dialogues with their child due to limited availability or difficulty coming up with appropriate questions. While recent advances have made AI generation of questions from stories possible, a fully automated approach excludes parent involvement, disregards educational goals, and under-optimizes for child engagement. Informed by need-finding interviews and participatory design (PD) results, we developed StoryBuddy, an AI-enabled system for parents to create interactive storytelling experiences. StoryBuddy's design addresses parents' dynamic needs, balancing the desire for involvement and parent-child bonding against the goal of minimizing intervention when parents are busy. The PD also revealed parents' varied assessment and educational goals, which StoryBuddy addresses by supporting configurable question types and tracking of children's progress. A user study validated StoryBuddy's usability and suggested design insights for future parent-AI collaboration systems.
It is AI's Turn to Ask Humans a Question: Question and Answer Pair Generation for Children's Storybooks in the FairytaleQA Dataset
Yao, Bingsheng, Wang, Dakuo, Wu, Tongshuang, Hoang, Tran, Sun, Branda, Li, Toby Jia-Jun, Yu, Mo, Xu, Ying
Existing question answering (QA) datasets are created mainly to help AI answer questions asked by humans. In educational applications, however, teachers and parents may not know what questions they should ask a child to best support language learning. Using a newly released book QA dataset (FairytaleQA), which education experts labeled on 46 fairytale storybooks for early childhood readers, we developed an automated QA-generation model architecture for this novel application. Our model (1) extracts candidate answers from a given storybook passage through carefully designed heuristics based on a pedagogical framework; (2) generates an appropriate question for each extracted answer using a language model; and (3) uses another QA model to rank the top QA pairs. Automatic and human evaluations show that our model outperforms baselines. We also demonstrate that our method can mitigate the scarcity of children's book QA data via data augmentation on 200 unlabeled storybooks.
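As a loose illustration of this three-stage pipeline, the sketch below extracts candidate answers with a simple named-entity heuristic, prompts a seq2seq model to generate a question for each answer, and ranks the resulting pairs with an off-the-shelf QA model. The specific models, the prompt format, and the NER heuristic are illustrative assumptions, not the exact components used in the paper.

```python
# Minimal sketch of a three-stage QA-pair generation pipeline
# (candidate-answer extraction -> question generation -> QA-based ranking).
# Model choices and the NER-based heuristic are illustrative assumptions.
import spacy
from transformers import pipeline

nlp = spacy.load("en_core_web_sm")
question_generator = pipeline("text2text-generation", model="t5-base")
qa_ranker = pipeline("question-answering")

def generate_qa_pairs(passage, top_k=3):
    # 1) Extract candidate answers with a simple heuristic (named entities).
    candidates = [ent.text for ent in nlp(passage).ents]

    # 2) Generate a question for each candidate answer.
    pairs = []
    for answer in candidates:
        prompt = f"generate question: answer: {answer} context: {passage}"
        question = question_generator(prompt, max_length=64)[0]["generated_text"]
        pairs.append((question, answer))

    # 3) Rank pairs by how confidently a QA model recovers the intended answer.
    def score(pair):
        question, answer = pair
        pred = qa_ranker(question=question, context=passage)
        return pred["score"] if answer.lower() in pred["answer"].lower() else 0.0

    return sorted(pairs, key=score, reverse=True)[:top_k]
```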
UIBert: Learning Generic Multimodal Representations for UI Understanding
Bai, Chongyang, Zang, Xiaoxue, Xu, Ying, Sunkara, Srinivas, Rastogi, Abhinav, Chen, Jindong, Arcas, Blaise Aguera y
To improve the accessibility of smart devices and to simplify their usage, it is critical to build models that understand user interfaces (UIs) and assist users in completing their tasks. However, UI-specific characteristics pose unique challenges, such as how to effectively leverage multimodal UI features that involve image, text, and structural metadata, and how to achieve good performance when high-quality labeled data is unavailable. To address these challenges we introduce UIBert, a transformer-based joint image-text model trained through novel pre-training tasks on large-scale unlabeled UI data to learn generic feature representations for a UI and its components. Our key intuition is that the heterogeneous features in a UI are self-aligned, i.e., the image and text features of UI components are predictive of each other. We propose five pre-training tasks utilizing this self-alignment among different features of a UI component and across various components in the same UI. We evaluate our method on nine real-world downstream UI tasks, where UIBert outperforms strong multimodal baselines by up to 9.26% accuracy.
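To make the self-alignment intuition concrete, here is a minimal sketch of an InfoNCE-style contrastive loss that pulls together the image and text embeddings of the same UI component and pushes apart mismatched pairs. This is an assumed illustration of the idea, not one of the paper's five pre-training tasks verbatim, and the encoders producing the embeddings are left hypothetical.

```python
# Illustrative self-alignment objective: image and text features of the same UI
# component should be predictive of each other. InfoNCE-style contrastive loss
# over hypothetical component encoders; an assumption, not the paper's exact task.
import torch
import torch.nn.functional as F

def self_alignment_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (batch, dim) embeddings of the same UI components."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # pairwise similarities
    targets = torch.arange(image_emb.size(0))         # matching pairs on the diagonal
    # Symmetric cross-entropy: image->text and text->image retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example usage with random embeddings standing in for encoder outputs.
loss = self_alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```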
Multimodal Icon Annotation For Mobile Applications
Zang, Xiaoxue, Xu, Ying, Chen, Jindong
Annotating user interfaces (UIs), which involves localizing and classifying meaningful UI elements on a screen, is a critical step for many mobile applications such as screen readers and voice control of devices. Annotating object icons, such as menu, search, and arrow backward, is especially challenging due to the lack of explicit labels on screens, their similarity to pictures, and their diverse shapes. Existing studies use either view hierarchy or pixel-based methods to tackle the task. Pixel-based approaches are more popular because view hierarchy features on mobile platforms are often incomplete or inaccurate; however, they leave out useful information in the view hierarchy, such as resource-ids and content descriptions. We propose a novel deep-learning-based multi-modal approach that combines the benefits of both pixel and view hierarchy features while leveraging state-of-the-art object detection techniques. To demonstrate its utility, we create a high-quality UI dataset by manually annotating the 29 most commonly used icons in Rico, a large-scale mobile design dataset consisting of 72k UI screenshots. The experimental results indicate the effectiveness of our multi-modal approach: our model outperforms not only a widely used object classification baseline but also pixel-based object detection models. Our study sheds light on how to combine view hierarchy with pixel features for annotating UI elements.
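A minimal sketch of the fusion idea follows, assuming a cropped-icon image feature and a text embedding of the element's view-hierarchy attributes (resource-id, content description) that are concatenated and classified. The real system builds on an object-detection backbone; the dimensions and the late-fusion design here are simplifying assumptions.

```python
# Hedged sketch of a pixel + view-hierarchy fusion head for icon classification.
# Assumed, simplified design: concatenate an icon-region image feature with a
# text embedding of view-hierarchy attributes and classify into 29 icon types.
import torch
import torch.nn as nn

class MultimodalIconClassifier(nn.Module):
    def __init__(self, image_dim=512, text_dim=128, num_icon_classes=29):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(image_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_icon_classes),
        )

    def forward(self, image_feat, vh_text_feat):
        # image_feat: features of the cropped icon region (e.g. from a CNN).
        # vh_text_feat: embedding of the element's resource-id / content-desc text.
        return self.fuse(torch.cat([image_feat, vh_text_feat], dim=-1))

model = MultimodalIconClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 128))  # (4, 29) class scores
```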
Grey-box Adversarial Attack And Defence For Sentiment Classification
Xu, Ying, Zhong, Xu, Yepes, Antonio Jimeno, Lau, Jey Han
We introduce a grey-box adversarial attack and defence framework for sentiment classification. We address the issues of differentiability, label preservation, and input reconstruction for adversarial attack and defence in one unified framework. Our results show that, once trained, the attacking model can generate high-quality adversarial examples roughly an order of magnitude faster than state-of-the-art attacking methods. These examples also preserve the original sentiment according to human evaluation. Additionally, our framework produces an improved classifier that is robust in defending against multiple adversarial attacking methods. Code is available at: https://github.com/ibm-aur-nlp/adv-def-text-dist.
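As a generic illustration of how differentiability can be handled when attacking a text classifier, the sketch below relaxes discrete word substitutions with a Gumbel-softmax distribution so that gradients flow from the classifier's loss back to the substitution logits. The paper's unified attack/defence framework is more involved; the classifier interface and the single-step update here are illustrative assumptions only.

```python
# Minimal, generic sketch of a differentiable word-substitution attack via a
# Gumbel-softmax relaxation. Everything here is an assumption for illustration,
# not the paper's actual attacking model.
import torch
import torch.nn.functional as F

def attack_step(embedding_matrix, classifier, sub_logits, true_label, tau=0.5):
    # sub_logits: (seq_len, vocab) trainable logits over substitute tokens
    #             (must have requires_grad=True before calling this).
    soft_one_hot = F.gumbel_softmax(sub_logits, tau=tau)   # (seq_len, vocab)
    adv_embeds = soft_one_hot @ embedding_matrix            # soft token embeddings
    logits = classifier(adv_embeds.unsqueeze(0))            # (1, num_classes); classifier
                                                            # is assumed to accept embeddings
    # Minimizing the negative cross-entropy maximizes the classifier's loss
    # on the true sentiment label.
    loss = -F.cross_entropy(logits, torch.tensor([true_label]))
    loss.backward()
    return loss.item()
```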
Understanding in Artificial Intelligence
Maetschke, Stefan, Iraola, David Martinez, Barnard, Pieter, ShafieiBavani, Elaheh, Zhong, Peter, Xu, Ying, Yepes, Antonio Jimeno
However, this progress is largely driven by increased computational power, namely GPUs, and bigger data sets, rather than by radically new algorithms or knowledge representations. Artificial Neural Networks and Stochastic Gradient Descent, popularized in the 1980s [3], remain the fundamental building blocks for most modern AI systems. While very successful for many applications, especially in vision, the purely deep-learning based approach has significant weaknesses. For instance, CNNs struggle with same-different relations [4], fail when long-chained reasoning is needed [5], are non-decomposable, cannot easily incorporate symbolic knowledge, and are hampered by a lack of model interpretability. Many current methods essentially compute higher-order statistics over basic elements such as pixels, phonemes, letters, or words to process inputs, but do not explicitly model the building blocks and their relations in a (de)composable and interpretable way.
ActionBert: Leveraging User Actions for Semantic Understanding of User Interfaces
He, Zecheng, Sunkara, Srinivas, Zang, Xiaoxue, Xu, Ying, Liu, Lijuan, Wichers, Nevan, Schubiner, Gabriel, Lee, Ruby, Chen, Jindong
As mobile devices become ubiquitous, regularly interacting with a variety of user interfaces (UIs) is a common aspect of daily life for many people. To improve the accessibility of these devices and to enable their usage in a variety of settings, it is vitally important to build models that can assist users and accomplish tasks through the UI. However, there are several challenges to achieving this. First, UI components of similar appearance can have different functionalities, making understanding their function more important than just analyzing their appearance. Second, domain-specific features like the Document Object Model (DOM) in web pages and the View Hierarchy (VH) in mobile applications provide important signals about the semantics of UI elements, but these features are not in a natural language format. Third, owing to the large diversity in UIs and the absence of standard DOM or VH representations, building a UI understanding model with high coverage requires large amounts of training data. Inspired by the success of pre-training based approaches in NLP for tackling a variety of problems in a data-efficient way, we introduce a new pre-trained UI representation model called ActionBert. Our methodology is designed to leverage visual, linguistic, and domain-specific features in user interaction traces to pre-train generic feature representations of UIs and their components. Our key intuition is that user actions, e.g., a sequence of clicks on different UI components, reveal important information about their functionality. We evaluate the proposed model on a wide variety of downstream tasks, ranging from icon classification to UI component retrieval based on a natural language description. Experiments show that the proposed ActionBert model outperforms multi-modal baselines across all downstream tasks by up to 15.5%.
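To make the intuition about interaction traces concrete, the sketch below shows one assumed way of turning click logs into pre-training examples: each click links a source screen, the clicked component, and the resulting screen, from which objectives such as predicting the clicked component can be derived. The record layout and objective are assumptions for illustration, not the paper's exact pre-training setup.

```python
# Illustrative sketch of turning user interaction traces into pre-training
# examples. The record layout and the "predict the clicked component" objective
# are assumptions used only to make the intuition concrete.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class UIComponent:
    text: str            # visible label or content description
    bounds: Tuple[int, int, int, int]   # (left, top, right, bottom) on screen
    vh_class: str        # view-hierarchy class, e.g. "android.widget.Button"

@dataclass
class ClickEvent:
    source_screen: List[UIComponent]
    clicked_index: int   # which component on the source screen was tapped
    target_screen: List[UIComponent]

def make_pretraining_examples(trace: List[ClickEvent]):
    # One candidate objective: given the source and resulting screens, predict
    # which component was clicked (functionality is revealed by user actions).
    return [(e.source_screen, e.target_screen, e.clicked_index) for e in trace]
```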
Similarity Kernel and Clustering via Random Projection Forests
Yan, Donghui, Gu, Songxiang, Xu, Ying, Qin, Zhiwei
Similarity plays a fundamental role in many areas, including data mining, machine learning, statistics, and various applied domains. Inspired by the success of ensemble methods and the flexibility of trees, we propose to learn a similarity kernel called rpf-kernel through random projection forests (rpForests). Our theoretical analysis reveals a highly desirable property of rpf-kernel: far-away (dissimilar) points have a low similarity value while nearby (similar) points have a high similarity, and the similarities have a natural interpretation as the probability of points remaining in the same leaf nodes during the growth of rpForests. The learned rpf-kernel leads to an effective clustering algorithm, rpfCluster. On a wide variety of real and benchmark datasets, rpfCluster compares favorably to K-means clustering, spectral clustering, and a state-of-the-art clustering ensemble algorithm, Cluster Forests. Our approach is simple to implement and readily adapts to the geometry of the underlying data. Given its desirable theoretical properties and competitive empirical performance when applied to clustering, we expect rpf-kernel to be applicable to many problems of an unsupervised nature, or as a regularizer in some supervised or weakly supervised settings.
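A minimal sketch of an rpf-kernel-style similarity follows: grow an ensemble of random projection trees and estimate the similarity of two points as the fraction of trees in which they land in the same leaf. The splitting rule (random direction with a median cut) and the leaf size used here are simplifying assumptions rather than the paper's exact construction.

```python
# Minimal sketch: similarity as leaf co-occurrence frequency across an ensemble
# of random projection trees. Splitting rule and leaf size are assumptions.
import numpy as np

def leaf_ids(X, rng, max_leaf_size=10):
    """Assign each row of X a leaf id in one random projection tree."""
    ids = np.zeros(len(X), dtype=object)

    def split(indices, path):
        if len(indices) <= max_leaf_size:
            ids[indices] = path
            return
        direction = rng.normal(size=X.shape[1])   # random projection direction
        proj = X[indices] @ direction
        cut = np.median(proj)
        left, right = indices[proj <= cut], indices[proj > cut]
        if len(left) == 0 or len(right) == 0:     # degenerate split; stop here
            ids[indices] = path
            return
        split(left, path + "L")
        split(right, path + "R")

    split(np.arange(len(X)), "")
    return ids

def rpf_kernel(X, n_trees=50, seed=0):
    rng = np.random.default_rng(seed)
    K = np.zeros((len(X), len(X)))
    for _ in range(n_trees):
        ids = leaf_ids(X, rng)
        K += (ids[:, None] == ids[None, :]).astype(float)
    return K / n_trees   # similarity = fraction of trees sharing a leaf

K = rpf_kernel(np.random.default_rng(1).normal(size=(100, 5)))
```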
Learning over inherently distributed data
Yan, Donghui, Xu, Ying
The recent decades have seen a surge of interests in distributed computing. Existing work focus primarily on either distributed computing platforms, data query tools, or, algorithms to divide big data and conquer at individual machines etc. It is, however, increasingly often that the data of interest are inherently distributed, i.e., data are stored at multiple distributed sites due to diverse collection channels, business operations etc. We propose to enable learning and inference in such a setting via a general framework based on the distortion minimizing local transformations. This framework only requires a small amount of local signatures to be shared among distributed sites, eliminating the need of having to transmitting big data. Computation can be done very efficiently via parallel local computation. The error incurred due to distributed computing vanishes when increasing the size of local signatures. As the shared data need not be in their original form, data privacy may also be preserved. Experiments on linear (logistic) regression and Random Forests have shown promise of this approach. This framework is expected to apply to a general class of tools in learning and inference with the continuity property.