AITopics | Li, Juncheng

Li, Juncheng

Sim-MEES: Modular End-Effector System Grasping Dataset for Mobile Manipulators in Cluttered Environments

Li, Juncheng, Cappelleri, David J.

arXiv.org Artificial Intelligence, May-17-2023

In this paper, we present Sim-MEES: a large-scale synthetic dataset that contains 1,550 objects with varying difficulty levels and physics properties, as well as 11 million grasp labels for mobile manipulators to plan grasps using different gripper modalities in cluttered environments. Our dataset generation process combines analytic models and dynamic simulations of the entire cluttered environment to provide accurate grasp labels. We provide a detailed study of our proposed labeling process for both parallel jaw grippers and suction cup grippers, comparing them with state-of-the-art methods to demonstrate how Sim-MEES can provide precise grasp labels in cluttered environments.

artificial intelligence, gripper, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2305.1058

Country: North America > United States > Indiana > Tippecanoe County (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)

DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention

Qin, Bosheng, Li, Juncheng, Tang, Siliang, Zhuang, Yueting

arXiv.org Artificial Intelligence, Nov-23-2022

Many studies have been conducted to improve the efficiency of the Transformer from quadratic to linear. Among them, the low-rank-based methods aim to learn projection matrices that compress the sequence length. However, the projection matrices are fixed once they have been learned, so they compress the sequence length with dedicated coefficients for tokens in the same position. Adopting such input-invariant projections ignores the fact that the most informative part of a sequence varies from sequence to sequence, thus failing to preserve the most useful information that lies in varied positions. In addition, previous efficient Transformers only focus on the influence of sequence length while neglecting the effect of hidden state dimension. To address the aforementioned problems, we present an efficient yet effective attention mechanism, namely the Dynamic Bilinear Low-Rank Attention (DBA), which compresses the sequence length by input-sensitive dynamic projection matrices and achieves linear time and space complexity by jointly optimizing the sequence length and hidden state dimension while maintaining state-of-the-art performance. Specifically, we first theoretically demonstrate that the sequence length can be compressed non-destructively from a novel perspective of information theory, with compression matrices dynamically determined by the input sequence. Furthermore, we show that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, optimizing the attention in bilinear form. Theoretical analysis shows that DBA is proficient in capturing high-order relations in cross-attention problems. Experiments over tasks with diverse sequence length conditions show that DBA achieves state-of-the-art performance compared with various strong baselines while consuming less memory and running faster.
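The contrast between fixed and input-sensitive compression can be illustrated with a minimal sketch. This is not the DBA algorithm from the paper: it assumes a toy scheme in which a k x n compression matrix is computed from the input through a softmax over hypothetical weights `w_p`, and it uses the same token matrix as queries, keys, and values. The function and variable names are illustrative assumptions, and plain Python lists are used to keep the example dependency-free.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a matrix."""
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def dynamic_low_rank_attention(x, w_p):
    """Toy attention whose compression matrix depends on the input.

    x   : n x d token matrix (used as queries, keys, and values here).
    w_p : d x k hypothetical projection weights (an assumption, not DBA's).
    """
    d = len(x[0])
    # p is k x n: each of the k rows is a softmax over the n positions,
    # so the mix of tokens kept in each slot changes with the input.
    p = softmax_rows(transpose(matmul(x, w_p)))
    k_small = matmul(p, x)  # k x d compressed keys
    v_small = matmul(p, x)  # k x d compressed values
    scores = matmul(x, transpose(k_small))  # n x k instead of n x n
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scores), v_small)  # n x d


random.seed(0)
n, d, k = 8, 4, 2
x = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
w_p = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]
out = dynamic_low_rank_attention(x, w_p)
print(len(out), len(out[0]))
```

Because the score matrix is n x k rather than n x n, the cost of this sketch grows linearly with the sequence length for a fixed k. A fixed (input-invariant) projection would correspond to replacing `p` with a constant matrix.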

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2211.16368

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning

Zhang, Wenqiao, Shi, Haochen, Guo, Jiannan, Zhang, Shengyu, Cai, Qingpeng, Li, Juncheng, Luo, Sihui, Zhuang, Yueting

arXiv.org Artificial Intelligence, Dec-13-2021

Text-based image captioning (TextCap) requires simultaneous comprehension of visual content and reading the text of images to generate a natural language description. Although this task can teach machines to further understand the complex human environment, given that text is omnipresent in our daily surroundings, it poses additional challenges beyond normal captioning. A text-based image intuitively contains abundant and complex multimodal relational content, that is, image details can be described diversely from multiple views rather than by a single caption. Certainly, we can introduce additional paired training data to show the diversity of images' descriptions, but this process is labor-intensive and time-consuming for TextCap pair annotations with extra texts. Based on the insight mentioned above, we investigate how to generate diverse captions that focus on different image parts using an unpaired training paradigm. We propose the Multimodal relAtional Graph adversarIal inferenCe (MAGIC) framework for diverse and unpaired TextCap. This framework can adaptively construct multiple multimodal relational graphs of images and model complex relationships among graphs to represent descriptive diversity. Moreover, a cascaded generative adversarial network is developed from modeled graphs to infer the unpaired caption generation in image-sentence feature alignment and linguistic coherence levels. We validate the effectiveness of MAGIC in generating diverse captions from different relational information items of an image. Experimental results show that MAGIC can generate very promising outcomes without using any image-caption training pairs.

caption, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2112.06558

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Revisiting Factorizing Aggregated Posterior in Learning Disentangled Representations

Cheng, Ze, Li, Juncheng, Wang, Chenxu, Gu, Jixuan, Xu, Hao, Li, Xinjian, Metze, Florian

arXiv.org Artificial Intelligence, Sep-12-2020

In the problem of learning disentangled representations, one of the promising methods is to factorize the aggregated posterior by penalizing the total correlation of sampled latent variables. However, this well-motivated strategy has a blind spot: there is a disparity between the sampled latent representation and its corresponding mean representation. In this paper, we provide a theoretical explanation that low total correlation of the sampled representation cannot guarantee low total correlation of the mean representation. Indeed, we prove that for multivariate normal distributions, a mean representation with arbitrarily high total correlation can have a corresponding sampled representation with bounded total correlation. We also propose a method to eliminate this disparity. Experiments show that our model can learn a mean representation with much lower total correlation, hence a factorized mean representation. Moreover, we offer a detailed explanation of the limitations of factorizing the aggregated posterior, namely factor disintegration. Our work indicates a potential direction for future research on disentangled learning.
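The disparity between sampled and mean representations can be checked numerically in a small Gaussian case. The sketch below is not the paper's proof or method. It assumes a 2-D mean representation that is itself normally distributed, with independent unit-variance sampling noise that is constant across inputs, and it uses the closed form for the total correlation of a multivariate normal, TC = 0.5 * (sum_i log Σ_ii - log det Σ). The values of `rho` and `scale` are arbitrary illustrative choices.

```python
import math


def gaussian_tc_2d(cov):
    """Total correlation of a 2-D Gaussian with covariance `cov`.

    For a multivariate normal, TC = 0.5 * (sum_i log cov[i][i] - log det cov).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    return 0.5 * (math.log(a) + math.log(d) - math.log(det))


def sampled_cov(mean_cov, noise_var):
    """Covariance of z = mu + sqrt(noise_var) * eps, with eps ~ N(0, I)."""
    (a, b), (c, d) = mean_cov
    return [[a + noise_var, b], [c, d + noise_var]]


rho = 0.99    # strong correlation between the two mean dimensions (assumed)
scale = 0.01  # small variance of the means (assumed)
mean_cov = [[scale, scale * rho], [scale * rho, scale]]

tc_mean = gaussian_tc_2d(mean_cov)
tc_sampled = gaussian_tc_2d(sampled_cov(mean_cov, noise_var=1.0))
print(f"TC(mean)    = {tc_mean:.4f}")
print(f"TC(sampled) = {tc_sampled:.6f}")
```

In this setting the sampling noise dominates the covariance of the sampled representation, so its total correlation is close to zero even though the mean representation is highly correlated. A penalty on the sampled total correlation alone would therefore see little left to reduce.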

artificial intelligence, machine learning, representation, (14 more...)

arXiv.org Artificial Intelligence

2009.05739

Country: Asia > China (0.14)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Walking with MIND: Mental Imagery eNhanceD Embodied QA

Li, Juncheng, Tang, Siliang, Wu, Fei, Zhuang, Yueting

arXiv.org Artificial Intelligence, Aug-5-2019

EmbodiedQA is the task of training an embodied agent to intelligently navigate a simulated environment and gather visual information to answer questions. Existing approaches fail to explicitly model the mental imagery function of the agent, even though mental imagery is crucial to embodied cognition and is closely related to many high-level meta-skills such as generalization and interpretation. In this paper, we propose a novel Mental Imagery eNhanceD (MIND) module for the embodied agent, as well as a relevant deep reinforcement framework for training. The MIND module can not only model the dynamics of the environment (e.g. 'what might happen if the agent passes through a door') but also help the agent to create a better understanding of the environment (e.g. 'The refrigerator is usually in the kitchen'). Such knowledge makes the agent a faster and better learner in locating a feasible policy with only a few trials. Furthermore, the MIND module can generate mental images that are treated as short-term subgoals by our proposed deep reinforcement framework. These mental images facilitate policy learning since short-term subgoals are easy to achieve and reusable. This yields better planning efficiency than other algorithms that learn a policy directly from primitive actions. Finally, the mental images visualize the agent's intentions in a way that humans can understand, and this endows our agent's actions with more interpretability. The experimental results and further analysis show that the agent with the MIND module is superior to its counterparts not only in EQA performance but also in many other aspects such as route planning, behavioral interpretation, and the ability to generalize from a few examples.

agent, deep learning, neural network, (17 more...)

arXiv.org Artificial Intelligence

1908.01482

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.75)

Adversarial camera stickers: A physical camera-based attack on deep learning systems

Li, Juncheng, Schmidt, Frank R., Kolter, J. Zico

arXiv.org Machine Learning, Apr-3-2019

Recent work has thoroughly documented the susceptibility of deep learning systems to adversarial examples, but most such instances directly manipulate the digital input to a classifier. Although a smaller line of work considers physical adversarial attacks, in all cases these involve manipulating the object of interest, e.g., putting a physical sticker on an object to misclassify it, or manufacturing an object specifically intended to be misclassified. In this work, we consider an alternative question: is it possible to fool deep classifiers, over all perceived objects of a certain type, by physically manipulating the camera itself? We show that this is indeed possible, that by placing a carefully crafted and mainly-translucent sticker over the lens of a camera, one can create universal perturbations of the observed images that are inconspicuous, yet reliably misclassify target objects as a different (targeted) class. To accomplish this, we propose an iterative procedure for both updating the attack perturbation (to make it adversarial for a given classifier), and the threat model itself (to ensure it is physically realizable). For example, we show that we can achieve physically-realizable attacks that fool ImageNet classifiers in a targeted fashion 49.6% of the time. This presents a new class of physically-realizable threat models to consider in the context of adversarially robust machine learning. Our demo video can be viewed at: https://youtu.be/wUVmL33Fx54
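How a translucent sticker on the lens could act as a universal perturbation can be sketched by alpha-blending one soft-edged dot over every image. This is only an illustration under stated assumptions, not the paper's threat model or its optimization procedure: the Gaussian falloff, the grayscale images, and all parameter names (`center`, `radius`, `color`, `max_alpha`) are hypothetical choices, and no classifier is attacked here.

```python
import math


def dot_alpha(i, j, center, radius, max_alpha):
    """Opacity of one soft-edged dot at pixel (i, j); near 0 far from the center."""
    dist2 = (i - center[0]) ** 2 + (j - center[1]) ** 2
    return max_alpha * math.exp(-dist2 / (radius ** 2))


def apply_sticker(image, center, radius, color, max_alpha):
    """Alpha-blend a translucent dot of `color` over a grayscale image.

    image : list of rows of floats in [0, 1].
    The same dot is applied to any image, which is what makes the
    perturbation "universal" in this toy setting.
    """
    out = []
    for i, row in enumerate(image):
        new_row = []
        for j, pixel in enumerate(row):
            a = dot_alpha(i, j, center, radius, max_alpha)
            new_row.append((1 - a) * pixel + a * color)
        out.append(new_row)
    return out


image = [[0.5] * 5 for _ in range(5)]
blended = apply_sticker(image, center=(2, 2), radius=1.5, color=1.0, max_alpha=0.3)
print(round(blended[2][2], 3), round(blended[0][0], 3))
```

An actual attack would additionally search over the dot parameters so that the blended images are misclassified, while restricting them to what a printed sticker can physically produce.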

deep learning, neural network, perturbation, (17 more...)

arXiv.org Machine Learning

1904.00759

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards Knowledge Oriented Intelligent Audio Analytics

Oltramari, Alessandro (Bosch Research and Technology Center, Pittsburgh (CR/RTC3.1-NA)) | Szurley, Joseph (Bosch Research and Technology Center, Pittsburgh (CR/RTC3.1-NA)) | Das, Samarjit (Bosch Research and Technology Center, Pittsburgh (CR/RTC3.1-NA)) | Francis, Jonathan (Bosch Research and Technology Center, Pittsburgh (CR/RTC3.1-NA), Carnegie Mellon University) | Li, Juncheng (Bosch Research and Technology Center, Pittsburgh (CR/RTC3.1-NA), Carnegie Mellon University)

AAAI Conferences, Apr-6-2018

In this position paper, we discuss the benefits of combining knowledge technologies and deep learning (DL) for audio analytics: knowledge can enable high-level reasoning, helping to scale up intelligent systems from sound recognition to event analysis. We also argue that a knowledge-integrated DL framework is key to enabling smart environments.

knowledge oriented intelligent audio analytic

AAAI Conferences

Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence (0.53)
