Fan, Xiang
Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
Fan, Xiang, Bhattad, Anand, Krishna, Ranjay
We introduce Videoshop, a training-free video editing algorithm for localized semantic edits. Videoshop allows users to use any editing software, including Photoshop and generative inpainting, to modify the first frame; it automatically propagates those changes, with semantically, spatially, and temporally consistent motion, to the remaining frames. Unlike existing methods that enable edits only through imprecise textual instructions, Videoshop allows users to add or remove objects, semantically change objects, insert stock photos into videos, etc., with fine-grained control over locations and appearance. We achieve this through image-based video editing: we invert latents with noise extrapolation and then generate videos conditioned on the edited image. Videoshop produces higher-quality edits than 6 baselines on 2 editing benchmarks, as measured by 10 evaluation metrics.
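To make the inversion idea above concrete, here is a minimal, self-contained sketch of DDIM-style latent inversion in which, after a few exact noise predictions, later noise estimates are linearly extrapolated rather than re-queried from the network. The toy denoiser, noise schedule, extrapolation rule, and tensor shapes are all illustrative assumptions, not the Videoshop implementation.

```python
# Illustrative sketch only: DDIM-style inversion where, after a few real noise
# predictions, later noise estimates are linearly extrapolated instead of
# re-querying the denoiser. `toy_denoiser` stands in for a video diffusion model.
import torch

def toy_denoiser(x, t):
    # Placeholder for a real noise-prediction network eps_theta(x, t).
    return 0.1 * x + 0.01 * t

def invert_with_noise_extrapolation(x0, alphas_cumprod, n_exact=3):
    """Map a clean latent x0 back to a noisy latent, extrapolating noise
    predictions after the first `n_exact` inversion steps."""
    x = x0
    eps_history = []
    T = len(alphas_cumprod)
    for t in range(T - 1):
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t + 1]
        if len(eps_history) < n_exact:
            eps = toy_denoiser(x, t)                     # exact prediction
        else:
            eps = 2 * eps_history[-1] - eps_history[-2]  # linear extrapolation
        eps_history.append(eps)
        # Predict the clean latent implied by eps, then step to the next noise level.
        x0_pred = (x - torch.sqrt(1 - a_t) * eps) / torch.sqrt(a_t)
        x = torch.sqrt(a_next) * x0_pred + torch.sqrt(1 - a_next) * eps
    return x

if __name__ == "__main__":
    latent = torch.randn(1, 4, 8, 32, 32)   # (batch, channels, frames, H, W)
    alphas = torch.linspace(0.99, 0.1, 50)  # toy decreasing noise schedule
    noisy = invert_with_noise_extrapolation(latent, alphas)
    print(noisy.shape)
```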
Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework
Liang, Paul Pu, Cheng, Yun, Fan, Xiang, Ling, Chun Kai, Nie, Suzanne, Chen, Richard, Deng, Zihao, Allen, Nicholas, Auerbach, Randy, Mahmood, Faisal, Salakhutdinov, Ruslan, Morency, Louis-Philippe
The recent explosion of interest in multimodal applications has resulted in a wide selection of datasets and methods for representing and integrating information from different modalities. Despite these empirical advances, fundamental research questions remain: How can we quantify the interactions that are necessary to solve a multimodal task? Subsequently, what are the most suitable multimodal models to capture these interactions? To answer these questions, we propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task. We term these three measures the PID statistics of a multimodal distribution (or PID for short), and introduce two new estimators for these PID statistics that scale to high-dimensional distributions. To validate PID estimation, we conduct extensive experiments both on synthetic datasets where the PID is known and on large-scale multimodal benchmarks where PID estimates are compared with human annotations. Finally, we demonstrate their usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies engaging with domain experts in pathology, mood prediction, and robotic perception, where our framework helps to recommend strong multimodal models for each application.
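For reference, the partial information decomposition behind these statistics splits the total mutual information between two modalities and the task into four non-negative terms; the standard consistency equations are shown below (the paper's particular redundancy definition and estimators are not reproduced here).

```latex
% Partial information decomposition of two modalities X_1, X_2 and task Y:
% total information splits into redundancy R, uniqueness U_1, U_2, and synergy S.
\begin{align}
  I(X_1, X_2; Y) &= R + U_1 + U_2 + S, \\
  I(X_1; Y)      &= R + U_1, \\
  I(X_2; Y)      &= R + U_2.
\end{align}
```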
Nano: Nested Human-in-the-Loop Reward Learning for Few-shot Language Model Control
Fan, Xiang, Lyu, Yiwei, Liang, Paul Pu, Salakhutdinov, Ruslan, Morency, Louis-Philippe
Pretrained language models have demonstrated extraordinary capabilities in language generation. However, real-world tasks often require controlling the distribution of generated text in order to mitigate bias, promote fairness, and achieve personalization. Existing techniques for controlling the distribution of generated text only work with quantified distributions, which require pre-defined categories, proportions of the distribution, or an existing corpus following the desired distributions. Yet many important distributions, such as personal preferences, are unquantified. In this work, we tackle the problem of generating text following arbitrary distributions (quantified and unquantified) by proposing Nano, a few-shot human-in-the-loop training algorithm that continuously learns from human feedback. Nano achieves state-of-the-art results on single-topic/attribute control as well as quantified distribution control compared to previous work. We also show that Nano is able to learn unquantified distributions, achieves personalization, and captures differences between individuals' personal preferences with high sample efficiency.
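As a rough illustration of the kind of feedback loop the abstract describes (not the Nano algorithm itself), the sketch below samples generations from a small language model, collects binary human feedback, and reinforces preferred samples with a weighted likelihood update; the choice of gpt2 via Hugging Face transformers, the prompt, and the feedback stub are assumptions.

```python
# Generic human-in-the-loop fine-tuning loop (not the Nano algorithm itself):
# sample text, collect binary human feedback, and upweight preferred samples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def collect_feedback(text):
    # Placeholder: in practice a human labels whether the sample matches
    # the desired (possibly unquantified) distribution.
    return float(input(f"Keep this sample? [1/0]\n{text}\n> "))

for rnd in range(3):  # a few feedback rounds
    prompt = tok("The movie was", return_tensors="pt")
    samples = model.generate(**prompt, do_sample=True, max_new_tokens=20,
                             num_return_sequences=4, pad_token_id=tok.eos_token_id)
    texts = [tok.decode(s, skip_special_tokens=True) for s in samples]
    weights = [collect_feedback(t) for t in texts]
    for text, w in zip(texts, weights):
        if w == 0:
            continue  # only reinforce samples the human prefers
        batch = tok(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss * w
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```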
High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning
Liang, Paul Pu, Lyu, Yiwei, Fan, Xiang, Tsaw, Jeffrey, Liu, Yudong, Mo, Shentong, Yogatama, Dani, Morency, Louis-Philippe, Salakhutdinov, Ruslan
Many real-world problems are inherently multimodal, from the spoken language, gestures, and paralinguistics humans use to communicate, to the force, proprioception, and visual sensors on robots. While there has been an explosion of interest in multimodal learning, existing methods focus on a small set of modalities, primarily language, vision, and audio. To accelerate generalization towards diverse and understudied modalities, this paper studies efficient representation learning for high-modality scenarios involving a large set of diverse modalities. Since adding a new model for every new modality becomes prohibitively expensive, a critical technical challenge is heterogeneity quantification: how can we measure which modalities encode similar information and interactions in order to permit parameter sharing with previous modalities? This paper proposes two new information-theoretic metrics for heterogeneity quantification: (1) modality heterogeneity studies how similar 2 modalities {X1,X2} are by measuring how much information can be transferred from X1 to X2, while (2) interaction heterogeneity studies how similarly pairs of modalities {X1,X2}, {X3,X4} interact by measuring how much information can be transferred from fusing {X1,X2} to {X3,X4}. We show the importance of these 2 proposed metrics as a way to automatically prioritize the fusion of modalities that contain unique information or interactions. The result is a single model, HighMMT, that scales up to 10 modalities (text, image, audio, video, sensors, proprioception, speech, time-series, sets, and tables) and 15 tasks from 5 research areas. Not only does HighMMT outperform prior methods on the tradeoff between performance and efficiency, but it also demonstrates a crucial scaling behavior: performance continues to improve with each modality added, and it transfers to entirely new modalities and tasks during fine-tuning.
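The sketch below gives one simple proxy for modality heterogeneity in the spirit of the metric described above, though it is not HighMMT's actual estimator: a probe trained on modality X1 is evaluated on modality X2, and the transferred loss stands in for how much usable information carries over. The synthetic data and probe architecture are assumptions.

```python
# Illustrative proxy for "modality heterogeneity" (not HighMMT's exact metric):
# train a probe on modality X1, transfer it to X2, and treat the transferred
# loss as a rough measure of how much usable information carries over.
import torch
import torch.nn as nn

def transfer_loss(x_src, y_src, x_tgt, y_tgt, steps=200):
    probe = nn.Sequential(nn.Linear(x_src.shape[1], 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
    ce = nn.CrossEntropyLoss()
    for _ in range(steps):        # fit the probe on the source modality
        opt.zero_grad()
        loss = ce(probe(x_src), y_src)
        loss.backward()
        opt.step()
    with torch.no_grad():         # evaluate the frozen probe on the target modality
        return ce(probe(x_tgt), y_tgt).item()

# Synthetic example: x2 is a noisy copy of x1, x3 is unrelated noise.
torch.manual_seed(0)
y = torch.randint(0, 2, (512,))
x1 = y.float().unsqueeze(1) + 0.1 * torch.randn(512, 8)
x2 = x1 + 0.1 * torch.randn(512, 8)  # shares information with x1
x3 = torch.randn(512, 8)             # shares little with x1

print("x1 -> x2 transfer loss:", transfer_loss(x1, y, x2, y))  # low: similar modalities
print("x1 -> x3 transfer loss:", transfer_loss(x1, y, x3, y))  # high: heterogeneous
```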
MultiZoo & MultiBench: A Standardized Toolkit for Multimodal Deep Learning
Liang, Paul Pu, Lyu, Yiwei, Fan, Xiang, Agarwal, Arav, Cheng, Yun, Morency, Louis-Philippe, Salakhutdinov, Ruslan
Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. To accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiZoo, a public toolkit consisting of standardized implementations of >20 core multimodal algorithms, and MultiBench, a large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. Together, these provide an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, we offer a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench paves the way towards a better understanding of the capabilities and limitations of multimodal models, while ensuring ease of use, accessibility, and reproducibility. Our toolkits are publicly available, will be regularly updated, and welcome input from the community.
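The modality-robustness evaluation mentioned above can be illustrated with a generic protocol sketch (this is not the MultiZoo/MultiBench API; all names are hypothetical): corrupt one modality at increasing noise levels and record the resulting accuracy, here with an untrained toy fusion model on random data purely to show the mechanics.

```python
# Generic sketch of a modality-robustness check (not the MultiZoo/MultiBench API):
# perturb one modality at increasing noise levels and record the accuracy drop.
import torch

def accuracy(model, x_a, x_b, y):
    with torch.no_grad():
        preds = model(torch.cat([x_a, x_b], dim=1)).argmax(dim=1)
    return (preds == y).float().mean().item()

def robustness_curve(model, x_a, x_b, y, noise_levels=(0.0, 0.5, 1.0, 2.0)):
    """Accuracy as modality A is corrupted with Gaussian noise of growing scale."""
    return {s: accuracy(model, x_a + s * torch.randn_like(x_a), x_b, y)
            for s in noise_levels}

# Toy late-fusion classifier over two 8-dim modalities, evaluated on random data.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 2))
x_a, x_b = torch.randn(256, 8), torch.randn(256, 8)
y = torch.randint(0, 2, (256,))
print(robustness_curve(model, x_a, x_b, y))
```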
Model Stealing Attack against Multi-Exit Networks
Pan, Li, Lv, Peizhuo, Chen, Kai, Cai, Yuling, Fan, Xiang, Zhang, Shengzhi
Compared to traditional neural networks with a single exit, a multi-exit network has multiple exits that allow early output from intermediate layers of the model, bringing significant improvements in computational efficiency while maintaining similar recognition accuracy. When attempting to steal such valuable models with traditional model stealing attacks, we found that conventional methods can only steal the model's classification function while failing to capture its output strategy. This results in a significant decrease in computational efficiency for the stolen substitute model, thereby losing the advantages of multi-exit networks. In this paper, we propose the first model stealing attack that extracts both the model function and the output strategy. We employ Bayesian changepoint detection to analyze the target model's output strategy and use a performance loss and a strategy loss to guide the training of the substitute model. Furthermore, we design a novel output strategy search algorithm that finds the optimal output strategy to maximize the consistency between the victim model's and the substitute model's outputs. Through experiments on multiple mainstream multi-exit networks and benchmark datasets, we thoroughly demonstrate the effectiveness of our method.
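To illustrate the two training signals described above, here is a hedged sketch (not the paper's implementation): a tiny two-exit substitute is trained with a performance loss that matches the victim's predicted labels and a strategy loss that matches which exit the victim used. The network, loss weighting, and the way exit choice is modeled are assumptions, and the Bayesian changepoint analysis of the victim's strategy is not reproduced.

```python
# Illustrative sketch of the two training signals (not the paper's implementation):
# a performance loss matching the victim's predicted labels and a strategy loss
# matching which exit the victim used.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMultiExitNet(nn.Module):
    """Two-exit toy network: each exit produces class logits."""
    def __init__(self, dim=16, classes=10):
        super().__init__()
        self.block1 = nn.Linear(dim, dim)
        self.exit1 = nn.Linear(dim, classes)
        self.block2 = nn.Linear(dim, dim)
        self.exit2 = nn.Linear(dim, classes)

    def forward(self, x):
        h1 = torch.relu(self.block1(x))
        h2 = torch.relu(self.block2(h1))
        return self.exit1(h1), self.exit2(h2)  # logits from both exits

def stealing_loss(sub_logits, victim_labels, victim_exit, exit_weight=1.0):
    """Performance loss on every exit + strategy loss on which exit fires."""
    perf = sum(F.cross_entropy(l, victim_labels) for l in sub_logits)
    # Treat per-exit confidences as logits over the exit index the substitute would pick.
    confidences = torch.stack([l.softmax(dim=1).max(dim=1).values for l in sub_logits], dim=1)
    strategy = F.cross_entropy(confidences, victim_exit)
    return perf + exit_weight * strategy

substitute = TinyMultiExitNet()
x = torch.randn(32, 16)
victim_labels = torch.randint(0, 10, (32,))  # labels obtained by querying the victim
victim_exit = torch.randint(0, 2, (32,))     # which exit the victim used (observed/estimated)
loss = stealing_loss(substitute(x), victim_labels, victim_exit)
loss.backward()
print(loss.item())
```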
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Liang, Paul Pu, Lyu, Yiwei, Fan, Xiang, Wu, Zetian, Cheng, Yun, Wu, Jason, Chen, Leslie, Wu, Peter, Lee, Michelle A., Zhu, Yuke, Salakhutdinov, Ruslan, Morency, Louis-Philippe
Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections. To accompany this benchmark, we also provide a standardized implementation of 20 core approaches in multimodal learning. Simply applying methods proposed in different research areas can improve the state-of-the-art performance on 9/15 datasets. Therefore, MultiBench represents a milestone in unifying disjoint efforts in multimodal research and paves the way towards a better understanding of the capabilities and limitations of multimodal models, all the while ensuring ease of use, accessibility, and reproducibility. MultiBench, our standardized code, and leaderboards are publicly available, will be regularly updated, and welcome input from the community.