AITopics | Chang, Bo

Chang, Bo

EVOLvE: Evaluating and Optimizing LLMs For Exploration

Nie, Allen, Su, Yi, Chang, Bo, Lee, Jonathan N., Chi, Ed H., Le, Quoc V., Chen, Minmin

arXiv.org Artificial Intelligence, Oct-8-2024

Despite their success in many domains, large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. This is crucial as many real-world applications, ranging from personalized recommendations to healthcare interventions, demand that LLMs not only predict but also actively learn to make optimal decisions through exploration. In this work, we measure LLMs' (in)ability to make optimal decisions in bandits, a stateless reinforcement learning setting relevant to many applications. We develop a comprehensive suite of environments, including both context-free and contextual bandits with varying task difficulties, to benchmark LLMs' performance. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs: by providing explicit algorithm-guided support during inference; and through algorithm distillation via in-context demonstrations and fine-tuning, using synthetic data generated from these algorithms. Impressively, these techniques allow us to achieve superior exploration performance with smaller models, surpassing larger models on various tasks. We conduct an extensive ablation study to shed light on various factors, such as task difficulty and data representation, that influence the efficiency of LLM exploration. Additionally, we conduct a rigorous analysis of the LLM's exploration efficiency using the concept of regret, linking its ability to explore to the model size and underlying algorithm.
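As a rough illustration of the bandit setting and the regret measure mentioned in this abstract, the sketch below runs UCB1, a classical exploration algorithm, on a context-free Bernoulli bandit. It is not the paper's benchmark or code: the function name `run_ucb1`, the arm means, and the horizon are hypothetical choices made for this example.

```python
import math
import random


def run_ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit (illustrative sketch only).

    arm_means: hypothetical success probability of each arm.
    Returns (pull counts per arm, cumulative expected regret).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # number of pulls per arm
    sums = [0.0] * n_arms      # total observed reward per arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: pull every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound:
            # empirical mean plus an exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Expected regret: gap between the best arm and the chosen arm.
        regret += best_mean - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = run_ucb1([0.2, 0.5, 0.8], horizon=2000)
    print("pulls per arm:", counts)
    print("cumulative expected regret:", regret)
```

Regret here is measured against the best fixed arm, so a policy that explores efficiently accumulates regret slowly as the horizon grows.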

exploitation value, large language model, machine learning, (19 more...)

2410.06238

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry:

Media > Film (0.47)
Health & Medicine (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Latent User Intent Modeling for Sequential Recommenders

Chang, Bo, Karatzoglou, Alexandros, Wang, Yuyan, Xu, Can, Chi, Ed H., Chen, Minmin

arXiv.org Artificial Intelligence, Mar-27-2023

Sequential recommender models are essential components of modern industrial recommender systems. These models learn to predict the next items a user is likely to interact with based on their interaction history on the platform. Most sequential recommenders, however, lack a higher-level understanding of user intents, which often drive user behaviors online. Intent modeling is thus critical for understanding users and optimizing long-term user experience. We propose a probabilistic modeling approach and formulate user intent as latent variables, which are inferred based on user behavior signals using variational autoencoders (VAE). The recommendation policy is then adjusted accordingly given the inferred user intent. We demonstrate the effectiveness of the latent user intent modeling via offline analyses as well as live experiments on a large-scale industrial recommendation platform.

artificial intelligence, machine learning, user intent, (13 more...)

doi: 10.1145/3543873.3584641

2211.09832

Country: North America > United States (0.49)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

CopulaGNN: Towards Integrating Representational and Correlational Roles of Graphs in Graph Neural Networks

Ma, Jiaqi, Chang, Bo, Zhang, Xuefei, Mei, Qiaozhu

arXiv.org Machine Learning, Oct-5-2020

Graphs, as flexible data representations that store rich relational information, have been commonly used in data science tasks. However, graphs encode diverse types of information and thus play different roles in data representation. In this paper, we distinguish the representational and the correlational roles played by the graphs in node-level prediction tasks, and we investigate how Graph Neural Network (GNN) models can effectively leverage both types of information. Conceptually, the representational information provides guidance for the model to construct better node features, while the correlational information indicates the correlation between node outcomes conditional on node features. Through a simulation study, we find that many popular GNN models are incapable of effectively utilizing the correlational information. By leveraging the idea of the copula, a principled way to describe the dependence among multivariate random variables, we offer a general solution. The proposed Copula Graph Neural Network (CopulaGNN) can take a wide range of GNN models as base models and utilize both representational and correlational information stored in the graphs. Experimental results on two types of regression tasks verify the effectiveness of the proposed method.

artificial intelligence, graph, neural network, (19 more...)

2010.02089

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Point Process Flows

Mehrasa, Nazanin, Deng, Ruizhi, Ahmed, Mohamed Osama, Chang, Bo, He, Jiawei, Durand, Thibaut, Brubaker, Marcus, Mori, Greg

arXiv.org Machine Learning, Oct-18-2019

Event sequences can be modeled by temporal point processes (TPPs) to capture their asynchronous and probabilistic nature. We contribute an intensity-free framework that directly models the point process as a non-parametric distribution by utilizing normalizing flows. This approach is capable of capturing highly complex temporal distributions and does not rely on restrictive parametric forms. Comparisons with state-of-the-art baseline models on both synthetic and challenging real-life datasets show that the proposed framework is effective at modeling the stochasticity of discrete event sequences.

dataset, deep learning, neural network, (18 more...)

1910.08281

Genre: Research Report (0.50)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Communications > Social Media (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks

Chang, Bo, Chen, Minmin, Haber, Eldad, Chi, Ed H.

arXiv.org Machine Learning, Feb-25-2019

Recurrent neural networks have gained widespread use in modeling sequential data. Learning long-term dependencies with these models remains difficult, however, because of exploding or vanishing gradients. In this paper, we draw connections between recurrent networks and ordinary differential equations. A special form of recurrent networks called the AntisymmetricRNN is proposed under this theoretical framework, which is able to capture long-term dependencies thanks to the stability property of its underlying differential equation. Existing approaches to improving RNN trainability often incur significant computational overhead. In comparison, AntisymmetricRNN achieves the same goal by design. AntisymmetricRNN exhibits much more predictable dynamics. It outperforms regular LSTM models on tasks requiring long-term memory and matches their performance on tasks where short-term dependencies dominate, despite being much simpler. Modeling complex temporal dependencies in sequential data using RNNs, especially the long-term dependencies, remains an open challenge.
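To make the link between recurrent updates and differential equations more concrete, the sketch below shows one way to write a recurrent step as a forward Euler discretization of an ODE whose hidden-to-hidden matrix is antisymmetric, with a small diagonal damping term. This is a minimal reading of the idea in the abstract, not the authors' implementation: the function name `antisymmetric_step`, the step size `eps`, and the damping `gamma` are assumptions of this example.

```python
import math


def antisymmetric_step(h, x, W, V, b, eps=0.1, gamma=0.01):
    """One recurrent step with an antisymmetric hidden-to-hidden matrix.

    Computes h_new = h + eps * tanh((W - W^T - gamma * I) h + V x + b),
    i.e. a forward Euler step of size eps (illustrative sketch only).

    h: hidden state, list of n floats
    x: input vector, list of m floats
    W: n x n weight matrix (any values; only W - W^T is used)
    V: n x m input weight matrix
    b: bias, list of n floats
    """
    n = len(h)
    new_h = []
    for i in range(n):
        pre = b[i]
        for j in range(n):
            # Entry (i, j) of the antisymmetric matrix with diagonal damping.
            m_ij = W[i][j] - W[j][i] - (gamma if i == j else 0.0)
            pre += m_ij * h[j]
        for k in range(len(x)):
            pre += V[i][k] * x[k]
        new_h.append(h[i] + eps * math.tanh(pre))
    return new_h


if __name__ == "__main__":
    h = [0.5, -0.25]
    W = [[0.0, 1.0], [0.3, 0.0]]
    V = [[1.0], [0.5]]
    b = [0.0, 0.0]
    for x_t in ([1.0], [0.0], [-1.0]):
        h = antisymmetric_step(h, x_t, W, V, b)
    print(h)
```

Because `W - W^T` is antisymmetric, its eigenvalues are purely imaginary, which is the property the stability argument relies on; the `gamma` term shifts them slightly into the left half-plane.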

deep learning, matrix, neural network, (18 more...)

1902.09689

Country: North America > Canada > British Columbia (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs

Gilboa, Dar, Chang, Bo, Chen, Minmin, Yang, Greg, Schoenholz, Samuel S., Chi, Ed H., Pennington, Jeffrey

arXiv.org Machine Learning, Jan-25-2019

Training recurrent neural networks (RNNs) on long sequence tasks is plagued with difficulties arising from the exponential explosion or vanishing of signals as they propagate forward or backward through the network. Many techniques have been proposed to ameliorate these issues, including various algorithmic and architectural modifications. Two of the most successful RNN architectures, the LSTM and the GRU, do exhibit modest improvements over vanilla RNN cells, but they still suffer from instabilities when trained on very long sequences. In this work, we develop a mean field theory of signal propagation in LSTMs and GRUs that enables us to calculate the time scales for signal propagation as well as the spectral properties of the state-to-state Jacobians. By optimizing these quantities in terms of the initialization hyperparameters, we derive a novel initialization scheme that eliminates or reduces training instabilities. We demonstrate the efficacy of our initialization scheme on multiple sequence tasks, on which it enables successful training while a standard initialization either fails completely or is orders of magnitude slower. We also observe a beneficial effect on generalization performance using this new initialization.

deep learning, mean field theory, neural network, (15 more...)

1901.08987

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Where and When to Look? Spatio-temporal Attention for Action Recognition in Videos

Meng, Lili, Zhao, Bo, Chang, Bo, Huang, Gao, Tung, Frederick, Sigal, Leonid

arXiv.org Machine Learning, Oct-1-2018

Inspired by the observation that humans are able to process videos efficiently by only paying attention when and where it is needed, we propose a novel spatio-temporal attention mechanism for video-based action recognition. For spatial attention, we learn a saliency mask to allow the model to focus on the most salient parts of the feature maps. For temporal attention, we employ a soft temporal attention mechanism to identify the most relevant frames from an input video. Further, we propose a set of regularizers that ensure that our attention mechanism attends to coherent regions in space and time. Our model is efficient, as it proposes a separable spatio-temporal mechanism for video attention, while being able to identify important parts of the video both spatially and temporally. We demonstrate the efficacy of our approach on three public video action recognition datasets. The proposed approach leads to state-of-the-art performance on all of them, including the new large-scale Moments in Time dataset. Furthermore, we quantitatively and qualitatively evaluate our model's ability to accurately localize discriminative regions spatially and critical frames temporally, even though the model is trained only with per-video classification labels.
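The soft temporal attention described in this abstract can be pictured, in its most generic form, as a softmax over per-frame relevance scores followed by a weighted average of frame features. The sketch below shows only that generic pattern; how the paper computes the scores, and its regularizers, are not reproduced here. The names `softmax` and `temporal_attention_pool` and the toy inputs are assumptions of this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, frame_scores):
    """Pool per-frame feature vectors using soft attention weights.

    frame_features: list of T feature vectors (each a list of D floats)
    frame_scores:   list of T relevance scores (hypothetical inputs; in a
                    real model these would come from a learned scorer)
    Returns (attention weights, pooled feature vector).
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, pooled


if __name__ == "__main__":
    features = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
    scores = [0.1, 2.0, -1.0]
    weights, pooled = temporal_attention_pool(features, scores)
    print("weights:", weights)
    print("pooled:", pooled)
```

Because the weights are soft rather than a hard frame selection, the pooling stays differentiable, which is what lets such a mechanism be trained from video-level labels alone.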

attention mechanism, deep learning, neural network, (19 more...)

1810.04511

Country: North America (0.14)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Reversible Architectures for Arbitrarily Deep Residual Neural Networks

Chang, Bo (University of British Columbia, Xtract Technologies Inc.) | Meng, Lili (University of British Columbia, Xtract Technologies Inc.) | Haber, Eldad (University of British Columbia, Xtract Technologies Inc.) | Ruthotto, Lars (Emory University, Xtract Technologies Inc.) | Begert, David (Xtract Technologies Inc.) | Holtham, Elliot (Xtract Technologies Inc.)

AAAI Conferences, Feb-8-2018

Recently, deep residual networks have been successfully applied in many computer vision and natural language processing tasks, pushing the state-of-the-art performance with deeper and wider architectures. In this work, we interpret deep residual networks as ordinary differential equations (ODEs), which have long been studied in mathematics and physics with rich theoretical and empirical success. From this interpretation, we develop a theoretical framework on stability and reversibility of deep neural networks, and derive three reversible neural network architectures that can go arbitrarily deep in theory. The reversibility property allows a memory-efficient implementation, which does not need to store the activations for most hidden layers. Together with the stability of our architectures, this enables training deeper networks using only modest computational resources. We provide both theoretical analyses and empirical results. Experimental results demonstrate the efficacy of our architectures against several strong baselines on CIFAR-10, CIFAR-100 and STL-10 with superior or on-par state-of-the-art performance. Furthermore, we show our architectures yield superior results when trained using less training data.
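The memory saving from reversibility can be seen in a small example: if a block's inputs can be recomputed exactly from its outputs, the activations need not be stored for the backward pass. The sketch below uses a generic additive-coupling block to show this; it is not one of the three architectures derived in the paper, and the sub-functions `f` and `g` as well as the names `forward` and `inverse` are placeholders chosen for illustration.

```python
import math


def f(v):
    """Placeholder residual function (stands in for a learned layer)."""
    return [math.tanh(0.5 * a) for a in v]


def g(v):
    """Second placeholder residual function."""
    return [math.sin(a) for a in v]


def forward(x1, x2):
    """Additive-coupling block: the input is split into two halves."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def inverse(y1, y2):
    """Recover the block's inputs from its outputs, undoing forward()."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


if __name__ == "__main__":
    x1, x2 = [0.3, -1.2], [2.0, 0.7]
    y1, y2 = forward(x1, x2)
    r1, r2 = inverse(y1, y2)
    print("reconstructed:", r1, r2)
```

Note that `f` and `g` themselves never need to be inverted: the block is reversible by construction, up to floating-point rounding, whatever those functions are.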

architecture, deep learning, neural network, (18 more...)

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > Canada > British Columbia (0.14)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Multi-level Residual Networks from Dynamical Systems View

Chang, Bo, Meng, Lili, Haber, Eldad, Tung, Frederick, Begert, David

arXiv.org Machine Learning, Feb-1-2018

Deep residual networks (ResNets) and their variants are widely used in many computer vision applications and natural language processing tasks. However, the theoretical principles for designing and training ResNets are still not fully understood. Recently, several points of view have emerged to interpret ResNets theoretically, such as the unraveled view, unrolled iterative estimation, and the dynamical systems view. In this paper, we adopt the dynamical systems point of view, and analyze the lesioning properties of ResNets both theoretically and experimentally. Based on these analyses, we additionally propose a novel method for accelerating ResNet training. We apply the proposed method to train ResNets and Wide ResNets for three image classification benchmarks, reducing training time by more than 40% with superior or on-par accuracy.

deep learning, neural network, resnet, (19 more...)

1710.10348

Country: North America > Canada > British Columbia > Metro Vancouver Regional District (0.14)

Genre: Research Report (0.84)

Technology: