Wang, Kaixin
$\text{M}^{\text{3}}$: A Modular World Model over Streams of Tokens
Cohen, Lior, Wang, Kaixin, Kang, Bingyi, Gadot, Uri, Mannor, Shie
Token-based world models emerged as a promising modular framework, modeling dynamics over token streams while optimizing tokenization separately. While successful in visual environments with discrete actions (e.g., Atari games), their broader applicability remains uncertain. In this paper, we introduce $\text{M}^{\text{3}}$, a $\textbf{m}$odular $\textbf{w}$orld $\textbf{m}$odel that extends this framework, enabling flexible combinations of observation and action modalities through independent modality-specific components. $\text{M}^{\text{3}}$ integrates several improvements from existing literature to enhance agent performance. Through extensive empirical evaluation across diverse benchmarks, $\text{M}^{\text{3}}$ achieves state-of-the-art sample efficiency for planning-free world models. Notably, among these methods, it is the first to reach a human-level median score on Atari 100K, with superhuman performance on 13 games. We $\href{https://github.com/leor-c/M3}{\text{open-source our code and weights}}$.
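To make the modular structure concrete, here is a minimal PyTorch sketch of a token-based world model with modality-specific embeddings feeding a shared sequence backbone; all names, shapes, and the backbone choice are illustrative assumptions, not $\text{M}^{\text{3}}$'s actual implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: modality-specific components map observations and
# actions to token embeddings, and a shared sequence model predicts the next
# observation tokens and reward. Not M3's actual architecture.
class ModularWorldModel(nn.Module):
    def __init__(self, obs_vocab=512, act_vocab=18, d_model=256, n_layers=4):
        super().__init__()
        self.obs_embed = nn.Embedding(obs_vocab, d_model)  # observation-token embedding
        self.act_embed = nn.Embedding(act_vocab, d_model)  # action-token embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.obs_head = nn.Linear(d_model, obs_vocab)       # next-observation token logits
        self.reward_head = nn.Linear(d_model, 1)            # scalar reward prediction

    def forward(self, obs_tokens, act_tokens):
        # obs_tokens: (B, T_obs) ids, act_tokens: (B, T_act) ids
        x = torch.cat([self.obs_embed(obs_tokens), self.act_embed(act_tokens)], dim=1)
        h = self.backbone(x)
        return self.obs_head(h), self.reward_head(h[:, -1])

model = ModularWorldModel()
obs = torch.randint(0, 512, (2, 16))   # e.g., 16 tokens per observation
act = torch.randint(0, 18, (2, 1))     # one discrete-action token
next_obs_logits, reward = model(obs, act)
```

The modularity lies in the fact that the per-modality embeddings (and, in practice, the tokenizers that produce the token ids) can be swapped independently while the dynamics backbone stays unchanged.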
Milmer: a Framework for Multiple Instance Learning based Multimodal Emotion Recognition
Wang, Zaitian, He, Jian, Liang, Yu, Hu, Xiyuan, Peng, Tianhao, Wang, Kaixin, Wang, Jiakai, Zhang, Chenlong, Zhang, Weili, Niu, Shuang, Xie, Xiaoyang
Emotions play a crucial role in human behavior and decision-making, making emotion recognition a key area of interest in human-computer interaction (HCI). This study addresses the challenges of emotion recognition by integrating facial expression analysis with electroencephalogram (EEG) signals, introducing Milmer, a novel multimodal framework. The proposed framework employs a transformer-based fusion approach to effectively integrate visual and physiological modalities. It consists of an EEG preprocessing module, a facial feature extraction and balancing module, and a cross-modal fusion module. To enhance visual feature extraction, we fine-tune a pre-trained Swin Transformer on emotion-related datasets. Additionally, a cross-attention mechanism is introduced to balance token representation across modalities, ensuring effective feature integration. A key innovation of this work is the adoption of a multiple instance learning (MIL) approach, which extracts meaningful information from multiple facial expression images over time, capturing critical temporal dynamics often overlooked in previous studies. Extensive experiments conducted on the DEAP dataset demonstrate the superiority of the proposed framework, achieving a classification accuracy of 96.72% in the four-class emotion recognition task. Ablation studies further validate the contributions of each module, highlighting the significance of advanced feature extraction and fusion strategies in enhancing emotion recognition performance. Our code is available at https://github.com/liangyubuaa/Milmer.
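As a rough illustration of the cross-attention fusion idea, the following PyTorch sketch lets EEG tokens attend to a bag of facial-expression tokens; dimensions, pooling, and layer choices are assumptions, not Milmer's actual architecture.

```python
import torch
import torch.nn as nn

# Minimal sketch of cross-modal fusion via cross-attention: EEG tokens (queries)
# attend to facial-expression tokens (keys/values). Illustrative only.
class CrossModalFusion(nn.Module):
    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.classifier = nn.Linear(d_model, 4)  # four-class emotion output

    def forward(self, eeg_tokens, face_tokens):
        # eeg_tokens: (B, N_eeg, d), face_tokens: (B, N_face, d)
        fused, _ = self.cross_attn(query=eeg_tokens, key=face_tokens, value=face_tokens)
        fused = self.norm(fused + eeg_tokens)          # residual connection
        return self.classifier(fused.mean(dim=1))      # pool tokens, then classify

fusion = CrossModalFusion()
eeg = torch.randn(8, 32, 128)    # e.g., 32 EEG tokens per trial
faces = torch.randn(8, 10, 128)  # e.g., a MIL bag of 10 face-image embeddings
logits = fusion(eeg, faces)      # (8, 4)
```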
How Far is Video Generation from World Model: A Physical Law Perspective
Kang, Bingyi, Yue, Yang, Lu, Rui, Lin, Zhijie, Zhao, Yang, Wang, Kaixin, Huang, Gao, Feng, Jiashi
OpenAI's Sora highlights the potential of video generation for developing world models that adhere to fundamental physical laws. However, the ability of video generation models to discover such laws purely from visual data, without human priors, remains an open question. A world model that learns the true law should give predictions robust to nuances and extrapolate correctly to unseen scenarios. In this work, we evaluate generalization across three key scenarios: in-distribution, out-of-distribution, and combinatorial generalization. We developed a 2D simulation testbed for object movement and collisions to generate videos deterministically governed by one or more classical mechanics laws. This provides an unlimited supply of data for large-scale experimentation and enables quantitative evaluation of whether the generated videos adhere to physical laws. We trained diffusion-based video generation models to predict object movements based on initial frames. Our scaling experiments show perfect generalization within the distribution, measurable scaling behavior for combinatorial generalization, but failure in out-of-distribution scenarios. Further experiments reveal two key insights about the generalization mechanisms of these models: (1) the models fail to abstract general physical rules and instead exhibit "case-based" generalization behavior, i.e., mimicking the closest training example; (2) when generalizing to new cases, models are observed to prioritize different factors when referencing training data: color > size > velocity > shape. Our study suggests that scaling alone is insufficient for video generation models to uncover fundamental physical laws, despite its role in Sora's broader success. See our project page at https://phyworld.github.io
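For intuition, a deterministic testbed of this kind can be as simple as the following toy 1D two-ball simulation (illustrative only; the paper's testbed is 2D and more varied), from which frames would be rendered to train and probe video models.

```python
import numpy as np

# Toy sketch in the spirit of the described testbed: deterministic motion of two
# balls with an elastic collision. Parameters are illustrative only.
def simulate(x1, v1, m1, x2, v2, m2, steps=64, dt=0.1, radius=0.5):
    traj = []
    for _ in range(steps):
        # elastic collision when the balls overlap and are approaching each other
        if abs(x1 - x2) <= 2 * radius and (v1 - v2) * (x2 - x1) > 0:
            v1, v2 = (
                ((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2),
                ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2),
            )
        x1, x2 = x1 + v1 * dt, x2 + v2 * dt
        traj.append((x1, x2))
    return np.array(traj)

# In-distribution vs. out-of-distribution splits are controlled by the sampled
# initial velocities, masses, and sizes.
trajectory = simulate(x1=0.0, v1=2.0, m1=1.0, x2=5.0, v2=0.0, m2=1.0)
```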
Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG
Du, Xueying, Zheng, Geng, Wang, Kaixin, Feng, Jiayi, Deng, Wentai, Liu, Mingwei, Chen, Bihuan, Peng, Xin, Ma, Tao, Lou, Yiling
Vulnerability detection is essential for software quality assurance. In recent years, deep learning models (especially large language models) have shown promise in vulnerability detection. In this work, we propose Vul-RAG, a novel LLM-based vulnerability detection technique that leverages a knowledge-level retrieval-augmented generation (RAG) framework to detect vulnerabilities in the given code in three phases. First, Vul-RAG constructs a vulnerability knowledge base by extracting multi-dimensional knowledge via LLMs from existing CVE instances; second, for a given code snippet, Vul-RAG retrieves the relevant vulnerability knowledge from the constructed knowledge base based on functional semantics; third, Vul-RAG leverages LLMs to check the vulnerability of the given code snippet by reasoning about the presence of the vulnerability causes and fixing solutions described in the retrieved vulnerability knowledge. Our evaluation of Vul-RAG on our constructed benchmark PairVul shows that Vul-RAG substantially outperforms all baselines, with 12.96%/110% relative improvements in accuracy/pairwise accuracy. In addition, our user study shows that the vulnerability knowledge generated by Vul-RAG can serve as high-quality explanations, improving manual detection accuracy from 0.60 to 0.77.
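A hedged sketch of the retrieval-and-reasoning phases might look as follows; `llm`, `embed`, and `knowledge_base` are generic stand-ins rather than Vul-RAG's actual interfaces, and the knowledge-base construction phase is omitted.

```python
# Hypothetical sketch of the knowledge-level RAG flow described above.
def detect_vulnerability(code_snippet, knowledge_base, llm, embed, top_k=3):
    # Phase 2: retrieve knowledge entries with similar functional semantics.
    summary = llm(f"Summarize the functional semantics of this code:\n{code_snippet}")
    query_vec = embed(summary)
    retrieved = knowledge_base.search(query_vec, top_k=top_k)

    # Phase 3: reason about whether a known vulnerability cause applies and
    # whether the corresponding fixing solution is absent from the code.
    verdicts = []
    for entry in retrieved:
        prompt = (
            f"Code:\n{code_snippet}\n\n"
            f"Known vulnerability cause: {entry['cause']}\n"
            f"Known fixing solution: {entry['fix']}\n"
            "Is the cause present in the code and the fix absent? Answer yes/no."
        )
        verdicts.append("yes" in llm(prompt).lower())
    return any(verdicts)
```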
Improving Token-Based World Models with Parallel Observation Prediction
Cohen, Lior, Wang, Kaixin, Kang, Bingyi, Mannor, Shie
Motivated by the success of Transformers when applied to sequences of discrete symbols, token-based world models (TBWMs) were recently proposed as sample-efficient methods. In TBWMs, the world model consumes agent experience as a language-like sequence of tokens, where each observation constitutes a sub-sequence. However, during imagination, the sequential token-by-token generation of next observations results in a severe bottleneck, leading to long training times, poor GPU utilization, and limited representations. To resolve this bottleneck, we devise a novel Parallel Observation Prediction (POP) mechanism. POP augments a Retentive Network (RetNet) with a novel forward mode tailored to our reinforcement learning setting. We incorporate POP in a novel TBWM agent named REM (Retentive Environment Model), showcasing a 15.4x faster imagination compared to prior TBWMs. REM attains superhuman performance on 12 out of 26 games of the Atari 100K benchmark, while training in less than 12 hours. Our code is available at \url{https://github.com/leor-c/REM}.
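The following toy sketch conveys only the parallel-prediction idea: learned placeholder queries attend to the history so that all K observation tokens are predicted in a single forward pass, instead of K sequential passes. It is not REM's RetNet-based POP mechanism.

```python
import torch
import torch.nn as nn

# Conceptual sketch of the bottleneck POP addresses: autoregressive generation of
# the K tokens of the next observation needs K forward passes, whereas a parallel
# head emits all K token distributions at once. Illustrative only.
class ParallelObservationHead(nn.Module):
    def __init__(self, d_model=256, vocab=512, k_tokens=16):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(k_tokens, d_model))  # learned placeholder queries
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.proj = nn.Linear(d_model, vocab)

    def forward(self, history):
        # history: (B, T, d) summary of past observation/action tokens
        q = self.queries.unsqueeze(0).expand(history.size(0), -1, -1)
        out, _ = self.attn(q, history, history)  # all K positions attend to history at once
        return self.proj(out)                    # (B, K, vocab): one pass, K token distributions

head = ParallelObservationHead()
logits = head(torch.randn(4, 40, 256))  # (4, 16, 512)
```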
C-Procgen: Empowering Procgen with Controllable Contexts
Tan, Zhenxiong, Wang, Kaixin, Wang, Xinchao
C-Procgen provides access to over 200 unique game contexts across 16 games. It allows for detailed configuration of environments, ranging from game mechanics to agent attributes. This makes the procedural generation process, previously a black-box in Procgen, more transparent and adaptable for various research needs. The upgrade enhances dynamic context management and individualized assignments, while maintaining computational efficiency. C-Procgen's controllable contexts make it applicable in diverse reinforcement learning research areas, such as learning dynamics analysis, curriculum learning, and transfer learning. We believe that C-Procgen will fill a gap in the current literature and offer a valuable toolkit for future works.
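As a usage sketch, controllable contexts naturally support curricula; the context keys and the commented-out constructor below are hypothetical assumptions, not C-Procgen's documented API.

```python
# Hypothetical sketch of per-environment context assignment for curriculum
# learning; keys and constructor name are illustrative, not C-Procgen's API.
curriculum = [
    {"maze_size": size, "num_enemies": min(size // 4, 3)}
    for size in (5, 7, 9, 11, 13)  # progressively harder contexts
]

for stage, context in enumerate(curriculum):
    # env = make_context_env("maze", context)  # assumed constructor; train the agent here
    print(f"stage {stage}: context = {context}")
```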
Policy Gradient for Reinforcement Learning with General Utilities
Kumar, Navdeep, Wang, Kaixin, Levy, Kfir, Mannor, Shie
In Reinforcement Learning (RL), the goal of agents is to discover an optimal policy that maximizes the expected cumulative reward. This objective may also be viewed as finding a policy that optimizes a linear function of its state-action occupancy measure, hereafter referred to as Linear RL. However, many supervised and unsupervised RL problems are not covered by the Linear RL framework, such as apprenticeship learning, pure exploration, and variational intrinsic control, where the objectives are non-linear functions of the occupancy measures. RL with non-linear utilities is unwieldy, as tools such as the Bellman equation, value iteration, policy gradient, and dynamic programming, which had tremendous success in Linear RL, fail to generalize trivially. In this paper, we derive the policy gradient theorem for RL with general utilities. The policy gradient theorem is a cornerstone of Linear RL due to its elegance and ease of implementation, and our policy gradient theorem for RL with general utilities shares the same elegance and ease of implementation. Based on the derived policy gradient theorem, we also present a simple sample-based algorithm. We believe our results will be of interest to the community and offer inspiration to future works in this generalized setting.
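To convey the flavor of such a result (in our own notation and under standard regularity assumptions, not necessarily the paper's exact statement): for a smooth utility $F$ of the discounted occupancy measure $\lambda^{\pi_\theta}$, the chain rule yields a gradient with the form of the standard policy gradient, where the reward is replaced by the utility's derivative,
\[
\nabla_\theta F\big(\lambda^{\pi_\theta}\big)
= \big(\nabla_\theta \lambda^{\pi_\theta}\big)^{\top} \nabla_\lambda F\big(\lambda^{\pi_\theta}\big)
= \mathbb{E}_{\pi_\theta}\!\left[\sum_{t \ge 0} \gamma^{t}\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q^{\pi_\theta}_{r_\lambda}(s_t, a_t)\right],
\qquad
r_\lambda(s,a) := \frac{\partial F(\lambda)}{\partial \lambda(s,a)}\bigg|_{\lambda = \lambda^{\pi_\theta}},
\]
so the linear case $F(\lambda) = \langle \lambda, r \rangle$ recovers the classical policy gradient theorem.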
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation
Du, Xueying, Liu, Mingwei, Wang, Kaixin, Wang, Hanlin, Liu, Junwei, Chen, Yixuan, Feng, Jiayi, Sha, Chaofeng, Peng, Xin, Lou, Yiling
In this work, we make the first attempt to evaluate LLMs in a more challenging code generation scenario, i.e., class-level code generation. We first manually construct ClassEval, the first class-level code generation benchmark, comprising 100 class-level Python code generation tasks built with approximately 500 person-hours. Based on it, we then perform the first study of 11 state-of-the-art LLMs on class-level code generation. From our results, we draw the following main findings. First, all existing LLMs perform much worse on class-level code generation than on standalone method-level code generation benchmarks such as HumanEval, and method-level coding ability does not equivalently reflect class-level coding ability among LLMs. Second, GPT-4 and GPT-3.5 still exhibit dominant performance compared to other LLMs on class-level code generation, and the second-tier models include Instruct-Starcoder, Instruct-Codegen, and Wizardcoder, with very similar performance. Third, generating the entire class at once (i.e., the holistic generation strategy) is the best strategy only for GPT-4 and GPT-3.5, while method-by-method generation (i.e., the incremental and compositional strategies) works better for the other models, which have limited ability to understand long instructions and utilize intermediate information. Lastly, we find that models have limited ability to generate method-dependent code, and we discuss the frequent error types in the generated classes. Our benchmark is available at https://github.com/FudanSELab/ClassEval.
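For concreteness, the holistic and incremental strategies can be contrasted with the following hedged sketch; `llm` is a stand-in for any completion function and the prompts are illustrative, not ClassEval's evaluation harness (the compositional strategy generates methods independently and then assembles them).

```python
# Illustrative sketch of two class-level generation strategies.
def holistic_generation(llm, class_skeleton):
    # one prompt, the entire class generated at once
    return llm(f"Complete the whole class:\n{class_skeleton}")

def incremental_generation(llm, class_skeleton, method_signatures):
    # method-by-method: each prompt includes the code generated so far
    code = class_skeleton
    for signature in method_signatures:
        code += "\n" + llm(f"Given the class so far:\n{code}\nImplement:\n{signature}")
    return code
```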
Robust Reinforcement Learning via Adversarial Kernel Approximation
Wang, Kaixin, Gadot, Uri, Kumar, Navdeep, Levy, Kfir, Mannor, Shie
In reinforcement learning (RL), we are concerned with learning good policies for sequential decision-making problems modeled as Markov Decision Processes (MDPs) [29, 35]. MDPs assume that the transition model of the environment is fixed across training and testing, but this assumption is often violated in practical applications. For example, when deploying a simulator-trained robot in the real world, a notable challenge is the substantial disparity between the simulated environment and the intricate complexities of reality, which can lead to subpar performance upon deployment. Such a mismatch may significantly degrade the performance of the trained policy at test time. To deal with this issue, the robust MDP (RMDP) framework was introduced in [16, 24, 44], aiming to learn policies that are robust to perturbations of the transition model within an uncertainty set.
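In standard notation (ours, for concreteness), the RMDP objective is a max-min problem over an uncertainty set $\mathcal{U}$ of transition kernels,
\[
\max_{\pi} \; \min_{P \in \mathcal{U}} \; \mathbb{E}_{\pi, P}\!\left[\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t)\right],
\]
so the learned policy is evaluated against the worst-case transition model within $\mathcal{U}$.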
An Efficient Solution to s-Rectangular Robust Markov Decision Processes
Kumar, Navdeep, Levy, Kfir, Wang, Kaixin, Mannor, Shie
In Markov Decision Processes (MDPs), an agent interacts with the environment and learns to behave optimally in it [28]. However, the MDP solution may be very sensitive to small changes in the model parameters [23]. Hence, we should be cautious when applying the MDP solution if the model changes or if there is uncertainty in the model parameters. Robust MDPs provide a way to address this issue, allowing an agent to learn to behave optimally even when the model parameters are uncertain [15, 29, 18]. Another motivation to study robust MDPs is that they can lead to better generalization [33, 34, 25] compared to non-robust solutions.
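For reference (in our notation), the s-rectangularity assumption in the title means the uncertainty set factorizes over states while coupling the actions within each state,
\[
\mathcal{U} \;=\; \prod_{s \in \mathcal{S}} \mathcal{P}_s, \qquad \mathcal{P}_s \subseteq \big(\Delta(\mathcal{S})\big)^{\mathcal{A}},
\]
in contrast to the more restrictive $(s,a)$-rectangular case $\mathcal{U} = \prod_{s,a} \mathcal{P}_{s,a}$, where each state-action pair is perturbed independently.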