AITopics | Chen, Jun

Collaborating Authors

Chen, Jun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Constrained Gaussian Wasserstein Optimal Transport with Commutative Covariance Matrices

Chen, Jun, Wang, Jia, Li, Ruibin, Zhou, Han, Dong, Wei, Liu, Huan, Yu, Yuanhao

arXiv.org Artificial IntelligenceMar-5-2025

Optimal transport has found widespread applications in signal processing and machine learning. Among its many equivalent formulations, optimal transport seeks to reconstruct a random variable/vector with a prescribed distribution at the destination while minimizing the expected distortion relative to a given random variable/vector at the source. However, in practice, certain constraints may render the optimal transport plan infeasible. In this work, we consider three types of constraints: rate constraints, dimension constraints, and channel constraints, motivated by perception-aware lossy compression, generative principal component analysis, and deep joint source-channel coding, respectively. Special attenion is given to the setting termed Gaussian Wasserstein optimal transport, where both the source and reconstruction variables are multivariate Gaussian, and the end-to-end distortion is measured by the mean squared error. We derive explicit results for the minimum achievable mean squared error under the three aforementioned constraints when the covariance matrices of the source and reconstruction variables commute.

artificial intelligence, machine learning, optimal transport, (16 more...)

arXiv.org Artificial Intelligence

2503.03744

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Add feedback

On Self-Adaptive Perception Loss Function for Sequential Lossy Compression

Salehkalaibar, Sadaf, Phan, Buu, Cai, Likun, Dick, Joao Atz, Yu, Wei, Chen, Jun, Khisti, Ashish

arXiv.org Artificial IntelligenceFeb-14-2025

We consider causal, low-latency, sequential lossy compression, with mean squared-error (MSE) as the distortion loss, and a perception loss function (PLF) to enhance the realism of reconstructions. As the main contribution, we propose and analyze a new PLF that considers the joint distribution between the current source frame and the previous reconstructions. We establish the theoretical rate-distortion-perception function for first-order Markov sources and analyze the Gaussian model in detail. From a qualitative perspective, the proposed metric can simultaneously avoid the error-permanence phenomenon and also better exploit the temporal correlation between high-quality reconstructions. The proposed metric is referred to as self-adaptive perception loss function (PLF-SA), as its behavior adapts to the quality of reconstructed frames. We provide a detailed comparison of the proposed perception loss function with previous approaches through both information theoretic analysis as well as experiments involving moving MNIST and UVG datasets.

artificial intelligence, machine learning, reconstruction, (14 more...)

arXiv.org Artificial Intelligence

2502.10628

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Hyperbolic Binary Neural Network

Chen, Jun, Xiang, Jingyang, Huang, Tianxin, Zhao, Xiangrui, Liu, Yong

arXiv.org Artificial IntelligenceJan-6-2025

Binary Neural Network (BNN) converts full-precision weights and activations into their extreme 1-bit counterparts, making it particularly suitable for deployment on lightweight mobile devices. While binary neural networks are typically formulated as a constrained optimization problem and optimized in the binarized space, general neural networks are formulated as an unconstrained optimization problem and optimized in the continuous space. This paper introduces the Hyperbolic Binary Neural Network (HBNN) by leveraging the framework of hyperbolic geometry to optimize the constrained problem. Specifically, we transform the constrained problem in hyperbolic space into an unconstrained one in Euclidean space using the Riemannian exponential map. On the other hand, we also propose the Exponential Parametrization Cluster (EPC) method, which, compared to the Riemannian exponential map, shrinks the segment domain based on a diffeomorphism. This approach increases the probability of weight flips, thereby maximizing the information gain in BNNs. Experimental results on CIFAR10, CIFAR100, and ImageNet classification datasets with VGGsmall, ResNet18, and ResNet34 models illustrate the superior performance of our HBNN over state-of-the-art methods.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

2501.03471

Country: Asia > China (0.70)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents

Chen, Jun, Xu, Dannong, Fei, Junjie, Feng, Chun-Mei, Elhoseiny, Mohamed

arXiv.org Artificial IntelligenceDec-6-2024

Large multimodal models (LMMs) have achieved impressive progress in vision-language understanding, yet they face limitations in real-world applications requiring complex reasoning over a large number of images. Existing benchmarks for multi-image question-answering are limited in scope, each question is paired with only up to 30 images, which does not fully capture the demands of large-scale retrieval tasks encountered in the real-world usages. To reduce these gaps, we introduce two document haystack benchmarks, dubbed DocHaystack and InfoHaystack, designed to evaluate LMM performance on large-scale visual document retrieval and understanding. Additionally, we propose V-RAG, a novel, vision-centric retrieval-augmented generation (RAG) framework that leverages a suite of multimodal vision encoders, each optimized for specific strengths, and a dedicated question-document relevance module. V-RAG sets a new standard, with a 9% and 11% improvement in Recall@1 on the challenging DocHaystack-1000 and InfoHaystack-1000 benchmarks, respectively, compared to the previous best baseline models. Additionally, integrating V-RAG with LMMs enables them to efficiently operate across thousands of images, yielding significant improvements on our DocHaystack and InfoHaystack benchmarks. Our code and datasets are available at https://github.com/Vision-CAIR/dochaystacks

benchmark, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2411.1674

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Landing Trajectory Prediction for UAS Based on Generative Adversarial Network

Xiang, Jun, Essick, Drake, Bautista, Luiz Gonzalez, Xie, Junfei, Chen, Jun

arXiv.org Artificial IntelligenceNov-21-2024

Models for trajectory prediction are an essential component of many advanced air mobility studies. These models help aircraft detect conflict and plan avoidance maneuvers, which is especially important in Unmanned Aircraft systems (UAS) landing management due to the congested airspace near vertiports. In this paper, we propose a landing trajectory prediction model for UAS based on Generative Adversarial Network (GAN). The GAN is a prestigious neural network that has been developed for many years. In previous research, GAN has achieved many state-of-the-art results in many generation tasks. The GAN consists of one neural network generator and a neural network discriminator. Because of the learning capacity of the neural networks, the generator is capable to understand the features of the sample trajectory. The generator takes the previous trajectory as input and outputs some random status of a flight. According to the results of the experiences, the proposed model can output more accurate predictions than the baseline method(GMR) in various datasets. To evaluate the proposed model, we also create a real UAV landing dataset that includes more than 2600 trajectories of drone control manually by real pilots.

artificial intelligence, machine learning, trajectory, (15 more...)

arXiv.org Artificial Intelligence

2411.14403

Genre: Research Report (0.64)

Industry: Transportation > Air (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Convex Approximation of Probabilistic Reachable Sets from Small Samples Using Self-supervised Neural Networks

Xiang, Jun, Chen, Jun

arXiv.org Artificial IntelligenceNov-21-2024

Probabilistic Reachable Set (PRS) plays a crucial role in many fields of autonomous systems, yet efficiently generating PRS remains a significant challenge. This paper presents a learning approach to generating 2-dimensional PRS for states in a dynamic system. Traditional methods such as Hamilton-Jacobi reachability analysis, Monte Carlo, and Gaussian process classification face significant computational challenges or require detailed dynamics information, limiting their applicability in realistic situations. Existing data-driven methods may lack accuracy. To overcome these limitations, we propose leveraging neural networks, commonly used in imitation learning and computer vision, to imitate expert methods to generate PRS approximations. We trained the neural networks using a multi-label, self-supervised learning approach. We selected the fine-tuned convex approximation method as the expert to create expert PRS. Additionally, we continued sampling from the distribution to obtain a diverse array of sample sets. Given a small sample set, the trained neural networks can replicate the PRS approximation generated by the expert method, while the generation speed is much faster.

artificial intelligence, machine learning, neural network, (18 more...)

arXiv.org Artificial Intelligence

2411.14356

Country: North America > United States (0.93)

Genre: Research Report (0.40)

Industry:

Information Technology (0.68)
Health & Medicine (0.68)
Transportation > Air (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Transformer-based Heuristic for Advanced Air Mobility Planning

Xiang, Jun, Chen, Jun

arXiv.org Artificial IntelligenceNov-21-2024

Safety is extremely important for urban flights of autonomous Unmanned Aerial Vehicles (UAVs). Risk-aware path planning is one of the most effective methods to guarantee the safety of UAVs. This type of planning can be represented as a Constrained Shortest Path (CSP) problem, which seeks to find the shortest route that meets a predefined safety constraint. Solving CSP problems is NP-hard, presenting significant computational challenges. Although traditional methods can accurately solve CSP problems, they tend to be very slow. Previously, we introduced an additional safety dimension to the traditional A* algorithm, known as ASD A*, to effectively handle Constrained Shortest Path (CSP) problems. Then, we developed a custom learning-based heuristic using transformer-based neural networks, which significantly reduced computational load and enhanced the performance of the ASD A* algorithm. In this paper, we expand our dataset to include more risk maps and tasks, improve the proposed model, and increase its performance. We also introduce a new heuristic strategy and a novel neural network, which enhance the overall effectiveness of our approach.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2411.14427

Genre: Research Report (0.64)

Industry:

Transportation (0.68)
Information Technology > Robotics & Automation (0.48)
Aerospace & Defense > Aircraft (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Minimum Entropy Coupling with Bottleneck

Ebrahimi, M. Reza, Chen, Jun, Khisti, Ashish

arXiv.org Artificial IntelligenceOct-28-2024

This paper investigates a novel lossy compression framework operating under logarithmic loss, designed to handle situations where the reconstruction distribution diverges from the source distribution. This framework is especially relevant for applications that require joint compression and retrieval, and in scenarios involving distributional shifts due to processing. We show that the proposed formulation extends the classical minimum entropy coupling framework by integrating a bottleneck, allowing for a controlled degree of stochasticity in the coupling. We explore the decomposition of the Minimum Entropy Coupling with Bottleneck (MEC-B) into two distinct optimization problems: Entropy-Bounded Information Maximization (EBIM) for the encoder, and Minimum Entropy Coupling (MEC) for the decoder. Through extensive analysis, we provide a greedy algorithm for EBIM with guaranteed performance, and characterize the optimal solution near functional mappings, yielding significant theoretical insights into the structural complexity of this problem. Furthermore, we illustrate the practical application of MEC-B through experiments in Markov Coding Games (MCGs) under rate limits. These games simulate a communication scenario within a Markov Decision Process, where an agent must transmit a compressed message from a sender to a receiver through its actions. Our experiments highlight the trade-offs between MDP rewards and receiver accuracy across various compression rates, showcasing the efficacy of our method compared to conventional compression baseline.

artificial intelligence, information, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2410.21666

Country:

North America > Canada > Ontario (0.28)
North America > United States (0.28)

Genre: Research Report > Experimental Study (0.93)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Add feedback

Data-driven Probabilistic Trajectory Learning with High Temporal Resolution in Terminal Airspace

Xiang, Jun, Chen, Jun

arXiv.org Artificial IntelligenceSep-25-2024

Predicting flight trajectories is a research area that holds significant merit. In this paper, we propose a data-driven learning framework, that leverages the predictive and feature extraction capabilities of the mixture models and seq2seq-based neural networks while addressing prevalent challenges caused by error propagation and dimensionality reduction. After training with this framework, the learned model can improve long-step prediction accuracy significantly given the past trajectories and the context information. The accuracy and effectiveness of the approach are evaluated by comparing the predicted trajectories with the ground truth. The results indicate that the proposed method has outperformed the state-of-the-art predicting methods on a terminal airspace flight trajectory dataset. The trajectories generated by the proposed method have a higher temporal resolution(1 timestep per second vs 0.1 timestep per second) and are closer to the ground truth.

data mining, machine learning, trajectory, (16 more...)

arXiv.org Artificial Intelligence

2409.17359

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Industry:

Transportation > Air (1.00)
Transportation > Infrastructure & Services (0.94)
Government (0.93)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis

Alkhaldi, Asma, Alnajim, Raneem, Alabdullatef, Layan, Alyahya, Rawan, Chen, Jun, Zhu, Deyao, Alsinan, Ahmed, Elhoseiny, Mohamed

arXiv.org Artificial IntelligenceJul-4-2024

Recent advancements in artificial intelligence (AI) have precipitated significant breakthroughs in healthcare, particularly in refining diagnostic procedures. However, previous studies have often been constrained to limited functionalities. This study introduces MiniGPT-Med, a vision-language model derived from large-scale language models and tailored for medical applications. MiniGPT-Med demonstrates remarkable versatility across various imaging modalities, including X-rays, CT scans, and MRIs, enhancing its utility. The model is capable of performing tasks such as medical report generation, visual question answering (VQA), and disease identification within medical imagery. Its integrated processing of both image and textual clinical data markedly improves diagnostic accuracy. Our empirical assessments confirm MiniGPT-Med's superior performance in disease grounding, medical report generation, and VQA benchmarks, representing a significant step towards reducing the gap in assisting radiology practice. Furthermore, it achieves state-of-the-art performance on medical report generation, higher than the previous best model by 19\% accuracy. MiniGPT-Med promises to become a general interface for radiology diagnoses, enhancing diagnostic efficiency across a wide range of medical imaging applications.

large language model, minigpt-med, natural language, (14 more...)

arXiv.org Artificial Intelligence

2407.04106

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback