AITopics | Wang, Junhao

Plotting

Wang, Junhao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Feature-EndoGaussian: Feature Distilled Gaussian Splatting in Surgical Deformable Scene Reconstruction

Li, Kai, Wang, Junhao, Han, William, Zhao, Ding

arXiv.org Artificial IntelligenceMar-8-2025

Minimally invasive surgery (MIS) has transformed clinical practice by reducing recovery times, minimizing complications, and enhancing precision. Nonetheless, MIS inherently relies on indirect visualization and precise instrument control, posing unique challenges. Recent advances in artificial intelligence have enabled real-time surgical scene understanding through techniques such as image classification, object detection, and segmentation, with scene reconstruction emerging as a key element for enhanced intraoperative guidance. Although neural radiance fields (NeRFs) have been explored for this purpose, their substantial data requirements and slow rendering inhibit real-time performance. In contrast, 3D Gaussian Splatting (3DGS) offers a more efficient alternative, achieving state-of-the-art performance in dynamic surgical scene reconstruction. In this work, we introduce Feature-EndoGaussian (FEG), an extension of 3DGS that integrates 2D segmentation cues into 3D rendering to enable real-time semantic and scene reconstruction. By leveraging pretrained segmentation foundation models, FEG incorporates semantic feature distillation within the Gaussian deformation framework, thereby enhancing both reconstruction fidelity and segmentation accuracy. On the EndoNeRF dataset, FEG achieves superior performance (SSIM of 0.97, PSNR of 39.08, and LPIPS of 0.03) compared to leading methods. Additionally, on the EndoVis18 dataset, FEG demonstrates competitive class-wise segmentation metrics while balancing model size and real-time performance.

artificial intelligence, image understanding, segmentation, (16 more...)

arXiv.org Artificial Intelligence

2503.06161

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Surgery (0.93)
Health & Medicine > Diagnostic Medicine > Imaging (0.47)

Technology: Information Technology > Artificial Intelligence > Vision > Image Understanding (0.66)

Add feedback

Skeleton-Guided-Translation: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation

Zhang, Xing, Wen, Jiaheng, Yang, Fangkai, Zhao, Pu, Kang, Yu, Wang, Junhao, Wang, Maoquan, Huang, Yufan, Nallipogu, Elsie, Lin, Qingwei, Dang, Yingnong, Rajmohan, Saravan, Zhang, Dongmei, Zhang, Qi

arXiv.org Artificial IntelligenceJan-27-2025

The advancement of large language models has intensified the need to modernize enterprise applications and migrate legacy systems to secure, versatile languages. However, existing code translation benchmarks primarily focus on individual functions, overlooking the complexities involved in translating entire repositories, such as maintaining inter-module coherence and managing dependencies. While some recent repository-level translation benchmarks attempt to address these challenges, they still face limitations, including poor maintainability and overly coarse evaluation granularity, which make them less developer-friendly. We introduce Skeleton-Guided-Translation, a framework for repository-level Java to C# code translation with fine-grained quality evaluation. It uses a two-step process: first translating the repository's structural "skeletons", then translating the full repository guided by these skeletons. Building on this, we present TRANSREPO-BENCH, a benchmark of high quality open-source Java repositories and their corresponding C# skeletons, including matching unit tests and build configurations. Our unit tests are fixed and can be applied across multiple or incremental translations without manual adjustments, enhancing automation and scalability in evaluations. Additionally, we develop fine-grained evaluation metrics that assess translation quality at the individual test case level, addressing traditional binary metrics' inability to distinguish when build failures cause all tests to fail. Evaluations using TRANSREPO-BENCH highlight key challenges and advance more accurate repository level code translation.

large language model, machine learning, translation, (20 more...)

arXiv.org Artificial Intelligence

2501.1605

Country: North America > Canada (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

DI-BENCH: Benchmarking Large Language Models on Dependency Inference with Testable Repositories at Scale

Zhang, Linghao, Wang, Junhao, He, Shilin, Zhang, Chaoyun, Kang, Yu, Li, Bowen, Wen, Jiaheng, Xie, Chengxing, Wang, Maoquan, Huang, Yufan, Nallipogu, Elsie, Lin, Qingwei, Dang, Yingnong, Rajmohan, Saravan, Zhang, Dongmei, Zhang, Qi

arXiv.org Artificial IntelligenceJan-23-2025

Large Language Models have advanced automated software development, however, it remains a challenge to correctly infer dependencies, namely, identifying the internal components and external packages required for a repository to successfully run. Existing studies highlight that dependency-related issues cause over 40\% of observed runtime errors on the generated repository. To address this, we introduce DI-BENCH, a large-scale benchmark and evaluation framework specifically designed to assess LLMs' capability on dependency inference. The benchmark features 581 repositories with testing environments across Python, C#, Rust, and JavaScript. Extensive experiments with textual and execution-based metrics reveal that the current best-performing model achieves only a 42.9% execution pass rate, indicating significant room for improvement. DI-BENCH establishes a new viewpoint for evaluating LLM performance on repositories, paving the way for more robust end-to-end software synthesis.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2501.13699

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback

SweetTokenizer: Semantic-Aware Spatial-Temporal Tokenizer for Compact Visual Discretization

Tan, Zhentao, Xue, Ben, Jia, Jian, Wang, Junhao, Ye, Wencai, Shi, Shaoyun, Sun, Mingjie, Wu, Wenjin, Chen, Quan, Jiang, Peng

arXiv.org Artificial IntelligenceDec-16-2024

This paper presents the \textbf{S}emantic-a\textbf{W}ar\textbf{E} spatial-t\textbf{E}mporal \textbf{T}okenizer (SweetTokenizer), a compact yet effective discretization approach for vision data. Our goal is to boost tokenizers' compression ratio while maintaining reconstruction fidelity in the VQ-VAE paradigm. Firstly, to obtain compact latent representations, we decouple images or videos into spatial-temporal dimensions, translating visual information into learnable querying spatial and temporal tokens through a \textbf{C}ross-attention \textbf{Q}uery \textbf{A}uto\textbf{E}ncoder (CQAE). Secondly, to complement visual information during compression, we quantize these tokens via a specialized codebook derived from off-the-shelf LLM embeddings to leverage the rich semantics from language modality. Finally, to enhance training stability and convergence, we also introduce a curriculum learning strategy, which proves critical for effective discrete visual representation learning. SweetTokenizer achieves comparable video reconstruction fidelity with only \textbf{25\%} of the tokens used in previous state-of-the-art video tokenizers, and boost video generation results by \textbf{32.9\%} w.r.t gFVD. When using the same token number, we significantly improves video and image reconstruction results by \textbf{57.1\%} w.r.t rFVD on UCF-101 and \textbf{37.2\%} w.r.t rFID on ImageNet-1K. Additionally, the compressed tokens are imbued with semantic information, enabling few-shot recognition capabilities powered by LLMs in downstream applications.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.10443

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Add feedback

MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras

Yu, Huai, Wang, Junhao, He, Yao, Yang, Wen, Xia, Gui-Song

arXiv.org Artificial IntelligenceDec-4-2024

Making multi-camera visual SLAM systems easier to set up and more robust to the environment is always one of the focuses of vision robots. Existing monocular and binocular vision SLAM systems have narrow FoV and are fragile in textureless environments with degenerated accuracy and limited robustness. Thus multi-camera SLAM systems are gaining attention because they can provide redundancy for texture degeneration with wide FoV. However, current multi-camera SLAM systems face massive data processing pressure and elaborately designed camera configurations, leading to estimation failures for arbitrarily arranged multi-camera systems. To address these problems, we propose a generic visual odometry for arbitrarily arranged multi-cameras, which can achieve metric-scale state estimation with high flexibility in the cameras' arrangement. Specifically, we first design a learning-based feature extraction and tracking framework to shift the pressure of CPU processing of multiple video streams. Then we use the rigid constraints between cameras to estimate the metric scale poses for robust SLAM system initialization. Finally, we fuse the features of the multi-cameras in the SLAM back-end to achieve robust pose estimation and online scale optimization. Additionally, multi-camera features help improve the loop detection for pose graph optimization. Experiments on KITTI-360 and MultiCamData datasets validate the robustness of our method over arbitrarily placed cameras. Compared with other stereo and multi-camera visual SLAM systems, our method obtains higher pose estimation accuracy with better generalization ability. Our codes and online demos are available at \url{https://github.com/JunhaoWang615/MCVO}

artificial intelligence, estimation, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.03146

Country:

Asia (0.28)
North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.40)

Industry: Information Technology (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

A General Framework on Enhancing Portfolio Management with Reinforcement Learning

Li, Yinheng, Wang, Junhao, Cao, Yijie

arXiv.org Artificial IntelligenceOct-27-2023

Portfolio management is the art and science in fiance that concerns continuous reallocation of funds and assets across financial instruments to meet the desired returns to risk profile. Deep reinforcement learning (RL) has gained increasing interest in portfolio management, where RL agents are trained base on financial data to optimize the asset reallocation process. Though there are prior efforts in trying to combine RL and portfolio management, previous works did not consider practical aspects such as transaction costs or short selling restrictions, limiting their applicability. To address these limitations, we propose a general RL framework for asset management that enables continuous asset weights, short selling and making decisions with relevant features. We compare the performance of three different RL algorithms: Policy Gradient with Actor-Critic (PGAC), Proximal Policy Optimization (PPO), and Evolution Strategies (ES) and demonstrate their advantages in a simulated environment with transaction costs. Our work aims to provide more options for utilizing RL frameworks in real-life asset management scenarios and can benefit further research in financial applications.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1911.1188

Country:

Asia (0.16)
North America > United States (0.14)

Genre: Research Report (0.64)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Tight Memory-Regret Lower Bounds for Streaming Bandits

Li, Shaoang, Zhang, Lan, Wang, Junhao, Li, Xiang-Yang

arXiv.org Artificial IntelligenceJun-13-2023

In this paper, we investigate the streaming bandits problem, wherein the learner aims to minimize regret by dealing with online arriving arms and sublinear arm memory. We establish the tight worst-case regret lower bound of $\Omega \left( (TB)^{\alpha} K^{1-\alpha}\right), \alpha = 2^{B} / (2^{B+1}-1)$ for any algorithm with a time horizon $T$, number of arms $K$, and number of passes $B$. The result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory. Notably, in comparison to the well-known $\Omega(\sqrt{KT})$ lower bound, an additional double logarithmic factor is unavoidable for any streaming bandits algorithm with sublinear memory permitted. Furthermore, we establish the first instance-dependent lower bound of $\Omega \left(T^{1/(B+1)} \sum_{\Delta_x>0} \frac{\mu^*}{\Delta_x}\right)$ for streaming bandits. These lower bounds are derived through a unique reduction from the regret-minimization setting to the sample complexity analysis for a sequence of $\epsilon$-optimal arms identification tasks, which maybe of independent interest. To complement the lower bound, we also provide a multi-pass algorithm that achieves a regret upper bound of $\tilde{O} \left( (TB)^{\alpha} K^{1 - \alpha}\right)$ using constant arm memory.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2306.07903

Country: Asia (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Self-supervised learning for infant cry analysis

Gorin, Arsenii, Subakan, Cem, Abdoli, Sajjad, Wang, Junhao, Latremouille, Samantha, Onu, Charles

arXiv.org Artificial IntelligenceMay-2-2023

In this paper, we explore self-supervised learning (SSL) for analyzing a first-of-its-kind database of cry recordings containing clinical indications of more than a thousand newborns. Specifically, we target cry-based detection of neurological injury as well as identification of cry triggers such as pain, hunger, and discomfort. Annotating a large database in the medical setting is expensive and time-consuming, typically requiring the collaboration of several experts over years. Leveraging large amounts of unlabeled audio data to learn useful representations can lower the cost of building robust models and, ultimately, clinical solutions. In this work, we experiment with self-supervised pre-training of a convolutional neural network on large audio datasets. We show that pre-training with SSL contrastive loss (SimCLR) performs significantly better than supervised pre-training for both neuro injury and cry triggers. In addition, we demonstrate further performance gains through SSL-based domain adaptation using unlabeled infant cries. We also show that using such SSL-based pre-training for adaptation to cry sounds decreases the need for labeled data of the overall system.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2305.01578

Country:

Africa > Nigeria (0.47)
North America > Canada > Quebec (0.29)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Avoidance Learning Using Observational Reinforcement Learning

Venuto, David, Boussioux, Leonard, Wang, Junhao, Dali, Rola, Chakravorty, Jhelum, Bengio, Yoshua, Precup, Doina

arXiv.org Machine LearningSep-24-2019

Imitation learning seeks to learn an expert policy from sampled demonstrations. However, in the real world, it is often difficult to find a perfect expert and avoiding dangerous behaviors becomes relevant for safety reasons. We present the idea of \textit{learning to avoid}, an objective opposite to imitation learning in some sense, where an agent learns to avoid a demonstrator policy given an environment. We define avoidance learning as the process of optimizing the agent's reward while avoiding dangerous behaviors given by a demonstrator. In this work we develop a framework of avoidance learning by defining a suitable objective function for these problems which involves the \emph{distance} of state occupancy distributions of the expert and demonstrator policies. We use density estimates for state occupancy measures and use the aforementioned distance as the reward bonus for avoiding the demonstrator. We validate our theory with experiments using a wide range of partially observable environments. Experimental results show that we are able to improve sample efficiency during training compared to state of the art policy optimization and safety methods.

agent, artificial intelligence, optimization problem, (17 more...)

arXiv.org Machine Learning

1909.11228

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Anomaly Detection with Joint Representation Learning of Content and Connection

Wang, Junhao, Wang, Renhao, Kulshrestha, Aayushi, Rabbany, Reihaneh

arXiv.org Machine LearningJun-16-2019

Social media sites are becoming a key factor in politics. These platforms are easy to manipulate for the purpose of distorting information space to confuse and distract voters. Past works to identify disruptive patterns are mostly focused on analyzing the content of tweets. In this study, we jointly embed the information from both user posted content as well as a user's follower network, to detect groups of densely connected users in an unsupervised fashion. We then investigate these dense sub-blocks of users to flag anomalous behavior. In our experiments, we study the tweets related to the upcoming 2019 Canadian Elections, and observe a set of densely-connected users engaging in local politics in different provinces, and exhibiting troll-like behavior.

fingerprint, social media, us government, (17 more...)

arXiv.org Machine Learning

1906.12328

Country:

North America > United States (1.00)
North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report > New Finding (0.54)

Industry:

Information Technology > Services (0.69)
Government > Voting & Elections (0.69)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.72)

Add feedback