

Vision-based Tactile Image Generation via Contact Condition-guided Diffusion Model

arXiv.org Artificial Intelligence

Vision-based tactile sensors, through high-resolution optical measurements, can effectively perceive the geometric shape of objects and the force information during the contact process, thus helping robots acquire higher-dimensional tactile data. Vision-based tactile sensor simulation supports the acquisition and understanding of tactile information without physical sensors by accurately capturing and analyzing contact behavior and physical properties. However, the complexity of contact dynamics and lighting modeling limits the accurate reproduction of real sensor responses in simulation, making it difficult to meet the needs of different sensor setups and affecting the reliability and effectiveness of strategy transfer to practical applications. In this letter, we propose a contact-condition-guided diffusion model that maps RGB images of objects and contact force data to high-fidelity, detail-rich vision-based tactile sensor images. Evaluations show that the three-channel tactile images generated by this method achieve a 60.58% reduction in mean squared error and a 38.1% reduction in marker displacement error compared with existing approaches based on lighting and mechanical models, validating the effectiveness of our approach. The method has been successfully applied to various types of vision-based tactile sensors and can effectively generate corresponding tactile images under complex loads. Additionally, it demonstrates outstanding reconstruction of fine object textures in a Montessori tactile board texture generation task.
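The abstract gives no implementation details, but the core idea, a denoising network whose inputs include the object's RGB image and the measured contact force, can be sketched roughly as follows. This is a minimal, hypothetical PyTorch illustration (all module names, shapes, and the fixed schedule value are assumptions, and the timestep embedding of a full DDPM is omitted), not the authors' architecture.

```python
# Hypothetical sketch of a contact-condition-guided denoiser (not the paper's model).
import torch
import torch.nn as nn

class ContactConditionedDenoiser(nn.Module):
    def __init__(self, force_dim=3, hidden=64):
        super().__init__()
        # Encode the contact force vector into per-pixel conditioning channels.
        self.force_mlp = nn.Sequential(nn.Linear(force_dim, hidden), nn.SiLU(),
                                       nn.Linear(hidden, hidden))
        # Noisy tactile image (3ch) + conditioning RGB image (3ch) -> predicted noise.
        self.net = nn.Sequential(
            nn.Conv2d(6 + hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, 3, 3, padding=1),
        )

    def forward(self, noisy_tactile, rgb, force):
        b, _, h, w = noisy_tactile.shape
        f = self.force_mlp(force).view(b, -1, 1, 1).expand(b, -1, h, w)
        return self.net(torch.cat([noisy_tactile, rgb, f], dim=1))

# Standard DDPM-style training step: predict the noise added to the tactile target.
model = ContactConditionedDenoiser()
tactile = torch.rand(4, 3, 64, 64)   # target tactile images
rgb = torch.rand(4, 3, 64, 64)       # conditioning object images
force = torch.rand(4, 3)             # conditioning contact forces (Fx, Fy, Fz)
noise = torch.randn_like(tactile)
alpha = 0.7                          # stand-in for a timestep-dependent schedule value
noisy = alpha**0.5 * tactile + (1 - alpha)**0.5 * noise
loss = nn.functional.mse_loss(model(noisy, rgb, force), noise)
```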


Act Now: A Novel Online Forecasting Framework for Large-Scale Streaming Data

arXiv.org Artificial Intelligence

In this paper, we find that existing online forecasting methods have the following issues: 1) They do not consider the update frequency of streaming data and directly use labels (future signals) to update the model, leading to information leakage. 2) Eliminating information leakage can exacerbate concept drift, and online parameter updates can damage prediction accuracy. 3) Leaving out a validation set cuts off the model's continued learning. 4) Existing GPU devices cannot support online learning of large-scale streaming data. To address the above issues, we propose a novel online learning framework, Act-Now, to improve online prediction on large-scale streaming data. Firstly, we introduce a Random Subgraph Sampling (RSS) algorithm designed to enable efficient model training. Then, we design a Fast Stream Buffer (FSB) and a Slow Stream Buffer (SSB) to update the model online. The FSB updates the model immediately with consistent pseudo- and partial labels to avoid information leakage, while the SSB updates the model in parallel using complete labels from earlier times. Further, to address concept drift, we propose a Label Decomposition model (Lade) with statistical and normalization flows: Lade forecasts both the statistical variations and the normalized future values of the data, integrating them through a combiner to produce the final predictions. Finally, we propose performing online updates on the validation set to ensure the consistency of model learning on streaming data. Extensive experiments demonstrate that the proposed Act-Now framework performs well on large-scale streaming data, with average performance improvements of 28.4% and 19.5%. Experiments can be reproduced via https://github.com/Anoise/Act-Now.
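As a rough illustration of the fast/slow buffer idea described above, here is a hedged Python sketch; `TinyModel`, the buffer sizes, and the update rules are all hypothetical stand-ins, not the Act-Now implementation (see the linked repository for the real code).

```python
# Hypothetical sketch of FSB/SSB-style online updates (not the Act-Now code).
import numpy as np
from collections import deque

class TinyModel:
    """Stand-in online regressor y ~ w*x, refit by least squares on a buffer."""
    def __init__(self):
        self.w = 0.0
    def predict(self, x):
        return self.w * x
    def update(self, buffer):
        xs = np.array([x for x, _ in buffer])
        ys = np.array([y for _, y in buffer])
        if xs.size:
            self.w = float(xs @ ys / (xs @ xs + 1e-8))

model = TinyModel()
fast_buffer = deque(maxlen=64)    # FSB: updated immediately, no future labels
slow_buffer = deque(maxlen=1024)  # SSB: updated later, with complete labels

def on_new_window(x, partial_label=None):
    # FSB path: use a pseudo-label (the model's own forecast) or whatever partial
    # label has already arrived, so no future signal leaks into the update.
    label = partial_label if partial_label is not None else model.predict(x)
    fast_buffer.append((x, label))
    model.update(fast_buffer)

def on_labels_complete(x, true_label):
    # SSB path: once the stream has fully revealed the label, refine in parallel.
    slow_buffer.append((x, true_label))
    model.update(slow_buffer)

on_new_window(1.5)               # label not yet observable at arrival time
on_labels_complete(1.5, 3.1)     # complete label arrives later in the stream
```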


Redefining <Creative> in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation

arXiv.org Artificial Intelligence

Given the challenge that diffusion models face in directly generating creativity, existing methods typically rely on synthesizing reference prompts or images to achieve creative effects. For instance, to combine "Lettuce" and "Mantis" creatively, ConceptLab [43] merges tokens representing these concepts into a new composite token, while BASS [22] uses predefined sampling rules to search for creative outcomes from a large pool of candidate images for each generation, which leads to high computational costs and limited practicality for online applications. In contrast, "a blue banana" can be generated directly without additional training, due to its clear and concrete semantics. Inspired by this, we ask: can we awaken the creativity of diffusion models by enhancing their semantic understanding of "creative"? To achieve this, we propose CreTok, which redefines "creative" as a new specialized token, <CreTok>. Specifically, CreTok builds on the definition of "creativity" from the TP2O task [22] for combinatorial object generation: we redefine the abstract term "creative" within our proposed CangJie dataset, so that concept combinations can be creatively generated using <CreTok> together with an adaptive prompt (e.g., "A photo of a <CreTok> mixture"). This meta-creativity enables direct concept combinations without requiring additional training, much like generating "a blue banana", and significantly reduces both time and computational complexity compared to state-of-the-art (SOTA) creative generation methods such as ConceptLab [43] (4s vs. 120s per image, a 30× speedup) and BASS [22] (4s vs. 2400s per image, a 600× speedup). Evaluations using GPT-4o [1] and user studies indicate superior performance of CreTok in terms of integration, originality, and aesthetics, underscoring its effectiveness in fostering combinatorial creativity. Our contributions are as follows: (1) we propose CreTok, a method designed to enhance models' meta-ability by enabling an enhanced understanding of abstract and ambiguous adjectives (e.g., "creative" or "beautiful") through their redefinition as new tokens, as shown in Figure 1c; (2) we compare CreTok with text-to-image (T2I) models and creative generation methods in terms of computational complexity, human preference ratings, text-image alignment, and other key metrics, addressing human-like creativity, a critical yet underexplored aspect of AI research [28, 29].
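The mechanism, redefining an abstract adjective as a single new learnable token, is close in spirit to textual inversion. The following hypothetical sketch (toy vocabulary, assumed token name `<CreTok>`) shows the one trainable degree of freedom such a redefinition introduces; it is not the authors' training code.

```python
# Hedged sketch: redefine "creative" as one new learnable token embedding.
import torch
import torch.nn as nn

vocab = {"a": 0, "photo": 1, "of": 2, "mixture": 3, "<CreTok>": 4}
embed = nn.Embedding(len(vocab), 512)

# Freeze all existing token embeddings; only the new token's vector is trainable.
embed.weight.requires_grad_(False)
cretok_vec = nn.Parameter(embed.weight[vocab["<CreTok>"]].clone())

def encode_prompt(tokens):
    rows = [cretok_vec if t == "<CreTok>" else embed.weight[vocab[t]] for t in tokens]
    return torch.stack(rows)

# The adaptive prompt from the abstract; a diffusion loss on concept-pair images
# (e.g., "Lettuce" + "Mantis") would be backpropagated into cretok_vec only.
prompt = encode_prompt(["a", "photo", "of", "a", "<CreTok>", "mixture"])
```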


Large-scale cross-modality pretrained model enhances cardiovascular state estimation and cardiomyopathy detection from electrocardiograms: An AI system development and multi-center validation study

arXiv.org Artificial Intelligence

Cardiovascular diseases (CVDs) present significant challenges for early and accurate diagnosis. While cardiac magnetic resonance imaging (CMR) is the gold standard for assessing cardiac function and diagnosing CVDs, its high cost and technical complexity limit accessibility. In contrast, electrocardiography (ECG) offers promise for large-scale early screening. This study introduces CardiacNets, an innovative model that enhances ECG analysis by leveraging the diagnostic strengths of CMR through cross-modal contrastive learning and generative pretraining. CardiacNets serves two primary functions: (1) it evaluates detailed cardiac function indicators and screens for potential CVDs, including coronary artery disease, cardiomyopathy, pericarditis, heart failure and pulmonary hypertension, using ECG input; and (2) it enhances interpretability by generating high-quality CMR images from ECG data. We train and validate the proposed CardiacNets on two large-scale public datasets (the UK Biobank with 41,519 individuals and the MIMIC-IV-ECG comprising 501,172 samples) as well as three private datasets (FAHZU with 410 individuals, SAHZU with 464 individuals, and QPH with 338 individuals), and the findings demonstrate that CardiacNets consistently outperforms traditional ECG-only models, substantially improving screening accuracy. Furthermore, the generated CMR images provide valuable diagnostic support for physicians of all experience levels. This proof-of-concept study highlights how ECG can facilitate cross-modal insights into cardiac function assessment, paving the way for enhanced CVD screening and diagnosis at a population level.
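The cross-modal contrastive pretraining component can be illustrated with a standard CLIP-style symmetric InfoNCE loss over paired ECG and CMR embeddings. This is a generic sketch under that assumption, with hypothetical encoder outputs, not the CardiacNets training code.

```python
# Generic CLIP-style contrastive alignment of ECG and CMR embeddings (assumed setup).
import torch
import torch.nn.functional as F

def contrastive_loss(ecg_emb, cmr_emb, temperature=0.07):
    # Normalize, then treat matched ECG/CMR pairs from the same subject as positives.
    ecg = F.normalize(ecg_emb, dim=-1)
    cmr = F.normalize(cmr_emb, dim=-1)
    logits = ecg @ cmr.t() / temperature
    targets = torch.arange(len(ecg))
    # Symmetric InfoNCE: match ECG->CMR and CMR->ECG.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Stand-ins for the outputs of an ECG encoder and a CMR encoder on 8 paired subjects.
loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```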


HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference

arXiv.org Artificial Intelligence

The Mixture-of-Experts (MoE) architecture has demonstrated significant advantages in the era of Large Language Models (LLMs), offering enhanced capabilities with reduced inference costs. However, deploying MoE-based LLMs on memory-constrained edge devices remains challenging due to their substantial memory requirements. While existing expert-offloading methods alleviate the memory requirements, they often incur significant expert-loading costs or compromise model accuracy. We present HOBBIT, a mixed-precision expert offloading system that enables flexible and efficient MoE inference. Our key insight is that dynamically replacing less critical cache-miss experts with low-precision versions can substantially reduce expert-loading latency while preserving model accuracy. HOBBIT introduces three innovative techniques that map to the natural hierarchy of MoE computation: (1) a token-level dynamic expert loading mechanism, (2) a layer-level adaptive expert prefetching technique, and (3) a sequence-level multidimensional expert caching policy. These innovations fully leverage the benefits of mixed-precision expert inference. By implementing HOBBIT on top of the renowned LLM inference framework Llama.cpp, we evaluate its performance across different edge devices with representative MoE models. The results demonstrate that HOBBIT achieves up to a 9.93x speedup in decoding compared to state-of-the-art MoE offloading systems.
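As a toy illustration of the token-level mechanism described (dynamically substituting a low-precision copy for a less critical cache-miss expert), consider the following hypothetical routing helper. The thresholds, precision names, and decision rule are assumptions for illustration, not HOBBIT's actual policy.

```python
# Toy sketch: pick which copy of an expert to use for the current token.
def select_expert_version(gate_score, in_cache, critical_threshold=0.2):
    """Decide, per token, whether a cache-miss expert is worth a full-precision load."""
    if in_cache:
        return "cached_fp16"   # already resident: no loading cost
    if gate_score < critical_threshold:
        return "load_int4"     # low gate weight: cheap low-precision load
    return "load_fp16"         # high gate weight: pay for full precision

# Example: a token routed to two experts with gate scores 0.85 and 0.12.
for score, cached in [(0.85, False), (0.12, False)]:
    print(select_expert_version(score, cached))
```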


Towards Better Performance in Incomplete LDL: Addressing Data Imbalance

arXiv.org Artificial Intelligence

Label Distribution Learning (LDL) is a novel machine learning paradigm that addresses the problem of label ambiguity and has found widespread applications. Obtaining complete label distributions in real-world scenarios is challenging, which has led to the emergence of Incomplete Label Distribution Learning (InLDL). However, the existing InLDL methods overlook a crucial aspect of LDL data: the inherent imbalance in label distributions. To address this limitation, we propose \textbf{Incomplete and Imbalance Label Distribution Learning (I\(^2\)LDL)}, a framework that simultaneously handles incomplete labels and imbalanced label distributions. Our method decomposes the label distribution matrix into a low-rank component for frequent labels and a sparse component for rare labels, effectively capturing the structure of both head and tail labels. We optimize the model using the Alternating Direction Method of Multipliers (ADMM) and derive generalization error bounds via Rademacher complexity, providing strong theoretical guarantees. Extensive experiments on 15 real-world datasets demonstrate the effectiveness and robustness of our proposed framework compared to existing InLDL methods.
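The abstract does not state the objective, but a standard low-rank-plus-sparse recovery of the kind it describes, with observed entries masked to handle incompleteness, would take a form like:

\[
\min_{L,\,S}\ \tfrac{1}{2}\bigl\|\Omega \odot (D - L - S)\bigr\|_F^2 \;+\; \lambda_1 \|L\|_{*} \;+\; \lambda_2 \|S\|_{1},
\]

where $D$ is the partially observed label distribution matrix, $\Omega$ masks the observed entries, the nuclear norm $\|L\|_*$ encourages a low-rank component for frequent (head) labels, and the $\ell_1$ norm $\|S\|_1$ encourages a sparse component for rare (tail) labels. Both regularizers admit closed-form proximal steps (singular-value and entrywise soft-thresholding), which is what makes ADMM a natural solver; the exact formulation in the paper may differ.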


DiffFluid: Plain Diffusion Models are Effective Predictors of Flow Dynamics

arXiv.org Artificial Intelligence

We showcase that plain diffusion models with Transformers are effective predictors of fluid dynamics under various working conditions, e.g., Darcy flow and high-Reynolds-number flows. Unlike traditional fluid dynamics solvers that depend on complex architectures to extract intricate correlations and learn underlying physical states, our approach formulates the prediction of flow dynamics as an image translation problem and accordingly leverages a plain diffusion model to tackle it. This reduction in model design complexity does not compromise the ability to capture complex physical states and geometric features of fluid dynamical equations, leading to high-precision solutions. In preliminary tests on various fluid-related benchmarks, our DiffFluid achieves consistent state-of-the-art performance, particularly in solving the Navier-Stokes equations in fluid dynamics, with a relative precision improvement of +44.8%. In addition, we achieve relative improvements of +14.0% and +11.3% on the Darcy flow equation and the airfoil problem with Euler's equation, respectively. Code will be released at https://github.com/DongyuLUO/DiffFluid upon acceptance.
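Concretely, "prediction as image translation" means the denoiser treats the input field as a conditioning image and learns to denoise the solution field. A minimal hypothetical PyTorch sketch (a stand-in convolutional denoiser in place of the paper's Transformer, a fixed schedule value, and a Darcy-style input/solution pair) might look like:

```python
# Hypothetical sketch: flow prediction as conditional image-to-image denoising.
import torch
import torch.nn as nn

denoiser = nn.Sequential(            # stand-in for a Transformer-based denoiser
    nn.Conv2d(2, 32, 3, padding=1), nn.GELU(),
    nn.Conv2d(32, 1, 3, padding=1),
)

a = torch.rand(8, 1, 64, 64)         # input field on a grid (conditioning "image")
u = torch.rand(8, 1, 64, 64)         # target solution field (image to generate)
noise = torch.randn_like(u)
alpha = 0.5                          # stand-in for a timestep-dependent schedule value
noisy_u = alpha**0.5 * u + (1 - alpha)**0.5 * noise
loss = nn.functional.mse_loss(denoiser(torch.cat([noisy_u, a], dim=1)), noise)
```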


WAVE: Weight Template for Adaptive Initialization of Variable-sized Models

arXiv.org Artificial Intelligence

The expansion of model parameters underscores the significance of pre-trained models; however, the constraints encountered during model deployment necessitate models of variable sizes. Consequently, the traditional pre-training and fine-tuning paradigm fails to address the initialization problem when target models are incompatible with pre-trained models. We tackle this issue from a multitasking perspective and introduce \textbf{WAVE}, which incorporates a set of shared \textbf{W}eight templates for \textbf{A}daptive initialization of \textbf{V}ariable-siz\textbf{E}d Models. During initialization, target models initialize the corresponding weight scalers tailored to their model size, which suffice to learn the connection rules of weight templates based on the Kronecker product from a limited amount of data. To construct the weight templates, WAVE utilizes the \textit{Learngene} framework, which structurally condenses common knowledge from ancestry models into weight templates as the learngenes through knowledge distillation. This process allows the integration of pre-trained models' knowledge into structured knowledge according to the rules of the weight templates. We provide a comprehensive benchmark for the learngenes, and extensive experiments demonstrate the efficacy of WAVE. The results show that WAVE achieves state-of-the-art performance when initializing models of various depths and widths, and even outperforms the direct pre-training of $n$ entire models, particularly for smaller models, saving approximately $n\times$ and $5\times$ in computational and storage resources, respectively. WAVE simultaneously achieves the most efficient knowledge transfer across a series of datasets, achieving average improvements of 1.8\% and 1.2\% on 7 downstream datasets.
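The Kronecker-product construction means each target layer's weight matrix is assembled from a shared template and a small, size-specific scaler. A minimal sketch (hypothetical dimensions, not WAVE's actual configuration) follows; note that the scaler here has only 8 parameters, which is why it can be learned from limited data while the shared template carries the transferred knowledge.

```python
# Sketch: build a variable-sized weight matrix from a shared template via Kronecker product.
import torch

template = torch.randn(64, 64)   # shared weight template (the "learngene")
scaler = torch.randn(4, 2)       # small per-model scaler, cheap to learn

# torch.kron yields a (4*64) x (2*64) weight matrix tailored to this model's size.
W = torch.kron(scaler, template)
print(W.shape)  # torch.Size([256, 128])
```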


Static and multivariate-temporal attentive fusion transformer for readmission risk prediction

arXiv.org Artificial Intelligence

Background: Accurate short-term readmission prediction for ICU patients is significant for improving the efficiency of resource assignment by assisting physicians in making discharge decisions. Clinically, both individual static data and multivariate temporal data collected from ICU monitors play critical roles in short-term readmission prediction, and capturing and fusing informative static and multivariate temporal feature representations present challenges for accurate prediction. Methods: We propose a novel static and multivariate-temporal attentive fusion transformer (SMTAFormer) to predict short-term readmission of ICU patients by fully leveraging the potential of demographic and dynamic temporal data. In SMTAFormer, we first apply an MLP network and a temporal transformer network to learn useful static and temporal feature representations, respectively. Then, a well-designed static and multivariate temporal feature fusion module fuses the static and temporal feature representations by modeling the intra-correlation among multivariate temporal features and constructing the inter-correlation between static and multivariate temporal features. Results: We construct a readmission risk assessment (RRA) dataset based on the MIMIC-III dataset. Extensive experiments show that SMTAFormer outperforms advanced methods, with an accuracy of up to 86.6% and an area under the receiver operating characteristic curve (AUC) of up to 0.717. Conclusion: Our proposed SMTAFormer can efficiently capture and fuse static and multivariate temporal feature representations. The results show that SMTAFormer significantly improves the short-term readmission prediction performance for ICU patients compared to strong baselines.
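A rough sketch of the two-branch layout implied by the description, an MLP for static features, a Transformer encoder for the multivariate time series, and attention-based fusion in which the static representation attends over the temporal one, follows. All dimensions and the fusion direction are assumptions for illustration, not the SMTAFormer specification.

```python
# Hypothetical two-branch static/temporal fusion sketch (assumed dimensions).
import torch
import torch.nn as nn

static_mlp = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 64))
temporal_enc = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2)
fusion = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
head = nn.Linear(64, 1)

static_x = torch.randn(8, 16)        # demographics and other static features
temporal_x = torch.randn(8, 48, 64)  # 48 time steps of multivariate vitals

s = static_mlp(static_x).unsqueeze(1)         # (8, 1, 64)
t = temporal_enc(temporal_x)                  # (8, 48, 64)
fused, _ = fusion(query=s, key=t, value=t)    # static attends over temporal
risk = torch.sigmoid(head(fused.squeeze(1)))  # readmission probability per patient
```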


TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries

arXiv.org Artificial Intelligence

In this paper, we propose the task of \textit{Ranked Video Moment Retrieval} (RVMR) to locate a ranked list of matching moments from a collection of videos through natural language queries. Although a few related tasks have been proposed and studied by the CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we develop the TVR-Ranking dataset based on the raw videos and existing moment annotations provided in the TVR dataset. Our key contribution is the manual annotation of relevance levels for 94,442 query-moment pairs. We then develop the $NDCG@K, IoU\geq \mu$ evaluation metric for this new task and conduct experiments to evaluate three baseline models. Our experiments show that the new RVMR task brings new challenges to existing models, and we believe this new dataset contributes to research on multi-modality search. The dataset is available at \url{https://github.com/Ranking-VMR/TVR-Ranking}.
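The metric combines ranking quality with localization: a retrieved moment earns its annotated relevance only if it overlaps a ground-truth moment with IoU at least $\mu$. A hypothetical reference implementation (simplified data layout; the official evaluation script is in the linked repository) could be:

```python
# Simplified sketch of NDCG@K with an IoU >= mu relevance gate (assumed data layout).
import math

def iou(a, b):
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def ndcg_at_k(ranked, ground_truth, k=10, mu=0.5):
    # ranked: [(start, end), ...]; ground_truth: [((start, end), relevance), ...]
    gains = []
    for seg in ranked[:k]:
        # A moment counts with its annotated relevance only if it overlaps enough.
        rel = max((r for gt, r in ground_truth if iou(seg, gt) >= mu), default=0)
        gains.append(rel)
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted((r for _, r in ground_truth), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k([(0, 5), (10, 12)], [((0, 4), 2), ((11, 13), 1)], k=2, mu=0.5))
```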