Wang, Shan
Inorganic Catalyst Efficiency Prediction Based on EAPCR Model: A Deep Learning Solution for Multi-Source Heterogeneous Data
Liu, Zhangdi, An, Ling, Song, Mengke, Yu, Zhuohang, Wang, Shan, Qi, Kezhen, Zhang, Zhenyu, Zhou, Chichun
The design of inorganic catalysts and the prediction of their catalytic efficiency are fundamental challenges in chemistry and materials science. Traditional catalyst evaluation methods primarily rely on machine learning techniques; however, these methods often struggle to process multi-source heterogeneous data, limiting both predictive accuracy and generalization. To address these limitations, this study introduces the Embedding-Attention-Permutated CNN-Residual (EAPCR) deep learning model. EAPCR constructs a feature association matrix using embedding and attention mechanisms and enhances predictive performance through permutated CNN architectures and residual connections. This approach enables the model to accurately capture complex feature interactions across various catalytic conditions, leading to precise efficiency predictions. EAPCR serves as a powerful tool for computational researchers while also assisting domain experts in optimizing catalyst design, effectively bridging the gap between data-driven modeling and experimental applications. We evaluate EAPCR on datasets from TiO₂ photocatalysis, thermal catalysis, and electrocatalysis, demonstrating its superiority over traditional machine learning methods (e.g., linear regression, random forest) as well as conventional deep learning models (e.g., ANNs). Across multiple evaluation metrics (MAE, MSE, R², and RMSE), EAPCR consistently outperforms existing approaches. These findings highlight the strong potential of EAPCR in inorganic catalytic efficiency prediction. As a versatile deep learning framework, EAPCR not only improves predictive accuracy but also establishes a solid foundation for future large-scale model development in inorganic catalysis.
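To make the architecture above concrete, here is a minimal PyTorch sketch of an EAPCR-style pipeline: feature embeddings, an attention-derived feature association step, a permutated convolution, and a residual connection. All layer sizes, the feature tokenization, and the permutation scheme are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of an EAPCR-style model (embedding -> attention ->
# permutated CNN -> residual head); all dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class EAPCRSketch(nn.Module):
    def __init__(self, n_features=16, n_bins=32, d_embed=64):
        super().__init__()
        # Assume each tabular feature is discretized upstream into one of
        # n_bins categories, giving an (n_features, d_embed) token grid.
        self.embed = nn.Embedding(n_features * n_bins, d_embed)
        self.attn = nn.MultiheadAttention(d_embed, num_heads=4, batch_first=True)
        # "Permutated" CNN: convolve over a shuffled feature order so the
        # kernel also sees non-adjacent feature pairings.
        self.perm = torch.randperm(n_features)
        self.conv = nn.Conv1d(d_embed, d_embed, kernel_size=3, padding=1)
        self.head = nn.Linear(n_features * d_embed, 1)

    def forward(self, x_tokens):              # x_tokens: (B, n_features) ints
        h = self.embed(x_tokens)              # (B, F, D)
        a, _ = self.attn(h, h, h)             # feature association via attention
        a = a[:, self.perm, :]                # permute the feature axis
        c = self.conv(a.transpose(1, 2)).transpose(1, 2)
        h = a + c                             # residual connection
        return self.head(h.flatten(1))        # predicted catalytic efficiency

model = EAPCRSketch()
pred = model(torch.randint(0, 16 * 32, (8, 16)))  # dummy batch of 8 samples
```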
Increasing SLAM Pose Accuracy by Ground-to-Satellite Image Registration
Zhang, Yanhao, Shi, Yujiao, Wang, Shan, Vora, Ankit, Perincherry, Akhil, Chen, Yongbo, Li, Hongdong
Vision-based localization for autonomous driving has been of great interest among researchers. When a pre-built 3D map is not available, techniques of visual simultaneous localization and mapping (SLAM) are typically adopted. Due to error accumulation, visual SLAM (vSLAM) usually suffers from long-term drift. This paper proposes a framework to increase localization accuracy by fusing vSLAM with a deep-learning-based ground-to-satellite (G2S) image registration method. In this framework, a coarse-to-fine method is designed to select valid G2S predictions: a coarse spatial correlation bound check followed by a fine visual odometry consistency check. The selected prediction is then fused with the SLAM measurement by solving a scaled pose graph problem. To further increase the localization accuracy, we provide an iterative trajectory fusion pipeline. The proposed framework is evaluated on two well-known autonomous driving datasets, and the results demonstrate its accuracy and robustness for vehicle localization.
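A schematic of the coarse-to-fine selection step might look as follows; the 2D pose representation, thresholds, and function names are assumptions for illustration, not the paper's code.

```python
# Illustrative coarse-to-fine gate for G2S predictions (not the paper's code):
# a coarse spatial bound check against the current SLAM estimate, then a fine
# consistency check against the relative motion reported by visual odometry.
import numpy as np

def select_g2s(g2s_xy, slam_xy, prev_xy, vo_delta_xy,
               coarse_radius=10.0, fine_tol=1.0):
    """Return True if the G2S position fix should be fused."""
    # Coarse: the G2S fix must fall within a bound around the SLAM pose.
    if np.linalg.norm(g2s_xy - slam_xy) > coarse_radius:
        return False
    # Fine: the motion implied by consecutive G2S fixes must agree with VO.
    implied_delta = g2s_xy - prev_xy
    return np.linalg.norm(implied_delta - vo_delta_xy) <= fine_tol

accept = select_g2s(np.array([3.2, 1.0]), np.array([3.0, 1.1]),
                    np.array([2.1, 0.4]), np.array([1.0, 0.6]))
```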
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
Yu, Tianyu, Hu, Jinyi, Yao, Yuan, Zhang, Haoye, Zhao, Yue, Wang, Chongyi, Wang, Shan, Pan, Yinxu, Xue, Jiao, Li, Dahai, Liu, Zhiyuan, Zheng, Hai-Tao, Sun, Maosong
Recent Multimodal Large Language Models (MLLMs) exhibit impressive abilities to perceive images and follow open-ended instructions. The capabilities of MLLMs depend on two crucial factors: the model architecture that facilitates feature alignment between visual modules and large language models, and the multimodal instruction tuning datasets for human instruction following. (i) For the model architecture, most existing models introduce an external bridge module to connect vision encoders with language models, which requires an additional feature-alignment pre-training stage. In this work, we discover that compact pre-trained vision-language models can inherently serve as ``out-of-the-box'' bridges between vision and language. Based on this, we propose the Muffin framework, which directly employs pre-trained vision-language models as providers of visual signals. (ii) For the multimodal instruction tuning datasets, existing methods overlook the complementary relationship between different datasets and simply mix datasets from different tasks. Instead, we propose the UniMM-Chat dataset, which exploits the complementarities of datasets to generate 1.1M high-quality and diverse multimodal instructions. We merge information describing the same image from diverse datasets and transform it into more knowledge-intensive conversation data. Experimental results demonstrate the effectiveness of the Muffin framework and the UniMM-Chat dataset. Muffin achieves state-of-the-art performance on a wide range of vision-language tasks, significantly surpassing state-of-the-art models such as LLaVA and InstructBLIP. Our model and dataset are available at https://github.com/thunlp/muffin.
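The ``out-of-the-box'' bridge idea can be illustrated with a short sketch: a pre-trained vision-language model directly supplies fused visual signals to the LLM, with at most a light projection in between. The module names, dimensions, and the stub VLM below are placeholders, not the released Muffin implementation.

```python
# Conceptual sketch of the "out-of-the-box bridge" idea; all names and
# dimensions are placeholders, not the released Muffin code.
import torch
import torch.nn as nn

class PretrainedVLM(nn.Module):
    """Stub standing in for a compact pre-trained vision-language model."""
    def forward(self, image, prompt_ids):
        # A real VLM would return fused vision-language hidden states.
        return torch.randn(image.size(0), 32, 768)

class MuffinStyleModel(nn.Module):
    def __init__(self, llm_dim=4096):
        super().__init__()
        self.vlm = PretrainedVLM()           # already vision-language aligned
        self.proj = nn.Linear(768, llm_dim)  # only a light projection is added
        # self.llm = ...                     # the large language model

    def visual_signals(self, image, prompt_ids):
        # The VLM's fused states feed the LLM as soft prompt tokens, skipping
        # a dedicated feature-alignment pre-training stage for a bridge module.
        return self.proj(self.vlm(image, prompt_ids))

m = MuffinStyleModel()
tokens = m.visual_signals(torch.randn(2, 3, 224, 224), torch.zeros(2, 8).long())
```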
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Hu, Jinyi, Yao, Yuan, Wang, Chongyi, Wang, Shan, Pan, Yinxu, Chen, Qianyu, Yu, Tianyu, Wu, Hanghao, Zhao, Yue, Zhang, Haoye, Han, Xu, Lin, Yankai, Xue, Jiao, Li, Dahai, Liu, Zhiyuan, Sun, Maosong
Recently, there has been a significant surge in multimodal learning in terms of both image-to-text and text-to-image generation. However, the success is typically limited to English, leaving other languages largely behind. Building a competitive counterpart in other languages is highly challenging due to the low-resource nature of non-English multimodal data (i.e., the lack of large-scale, high-quality image-text data). In this work, we propose MPM, an effective training paradigm for training large multimodal models in low-resource languages. MPM demonstrates that Multilingual language models can Pivot zero-shot Multimodal learning across languages. Specifically, based on a strong multilingual large language model, multimodal models pretrained on English-only image-text data generalize well to other languages in a zero-shot manner for both image-to-text and text-to-image generation, even surpassing models trained on image-text data in native languages. Taking Chinese as a case study for MPM, we build the large multimodal models VisCPM for image-to-text and text-to-image generation, which achieve state-of-the-art performance among open-source models in Chinese. To facilitate future research, we open-source the code and model weights at https://github.com/OpenBMB/VisCPM.git.
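The pivot paradigm described above can be summarized in a schematic: freeze the multilingual LLM backbone, align the visual module using English-only image-text pairs, and rely on the backbone's shared cross-lingual space at inference. All modules below are toy stand-ins, not the VisCPM code.

```python
# Schematic of an MPM-style training setup as described in the abstract;
# every module here is a toy placeholder.
import torch
import torch.nn as nn

multilingual_llm = nn.Linear(512, 512)    # stand-in for a multilingual LLM
vision_encoder = nn.Linear(512, 512)      # stand-in visual module to be trained

for p in multilingual_llm.parameters():
    p.requires_grad = False               # the multilingual backbone stays fixed

opt = torch.optim.AdamW(vision_encoder.parameters(), lr=1e-4)
for img_feat, en_text_feat in [(torch.randn(4, 512), torch.randn(4, 512))]:
    # Alignment stage: only English image-text pairs are used for training.
    loss = nn.functional.mse_loss(multilingual_llm(vision_encoder(img_feat)),
                                  en_text_feat)
    opt.zero_grad()
    loss.backward()
    opt.step()
# At inference, prompting the same model in Chinese reuses this alignment
# zero-shot, because the LLM already maps languages into a shared space.
```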
Model Calibration in Dense Classification with Adaptive Label Perturbation
Liu, Jiawei, Ye, Changkun, Wang, Shan, Cui, Ruikai, Zhang, Jing, Zhang, Kaihao, Barnes, Nick
For safety-related applications, it is crucial to produce trustworthy deep neural networks whose predictions are associated with confidence values that represent the likelihood of correctness for subsequent decision-making. Existing dense binary classification models are prone to being over-confident. To improve model calibration, we propose Adaptive Stochastic Label Perturbation (ASLP), which learns a unique label perturbation level for each training image. ASLP employs our proposed Self-Calibrating Binary Cross Entropy (SC-BCE) loss, which unifies label perturbation processes, including stochastic approaches (such as DisturbLabel) and label smoothing, to correct calibration while maintaining classification rates. ASLP follows the Maximum Entropy Inference of classic statistical mechanics to maximise prediction entropy with respect to missing information, while either (1) preserving classification accuracy on known data as a conservative solution, or (2) specifically improving model calibration by minimising the gap between prediction accuracy and the expected confidence of the target training label. Extensive results demonstrate that ASLP can significantly improve the calibration of dense binary classification models on both in-distribution and out-of-distribution data. The code is available at https://github.com/Carlisle-Liu/ASLP.
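As a rough illustration of the label-perturbation mechanism, the sketch below applies a learnable per-image perturbation level to binary targets before a BCE loss. The 0.5-centered smoothing target is the standard maximum-entropy form for binary labels; the paper's exact SC-BCE formulation may differ.

```python
# Sketch of a self-calibrating BCE with an adaptive perturbation level alpha
# per image; the exact SC-BCE formulation is given in the paper.
import torch
import torch.nn.functional as F

def sc_bce_sketch(logits, targets, alpha):
    """logits/targets: (B, H, W); alpha: (B,) in [0, 1], one level per image."""
    a = alpha.view(-1, 1, 1)
    # Perturbed label: pull each hard 0/1 target toward the 0.5 entropy maximum.
    soft = targets * (1.0 - a) + 0.5 * a
    return F.binary_cross_entropy_with_logits(logits, soft)

loss = sc_bce_sketch(torch.randn(2, 8, 8),
                     torch.randint(0, 2, (2, 8, 8)).float(),
                     torch.tensor([0.1, 0.3]))   # per-image perturbation levels
```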
Samplable Anonymous Aggregation for Private Federated Data Analysis
Talwar, Kunal, Wang, Shan, McMillan, Audra, Jina, Vojta, Feldman, Vitaly, Basile, Bailey, Cahill, Aine, Chan, Yi Sheng, Chatzidakis, Mike, Chen, Junye, Chick, Oliver, Chitnis, Mona, Ganta, Suman, Goren, Yusuf, Granqvist, Filip, Guo, Kristine, Jacobs, Frederic, Javidbakht, Omid, Liu, Albert, Low, Richard, Mascenik, Dan, Myers, Steve, Park, David, Park, Wonhee, Parsa, Gianni, Pauly, Tommy, Priebe, Christian, Rishi, Rehan, Rothblum, Guy, Scaria, Michael, Song, Linmao, Song, Congzheng, Tarbe, Karl, Vogt, Sebastian, Winstrom, Luke, Zhou, Shundong
Learning aggregate population trends can allow for better data-driven decisions, and the application of machine learning can improve the user experience. Compared to learning from public curated datasets, learning from a larger population offers several benefits. As an example, a next-word prediction model trained on words typed by users (a) can better fit the actual distribution of language used on devices, (b) can adapt faster to shifts in distribution, and (c) can more faithfully represent smaller sub-populations that may not be well represented in curated datasets. At the same time, training such models may involve sensitive user data.
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X
Zheng, Qinkai, Xia, Xiao, Zou, Xu, Dong, Yuxiao, Wang, Shan, Xue, Yufei, Wang, Zihan, Shen, Lei, Wang, Andi, Li, Yang, Su, Teng, Yang, Zhilin, Tang, Jie
Large pre-trained code generation models, such as OpenAI Codex, can generate syntactically and functionally correct code, making programmers more productive and bringing the pursuit of artificial general intelligence a step closer. In this paper, we introduce CodeGeeX, a multilingual model with 13 billion parameters for code generation. CodeGeeX is pre-trained on 850 billion tokens of 23 programming languages as of June 2022. Our extensive experiments suggest that CodeGeeX outperforms multilingual code models of similar scale on both code generation and translation tasks on HumanEval-X. Building upon HumanEval (Python only), we develop the HumanEval-X benchmark for evaluating multilingual models by hand-writing solutions in C++, Java, JavaScript, and Go. In addition, we build CodeGeeX-based extensions for Visual Studio Code, JetBrains, and Cloud Studio, generating 4.7 billion tokens for tens of thousands of active users per week. Our user study demonstrates that CodeGeeX helps increase coding efficiency for 83.4% of its users. Finally, CodeGeeX is publicly accessible; in Sep. 2022, we open-sourced its code, model weights (the 850B-token version), API, extensions, and HumanEval-X at https://github.com/THUDM/CodeGeeX.
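HumanEval-style benchmarks score functional correctness with pass@k. The snippet below shows the standard unbiased pass@k estimator introduced with the original HumanEval, which HumanEval-X-style evaluation can reuse; it is included here as background, not as the paper's evaluation code.

```python
# Standard unbiased pass@k estimator from the original HumanEval protocol:
# generate n samples per problem, count the c that pass all tests, and
# estimate the probability that at least one of k draws is correct.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """n: samples generated, c: samples passing all tests, k: sample budget."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    # 1 - C(n-c, k) / C(n, k), computed in a numerically stable product form.
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(n=200, c=37, k=1))  # per-problem estimate; average over problems
```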
Resource-aware Probability-based Collaborative Odor Source Localization Using Multiple UAVs
Wang, Shan, Sun, Sheng, Liu, Min, Gao, Bo, Wang, Yuwei
Benefiting from the flexible deployment and controllable 3D movement of UAVs, odor source localization with multiple UAVs has become an active research area in recent years. Considering the limited resources and insufficient battery capacities of UAVs, it is necessary to locate the odor source quickly, with low-complexity computation and minimal interaction, under complicated environmental conditions. To this end, we propose a multi-UAV collaboration based odor source localization (MUC-OSL) method, where source estimation and UAV navigation are performed iteratively, aiming to accelerate the search process and reduce the resource consumption of UAVs. Specifically, in the source estimation phase, we present a collaborative particle filter algorithm based on UAVs' cognitive differences and Gaussian fitting to improve source estimation accuracy. In the subsequent navigation phase, an adaptive path planning algorithm based on a Partially Observable Markov Decision Process (POMDP) determines, in a distributed manner, each UAV's subsequent flight direction and step size. The results of experiments conducted on two simulation platforms demonstrate that MUC-OSL outperforms existing approaches in terms of mean search time and success rate, and effectively reduces the resource consumption of UAVs.
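For intuition, a minimal particle-filter update for source estimation is sketched below. The collaborative cognitive-difference weighting and Gaussian fitting of MUC-OSL are simplified away, and the plume model is a toy assumption.

```python
# Minimal single-UAV particle-filter step for odor source estimation; the
# collaborative weighting of MUC-OSL is omitted and the plume model is a toy.
import numpy as np

rng = np.random.default_rng(0)
particles = rng.uniform(0, 100, size=(500, 2))   # candidate source positions
weights = np.full(500, 1 / 500)

def measurement_likelihood(src, uav_pos, reading, sigma=0.2):
    # Toy plume model: expected concentration decays with distance to source.
    expected = 1.0 / (1.0 + np.linalg.norm(src - uav_pos, axis=-1))
    return np.exp(-0.5 * ((reading - expected) / sigma) ** 2)

def pf_update(uav_pos, reading):
    global particles, weights
    weights *= measurement_likelihood(particles, uav_pos, reading)
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)  # resample
    particles = particles[idx] + rng.normal(0, 0.5, particles.shape)  # jitter
    weights = np.full(len(particles), 1 / len(particles))
    return particles.mean(axis=0)  # current estimate of the source position

est = pf_update(np.array([10.0, 20.0]), reading=0.05)
```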