AITopics | Ren, Yiming

Collaborating Authors

Ren, Yiming

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization

Zhang, Zefeng, Tang, Hengzhu, Sheng, Jiawei, Zhang, Zhenyu, Ren, Yiming, Li, Zhenyang, Yin, Dawei, Ma, Duohe, Liu, Tingwen

arXiv.org Artificial IntelligenceMar-23-2025

Multimodal Large Language Models excel in various tasks, yet often struggle with modality bias, where the model tends to rely heavily on a single modality and overlook critical information in other modalities, which leads to incorrect focus and generating irrelevant responses. In this paper, we propose using the paradigm of preference optimization to solve the modality bias problem, including RLAIFVBias, a debiased preference optimization dataset, and a Noise Aware Preference Optimization algorithm. Specifically, we first construct the dataset by introducing perturbations to reduce the informational content of certain modalities, compelling the model to rely on a specific modality when generating negative responses. To address the inevitable noise in automatically constructed data, we combine the noise robust Mean Absolute Error with the Binary Cross Entropy in Direct Preference Optimization by a negative Box Cox transformation, and dynamically adjust the algorithm noise robustness based on the evaluated noise levels in the data. Extensive experiments validate our approach, demonstrating not only its effectiveness in mitigating modality bias but also its significant role in minimizing hallucinations.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.17928

Country:

Oceania > Australia (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

SymbioSim: Human-in-the-loop Simulation Platform for Bidirectional Continuing Learning in Human-Robot Interaction

Chen, Haoran, Xu, Yiteng, Ren, Yiming, Ye, Yaoqin, Li, Xinran, Ding, Ning, Cong, Peishan, Wang, Ziyi, Liu, Bushi, Chen, Yuhan, Dou, Zhiyang, Leng, Xiaokun, Li, Manyi, Ma, Yuexin, Tu, Changhe

arXiv.org Artificial IntelligenceFeb-11-2025

The development of intelligent robots seeks to seamlessly integrate them into the human world, providing assistance and companionship in daily life and work, with the ultimate goal of achieving human-robot symbiosis. To realize this vision, robots must continuously learn and evolve through consistent interaction and collaboration with humans, while humans need to gradually develop an understanding of and trust in robots through shared experiences. However, training and testing algorithms directly on physical robots involve substantial costs and safety risks. Moreover, current robotic simulators fail to support real human participation, limiting their ability to provide authentic interaction experiences and gather valuable human feedback. In this paper, we introduce SymbioSim, a novel human-in-the-loop robotic simulation platform designed to enable the safe and efficient development, evaluation, and optimization of human-robot interactions. By leveraging a carefully designed system architecture and modules, SymbioSim delivers a natural and realistic interaction experience, facilitating bidirectional continuous learning and adaptation for both humans and robots. Extensive experiments and user studies demonstrate the platform's promising performance and highlight its potential to significantly advance research on human-robot symbiosis.

artificial intelligence, interaction, robot, (15 more...)

arXiv.org Artificial Intelligence

2502.07358

Country: Asia > China (0.47)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.93)
Education > Educational Setting > Continuing Education (0.70)

Technology: Information Technology > Artificial Intelligence > Robots > Humanoid Robots (1.00)

Add feedback

NPC: Neural Predictive Control for Fuel-Efficient Autonomous Trucks

Ren, Jiaping, Xiang, Jiahao, Gao, Hongfei, Zhang, Jinchuan, Ren, Yiming, Ma, Yuexin, Wu, Yi, Yang, Ruigang, Li, Wei

arXiv.org Artificial IntelligenceDec-18-2024

Fuel efficiency is a crucial aspect of long-distance cargo transportation by oil-powered trucks that economize on costs and decrease carbon emissions. Current predictive control methods depend on an accurate model of vehicle dynamics and engine, including weight, drag coefficient, and the Brake-specific Fuel Consumption (BSFC) map of the engine. We propose a pure data-driven method, Neural Predictive Control (NPC), which does not use any physical model for the vehicle. After training with over 20,000 km of historical data, the novel proposed NVFormer implicitly models the relationship between vehicle dynamics, road slope, fuel consumption, and control commands using the attention mechanism. Based on the online sampled primitives from the past of the current freight trip and anchor-based future data synthesis, the NVFormer can infer optimal control command for reasonable fuel consumption. The physical model-free NPC outperforms the base PCC method with 2.41% and 3.45% more significant fuel saving in simulation and open-road highway testing, respectively.

artificial intelligence, fuel consumption, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICRA57147.2024.10611146

2412.13618

Country:

Asia > China (0.15)
Europe (0.14)

Genre: Research Report (0.64)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Energy > Oil & Gas (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.83)

Add feedback

Evaluation of OpenAI o1: Opportunities and Challenges of AGI

Zhong, Tianyang, Liu, Zhengliang, Pan, Yi, Zhang, Yutong, Zhou, Yifan, Liang, Shizhe, Wu, Zihao, Lyu, Yanjun, Shu, Peng, Yu, Xiaowei, Cao, Chao, Jiang, Hanqi, Chen, Hanxu, Li, Yiwei, Chen, Junhao, Hu, Huawen, Liu, Yihen, Zhao, Huaqin, Xu, Shaochen, Dai, Haixing, Zhao, Lin, Zhang, Ruidong, Zhao, Wei, Yang, Zhenyuan, Chen, Jingyuan, Wang, Peilong, Ruan, Wei, Wang, Hui, Zhao, Huan, Zhang, Jing, Ren, Yiming, Qin, Shihuan, Chen, Tong, Li, Jiaxi, Zidan, Arif Hassan, Jahin, Afrar, Chen, Minheng, Xia, Sichen, Holmes, Jason, Zhuang, Yan, Wang, Jiaqi, Xu, Bochen, Xia, Weiran, Yu, Jichao, Tang, Kaibo, Yang, Yaxuan, Sun, Bolun, Yang, Tao, Lu, Guoyu, Wang, Xianqiao, Chai, Lilong, Li, He, Lu, Jin, Sun, Lichao, Zhang, Xin, Ge, Bao, Hu, Xintao, Zhang, Lian, Zhou, Hua, Zhang, Lu, Zhang, Shu, Liu, Ninghao, Jiang, Bei, Kong, Linglong, Xiang, Zhen, Ren, Yudan, Liu, Jun, Jiang, Xi, Bao, Yu, Zhang, Wei, Li, Xiang, Li, Gang, Liu, Wei, Shen, Dinggang, Sikora, Andrea, Zhai, Xiaoming, Zhu, Dajiang, Liu, Tianming

arXiv.org Artificial IntelligenceSep-27-2024

This comprehensive study evaluates the performance of OpenAI's o1-preview large language model across a diverse array of complex reasoning tasks, spanning multiple domains, including computer science, mathematics, natural sciences, medicine, linguistics, and social sciences. Through rigorous testing, o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance in areas ranging from coding challenges to scientific reasoning and from language processing to creative problem-solving. Key findings include: -83.3% success rate in solving complex competitive programming problems, surpassing many human experts. -Superior ability in generating coherent and accurate radiology reports, outperforming other evaluated models. -100% accuracy in high school-level mathematical reasoning tasks, providing detailed step-by-step solutions. -Advanced natural language inference capabilities across general and specialized domains like medicine. -Impressive performance in chip design tasks, outperforming specialized models in areas such as EDA script generation and bug analysis. -Remarkable proficiency in anthropology and geology, demonstrating deep understanding and reasoning in these specialized fields. -Strong capabilities in quantitative investing. O1 has comprehensive financial knowledge and statistical modeling skills. -Effective performance in social media analysis, including sentiment analysis and emotion recognition. The model excelled particularly in tasks requiring intricate reasoning and knowledge integration across various fields. While some limitations were observed, including occasional errors on simpler problems and challenges with certain highly specialized concepts, the overall results indicate significant progress towards artificial general intelligence.

chip design-engineering assistant chatbot, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2409.18486

Country:

Asia > China (1.00)
North America > United States > Texas (0.47)
North America > United States > Georgia > Clarke County > Athens (0.14)
(3 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
(2 more...)

Industry:

Leisure & Entertainment (1.00)
Information Technology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(12 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.70)

Add feedback

Needle In A Multimodal Haystack

Wang, Weiyun, Zhang, Shuibo, Ren, Yiming, Duan, Yuchen, Li, Tiantong, Liu, Shuo, Hu, Mengkang, Chen, Zhe, Zhang, Kaipeng, Lu, Lewei, Zhu, Xizhou, Luo, Ping, Qiao, Yu, Dai, Jifeng, Shao, Wenqi, Wang, Wenhai

arXiv.org Artificial IntelligenceJun-11-2024

With the rapid advancement of multimodal large language models (MLLMs), their evaluation has become increasingly comprehensive. However, understanding long multimodal content, as a foundational ability for real-world applications, remains underexplored. In this work, we present Needle In A Multimodal Haystack (MM-NIAH), the first benchmark specifically designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents. Our benchmark includes three types of evaluation tasks: multimodal retrieval, counting, and reasoning. In each task, the model is required to answer the questions according to different key information scattered throughout the given multimodal document. Evaluating the leading MLLMs on MM-NIAH, we observe that existing models still have significant room for improvement on these tasks, especially on vision-centric evaluation. We hope this work can provide a platform for further research on long multimodal document comprehension and contribute to the advancement of MLLMs.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2406.0723

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text

Liu, Pengfei, Ren, Yiming, Tao, Jun, Ren, Zhixiang

arXiv.org Artificial IntelligenceFeb-6-2024

Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules. However, most existing language models cannot capture the rich information with complex molecular structures or images. In this paper, we introduce GIT-Mol, a multi-modal large language model that integrates the Graph, Image, and Text information. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel architecture that is capable of aligning all modalities into a unified latent space. We achieve a 5%-10% accuracy increase in properties prediction and a 20.2% boost in molecule generation validity compared to the baselines. With the any-to-language molecular translation strategy, our model has the potential to perform more downstream tasks, such as compound name recognition and chemical reaction prediction.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.compbiomed.2024.108073

2308.06911

Country: Asia > China > Guangdong Province (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information

Kuang, Taojie, Ren, Yiming, Ren, Zhixiang

arXiv.org Artificial IntelligenceJan-17-2024

Molecular property prediction, crucial for early drug candidate screening and optimization, has seen advancements with deep learning-based methods. While deep learning-based methods have advanced considerably, they often fall short in fully leveraging 3D spatial information. Specifically, current molecular encoding techniques tend to inadequately extract spatial information, leading to ambiguous representations where a single one might represent multiple distinct molecules. Moreover, existing molecular modeling methods focus predominantly on the most stable 3D conformations, neglecting other viable conformations present in reality. To address these issues, we propose 3D-Mol, a novel approach designed for more accurate spatial structure representation. It deconstructs molecules into three hierarchical graphs to better extract geometric information. Additionally, 3D-Mol leverages contrastive learning for pretraining on 20 million unlabeled data, treating their conformations with identical topological structures as weighted positive pairs and contrasting ones as negatives, based on the similarity of their 3D conformation descriptors and fingerprints. We compare 3D-Mol with various state-of-the-art baselines on 7 benchmarks and demonstrate our outstanding performance.

artificial intelligence, information, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2309.17366

Country:

Asia (0.46)
North America > United States > Massachusetts (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Predicting Football Match Outcomes with eXplainable Machine Learning and the Kelly Index

Ren, Yiming, Susnjak, Teo

arXiv.org Artificial IntelligenceNov-28-2022

In this work, a machine learning approach is developed for predicting the outcomes of football matches. The novelty of this research lies in the utilisation of the Kelly Index to first classify matches into categories where each one denotes the different levels of predictive difficulty. Classification models using a wide suite of algorithms were developed for each category of matches in order to determine the efficacy of the approach. In conjunction to this, a set of previously unexplored features were engineering including Elo-based variables. The dataset originated from the Premier League match data covering the 2019-2021 seasons. The findings indicate that the process of decomposing the predictive problem into sub-tasks was effective and produced competitive results with prior works, while the ensemble-based methods were the most effective. The paper also devised an investment strategy in order to evaluate its effectiveness by benchmarking against bookmaker odds. An approach was developed that minimises risk by combining the Kelly Index with the predefined confidence thresholds of the predictive models. The experiments found that the proposed strategy can return a profit when following a conservative approach that focuses primarily on easy-to-predict matches where the predictive models display a high confidence level.

artificial intelligence, machine learning, prediction, (15 more...)

arXiv.org Artificial Intelligence

2211.15734

Country: Europe (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback