AITopics | gao

Collaborating Authors

gao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Intel wants cheap Windows laptops to stop feeling cheap

PCWorld | Jun-15-2026, 13:00:00 GMT

PCWorld reports on Intel's Project Firefly initiative, which aims to bring premium laptop features like all-metal construction and fanless design to budget-friendly devices. The project centers on Intel's new Core Series 3 'Wildcat Lake' processor, engineered with cost-reduction technologies and simplified motherboard designs to make laptops more affordable. Major manufacturers including Dell, HP, Lenovo, Acer, and Asus will ship these reimagined mainstream laptops targeting students and small businesses. People everywhere are talking about Apple's cheaper MacBook Neo laptop. Now Windows is preparing to retake the mainstream laptop market with Project Firefly, inspired by smartphone design.

artificial intelligence, home robotic performance privacy productivity, laptop, (10 more...)

PCWorld

Industry:

Information Technology > Security & Privacy (0.97)
Information Technology > Hardware (0.91)
Leisure & Entertainment > Games > Computer Games (0.53)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Robots (0.53)


Automated Hazard Detection in Construction Sites Using Large Language and Vision-Language Models

Sahraoui, Islem

arXiv.org Artificial Intelligence | Nov-21-2025

This thesis explores a multimodal AI framework for enhancing construction safety through the combined analysis of textual and visual data. In safety-critical environments such as construction sites, accident data often exists in multiple formats, such as written reports, inspection records, and site imagery, making it challenging to synthesize hazards using traditional approaches. To address this, the thesis proposes a multimodal AI framework that combines text and image analysis to assist in identifying safety hazards on construction sites. Two case studies were conducted to evaluate the capabilities of large language models (LLMs) and vision-language models (VLMs) for automated hazard identification. The first case study introduces a hybrid pipeline that utilizes GPT 4o and GPT 4o mini to extract structured insights from a dataset of 28,000 OSHA accident reports (2000-2025). The second case study extends this investigation using Molmo 7B and Qwen2 VL 2B, lightweight, open-source VLMs. Using the public ConstructionSite10k dataset, the performance of the two models was evaluated on rule-level safety violation detection using natural language prompts. This experiment served as a cost-aware benchmark against proprietary models and allowed testing at scale with ground-truth labels. Despite their smaller size, Molmo 7B and Qwen2 VL 2B showed competitive performance in certain prompt configurations, reinforcing the feasibility of low-resource multimodal systems for rule-aware safety monitoring.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.1572

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Construction & Engineering (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


OpenAI's new LLM exposes the secrets of how AI really works

MIT Technology Review | Nov-13-2025, 18:00:00 GMT

The experimental model won't compete with the biggest and best, but it could tell us why they behave in weird ways--and how trustworthy they really are. ChatGPT maker OpenAI has built an experimental large language model that is far easier to understand than typical models. That's a big deal, because today's LLMs are black boxes: Nobody fully understands how they do what they do. Building a model that is more transparent sheds light on how LLMs work in general, helping researchers figure out why models hallucinate, why they go off the rails, and just how far we should trust them with critical tasks. "As these AI systems get more powerful, they're going to get integrated more and more into very important domains," Leo Gao, a research scientist at OpenAI, told MIT Technology Review in an exclusive preview of the new work. "It's very important to make sure they're safe."

large language model, machine learning, natural language, (17 more...)

MIT Technology Review

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.88)


Real-Time Crowd Counting for Embedded Systems with Lightweight Architecture

Zhao, Zhiyuan, Wen, Yubin, Yang, Siyu, Ning, Lichen, Liu, Yuandong, Gao, Junyu

arXiv.org Artificial Intelligence | Oct-16-2025

Crowd counting is the task of estimating the number of people in a crowd from images, which is extremely valuable in fields such as intelligent security, urban planning, and public safety management. However, existing counting methods have problems in practical application on embedded systems in these fields, such as excessive model parameters and heavy computation. Practical application on embedded systems requires the model to run in real time, meaning that inference must be fast enough. Considering these problems, we design a super real-time model with a stem-encoder-decoder structure for crowd counting tasks, which achieves the fastest inference compared with state-of-the-art methods. First, large convolution kernels in the stem network are used to enlarge the receptive field, which effectively extracts detailed head information. Then, in the encoder part, we use conditional channel weighting and a multi-branch local fusion block to merge multi-scale features with low computational cost. This part is crucial to the super real-time performance of the model. Finally, feature pyramid networks are added on top of the encoder to alleviate its incomplete fusion problems. Experiments on three benchmarks show that our network is suitable for super real-time crowd counting on embedded systems while maintaining competitive accuracy. At the same time, the proposed network's inference speed is the fastest. Specifically, the proposed network achieves 381.7 FPS on NVIDIA GTX 1080Ti and 71.9 FPS on NVIDIA Jetson TX1.
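To put the reported throughput in per-frame terms, the short sketch below converts frames per second into average latency per frame. The helper name `frame_latency_ms` is illustrative and not taken from the paper; only the two FPS figures come from the abstract.

```python
def frame_latency_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps


# FPS figures as quoted in the abstract above.
reported_fps = {
    "NVIDIA GTX 1080Ti": 381.7,
    "NVIDIA Jetson TX1": 71.9,
}

for device, fps in reported_fps.items():
    print(f"{device}: {fps} FPS is about {frame_latency_ms(fps):.2f} ms per frame")
```

This gives roughly 2.62 ms per frame on the GTX 1080Ti and 13.91 ms per frame on the Jetson TX1.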

artificial intelligence, machine learning, real time system, (14 more...)

arXiv.org Artificial Intelligence

2510.1325

Genre: Research Report (1.00)

Industry: Information Technology (0.87)

Technology:

Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


Explore the Reinforcement Learning for the LLM based ASR and TTS system

Gao, Changfeng, Li, Yabin, An, Keyu, Gao, Zhifu, Du, Zhihao, Zhao, Han, Li, Xiangang

arXiv.org Artificial Intelligence | Sep-24-2025

In recent years, large language models (LLMs) have played an important role in automatic speech recognition (ASR) and text-to-speech (TTS) systems. While reinforcement learning (RL) has significantly enhanced LLM performance in text-based tasks, its application to ASR and TTS remains underexplored due to the complexity of training audio-based models. In this study, we propose a lightweight RL framework tailored for audio-based LLMs that can process audio inputs and generate audio outputs. Based on this framework, we evaluate the effectiveness of reinforcement learning on both ASR and TTS tasks. For the ASR task, we experiment with different rule-based reward functions within the Group Relative Policy Optimization (GRPO) framework and investigate the impact of RL data construction. For the TTS task, we compare GRPO with Differentiable Reward Optimization (DiffRO) and further combine the two approaches to achieve improved performance. Our experiments demonstrate that RL can significantly enhance the performance of both ASR and TTS systems, even with limited training data and a small number of optimization steps.
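As a rough illustration of how a rule-based reward can feed into GRPO, the sketch below scores a group of candidate transcripts against a reference and normalizes the rewards within the group. The reward used here (one minus word error rate, clipped at zero) is an assumption chosen for illustration; the abstract does not say which rules the authors use. The function names are likewise hypothetical.

```python
import statistics


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def rule_based_reward(reference: str, hypothesis: str) -> float:
    """Assumed reward: 1 - WER, clipped so it never goes below zero."""
    return 1.0 - min(word_error_rate(reference, hypothesis), 1.0)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: each reward standardized within its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


reference = "turn on the light"
group = ["turn on the light", "turn on the lights", "turn the light"]
rewards = [rule_based_reward(reference, hyp) for hyp in group]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Candidates that beat the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.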

large language model, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2509.18569

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)


LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer

Dong, Yaoxian, Gao, Yifan, Li, Haoyue, Cui, Yanfen, Gao, Xin

arXiv.org Artificial Intelligence | Jul-16-2025

Accurate preoperative assessment of lymph node (LN) metastasis in rectal cancer guides treatment decisions, yet conventional MRI evaluation based on morphological criteria shows limited diagnostic performance. While some artificial intelligence models have been developed, they often operate as black boxes, lacking the interpretability needed for clinical trust. Moreover, these models typically evaluate nodes in isolation, overlooking the patient-level context. To address these limitations, we introduce LRMR, an LLM-Driven Relational Multi-node Ranking framework. This approach reframes the diagnostic task from a direct classification problem into a structured reasoning and ranking process. The LRMR framework operates in two stages. First, a multimodal large language model (LLM) analyzes a composite montage image of all LNs from a patient, generating a structured report that details ten distinct radiological features. Second, a text-based LLM performs pairwise comparisons of these reports between different patients, establishing a relative risk ranking based on the severity and number of adverse features. We evaluated our method on a retrospective cohort of 117 rectal cancer patients. LRMR achieved an area under the curve (AUC) of 0.7917 and an F1-score of 0.7200, outperforming a range of deep learning baselines, including ResNet50 (AUC 0.7708). Ablation studies confirmed the value of our two main contributions: removing the relational ranking stage or the structured prompting stage led to a significant performance drop, with AUCs falling to 0.6875 and 0.6458, respectively. Our work demonstrates that decoupling visual perception from cognitive reasoning through a two-stage LLM framework offers a powerful, interpretable, and effective new paradigm for assessing lymph node metastasis in rectal cancer.
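The second stage described above, turning pairwise comparisons into a relative risk ranking, might look roughly like the sketch below. It is not the authors' implementation: the `compare` function is a stand-in for the text-based LLM and simply counts adverse features, the win-counting aggregation is an assumed scheme, and the feature names and patient IDs are invented for the example.

```python
from itertools import combinations


def count_adverse(report: dict) -> int:
    """Number of adverse features flagged as present in a structured report."""
    return sum(1 for present in report.values() if present)


def compare(report_a: dict, report_b: dict) -> int:
    """Stand-in for the LLM comparator: 1 if A looks riskier, -1 if B, 0 if tied."""
    a, b = count_adverse(report_a), count_adverse(report_b)
    return (a > b) - (a < b)


def rank_by_pairwise_wins(reports: dict) -> list:
    """Order patient IDs from highest to lowest risk by pairwise win count."""
    wins = {patient_id: 0.0 for patient_id in reports}
    for p, q in combinations(reports, 2):
        outcome = compare(reports[p], reports[q])
        if outcome > 0:
            wins[p] += 1.0
        elif outcome < 0:
            wins[q] += 1.0
        else:  # a tie gives half a win to each side
            wins[p] += 0.5
            wins[q] += 0.5
    return sorted(reports, key=lambda patient_id: wins[patient_id], reverse=True)


# Hypothetical structured reports (feature names are illustrative only).
reports = {
    "P1": {"irregular_border": True, "heterogeneous_signal": True},
    "P2": {"irregular_border": False, "heterogeneous_signal": False},
    "P3": {"irregular_border": True, "heterogeneous_signal": False},
}
ranking = rank_by_pairwise_wins(reports)
print(ranking)
```

A ranking of this kind yields a per-patient score (the win count) that can be thresholded or evaluated with a ranking metric such as AUC.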

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2507.11457

Country: Asia > China (0.68)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Differentiable Reward Optimization for LLM based TTS system

Gao, Changfeng, Du, Zhihao, Zhang, Shiliang

arXiv.org Artificial Intelligence | Jul-9-2025

This paper proposes a novel Differentiable Reward Optimization (DiffRO) method aimed at enhancing the performance of neural codec language model based text-to-speech (TTS) systems. In contrast to conventional reinforcement learning from human feedback (RLHF) approaches applied to TTS, DiffRO directly computes the rewards based on neural codec tokens, rather than relying on synthesized audio. Furthermore, we employ the Gumbel-Softmax technique to render the reward function differentiable, thereby streamlining the RLHF training process. Additionally, we introduce a multi-task reward (MTR) model which can provide feedback from different perspectives and find that it can augment the system's capability to follow instructions effectively. Experimental results indicate that DiffRO significantly improves the pronunciation accuracy of the TTS system, achieving state-of-the-art (SOTA) WER results on the seed-tts-eval benchmark. Moreover, with the integration of the MTR model, we demonstrate the ability to control emotional and quality attributes in a zero-shot manner.
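For readers unfamiliar with the Gumbel-Softmax technique mentioned above, here is a minimal standard-library sketch of the sampling step. It shows only the general relaxation, not how DiffRO applies it to codec tokens; the function name, the example logits, and the temperature value are assumptions made for illustration. In an automatic-differentiation framework the same expression would let gradients flow back to the logits.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw a relaxed (soft) one-hot sample over the categories in `logits`."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U uniform on (0, 1);
        # U is clamped away from 0 and 1 to keep the logarithms finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(value - peak) for value in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits over three token candidates.
sample = gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5, rng=random.Random(0))
print(sample)
```

Lower temperatures push the sample closer to a hard one-hot vector, while higher temperatures make it smoother.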

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.05911

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)


Review for NeurIPS paper: Noise-Contrastive Estimation for Multivariate Point Processes

Neural Information Processing Systems | Jan-23-2025, 11:01:19 GMT

The paper derives a new estimation method for multi-variate point processes that is based on the 'ranking' variant of NCE. The paper is borderline: two reviewers think that the difference to previous work by Gao (who use NCE to estimate point processes) and the empirical comparison are not sufficient. Two other reviewers disagree, with one in particular arguing that the paper should be accepted. The meta-reviewer thinks that the theory in the paper is sufficiently different from Gao's work, and that the theoretical aspects of the paper are deeper and more rigorous. The results do not follow directly from previous work by Gutmann & Hyvarinen (2012) or Ma & Collins (2018). The empirical results are good and the method should be useful in practice.

multivariate point process, neurips paper, noise-contrastive estimation, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.56)


Quantum-inspired Reinforcement Learning for Synthesizable Drug Design

Wang, Dannong, Chen, Jintai, Liang, Zhiding, Fu, Tianfan, Liu, Xiao-Yang

arXiv.org Artificial Intelligence | Sep-13-2024

Synthesizable molecular design (also known as synthesizable molecular optimization) is a fundamental problem in drug discovery, and involves designing novel molecular structures to improve their properties according to drug-relevant oracle functions (i.e., objective) while ensuring synthetic feasibility. However, existing methods are mostly based on random search. To address this issue, in this paper, we introduce a novel approach using the reinforcement learning method with quantum-inspired simulated annealing policy neural network to navigate the vast discrete space of chemical structures intelligently. Specifically, we employ a deterministic REINFORCE algorithm using policy neural networks to output transitional probability to guide state transitions and local search using genetic algorithm to refine solutions to a local optimum within each iteration. Our methods are evaluated with the Practical Molecular Optimization (PMO) benchmark framework with a 10K query budget. We further showcase the competitive performance of our method by comparing it against the state-of-the-art genetic algorithms-based method.

algorithm, learning, molecule, (14 more...)

arXiv.org Artificial Intelligence

2409.09183

Country: North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Rotations of Gödel algebras with modal operators

Flaminio, Tommaso, Godo, Lluis, Menchón, Paula, Rodriguez, Ricardo O.

arXiv.org Artificial Intelligence | May-23-2024

The present paper is devoted to studying the effect of connected and disconnected rotations of Gödel algebras with operators grounded on directly indecomposable structures. The structures resulting from the construction we present are nilpotent minimum algebras (with or without negation fixpoint, depending on whether the rotation is connected or disconnected) with special modal operators defined on a directly indecomposable algebra. In this paper we present a (quasi-)equational definition of these latter structures. Our main results show that directly indecomposable nilpotent minimum algebras (with or without negation fixpoint) with modal operators are fully characterized as connected and disconnected rotations of directly indecomposable Gödel algebras endowed with modal operators.

algebra, logic, operator, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-08971-8_55

2405.19354

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
South America > Colombia > Bogotá D.C. > Bogotá (0.04)
South America > Brazil (0.04)
(3 more...)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.32)
