AITopics | Han, Han

Collaborating Authors

Han, Han

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models

Wu, Mengsong, Zhu, Tong, Han, Han, Zhang, Xiang, Shao, Wenbiao, Chen, Wenliang

arXiv.org Artificial IntelligenceMar-20-2025

Tool learning can further broaden the usage scenarios of large language models (LLMs). However most of the existing methods either need to finetune that the model can only use tools seen in the training data, or add tool demonstrations into the prompt with lower efficiency. In this paper, we present a new Tool Learning method Chain-of-Tools. It makes full use of the powerful semantic representation capability of frozen LLMs to finish tool calling in CoT reasoning with a huge and flexible tool pool which may contain unseen tools. Especially, to validate the effectiveness of our approach in the massive unseen tool scenario, we construct a new dataset SimpleToolQuestions. We conduct experiments on two numerical reasoning benchmarks (GSM8K-XL and FuncQA) and two knowledge-based question answering benchmarks (KAMEL and SimpleToolQuestions). Experimental results show that our approach performs better than the baseline. We also identify dimensions of the model output that are critical in tool selection, enhancing the model interpretability. Our code and data are available at: https://github.com/fairyshine/Chain-of-Tools .

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2503.16779

Country: Asia > China (0.14)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models

Han, Han, Zhu, Tong, Zhang, Xiang, Wu, Mengsong, Xiong, Hao, Chen, Wenliang

arXiv.org Artificial IntelligenceJan-7-2025

Large language models (LLMs) combined with tool learning have gained impressive results in real-world applications. During tool learning, LLMs may call multiple tools in nested orders, where the latter tool call may take the former response as its input parameters. However, current research on the nested tool learning capabilities is still under-explored, since the existing benchmarks lack relevant data instances. To address this problem, we introduce NesTools to bridge the current gap in comprehensive nested tool learning evaluations. NesTools comprises a novel automatic data generation method to construct large-scale nested tool calls with different nesting structures. With manual review and refinement, the dataset is in high quality and closely aligned with real-world scenarios. Therefore, NesTools can serve as a new benchmark to evaluate the nested tool learning abilities of LLMs. We conduct extensive experiments on 22 LLMs, and provide in-depth analyses with NesTools, which shows that current LLMs still suffer from the complex nested tool learning task.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.11805

Country:

Asia (0.68)
North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.68)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark

Wu, Mengsong, Zhu, Tong, Han, Han, Tan, Chuanyuan, Zhang, Xiang, Chen, Wenliang

arXiv.org Artificial IntelligenceMay-14-2024

This paper presents a new tool learning dataset Seal-Tools, which contains self-instruct API-like tools. Seal-Tools not only offers a large number of tools, but also includes instances which demonstrate the practical application of tools. Seeking to generate data on a large scale while ensuring reliability, we propose a self-instruct method to generate tools and instances, allowing precise control over the process. Moreover, our Seal-Tools contains hard instances that call multiple tools to complete the job, among which some are nested tool callings. For precise and comprehensive evaluation, we use strict format control and design three metrics from different dimensions. Therefore, Seal-Tools can serve as a new benchmark to evaluate the tool-calling ability of LLMs. Finally, we evaluate several prevalent LLMs and our finetuned model on Seal-Tools. The results show that current systems are far from perfect. The code, data and experiment results are available at https://github.com/fairyshine/Seal-Tools .

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2405.08355

Country: Asia > China (0.28)

Genre: Research Report (0.70)

Industry:

Consumer Products & Services (0.94)
Transportation > Passenger (0.68)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Event-Based Eye Tracking. AIS 2024 Challenge Survey

Wang, Zuowen, Gao, Chang, Wu, Zongwei, Conde, Marcos V., Timofte, Radu, Liu, Shih-Chii, Chen, Qinyu, Zha, Zheng-jun, Zhai, Wei, Han, Han, Liao, Bohao, Wu, Yuliang, Wan, Zengyu, Wang, Zhong, Cao, Yang, Tan, Ganchao, Chen, Jinze, Pei, Yan Ru, Brüers, Sasskia, Crouzet, Sébastien, McLelland, Douglas, Coenen, Oliver, Zhang, Baoheng, Gao, Yizhao, Li, Jingyuan, So, Hayden Kwok-Hay, Bich, Philippe, Boretti, Chiara, Prono, Luciano, Lică, Mircea, Dinucu-Jianu, David, Grîu, Cătălin, Lin, Xiaopeng, Ren, Hongwei, Cheng, Bojun, Zhang, Xinan, Vial, Valentin, Yezzi, Anthony, Tsai, James

arXiv.org Artificial IntelligenceApr-17-2024

This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The task of the challenge focuses on processing eye movement recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve good task accuracy and efficiency trade-off. During the challenge period, 38 participants registered for the Kaggle competition, and 8 teams submitted a challenge factsheet. The novel and diverse methods from the submitted factsheets are reviewed and analyzed in this survey to advance future event-based eye tracking research.

artificial intelligence, machine learning, representation, (20 more...)

arXiv.org Artificial Intelligence

2404.1177

Country:

Asia > China (0.28)
Europe > Netherlands > South Holland (0.14)

Genre: Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Information Technology (0.68)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Traffic Sign Interpretation in Real Road Scene

Yang, Chuang, Zhuang, Kai, Chen, Mulin, Ma, Haozhao, Han, Xu, Han, Tao, Guo, Changxing, Han, Han, Zhao, Bingxuan, Wang, Qi

arXiv.org Artificial IntelligenceNov-28-2023

Most existing traffic sign-related works are dedicated to detecting and recognizing part of traffic signs individually, which fails to analyze the global semantic logic among signs and may convey inaccurate traffic instruction. Following the above issues, we propose a traffic sign interpretation (TSI) task, which aims to interpret global semantic interrelated traffic signs (e.g.,~driving instruction-related texts, symbols, and guide panels) into a natural language for providing accurate instruction support to autonomous or assistant driving. Meanwhile, we design a multi-task learning architecture for TSI, which is responsible for detecting and recognizing various traffic signs and interpreting them into a natural language like a human. Furthermore, the absence of a public TSI available dataset prompts us to build a traffic sign interpretation dataset, namely TSI-CN. The dataset consists of real road scene images, which are captured from the highway and the urban way in China from a driver's perspective. It contains rich location labels of texts, symbols, and guide panels, and the corresponding natural language description labels. Experiments on TSI-CN demonstrate that the TSI task is achievable and the TSI architecture can interpret traffic signs from scenes successfully even if there is a complex semantic logic among signs. The TSI-CN dataset and the source code of the TSI architecture will be publicly available after the revision process.

machine learning, natural language, recognition, (17 more...)

arXiv.org Artificial Intelligence

2311.10793

Country: Asia > China (0.88)

Genre: Research Report (0.40)

Industry: Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Fitting Auditory Filterbanks with Multiresolution Neural Networks

Lostanlen, Vincent, Haider, Daniel, Han, Han, Lagrange, Mathieu, Balazs, Peter, Ehler, Martin

arXiv.org Artificial IntelligenceJul-25-2023

Waveform-based deep learning faces a dilemma between nonparametric and parametric approaches. On one hand, convolutional neural networks (convnets) may approximate any linear time-invariant system; yet, in practice, their frequency responses become more irregular as their receptive fields grow. On the other hand, a parametric model such as LEAF is guaranteed to yield Gabor filters, hence an optimal time-frequency localization; yet, this strong inductive bias comes at the detriment of representational capacity. In this paper, we aim to overcome this dilemma by introducing a neural audio model, named multiresolution neural network (MuReNN). The key idea behind MuReNN is to train separate convolutional operators over the octave subbands of a discrete wavelet transform (DWT). Since the scale of DWT atoms grows exponentially between octaves, the receptive fields of the subsequent learnable convolutions in MuReNN are dilated accordingly. For a given real-world dataset, we fit the magnitude response of MuReNN to that of a well-established auditory filterbank: Gammatone for speech, CQT for music, and third-octave for urban sounds, respectively. This is a form of knowledge distillation (KD), in which the filterbank ''teacher'' is engineered by domain knowledge while the neural network ''student'' is optimized from data. We compare MuReNN to the state of the art in terms of goodness of fit after KD on a hold-out set and in terms of Heisenberg time-frequency localization. Compared to convnets and Gabor convolutions, we find that MuReNN reaches state-of-the-art performance on all three optimization problems.

artificial intelligence, filterbank, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2307.13821

Country:

Europe > Austria > Vienna (0.15)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Texas (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Perceptual-Neural-Physical Sound Matching

Han, Han, Lostanlen, Vincent, Lagrange, Mathieu

arXiv.org Artificial IntelligenceMar-13-2023

Sound matching algorithms seek to approximate a target waveform by parametric audio synthesis. Deep neural networks have achieved promising results in matching sustained harmonic tones. However, the task is more challenging when targets are nonstationary and inharmonic, e.g., percussion. We attribute this problem to the inadequacy of loss function. On one hand, mean square error in the parametric domain, known as "P-loss", is simple and fast but fails to accommodate the differing perceptual significance of each parameter. On the other hand, mean square error in the spectrotemporal domain, known as "spectral loss", is perceptually motivated and serves in differentiable digital signal processing (DDSP). Yet, spectral loss is a poor predictor of pitch intervals and its gradient may be computationally expensive; hence a slow convergence. Against this conundrum, we present Perceptual-Neural-Physical loss (PNP). PNP is the optimal quadratic approximation of spectral loss while being as fast as P-loss during training. We instantiate PNP with physical modeling synthesis as decoder and joint time-frequency scattering transform (JTFS) as spectral representation. We demonstrate its potential on matching synthetic drum sounds in comparison with other loss functions.

artificial intelligence, machine learning, spectral loss, (18 more...)

arXiv.org Artificial Intelligence

2301.02886

Genre: Research Report (0.50)

Industry:

Media > Music (0.93)
Leisure & Entertainment (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Mesostructures: Beyond Spectrogram Loss in Differentiable Time-Frequency Analysis

Vahidi, Cyrus, Han, Han, Wang, Changhong, Lagrange, Mathieu, Fazekas, György, Lostanlen, Vincent

arXiv.org Artificial IntelligenceJan-24-2023

Computer musicians refer to mesostructures as the intermediate levels of articulation between the microstructure of waveshapes and the macrostructure of musical forms. Examples of mesostructures include melody, arpeggios, syncopation, polyphonic grouping, and textural contrast. Despite their central role in musical expression, they have received limited attention in deep learning. Currently, autoencoders and neural audio synthesizers are only trained and evaluated at the scale of microstructure: i.e., local amplitude variations up to 100 milliseconds or so. In this paper, we formulate and address the problem of mesostructural audio modeling via a composition of a differentiable arpeggiator and time-frequency scattering. We empirically demonstrate that time--frequency scattering serves as a differentiable model of similarity between synthesis parameters that govern mesostructure. By exposing the sensitivity of short-time spectral distances to time alignment, we motivate the need for a time-invariant and multiscale differentiable time--frequency model of similarity at the level of both local spectra and spectrotemporal modulations.

artificial intelligence, machine learning, time shift, (18 more...)

arXiv.org Artificial Intelligence

2301.10183

Country: Europe (0.68)

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback