FunReason-MT Technical Report: Advanced Data Synthesis Solution for Real-world Multi-Turn Tool-use

Xu, Zengzhuang, Hao, Bingguang, Wang, Zechuan, Wen, Yuntao, Xu, Xinyi, Liu, Yang, Chen, Long, Wang, Dong, Wang, Maolin, Zhao, Tong, Chen, Yicheng, Peng, Cunyin, Gu, Jinjie, Gan, Leilei, Zhao, Xiangyu, Zhuang, Chenyi, Gu, Shi

arXiv.org Artificial Intelligence

Function calling (FC) empowers large language models (LLMs) and autonomous agents to interface with external tools, a critical capability for solving complex, real-world problems. As this ability becomes increasingly central to advanced AI systems, the need for high-quality, multi-turn training data to develop and refine it cannot be overstated. Existing data synthesis methods, such as random environment sampling or multi-agent role-playing, are not powerful enough to generate high-quality data in real-world environments. The practical challenges are threefold: targeted data synthesis, hard query construction, and multi-turn logical dependency. To address these structural deficiencies, we present FunReason-MT, a novel data synthesis framework for real-world multi-turn tool use. FunReason-MT resolves the complexity barrier in multi-turn FC data by employing 1) Environment-API Graph Interactions to gather varied high-quality trajectories with targeted tools, 2) Advanced Tool-Query Synthesis to simplify hard query construction, and 3) Guided Iterative Chain for sophisticated CoT generation. Evaluations on the Berkeley Function-Calling Leaderboard (BFCLv3) demonstrate the power of our framework: a 4B model trained on FunReason-MT-generated data achieves state-of-the-art performance among comparable-sized models. Further performance improvements on BFCLv4 confirm that FunReason-MT provides a reliable and robust source for agentic learning.
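The Environment-API Graph Interaction idea can be illustrated with a minimal sketch: represent tool APIs as a dependency graph (an edge from A to B meaning B consumes A's output) and walk it to produce one multi-turn trajectory. The tool names and graph below are invented for illustration and are not the paper's actual implementation.

```python
import random

# Hypothetical API dependency graph: an edge A -> B means tool B
# depends on the output of tool A within a multi-turn trajectory.
API_GRAPH = {
    "search_flights": ["book_flight"],
    "book_flight": ["send_confirmation"],
    "send_confirmation": [],
}

def sample_trajectory(graph, start, rng=random.Random(0)):
    """Walk the dependency graph to build one multi-turn tool-call trajectory."""
    trajectory, node = [], start
    while node is not None:
        trajectory.append({"turn": len(trajectory) + 1, "tool": node})
        successors = graph[node]
        node = rng.choice(successors) if successors else None
    return trajectory

traj = sample_trajectory(API_GRAPH, "search_flights")
```

Because every call in the trajectory is reached through a graph edge, each turn is logically dependent on the previous one, which is the property the framework targets.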


GenAI on Wall Street -- Opportunities and Risk Controls

Shen, Jackie

arXiv.org Artificial Intelligence

We give an overview on the emerging applications of GenAI in the financial industry, especially within investment banks. Inherent to these exciting opportunities is a new realm of risks that must be managed properly. By heeding both the Yin and Yang sides of GenAI, we can accelerate its organic growth while safeguarding the entire financial industry during this nascent era of AI.


EVKG: An Interlinked and Interoperable Electric Vehicle Knowledge Graph for Smart Transportation System

Qi, Yanlin, Mai, Gengchen, Zhu, Rui, Zhang, Michael

arXiv.org Artificial Intelligence

Over the past decade, the electric vehicle industry has experienced unprecedented growth and diversification, resulting in a complex ecosystem. To effectively manage this multifaceted field, we present an EV-centric knowledge graph (EVKG) as a comprehensive, cross-domain, extensible, and open geospatial knowledge management system. The EVKG encapsulates essential EV-related knowledge, including EV adoption, electric vehicle supply equipment, and the electricity transmission network, to support decision-making related to EV technology development, infrastructure planning, and policy-making by providing timely and accurate information and analysis. To enrich and contextualize the EVKG, we integrate the developed EV-relevant ontology modules from existing well-known knowledge graphs and ontologies. This integration enables interoperability with other knowledge graphs in the Linked Open Data Cloud, enhancing the EVKG's value as a knowledge hub for EV decision-making. Using six competency questions, we demonstrate how the EVKG can be used to answer various types of EV-related questions, providing critical insights into the EV ecosystem. Our EVKG provides an efficient and effective approach for managing the complex and diverse EV industry. By consolidating critical EV-related knowledge into a single, easily accessible resource, the EVKG supports decision-makers in making informed choices about EV technology development, infrastructure planning, and policy-making. As a flexible and extensible platform, the EVKG is capable of accommodating a wide range of data sources, enabling it to evolve alongside the rapidly changing EV landscape.
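A competency question over such a knowledge graph reduces to triple-pattern matching. The toy triple store below (entity names and predicates invented, not from the EVKG itself) shows the shape of a question like "which charging stations are supplied by a given substation?":

```python
# Hypothetical mini triple store; real systems would use RDF and SPARQL,
# but the matching idea is the same. All entities below are invented.
TRIPLES = [
    ("station_A", "suppliedBy", "substation_1"),
    ("station_B", "suppliedBy", "substation_1"),
    ("station_A", "locatedIn", "Davis_CA"),
    ("substation_1", "connectedTo", "transmission_line_9"),
]

def query(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Competency question: which stations does substation_1 supply?
served = [s for s, _, _ in query(TRIPLES, p="suppliedBy", o="substation_1")]
```

Chaining such patterns (station → substation → transmission line) is what lets a cross-domain graph answer infrastructure-planning questions that no single source could.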


TabPy: Combining Python and Tableau - KDnuggets

#artificialintelligence

Can we integrate the power of Python calculations with Tableau? That question encouraged me to start exploring the possibility of using Python calculations in Tableau, and I ended up with TabPy. How can we use TabPy to integrate Python and Tableau? In this article, I will introduce TabPy and go through an example of how we can use it. TabPy is an Analytics Extension from Tableau which enables us, as users, to execute Python scripts and saved functions from within Tableau.
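As a rough sketch of the workflow: you write an ordinary Python function that accepts and returns lists (Tableau passes calculated-field columns as lists), then deploy it to a running TabPy server so Tableau can call it via SCRIPT_REAL. The function below is an invented example; the commented-out deployment lines follow TabPy's documented client API but require a server listening on the default port 9004.

```python
# A function one might deploy to TabPy. Tableau sends each argument
# as a list of values, one element per row in the partition.
def discount_ratio(prices, discounts):
    """Element-wise discounted price: price * (1 - discount)."""
    return [p * (1 - d) for p, d in zip(prices, discounts)]

# Deployment (requires a running TabPy server; assumption: default URL):
# from tabpy.tabpy_tools.client import Client
# client = Client("http://localhost:9004/")
# client.deploy("discount_ratio", discount_ratio, "Applies per-row discounts")

result = discount_ratio([100.0, 200.0], [0.1, 0.25])
```

In Tableau, the deployed function would then be reachable from a calculated field such as `SCRIPT_REAL("return tabpy.query('discount_ratio', _arg1, _arg2)['response']", SUM([Price]), SUM([Discount]))`.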


How to Solve the New $1 Million Kaggle Problem - Home Value Estimates

@machinelearnbot

A new competition has been posted on Kaggle, and the prize is $1.2 million. Here we provide some help with solving this new problem: improving home value estimates, sponsored by Zillow. We have published about home value forecasting in the past, see here, here, and here. In this article, I provide specific advice related to this new competition for anyone interested in competing or curious about home value forecasting. Additional advice can be obtained by contacting me. More specifically, I offer high-level advice rather than guidance on selecting specific statistical models or algorithms, though I also discuss algorithm selection in the last section.
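Before reaching for sophisticated algorithms, it helps to have a baseline to beat. A classic one for home values is "predict the median price of the home's zip code" scored with mean absolute error; the sketch below uses invented prices purely to show the mechanics.

```python
from collections import defaultdict
from statistics import median

# Invented training data: (zip code, sale price).
train = [("95616", 450_000), ("95616", 470_000), ("95616", 500_000),
         ("94103", 900_000), ("94103", 1_100_000)]
test = [("95616", 480_000), ("94103", 950_000)]

# "Model": the median sale price per zip code.
by_zip = defaultdict(list)
for zipcode, price in train:
    by_zip[zipcode].append(price)
model = {z: median(prices) for z, prices in by_zip.items()}

# Score with mean absolute error, a common metric for price estimates.
mae = sum(abs(model[z] - price) for z, price in test) / len(test)
```

Any fancier model (gradient boosting, hedonic regression, etc.) should be judged by how much it improves on a baseline of this kind, not by its accuracy in isolation.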


Spatial Semantic Scan: Jointly Detecting Subtle Events and their Spatial Footprint

Maurya, Abhinav

arXiv.org Machine Learning

Many methods have been proposed for detecting emerging events in text streams using topic modeling. However, these methods have shortcomings that make them unsuitable for rapid detection of locally emerging events in massive text streams. We describe Spatially Compact Semantic Scan (SCSS), developed specifically to overcome the shortcomings of current methods in detecting new spatially compact events in text streams. SCSS employs alternating optimization between using semantic scan (Liu and Neill, 2011) to estimate contrastive foreground topics in documents, and discovering spatial neighborhoods (Shao et al., 2011) with a high occurrence of documents containing the foreground topics. We evaluate our method on the Emergency Department (ED) chief complaints dataset to verify its effectiveness in detecting real-world disease outbreaks from free-text ED chief complaint data.
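The alternating-optimization structure can be conveyed with a toy sketch: fix a region and estimate its most over-represented word (a stand-in for the foreground topic), then fix that word set and re-select the grid cell where it concentrates, and repeat until both stabilize. The documents, grid cells, and scoring below are invented and far simpler than the paper's semantic scan.

```python
from collections import Counter

# Invented documents: (grid cell, bag of words).
docs = [
    ((0, 0), ["fever", "cough"]), ((0, 0), ["fever", "rash"]),
    ((1, 1), ["sprain"]), ((1, 1), ["headache"]),
    ((0, 1), ["fever"]),
]

def top_words(region, k=1):
    """Step 1: crude 'foreground topic' = most frequent words in the region."""
    counts = Counter(w for cell, words in docs if cell == region for w in words)
    return {w for w, _ in counts.most_common(k)}

def best_region(keywords):
    """Step 2: cell whose documents mention the keywords most often."""
    score = Counter()
    for cell, words in docs:
        score[cell] += sum(w in keywords for w in words)
    return score.most_common(1)[0][0]

region = (0, 1)          # arbitrary initial region
for _ in range(3):       # alternate until the (region, keywords) pair settles
    keywords = top_words(region)
    region = best_region(keywords)
```

Starting from a poor initial region, the alternation still converges on the cell where "fever" documents cluster, mirroring how SCSS jointly recovers an event's topic and its spatial footprint.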


Learning Semantic Definitions of Online Information Sources

Carman, M. J., Knoblock, C. A.

Journal of Artificial Intelligence Research

The Internet contains a very large number of information sources providing many types of data from weather forecasts to travel deals and financial information. These sources can be accessed via Web-forms, Web Services, RSS feeds and so on. In order to make automated use of these sources, we need to model them semantically, but writing semantic descriptions for Web Services is both tedious and error prone. In this paper we investigate the problem of automatically generating such models. We introduce a framework for learning Datalog definitions of Web sources. In order to learn these definitions, our system actively invokes the sources and compares the data they produce with that of known sources of information. It then performs an inductive logic search through the space of plausible source definitions in order to learn the best possible semantic model for each new source. In this paper we perform an empirical evaluation of the system using real-world Web sources. The evaluation demonstrates the effectiveness of the approach, showing that we can automatically learn complex models for real sources in reasonable time. We also compare our system with a complex schema matching system, showing that our approach can handle the kinds of problems tackled by the latter.
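The core idea of invoking a new source and comparing its outputs against known sources admits a small sketch: enumerate candidate definitions built from known sources, call the unknown source on sample inputs, and keep whichever candidate reproduces its behavior. The currency-conversion sources and rates below are invented stand-ins, and the search here is a brute-force check rather than the paper's inductive logic search.

```python
# Known sources with understood semantics (rates are invented).
def usd_to_eur(usd):
    return round(usd * 0.9, 2)

def eur_to_gbp(eur):
    return round(eur * 0.85, 2)

# An unknown source we can only invoke, not inspect.
def new_source(usd):
    return round(usd * 0.9 * 0.85, 2)

# Candidate definitions: compositions of known sources.
candidates = {
    "usd_to_eur": lambda x: usd_to_eur(x),
    "usd_to_eur then eur_to_gbp": lambda x: eur_to_gbp(usd_to_eur(x)),
}

# Invoke the new source on samples and keep the candidates that agree.
samples = [10.0, 40.0, 100.0]
learned = [name for name, f in candidates.items()
           if all(f(x) == new_source(x) for x in samples)]
```

The surviving candidate serves as the learned semantic definition of the new source; the real system searches a much larger space of Datalog rules and tolerates noisy, partially overlapping data.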