Prompt example
PRODIGY: Enabling In-context Learning Over Graphs
In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks by conditioning on prompt examples, without optimizing any parameters. While large language models have demonstrated this ability, how in-context learning could be performed over graphs is unexplored. In this paper, we develop Pretraining Over Diverse In-Context Graph Systems (PRODIGY), the first pretraining framework that enables in-context learning over graphs.
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
- Information Technology > Data Science > Data Mining (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)
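To make the setting in the PRODIGY abstract above concrete, here is a minimal sketch of in-context classification over a graph's node embeddings: a few labeled example nodes condition the prediction for a query node, with no parameter updates. This illustrates the task setup only, not PRODIGY's actual architecture; the identity `embed` function is a hypothetical stand-in for whatever pretrained graph encoder is assumed.

```python
import numpy as np

def embed(node_features: np.ndarray) -> np.ndarray:
    """Stand-in for a pretrained graph encoder (hypothetical).
    Identity here, so the sketch runs end to end."""
    return node_features

def in_context_classify(examples, query):
    """Classify `query` by cosine similarity to label prototypes
    built from the prompt examples; no gradients, no fine-tuning."""
    protos = {}
    for feats, label in examples:
        protos.setdefault(label, []).append(embed(feats))
    q = embed(query)
    scores = {}
    for label, vecs in protos.items():
        proto = np.mean(vecs, axis=0)
        scores[label] = float(q @ proto /
                              (np.linalg.norm(q) * np.linalg.norm(proto) + 1e-9))
    return max(scores, key=scores.get)

# Two labeled example nodes per class condition the query prediction.
examples = [(np.array([1.0, 0.1]), "ML"), (np.array([0.9, 0.2]), "ML"),
            (np.array([0.1, 1.0]), "DB"), (np.array([0.2, 0.9]), "DB")]
print(in_context_classify(examples, np.array([0.95, 0.15])))  # -> "ML"
```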
FinRpt: Dataset, Evaluation System and LLM-based Multi-agent Framework for Equity Research Report Generation
Jin, Song; Li, Shuqi; Zhang, Shukun; Yan, Rui
While LLMs have shown great success in financial tasks like stock prediction and question answering, their application to fully automating Equity Research Report generation remains uncharted territory. In this paper, we formulate the Equity Research Report (ERR) Generation task for the first time. To address the scarcity of data and the absence of evaluation metrics, we present an open-source evaluation benchmark for ERR generation - FinRpt. We design a Dataset Construction Pipeline that integrates 7 financial data types and automatically produces a high-quality ERR dataset, which can be used for model training and evaluation. We also introduce a comprehensive evaluation system including 11 metrics to assess the generated ERRs. Moreover, we propose a multi-agent framework specifically tailored to this task, named FinRpt-Gen, and train several LLM-based agents on the proposed datasets using Supervised Fine-Tuning and Reinforcement Learning. Experimental results confirm the data quality and metric effectiveness of the FinRpt benchmark and the strong performance of FinRpt-Gen, showcasing their potential to drive innovation in the ERR generation field. All code and datasets are publicly available.
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > Middle East > Saudi Arabia > Mecca Province > Thuwal (0.04)
- Asia > China > Hong Kong (0.04)
- (2 more...)
- Research Report > New Finding (0.91)
- Research Report > Experimental Study (0.81)
- Law (1.00)
- Health & Medicine (1.00)
- Banking & Finance > Trading (1.00)
- Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (0.46)
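The FinRpt abstract above mentions an 11-metric evaluation system without enumerating the metrics, so the sketch below only illustrates the general shape of such a harness: a registry of named metric functions applied to each generated report and averaged over the dataset. Both metric names here are placeholders of my own, not FinRpt's.

```python
from statistics import mean
from typing import Callable, Dict, List

# Placeholder metrics; FinRpt's actual 11 metrics are not listed in the abstract.
def length_ratio(report: str, reference: str) -> float:
    return min(len(report), len(reference)) / max(len(report), len(reference), 1)

def keyword_recall(report: str, reference: str) -> float:
    ref_terms = set(reference.lower().split())
    return len(ref_terms & set(report.lower().split())) / max(len(ref_terms), 1)

METRICS: Dict[str, Callable[[str, str], float]] = {
    "length_ratio": length_ratio,
    "keyword_recall": keyword_recall,
}

def evaluate(reports: List[str], references: List[str]) -> Dict[str, float]:
    """Average each registered metric over (report, reference) pairs."""
    return {name: mean(fn(r, g) for r, g in zip(reports, references))
            for name, fn in METRICS.items()}

print(evaluate(["Revenue grew 10% on strong demand"],
               ["Revenue grew 10% driven by strong cloud demand"]))
```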
Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs
Tie, Guiyao; Yuan, Zenghui; Zhao, Zeli; Hu, Chaoran; Gu, Tianhe; Zhang, Ruihang; Zhang, Sizhe; Wu, Junran; Tu, Xiaoyue; Jin, Ming; Wen, Qingsong; Chen, Lixing; Zhou, Pan; Sun, Lichao
Self-correction of large language models (LLMs) has emerged as a critical component for enhancing their reasoning performance. Although various self-correction methods have been proposed, a comprehensive evaluation of these methods remains largely unexplored, and whether LLMs can truly correct themselves remains a matter of significant interest and concern. In this study, we introduce CorrectBench, a benchmark developed to evaluate the effectiveness of self-correction strategies, including intrinsic, external, and fine-tuned approaches, across three tasks: commonsense reasoning, mathematical reasoning, and code generation. Our findings reveal that: 1) Self-correction methods can improve accuracy, especially for complex reasoning tasks; 2) Mixing different self-correction strategies yields further improvements, though it reduces efficiency; 3) Reasoning LLMs (e.g., DeepSeek-R1) benefit little from additional self-correction methods and incur high time costs. Interestingly, a comparatively simple chain-of-thought (CoT) baseline demonstrates competitive accuracy and efficiency. These results underscore the potential of self-correction to enhance LLMs' reasoning performance while highlighting the ongoing challenge of improving their efficiency. Consequently, we advocate for further research focused on optimizing the balance between reasoning capabilities and operational efficiency. Project Page: https://correctbench.github.io/
- North America > The Bahamas (0.14)
- North America > United States > Colorado (0.04)
- North America > United States > California > San Bernardino County > San Bernardino (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Government (1.00)
- Media > Film (0.45)
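The intrinsic self-correction strategies that the CorrectBench abstract above evaluates share a common loop: generate, critique, revise, stop when the critique passes. Here is a minimal sketch under that assumption; the prompts and the `toy_llm` stub are illustrative, not CorrectBench's actual templates or API.

```python
from typing import Callable

def self_correct(question: str, llm: Callable[[str], str],
                 max_rounds: int = 3) -> str:
    """Intrinsic self-correction loop: the model critiques and revises
    its own answer until the critique reports no remaining issues."""
    answer = llm(f"Answer step by step:\n{question}")
    for _ in range(max_rounds):
        critique = llm(f"Question: {question}\nAnswer: {answer}\n"
                       "List any errors in the answer, or reply NO ISSUES.")
        if "NO ISSUES" in critique.upper():
            break  # stopping early limits the efficiency cost noted above
        answer = llm(f"Question: {question}\nDraft answer: {answer}\n"
                     f"Critique: {critique}\nWrite a corrected answer.")
    return answer

# Toy stand-in LLM so the sketch runs; swap in a real chat API in practice.
def toy_llm(prompt: str) -> str:
    return "NO ISSUES" if "List any errors" in prompt else "42"

print(self_correct("What is 6 * 7?", toy_llm))  # -> "42"
```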
Tailored Teaching with Balanced Difficulty: Elevating Reasoning in Multimodal Chain-of-Thought via Prompt Curriculum
Yang, Xinglong; Feng, Quan; Pan, Zhongying; Chen, Xiang; Tian, Yu; Li, Wentong; Qiao, Shuofei; Geng, Yuxia; Zhao, Xingyu; Huang, Sheng-Jun
The effectiveness of Multimodal Chain-of-Thought (MCoT) prompting is often limited by the use of randomly or manually selected examples. These examples fail to account for both model-specific knowledge distributions and the intrinsic complexity of the tasks, resulting in suboptimal and unstable model performance. To address this, we propose a novel framework inspired by the pedagogical principle of "tailored teaching with balanced difficulty". We reframe prompt selection as a prompt curriculum design problem: constructing a well-ordered set of training examples that align with the model's current capabilities. Our approach integrates two complementary signals: (1) model-perceived difficulty, quantified through prediction disagreement in an active learning setup, capturing what the model itself finds challenging; and (2) intrinsic sample complexity, which measures the inherent difficulty of each question-image pair independently of any model. By jointly analyzing these signals, we develop a difficulty-balanced sampling strategy that ensures the selected prompt examples are diverse across both dimensions. Extensive experiments on five challenging benchmarks and multiple popular Multimodal Large Language Models (MLLMs) demonstrate that our method yields substantial and consistent improvements and greatly reduces the performance variance caused by random sampling, providing a principled and robust approach for enhancing multimodal reasoning.
- North America > United States > California (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
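The difficulty-balanced sampling described in the abstract above lends itself to a short sketch: score each candidate on the two signals, bucket the pool by combined difficulty, and draw prompt examples evenly across buckets so easy, medium, and hard cases all appear. The scoring callables below are stand-ins, since the paper's exact disagreement and complexity measures are not specified in the abstract.

```python
import random
from typing import Callable, List, Sequence

def balanced_prompt_sample(
    pool: Sequence,
    model_difficulty: Callable,   # e.g. prediction disagreement (stand-in)
    sample_complexity: Callable,  # e.g. question-image complexity (stand-in)
    k: int = 6,
    n_buckets: int = 3,
) -> List:
    """Draw k prompt examples spread evenly over difficulty buckets,
    combining a model-perceived and a model-independent signal."""
    scored = sorted(pool, key=lambda x: 0.5 * model_difficulty(x)
                                        + 0.5 * sample_complexity(x))
    size = max(1, len(scored) // n_buckets)
    # Easy-first ordering is preserved; any tail remainder is dropped.
    buckets = [scored[i:i + size] for i in range(0, len(scored), size)][:n_buckets]
    picks: List = []
    for bucket in buckets:
        picks += random.sample(bucket, min(k // n_buckets, len(bucket)))
    return picks

# Toy pool of (id, signal_a, signal_b); real signals would come from an
# active-learning disagreement score and a complexity estimator.
pool = [(i, i / 12, ((i * 7) % 12) / 12) for i in range(12)]
print(balanced_prompt_sample(pool, lambda x: x[1], lambda x: x[2], k=6))
```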
SimCity: Multi-Agent Urban Development Simulation with Rich Interactions
Feng, Yeqi; Lu, Yucheng; Su, Hongyu; He, Tianxing
We present SimCity, a multi-agent framework that leverages LLMs to model an interpretable macroeconomic system with heterogeneous agents and rich interactions. Unlike classical equilibrium models that limit heterogeneity for tractability, or traditional agent-based models (ABMs) that rely on hand-crafted decision rules, SimCity enables flexible, adaptive behavior with transparent natural-language reasoning. Within SimCity, four core agent types (households, firms, a central bank, and a government) deliberate and participate in a frictional labor market, a heterogeneous goods market, and a financial market. Furthermore, a Vision-Language Model (VLM) determines the geographic placement of new firms and renders a mapped virtual city, allowing us to study both macroeconomic regularities and urban expansion dynamics within a unified environment. To evaluate the framework, we compile a checklist of canonical macroeconomic phenomena, including price elasticity of demand, Engel's Law, Okun's Law, the Phillips Curve, and the Beveridge Curve, and show that SimCity naturally reproduces these empirical patterns while remaining robust across simulation runs.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- (2 more...)
- Banking & Finance > Economy (1.00)
- Banking & Finance > Real Estate (0.94)
- Banking & Finance > Trading (0.93)
- Government > Regional Government > North America Government > United States Government (0.93)
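To make the interaction structure in the SimCity abstract above concrete, here is a heavily simplified tick loop in the spirit of such an ABM, not SimCity's actual implementation: households supply labor and buy goods, firms adjust prices, and a central bank moves a policy rate in response to observed inflation. In SimCity these decisions would come from LLM deliberation rather than the fixed toy rules below.

```python
import random

class Household:
    def __init__(self): self.cash = 100.0

class Firm:
    def __init__(self): self.price, self.wage, self.stock = 1.0, 10.0, 0

def tick(households, firms, rate):
    """One simulation step: labor market, goods market, policy response."""
    for h in households:                      # frictional labor market
        firm = random.choice(firms)
        firm.stock += 1                       # one unit produced
        h.cash += firm.wage
    for h in households:                      # goods market: buy cheapest
        firm = min(firms, key=lambda f: f.price)
        if firm.stock > 0 and h.cash >= firm.price:
            firm.stock -= 1
            h.cash -= firm.price
    for f in firms:                           # crude price adjustment
        f.price *= 1.05 if f.stock == 0 else 0.98
    inflation = sum(f.price for f in firms) / len(firms) - 1.0
    return rate + 0.5 * inflation             # Taylor-style policy rule

households = [Household() for _ in range(20)]
firms, rate = [Firm() for _ in range(3)], 0.02
for _ in range(12):
    rate = tick(households, firms, rate)
print(f"policy rate after 12 ticks: {rate:.3f}")
```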
Summarize-Exemplify-Reflect: Data-driven Insight Distillation Empowers LLMs for Few-shot Tabular Classification
Yuan, Yifei; Li, Jiatong; Zhang, Weijia; Aliannejadi, Mohammad; Kanoulas, Evangelos; Hu, Renjun
Recent studies show the promise of large language models (LLMs) for few-shot tabular classification but highlight challenges due to the variability in structured data. To address this, we propose distilling data into actionable insights to enable robust and effective classification by LLMs. Drawing inspiration from human learning processes, we introduce InsightTab, an insight distillation framework guided by principles of divide-and-conquer, easy-first, and reflective learning. Our approach integrates rule summarization, strategic exemplification, and insight reflection through deep collaboration between LLMs and data modeling techniques. The obtained insights enable LLMs to better align their general knowledge and capabilities with the particular requirements of specific tabular tasks. We extensively evaluate InsightTab on nine datasets. The results demonstrate consistent improvement over state-of-the-art methods. Ablation studies further validate the principle-guided distillation process, while analyses emphasize InsightTab's effectiveness in leveraging labeled data and managing bias.
- North America > United States > California (0.04)
- Asia > China (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Promising Solution (0.66)
- Health & Medicine > Therapeutic Area > Endocrinology (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Education > Educational Setting (0.93)
- Banking & Finance (0.68)
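The summarize-exemplify-reflect loop in the InsightTab abstract above can be sketched as a prompt-construction pipeline: distill rules from the labeled rows, combine those rules with a few demonstration rows at classification time, and fold misclassified rows back into the rule text. The `llm` callable and the prompt wording below are hypothetical stand-ins, not InsightTab's actual templates.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
LLM = Callable[[str], str]

def distill_insights(rows: List[Tuple[Row, str]], llm: LLM) -> str:
    """Summarize: ask the model for classification rules over the labeled rows."""
    table = "\n".join(f"{r} -> {y}" for r, y in rows)
    return llm(f"Summarize classification rules from these labeled rows:\n{table}")

def classify(row: Row, insights: str, shots: List[Tuple[Row, str]],
             llm: LLM) -> str:
    """Exemplify: distilled rules plus a few demonstration rows form the prompt."""
    demos = "\n".join(f"{r} -> {y}" for r, y in shots)
    return llm(f"Rules:\n{insights}\nExamples:\n{demos}\nRow: {row}\nLabel:").strip()

def reflect(insights: str, mistakes: List[Tuple[Row, str, str]],
            llm: LLM) -> str:
    """Reflect: revise the rules using rows the current prompt misclassified."""
    errors = "\n".join(f"{r}: predicted {p}, true {y}" for r, y, p in mistakes)
    return llm(f"Rules:\n{insights}\nMisclassified rows:\n{errors}\nRevise the rules:")

# Stand-in LLM so the sketch runs; a real chat API would go here.
toy = lambda prompt: "rows with income > 50k are class A"
print(distill_insights([({"income": "80k"}, "A"), ({"income": "20k"}, "B")], toy))
```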