semantic signal
Reasoning Scaffolding: Distilling the Flow of Thought from LLMs
Wen, Xiangyu, Huang, Junhua, Li, Zeju, Li, Min, Zhong, Jianyuan, Xu, Zhijian, Yuan, Mingxuan, Huang, Yongxiang, Xu, Qiang
The prevailing approach to distilling reasoning from Large Language Models (LLMs), behavioral cloning from textual rationales, is fundamentally limited. It teaches Small Language Models (SLMs) to mimic surface-level patterns rather than the underlying algorithmic structure of thought, resulting in a critical lack of logical robustness. We argue that instead of cloning text, distillation should transfer this algorithmic structure directly. We introduce Reasoning Scaffolding, a framework that reframes reasoning as a structured generation process. Our method first abstracts the teacher's thought process into a sequence of discrete, interpretable semantic signals (e.g., Contrast, Addition) that act as a scaffold. The student model is then trained via a multi-task objective to both (1) predict the next semantic signal, anticipating the reasoning flow, and (2) generate the corresponding step, conditioned on that signal. This multi-task scheme acts as a powerful regularizer, compelling the student to internalize the computational patterns of coherent reasoning. On a suite of challenging reasoning benchmarks, our method significantly outperforms state-of-the-art distillation in both accuracy and logical consistency, providing a path toward smaller models that are genuine reasoners, not just fluent mimics.
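The multi-task objective the abstract describes can be sketched as a weighted sum of two cross-entropy terms: one for predicting the next semantic signal, one for generating the step conditioned on it. The function names, the logit inputs, and the weighting `alpha` are illustrative assumptions, not the paper's implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def cross_entropy(logits, target_idx):
    # Negative log-likelihood of the target class.
    return -math.log(softmax(logits)[target_idx])

def scaffolding_loss(signal_logits, signal_target,
                     step_logits, step_target, alpha=0.5):
    """Weighted sum of signal-prediction and step-generation losses."""
    l_signal = cross_entropy(signal_logits, signal_target)
    l_step = cross_entropy(step_logits, step_target)
    return alpha * l_signal + (1 - alpha) * l_step
```

In practice both terms would be averaged over tokens and batch, but the coupling is the point: the student cannot lower the step loss without also committing to a prediction of the reasoning flow.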
EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration
Hong, Minjie, Xia, Yan, Wang, Zehan, Zhu, Jieming, Wang, Ye, Cai, Sihang, Yang, Xiaoda, Dai, Quanyu, Dong, Zhenhua, Zhang, Zhimeng, Zhao, Zhou
Large language models (LLMs) are increasingly leveraged as foundational backbones in the development of advanced recommender systems, offering enhanced capabilities through their extensive knowledge and reasoning. Existing LLM-based recommender systems (RSs) often face challenges due to the significant differences between the linguistic semantics of pre-trained LLMs and the collaborative semantics essential for RSs. These systems use pre-trained linguistic semantics but learn collaborative semantics from scratch via the LLM backbone. However, LLMs are not designed for recommendation, leading to inefficient collaborative learning, weak result correlations, and poor integration of traditional RS features. To address these challenges, we propose EAGER-LLM, a decoder-only LLM-based generative recommendation framework that integrates endogenous and exogenous behavioral and semantic information in a non-intrusive manner. Specifically, we propose (1) dual-source, knowledge-rich item indices that integrate indexing sequences for exogenous signals, enabling efficient link-wide processing; (2) non-invasive multiscale alignment reconstruction tasks that guide the model toward a deeper understanding of both collaborative and semantic signals; and (3) an annealing adapter designed to finely balance the model's recommendation performance with its comprehension capabilities. We demonstrate EAGER-LLM's effectiveness through rigorous testing on three public benchmarks.
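The "annealing adapter" suggests a loss weight that shifts gradually between the recommendation objective and the language-comprehension objective over training. A minimal sketch, assuming a cosine schedule (the schedule shape, endpoints, and function names are assumptions, not the paper's design):

```python
import math

def annealed_weight(step, total_steps, w_start=0.1, w_end=0.9):
    """Cosine-anneal the recommendation-loss weight from w_start to w_end."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return w_end + (w_start - w_end) * 0.5 * (1 + math.cos(math.pi * t))

def combined_loss(rec_loss, lm_loss, step, total_steps):
    # Early in training the language objective dominates; later,
    # the recommendation objective takes over smoothly.
    w = annealed_weight(step, total_steps)
    return w * rec_loss + (1 - w) * lm_loss
```

A smooth schedule like this avoids the abrupt distribution shift that a hard switch between objectives would cause.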
Analyzing Multimodal Objectives Through the Lens of Generative Diffusion Guidance
Recent years have witnessed astonishing advances in the field of multimodal representation learning, with contrastive learning being the cornerstone for major breakthroughs. Recent works have delivered further improvements by incorporating different objectives such as masked modeling and captioning into these frameworks, but our understanding of how these objectives facilitate learning remains vastly incomplete. In this paper, we leverage the fact that classifier-guided diffusion models generate images that reflect the semantic signals provided by the classifier to study the characteristics of multimodal learning objectives. Specifically, we compare contrastive, matching, and captioning losses in terms of their semantic signals, and introduce a simple baseline that not only supports our analyses but also improves the quality of generative guidance in a straightforward manner.

Vision-Language Pretraining (VLP) has attracted great attention from the community for its wide and robust applications in different downstream tasks. Recently, cross-modal generative models (Ramesh et al., 2021; Saharia et al., 2022; Rombach et al., 2022; Kong et al., 2022) have also gained wide popularity thanks to the powerful capacity of diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020) and the readily available guidance of vision-language foundation models.
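Classifier guidance, as referenced above, shifts the diffusion denoising direction by the gradient of the classifier's log-probability for the target label, scaled by a guidance weight. A toy one-dimensional sketch (the stand-in classifier, its targets, and the scale are illustrative assumptions; real implementations operate on image tensors with a trained noise predictor):

```python
def classifier_grad_log_prob(x, label):
    """Stand-in gradient: pulls x toward a label-specific target value.

    This is the gradient of -(x - target)**2 / 2, i.e. a Gaussian
    log-likelihood centered on the target.
    """
    targets = {"cat": 1.0, "dog": -1.0}
    return targets[label] - x

def guided_score(base_score, x, label, guidance_scale=2.0):
    # The classifier's semantic signal is injected additively into the
    # diffusion score; larger scales push samples harder toward the label.
    return base_score + guidance_scale * classifier_grad_log_prob(x, label)
```

The point the abstract exploits is visible even here: the generated sample reflects whatever semantic signal the classifier's gradient encodes, so the samples can be read as a probe of that signal.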
Why you should NOT use MS MARCO to evaluate semantic search - KDnuggets
MS MARCO is a collection of large-scale datasets released by Microsoft with the intent of helping advance deep learning research related to search. It was our first choice when we decided to create a tutorial showing how to set up a text search application with Vespa. It was getting a lot of attention from the community, in great part due to the intense competition around leaderboards. Besides, being a large and challenging annotated corpus of documents, it checked all the boxes at the time. We followed up the first basic search tutorial with a blog post and a tutorial on how to use ML in Vespa to improve the text search application.
Learning and Planning with a Semantic Model
Wu, Yi, Wu, Yuxin, Tamar, Aviv, Russell, Stuart, Gkioxari, Georgia, Tian, Yuandong
Building deep reinforcement learning agents that can generalize and adapt to unseen environments remains a fundamental challenge for AI. This paper describes progress on this challenge in the context of man-made environments, which are visually diverse but contain intrinsic semantic regularities. We propose a hybrid model-based and model-free approach, LEArning and Planning with Semantics (LEAPS), consisting of a multi-target sub-policy that acts on visual inputs, and a Bayesian model over semantic structures. When placed in an unseen environment, the agent plans with the semantic model to make high-level decisions, proposes the next sub-target for the sub-policy to execute, and updates the semantic model based on new observations. We perform experiments in visual navigation tasks using House3D, a 3D environment that contains diverse human-designed indoor scenes with real-world objects. LEAPS outperforms strong baselines that do not explicitly plan using the semantic content. Deep reinforcement learning (DRL) has undoubtedly witnessed strong achievements in recent years (Silver et al., 2016; Mnih et al., 2015; Levine et al., 2016).
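The plan/act/update loop the abstract describes can be sketched with a Beta-Bernoulli belief over whether each sub-target (e.g. a room) leads toward the goal: plan by picking the sub-target with the highest posterior mean, then update its belief from the observed outcome. Everything here is an illustrative assumption about the structure, not the LEAPS implementation.

```python
def posterior_mean(alpha, beta):
    # Mean of a Beta(alpha, beta) belief over success probability.
    return alpha / (alpha + beta)

def choose_subtarget(beliefs):
    """Pick the sub-target most believed to lead toward the goal."""
    return max(beliefs, key=lambda k: posterior_mean(*beliefs[k]))

def update_belief(beliefs, subtarget, observed_success):
    # Conjugate Bernoulli update: success increments alpha, failure beta.
    a, b = beliefs[subtarget]
    beliefs[subtarget] = (a + 1, b) if observed_success else (a, b + 1)
```

In an unseen environment the agent starts from a prior learned across training scenes, which is what lets the semantic regularities of man-made environments transfer.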