mtp
Prompt Less, Smile More: MTP with Semantic Engineering in Lieu of Prompt Engineering
Dantanarayana, Jayanaka L., Kashmira, Savini, Nathees, Thakee, Zhang, Zichen, Flautner, Krisztian, Tang, Lingjia, Mars, Jason
AI-Integrated programming is emerging as a foundational paradigm for building intelligent systems with large language models (LLMs). Recent approaches such as Meaning Typed Programming (MTP) automate prompt generation by leveraging the semantics already present in code. However, many real-world applications depend on contextual cues, developer intent, and domain-specific reasoning that extend beyond what static code semantics alone can express. To address this limitation, we introduce Semantic Engineering, a lightweight method for enriching program semantics so that LLM-based systems can more accurately reflect developer intent without requiring full manual prompt design. We present Semantic Context Annotations (SemTexts), a language-level mechanism that allows developers to embed natural-language context directly into program constructs. Integrated into the Jac programming language, Semantic Engineering extends MTP to incorporate these enriched semantics during prompt generation. We further introduce a benchmark suite designed to reflect realistic AI-Integrated application scenarios. Our evaluation shows that Semantic Engineering substantially improves prompt fidelity, achieving performance comparable to Prompt Engineering while requiring significantly less developer effort.
MTP: Exploring Multimodal Urban Traffic Profiling with Modality Augmentation and Spectrum Fusion
Xiang, Haolong, Wang, Peisi, Xu, Xiaolong, Yi, Kun, Zhang, Xuyun, Sheng, Quanzheng, Beheshti, Amin, Fan, Wei
With rapid urbanization in the modern era, traffic signals from various sensors have been playing a significant role in monitoring the states of cities, which provides a strong foundation in ensuring safe travel, reducing traffic congestion and optimizing urban mobility. Most existing methods for traffic signal modeling often rely on the original data modality, i.e., numerical direct readings from the sensors in cities. However, this unimodal approach overlooks the semantic information existing in multimodal heterogeneous urban data in different perspectives, which hinders a comprehensive understanding of traffic signals and limits the accurate prediction of complex traffic dynamics. To address this problem, we propose a novel Multimodal framework, MTP, for urban Traffic Profiling, which learns multimodal features through numeric, visual, and textual perspectives. The three branches drive for a multimodal perspective of urban traffic signal learning in the frequency domain, while the frequency learning strategies delicately refine the information for extraction. Specifically, we first conduct the visual augmentation for the traffic signals, which transforms the original modality into frequency images and periodicity images for visual learning. Also, we augment descriptive texts for the traffic signals based on the specific topic, background information and item description for textual learning. To complement the numeric information, we utilize frequency multilayer perceptrons for learning on the original modality. We design a hierarchical contrastive learning on the three branches to fuse the spectrum of three modalities. Finally, extensive experiments on six real-world datasets demonstrate superior performance compared with the state-of-the-art approaches.
Fast and Expressive Multi-Token Prediction with Probabilistic Circuits
Grivas, Andreas, Loconte, Lorenzo, van Krieken, Emile, Nawrot, Piotr, Zhao, Yu, Wielewski, Euan, Minervini, Pasquale, Ponti, Edoardo, Vergari, Antonio
Multi-token prediction (MTP) is a prominent strategy to significantly speed up generation in large language models (LLMs), including byte-level LLMs, which are tokeniser-free but prohibitively slow. However, existing MTP methods often sacrifice expressiveness by assuming independence between future tokens. In this work, we investigate the trade-off between expressiveness and latency in MTP within the framework of probabilistic circuits (PCs). Our framework, named MTPC, allows one to explore different ways to encode the joint distributions over future tokens by selecting different circuit architectures, generalising classical models such as (hierarchical) mixture models, hidden Markov models and tensor networks. We show the efficacy of MTPC by retrofitting existing byte-level LLMs, such as EvaByte. Our experiments show that, when combined with speculative decoding, MTPC significantly speeds up generation compared to MTP with independence assumptions, while guaranteeing to retain the performance of the original verifier LLM. We also rigorously study the optimal trade-off between expressiveness and latency when exploring the possible parameterisations of MTPC, such as PC architectures and partial layer sharing between the verifier and draft LLMs.
Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries
Mahajan, Divyat, Goyal, Sachin, Idrissi, Badr Youbi, Pezeshki, Mohammad, Mitliagkas, Ioannis, Lopez-Paz, David, Ahuja, Kartik
Next-token prediction (NTP) has driven the success of large language models (LLMs), but it struggles with long-horizon reasoning, planning, and creative writing, with these limitations largely attributed to teacher-forced training. Multi-token prediction (MTP) partially mitigates these issues by predicting several future tokens at once, but it mostly captures short-range dependencies and offers limited improvement. We propose future summary prediction (FSP), which trains an auxiliary head to predict a compact representation of the long-term future, preserving information relevant for long-form generations. We explore two variants of FSP: handcrafted summaries, for example, a bag of words summary of the future of the sequence, and learned summaries, which use embeddings produced by a reverse language model trained from right to left. Large-scale pretraining experiments (3B and 8B-parameter models) demonstrate that FSP provides improvements over both NTP and MTP across math, reasoning, and coding benchmarks.
Memory Transfer Planning: LLM-driven Context-Aware Code Adaptation for Robot Manipulation
Kagaya, Tomoyuki, Lakshmi, Subramanian, Lou, Yuxuan, Yuan, Thong Jing, Karlekar, Jayashree, Pranata, Sugiri, Murakami, Natsuki, Kinose, Akira, You, Yang
Large language models (LLMs) are increasingly explored in robot manipulation, but many existing methods struggle to adapt to new environments. Many systems require either environment-specific policy training or depend on fixed prompts and single-shot code generation, leading to limited transferability and manual re-tuning. We introduce Memory Transfer Planning (MTP), a framework that leverages successful control-code examples from different environments as procedural knowledge, using them as in-context guidance for LLM-driven planning. Specifically, MTP (i) generates an initial plan and code using LLMs, (ii) retrieves relevant successful examples from a code memory, and (iii) contextually adapts the retrieved code to the target setting for re-planning without updating model parameters. We evaluate MTP on RLBench, CALVIN, and a physical robot, demonstrating effectiveness beyond simulation. Across these settings, MTP consistently improved success rate and adaptability compared with fixed-prompt code generation, naive retrieval, and memory-free re-planning. Furthermore, in hardware experiments, leveraging a memory constructed in simulation proved effective. MTP provides a practical approach that exploits procedural knowledge to realize robust LLM-based planning across diverse robotic manipulation scenarios, enhancing adaptability to novel environments and bridging simulation and real-world deployment.
Predicting the Order of Upcoming Tokens Improves Language Modeling
Zuhri, Zayd M. K., Fuadi, Erland Hilman, Aji, Alham Fikri
Multi-Token Prediction (MTP) has been proposed as an auxiliary objective to improve next-token prediction (NTP) in language model training but shows inconsistent improvements, underperforming in standard NLP benchmarks. We argue that MTP's exact future token prediction is too difficult as an auxiliary loss. Instead, we propose Token Order Prediction (TOP), which trains models to order upcoming tokens by their proximity using a learning-to-rank loss. TOP requires only a single additional unembedding layer compared to MTP's multiple transformer layers. We pretrain models of 340M, 1.8B, and 7B parameters using NTP, MTP, and TOP objectives. Results on eight standard NLP benchmarks show that TOP overall outperforms both NTP and MTP even at scale. Our code is available at https://github.com/zaydzuhri/token-order-prediction
Path-integral molecular dynamics with actively-trained and universal machine learning force fields
Solovykh, A. A., Rybin, N. E., Novikov, I. S., Shapeev, A. V.
Accounting for nuclear quantum effects (NQEs) can significantly alter material properties at finite temperatures. Atomic modeling using the path-integral molecular dynamics (PIMD) method can fully account for such effects, but requires computationally efficient and accurate models of interatomic interactions. Empirical potentials are fast but may lack sufficient accuracy, whereas quantum-mechanical calculations are highly accurate but computationally expensive. Machine-learned interatomic potentials offer a solution to this challenge, providing near-quantum-mechanical accuracy while maintaining high computational efficiency compared to density functional theory (DFT) calculations. In this context, an interface was developed to integrate moment tensor potentials (MTPs) from the MLIP-2 software package into PIMD calculations using the i-PI software package. This interface was then applied to active learning of potentials and to investigate the influence of NQEs on material properties, namely the temperature dependence of lattice parameters and thermal expansion coefficients, as well as radial distribution functions, for lithium hydride (LiH) and silicon (Si) systems. The results were compared with experimental data, quasi-harmonic approximation calculations, and predictions from the universal machine learning force field MatterSim. These comparisons demonstrated the high accuracy and effectiveness of the MTP-PIMD approach.
On Simulating Thin-Film Processes at the Atomic Scale Using Machine Learned Force Fields
Natarajan, S. Kondati, Schneider, J., Pandey, N., Wellendorff, J., Smidstrup, S.
Atomistic modeling of thin-film processes provides an avenue not only for discovering key chemical mechanisms of the processes but also to extract quantitative metrics on the events and reactions taking place at the gas-surface interface. Molecular dynamics (MD) is a powerful computational method to study the evolution of a process at the atomic scale, but studies of industrially relevant processes usually require suitable force fields, which are in general not available for all processes of interest. However, machine learned force fields (MLFF) are conquering the field of computational materials and surface science. In this paper, we demonstrate how to efficiently build MLFFs suitable for process simulations and provide two examples for technologically relevant processes: precursor pulse in the atomic layer deposition of HfO2 and atomic layer etching of MoS2.
New Intent Discovery with Pre-training and Contrastive Learning
Zhang, Yuwei, Zhang, Haode, Zhan, Li-Ming, Lam, Albert Y. S., Wu, Xiao-Ming
New intent discovery aims to uncover novel intent categories from user utterances to expand the set of supported intent classes. It is a critical task for the development and service expansion of a practical dialogue system. Despite its importance, this problem remains under-explored in the literature. Existing approaches typically rely on a large amount of labeled utterances and employ pseudo-labeling methods for representation learning and clustering, which are label-intensive, inefficient, and inaccurate. In this paper, we provide new solutions to two important research questions for new intent discovery: (1) how to learn semantic utterance representations and (2) how to better cluster utterances. Particularly, we first propose a multi-task pre-training strategy to leverage rich unlabeled data along with external labeled data for representation learning. Then, we design a new contrastive loss to exploit self-supervisory signals in unlabeled data for clustering. Extensive experiments on three intent recognition benchmarks demonstrate the high effectiveness of our proposed method, which outperforms state-of-the-art methods by a large margin in both unsupervised and semi-supervised scenarios. The source code will be available at https://github.com/zhang-yu-wei/MTP-CLNN.
On multi-token prediction for efficient LLM inference
Mehra, Somesh, Garcia, Javier Alonso, Mauch, Lukas
We systematically investigate multi-token prediction (MTP) capabilities within LLMs pre-trained for next-token prediction (NTP). We first show that such models inherently possess MTP capabilities via numerical marginalization over intermediate token probabilities, though performance is data-dependent and improves with model scale. Furthermore, we explore the challenges of integrating MTP heads into frozen LLMs and find that their hidden layers are strongly specialized for NTP, making adaptation non-trivial. Finally, we show that while joint training of MTP heads with the backbone improves performance, it cannot fully overcome this barrier, prompting further research in this direction. Our findings provide a deeper understanding of MTP applied to pretrained LLMs, informing strategies for accelerating inference through parallel token prediction. In recent years, decoder-only transformers have emerged as the state-of-the-art models for language modeling and are widely adopted for large language models (LLMs).