Phi-3-mini
Training-Free Spectral Fingerprints of Voice Processing in Transformers
Different transformer architectures implement identical linguistic computations via distinct connectivity patterns, yielding model-imprinted ``computational fingerprints'' detectable through spectral analysis. Using graph signal processing on attention-induced token graphs, we track changes in algebraic connectivity (Fiedler value, $\Delta\lambda_2$) under voice alternation across 20 languages and three model families, with a prespecified early window (layers 2--5). Our analysis uncovers clear architectural signatures: Phi-3-mini shows a dramatic English-specific early-layer disruption ($\overline{\Delta\lambda_2}_{[2,5]} \approx -0.446$) while effects in the 19 other languages are minimal, consistent with public documentation that positions the model primarily for English use. Qwen2.5-7B displays small, distributed shifts that are largest for morphologically rich languages, and LLaMA-3.2-1B exhibits systematic but muted responses. These spectral signatures correlate strongly with behavioral differences (Phi-3: $r=-0.976$) and are modulated by targeted attention-head ablations, linking the effect to early attention structure and confirming functional relevance. Taken together, the findings are consistent with the view that training emphasis can leave detectable computational imprints: specialized processing strategies that manifest as measurable connectivity patterns during syntactic transformations. Beyond voice alternation, the framework differentiates reasoning modes, indicating utility as a simple, training-free diagnostic for revealing architectural biases and supporting model reliability analysis.
- Research Report > New Finding (0.94)
- Research Report > Experimental Study (0.69)
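To make the spectral measure concrete, here is a minimal sketch of the Fiedler-value shift described in the abstract above, assuming head-averaged attention matrices are symmetrized into an undirected token graph; the function names and the exact layer indexing are illustrative, not taken from the paper.

```python
import numpy as np

def fiedler_value(attn: np.ndarray) -> float:
    """Algebraic connectivity (second-smallest Laplacian eigenvalue) of an
    attention-induced token graph. `attn` is a (tokens x tokens) attention
    matrix averaged over heads; it is symmetrized into undirected edge weights."""
    W = 0.5 * (attn + attn.T)          # undirected edge weights
    np.fill_diagonal(W, 0.0)           # drop self-loops
    L = np.diag(W.sum(axis=1)) - W     # combinatorial graph Laplacian
    eigvals = np.linalg.eigvalsh(L)    # ascending, real (L is symmetric PSD)
    return float(eigvals[1])           # lambda_2, the Fiedler value

def delta_lambda2(attn_active, attn_passive, layers=range(2, 6)):
    """Mean Fiedler-value shift over the early-layer window (layers 2-5),
    comparing passive-voice to active-voice renderings of the same sentence.
    `attn_active[l]` / `attn_passive[l]` are per-layer head-averaged matrices."""
    shifts = [fiedler_value(attn_passive[l]) - fiedler_value(attn_active[l])
              for l in layers]
    return float(np.mean(shifts))
```

The combinatorial Laplacian and its second-smallest eigenvalue are standard graph-signal-processing choices; the paper's actual normalization of attention weights may differ.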
Edge-FIT: Federated Instruction Tuning of Quantized LLMs for Privacy-Preserving Smart Home Environments
Venkatesh, Vinay, Kamanuru, Vamsidhar R, Kumar, Lav, Kothari, Nikita
This paper proposes Edge-FIT (Federated Instruction Tuning on the Edge), a scalable framework for Federated Instruction Tuning (FIT) of Large Language Models (LLMs). Traditional Federated Learning (TFL) methods, like FedAvg, fail when confronted with the massive parameter size of LLMs [3], [6]. Our Edge-FIT framework combines federated learning with 4-bit Quantized Low-Rank Adaptation (QLoRA), mitigating the core communication and computational overheads. We demonstrate this by filtering the general-purpose Databricks Dolly 15k dataset for the IoT domain. Experimental results show that the Edge-FIT-tuned Llama-2 (7B) achieves an F1-score of 0.89. We also demonstrate a viable trade-off using the 3.8B Phi-3-mini model, validating Edge-FIT as a scalable framework for decentralized LLM deployment on home compute gateways.
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
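A minimal sketch of the aggregation step such a setup implies, assuming only the LoRA adapter tensors travel between clients and the server while the 4-bit base model stays frozen on every device; the function name and dict layout are illustrative, not the paper's API.

```python
import torch

def fedavg_lora_adapters(client_adapters, client_sizes):
    """Weighted FedAvg over LoRA adapter tensors only, which is what keeps the
    per-round communication payload small relative to the full model.
    `client_adapters` is a list of {param_name: tensor} dicts holding just the
    LoRA A/B matrices; `client_sizes` holds each client's local example count."""
    total = float(sum(client_sizes))
    averaged = {}
    for name in client_adapters[0]:
        acc = torch.zeros_like(client_adapters[0][name], dtype=torch.float32)
        for adapters, n in zip(client_adapters, client_sizes):
            acc += (n / total) * adapters[name].float()   # size-weighted average
        averaged[name] = acc
    return averaged
```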
OpenGrok: Enhancing SNS Data Processing with Distilled Knowledge and Mask-like Mechanisms
Lumen AI, Zaozhuang No. 28 Middle School, Ji, Shihao, Song, Zihui, Zhong, Fucheng, Jia, Jisen, Wu, Zhaobo, Cao, Zheyi, Xu, Tianhao
This report details Lumen Labs' novel approach to processing Social Networking Service (SNS) data. We leverage knowledge distillation, specifically a simple distillation method inspired by DeepSeek-R1's CoT acquisition, combined with prompt hacking, to extract valuable training data from the Grok model. This data is then used to fine-tune a Phi-3-mini model, augmented with a mask-like mechanism specifically designed for handling the nuances of SNS data. Our method demonstrates state-of-the-art (SOTA) performance on several SNS data processing tasks, outperforming existing models like Grok, Phi-3, and GPT-4. We provide a comprehensive analysis of our approach, including mathematical formulations, engineering details, ablation studies, and comparative evaluations.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.70)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)
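Since the abstract does not spell out the mask-like mechanism, the following is only a generic sketch of token-masked sequence distillation, assuming teacher-generated token targets and a 0/1 mask over noisy SNS positions; all names and shapes here are illustrative, not the paper's design.

```python
import torch
import torch.nn.functional as F

def masked_distillation_loss(student_logits, teacher_token_ids, sns_mask):
    """Generic sequence-level distillation step: the student (e.g. Phi-3-mini)
    is trained on token targets produced by a larger teacher, and a float 0/1
    mask suppresses loss on positions flagged as SNS noise (URLs, @mentions,
    emoji runs). Shapes: student_logits (batch, seq, vocab); others (batch, seq)."""
    per_token = F.cross_entropy(
        student_logits.transpose(1, 2),   # (batch, vocab, seq), as cross_entropy expects
        teacher_token_ids,
        reduction="none",
    )
    masked = per_token * sns_mask
    return masked.sum() / sns_mask.sum().clamp(min=1)   # mean over unmasked tokens
```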
FastDraft: How to Train Your Draft
Zafrir, Ofir, Margulis, Igor, Shteyman, Dorin, Boudoukh, Guy
Speculative Decoding has gained popularity as an effective technique for accelerating the auto-regressive inference process of Large Language Models (LLMs). However, Speculative Decoding relies entirely on the availability of efficient draft models, which are often lacking for many existing language models due to the stringent constraint of vocabulary incompatibility. In this work we introduce FastDraft, a novel and efficient approach for pre-training and aligning a draft model to any large language model by combining efficient pre-training with fine-tuning over synthetic datasets generated by the target model. We demonstrate FastDraft by training two highly parameter-efficient drafts for the popular Phi-3-mini and Llama-3.1-8B models. Using FastDraft, we were able to produce a draft, trained on approximately 10 billion tokens, on a single server with 8 Intel® Gaudi® 2 accelerators in under 24 hours. Our results show that the draft model achieves impressive acceptance rate and block efficiency, with up to a 3x memory-bound speedup on code completion and up to 2x on summarization, text completion and instruction tasks. We validate these findings through benchmarking on the latest Intel® Core™ Ultra, achieving a wall-clock speedup of up to 2x and a significant reduction in runtime. Due to its high quality, FastDraft unlocks large language model inference on AI PCs and other edge devices.
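For context, here is a minimal sketch of the speculative-decoding step that such a draft model accelerates, using greedy verification for brevity and assuming Hugging Face-style models that return a `.logits` tensor; FastDraft itself concerns training the draft, not this loop.

```python
import torch

@torch.no_grad()
def speculative_step(target_model, draft_model, input_ids, k=4):
    """One speculative-decoding step with greedy verification: the small draft
    proposes `k` tokens auto-regressively, the large target scores the whole
    proposal in a single forward pass, and we keep the longest prefix on which
    the two models' greedy choices agree (plus one corrected token)."""
    draft_ids = input_ids
    proposed = []
    for _ in range(k):                              # cheap auto-regressive drafting
        logits = draft_model(draft_ids).logits[:, -1, :]
        nxt = logits.argmax(dim=-1, keepdim=True)
        proposed.append(nxt)
        draft_ids = torch.cat([draft_ids, nxt], dim=-1)

    logits = target_model(draft_ids).logits         # one verification pass
    n_ctx = input_ids.shape[1]
    accepted = input_ids
    for i, nxt in enumerate(proposed):              # accept the agreeing prefix
        target_choice = logits[:, n_ctx + i - 1, :].argmax(dim=-1, keepdim=True)
        accepted = torch.cat([accepted, target_choice], dim=-1)
        if not torch.equal(target_choice, nxt):
            break                                   # first disagreement: keep target's token, stop
    return accepted
```

Higher draft acceptance rates let more of the `k` proposed tokens survive verification per target forward pass, which is where the reported memory-bound speedups come from.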
Beyond Words: Evaluating Large Language Models in Transportation Planning
Ying, Shaowei, Li, Zhenlong, Yu, Manzhu
The resurgence and rapid advancement of Generative Artificial Intelligence (GenAI) in 2023 have catalyzed transformative shifts across numerous industry sectors, including urban transportation and logistics. This study evaluates Large Language Models (LLMs), specifically GPT-4 and Phi-3-mini, for enhancing transportation planning. It assesses the performance and spatial comprehension of these models through a transportation-informed evaluation framework covering general geospatial skills, general transportation-domain skills, and real-world transportation problem-solving. Using a mixed-methods approach, the research evaluates the LLMs' general Geographic Information System (GIS) skills and general transportation-domain knowledge, as well as their ability to support human decision-making in the real-world transportation planning scenario of congestion pricing. Results indicate that GPT-4 demonstrates superior accuracy and reliability across various GIS and transportation-specific tasks compared to Phi-3-mini, highlighting its potential as a robust tool for transportation planners. Nonetheless, Phi-3-mini exhibits competence in specific analytical scenarios, suggesting its utility in resource-constrained environments. The findings underscore the transformative potential of GenAI technologies in urban transportation planning. Future work could explore the application of newer LLMs and the impact of Retrieval-Augmented Generation (RAG) techniques on a broader set of real-world transportation planning and operations challenges, deepening the integration of advanced AI models into transportation management practices.
- North America > United States > Pennsylvania > Centre County > University Park (0.04)
- North America > United States > Florida (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- (5 more...)
- Research Report > Experimental Study (0.92)
- Research Report > New Finding (0.65)
- Transportation > Passenger (1.00)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- (2 more...)
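A small sketch of how per-task results from such an evaluation framework might be aggregated into per-category accuracies; the category labels and record layout are invented for illustration and are not the paper's rubric.

```python
from collections import defaultdict

def score_by_category(results):
    """Aggregate per-task outcomes into skill-category accuracies per model.
    `results` is a list of dicts like
    {"model": "GPT-4", "category": "gis", "correct": True}; the category names
    ("gis", "transport_domain", "real_world") are illustrative labels only."""
    tallies = defaultdict(lambda: [0, 0])          # (correct, total) per (model, category)
    for r in results:
        key = (r["model"], r["category"])
        tallies[key][0] += int(r["correct"])
        tallies[key][1] += 1
    return {key: correct / total for key, (correct, total) in tallies.items()}
```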
Fast Training Dataset Attribution via In-Context Learning
Fotouhi, Milad, Bahadori, Mohammad Taha, Feyisetan, Oluwaseyi, Arabshahi, Payman, Heckerman, David
Training Data Attribution (TDA) refers to the task of quantifying the contributions of different data sources to the outputs of a model (Park et al., 2023; Nguyen et al., 2023). This task is essential for debugging the curation of training corpora and for improving the training of neural networks. Understanding the contribution of data sources allows us to assess the monetary value of proprietary training data, which is crucial for fair compensation and data management (Ghorbani & Zou, 2019; Nohyun et al., 2022). Existing methods for TDA primarily fall into two categories: retraining-based methods and influence-function-based methods, as detailed in recent surveys (Hammoudeh & Lowd, 2024; Worledge et al., 2024). Retraining-based approaches, such as those of Feldman & Zhang (2020) and Ghorbani & Zou (2019), involve retraining the model without the target data source.
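To illustrate the retraining-based category mentioned above, here is a leave-one-source-out sketch; `train_fn` and `eval_fn` are assumed user-supplied callables, and this is the baseline scheme the surveys describe, not the paper's in-context approach.

```python
def loso_attribution(sources, train_fn, eval_fn):
    """Leave-one-source-out attribution, the simplest retraining-based TDA
    scheme: a source's contribution is the drop in held-out utility when the
    model is retrained without it. `train_fn` maps a list of data sources to a
    fitted model; `eval_fn` maps a model to a scalar utility (e.g. validation
    accuracy). Cost scales linearly with the number of sources retrained."""
    baseline = eval_fn(train_fn(sources))
    contributions = {}
    for i in range(len(sources)):
        ablated = sources[:i] + sources[i + 1:]            # drop one source
        contributions[i] = baseline - eval_fn(train_fn(ablated))
    return contributions
```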
Microsoft debuts Phi Silica, AI specifically for Copilot PCs
In 2023, Microsoft was a big believer in large language models running in the cloud. At Microsoft Build, the company launched Phi Silica, a small language model designed to run specifically on the NPUs in new Copilot PCs. In April, Microsoft announced Phi-3-mini, a model small enough to run on a local PC. Phi Silica is a derivative of Phi-3-mini, designed specifically to run on the Copilot PCs that Microsoft announced Monday. Most interactions with AI still take place in the cloud; Microsoft's existing Copilot service, even on your PC, talks to a remote Microsoft server.
Copilot running locally on your PC? Microsoft's new AI could be the key
Many people believe that Microsoft will eventually provide a version of Copilot that will run within Windows right on your PC. Microsoft may have just tipped its hand with a new AI model, or LLM, that is specifically designed to run on local devices. On Monday, Microsoft introduced "Phi-3-mini," a 3.8-billion parameter language model that the company claims rivals the performance of the slightly older ChatGPT 3.5 and Mixtral 8x7B. The paper is titled "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone," which is clear evidence that Microsoft now has an LLM that can run directly on your PC. Microsoft hasn't said that Phi-3-mini will be the next Copilot, running locally on your PC.