AITopics | persyn

persyn

Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation

Zhang, Hengyuan, Yang, Shiping, Liang, Xiao, Shang, Chenming, Jiang, Yuxuan, Tao, Chaofan, Xiong, Jing, So, Hayden Kwok-Hay, Xie, Ruobing, Chang, Angel X., Wong, Ngai

arXiv.org Artificial Intelligence, Oct-14-2025

Training student models on synthetic data generated by strong teacher models is a promising way to distill the capabilities of teachers. However, recent studies show that stronger models are not always optimal teachers, revealing a mismatch between teacher outputs and student learnability. To address this issue, we propose PerSyn (Personalized data Synthesis), a novel synthesis strategy that operates under a new "Route then Generate" paradigm to create data tailored to each student model, enabling it to learn more effectively. Specifically, PerSyn first assigns each prompt to its optimal teacher via a query-level router that jointly considers student learnability and teacher response quality. Each teacher then synthesizes data only for its assigned prompts, making the process more efficient than the conventional "Generate then Select" paradigm, where all teachers must generate parallel responses for the entire prompt set before constructing the final dataset. Extensive experiments across different model families and scales demonstrate that PerSyn consistently achieves superior or comparable performance to all baselines in instruction tuning and math reasoning settings. Further analysis verifies the effectiveness of PerSyn and offers extra insights to propel future research.
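The "Route then Generate" flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `route_then_generate`, `toy_router`, and `make_teacher` are hypothetical, the teachers are stub functions rather than language models, and the router score (a product of made-up learnability and quality numbers) is only an assumption, since the abstract does not say how the two signals are combined.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Teacher = Callable[[str], str]        # hypothetical: maps a prompt to a response
Router = Callable[[str, str], float]  # hypothetical: scores a (prompt, teacher name) pair


def route_then_generate(prompts: List[str],
                        teachers: Dict[str, Teacher],
                        router: Router) -> List[dict]:
    """Assign each prompt to its best-scoring teacher, then generate once per prompt."""
    # Step 1 (route): pick exactly one teacher for every prompt.
    assignments = defaultdict(list)
    for prompt in prompts:
        best = max(teachers, key=lambda name: router(prompt, name))
        assignments[best].append(prompt)

    # Step 2 (generate): each teacher answers only the prompts routed to it.
    dataset = []
    for name, assigned in assignments.items():
        for prompt in assigned:
            dataset.append({"prompt": prompt,
                            "teacher": name,
                            "response": teachers[name](prompt)})
    return dataset


call_log = []  # records every teacher call so the generation cost is visible


def make_teacher(name: str) -> Teacher:
    """Build a stub teacher that logs its calls (stand-in for a real model)."""
    def generate(prompt: str) -> str:
        call_log.append((name, prompt))
        return f"[{name}] answer to: {prompt}"
    return generate


def toy_router(prompt: str, name: str) -> float:
    """Assumed scoring rule: made-up response quality times made-up learnability."""
    quality = {"large": 0.9, "small": 0.6}[name]
    if name == "large":
        # Pretend long prompts yield responses the student struggles to learn from.
        learnability = 0.5 if len(prompt) > 10 else 0.9
    else:
        learnability = 0.8
    return quality * learnability


teachers = {"large": make_teacher("large"), "small": make_teacher("small")}
prompts = ["2+2?", "Prove the lemma."]
dataset = route_then_generate(prompts, teachers, toy_router)
```

In this toy run each prompt triggers a single teacher call, whereas a "Generate then Select" pipeline with the same two teachers would call both teachers on both prompts before choosing.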

large language model, machine learning, teacher model

arXiv.org Artificial Intelligence

2510.10925

Country:

Asia (0.93)
North America > United States > Maryland (0.28)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.88)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

GoSGD: Distributed Optimization for Deep Learning with Gossip Exchange

Blot, Michael, Picard, David, Cord, Matthieu

arXiv.org Machine Learning, Apr-4-2018

With deep convolutional neural networks (CNNs) introduced by [1] and [2], computer vision tasks, and more specifically image classification, have improved greatly in the years following [3]. CNN performance benefits considerably from large collections of annotated images such as [4] or [5]. CNNs are trained by optimizing a loss function with gradient descent computed on random mini-batches, following [6]. This method, called stochastic gradient descent (SGD), has proved very efficient for training neural networks in general. However, current CNN structures are extremely deep, like the 100-layer ResNet of [7], and contain many parameters (around 60M for AlexNet [3] and 130M for VGG [8]). These structures involve heavy gradient computation, which makes training on large datasets very slow. Computation on GPUs accelerates training but requires large local memory caches. Nevertheless, mini-batch optimization seems well suited to distributing training over several threads. Many methods have been proposed, such as [9, 10], which distribute the batches over different threads called workers that periodically exchange information via a central thread to synchronize their models.
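The centrally synchronized scheme mentioned at the end of this excerpt (workers running mini-batch SGD and periodically synchronizing through a central thread) can be sketched as follows. This is a toy illustration under stated assumptions, not the GoSGD gossip protocol itself and not the method of the cited works: workers are simulated sequentially rather than as real threads, the model is a single weight fitted to noiseless data `y = 3x`, synchronization is plain parameter averaging, and all names (`sgd_step`, `train_with_central_sync`) and hyperparameters are hypothetical.

```python
import random


def sgd_step(w: float, batch, lr: float) -> float:
    """One SGD update for the model y ≈ w * x under mean squared error."""
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


def train_with_central_sync(data, n_workers=4, steps=300, sync_every=10,
                            batch_size=8, lr=0.05, seed=0) -> float:
    """Simulate workers doing local SGD with periodic averaging at a central node."""
    rng = random.Random(seed)
    weights = [0.0] * n_workers  # one local model copy per worker

    for step in range(1, steps + 1):
        # Each worker draws its own random mini-batch and updates its local copy.
        for k in range(n_workers):
            batch = rng.sample(data, batch_size)
            weights[k] = sgd_step(weights[k], batch, lr)

        # Assumed synchronization rule: the central node averages all local
        # models and sends the average back to every worker.
        if step % sync_every == 0:
            mean_w = sum(weights) / n_workers
            weights = [mean_w] * n_workers

    return sum(weights) / n_workers


# Toy dataset: 64 noiseless points on the line y = 3x.
data_rng = random.Random(1)
data = [(x, 3.0 * x) for x in (data_rng.uniform(-1.0, 1.0) for _ in range(64))]

w_final = train_with_central_sync(data)
```

Raising `sync_every` reduces how often workers communicate with the central node, at the price of letting their local models drift further apart between synchronizations.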

communication, gosgd, persyn

arXiv.org Machine Learning

1804.01852

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > France > Île-de-France > Yvelines > Cergy-Pontoise (0.04)
Europe > France > Île-de-France > Val-d'Oise > Cergy-Pontoise (0.04)

Genre: Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.92)
