Energy-Efficient Vision Transformer Inference for Edge-AI Deployment

Amanzhol, Nursultan, Park, Jurn-Gyu

arXiv.org Artificial Intelligence

Abstract--The growing deployment of Vision Transformers (ViTs) on energy-constrained devices requires evaluation methods that go beyond accuracy alone. We present a two-stage pipeline for assessing ViT energy efficiency that combines device-agnostic model selection with device-related measurements. The device-agnostic stage uses the NetScore metric for screening; the device-related stage ranks models with the Sustainable Accuracy Metric (SAM). Results show that hybrid models such as LeViT_Conv_192 reduce energy by up to 53% on TX2 relative to a ViT baseline (e.g., SAM5=1.44 on TX2/CIFAR-10), while distilled models such as TinyViT-11M_Distilled excel on the mobile GPU (e.g., SAM5=1.72 on RTX 3050/CIFAR-10 and SAM5=0.76 on RTX 3050/ImageNet-1K). Recently, Vision Transformers (ViTs) have emerged as the state of the art in many computer vision tasks, from image classification to object detection [1].
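The abstract does not give SAM's formula, but the device-agnostic screening stage it names, NetScore (Wong, 2018), has a published closed form: a log-scaled ratio of accuracy against parameter count and compute. A minimal sketch of that screening score, with the standard default exponents, is:

```python
import math

def netscore(acc_percent, params_m, macs_m, kappa=2.0, beta=0.5, gamma=0.5):
    """NetScore (Wong, 2018): Omega = 20*log10(a^kappa / (p^beta * m^gamma)).

    acc_percent: top-1 accuracy in percent
    params_m:    parameter count in millions
    macs_m:      multiply-accumulate operations in millions
    Higher is better; default exponents weight accuracy quadratically.
    """
    return 20.0 * math.log10(acc_percent**kappa / (params_m**beta * macs_m**gamma))
```

For example, a model at 80% accuracy with 5M parameters and 500M MACs screens far above an equally accurate model with 10x the parameters and compute, which is the kind of ranking the pipeline's first stage performs before any on-device energy measurement.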


MinorBench: A hand-built benchmark for content-based risks for children

Khoo, Shaun, Chua, Gabriel, Shong, Rachel

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are rapidly entering children's lives -- through parent-driven adoption, schools, and peer networks -- yet current AI ethics and safety research does not adequately address content-related risks specific to minors. In this paper, we highlight these gaps with a real-world case study of an LLM-based chatbot deployed in a middle school setting, revealing how students used and sometimes misused the system. We evaluate six prominent LLMs under different system prompts, demonstrating substantial variability in their child-safety compliance. Our results inform practical steps for more robust, child-focused safety mechanisms and underscore the urgency of tailoring AI systems to safeguard young users. Large Language Models (LLMs) have seen rapid adoption in educational settings, with both teachers and students recognizing their potential for personalized feedback and instant instructional support. Recent surveys indicate that over half of K-12 teachers in some regions now use LLMs for lesson planning, grading assistance, or creative class activities, while approximately one-third of students--some as young as 12--have experimented with such models for schoolwork (Common Sense Media, 2024). However, the emergence of LLMs in schools raises concerns about children's vulnerability. Children are still developing critical thinking skills, often place higher trust in authoritative-sounding answers, and may not fully understand an AI's limitations.


BabyLlama-2: Ensemble-Distilled Models Consistently Outperform Teachers With Limited Data

Tastet, Jean-Loup, Timiryasov, Inar

arXiv.org Artificial Intelligence

We present BabyLlama-2, a 345 million parameter model distillation-pretrained from two teachers on a 10 million word corpus for the BabyLM competition. On BLiMP and SuperGLUE benchmarks, BabyLlama-2 outperforms baselines trained on both 10 and 100 million word datasets with the same data mix, as well as its teacher models. Through an extensive hyperparameter sweep, we demonstrate that the advantages of distillation cannot be attributed to suboptimal hyperparameter selection of the teachers. Our findings underscore the need for further investigation into distillation techniques, particularly in data-limited settings.
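The abstract does not spell out the two-teacher distillation objective, but the standard recipe for ensemble distillation is to soften each teacher's logits with a temperature, average their distributions, and train the student with cross-entropy against that average. A minimal numpy sketch of that loss (the actual BabyLlama-2 objective may weight teachers or mix in a hard-label term differently) is:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_distill_loss(student_logits, teacher_logits_list, T=2.0):
    """Cross-entropy of the student against the averaged teacher distribution.

    Scaled by T^2, as is conventional in knowledge distillation, so the
    gradient magnitude is roughly temperature-independent.
    """
    teacher_probs = np.mean([softmax(t, T) for t in teacher_logits_list], axis=0)
    student_log_probs = np.log(softmax(student_logits, T))
    batch = student_logits.shape[0]
    return -float(np.sum(teacher_probs * student_log_probs)) * T * T / batch
```

The loss is minimized when the student's softened distribution matches the teachers' average, which is how a student trained this way can inherit, and in data-limited settings exceed, the teachers' performance.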


STRUM-LLM: Attributed and Structured Contrastive Summarization

Gunel, Beliz, Wendt, James B., Xie, Jing, Zhou, Yichao, Vo, Nguyen, Fisher, Zachary, Tata, Sandeep

arXiv.org Artificial Intelligence

Users often struggle with decision-making between two options (A vs B), as it usually requires time-consuming research across multiple web pages. We propose STRUM-LLM, which addresses this challenge by generating attributed, structured, and helpful contrastive summaries that highlight key differences between the two options. STRUM-LLM identifies helpful contrast: the specific attributes along which the two options differ significantly and which are most likely to influence the user's decision. Our technique is domain-agnostic, and does not require any human-labeled data or a fixed attribute list as supervision. STRUM-LLM attributes all extractions back to the input sources along with textual evidence, and it has no limit on the length of the input sources it can process. STRUM-LLM Distilled has 100x the throughput of models with comparable performance while being 10x smaller. In this paper, we provide extensive evaluations of our method and lay out future directions for our currently deployed system.
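The core "helpful contrast" idea, once attributes and values have been extracted for each option, reduces to keeping the attributes both options share but disagree on. A toy sketch of that final selection step (the function name and dict-based representation are illustrative, not STRUM-LLM's actual interface) is:

```python
def contrastive_attributes(option_a, option_b):
    """Keep attributes present in both options whose values differ.

    option_a, option_b: dicts mapping attribute name -> extracted value.
    Returns a dict of contrastive attributes with the (A, B) value pair.
    """
    shared = option_a.keys() & option_b.keys()
    return {attr: (option_a[attr], option_b[attr])
            for attr in shared
            if option_a[attr] != option_b[attr]}
```

Attributes unique to one option, or identical across both, are dropped: they do not help a user choose between A and B.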


CrysGNN : Distilling pre-trained knowledge to enhance property prediction for crystalline materials

Das, Kishalay, Samanta, Bidisha, Goyal, Pawan, Lee, Seung-Cheol, Bhattacharjee, Satadeep, Ganguly, Niloy

arXiv.org Artificial Intelligence

In recent years, graph neural network (GNN) based approaches have emerged as a powerful technique for encoding the complex topological structure of crystal materials in an enriched representation space. These models are typically supervised: using property-specific training data, they learn the relationship between crystal structure and properties such as formation energy, bandgap, and bulk modulus. Most of these methods require a huge amount of property-tagged data to train, which may not be available for every property. However, a huge amount of crystal data with chemical composition and structural bonds is available. To leverage these untapped data, this paper presents CrysGNN, a new pre-trained GNN framework for crystalline materials that captures both node- and graph-level structural information of crystal graphs using a large amount of unlabelled material data. Further, we extract distilled knowledge from CrysGNN and inject it into different state-of-the-art property predictors to enhance their property prediction accuracy. We conduct extensive experiments showing that, with distilled knowledge from the pre-trained model, all the SOTA algorithms outperform their vanilla versions by good margins. We also observe that the distillation process provides a significant improvement over the conventional approach of fine-tuning the pre-trained model. We have released the pre-trained model along with the carefully curated dataset of 800K crystal graphs, so that the pre-trained model can be plugged into any existing or upcoming model to enhance its prediction accuracy.
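A common way to "inject" distilled knowledge from a pre-trained network into a supervised property predictor is feature-level distillation: add a term to the task loss that pulls the predictor's internal embeddings toward the frozen pre-trained embeddings. The sketch below shows that combined objective in numpy; the weighting and which layer's embeddings are matched are assumptions here, not CrysGNN's exact recipe.

```python
import numpy as np

def distilled_property_loss(pred, target, student_emb, teacher_emb, lam=0.1):
    """Task loss plus a feature-distillation penalty.

    pred, target:   predicted and true property values (regression)
    student_emb:    the property predictor's crystal-graph embeddings
    teacher_emb:    frozen embeddings from the pre-trained model
    lam:            distillation weight (hyperparameter)
    """
    task_loss = float(np.mean((pred - target) ** 2))            # MSE on the property
    distill_loss = float(np.mean((student_emb - teacher_emb) ** 2))  # match pre-trained features
    return task_loss + lam * distill_loss
```

With lam=0 this reduces to plain supervised training; a small positive lam lets the unlabelled-data knowledge regularize the predictor, which is the effect the paper reports over both vanilla training and fine-tuning.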


Announcing a New Science Magazine from Yale - Facts So Romantic

Nautilus

Open any newspaper, on-screen or off, and you'll find that scientific controversy underlies many of the day's most hotly debated issues. The arguments surrounding genetically modified organisms, the threat of artificial intelligence to human existence, and stem cell research are exemplary. Science, a domain that we might naively expect to provide objective knowledge and definitive answers, has always been and will remain forever contested. What is the non-expert--that is, most of us--to do? For most issues, interpreting research findings or parsing the academic debate is infeasible.