AITopics

Motivated by the large progress made by large language models (LLMs), we introduce the framework of verbalized machine learning (VML). In contrast to conventional machine learning models that are typically optimized over a continuous parameter space, VML constrains the parameter space to be human-interpretable natural language. Such a constraint leads to a new perspective of function approximation, where an LLM with a text prompt can be viewed as a function parameterized by the text prompt. Guided by this perspective, we revisit classical machine learning problems, such as regression and classification, and find that these problems can be solved by an LLM-parameterized learner and optimizer. The major advantages of VML include (1) easy encoding of inductive bias: prior knowledge about the problem and hypothesis class can be encoded in natural language and fed into the LLM-parameterized learner; (2) automatic model class selection: the optimizer can automatically select a concrete model class based on data and verbalized prior knowledge, and it can update the model class during training; and (3) interpretable learner updates: the LLM-parameterized optimizer can provide explanations for why each learner update is performed. We conduct several studies to empirically evaluate the effectiveness of VML, and hope that VML can serve as a stepping stone to stronger interpretability and trustworthiness in ML.

class label, model description, pattern description

2406.04344

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Workflow (0.70)
Research Report (0.49)
Instructional Material (0.46)

Industry:

Health & Medicine > Therapeutic Area (0.71)
Health & Medicine > Diagnostic Medicine (0.68)
Education (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Sparsity-Accelerated Training for Large Language Models

Ma, Da, Chen, Lu, Wang, Pengyu, Xu, Hongshen, Li, Hanqi, Sun, Liangtai, Zhu, Su, Fan, Shuai, Yu, Kai

Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs associated with this, primarily due to their large parameter count, remain high. This paper proposes leveraging sparsity in pre-trained LLMs to expedite this training process. By observing sparsity in activated neurons during forward iterations, we identify the potential for computational speed-ups by excluding inactive neurons. We address associated challenges by extending existing neuron importance evaluation metrics and introducing a ladder omission rate scheduler. Our experiments on Llama-2 demonstrate that Sparsity-Accelerated Training (SAT) achieves comparable or superior performance to standard training while significantly accelerating the process. Specifically, SAT achieves a 45% throughput improvement in continual pre-training and saves 38% training time in supervised fine-tuning in practice. It offers a simple, hardware-agnostic, and easily deployable framework for additional LLM training. Our code is available at https://github.com/OpenDFM/SAT.

fine-tuning, language model, neuron

2406.01392

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre:

Instructional Material (0.61)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Yang, Qianlan, Wang, Yu-Xiong

ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories

Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL), due to low data efficiency. Prior work overcomes this challenge by extracting useful knowledge from offline data, often accomplished through the learning of action distribution from offline data and utilizing the learned distribution to facilitate online RL. However, since the offline data are given and fixed, the extracted knowledge is inherently limited, making it difficult to generalize to new tasks. We propose a novel approach that leverages offline data to learn a generative diffusion model, termed Adaptive Trajectory Diffuser (ATraDiff). This model generates synthetic trajectories, serving as a form of data augmentation and consequently enhancing the performance of online RL methods. The key strength of our diffuser lies in its adaptability, allowing it to effectively handle varying trajectory lengths and mitigate distribution shifts between online and offline data. Because of its simplicity, ATraDiff seamlessly integrates with a wide spectrum of RL methods. Empirical evaluation shows that ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings. Our code and demo video are available at https://atradiff.github.io.

atradiff, diffusion model, trajectory

2406.04323

Country:

North America > United States > Illinois > Champaign County > Urbana (0.14)
Europe > Austria > Vienna (0.14)

Genre:

Research Report > New Finding (0.93)
Instructional Material > Online (0.62)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Do, Tam Thuc, Eftekhar, Parham, Hosseini, Seyed Alireza, Cheung, Gene, Chou, Philip

Interpretable Lightweight Transformer via Unrolling of Learned Graph Smoothness Priors

We build interpretable and lightweight transformer-like neural networks by unrolling iterative optimization algorithms that minimize graph smoothness priors -- the quadratic graph Laplacian regularizer (GLR) and the $\ell_1$-norm graph total variation (GTV) -- subject to an interpolation constraint. The crucial insight is that a normalized signal-dependent graph learning module amounts to a variant of the basic self-attention mechanism in conventional transformers. Unlike "black-box" transformers that require learning of large key, query and value matrices to compute scaled dot products as affinities and subsequent output embeddings, resulting in huge parameter sets, our unrolled networks employ shallow CNNs to learn low-dimensional features per node to establish pairwise Mahalanobis distances and construct sparse similarity graphs. At each layer, given a learned graph, the target interpolated signal is simply a low-pass filtered output derived from the minimization of an assumed graph smoothness prior, leading to a dramatic reduction in parameter count. Experiments for two image interpolation applications verify the restoration performance, parameter efficiency and robustness to covariate shift of our graph-based unrolled networks compared to conventional transformers.
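As a rough illustration of the graph Laplacian regularizer (GLR) mentioned in the abstract above, the following sketch computes the closed-form minimizer of x^T L x with the known samples held fixed. It is not the authors' unrolled network: the abstract describes unrolling an iterative solver over a learned graph, whereas this example assumes a fixed, hand-written weight matrix and solves the problem directly. The function name `glr_interpolate` and the toy path graph are hypothetical and chosen only for illustration.

```python
import numpy as np

def glr_interpolate(W, known_idx, known_vals):
    """Fill in missing node values by minimizing x^T L x with known samples fixed.

    Assumes W is a symmetric, non-negative weight matrix and that every
    connected component of the graph contains at least one known node,
    so that the sub-matrix L_uu is invertible.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W  # combinatorial graph Laplacian L = D - W

    known_idx = np.asarray(known_idx, dtype=int)
    known_vals = np.asarray(known_vals, dtype=float)
    unknown_idx = np.setdiff1d(np.arange(n), known_idx)

    L_uu = L[np.ix_(unknown_idx, unknown_idx)]
    L_uk = L[np.ix_(unknown_idx, known_idx)]

    x = np.empty(n)
    x[known_idx] = known_vals
    # Setting the gradient w.r.t. the unknown entries to zero gives
    # L_uu x_u + L_uk x_k = 0, a linear system in x_u.
    x[unknown_idx] = np.linalg.solve(L_uu, -L_uk @ known_vals)
    return x

# Toy example: a 4-node path graph with unit weights; the endpoints are known.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

x_hat = glr_interpolate(W, known_idx=[0, 3], known_vals=[0.0, 3.0])
print(x_hat)  # the two interior nodes come out as 1.0 and 2.0
```

On the path graph the result is a linear ramp between the two known endpoints, which matches the abstract's description of the interpolated signal as a low-pass filtered output: the smoothness prior penalizes differences across edges, so the missing values settle at weighted averages of their neighbors.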

algorithm, graph, matrix

2406.0409

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Asia > India (0.04)

Genre:

Research Report (0.50)
Instructional Material > Course Syllabus & Notes (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Representational Alignment Supports Effective Machine Teaching

Sucholutsky, Ilia, Collins, Katherine M., Malaviya, Maya, Jacoby, Nori, Liu, Weiyang, Sumers, Theodore R., Korakakis, Michalis, Bhatt, Umang, Ho, Mark, Tenenbaum, Joshua B., Love, Brad, Pardos, Zachary A., Weller, Adrian, Griffiths, Thomas L.

A good teacher should not only be knowledgeable, but should also be able to communicate in a way that the student understands -- to share the student's representation of the world. In this work, we integrate insights from machine teaching and pragmatic communication with the burgeoning literature on representational alignment to characterize a utility curve defining a relationship between representational alignment and teacher capability for promoting student learning. To explore the characteristics of this utility curve, we design a supervised learning environment that disentangles representational alignment from teacher accuracy. We conduct extensive computational experiments with machines teaching machines, complemented by a series of experiments in which machines teach humans. Drawing on our findings that improved representational alignment with a student improves student learning outcomes (i.e., task accuracy), we design a classroom matching procedure that assigns students to teachers based on the utility curve. If we are to design effective machine teachers, it is not enough to build teachers that are accurate -- we want teachers that can align, representationally, to their students too.

accuracy, representational alignment, student

2406.04302

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Oklahoma > Payne County > Cushing (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Instructional Material (1.00)
Research Report > Experimental Study (0.68)
Research Report > New Finding (0.67)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)

Thakar, Pooja, Mehta, Anil, Manisha

Unified Prediction Model for Employability in Indian Higher Education System

Educational Data Mining has become extremely popular among researchers in the last decade. Prior efforts in this area were directed only towards predicting a student's academic performance. Very few studies address predicting a student's employability, i.e., predicting students' performance in campus placements at an early stage of enrollment. Furthermore, existing research on student employability prediction is not universal in approach and is based on only one type of course or one university or institute. Hence, it does not scale from one context to another. To address this need for unification, data on students in professional technical courses, namely Bachelor in Engineering/Technology and Masters in Computer Applications, have been collected from 17 states of India. To deal with such data, a unified predictive model has been developed and applied to the datasets from the 17 states. The research done in this paper proves that the model has universal application and can be applied to various states and institutes across India with different cultural backgrounds and course structures. This paper also explores and proves statistically that there is no significant difference in the Indian education system across states as far as prediction of student employability is concerned. The model provides a generalized solution for student employability prediction in the Indian scenario.

artificial intelligence, machine learning, modeling & simulation

2407.17591

Country:

Asia > India > Chhattisgarh (0.05)
Asia > India > West Bengal (0.05)
Asia > India > Uttarakhand (0.05)

Genre:

Research Report > New Finding (0.68)
Instructional Material > Course Syllabus & Notes (0.55)
Research Report > Experimental Study (0.50)

Industry:

Education > Educational Setting > Higher Education (0.42)
Education > Assessment & Standards (0.35)

Technology:

Information Technology > Modeling & Simulation (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.48)

Thakar, Pooja, Mehta, Anil, Manisha

Cluster Model for parsimonious selection of variables and enhancing Students Employability Prediction

Educational Data Mining (EDM) is a promising field in which data mining is widely used to predict students' performance. One of the most prevalent recent challenges facing higher education is making students skillfully employable. Institutions possess large volumes of data, yet they are unable to extract knowledge from it and guide their students. Data in education is generally very large, multidimensional and unbalanced in nature. Extracting knowledge from such data has its own set of problems and is a very complicated task. In this paper, data on Engineering and MCA (Masters in Computer Applications) students is collected from various universities and institutes across India. The dataset is large, unbalanced and multidimensional in nature. A cluster-based model is presented in this paper which, when applied at the preprocessing stage, helps in the parsimonious selection of variables and improves the performance of predictive algorithms, and hence facilitates better prediction of students' employability.

algorithm, predictive algorithm, student

2407.16884

Country:

Asia > India > Rajasthan > Jaipur (0.05)
Asia > Malaysia (0.04)
Asia > Singapore (0.04)

Genre:

Research Report > New Finding (0.94)
Instructional Material (0.93)

Industry: Education > Educational Setting > Higher Education (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Code Comparison Tuning for Code Large Language Models

Jiang, Yufan, He, Qiaozhi, Zhuang, Xiaomin, Wu, Zhihua

We present Code Comparison Tuning (CCT), a simple and effective tuning method for code large language models (Code LLMs) to better handle subtle code errors. Specifically, we integrate the concept of comparison into instruction tuning, both at the token and sequence levels, enabling the model to discern even the slightest deviations in code. To compare the original code with an erroneous version containing manually added code errors, we use token-level preference loss for detailed token-level comparisons. Additionally, we combine code segments to create a new instruction tuning sample for sequence-level comparisons, enhancing the model's bug-fixing capability. Experimental results on the HumanEvalFix benchmark show that CCT surpasses instruction tuning in pass@1 scores by up to 4 points across diverse code LLMs, and extensive analysis demonstrates the effectiveness of our method.

arxiv preprint arxiv, instruction, language model

2403.19121

Genre:

Research Report (0.50)
Instructional Material (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Zeinalipour, Kamyar, Keptiğ, Yusuf Gökberk, Maggini, Marco, Gori, Marco

Automating Turkish Educational Quiz Generation Using Large Language Models

Crafting quizzes from educational content is a pivotal activity that benefits both teachers and students by reinforcing learning and evaluating understanding. In this study, we introduce a novel approach to generate quizzes from Turkish educational texts, marking a pioneering endeavor in educational technology specifically tailored to the Turkish educational context. We present a specialized dataset, named the Turkish-Quiz-Instruct, comprising an extensive collection of Turkish educational texts accompanied by multiple-choice and short-answer quizzes. This research leverages the capabilities of Large Language Models (LLMs), including GPT-4-Turbo, GPT-3.5-Turbo, Llama-2-7b-chat-hf, and Llama-2-13b-chat-hf, to automatically generate quiz questions and answers from the Turkish educational content. Our work delineates the methodology for employing these LLMs in the context of Turkish educational material, thereby opening new avenues for automated Turkish quiz generation. The study not only demonstrates the efficacy of using such models for generating coherent and relevant quiz content but also sets a precedent for future research in the domain of automated educational content creation for languages other than English. The Turkish-Quiz-Instruct dataset is introduced as a valuable resource for researchers and practitioners aiming to explore the boundaries of educational technology and language-specific applications of LLMs in Turkish. By addressing the challenges of quiz generation in a non-English context, specifically Turkish, this study contributes significantly to the field of Turkish educational technology, providing insights into the potential of leveraging LLMs for educational purposes across diverse linguistic landscapes.

arxiv preprint arxiv, dataset, question generation

2406.03397

Country:

Europe > Italy (0.04)
Asia > China > Liaoning Province > Dalian (0.04)

Genre:

Instructional Material (1.00)
Research Report > New Finding (0.67)

Industry:

Education > Educational Technology (0.89)
Education > Educational Setting (0.68)
Education > Assessment & Standards > Student Performance (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

BIPED: Pedagogically Informed Tutoring System for ESL Education

Kwon, Soonwoo, Kim, Sojung, Park, Minju, Lee, Seunghyun, Kim, Kyuseok

As Large Language Models (LLMs) such as GPT (Achiam et al., 2023) revolutionize the field of natural language generation, both researchers and practitioners have put an increasing amount of effort into developing Conversational Intelligent Tutoring Systems (CITS) that leverage the generative capabilities of LLMs (Tack and Piech, 2022; Abdelghani et al., 2022; Park et al., 2024; Lee et al., 2023). Specifically, LLMs have the potential to teach English as a Second/Foreign Language (ESL/EFL), for they may serve as readily available tutors that can emulate native-speaking contexts (Park et al., 2024; Lee et al., 2023).

Thereafter, we analyzed the dataset post-hoc from a pedagogical viewpoint and developed a categorization of dialogue acts, which comprises 34 tutor acts and 9 student acts. Finally, we annotated the data using the defined dialogue act categories. As for the development of CITS, we employ the framework (Macina et al., 2023b; Wang et al., 2023a) whereby the LLM first chooses the suitable tutor act, then generates the corresponding utterance. We believe this approach enables the model to generate a more focused response that does not deviate from the chosen tutor intent. We consider two implementations of such CITS, one ...

eng, student, tutor

2406.03486

Country:

Asia > Middle East > Republic of Türkiye (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
Oceania > Australia (0.04)

Genre:

Research Report (1.00)
Personal > Interview (1.00)
Instructional Material (0.93)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)