Goto

Collaborating Authors

 Instructional Material


How to Do Machine Learning with Small Data? -- A Review from an Industrial Perspective

arXiv.org Artificial Intelligence

Artificial intelligence experienced a technological breakthrough in science, industry, and everyday life in the recent few decades. The advancements can be credited to the ever-increasing availability and miniaturization of computational resources that resulted in exponential data growth. However, because of the insufficient amount of data in some cases, employing machine learning in solving complex tasks is not straightforward or even possible. As a result, machine learning with small data experiences rising importance in data science and application in several fields. The authors focus on interpreting the general term of "small data" and their engineering and industrial application role. They give a brief overview of the most important industrial applications of machine learning and small data. Small data is defined in terms of various characteristics compared to big data, and a machine learning formalism was introduced. Five critical challenges of machine learning with small data in industrial applications are presented: unlabeled data, imbalanced data, missing data, insufficient data, and rare events. Based on those definitions, an overview of the considerations in domain representation and data acquisition is given along with a taxonomy of machine learning approaches in the context of small data.


Knowledge Tracing Challenge: Optimal Activity Sequencing for Students

arXiv.org Artificial Intelligence

Knowledge tracing is a method used in education to assess and track the acquisition of knowledge by individual learners. It involves using a variety of techniques, such as quizzes, tests, and other forms of assessment, to determine what a learner knows and does not know about a particular subject. The goal of knowledge tracing is to identify gaps in understanding and provide targeted instruction to help learners improve their understanding and retention of material. This can be particularly useful in situations where learners are working at their own pace, such as in online learning environments. By providing regular feedback and adjusting instruction based on individual needs, knowledge tracing can help learners make more efficient progress and achieve better outcomes. Effectively solving the KT problem would unlock the potential of computer-aided education applications such as intelligent tutoring systems, curriculum learning, and learning materials recommendations. In this paper, we will present the results of the implementation of two Knowledge Tracing algorithms on a newly released dataset as part of the AAAI2023 Global Knowledge Tracing Challenge.


Faster Algorithms for Structured Linear and Kernel Support Vector Machines

arXiv.org Machine Learning

Quadratic programming is a ubiquitous prototype in convex programming. Many combinatorial optimizations on graphs and machine learning problems can be formulated as quadratic programming; for example, Support Vector Machines (SVMs). Linear and kernel SVMs have been among the most popular models in machine learning over the past three decades, prior to the deep learning era. Generally, a quadratic program has an input size of $\Theta(n^2)$, where $n$ is the number of variables. Assuming the Strong Exponential Time Hypothesis ($\textsf{SETH}$), it is known that no $O(n^{2-o(1)})$ algorithm exists (Backurs, Indyk, and Schmidt, NIPS'17). However, problems such as SVMs usually feature much smaller input sizes: one is given $n$ data points, each of dimension $d$, with $d \ll n$. Furthermore, SVMs are variants with only $O(1)$ linear constraints. This suggests that faster algorithms are feasible, provided the program exhibits certain underlying structures. In this work, we design the first nearly-linear time algorithm for solving quadratic programs whenever the quadratic objective has small treewidth or admits a low-rank factorization, and the number of linear constraints is small. Consequently, we obtain a variety of results for SVMs: * For linear SVM, where the quadratic constraint matrix has treewidth $\tau$, we can solve the corresponding program in time $\widetilde O(n\tau^{(\omega+1)/2}\log(1/\epsilon))$; * For linear SVM, where the quadratic constraint matrix admits a low-rank factorization of rank-$k$, we can solve the corresponding program in time $\widetilde O(nk^{(\omega+1)/2}\log(1/\epsilon))$; * For Gaussian kernel SVM, where the data dimension $d = \Theta(\log n)$ and the squared dataset radius is small, we can solve it in time $O(n^{1+o(1)}\log(1/\epsilon))$. We also prove that when the squared dataset radius is large, then $\Omega(n^{2-o(1)})$ time is required.


PAC-Bayesian Generalization Bounds for Adversarial Generative Models

arXiv.org Machine Learning

Moreover, models and develop generalization bounds for having generalization bounds not only contributes to the theoretical models based on the Wasserstein distance and understanding of GANs themselves, but also to the the total variation distance. Our first result on understanding of the structure of real-life datasets, if those the Wasserstein distance assumes the instance can be provably approximated by GAN-generated data. In space is bounded, while our second result takes addition, given that GANs are used for data-augmentation advantage of dimensionality reduction. Our results in fields such as medical image classification (see e.g. Frid-naturally apply to Wasserstein GANs and Adar et al., 2018), theoretical guarantees can substantiate Energy-Based GANs, and our bounds provide the soundness of such applications.


Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant

arXiv.org Machine Learning

This article provides a comprehensive understanding of optimization in deep learning, with a primary focus on the challenges of gradient vanishing and gradient exploding, which normally lead to diminished model representational ability and training instability, respectively. We analyze these two challenges through several strategic measures, including the improvement of gradient flow and the imposition of constraints on a network's Lipschitz constant. To help understand the current optimization methodologies, we categorize them into two classes: explicit optimization and implicit optimization. Explicit optimization methods involve direct manipulation of optimizer parameters, including weight, gradient, learning rate, and weight decay. Implicit optimization methods, by contrast, focus on improving the overall landscape of a network by enhancing its modules, such as residual shortcuts, normalization methods, attention mechanisms, and activations. In this article, we provide an in-depth analysis of these two optimization classes and undertake a thorough examination of the Jacobian matrices and the Lipschitz constants of many widely used deep learning modules, highlighting existing issues as well as potential improvements. Moreover, we also conduct a series of analytical experiments to substantiate our theoretical discussions. This article does not aim to propose a new optimizer or network. Rather, our intention is to present a comprehensive understanding of optimization in deep learning. We hope that this article will assist readers in gaining a deeper insight in this field and encourages the development of more robust, efficient, and high-performing models.


Semantics-Empowered Communication: A Tutorial-cum-Survey

arXiv.org Artificial Intelligence

Along with the springing up of the semantics-empowered communication (SemCom) research, it is now witnessing an unprecedentedly growing interest towards a wide range of aspects (e.g., theories, applications, metrics and implementations) in both academia and industry. In this work, we primarily aim to provide a comprehensive survey on both the background and research taxonomy, as well as a detailed technical tutorial. Specifically, we start by reviewing the literature and answering the "what" and "why" questions in semantic transmissions. Afterwards, we present the ecosystems of SemCom, including history, theories, metrics, datasets and toolkits, on top of which the taxonomy for research directions is presented. Furthermore, we propose to categorize the critical enabling techniques by explicit and implicit reasoning-based methods, and elaborate on how they evolve and contribute to modern content & channel semantics-empowered communications. Besides reviewing and summarizing the latest efforts in SemCom, we discuss the relations with other communication levels (e.g., conventional communications) from a holistic and unified viewpoint. Subsequently, in order to facilitate future developments and industrial applications, we also highlight advanced practical techniques for boosting semantic accuracy, robustness, and large-scale scalability, just to mention a few. Finally, we discuss the technical challenges that shed light on future research opportunities.


ChatGPT as Co-Advisor in Scientific Initiation: Action Research with Project-Based Learning in Elementary Education

arXiv.org Artificial Intelligence

Background: In the contemporary educational landscape, technology has the power to drive innovative pedagogical practices. Overcoming the resistance of teachers and students to adopting new methods and technologies is a challenge that needs to be addressed. Objectives: To evaluate the effectiveness of ChatGPT as a co-advisor in research projects and its influence on the implementation of Project-Based Learning (PBL), as well as overcoming resistance to the use of new pedagogical methodologies. Design: An action-research methodology was employed, including unstructured interviews and the application of questionnaires via Google Forms. Setting and Participants: The research was conducted in an elementary school, involving 353 students and 16 teachers. Data Collection and Analysis: Data were gathered through observations and notes in meetings and interviews, complemented by electronic questionnaires, with quantitative and qualitative analyses performed via Microsoft Excel and Google Forms. Results: The introduction of ChatGPT as a pedagogical tool led to increased student engagement and decreased teacher resistance, reflected in recognition at local science fairs. Conclusion: The study confirmed the utility of ChatGPT in school research co-orientation, highlighting its role in facilitating PBL and promoting cultural changes in educational practice, with proactive school management identified as a catalysing element in adapting to educational innovations.


Transdisciplinary AI Education: The Confluence of Curricular and Community Needs in the Instruction of Artificial Intelligence

arXiv.org Artificial Intelligence

The integration of artificial intelligence (AI) into education has the potential to transform the way we learn and teach. In this paper, we examine the current state of AI in education and explore the potential benefits and challenges of incorporating this technology into the classroom. The approaches currently available for AI education often present students with experiences only focusing on discrete computer science concepts agnostic to a larger curriculum. However, teaching AI must not be siloed or interdisciplinary. Rather, AI instruction ought to be transdisciplinary, including connections to the broad curriculum and community in which students are learning. This paper delves into the AI program currently in development for Neom Community School and the larger Education, Research, and Innovation Sector in Neom, Saudi Arabia s new megacity under development. In this program, AI is both taught as a subject and to learn other subjects within the curriculum through the school systems International Baccalaureate (IB) approach, which deploys learning through Units of Inquiry. This approach to education connects subjects across a curriculum under one major guiding question at a time. The proposed method offers a meaningful approach to introducing AI to students throughout these Units of Inquiry, as it shifts AI from a subject that students like or not like to a subject that is taught throughout the curriculum.


Online Continual Learning via Logit Adjusted Softmax

arXiv.org Artificial Intelligence

Online continual learning is a challenging problem where models must learn from a non-stationary data stream while avoiding catastrophic forgetting. Inter-class imbalance during training has been identified as a major cause of forgetting, leading to model prediction bias towards recently learned classes. In this paper, we theoretically analyze that inter-class imbalance is entirely attributed to imbalanced class-priors, and the function learned from intra-class intrinsic distributions is the Bayes-optimal classifier. To that end, we present that a simple adjustment of model logits during training can effectively resist prior class bias and pursue the corresponding Bayes-optimum. Our proposed method, Logit Adjusted Softmax, can mitigate the impact of inter-class imbalance not only in class-incremental but also in realistic general setups, with little additional computational cost. We evaluate our approach on various benchmarks and demonstrate significant performance improvements compared to prior arts. For example, our approach improves the best baseline by 4.6% on CIFAR10.


Neuro-GPT: Developing A Foundation Model for EEG

arXiv.org Artificial Intelligence

To handle the scarcity and heterogeneity of electroencephalography (EEG) data for Brain-Computer Interface (BCI) tasks, and to harness the power of large publicly available data sets, we propose Neuro-GPT, a foundation model consisting of an EEG encoder and a GPT model. The foundation model is pre-trained on a large-scale data set using a self-supervised task that learns how to reconstruct masked EEG segments. We then fine-tune the model on a Motor Imagery Classification task to validate its performance in a low-data regime (9 subjects). Our experiments demonstrate that applying a foundation model can significantly improve classification performance compared to a model trained from scratch, which provides evidence for the generalizability of the foundation model and its ability to address challenges of data scarcity and heterogeneity in EEG.