 Instructional Material


Flora: Low-Rank Adapters Are Secretly Gradient Compressors

arXiv.org Artificial Intelligence

Although large neural networks demonstrate remarkable abilities across different tasks, they require excessive memory to store the optimization states for training. To alleviate this, low-rank adaptation (LoRA) has been proposed to reduce the optimization states by training fewer parameters. However, LoRA restricts the overall weight update matrices to be low-rank, limiting model performance. In this work, we investigate the dynamics of LoRA and identify that it can be approximated by a random projection. Based on this observation, we propose Flora, which is able to achieve high-rank updates by resampling the projection matrices while enjoying the sublinear space complexity of optimization states. We conduct experiments across different tasks and model architectures to verify the effectiveness of our approach.
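The random-projection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`projection`, `compress`, `decompress`), the Gaussian projection with variance 1/rank, and the seed-based resampling are assumptions chosen for illustration. The sketch stores a gradient of shape n x m as an n x rank matrix and regenerates the projection from its seed instead of keeping it in memory.

```python
import random

def projection(seed, rank, cols):
    """Hypothetical helper: a rank x cols Gaussian matrix rebuilt from a seed.

    Entries have variance 1/rank, so the expected value of P^T P is the identity.
    """
    rng = random.Random(seed)
    scale = rank ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rank)]

def compress(grad, proj):
    """Project an n x m gradient down to n x rank (grad @ proj^T)."""
    return [[sum(g * p for g, p in zip(row, prow)) for prow in proj]
            for row in grad]

def decompress(comp, proj):
    """Map an n x rank compressed gradient back to n x m (comp @ proj)."""
    cols = len(proj[0])
    return [[sum(c * prow[j] for c, prow in zip(crow, proj))
             for j in range(cols)]
            for crow in comp]

# Usage: only the seed and the small matrix need to be kept between steps.
grad = [[1.0, 0.0, -1.0, 2.0], [0.5, -1.0, 0.0, 2.0]]
seed = 0
small = compress(grad, projection(seed, rank=2, cols=4))
approx = decompress(small, projection(seed, rank=2, cols=4))
```

A single decompressed gradient is a noisy, low-rank estimate. Because a fresh projection is drawn for each seed, the average over many steps is not confined to one low-rank subspace, which is the intuition behind resampling in this sketch.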


InterpretCC: Conditional Computation for Inherently Interpretable Neural Networks

arXiv.org Artificial Intelligence

Real-world interpretability for neural networks is a tradeoff between three concerns: 1) it requires humans to trust the explanation approximation (e.g. post-hoc approaches), 2) it compromises the understandability of the explanation (e.g. automatically identified feature masks), and 3) it compromises the model performance (e.g. decision trees). These shortcomings are unacceptable for human-facing domains, like education, healthcare, or natural language, which require trustworthy explanations, actionable interpretations, and accurate predictions. In this work, we present InterpretCC (interpretable conditional computation), a family of interpretable-by-design neural networks that guarantee human-centric interpretability while maintaining comparable performance to state-of-the-art models by adaptively and sparsely activating features before prediction. We extend this idea into an interpretable mixture-of-experts model that allows humans to specify topics of interest, discretely separates the feature space for each data point into topical subnetworks, and adaptively and sparsely activates these topical subnetworks. We demonstrate variations of the InterpretCC architecture for text and tabular data across several real-world benchmarks: six online education courses, news classification, breast cancer diagnosis, and review sentiment.


This 30 e-degree will help you master ChatGPT

PCWorld

Artificial intelligence and machine learning are providing new ways to maximize productivity and efficiency these days, but only if you really know how to use them. ChatGPT is one of the most popular AI tools on the market and this ChatGPT & Automation E-Degree will help you learn how to use it like an expert. This 25-hour course is taught by Eduonix Learning Solutions (4.4/5-star instructor rating) and covers the basics as well as more advanced topics. You'll explore a variety of practical, real-world applications of ChatGPT and learn how to tailor your queries to get the exact results you want. Whether you're looking to streamline business processes through automation, gather data insights, scale your content output, or just improve your communication skills, this course will help you tap into ChatGPT to meet all of your needs.


Learning Style Identification Using Semi-Supervised Self-Taught Labeling

arXiv.org Artificial Intelligence

Education is a dynamic field that must be adaptable to sudden changes and disruptions caused by events like pandemics, war, and natural disasters related to climate change. When these events occur, traditional classrooms with traditional or blended delivery can shift to fully online learning, which requires an efficient learning environment that meets students' needs. While learning management systems support teachers' productivity and creativity, they typically provide the same content to all learners in a course, ignoring their unique learning styles. To address this issue, we propose a semi-supervised machine learning approach that detects students' learning styles using a data mining technique. We use the widely adopted Felder-Silverman learning style model and demonstrate that our semi-supervised method can produce reliable classification models with little labeled data. We evaluate our approach on two different courses and achieve an accuracy of 88.83% and 77.35%, respectively. Our work shows that educational data mining and semi-supervised machine learning techniques can identify different learning styles and create a personalized learning environment.
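The general shape of a self-training (pseudo-labeling) loop can be sketched as follows. This is a toy under stated assumptions, not the method evaluated in the paper: the one-dimensional feature, the nearest-centroid classifier, the `margin` confidence rule, and the names `centroids` and `self_train` are all hypothetical. It also assumes at least two classes appear in the labeled seed set.

```python
def centroids(labeled):
    """Mean feature value per class from (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label the unlabeled points the classifier is sure about.

    A point is 'confident' when it is at least `margin` closer to the nearest
    class centroid than to the second nearest (an assumed confidence rule).
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, rest = [], []
        for x in pool:
            ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
            best, second = ranked[0], ranked[1]
            gap = abs(x - cents[second]) - abs(x - cents[best])
            (confident if gap >= margin else rest).append((x, best))
        if not confident:
            break  # nothing new to learn from; stop early
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return labeled, pool

# Usage: two labeled seeds, four unlabeled points.
seeds = [(0.0, "a"), (10.0, "b")]
labeled, leftover = self_train(seeds, [1.0, 9.0, 4.0, 5.0], margin=2.0)
```

With these inputs the points 1.0 and 4.0 are assigned to class "a", 9.0 to class "b", and the ambiguous point 5.0 stays unlabeled because it never clears the margin.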


Adaptive scheduling for adaptive sampling in POS taggers construction

arXiv.org Artificial Intelligence

However, managing large amounts of information is an expensive, time-consuming and non-trivial activity, especially when expert knowledge is needed. Furthermore, having access to vast databases does not imply that ML algorithms must use them all, and a subset is therefore preferred, provided it does not reduce the quality of the mined knowledge. Such a subset then supplies the same learning power at far less computational cost and allows the training process to be sped up, although its nature and optimal size are rarely obvious. This justifies the interest in developing efficient sampling techniques, which involves anticipating the link between performance and experience regarding the accuracy of the system we are generating. At this point, correctness with respect to the working hypotheses and robustness against changes to them should be guaranteed in order to supply a practical solution. The former ensures the effectiveness of the proposed strategy in the framework considered, while the latter enables fluctuations in the learning conditions to be assimilated without compromising correctness, thus providing reliability to our calculations. An area of work that is particularly sensitive to these inconveniences is natural language processing (NLP), the components of which are increasingly based on ML [3, 50].


Modeling of learning curves with applications to pos tagging

arXiv.org Artificial Intelligence

An algorithm to estimate the evolution of learning curves over the whole of a training database, based on the results obtained from a portion and using a functional strategy, is introduced. We iteratively approximate the sought value at the desired time, independently of the learning technique used and once a point in the process, called the prediction level, has been passed. The proposal proves to be formally correct with respect to our working hypotheses and includes a reliable proximity condition. This allows the user to fix a convergence threshold with respect to the accuracy finally achievable, which extends the concept of stopping criterion and seems to be effective even in the presence of distorting observations. Our aim is to evaluate the training effort, supporting decision making in order to reduce the need for both human and computational resources during the learning process. The proposal is of interest in at least three operational procedures. The first is the anticipation of accuracy gain, with the purpose of measuring how much work is needed to achieve a certain degree of performance. The second relates to the comparison of efficiency between systems at training time, with the objective of completing this task only for the one that best suits our requirements. The third is the prediction of accuracy, a valuable item of information for customizing systems, since we can estimate in advance the impact of settings on both the performance and the development costs. Using the generation of part-of-speech taggers as an example application, the experimental results are consistent with our expectations.
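Extrapolating a learning curve from a portion of the training data can be sketched as below. The abstract does not name the function family it fits, so the power-law error model `error = a * size**(-b)` (with accuracy tending to 1) is an assumption, as are the names `fit_power_law` and `predict_accuracy`. The fit is an ordinary least-squares line in log-log space.

```python
import math

def fit_power_law(sizes, errors):
    """Fit error = a * size**(-b) by least squares on (log size, log error).

    Assumes every error is strictly positive and at least two distinct sizes.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

def predict_accuracy(a, b, size):
    """Accuracy predicted at a larger training size under the assumed model."""
    return 1.0 - a * size ** (-b)

# Usage: observe error rates on small samples, then extrapolate.
sizes = [1000, 2000, 4000, 8000]
errors = [0.5 * n ** -0.3 for n in sizes]  # synthetic, noise-free observations
a, b = fit_power_law(sizes, errors)
forecast = predict_accuracy(a, b, 100000)
```

On real, noisy observations the fitted curve would be re-estimated as more data arrives, and a stopping rule could compare successive forecasts against a convergence threshold.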


Sample as You Infer: Predictive Coding With Langevin Dynamics

arXiv.org Artificial Intelligence

It is well known that neuronal systems, including their dynamics and responses, are rife with noise at multiple levels (Faisal et al., 2008; Shadlen & Newsome, 1998). These sources of noise arise from, amongst other things, stochastic processes occurring at the sub-cellular level, impacting neuronal response through, for example, fluctuations in membrane potential (Derksen & Verveen, 1966). Yet the precise role of such randomness in information processing continues to be an open question (McDonnell & Ward, 2011; Deco et al., 2013). The Langevin PC algorithm suggests one such role may be in the principled exploration of the latent space of hypotheses under one's generative model. Secondly, from the perspective of Langevin PC as an in-silico generative modelling algorithm, we note a number of interesting avenues that we have not had the time to explore here. These include: models with a hierarchy of stochastic variables, such as those found in most state-of-the-art VAE models (Child, 2021; Vahdat & Kautz, 2021; Hazami et al., 2022), which may require adopting a corresponding top-down hierarchical warm-start model; automatic convergence criteria for determining when our Markov chain has converged to a certain level of error (Roy, 2020); and underdamped Langevin dynamics, which incorporate auxiliary momentum variables into the Langevin sampling to achieve an accelerated rate of convergence (Cheng et al., 2018; Ma et al., 2019).
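For readers unfamiliar with Langevin sampling, the basic overdamped (unadjusted) update can be sketched as below. This is a generic illustration, not the Langevin PC algorithm and not the underdamped variant mentioned above: the one-dimensional Gaussian target, the step size, and the names `langevin_samples` and `gaussian_score` are assumptions. Each step follows the gradient of the log density and adds Gaussian noise scaled by the square root of twice the step size.

```python
import math
import random

def gaussian_score(mu, sigma):
    """Gradient of log N(x; mu, sigma^2) with respect to x."""
    return lambda x: -(x - mu) / sigma ** 2

def langevin_samples(grad_log_p, x0, step, n_steps, seed=0):
    """Unadjusted Langevin chain: x <- x + step * score(x) + sqrt(2*step) * noise."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step)
    x = x0
    samples = []
    for _ in range(n_steps):
        x = x + step * grad_log_p(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

# Usage: sample from a Gaussian with mean 2 and unit variance.
chain = langevin_samples(gaussian_score(2.0, 1.0), x0=0.0, step=0.1, n_steps=20000)
kept = chain[1000:]  # discard an initial burn-in segment
estimate = sum(kept) / len(kept)
```

Without a Metropolis correction the chain carries a small discretization bias that grows with the step size, so the sample variance slightly overestimates the target variance.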


Machine Intelligence in Africa: a survey

arXiv.org Artificial Intelligence

In the last 5 years, the availability of large audio datasets in African countries has opened unlimited opportunities to build machine intelligence (MI) technologies that are closer to the people and speak, learn, understand, and do business in local languages, including for those who cannot read and write. Unfortunately, these audio datasets are not fully exploited by current MI tools, leaving several Africans out of MI business opportunities. Additionally, many state-of-the-art MI models are not culture-aware, and the ethics of their adoption indexes are questionable. The lack thereof is a major drawback in many applications in Africa. This paper summarizes recent developments in machine intelligence in Africa from a multi-layer, multiscale, and culture-aware ethics perspective, showcasing MI use cases in 54 African countries through 400 articles on MI research, industry, and government actions, as well as uses in art, music, the informal economy, and small businesses in Africa. The survey also opens discussions on the reliability of MI rankings and indexes in the African continent as well as algorithmic definitions of unclear terms used in MI.


InceptionCapsule: Inception-Resnet and CapsuleNet with self-attention for medical image Classification

arXiv.org Artificial Intelligence

Weight initialization is significant in deep neural networks because the random selection of weights produces different outputs and increases the probability of overfitting and underfitting. On the other hand, vector-based approaches to feature extraction need rich vectors for more accurate classification. The InceptionCapsule approach is presented to alleviate these two problems. This approach uses transfer learning with the Inception-ResNet model, taking initial weights from ImageNet to avoid random selection of weights. It also uses the output of the Inception middle layers to generate rich vectors. The extracted vectors are given to a capsule network equipped with an attention technique for learning. The Kvasir dataset and the BUSI with GT dataset were used to evaluate this approach. On Kvasir, the model achieved an accuracy of 97.62 in 5-class classification and 94.30 in 8-class classification. On the BUSI with GT dataset, the proposed approach achieved accuracy=98.88, Precision=95.34, and F1-score=93.74, which are acceptable results compared to other approaches in the literature.


Anytime-Competitive Reinforcement Learning with Policy Prior

arXiv.org Artificial Intelligence

This paper studies the problem of Anytime-Competitive Markov Decision Process (A-CMDP). Existing works on Constrained Markov Decision Processes (CMDPs) aim to optimize the expected reward while constraining the expected cost over random dynamics, but the cost in a specific episode can still be unsatisfactorily high. In contrast, the goal of A-CMDP is to optimize the expected reward while guaranteeing a bounded cost in each round of any episode against a policy prior. We propose a new algorithm, called Anytime-Competitive Reinforcement Learning (ACRL), which provably guarantees the anytime cost constraints. The regret analysis shows the policy asymptotically matches the optimal reward achievable under the anytime competitive constraints. Experiments on the application of carbon-intelligent computing verify the reward performance and cost constraint guarantee of ACRL.