Overview
Ternary Quantization: A Survey
Inference time, model size, and accuracy are critical for deploying deep neural network models. Numerous research efforts have been made to compress neural network models with faster inference and higher accuracy. Pruning and quantization are mainstream methods to this end. During model quantization, converting individual float values of layer weights to low-precision ones can substantially reduce the computational overhead and improve the inference speed. Many quantization methods have been studied, for example, vector quantization, low-bit quantization, and binary/ternary quantization. This survey focuses on ternary quantization. We review the evolution of ternary quantization and investigate the relationships among existing ternary quantization methods from the perspective of projection function and optimization methods.
MP-SeizNet: A Multi-Path CNN Bi-LSTM Network for Seizure-Type Classification Using EEG
Albaqami, Hezam, Hassan, Ghulam Mubashar, Datta, Amitava
Seizure type identification is essential for the treatment and management of epileptic patients. However, it is a difficult process known to be time consuming and labor intensive. Automated diagnosis systems, with the advancement of machine learning algorithms, have the potential to accelerate the classification process, alert patients, and support physicians in making quick and accurate decisions. In this paper, we present a novel multi-path seizure-type classification deep learning network (MP-SeizNet), consisting of a convolutional neural network (CNN) and a bidirectional long short-term memory neural network (Bi-LSTM) with an attention mechanism. The objective of this study was to classify specific types of seizures, including complex partial, simple partial, absence, tonic, and tonic-clonic seizures, using only electroencephalogram (EEG) data. The EEG data is fed to our proposed model in two different representations. The CNN was fed with wavelet-based features extracted from the EEG signals, while the Bi-LSTM was fed with raw EEG signals to let our MP-SeizNet jointly learns from different representations of seizure data for more accurate information learning. The proposed MP-SeizNet was evaluated using the largest available EEG epilepsy database, the Temple University Hospital EEG Seizure Corpus, TUSZ v1.5.2. We evaluated our proposed model across different patient data using three-fold cross-validation and across seizure data using five-fold cross-validation, achieving F1 scores of 87.6% and 98.1%, respectively.
Supporting Future Electrical Utilities: Using Deep Learning Methods in EMS and DMS Algorithms
Kundacina, Ognjen, Gojic, Gorana, Mitrovic, Mile, Miskovic, Dragisa, Vukobratovic, Dejan
Electrical power systems are increasing in size, complexity, as well as dynamics due to the growing integration of renewable energy resources, which have sporadic power generation. This necessitates the development of near real-time power system algorithms, demanding lower computational complexity regarding the power system size. Considering the growing trend in the collection of historical measurement data and recent advances in the rapidly developing deep learning field, the main goal of this paper is to provide a review of recent deep learning-based power system monitoring and optimization algorithms. Electrical utilities can benefit from this review by re-implementing or enhancing the algorithms traditionally used in energy management systems (EMS) and distribution management systems (DMS).
UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph
Jiang, Jinhao, Zhou, Kun, Zhao, Wayne Xin, Wen, Ji-Rong
Multi-hop Question Answering over Knowledge Graph~(KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question on a large-scale Knowledge Graph (KG). To cope with the vast search space, existing work usually adopts a two-stage approach: it first retrieves a relatively small subgraph related to the question and then performs the reasoning on the subgraph to find the answer entities accurately. Although these two stages are highly related, previous work employs very different technical solutions for developing the retrieval and reasoning models, neglecting their relatedness in task essence. In this paper, we propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning. For model architecture, UniKGQA consists of a semantic matching module based on a pre-trained language model~(PLM) for question-relation semantic matching, and a matching information propagation module to propagate the matching information along the directed edges on KGs. For parameter learning, we design a shared pre-training task based on question-relation matching for both retrieval and reasoning models, and then propose retrieval- and reasoning-oriented fine-tuning strategies. Compared with previous studies, our approach is more unified, tightly relating the retrieval and reasoning stages. Extensive experiments on three benchmark datasets have demonstrated the effectiveness of our method on the multi-hop KGQA task. Our codes and data are publicly available at~\url{https://github.com/RUCAIBox/UniKGQA}.
Heuristics for Vehicle Routing Problem: A Survey and Recent Advances
Liu, Fei, Lu, Chengyu, Gui, Lin, Zhang, Qingfu, Tong, Xialiang, Yuan, Mingxuan
Vehicle routing is a well-known optimization research topic with significant practical importance. Among different approaches to solving vehicle routing, heuristics can produce a satisfactory solution at a reasonable computational cost. Consequently, much effort has been made in the past decades to develop vehicle routing heuristics. In this article, we systematically survey the existing vehicle routing heuristics, particularly on works carried out in recent years. A classification of vehicle routing heuristics is presented, followed by a review of their methodologies, recent developments, and applications. Moreover, we present a general framework of state-of-the-art methods and provide insights into their success. Finally, three emerging research topics with notable works and future directions are discussed.
Interpretability and Explainability: A Machine Learning Zoo Mini-tour
Marcinkevičs, Ričards, Vogt, Julia E.
In this literature review, we provided a survey of interpretable and explainable machine learning methods (see Tables 1 and 2 for the summary of the techniques), described commonest goals and desiderata for these techniques, motivated their relevance in several fields of application, and discussed their quantitative evaluation. Interpretability and explainability still remain an active area of research, especially, in the face of recent rapid progress in designing highly performant predictive models and inevitable infusion of machine learning into other domains, where decisions have far-reaching consequences. For years the field has been challenged by a lack of clear definitions for interpretability or explainability, these terms being often wielded "in a quasi-mathematical way"[6,122]. For many techniques, there still exist no satisfactory functionally-grounded evaluation criteria and universally accepted benchmarks, hindering reproducibility and model comparison. Moreover, meaningful adaptations of these methods to'real-world' machine learning systems and data analysis problems largely remain a matter for the future. It has been argued that, for successful and widespread use of interpretable and explainable machine learning models, stakeholders need to be involved in the discussion[4, 122]. A meaningful and equal collaboration between machine learning researchers and stakeholders from various domains, such as medicine, natural sciences, and law, is a logical next step within the evolution of interpretable and explainable ML.
Do Transformers know symbolic rules, and would we know if they did?
Gröndahl, Tommi, Guo, Yujia, Asokan, N.
To improve the explainability of leading Transformer networks used in NLP, it is important to tease apart genuine symbolic rules from merely associative input-output patterns. However, we identify several inconsistencies in how ``symbolicity'' has been construed in recent NLP literature. To mitigate this problem, we propose two criteria to be the most relevant, one pertaining to a system's internal architecture and the other to the dissociation between abstract rules and specific input identities. From this perspective, we critically examine prior work on the symbolic capacities of Transformers, and deem the results to be fundamentally inconclusive for reasons inherent in experiment design. We further maintain that there is no simple fix to this problem, since it arises -- to an extent -- in all end-to-end settings. Nonetheless, we emphasize the need for more robust evaluation of whether non-symbolic explanations exist for success in seemingly symbolic tasks. To facilitate this, we experiment on four sequence modelling tasks on the T5 Transformer in two experiment settings: zero-shot generalization, and generalization across class-specific vocabularies flipped between the training and test set. We observe that T5's generalization is markedly stronger in sequence-to-sequence tasks than in comparable classification tasks. Based on this, we propose a thus far overlooked analysis, where the Transformer itself does not need to be symbolic to be part of a symbolic architecture as the processor, operating on the input and output as external memory components.
Recent Advances in Reinforcement Learning in Finance
Hambly, Ben, Xu, Renyuan, Yang, Huining
The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques on data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that heavily reply on model assumptions, new developments from reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which is the setting for many of the commonly used RL approaches. Various algorithms are then introduced with a focus on value and policy based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. Our survey concludes by discussing the application of these RL algorithms in a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising.
Physics-Guided Deep Learning for Dynamical Systems: A Survey
Modeling complex physical dynamics is a fundamental task in science and engineering. Traditional physics-based models are sample efficient, and interpretable but often rely on rigid assumptions. Furthermore, direct numerical approximation is usually computationally intensive, requiring significant computational resources and expertise, and many real-world systems do not have fully-known governing laws. While deep learning (DL) provides novel alternatives for efficiently recognizing complex patterns and emulating nonlinear dynamics, its predictions do not necessarily obey the governing laws of physical systems, nor do they generalize well across different systems. Thus, the study of physics-guided DL emerged and has gained great progress. Physics-guided DL aims to take the best from both physics-based modeling and state-of-the-art DL models to better solve scientific problems. In this paper, we provide a structured overview of existing methodologies of integrating prior physical knowledge or physics-based modeling into DL, with a special emphasis on learning dynamical systems. We also discuss the fundamental challenges and emerging opportunities in the area.
Semi-Supervised Constrained Clustering: An In-Depth Overview, Ranked Taxonomy and Future Research Directions
González-Almagro, Germán, Peralta, Daniel, De Poorter, Eli, Cano, José-Ramón, García, Salvador
Clustering is a well-known unsupervised machine learning approach capable of automatically grouping discrete sets of instances with similar characteristics. Constrained clustering is a semi-supervised extension to this process that can be used when expert knowledge is available to indicate constraints that can be exploited. Well-known examples of such constraints are must-link (indicating that two instances belong to the same group) and cannot-link (two instances definitely do not belong together). The research area of constrained clustering has grown significantly over the years with a large variety of new algorithms and more advanced types of constraints being proposed. However, no unifying overview is available to easily understand the wide variety of available methods, constraints and benchmarks. To remedy this, this study presents in-detail the background of constrained clustering and provides a novel ranked taxonomy of the types of constraints that can be used in constrained clustering. In addition, it focuses on the instance-level pairwise constraints, and gives an overview of its applications and its historical context. Finally, it presents a statistical analysis covering 307 constrained clustering methods, categorizes them according to their features, and provides a ranking score indicating which methods have the most potential based on their popularity and validation quality. Finally, based upon this analysis, potential pitfalls and future research directions are provided.