to

### SELM: Software Engineering of Machine Learning Models

One of the pillars of any machine learning model is its concepts. Using software engineering, we can engineer these concepts and then develop and expand them. In this article, we present a SELM framework for Software Engineering of machine Learning Models. We then evaluate this framework through a case study. Using the SELM framework, we can improve a machine learning process efficiency and provide more accuracy in learning with less processing hardware resources and a smaller training dataset. This issue highlights the importance of an interdisciplinary approach to machine learning. Therefore, in this article, we have provided interdisciplinary teams' proposals for machine learning.

### Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm

In this paper, we investigate the sample complexity of policy evaluation in infinite-horizon offline reinforcement learning (also known as the off-policy evaluation problem) with linear function approximation. We identify a hard regime $d\gamma^{2}>1$, where $d$ is the dimension of the feature vector and $\gamma$ is the discount rate. In this regime, for any $q\in[\gamma^{2},1]$, we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is $q/d$ and it requires $\Omega\left(\frac{d}{\gamma^{2}\left(q-\gamma^{2}\right)\varepsilon^{2}}\exp\left(\Theta\left(d\gamma^{2}\right)\right)\right)$ samples to approximate the value function up to an additive error $\varepsilon$. Note that the lower bound of the sample complexity is exponential in $d$. If $q=\gamma^{2}$, even infinite data cannot suffice. Under the low distribution shift assumption, we show that there is an algorithm that needs at most $O\left(\max\left\{ \frac{\left\Vert \theta^{\pi}\right\Vert _{2}^{4}}{\varepsilon^{4}}\log\frac{d}{\delta},\frac{1}{\varepsilon^{2}}\left(d+\log\frac{1}{\delta}\right)\right\} \right)$ samples ($\theta^{\pi}$ is the parameter of the policy in linear function approximation) and guarantees approximation to the value function up to an additive error of $\varepsilon$ with probability at least $1-\delta$.

### Function approximation by deep neural networks with parameters $\{0,\pm \frac{1}{2}, \pm 1, 2\}$

In this paper it is shown that $C_\beta$-smooth functions can be approximated by neural networks with parameters $\{0,\pm \frac{1}{2}, \pm 1, 2\}$. The depth, width and the number of active parameters of constructed networks have, up to a logarithimc factor, the same dependence on the approximation error as the networks with parameters in $[-1,1]$. In particular, this means that the nonparametric regression estimation with constructed networks attain the same convergence rate as with the sparse networks with parameters in $[-1,1]$.

### Time series forecasting based on complex network in weighted node similarity

Time series have attracted widespread attention in many fields today. Based on the analysis of complex networks and visibility graph theory, a new time series forecasting method is proposed. In time series analysis, visibility graph theory transforms time series data into a network model. In the network model, the node similarity index is an important factor. On the basis of directly using the node prediction method with the largest similarity, the node similarity index is used as the weight coefficient to optimize the prediction algorithm. Compared with the single-point sampling node prediction algorithm, the multi-point sampling prediction algorithm can provide more accurate prediction values when the data set is sufficient. According to results of experiments on four real-world representative datasets, the method has more accurate forecasting ability and can provide more accurate forecasts in the field of time series and actual scenes.

### Large-scale Recommendation for Portfolio Optimization

Individual investors are now massively using online brokers to trade stocks with convenient interfaces and low fees, albeit losing the advice and personalization traditionally provided by full-service brokers. We frame the problem faced by online brokers of replicating this level of service in a low-cost and automated manner for a very large number of users. Because of the care required in recommending financial products, we focus on a risk-management approach tailored to each user's portfolio and risk profile. We show that our hybrid approach, based on Modern Portfolio Theory and Collaborative Filtering, provides a sound and effective solution. The method is applicable to stocks as well as other financial assets, and can be easily combined with various financial forecasting models. We validate our proposal by comparing it with several baselines in a domain expert-based study.

### A conditional, a fuzzy and a probabilistic interpretation of self-organising maps

In this paper we establish a link between preferential semantics for description logics and self-organising maps, which have been proposed as possible candidates to explain the psychological mechanisms underlying category generalisation. In particular, we show that a concept-wise multipreference semantics, which takes into account preferences with respect to different concepts and has been recently proposed for defeasible description logics, can be used to to provide a logical interpretation of SOMs. We also provide a logical interpretation of SOMs in terms of a fuzzy description logic as well as a probabilistic account.

### Classification and Feature Transformation with Fuzzy Cognitive Maps

Fuzzy Cognitive Maps (FCMs) are considered a soft computing technique combining elements of fuzzy logic and recurrent neural networks. They found multiple application in such domains as modeling of system behavior, prediction of time series, decision making and process control. Less attention, however, has been turned towards using them in pattern classification. In this work we propose an FCM based classifier with a fully connected map structure. In contrast to methods that expect reaching a steady system state during reasoning, we chose to execute a few FCM iterations (steps) before collecting output labels. Weights were learned with a gradient algorithm and logloss or cross-entropy were used as the cost function. Our primary goal was to verify, whether such design would result in a descent general purpose classifier, with performance comparable to off the shelf classical methods. As the preliminary results were promising, we investigated the hypothesis that the performance of $d$-step classifier can be attributed to a fact that in previous $d-1$ steps it transforms the feature space by grouping observations belonging to a given class, so that they became more compact and separable. To verify this hypothesis we calculated three clustering scores for the transformed feature space. We also evaluated performance of pipelines built from FCM-based data transformer followed by a classification algorithm. The standard statistical analyzes confirmed both the performance of FCM based classifier and its capability to improve data. The supporting prototype software was implemented in Python using TensorFlow library.

### Expert System Gradient Descent Style Training: Development of a Defensible Artificial Intelligence Technique

Artificial intelligence systems, which are designed with a capability to learn from the data presented to them, are used throughout society. These systems are used to screen loan applicants, make sentencing recommendations for criminal defendants, scan social media posts for disallowed content and more. Because these systems don't assign meaning to their complex learned correlation network, they can learn associations that don't equate to causality, resulting in non-optimal and indefensible decisions being made. In addition to making decisions that are sub-optimal, these systems may create legal liability for their designers and operators by learning correlations that violate anti-discrimination and other laws regarding what factors can be used in different types of decision making. This paper presents the use of a machine learning expert system, which is developed with meaning-assigned nodes (facts) and correlations (rules). Multiple potential implementations are considered and evaluated under different conditions, including different network error and augmentation levels and different training levels. The performance of these systems is compared to random and fully connected networks.

### Function Approximation via Sparse Random Features

Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative approach to standard neural networks since they can represent similar function spaces without a costly training phase. However, for accuracy, random feature methods require more measurements than trainable parameters, limiting their use for data-scarce applications or problems in scientific machine learning. This paper introduces the sparse random feature method that learns parsimonious random feature models utilizing techniques from compressive sensing. We provide uniform bounds on the approximation error for functions in a reproducing kernel Hilbert space depending on the number of samples and the distribution of features. The error bounds improve with additional structural conditions, such as coordinate sparsity, compact clusters of the spectrum, or rapid spectral decay.

### Adaptive Gaussian Fuzzy Classifier for Real-Time Emotion Recognition in Computer Games

Human emotion recognition has become a need for more realistic and interactive machines and computer systems. The greatest challenge is the availability of high-performance algorithms to effectively manage individual differences and nonstationarities in physiological data streams, i.e., algorithms that self-customize to a user with no subject-specific calibration data. We describe an evolving Gaussian Fuzzy Classifier (eGFC), which is supported by an online semi-supervised learning algorithm to recognize emotion patterns from electroencephalogram (EEG) data streams. We extract features from the Fourier spectrum of EEG data. The data are provided by 28 individuals playing the games 'Train Sim World', 'Unravel', 'Slender The Arrival', and 'Goat Simulator' - a public dataset. Different emotions prevail, namely, boredom, calmness, horror and joy. We analyze the effect of individual electrodes, time window lengths, and frequency bands on the accuracy of user-independent eGFCs. We conclude that both brain hemispheres may assist classification, especially electrodes on the frontal (Af3-Af4), occipital (O1-O2), and temporal (T7-T8) areas. We observe that patterns may be eventually found in any frequency band; however, the Alpha (8-13Hz), Delta (1-4Hz), and Theta (4-8Hz) bands, in this order, are the highest correlated with emotion classes. eGFC has shown to be effective for real-time learning of EEG data. It reaches a 72.2% accuracy using a variable rule base, 10-second windows, and 1.8ms/sample processing time in a highly-stochastic time-varying 4-class classification problem.