to

### Do We Need Fuzzy Substrates?

Computers are embedded in almost all of our devices, and most of them are digital. Information at the low levels is stored as binary. Biology, in contrast, often makes use of analog systems. Take fuzzy logic for example. Fuzzy logic techniques typically involve the concept of intermediate values between true and false. But you don't need a special computer for fuzzy logic -- it's just a program running on the digital computer like any other program.

### Comparison of Neural Network based Soft Computing Techniques for Electromagnetic Modeling of a Microstrip Patch Antenna

This paper presents the comparison of various neural networks and algorithms based on accuracy, quickness, and consistency for antenna modelling. Using Nntool by MATLAB, 22 different combinations of networks and training algorithms are used to predict the dimensions of a rectangular microstrip antenna using dielectric constant, height of substrate, and frequency of operation as input. Comparison and characterization of networks is done based on accuracy, mean square error, and training time. Algorithms, on the other hand, are analyzed by their accuracy, speed, reliability, and smoothness in the training process. Finally, these results are analyzed, and recommendations are made for each neural network and algorithm based on uses, advantages, and disadvantages. For example, it is observed that Reduced Radial Bias network is the most accurate network and Scaled Conjugate Gradient is the most reliable algorithm for electromagnetic modelling. This paper will help a researcher find the optimum network and algorithm directly without doing time-taking experimentation.

### Algebraic Semantics of Generalized RIFs

A number of numeric measures like rough inclusion functions (RIFs) are used in general rough sets and soft computing. But these are often intrusive by definition, and amount to making unjustified assumptions about the data. The contamination problem is also about recognizing the domains of discourses involved in this, specifying errors and reducing data intrusion relative to them. In this research, weak quasi rough inclusion functions (wqRIFs) are generalized to general granular operator spaces with scope for limiting contamination. New algebraic operations are defined over collections of such functions, and are studied by the present author. It is shown by her that the algebras formed by the generalized wqRIFs are ordered hemirings with additional operators. By contrast, the generalized rough inclusion functions lack similar structure. This potentially contributes to improving the selection (possibly automatic) of such functions, training methods, and reducing contamination (and data intrusion) in applications. The underlying framework and associated concepts are explained in some detail, as they are relatively new.

### Quantile-based fuzzy clustering of multivariate time series in the frequency domain

A novel procedure to perform fuzzy clustering of multivariate time series generated from different dependence models is proposed. Different amounts of dissimilarity between the generating models or changes on the dynamic behaviours over time are some arguments justifying a fuzzy approach, where each series is associated to all the clusters with specific membership levels. Our procedure considers quantile-based cross-spectral features and consists of three stages: (i) each element is characterized by a vector of proper estimates of the quantile cross-spectral densities, (ii) principal component analysis is carried out to capture the main differences reducing the effects of the noise, and (iii) the squared Euclidean distance between the first retained principal components is used to perform clustering through the standard fuzzy C-means and fuzzy C-medoids algorithms. The performance of the proposed approach is evaluated in a broad simulation study where several types of generating processes are considered, including linear, nonlinear and dynamic conditional correlation models. Assessment is done in two different ways: by directly measuring the quality of the resulting fuzzy partition and by taking into account the ability of the technique to determine the overlapping nature of series located equidistant from well-defined clusters. The procedure is compared with the few alternatives suggested in the literature, substantially outperforming all of them whatever the underlying process and the evaluation scheme. Two specific applications involving air quality and financial databases illustrate the usefulness of our approach.

### Parkinson's Disease Diagnosis based on Gait Cycle Analysis Through an Interpretable Interval Type-2 Neuro-Fuzzy System

In this paper, an interpretable classifier using an interval type-2 fuzzy neural network for detecting patients suffering from Parkinson's Disease (PD) based on analyzing the gait cycle is presented. The proposed method utilizes clinical features extracted from the vertical Ground Reaction Force (vGRF), measured by 16 wearable sensors placed in the soles of subjects' shoes and learns interpretable fuzzy rules. Therefore, experts can verify the decision made by the proposed method based on investigating the firing strength of interpretable fuzzy rules. Moreover, experts can utilize the extracted fuzzy rules for patient diagnosing or adjust them based on their knowledge. To improve the robustness of the proposed method against uncertainty and noisy sensor measurements, Interval Type-2 Fuzzy Logic is applied. To learn fuzzy rules, two paradigms are proposed: 1- A batch learning approach based on clustering available samples is applied to extract initial fuzzy rules, 2- A complementary online learning is proposed to improve the rule base encountering new labeled samples. The performance of the method is evaluated for classifying patients and healthy subjects in different conditions including the presence of noise or observing new instances. Moreover, the performance of the model is compared to some previous supervised and unsupervised machine learning approaches. The final Accuracy, Precision, Recall, and F1 Score of the proposed method are 88.74%, 89.41%, 95.10%, and 92.16%. Finally, the extracted fuzzy sets for each feature are reported.

### Fuzzy Clustering Using HDBSCAN

Like most undergraduates right out of college with little to no first-hand experience working on industry ML projects and loads of ML/python certifications, I joined the Business Intelligence team at Samsung. There were 3 new hires in the team and there was only 1 Data Scientist (DS) position available, the other 2 were Data Engineering. With the 3 of us riding the ML wave, we all sought the Data Scientist position. During the first meeting with our manager, you can imagine the amount of malarkey all the candidates spat out to get the position. We were given a 3-week trial period during which each of us had a Data Engineering pipeline to build and perform an Exploratory Data Analysis on a given dataset.

### Rainfall-runoff prediction using a Gustafson-Kessel clustering based Takagi-Sugeno Fuzzy model

A rainfall-runoff model predicts surface runoff either using a physically-based approach or using a systems-based approach. Takagi-Sugeno (TS) Fuzzy models are systems-based approaches and a popular modeling choice for hydrologists in recent decades due to several advantages and improved accuracy in prediction over other existing models. In this paper, we propose a new rainfall-runoff model developed using Gustafson-Kessel (GK) clustering-based TS Fuzzy model. We present comparative performance measures of GK algorithms with two other clustering algorithms: (i) Fuzzy C-Means (FCM), and (ii)Subtractive Clustering (SC). Our proposed TS Fuzzy model predicts surface runoff using: (i) observed rainfall in a drainage basin and (ii) previously observed precipitation flow in the basin outlet. The proposed model is validated using the rainfall-runoff data collected from the sensors installed on the campus of the Indian Institute of Technology, Kharagpur. The optimal number of rules of the proposed model is obtained by different validation indices. A comparative study of four performance criteria: RootMean Square Error (RMSE), Coefficient of Efficiency (CE), Volumetric Error (VE), and Correlation Coefficient of Determination(R) have been quantitatively demonstrated for each clustering algorithm.

### Towards Personalized and Human-in-the-Loop Document Summarization

The ubiquitous availability of computing devices and the widespread use of the internet have generated a large amount of data continuously. Therefore, the amount of available information on any given topic is far beyond humans' processing capacity to properly process, causing what is known as information overload. To efficiently cope with large amounts of information and generate content with significant value to users, we require identifying, merging and summarising information. Data summaries can help gather related information and collect it into a shorter format that enables answering complicated questions, gaining new insight and discovering conceptual boundaries. This thesis focuses on three main challenges to alleviate information overload using novel summarisation techniques. It further intends to facilitate the analysis of documents to support personalised information extraction. This thesis separates the research issues into four areas, covering (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries. We propose novel approaches to tackle these challenges, by: i)enabling automatic intelligent feature engineering, ii) enabling flexible and interactive summarisation, iii) utilising intelligent and personalised summarisation approaches. The experimental results prove the efficiency of the proposed approaches compared to other state-of-the-art models. We further propose solutions to the information overload problem in different domains through summarisation, covering network traffic data, health data and business process data.

### Improvement of a Prediction Model for Heart Failure Survival through Explainable Artificial Intelligence

Cardiovascular diseases and their associated disorder of heart failure are one of the major death causes globally, being a priority for doctors to detect and predict its onset and medical consequences. Artificial Intelligence (AI) allows doctors to discover clinical indicators and enhance their diagnosis and treatments. Specifically, explainable AI offers tools to improve the clinical prediction models that experience poor interpretability of their results. This work presents an explainability analysis and evaluation of a prediction model for heart failure survival by using a dataset that comprises 299 patients who suffered heart failure. The model employs a data workflow pipeline able to select the best ensemble tree algorithm as well as the best feature selection technique. Moreover, different post-hoc techniques have been used for the explainability analysis of the model. The paper's main contribution is an explainability-driven approach to select the best prediction model for HF survival based on an accuracy-explainability balance. Therefore, the most balanced explainable prediction model implements an Extra Trees classifier over 5 selected features (follow-up time, serum creatinine, ejection fraction, age and diabetes) out of 12, achieving a balanced-accuracy of 85.1% and 79.5% with cross-validation and new unseen data respectively. The follow-up time is the most influencing feature followed by serum-creatinine and ejection-fraction. The explainable prediction model for HF survival presented in this paper would improve a further adoption of clinical prediction models by providing doctors with intuitions to better understand the reasoning of, usually, black-box AI clinical solutions, and make more reasonable and data-driven decisions.

### Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation

In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set. In this paper, we study GAIL in both online and offline settings with linear function approximation, where both the transition and reward function are linear in the feature maps. Besides the expert demonstration, in the online setting the agent can interact with the environment, while in the offline setting the agent only accesses an additional dataset collected by a prior. For online GAIL, we propose an optimistic generative adversarial policy optimization algorithm (OGAP) and prove that OGAP achieves $\widetilde{\mathcal{O}}(H^2 d^{3/2}K^{1/2}+KH^{3/2}dN_1^{-1/2})$ regret. Here $N_1$ represents the number of trajectories of the expert demonstration, $d$ is the feature dimension, and $K$ is the number of episodes. For offline GAIL, we propose a pessimistic generative adversarial policy optimization algorithm (PGAP). For an arbitrary additional dataset, we obtain the optimality gap of PGAP, achieving the minimax lower bound in the utilization of the additional dataset. Assuming sufficient coverage on the additional dataset, we show that PGAP achieves $\widetilde{\mathcal{O}}(H^{2}dK^{-1/2} +H^2d^{3/2}N_2^{-1/2}+H^{3/2}dN_1^{-1/2} \ )$ optimality gap. Here $N_2$ represents the number of trajectories of the additional dataset with sufficient coverage.