We propose Preferential MoE, a novel human-ML mixture-of-experts model that augments human expertise in decision making with a data-based classifier only when necessary for predictive performance. Our model exhibits an interpretable gating function that provides information on when human rules should be followed or avoided. The gating function is maximized for using human-based rules, and classification errors are minimized. We propose solving a coupled multi-objective problem with convex subproblems. We develop approximate algorithms and study their performance and convergence. Finally, we demonstrate the utility of Preferential MoE on two clinical applications for the treatment of Human Immunodeficiency Virus (HIV) and management of Major Depressive Disorder (MDD).
Algorithmic risk assessments are increasingly used to make and inform decisions in a wide variety of high-stakes settings. In practice, there is often a multitude of predictive models that deliver similar overall performance, an empirical phenomenon commonly known as the "Rashomon Effect." While many competing models may perform similarly overall, they may have different properties over various subgroups, and therefore have drastically different predictive fairness properties. In this paper, we develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance, or "the set of good models." We provide tractable algorithms to compute the range of attainable group-level predictive disparities and the disparity minimizing model over the set of good models. We extend our framework to address the empirically relevant challenge of selectively labelled data in the setting where the selection decision and outcome are unconfounded given the observed data features. We illustrate our methods in two empirical applications. In a real world credit-scoring task, we build a model with lower predictive disparities than the benchmark model, and demonstrate the benefits of properly accounting for the selective labels problem. In a recidivism risk prediction task, we audit an existing risk score, and find that it generates larger predictive disparities than any model in the set of good models.
We review mathematical foundations of convolutional neural nets (CNNs) with the goals of: i) highlighting connections with techniques from statistics, signal processing, linear algebra, differential equations, and optimization, ii) demystifying underlying computations, and iii) identifying new types of applications. CNNs are powerful machine learning models that highlight features from grid data to make predictions (regression and classification). The grid data object can be represented as vectors (in 1D), matrices (in 2D), or tensors (in 3D or higher dimensions) and can incorporate multiple channels (thus providing high flexibility in the input data representation). For example, an image can be represented as a 2D grid data object that contains red, green, and blue (RBG) channels (each channel is a 2D matrix). Similarly, a video can be represented as a 3D grid data object (two spatial dimensions plus time) with RGB channels (each channel is a 3D tensor). CNNs highlight features from the grid data by performing convolution operations with different types of operators. The operators highlight different types of features (e.g., patterns, gradients, geometrical features) and are learned by using optimization techniques. In other words, CNNs seek to identify optimal operators that best map the input data to the output data. A common misconception is that CNNs are only capable of processing image or video data but their application scope is much wider; specifically, datasets encountered in diverse applications can be expressed as grid data. Here, we show how to apply CNNs to new types of applications such as optimal control, flow cytometry, multivariate process monitoring, and molecular simulations.
Despite years of research into patient falls in hospital rooms, falls and related injuries remain a serious concern to patient safety. In this work, we formulate a gradient-free constrained optimization problem to generate and reconfigure the hospital room interior layout to minimize the risk of falls. We define a cost function built on a hospital room fall model that takes into account the supportive or hazardous effect of the patient's surrounding objects, as well as simulated patient trajectories inside the room. We define a constraint set that ensures the functionality of the generated room layouts in addition to conforming to architectural guidelines. We solve this problem efficiently using a variant of simulated annealing. We present results for two real-world hospital room types and demonstrate a significant improvement of 18% on average in patient fall risk when compared with a traditional hospital room layout and 41% when compared with randomly generated layouts.
Domain knowledge is useful to improve the generalization performance of learning machines. Sign constraints are a handy representation to combine domain knowledge with learning machine. In this paper, we consider constraining the signs of the weight coefficients in learning the linear support vector machine, and develop two optimization algorithms for minimizing the empirical risk under the sign constraints. One of the two algorithms is based on the projected gradient method, in which each iteration of the projected gradient method takes $O(nd)$ computational cost and the sublinear convergence of the objective error is guaranteed. The second algorithm is based on the Frank-Wolfe method that also converges sublinearly and possesses a clear termination criterion. We show that each iteration of the Frank-Wolfe also requires $O(nd)$ cost. Furthermore, we derive the explicit expression for the minimal iteration number to ensure an $\epsilon$-accurate solution by analyzing the curvature of the objective function. Finally, we empirically demonstrate that the sign constraints are a promising technique when similarities to the training examples compose the feature vector.
The study of attentional processing in vision has a long and deep history. Recently, several papers have presented insightful perspectives into how the coordination of multiple attentional functions in the brain might occur. These begin with experimental observations and the authors propose structures, processes, and computations that might explain those observations. Here, we consider a perspective that past works have not, as a complementary approach to the experimentally-grounded ones. We approach the same problem as past authors but from the other end of the computational spectrum, from the problem nature, as Marr's Computational Level would prescribe. What problem must the brain solve when orchestrating attentional processes in order to successfully complete one of the myriad possible visuospatial tasks at which we as humans excel? The hope, of course, is for the approaches to eventually meet and thus form a complete theory, but this is likely not soon. We make the first steps towards this by addressing the necessity of attentional control, examining the breadth and computational difficulty of the visuospatial and attentional tasks seen in human behavior, and suggesting a sketch of how attentional control might arise in the brain. The key conclusions of this paper are that an executive controller is necessary for human attentional function in vision, and that there is a 'first principles' computational approach to its understanding that is complementary to the previous approaches that focus on modelling or learning from experimental observations directly.
Microarray technology is known as one of the most important tools for collecting DNA expression data. This technology allows researchers to investigate and examine types of diseases and their origins. However, microarray data are often associated with challenges such as small sample size, a significant number of genes, imbalanced data, etc. that make classification models inefficient. Thus, a new hybrid solution based on multi-filter and adaptive chaotic multi-objective forest optimization algorithm (AC-MOFOA) is presented to solve the gene selection problem and construct the Ensemble Classifier. In the proposed solution, to reduce the dataset's dimensions, a multi-filter model uses a combination of five filter methods to remove redundant and irrelevant genes. Then, an AC-MOFOA based on the concepts of non-dominated sorting, crowding distance, chaos theory, and adaptive operators is presented. AC-MOFOA as a wrapper method aimed at reducing dataset dimensions, optimizing KELM, and increasing the accuracy of the classification, simultaneously. Next, in this method, an ensemble classifier model is presented using AC-MOFOA results to classify microarray data. The performance of the proposed algorithm was evaluated on nine public microarray datasets, and its results were compared in terms of the number of selected genes, classification efficiency, execution time, time complexity, and hypervolume indicator criterion with five hybrid multi-objective methods. According to the results, the proposed hybrid method could increase the accuracy of the KELM in most datasets by reducing the dataset's dimensions and achieve similar or superior performance compared to other multi-objective methods. Furthermore, the proposed Ensemble Classifier model could provide better classification accuracy and generalizability in microarray data compared to conventional ensemble methods.
Industrial Internet of Things (IIoT) lays a new paradigm for the concept of Industry 4.0 and paves an insight for new industrial era. Nowadays smart machines and smart factories use machine learning/deep learning based models for incurring intelligence. However, storing and communicating the data to the cloud and end device leads to issues in preserving privacy. In order to address this issue, federated learning (FL) technology is implemented in IIoT by the researchers nowadays to provide safe, accurate, robust and unbiased models. Integrating FL in IIoT ensures that no local sensitive data is exchanged, as the distribution of learning models over the edge devices has become more common with FL. Therefore, only the encrypted notifications and parameters are communicated to the central server. In this paper, we provide a thorough overview on integrating FL with IIoT in terms of privacy, resource and data management. The survey starts by articulating IIoT characteristics and fundamentals of distributive and FL. The motivation behind integrating IIoT and FL for achieving data privacy preservation and on-device learning are summarized. Then we discuss the potential of using machine learning, deep learning and blockchain techniques for FL in secure IIoT. Further we analyze and summarize the ways to handle the heterogeneous and huge data. Comprehensive background on data and resource management are then presented, followed by applications of IIoT with FL in healthcare and automobile industry. Finally, we shed light on challenges, some possible solutions and potential directions for future research.
The pandemic caused by the SARS-CoV-2 virus has exposed many flaws in the decision-making strategies used to distribute resources to combat global health crises. In this paper, we leverage reinforcement learning and optimization to improve upon the allocation strategies for various resources. In particular, we consider a problem where a central controller must decide where to send testing kits to learn about the uncertain states of the world (active learning); then, use the new information to construct beliefs about the states and decide where to allocate resources. We propose a general model coupled with a tunable lookahead policy for making vaccine allocation decisions without perfect knowledge about the state of the world. The lookahead policy is compared to a population-based myopic policy which is more likely to be similar to the present strategies in practice. Each vaccine allocation policy works in conjunction with a testing kit allocation policy to perform active learning. Our simulation results demonstrate that an optimization-based lookahead decision making strategy will outperform the presented myopic policy.
We propose a Distributional Approach to address Controlled Text Generation from pre-trained Language Models (LMs). This view permits to define, in a single formal framework, "pointwise" and "distributional" constraints over the target LM -- to our knowledge, this is the first approach with such generality -- while minimizing KL divergence with the initial LM distribution. The optimal target distribution is then uniquely determined as an explicit EBM (Energy-Based Model) representation. From that optimal representation we then train the target controlled autoregressive LM through an adaptive distributional variant of Policy Gradient. We conduct a first set of experiments over pointwise constraints showing the advantages of our approach over a set of baselines, in terms of obtaining a controlled LM balancing constraint satisfaction with divergence from the initial LM (GPT-2). We then perform experiments over distributional constraints, a unique feature of our approach, demonstrating its potential as a remedy to the problem of Bias in Language Models. Through an ablation study we show the effectiveness of our adaptive technique for obtaining faster convergence.