Data anomalies are ubiquitous in real-world datasets and can have an adverse impact on machine learning (ML) systems, such as automated home valuation. Detecting anomalies could make ML applications more responsible and trustworthy. However, the lack of labels for anomalies and the complex nature of real-world datasets make anomaly detection a challenging unsupervised learning problem. In this paper, we propose a novel model-based anomaly detection method, which we call Out-of-Bag anomaly detection, that handles multi-dimensional datasets consisting of numerical and categorical features. The proposed method decomposes the unsupervised problem into the training of a set of ensemble models, and leverages Out-of-Bag estimates to derive an effective measure for anomaly detection. We not only demonstrate the state-of-the-art performance of our method through comprehensive experiments on benchmark datasets, but also show that our model can improve the accuracy and reliability of an ML system as a data pre-processing step via a case study on home valuation.
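The core idea above — training ensemble models and turning their out-of-bag (OOB) estimates into an anomaly measure — can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual algorithm: it uses scikit-learn random forests (an assumption; the paper does not specify an implementation), predicts each numerical feature from the others, and scores each row by its standardized OOB prediction error.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def oob_anomaly_scores(X, n_estimators=100, random_state=0):
    """Score each row by how poorly its features are predicted from
    the others, using out-of-bag estimates. A hypothetical sketch of
    the OOB-anomaly-detection idea, for numerical features only."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    errors = np.zeros((n, d))
    for j in range(d):
        rest = np.delete(X, j, axis=1)  # all features except j
        rf = RandomForestRegressor(n_estimators=n_estimators,
                                   oob_score=True,
                                   random_state=random_state)
        rf.fit(rest, X[:, j])
        # oob_prediction_[i] uses only trees that did not see row i
        errors[:, j] = np.abs(rf.oob_prediction_ - X[:, j])
    # standardize per feature, then average into one score per row
    z = errors / (errors.std(axis=0) + 1e-12)
    return z.mean(axis=1)
```

Rows whose feature values violate the relationships the ensembles have learned receive large OOB errors and hence high anomaly scores.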
The San Diego Supercomputer Center makes high performance computing resources available to researchers via a "condo cluster" model. Many homebuyers have found that the most affordable path to homeownership leads to a condominium, in which the purchaser buys a piece of a much larger building. The same model is in play today in the high performance computing centers at many universities: under a "condo cluster" arrangement, faculty researchers buy a piece of a much larger HPC system. In a common scenario, researchers use equipment purchase funds from grants or other funding sources to buy compute nodes that are added to the cluster.
Tsetlin Machines (TMs) capture patterns using conjunctive clauses in propositional logic, thus facilitating interpretation. However, recent TM-based approaches mainly rely on inspecting the full range of clauses individually. Such inspection does not necessarily scale to complex prediction problems that require a large number of clauses. In this paper, we propose closed-form expressions for understanding why a TM model makes a specific prediction (local interpretability). Additionally, the expressions capture the most important features of the model overall (global interpretability). We further introduce expressions for measuring the importance of feature value ranges for continuous features. The expressions are formulated directly from the conjunctive clauses of the TM, making it possible to capture the role of features in real time, even during the learning process as the model evolves. Additionally, from the closed-form expressions, we derive a novel data clustering algorithm for visualizing high-dimensional data in three dimensions. Finally, we compare our proposed approach against SHAP and state-of-the-art interpretable machine learning techniques. For both classification and regression, our evaluation shows correspondence with SHAP as well as competitive prediction accuracy in comparison with XGBoost, Explainable Boosting Machines, and Neural Additive Models.
Artificial intelligence (AI) presents an opportunity to transform how we allocate credit and risk, and to create fairer, more inclusive systems. AI's ability to bypass the traditional credit reporting and scoring system that helps perpetuate existing bias makes it a rare, if not unique, opportunity to alter the status quo. However, AI can easily go in the other direction, exacerbating existing bias and creating cycles that reinforce biased credit allocation while making discrimination in lending even harder to detect. Will we unlock the positive, worsen the negative, or maintain the status quo by embracing new technology? This paper proposes a framework to evaluate the impact of AI in consumer lending. The goal is to incorporate new data and harness AI to expand credit to consumers who need it, on better terms than are currently provided. It builds on our existing system's dual goals of pricing financial services based on the true risk the individual consumer poses while aiming to prevent discrimination on protected attributes (e.g., race, gender, DNA, marital status).
Algorithmic bias is the systematic preferential or discriminatory treatment of a group of people by an artificial intelligence system. In this work we develop a random-effects based metric for the analysis of social bias in supervised machine learning prediction models where model outputs depend on U.S. locations. We define a methodology for using U.S. Census data to measure social bias on user attributes legally protected against discrimination, such as ethnicity, sex, and religion, also known as protected attributes. We evaluate our method on the Strategic Subject List (SSL) gun-violence prediction dataset, where we have access to both U.S. Census data and ground-truth protected attributes for 224,235 individuals in Chicago being assessed for participation in future gun-violence incidents. Our results indicate that quantifying social bias using U.S. Census data provides a valid approach to auditing a supervised algorithmic decision-making system. Using our methodology, we then quantify the potential social biases of 100 million ridehailing samples in the city of Chicago. This work is the first large-scale fairness analysis of the dynamic pricing algorithms used by ridehailing applications. An analysis of Chicago ridehailing samples in conjunction with American Community Survey data indicates possible disparate impact due to social bias based on age, house pricing, education, and ethnicity in the dynamic fare pricing models used by ridehailing applications, with effect sizes of 0.74, 0.70, 0.34, and -0.31 (using Cohen's d) for each demographic respectively. Further, our methodology provides a principled approach to quantifying algorithmic bias on datasets where protected attributes are unavailable, given that U.S. geolocations and algorithmic decisions are provided.
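The effect sizes reported above use Cohen's d, the standardized mean difference between two groups. A minimal sketch of the standard pooled-standard-deviation form (the abstract does not specify the exact variant used, so this is an assumption):

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d: difference in group means divided by the pooled
    standard deviation (sample variances, ddof=1)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)
```

By convention, |d| around 0.2 is a small effect, 0.5 medium, and 0.8 large, which is how values such as 0.74 and 0.70 above are read as substantial disparities.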
A "knowledge graph" of the COVID-19 disease's many "strains" created by startup Graphen.ai. Each dot is a strain or family of COVID-19, and the lines show how one strain descends from another. Everyone who has tried to figure something out has experienced the pleasure of seeing how things fit together -- connecting the dots, or following the money, as they say. One of the most fascinating technologies in vogue is a tool that can automate the process of making connections. Called a knowledge graph, it gathers up all the data trapped in various databases, emails, and digital repositories of all sorts, and draws conclusions about how they fit together.
The Dwight and Dian Diercks Computational Science Hall is now open at the Milwaukee School of Engineering campus. The new academic center for next-generation technologies is celebrating its grand opening at the Milwaukee School of Engineering. The hall, funded by a donation from Dwight Diercks, an MSOE regent and alumnus, and his wife, Dian, was just completed at 1025 N. Milwaukee St. -- in the center of the downtown campus. It will prepare students for such growing fields as artificial intelligence, deep learning, cybersecurity, robotics and cloud computing. Artificial intelligence involves computers simulating human behavior to perform tasks.
A robotic arm allows amputees to touch and feel objects again by using the power of thought to control it. The high-tech prosthetic, developed by the University of Utah, uses microwires implanted under the skin, which send signals to an external computer that tells the arm to move. The arm even has sensors that transmit signals back through the microwires, mimicking the sensation of the hand when it grabs something. This allows users to 'feel' objects being held, so the brain knows not to make the prosthetic hand squeeze too tightly. A fascinating video shows real estate agent Keven Walgamott, who lost his hand and part of his arm in an accident, able to pluck grapes and hold eggs without crushing them -- and even put on his wedding ring.
Rolnick, David, Donti, Priya L., Kaack, Lynn H., Kochanski, Kelly, Lacoste, Alexandre, Sankaran, Kris, Ross, Andrew Slavin, Milojevic-Dupont, Nikola, Jaques, Natasha, Waldman-Brown, Anna, Luccioni, Alexandra, Maharaj, Tegan, Sherwin, Evan D., Mukkavilli, S. Karthik, Kording, Konrad P., Gomes, Carla, Ng, Andrew Y., Hassabis, Demis, Platt, John C., Creutzig, Felix, Chayes, Jennifer, Bengio, Yoshua
Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. Here we describe how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the machine learning community to join the global effort against climate change.
Complex black-box predictive models may have high accuracy, but their opacity causes problems such as lack of trust, instability, and sensitivity to concept drift. On the other hand, interpretable models require more work related to feature engineering, which is very time consuming. Can we train interpretable and accurate models without time-consuming feature engineering? In this article, we show a method that uses elastic black-boxes as surrogate models to create simpler, less opaque, yet still accurate and interpretable glass-box models. The new models are created on newly engineered features extracted or learned with the help of a surrogate model. We show applications of this method for model-level explanations and possible extensions for instance-level explanations. We also present an example implementation in Python and benchmark this method on a number of tabular data sets.
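One way to picture surrogate-assisted feature extraction is the following minimal sketch. It is not the paper's implementation: it assumes scikit-learn, uses gradient boosting as the elastic black-box, binarizes each feature at the largest jump of its partial-dependence curve (a hypothetical choice of extraction rule), and fits an interpretable linear model on the transformed features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

def pd_curve(model, X, j, grid):
    """Partial-dependence curve of feature j: the average prediction
    as feature j is swept over `grid` with other features fixed."""
    Xc = X.copy()
    curve = []
    for v in grid:
        Xc[:, j] = v
        curve.append(model.predict(Xc).mean())
    return np.array(curve)

def surrogate_glass_box(X, y):
    """Fit a black-box surrogate, derive one binary feature per input
    feature from its partial-dependence curve, then fit a glass-box
    linear model on those engineered features."""
    surrogate = GradientBoostingRegressor(random_state=0).fit(X, y)
    X_new = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        grid = np.linspace(X[:, j].min(), X[:, j].max(), 50)
        curve = pd_curve(surrogate, X, j, grid)
        # cut where the surrogate's response changes the most
        cut = grid[np.argmax(np.abs(np.diff(curve)))]
        X_new[:, j] = (X[:, j] > cut).astype(float)
    glass_box = LinearRegression().fit(X_new, y)
    return glass_box, X_new
```

The resulting linear model over a handful of binary features can be read coefficient by coefficient, while the split points themselves are learned from the black-box rather than hand-engineered.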