 mdi


Axiomatizing Neural Networks via Pursuit of Subspaces

arXiv.org Machine Learning

While deep neural networks have achieved remarkable success across a wide range of domains, their underlying mechanisms remain poorly understood, and they are often regarded as black boxes. This gap between empirical performance and theoretical understanding poses a challenge analogous to the pre-axiomatic stage of classical geometry. In this work, we introduce the Pursuit of Subspaces (PoS) hypothesis, an axiomatic framework that formulates neural network behavior through a set of geometric postulates. These axioms, together with their derived consequences, provide a unified perspective on representation, computation, and generalization in both shallow and deep architectures. We show that this framework yields geometric explanations for fundamental questions in deep learning, including representation structure, architectural mechanisms, and generalization behavior, offering a principled step toward a coherent theoretical foundation.


Dynamic Features Adaptation in Networking: Toward Flexible training and Explainable inference

arXiv.org Artificial Intelligence

As AI becomes a native component of 6G network control, AI models must adapt to continuously changing conditions, including the introduction of new features and measurements driven by multi-vendor deployments, hardware upgrades, and evolving service requirements. To address this growing need for flexible learning in non-stationary environments, this vision paper highlights Adaptive Random Forests (ARFs) as a reliable solution for dynamic feature adaptation in communication network scenarios. We show that iterative training of ARFs can effectively lead to stable predictions, with accuracy improving over time as more features are added. In addition, we highlight the importance of explainability in AI-driven networks, proposing Drift-Aware Feature Importance (DAFI) as an efficient XAI feature importance (FI) method. DAFI uses a distributional drift detector to signal when to apply computationally intensive FI methods instead of lighter alternatives. Our tests on 3 different datasets indicate that our approach reduces runtime by up to 2 times, while producing more consistent feature importance values. Together, ARFs and DAFI provide a promising framework to build flexible AI methods adapted to 6G network use-cases.
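The gating idea described in this abstract (run a costly feature importance method only when a drift detector fires, otherwise use a lighter one) can be sketched as follows. This is a minimal illustration, not the authors' DAFI implementation: the two-sample Kolmogorov-Smirnov statistic as the drift detector, the `threshold` value, and the names `ks_statistic` and `choose_fi_method` are all assumptions made for the example.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (assumed drift detector)."""
    ref_sorted, cur_sorted = sorted(reference), sorted(current)
    points = sorted(set(reference) | set(current))
    return max(
        abs(bisect_right(ref_sorted, p) / len(ref_sorted)
            - bisect_right(cur_sorted, p) / len(cur_sorted))
        for p in points
    )


def choose_fi_method(reference, current, threshold=0.5):
    """Return which (hypothetical) feature importance routine to run.

    'expensive' when the feature distribution appears to have drifted,
    'cheap' otherwise. The threshold is an illustrative assumption.
    """
    drifted = ks_statistic(reference, current) > threshold
    return "expensive" if drifted else "cheap"


reference_window = [0.1, 0.2, 0.3, 0.4]
stable_window = [0.15, 0.25, 0.35, 0.45]
shifted_window = [5.1, 5.2, 5.3, 5.4]

print(choose_fi_method(reference_window, stable_window))
print(choose_fi_method(reference_window, shifted_window))
```

In a streaming setting the same check would be repeated per window, so the costly method is only paid for when the input distribution changes.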



MDI+: A Flexible Random Forest-Based Feature Importance Framework

arXiv.org Artificial Intelligence

Mean decrease in impurity (MDI) is a popular feature importance measure for random forests (RFs). We show that the MDI for a feature $X_k$ in each tree in an RF is equivalent to the unnormalized $R^2$ value in a linear regression of the response on the collection of decision stumps that split on $X_k$. We use this interpretation to propose a flexible feature importance framework called MDI+. Specifically, MDI+ generalizes MDI by allowing the analyst to replace the linear regression model and $R^2$ metric with regularized generalized linear models (GLMs) and metrics better suited for the given data structure. Moreover, MDI+ incorporates additional features to mitigate known biases of decision trees against additive or smooth models. We further provide guidance on how practitioners can choose an appropriate GLM and metric based upon the Predictability, Computability, Stability framework for veridical data science. Extensive data-inspired simulations show that MDI+ significantly outperforms popular feature importance measures in identifying signal features. We also apply MDI+ to two real-world case studies on drug response prediction and breast cancer subtype classification. We show that MDI+ extracts well-established predictive genes with significantly greater stability compared to existing feature importance measures. All code and models are released in a full-fledged Python package on GitHub.
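The link between impurity decrease and regression on decision stumps can be checked by hand in the simplest case, a single split at the root of a regression tree. The sketch below is not the MDI+ package; the toy data, the threshold, and the function names are assumptions made for illustration. It compares the variance reduction of the split with the explained sum of squares (divided by the sample size) from a least-squares fit of the response on the stump indicator, whose fitted value on each side of the split is that side's mean.

```python
from statistics import fmean


def variance(values):
    """Population variance, the impurity used by regression trees."""
    mu = fmean(values)
    return fmean([(v - mu) ** 2 for v in values])


def split_by_threshold(x, y, threshold):
    """Partition the responses by which side of the stump they fall on."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return left, right


def stump_impurity_decrease(x, y, threshold):
    """Variance of the node minus the size-weighted variance of its children."""
    left, right = split_by_threshold(x, y, threshold)
    n = len(y)
    return (variance(y)
            - len(left) / n * variance(left)
            - len(right) / n * variance(right))


def stump_explained_variance(x, y, threshold):
    """Explained sum of squares / n for least squares on intercept + stump.

    The least-squares fit on a binary indicator predicts each side's mean.
    """
    left, right = split_by_threshold(x, y, threshold)
    overall = fmean(y)
    fitted = [fmean(left) if xi <= threshold else fmean(right) for xi in x]
    return fmean([(f - overall) ** 2 for f in fitted])


# Toy data (assumed for illustration only).
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 2.0, 1.5, 5.0, 6.0, 5.5]

decrease = stump_impurity_decrease(x, y, threshold=3.5)
explained = stump_explained_variance(x, y, threshold=3.5)
print(decrease, explained)
```

The two quantities agree because the total variance splits into within-child and between-child parts. The MDI of a feature in a full tree sums such decreases, weighted by node size, over every node that splits on that feature.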


Interpreting Deep Forest through Feature Contribution and MDI Feature Importance

arXiv.org Artificial Intelligence

Deep forest is a non-differentiable deep model which has achieved impressive empirical success across a wide variety of applications, especially on categorical/symbolic or mixed modeling tasks. Many of the application fields prefer explainable models, such as random forests with feature contributions that can provide local explanation for each prediction, and Mean Decrease Impurity (MDI) that can provide global feature importance. However, deep forest, as a cascade of random forests, possesses interpretability only at the first layer. From the second layer on, many of the tree splits occur on the new features generated by the previous layer, which makes existing explanatory tools for random forests inapplicable. To disclose the impact of the original features in the deep layers, we design a calculation method with an estimation step followed by a calibration step for each layer, and propose our feature contribution and MDI feature importance calculation tools for deep forest. Experimental results on both simulated data and real world data verify the effectiveness of our methods.
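For context, the single-tree feature contributions that the abstract takes as its starting point can be sketched as below: a prediction is decomposed into the root value plus, for each split on the path, the change in node value attributed to the split feature. This illustrates only the ordinary decision-tree case, not the paper's estimation and calibration steps for deep forest; the hand-built tree, its node values, and the feature names are hypothetical.

```python
def feature_contributions(tree, sample):
    """Decompose a tree prediction into a bias and per-feature contributions.

    Each internal node is a dict with 'value', 'feature', 'threshold',
    'left', and 'right'; each leaf is a dict with only 'value'.
    """
    node = tree
    bias = node["value"]
    contributions = {}
    while "feature" in node:
        feature = node["feature"]
        go_left = sample[feature] <= node["threshold"]
        child = node["left"] if go_left else node["right"]
        # Credit the change in node value to the feature used for the split.
        delta = child["value"] - node["value"]
        contributions[feature] = contributions.get(feature, 0.0) + delta
        node = child
    return bias, contributions, node["value"]


# Hypothetical regression tree with made-up node values.
toy_tree = {
    "value": 10.0, "feature": "age", "threshold": 30,
    "left": {"value": 6.0},
    "right": {
        "value": 14.0, "feature": "income", "threshold": 50,
        "left": {"value": 12.0},
        "right": {"value": 18.0},
    },
}

bias, contribs, prediction = feature_contributions(
    toy_tree, {"age": 40, "income": 70}
)
print(bias, contribs, prediction)
```

By construction the bias plus the contributions equals the leaf prediction. In a cascade, later layers split on features generated by earlier layers, so this direct attribution no longer maps onto the original inputs, which is the gap the abstract describes.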


New York's Landmark AI Bias Law Prompts Uncertainty - Michael Dukakis Institute for Leadership and Innovation (MDI)

#artificialintelligence

Companies that use AI in hiring are trying to determine how to comply with a New York law that mandates they test their systems for potential biases. Businesses and their service providers are grappling with how to comply with New York City's mandate for audits of artificial intelligence systems used in hiring. A New York City law that comes into effect in January will require companies to conduct audits to assess biases, including along race and gender lines, in the AI systems they use in hiring. Under New York's law, the hiring company is ultimately liable--and can face fines--for violations. But the requirement has posed some compliance challenges.


AI Knows if You Are Guilty of Greenwashing - Michael Dukakis Institute for Leadership and Innovation (MDI)

#artificialintelligence

Talk is cheap when companies talk up their Environmental, Social, and Governance (ESG) credentials. But artificial intelligence and natural language processing can help identify those who are more serious about it than others. U.S.-based fund manager Acadian Asset Management has developed a tool that uses artificial intelligence and machine learning to rank corporates by the seriousness of their intent. In the investment world, many fund managers base their decisions on which companies to invest in on how much they disclose about their activities. But this is only part of the story, says Acadian's director of responsible investing, Andy Moniz.


Six Steps to Responsible AI in the Federal Government - Michael Dukakis Institute for Leadership and Innovation (MDI)

#artificialintelligence

There is widespread agreement that responsible artificial intelligence requires principles such as fairness, transparency, privacy, human safety, and explainability. Nearly all ethicists and tech policy advocates stress these factors and push for algorithms that are fair, transparent, safe, and understandable. But it is not always clear how to operationalize these broad principles or how to handle situations where there are conflicts between competing goals. It is not easy to move from the abstract to the concrete in developing algorithms and sometimes a focus on one goal comes at the detriment of alternative objectives. In this paper, I discuss ways to operationalize responsible AI in the federal government.


Futuristic Technology At The Olympics: AI, IoT, And Robots - Michael Dukakis Institute for Leadership and Innovation (MDI)

#artificialintelligence

The Olympics are all about emotion – the drama of world-class competition, the pageantry of medal ceremonies, and the moment-to-moment celebrations of the human spirit in action. The 2022 Winter Games kicked off on February 4th in Beijing, China. Although the Games feel a little different because of COVID restrictions, nearly 3,000 athletes from 91 countries are competing in 109 events across sports like alpine skiing, figure skating, ice hockey, luge, bobsled, snowboarding, and speed skating. And behind the scenes, there are huge technological advances helping athletes become better, faster, and stronger. Let's take a look at how artificial intelligence, the IoT, and intelligent devices are being used at the Olympic Games.


A Machine-Learning-Ready Dataset Prepared from the Solar and Heliospheric Observatory Mission

arXiv.org Machine Learning

We present a Python tool to generate a standard dataset from solar images that allows for user-defined selection criteria and a range of pre-processing steps. Our Python tool works with all image products from both the Solar and Heliospheric Observatory (SoHO) and Solar Dynamics Observatory (SDO) missions. We discuss a dataset produced from the SoHO mission's multi-spectral images which is free of missing or corrupt data as well as planetary transits in coronagraph images, and is temporally synced, making it ready for input to a machine learning system. Machine-learning-ready images are a valuable resource for the community because they can be used, for example, for forecasting space weather parameters. We illustrate the use of this data with a 3-5 day-ahead forecast of the north-south component of the interplanetary magnetic field (IMF) observed at Lagrange point one (L1). For this use case, we apply a deep convolutional neural network (CNN) to a subset of the full SoHO dataset and compare with baseline results from a Gaussian Naive Bayes classifier.