MIT Taxonomy Helps Build Explainability Into the Components of Machine-Learning Models
Researchers develop tools to help data scientists make the features used in machine-learning models more understandable for end users.
Explanation methods that help users understand and trust machine-learning models often describe how much certain features used in the model contribute to its prediction. For example, if a model predicts a patient's risk of developing cardiac disease, a physician might want to know how strongly the patient's heart rate data influences that prediction. But if those features are so complex or convoluted that the user can't understand them, does the explanation method do any good? MIT researchers are striving to improve the interpretability of features so decision makers will be more comfortable using the outputs of machine-learning models. Drawing on years of field work, they developed a taxonomy to help developers craft features that will be easier for their target audience to understand.
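The kind of per-feature explanation described above can be illustrated with a toy linear model, where the prediction decomposes into one contribution per feature. Everything below (feature names, weights, values) is invented for illustration, not taken from any real risk model:

```python
# Toy linear risk model: each feature's contribution to one prediction
# is simply its coefficient times its value for this patient.
features = {"resting_heart_rate": 85.0, "age": 62.0, "cholesterol": 240.0}
weights = {"resting_heart_rate": 0.004, "age": 0.006, "cholesterol": 0.001}

contributions = {k: weights[k] * v for k, v in features.items()}
risk_score = sum(contributions.values())

# Rank features by how much they pushed the score up.
for name, c in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {c:+.3f}")
print(f"risk score: {risk_score:.3f}")
```

The taxonomy work addresses the step this sketch takes for granted: whether a feature like `resting_heart_rate` is itself meaningful to the physician reading the explanation.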
Veeramachaneni
We develop multivariate copulas for modeling multiple joint distributions of wind speeds at a wind farm site and a neighboring wind source. An n-dimensional Gaussian copula and multiple copula graphical models enhance the quality of the prediction-site distribution. Compared with multiple regression, the models achieve higher accuracy and lower cost because they require less sensing data.
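A rough sketch of the Gaussian-copula idea (not the authors' implementation; the toy data and marginal transform are assumptions): map each site's readings to standard normals via ranks, estimate the correlation in that latent space, then sample new latent draws and invert through the empirical quantiles:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy stand-in for wind-speed readings at 3 correlated sites (n x d).
latent = rng.multivariate_normal(
    [0, 0, 0], [[1, .8, .5], [.8, 1, .6], [.5, .6, 1]], 2000)
wind = 8 * np.exp(0.5 * latent)  # skewed, positive "speeds" (fake data)

# (1) ranks -> uniforms -> standard normals, per column
u = (stats.rankdata(wind, axis=0) - 0.5) / len(wind)
z = stats.norm.ppf(u)

# (2) copula correlation, estimated in the latent Gaussian space
corr = np.corrcoef(z, rowvar=False)

# (3) draw new latent samples and map back through empirical quantiles
z_new = rng.multivariate_normal(np.zeros(3), corr, 1000)
u_new = stats.norm.cdf(z_new)
synthetic = np.column_stack(
    [np.quantile(wind[:, j], u_new[:, j]) for j in range(3)])
```

The copula separates the dependence structure (step 2) from the marginals (steps 1 and 3), which is what lets the model borrow correlation information from neighboring sensors.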
Synthetic Data May Not Be AI's Privacy Silver Bullet - Liwaiwai
Synthetic datasets are becoming increasingly popular for training artificial intelligence models. Proponents of this computer-generated data say it protects personal information and reduces the chances of bias emerging in AI systems. But for many, concerns over privacy and accuracy remain.
One-stop machine learning platform turns health care data into insights
Over the past decade, hospitals and other health care providers have put massive amounts of time and energy into adopting electronic health records, turning hastily scribbled doctors' notes into durable sources of information. But collecting these data is less than half the battle. It can take even more time and effort to turn these records into actual insights -- ones that use the lessons of the past to inform future decisions. Cardea, a software system built by researchers and software engineers at MIT's Data to AI Lab (DAI Lab), is designed to help with that. By shepherding hospital data through an ever-growing set of machine learning models, the system could assist hospitals in planning for events as large as global pandemics and as small as no-show appointments.
A human-machine collaboration to defend against cyberattacks
Being a cybersecurity analyst at a large company today is a bit like looking for a needle in a haystack -- if that haystack were hurtling toward you at fiber optic speed. Every day, employees and customers generate loads of data that establish a normal set of behaviors. An attacker will also generate data while using any number of techniques to infiltrate the system; the goal is to find that "needle" and stop it before it does any damage. The data-heavy nature of that task lends itself well to the number-crunching prowess of machine learning, and an influx of AI-powered systems has indeed flooded the cybersecurity market over the years. But such systems can come with their own problems, namely a never-ending stream of false positives that can make them more of a time suck than a time saver for security analysts.
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (1.00)
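The human-machine loop described above can be sketched in miniature: an unsupervised detector surfaces its most anomalous events, an analyst confirms which are real, and that feedback tightens the alert threshold. The scoring rule, the feedback step, and the toy data here are all assumptions for illustration, not the actual system:

```python
import numpy as np

rng = np.random.default_rng(1)

# Baseline behavior: daily event counts per user; attacks are rare spikes.
normal = rng.normal(100, 10, 500)
attacks = rng.normal(180, 5, 5)
events = np.concatenate([normal, attacks])

def anomaly_scores(x):
    # Robust z-score: deviation from the median, scaled by the MAD.
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return np.abs(x - med) / mad

scores = anomaly_scores(events)

# Unsupervised pass: surface the top-k most anomalous events for review.
k = 20
candidates = np.argsort(scores)[-k:]

# Analyst feedback: say the analyst confirms only the genuine spikes.
confirmed = [i for i in candidates if events[i] > 150]

# Feedback step: raise the alert threshold to the lowest confirmed score,
# suppressing the unconfirmed candidates (the false positives).
threshold = min(scores[i] for i in confirmed)
alerts = np.where(scores >= threshold)[0]
```

The point of the loop is that a handful of analyst labels per day is enough to keep re-calibrating what counts as alert-worthy, instead of drowning the analyst in every statistical outlier.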
Cracking open the black box of automated machine learning
Researchers from MIT and elsewhere have developed an interactive tool that, for the first time, lets users see and control how automated machine-learning systems work. The aim is to build confidence in these systems and find ways to improve them. Designing a machine-learning model for a certain task -- such as image classification, disease diagnosis, or stock market prediction -- is an arduous, time-consuming process. Experts first choose from among many different algorithms to build the model around. Then, they manually tweak "hyperparameters" -- which determine the model's overall structure -- before the model starts training.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)
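The manual workflow the article describes (pick an algorithm, then tune its hyperparameters before training) is exactly what automated machine-learning systems search over. A minimal sketch of that search, with an invented validation objective standing in for real model training:

```python
import itertools

# Toy stand-in for validation error as a function of two hyperparameters.
# In a real system this would mean training and evaluating a model.
def validation_error(learning_rate, depth):
    return (learning_rate - 0.1) ** 2 + 0.01 * abs(depth - 6)

grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "depth": [2, 6, 10],
}

# Exhaustive grid search: evaluate every hyperparameter combination
# and keep the one with the lowest validation error.
best = min(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=lambda params: validation_error(**params),
)
print(best)  # {'learning_rate': 0.1, 'depth': 6}
```

Real AutoML systems replace this exhaustive loop with smarter search strategies (random search, Bayesian optimization) because each evaluation is an expensive training run; the interactive tool described here is about letting users see and steer that search.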
Artificial data give the same results as real data -- without compromising privacy
Although data scientists can gain great insights from large data sets -- and can ultimately use these insights to tackle major challenges -- accomplishing this is much easier said than done. Many such efforts are stymied from the outset, as privacy concerns make it difficult for scientists to access the data they would like to work with. In a paper presented at the IEEE International Conference on Data Science and Advanced Analytics, Kalyan Veeramachaneni, principal research scientist at the MIT Laboratory for Information and Decision Systems (LIDS) and the Institute for Data, Systems, and Society (IDSS), and co-authors Neha Patki and Roy Wedge, all members of LIDS's Data to AI Lab, describe a machine learning system that automatically creates synthetic data, with the goal of enabling data science efforts that, for lack of access to real data, might otherwise never get off the ground. While the use of authentic data can raise significant privacy concerns, this synthetic data is completely different from any produced by real users, yet it can still be used to develop and test data science algorithms and models. "Once we model an entire database, we can sample and recreate a synthetic version of the data that very much looks like the original database, statistically speaking," says Veeramachaneni.
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.37)
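Veeramachaneni's quote suggests a simple mental model: fit a statistical description of a table, then sample fresh rows from it. The sketch below does this in the crudest possible way (mean and covariance of a toy numeric table; the real system models far richer database structure) just to show that summary statistics carry over while no synthetic row is a real record:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "real" table: two correlated numeric columns (e.g. age, income).
real = rng.multivariate_normal(
    [40, 60_000], [[100, 30_000], [30_000, 4e8]], 5000)

# Model the table: here, just its mean vector and covariance matrix.
mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)

# Sample a synthetic table from the fitted model. No row corresponds to
# a real record, but the statistics match, "statistically speaking."
synthetic = rng.multivariate_normal(mean, cov, 5000)

print(np.corrcoef(real, rowvar=False)[0, 1].round(2),
      np.corrcoef(synthetic, rowvar=False)[0, 1].round(2))
```

Downstream code (queries, feature pipelines, model training) can then be developed against the synthetic table and only run on the real one once access is granted.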
ML 2.0: Machine learning for many
Today, when an enterprise wants to use machine learning to solve a problem, it has to call in the cavalry. Even a simple problem requires multiple data scientists, machine learning experts, and domain experts to come together to agree on priorities and exchange data and information. This process is often inefficient, and it takes months to get results. It also solves only the problem immediately at hand: the next time something comes up, the enterprise has to do the same thing all over again.