olivetti
Equality Graph Assisted Symbolic Regression
de Franca, Fabricio Olivetti, Kronberger, Gabriel
In Symbolic Regression (SR), Genetic Programming (GP) is a popular search algorithm that delivers state-of-the-art results in terms of accuracy. Its success relies on the concept of neutrality, which induces large plateaus that the search can safely traverse to reach more promising regions. Navigating these plateaus, while necessary, requires the computation of redundant expressions, up to 60% of the total number of evaluations, as noted in a recent study. The equality graph (e-graph) structure can compactly store and group equivalent expressions, which lets us verify whether a given expression and its variations were already visited by the search and thus avoid unnecessary computation. We propose a new search algorithm for symbolic regression called SymRegg that revolves around the e-graph structure and follows simple steps: perturb solutions sampled from a selection of expressions stored in the e-graph and, if the perturbation produces an unvisited expression, insert it into the e-graph and generate its equivalent forms. We show that SymRegg is capable of improving the efficiency of the search, maintaining consistently accurate results across different datasets while requiring only a minimalist set of hyperparameters.
- Europe > Austria > Upper Austria (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- South America > Brazil (0.04)
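The loop described in the abstract above (sample a stored expression, perturb it, keep the result only if it is unvisited) can be illustrated with a minimal sketch. This is not SymRegg and it does not use an e-graph: as a loudly labeled stand-in, equivalence is approximated by a canonical form that only sorts the operands of commutative operators, and visited expressions live in a plain set. The names `canonical`, `perturb`, and `search`, the tuple encoding of expressions, and the perturbation rule are all assumptions made for illustration.

```python
import random

# Assumption: expressions are nested tuples (op, left, right) or leaf values.
COMMUTATIVE = {"+", "*"}


def canonical(expr):
    """Return a canonical form so commutative variants compare equal.

    Stand-in for e-graph equivalence: only operand order of + and * is
    normalized; a real e-graph covers far more rewrite rules.
    """
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [canonical(a) for a in args]
    if op in COMMUTATIVE:
        args.sort(key=repr)
    return (op, *args)


def perturb(expr, rng):
    """Hypothetical perturbation: combine expr with a random terminal."""
    op = rng.choice(["+", "*", "-"])
    leaf = rng.choice(["x", "y", 1.0])
    return (op, expr, leaf) if rng.random() < 0.5 else (op, leaf, expr)


def search(steps, seed=0):
    """Run the sample/perturb/insert loop and count skipped revisits."""
    rng = random.Random(seed)
    visited = {"x", "y"}  # initial stored expressions
    evaluations = 0
    skipped = 0
    for _ in range(steps):
        # Sort for a deterministic sampling order across runs.
        parent = rng.choice(sorted(visited, key=repr))
        child = canonical(perturb(parent, rng))
        if child in visited:
            skipped += 1  # already seen in an equivalent form: no evaluation
            continue
        visited.add(child)
        evaluations += 1  # a fitness evaluation would happen here
    return visited, evaluations, skipped
```

Calling `search(50)` returns the visited set together with the number of evaluations performed and the number of revisits avoided; the two counters always sum to the number of steps.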
Nonnegative matrix factorization and the principle of the common cause
Khalafyan, E., Allahverdyan, A. E., Hovhannisyan, A.
Nonnegative matrix factorization (NMF) is a known unsupervised data-reduction method. The principle of the common cause (PCC) is a basic methodological approach in probabilistic causality, which seeks an independent mixture model for the joint probability of two dependent random variables. It turns out that these two concepts are closely related. This relationship is explored reciprocally for several datasets of gray-scale images, which are conveniently mapped into probability models. On one hand, PCC provides a predictability tool that leads to a robust estimation of the effective rank of NMF. Unlike other estimates (e.g., those based on the Bayesian information criterion), our estimate of the rank is stable against weak noise. We show that NMF implemented around this rank produces features (basis images) that are also stable against noise and against seeds of local optimization, thereby effectively resolving the NMF nonidentifiability problem. On the other hand, NMF provides an interesting possibility of implementing PCC in an approximate way, where larger and positively correlated joint probabilities tend to be explained better via the independent mixture model. We work out a clustering method, where data points with the same common cause are grouped into the same cluster. We also show how NMF can be employed for data denoising. Nonnegative matrix factorization (NMF) was proposed and developed in data science [1]-[3].
- Asia > Armenia (0.04)
- North America > United States > California (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)
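The link between NMF and the independent mixture model mentioned in the abstract above can be written out as a short worked identity. This is a sketch in assumed notation (the symbols $P$, $W$, $H$, $r$, and $c$ are not taken from the paper, whose conventions may differ): a normalized nonnegative data matrix is read as a joint probability, and a rank-$r$ nonnegative factorization is rescaled into a mixture over a hidden cause $c$.

```latex
% Assumed notation: P_{xy} = p(x,y) with \sum_{x,y} P_{xy} = 1, and W, H \ge 0.
\begin{align}
  p(x,y) &\approx \sum_{c=1}^{r} W_{xc}\, H_{cy}, \\
  p(c) &= \Big(\sum_{x'} W_{x'c}\Big)\Big(\sum_{y'} H_{cy'}\Big), \qquad
  p(x \mid c) = \frac{W_{xc}}{\sum_{x'} W_{x'c}}, \qquad
  p(y \mid c) = \frac{H_{cy}}{\sum_{y'} H_{cy'}}, \\
  \sum_{c=1}^{r} p(c)\, p(x \mid c)\, p(y \mid c) &= \sum_{c=1}^{r} W_{xc}\, H_{cy}.
\end{align}
```

With this rescaling, $x$ and $y$ are conditionally independent given $c$, and $\sum_c p(c) = 1$ whenever the factorization reproduces a normalized $P$ exactly; for an approximate factorization the weights $p(c)$ would need renormalizing.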
Explained: Generative AI's environmental impact
In a two-part series, MIT News explores the environmental implications of generative AI. In this article, we look at why this technology is so resource-intensive. A second piece will investigate what experts are doing to reduce genAI's carbon footprint and other impacts. The excitement surrounding potential benefits of generative AI, from improving worker productivity to advancing scientific research, is hard to ignore. While the explosive growth of this new technology has enabled rapid deployment of powerful models in many industries, the environmental consequences of this generative AI "gold rush" remain difficult to pin down, let alone mitigate.
- North America > United States > Pennsylvania (0.05)
- North America > United States > California (0.05)
- Europe > Russia (0.05)
Improving Genetic Programming for Symbolic Regression with Equality Graphs
de Franca, Fabricio Olivetti, Kronberger, Gabriel
The search for symbolic regression models with genetic programming (GP) tends to revisit expressions in their original or equivalent forms. Repeatedly evaluating equivalent expressions is inefficient, as it does not immediately lead to better solutions. However, evolutionary algorithms require diversity and should allow the accumulation of inactive building blocks that can play an important role at a later point. The equality graph is a data structure capable of compactly storing expressions and their equivalent forms, allowing an efficient check of whether an expression has been visited in any of its stored equivalent forms. We exploit the e-graph to adapt the subtree operators to reduce the chances of revisiting expressions. Our adaptation, called eggp, stores every visited expression in the e-graph, allowing us to filter out from the available selection of subtrees all the combinations that would create already visited expressions. Results show that, for small expressions, this approach improves the performance of a simple GP algorithm to compete with PySR and Operon without increasing computational cost. As a highlight, eggp was capable of reliably delivering short and at the same time accurate models for a selected set of benchmarks from SRBench and a set of real-world datasets.
- North America > United States > District of Columbia > Washington (0.05)
- Europe > Austria > Upper Austria (0.04)
- South America > Brazil > São Paulo (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > New Finding (0.66)
rEGGression: an Interactive and Agnostic Tool for the Exploration of Symbolic Regression Models
de Franca, Fabricio Olivetti, Kronberger, Gabriel
Regression analysis is used for prediction and to understand the effect of independent variables on dependent variables. Symbolic regression (SR) automates the search for non-linear regression models, delivering a set of hypotheses that balances accuracy with the possibility of understanding the phenomena. Many SR implementations return a Pareto front, allowing the choice of the best trade-off. However, this hides alternatives that are close to non-domination, limiting these choices. Equality graphs (e-graphs) can represent large sets of expressions compactly by efficiently handling duplicated parts occurring in multiple expressions. E-graphs make it possible to store and query all SR solution candidates visited in one or multiple GP runs efficiently, and open the possibility of analysing much larger sets of SR solution candidates. We introduce rEGGression, a tool that uses e-graphs to enable the exploration of a large set of symbolic expressions. It provides querying, filtering, and pattern matching features, creating an interactive experience for gaining insights about SR models. The main highlight is its focus on the exploration of the building blocks found during the search, which can help experts find insights about the studied phenomena. This is possible by exploiting the pattern matching capability of the e-graph data structure.
- North America > United States > District of Columbia > Washington (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Austria > Upper Austria (0.04)
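The idea of querying stored models for recurring building blocks, as described in the abstract above, can be illustrated with a small pattern matcher. This is a hedged sketch only: it is not rEGGression's interface, and it matches patterns against plain expression trees rather than an e-graph, so it will not find matches that exist only in an equivalent form. The tuple encoding, the `?name` convention for pattern variables, and the function names `match`, `subterms`, and `find_building_blocks` are assumptions made for illustration.

```python
def match(pattern, expr, bindings=None):
    """Match expr against pattern; strings starting with '?' are variables.

    Returns a dict of variable bindings on success, or None on failure.
    """
    bindings = dict(bindings or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            # A repeated variable must bind to the same subexpression.
            return bindings if bindings[pattern] == expr else None
        bindings[pattern] = expr
        return bindings
    if isinstance(pattern, tuple) and isinstance(expr, tuple):
        if len(pattern) != len(expr) or pattern[0] != expr[0]:
            return None
        for p, e in zip(pattern[1:], expr[1:]):
            bindings = match(p, e, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == expr else None


def subterms(expr):
    """Yield expr and every subexpression nested inside it."""
    yield expr
    if isinstance(expr, tuple):
        for child in expr[1:]:
            yield from subterms(child)


def find_building_blocks(pattern, expressions):
    """Return (expression, bindings) for every subterm matching pattern."""
    hits = []
    for expr in expressions:
        for sub in subterms(expr):
            found = match(pattern, sub)
            if found is not None:
                hits.append((expr, found))
    return hits
```

For example, searching the models `("+", ("*", "x", "x"), "y")` and `("*", "x", "y")` for the pattern `("*", "?a", "?a")` reports only the first model, with `?a` bound to `"x"`.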
Q&A: More-sustainable concrete with machine learning
The use of concrete dates back to early civilizations, and today it is the most popular composite choice in the world. Production of its key ingredient, cement, contributes 8-9 percent of the global anthropogenic CO2 emissions and 2-3 percent of energy consumption, which is only projected to increase in the coming years. With aging United States infrastructure, the federal government recently passed a milestone bill to revitalize and upgrade it, along with a push to reduce greenhouse gas emissions where possible, putting concrete in the crosshairs for modernization, too. Elsa Olivetti, the Esther and Harold E. Edgerton Associate Professor in the MIT Department of Materials Science and Engineering, and Jie Chen, MIT-IBM Watson AI Lab research scientist and manager, think artificial intelligence can help meet this need by designing and formulating new, more sustainable concrete mixtures, with lower costs and carbon dioxide emissions, while improving material performance and reusing manufacturing byproducts in the material itself. Olivetti's research improves environmental and economic sustainability of materials, and Chen develops and optimizes machine learning and computational techniques, which he can apply to materials reformulation.
- Energy (1.00)
- Materials > Construction Materials (0.51)
Reasoning about Typicality and Probabilities in Preferential Description Logics
Giordano, Laura, Gliozzi, Valentina, Lieto, Antonio, Olivetti, Nicola, Pozzato, Gian Luca
In this work we describe preferential Description Logics of typicality, a nonmonotonic extension of standard Description Logics by means of a typicality operator T, which allows a knowledge base to be extended with inclusions of the form T(C) ⊑ D, whose intuitive meaning is that "normally/typically Cs are also Ds". This extension is based on a minimal model semantics corresponding to a notion of rational closure, built upon preferential models. We recall the basic concepts underlying preferential Description Logics. We also present two extensions of the preferential semantics: on the one hand, we consider probabilistic extensions, based on a distributed semantics that is suitable for tackling the problem of commonsense concept combination; on the other hand, we consider other strengthenings of the rational closure semantics and construction to avoid the so-called "blocking of property inheritance problem".
- Europe > Italy > Piedmont > Turin Province > Turin (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Artificial intelligence aids materials fabrication
In recent years, research efforts such as the Materials Genome Initiative and the Materials Project have produced a wealth of computational tools for designing new materials useful for a range of applications, from energy and electronics to aeronautics and civil engineering. But developing processes for producing those materials has continued to depend on a combination of experience, intuition, and manual literature reviews. A team of researchers at MIT, the University of Massachusetts at Amherst, and the University of California at Berkeley hopes to close that materials-science automation gap with a new artificial-intelligence system that would pore through research papers to deduce "recipes" for producing particular materials. "Computational materials scientists have made a lot of progress in the 'what' to make -- what material to design based on desired properties," says Elsa Olivetti, the Atlantic Richfield Assistant Professor of Energy Studies in MIT's Department of Materials Science and Engineering (DMSE). "But because of that success, the bottleneck has shifted to, 'Okay, now how do I make it?'"
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)
- North America > United States > California (0.26)
- North America > Canada (0.05)