Serra, Thiago
Computational Tradeoffs of Optimization-Based Bound Tightening in ReLU Networks
Badilla, Fabian, Goycoolea, Marcos, Muñoz, Gonzalo, Serra, Thiago
The use of Mixed-Integer Linear Programming (MILP) models to represent neural networks with Rectified Linear Unit (ReLU) activations has become increasingly widespread in the last decade. This has enabled the use of MILP technology to test, or stress, their behavior, to adversarially improve their training, and to embed them in optimization models leveraging their predictive power. Many of these MILP models rely on activation bounds, that is, bounds on the input values of each neuron. In this work, we explore the tradeoff between the tightness of these bounds and the computational effort of solving the resulting MILP models. We provide guidelines for implementing these models based on the impact of network structure, regularization, and rounding.
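For context, a common way such MILP models encode a single ReLU neuron $y = \max(0, w^\top x + b)$ uses a binary variable $z$ together with activation bounds $L \le w^\top x + b \le U$ on the neuron's input (the notation here is generic rather than taken from the paper):

$$ y \ge w^\top x + b, \qquad y \le w^\top x + b - L(1 - z), \qquad y \le U z, \qquad y \ge 0, \qquad z \in \{0, 1\}. $$

Tighter values of $L$ and $U$ strengthen the linear relaxation of this formulation, but computing them more accurately, for instance by solving an optimization problem per neuron, has a cost of its own; that is the tradeoff studied here.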
Optimization Over Trained Neural Networks: Taking a Relaxing Walk
Tong, Jiatai, Cai, Junyang, Serra, Thiago
Besides training, mathematical optimization is also used in deep learning to model and solve formulations over trained neural networks for purposes such as verification, compression, and optimization with learned constraints. However, solving these formulations soon becomes difficult as the network size grows due to weak linear relaxations and dense constraint matrices. We have seen improvements in recent years with cutting plane algorithms, reformulations, and a heuristic based on Mixed-Integer Linear Programming (MILP). In this work, we propose a more scalable heuristic based on exploring global and local linear relaxations of the neural network model. Our heuristic is competitive with a state-of-the-art MILP solver and the prior heuristic while producing better solutions as the input dimension, depth, and number of neurons increase.
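As a rough illustration of the local-relaxation idea (a minimal sketch under my own simplifications, not the authors' algorithm): within a fixed activation pattern a ReLU network is affine, so maximizing its output over the corresponding region is a linear program, and a walk can hop from the optimum of one such LP into the region containing it. The one-hidden-layer network, the box domain, and all names below are illustrative assumptions.

import numpy as np
from scipy.optimize import linprog

# Hypothetical one-hidden-layer ReLU network: output = w2 @ relu(W1 @ x + b1) + b2.
rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8
W1 = rng.normal(size=(n_hidden, n_in))
b1 = rng.normal(size=n_hidden)
w2 = rng.normal(size=n_hidden)
b2 = 0.0
lo, hi = -1.0, 1.0  # box bounds on the input

def local_lp(pattern):
    # Within the region where `pattern` holds, the output is linear in x:
    # sum_i pattern_i * w2_i * (W1_i @ x + b1_i) + b2. linprog minimizes,
    # so the objective is negated to maximize the output.
    c = -(W1.T @ (w2 * pattern))
    # Region constraints: W1_i @ x + b1_i >= 0 if unit i is active, <= 0 otherwise.
    A_ub = np.where(pattern[:, None], -W1, W1)
    b_ub = np.where(pattern, b1, -b1)
    return linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(lo, hi)] * n_in)

x = rng.uniform(lo, hi, size=n_in)  # random starting point
for _ in range(20):  # walk across neighboring linear regions
    res = local_lp(W1 @ x + b1 > 0)
    if not res.success or np.allclose(res.x, x):
        break
    x = res.x
print("local solution:", x, "output:", w2 @ np.maximum(W1 @ x + b1, 0.0) + b2)

The heuristic in the paper also exploits global linear relaxations and handles deeper networks; this snippet only conveys the local-region LP step.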
When Deep Learning Meets Polyhedral Theory: A Survey
Huchette, Joey, Muñoz, Gonzalo, Serra, Thiago, Tsay, Calvin
In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks in tasks such as computer vision and natural language processing. Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions such as the Rectified Linear Unit (ReLU), which became the most commonly used type of activation function in neural networks. That made certain types of network structure, such as the typical fully-connected feedforward neural network, amenable to analysis through polyhedral theory and to the application of methodologies such as Linear Programming (LP) and Mixed-Integer Linear Programming (MILP) for a variety of purposes. In this paper, we survey the main topics emerging from this fast-paced area of work, which bring a fresh perspective to understanding neural networks in more detail as well as to applying linear optimization techniques to train, verify, and reduce the size of such networks.
Getting Away with More Network Pruning: From Sparsity to Geometry and Linear Regions
Cai, Junyang, Nguyen, Khai-Nguyen, Shrestha, Nishant, Good, Aidan, Tu, Ruisen, Yu, Xin, Zhe, Shandian, Serra, Thiago
One surprising trait of neural networks is the extent to which their connections can be pruned with little to no effect on accuracy. But once we cross a critical level of parameter sparsity, pruning any further leads to a sudden drop in accuracy. This drop plausibly reflects a loss in model complexity, which we aim to avoid. In this work, we explore how sparsity also affects the geometry of the linear regions defined by a neural network, and consequently reduces the expected maximum number of linear regions attainable by the architecture. We observe that pruning affects accuracy similarly to how sparsity affects the number of linear regions and our proposed bound on their maximum number. Conversely, we find that selecting the sparsity of each layer to maximize our bound very often improves accuracy over pruning the same overall amount with uniform sparsity across all layers, thereby providing guidance on where to prune.
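To make the notion of distributing sparsity across layers concrete, here is a minimal magnitude-pruning sketch contrasting uniform per-layer sparsity with a global threshold that lets per-layer sparsity vary (the paper instead selects per-layer sparsity to maximize its linear-region bound, which is not implemented here; the function names are mine):

import numpy as np

def uniform_layer_pruning(weights, sparsity):
    """Zero out the smallest-magnitude weights within each layer,
    using the same sparsity level for every layer."""
    pruned = []
    for W in weights:
        k = int(sparsity * W.size)
        thresh = np.sort(np.abs(W), axis=None)[k] if k > 0 else -np.inf
        pruned.append(np.where(np.abs(W) < thresh, 0.0, W))
    return pruned

def global_pruning(weights, sparsity):
    """Zero out the smallest-magnitude weights across all layers at once,
    which lets the resulting sparsity differ from layer to layer."""
    all_mags = np.concatenate([np.abs(W).ravel() for W in weights])
    k = int(sparsity * all_mags.size)
    thresh = np.sort(all_mags)[k] if k > 0 else -np.inf
    return [np.where(np.abs(W) < thresh, 0.0, W) for W in weights]

Both variants remove roughly the same total number of weights; they differ only in how the budget is spread across layers, which is the comparison the abstract refers to.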
Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm
Good, Aidan, Lin, Jiaqi, Sieg, Hannah, Ferguson, Mikey, Yu, Xin, Zhe, Shandian, Wieczorek, Jerzy, Serra, Thiago
Pruning techniques have been successfully used in neural networks to trade accuracy for sparsity. However, the impact of network pruning is not uniform: prior work has shown that the recall for underrepresented classes in a dataset may be more negatively affected. In this work, we study such relative distortions in recall by hypothesizing an intensification effect that is inherent to the model: pruning makes recall relatively worse for a class with recall below accuracy and, conversely, relatively better for a class with recall above accuracy. In addition, we propose a new pruning algorithm aimed at attenuating this effect. Through statistical analysis, we observe that intensification is less severe with our algorithm but nevertheless more pronounced with relatively harder tasks, less complex models, and higher pruning ratios. More surprisingly, we observe the opposite with lower pruning ratios, a de-intensification effect, which indicates that moderate pruning may have a corrective effect on such distortions.
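The hypothesis refers to the gap between each class's recall and the model's overall accuracy. A small helper to compute that gap (the function and the example below are illustrative, not from the paper):

import numpy as np

def recall_gaps(y_true, y_pred):
    """Per-class recall minus overall accuracy. Under the intensification
    hypothesis, pruning tends to push negative gaps further down and
    positive gaps further up."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = np.mean(y_true == y_pred)
    gaps = {}
    for c in np.unique(y_true):
        mask = y_true == c
        gaps[int(c)] = np.mean(y_pred[mask] == c) - accuracy
    return gaps

# Example: class 1 is underrepresented and has recall below accuracy.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]
print(recall_gaps(y_true, y_pred))  # approximately {0: 0.1, 1: -0.4}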
Equivalent and Approximate Transformations of Deep Neural Networks
Kumar, Abhinav, Serra, Thiago, Ramalingam, Srikumar
Two networks are equivalent if they produce the same output for any given input. In this paper, we study the possibility of transforming a deep neural network into another network with a different number of units or layers, which can be either equivalent, a local exact approximation, or a global linear approximation of the original network. On the practical side, we show that certain rectified linear units (ReLUs) can be safely removed from a network if they are always active or inactive for any valid input. If we only need an equivalent network for a smaller domain, then more units can be removed and some layers collapsed. On the theoretical side, we constructively show that for any feed-forward ReLU network, there exists a global linear approximation by a 2-hidden-layer shallow network with a fixed number of units. This result is a balance between the increasing number of units needed for arbitrary approximation with a single layer and the known upper bound of $\lceil \log(n_0+1)\rceil + 1$ layers for exact representation, where $n_0$ is the input dimension. While the transformed network may require an exponential number of units to capture the activation patterns of the original network, we show that it can be made substantially smaller by only accounting for the patterns that define linear regions. Based on experiments with ReLU networks on the MNIST dataset, we found that $l_1$-regularization and adversarial training reduce the number of linear regions significantly as the number of stable units increases due to weight sparsity. Therefore, we can also intentionally train ReLU networks to allow for effective lossless compression and approximation.
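As a minimal sketch of the simplest practical case mentioned above, the snippet below removes units that are stably inactive (always zero on the valid input domain) from one hidden layer, assuming such units have already been identified, for example via activation bounds; the names and the toy example are mine:

import numpy as np

def remove_inactive_units(W1, b1, W2, inactive):
    """Remove hidden units whose ReLU output is zero for every valid input.
    Their outgoing weights contribute nothing, so dropping the corresponding
    rows of (W1, b1) and columns of W2 yields an equivalent network on the
    same input domain."""
    keep = ~np.asarray(inactive)
    return W1[keep], b1[keep], W2[:, keep]

# Toy example on inputs in [-1, 1]^2: unit 1 has pre-activation at most
# 3 + 4 - 100 < 0, so it is stably inactive and can be removed.
W1 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b1 = np.array([0.1, -100.0, 0.3])
W2 = np.array([[1.0, 1.0, 1.0]])
inactive = np.array([False, True, False])
W1p, b1p, W2p = remove_inactive_units(W1, b1, W2, inactive)

Detecting which units are stable is the harder part and is not shown here; stably active units can likewise be treated as linear and folded away under further conditions discussed in the paper.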
Empirical Bounds on Linear Regions of Deep Rectifier Networks
Serra, Thiago, Ramalingam, Srikumar
One form of characterizing the expressiveness of a piecewise linear neural network is by the number of linear regions, or pieces, of the function modeled. We have observed substantial progress in this topic through lower and upper bounds on the maximum number of linear regions and a counting procedure. However, these bounds only account for the dimensions of the network and the exact counting may take a prohibitive amount of time, therefore making it infeasible to benchmark the expressiveness of networks. In this work, we approximate the number of linear regions with an algorithm for probabilistic lower bounds. In addition, we present a tighter upper bound that leverages network coefficients. We test both on trained networks. The algorithm for probabilistic lower bounds is several orders of magnitude faster than exact counting and the values reach similar orders of magnitude, hence making our approach a viable method to compare the expressiveness of such networks. The refined upper bound is notably stronger on networks with narrow layers. Neural networks with piecewise linear activations have become increasingly common over the past decade, in particular since Nair & Hinton (2010) and Glorot et al. (2011). The simplest and most commonly used among such activations is the Rectified Linear Unit (ReLU), which outputs the maximum between 0 and its input argument (Hahnloser et al., 2000; LeCun et al., 2015). In the functions modeled by these networks, we can associate each part of the domain in which the network corresponds to an affine function with a particular set of units having positive outputs.
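For intuition on lower bounds obtained without exact counting, the crude sampling baseline below counts distinct activation patterns among randomly sampled inputs, which is a valid lower bound on the number of patterns the network realizes over its domain, the quantity this line of work uses to measure linear regions (this is only a naive baseline, not the probabilistic algorithm of the paper; all names are assumptions):

import numpy as np

def sampled_pattern_lower_bound(weights, biases, lo, hi, n_samples=10_000, seed=0):
    """Count distinct ReLU activation patterns among sampled inputs.
    Every pattern seen is realized by the network, so the count can only
    underestimate the total number of realized patterns."""
    rng = np.random.default_rng(seed)
    n_in = weights[0].shape[1]
    samples = rng.uniform(lo, hi, size=(n_samples, n_in))
    patterns = set()
    for x in samples:
        h, bits = x, []
        for W, b in zip(weights, biases):
            pre = W @ h + b
            bits.extend((pre > 0).tolist())
            h = np.maximum(pre, 0.0)
        patterns.add(tuple(bits))
    return len(patterns)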
The Integrated Last-Mile Transportation Problem (ILMTP)
Raghunathan, Arvind U. (Mitsubishi Electric Research Laboratories) | Bergman, David (University of Connecticut) | Hooker, John (Carnegie Mellon University) | Serra, Thiago (Carnegie Mellon University) | Kobori, Shingo (Mitsubishi Electric Corporation)
Last-mile transportation (LMT) refers to any service that moves passengers from a hub of mass transportation (MT), such as air, boat, bus, or train, to destinations, such as a home or an office. In this paper, we introduce the problem of scheduling passengers jointly on MT and LMT services, with passengers sharing a car, van, or autonomous pod of limited capacity for LMT. Passenger itineraries are determined so as to minimize total transit time for all passengers, with each passenger arriving at the destination within a specified time window. The transit time includes the time spent traveling through both services and, possibly, waiting time for transferring between the services. We provide an integer linear programming (ILP) formulation for this problem. Since the ILMTP is NP-hard and problem instances of practical size are often difficult to solve, we study a restricted version where MT trips are uniform, all passengers have time windows of a common size, and LMT vehicles visit one destination per trip. We prove that there is an optimal solution that sorts and groups passengers by their deadlines and, based on this result, we propose a constructive grouping heuristic and local search operators to generate high-quality solutions. The resulting groups are optimally scheduled in a few seconds using another ILP formulation. Numerical results indicate that the solutions obtained by this heuristic are often close to optimal, even when multiple destinations are allowed per group, and that warm-starting the ILP solver with such solutions decreases the overall computational times significantly.
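To illustrate the sort-and-group idea in the simplest possible terms, the sketch below sorts passengers by deadline and fills vehicles greedily; the actual constructive heuristic must also respect time windows, trip times, and the mass-transit timetable (the data layout and names here are assumptions):

from itertools import islice

def group_by_deadline(passengers, capacity):
    """Sort passengers by deadline and fill LMT vehicles greedily in that
    order, producing groups of at most `capacity` passengers."""
    ordered = iter(sorted(passengers, key=lambda p: p["deadline"]))
    groups = []
    while True:
        group = list(islice(ordered, capacity))
        if not group:
            return groups
        groups.append(group)

passengers = [{"id": i, "deadline": d} for i, d in enumerate([30, 10, 25, 40, 15])]
print(group_by_deadline(passengers, capacity=2))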
How Could Polyhedral Theory Harness Deep Learning?
Serra, Thiago, Tjandraatmadja, Christian, Ramalingam, Srikumar
The holy grail of deep learning is to come up with an automatic method to design optimal architectures for different applications. In other words, how can we effectively dimension and organize neurons along the network layers based on the computational resources, input size, and amount of training data? We outline promising research directions based on polyhedral theory and mixed-integer representability that may offer an analytical approach to this question, in contrast to the empirical techniques often employed.
Bounding and Counting Linear Regions of Deep Neural Networks
Serra, Thiago, Tjandraatmadja, Christian, Ramalingam, Srikumar
In this paper, we study the representational power of deep neural networks (DNN) that belong to the family of piecewise-linear (PWL) functions, based on PWL activation units such as rectifier or maxout. We investigate the complexity of such networks by studying the number of linear regions of the PWL function. Typically, a PWL function from a DNN can be seen as a large family of linear functions acting on millions of such regions. We directly build upon the work of Montufar et al. (2014), Montufar (2017), and Raghu et al. (2017) by refining the upper and lower bounds on the number of linear regions for rectified and maxout networks. In addition to achieving tighter bounds, we also develop a novel method to perform exact enumeration or counting of the number of linear regions with a mixed-integer linear formulation that maps the input space to the output space. We use this new capability to visualize how the number of linear regions changes while training DNNs.
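As a reference point for these bounds, a single hidden layer with $n$ rectifier units partitions an $n_0$-dimensional input space by at most $n$ hyperplanes, so the classical hyperplane-arrangement bound gives at most

$$\sum_{i=0}^{n_0} \binom{n}{i}$$

linear regions; the bounds refined in this paper extend this style of counting to deep networks, where regions can multiply across layers.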