Reviews: Generalized Linear Model Regression under Distance-to-set Penalties
I found much to like about this paper. It is clear and well written, and the proposed method is elegant, intuitive, novel (to the best of my knowledge), well motivated, and widely applicable. In addition, the main ideas are well supported by appropriate numerical and real-data examples. I would like to see the authors compare their method to two other methods that I regard as major competitors. First, the relaxed lasso (Meinshausen, 2007), in which we fit the lasso and then do an unpenalized fit using only the variables the lasso has selected. As I understand the field, this is the most popular method and the one that Hastie, Tibshirani, and Wainwright (2015) recommend for users who want to avoid the bias of the lasso (though note that the shrinkage "bias" is sometimes desirable to counteract the selection bias, or "winner's curse," that the selected coefficients may suffer from). Second, best subsets itself, which is now feasible using modern mixed-integer optimization methods (Bertsimas, King, and Mazumder, 2016). If one of these competitors outperforms the proposed method, it would not make me reconsider my recommendation to accept, since the method applies to a much more general class of problems.
Optimal Control of Multiclass Fluid Queueing Networks: A Machine Learning Approach
Bertsimas, Dimitris, Kim, Cheol Woo
We propose a machine learning approach to the optimal control of multiclass fluid queueing networks (MFQNETs) that provides explicit and insightful control policies. We prove that a threshold-type optimal policy exists for MFQNET control problems, where the threshold curves are hyperplanes passing through the origin. We use Optimal Classification Trees with hyperplane splits (OCT-H) to learn an optimal control policy for MFQNETs, taking numerical solutions of MFQNET control problems as a training set. We report experimental results with up to 33 servers and 99 classes, demonstrating that the learned policies achieve 100\% accuracy on the test set. While the offline training of OCT-H can take days in large networks, the online application takes milliseconds.
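The policy structure described above (hyperplane thresholds through the origin, learned as a tree with hyperplane splits) can be sketched as a small decision function. The tree below is an illustrative two-class example, not a learned OCT-H policy:

```python
import numpy as np

def policy(x, node):
    """Descend a tree whose internal nodes test a^T x >= b and whose
    leaves name the class to serve at fluid state x."""
    # node is ("leaf", class_id) or ("split", a, b, left, right)
    while node[0] == "split":
        _, a, b, left, right = node
        node = left if np.dot(x, a) >= b else right
    return node[1]

# Hypothetical policy: serve class 0 when x0 - 2*x1 >= 0, else class 1.
# The threshold hyperplane passes through the origin (b = 0).
tree = ("split", np.array([1.0, -2.0]), 0.0, ("leaf", 0), ("leaf", 1))
print(policy(np.array([3.0, 1.0]), tree))  # 3 - 2 >= 0 -> class 0
```

Evaluating such a tree online is a handful of dot products, which is consistent with the millisecond application time reported above.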
Distributionally Robust Causal Inference with Observational Data
Bertsimas, Dimitris, Imai, Kosuke, Li, Michael Lingzhi
We consider the estimation of average treatment effects in observational studies and propose a new framework of robust causal inference with unobserved confounders. Our approach is based on distributionally robust optimization and proceeds in two steps. We first specify the maximal degree to which the distribution of unobserved potential outcomes may deviate from that of observed outcomes. We then derive sharp bounds on the average treatment effects under this assumption. Our framework encompasses the popular marginal sensitivity model as a special case, and we demonstrate how the proposed methodology can address a primary challenge of the marginal sensitivity model that it produces uninformative results when unobserved confounders substantially affect treatment and outcome. Specifically, we develop an alternative sensitivity model, called the distributional sensitivity model, under the assumption that heterogeneity of treatment effect due to unobserved variables is relatively small. Unlike the marginal sensitivity model, the distributional sensitivity model allows for potential lack of overlap and often produces informative bounds even when unobserved variables substantially affect both treatment and outcome. Finally, we show how to extend the distributional sensitivity model to difference-in-differences designs and settings with instrumental variables. Through simulation and empirical studies, we demonstrate the applicability of the proposed methodology.
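For reference, the marginal sensitivity model mentioned above is commonly stated (following Tan, 2006) as a bound $\Lambda \ge 1$ on the odds ratio between the observed propensity score and the propensity score given the potential outcome; the notation below follows the standard literature rather than this particular paper:

```latex
\Lambda^{-1} \;\le\; \frac{e(x)\,/\,\{1 - e(x)\}}{e_t(x, y)\,/\,\{1 - e_t(x, y)\}} \;\le\; \Lambda,
\quad \text{where } e(x) = \Pr(T = 1 \mid X = x),\;
e_t(x, y) = \Pr(T = 1 \mid X = x, Y(t) = y).
```

Setting $\Lambda = 1$ recovers unconfoundedness; as $\Lambda$ grows, the bounds on the average treatment effect widen, which is the uninformativeness the distributional sensitivity model is designed to address.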
Behind Covid-19 vaccine development
When starting a vaccine program, scientists generally have anecdotal understanding of the disease they're aiming to target. When Covid-19 surfaced over a year ago, there were so many unknowns about the fast-moving virus that scientists had to act quickly and rely on new methods and techniques just to even begin understanding the basics of the disease. Scientists at Janssen Research & Development, developers of the Johnson & Johnson Covid-19 vaccine, leveraged real-world data and, working with MIT researchers, applied artificial intelligence and machine learning to help guide the company's research efforts into a potential vaccine. "Data science and machine learning can be used to augment scientific understanding of a disease," says Najat Khan, chief data science officer and global head of strategy and operations for Janssen Research & Development. "For Covid-19, these tools became even more important because our knowledge was rather limited. There was no hypothesis at the time. We were developing an unbiased understanding of the disease based on real-world data using sophisticated AI/ML algorithms."
A new perspective on low-rank optimization
Bertsimas, Dimitris, Cory-Wright, Ryan, Pauphilet, Jean
A key question in many low-rank problems throughout optimization, machine learning, and statistics is to characterize the convex hulls of simple low-rank sets and judiciously apply these convex hulls to obtain strong yet computationally tractable convex relaxations. We invoke the matrix perspective function (the matrix analog of the perspective function) and explicitly characterize the convex hull of epigraphs of convex quadratic, matrix exponential, and matrix power functions under low-rank constraints. Further, we exploit these characterizations to develop strong relaxations for a variety of low-rank problems, including reduced-rank regression, non-negative matrix factorization, and factor analysis. We establish that these relaxations can be modeled via semidefinite and matrix power cone constraints, and thus optimized over tractably. The proposed approach parallels and generalizes the perspective reformulation technique in mixed-integer optimization, and leads to new relaxations for a broad class of problems.
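To fix ideas, the scalar perspective of a convex function $f$ is $g_f(x, t) = t\, f(x/t)$ for $t > 0$. One common matrix analog (the noncommutative perspective studied by Effros; the abstract does not commit to a specific definition, so this is offered as background) is

```latex
g_f(X, Y) \;=\; Y^{1/2}\, f\!\left(Y^{-1/2}\, X\, Y^{-1/2}\right) Y^{1/2},
\qquad Y \succ 0.
```

For example, with $f(X) = X^2$ this gives $g_f(X, Y) = X\, Y^{-1} X$, the matrix counterpart of the familiar scalar term $x^2 / t$ that appears in perspective reformulations.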
Prediction with Missing Data
Bertsimas, Dimitris, Delarue, Arthur, Pauphilet, Jean
Missing information is inevitable in real-world data sets. While imputation is well suited and theoretically sound for statistical inference, its relevance and practical implementation for out-of-sample prediction remain unsettled. We provide a theoretical analysis of widely used data imputation methods and highlight their key deficiencies in making accurate predictions. As an alternative, we propose adaptive linear regression, a new class of models that can be trained and evaluated directly on partially observed data, adapting to the set of available features. In particular, we show that certain adaptive regression models are equivalent to impute-then-regress methods in which the imputation and the regression models are learned simultaneously rather than sequentially. We validate our theoretical findings and the adaptive regression approach through numerical experiments on real-world data sets.
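One simple way to see the "learned jointly" idea is to augment a mean-imputed design with a missingness indicator and its interaction with an observed feature, so the fit can adapt to which features are available. Everything below (the synthetic data, the mean imputation, the specific augmentation) is an illustrative sketch, not the paper's model class:

```python
import numpy as np

# Toy data: two features, with the second unobserved in ~30% of rows.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + 0.1 * rng.normal(size=n)
miss = rng.random(n) < 0.3

def ols_r2(F, y):
    """Fit OLS with an intercept and return the in-sample R^2."""
    A = np.column_stack([np.ones(len(y)), F])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

# Impute-then-regress baseline: mean-impute x2, then fit.
x2_obs = np.where(miss, np.nan, x2)
x2_imp = np.where(miss, np.nanmean(x2_obs), x2)
r2_baseline = ols_r2(np.column_stack([x1, x2_imp]), y)

# Adaptive sketch: add the missingness indicator m and its interaction
# with x1, so the intercept and the slope on x1 can differ on rows
# where x2 is unobserved -- the imputation constant is then effectively
# learned jointly with the regression coefficients.
m = miss.astype(float)
F = np.column_stack([x1, x2_imp, m, m * x1])
r2_adaptive = ols_r2(F, y)

print(r2_baseline, r2_adaptive)
```

Because the adaptive design contains the baseline columns, its in-sample fit can only match or improve on the baseline; the paper's point is the stronger claim about out-of-sample prediction.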
Mixed-Projection Conic Optimization: A New Paradigm for Modeling Rank Constraints
Bertsimas, Dimitris, Cory-Wright, Ryan, Pauphilet, Jean
We propose a framework for modeling and solving low-rank optimization problems to certifiable optimality. We introduce symmetric projection matrices that satisfy $Y^2=Y$, the matrix analog of binary variables that satisfy $z^2=z$, to model rank constraints. By leveraging regularization and strong duality, we prove that this modeling paradigm yields tractable convex optimization problems over the non-convex set of orthogonal projection matrices. Furthermore, we design outer-approximation algorithms to solve low-rank problems to certifiable optimality, compute lower bounds via their semidefinite relaxations, and provide near-optimal solutions through rounding and local search techniques. We implement these numerical ingredients and, for the first time, solve low-rank optimization problems to certifiable optimality. Our algorithms also supply certifiably near-optimal solutions for larger problem sizes and outperform existing heuristics by exploiting an alternative to the popular nuclear norm relaxation, one that generalizes the perspective relaxation from vectors to matrices. All in all, our framework, which we name Mixed-Projection Conic Optimization, solves low-rank problems to certifiable optimality in a tractable and unified fashion.
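The modeling device described above (projection matrices as matrix analogs of binary variables) can be stated as follows: a rank constraint on $X \in \mathbb{R}^{n \times m}$ is captured by an orthogonal projection $Y$ that leaves the columns of $X$ unchanged,

```latex
\mathrm{Rank}(X) \le k
\quad\Longleftrightarrow\quad
\exists\, Y \in \mathbb{S}^n:\;
Y^2 = Y,\;\; \mathrm{tr}(Y) \le k,\;\; X = Y X,
```

just as a sparsity constraint on a vector $x$ is captured by binaries $z_i \in \{0,1\}$ with $\sum_i z_i \le k$ and $x_i = z_i x_i$. The eigenvalues of a projection matrix are exactly 0 or 1, which is what makes $Y^2 = Y$ the matrix counterpart of $z^2 = z$.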
Dynamic optimization with side information
Bertsimas, Dimitris, McCord, Christopher, Sturt, Bradley
We present a data-driven framework for incorporating side information in dynamic optimization under uncertainty. Specifically, our approach uses predictive machine learning methods (such as k-nearest neighbors, kernel regression, and random forests) to weight the relative importance of various data-driven uncertainty sets in a robust optimization formulation. Through a novel measure concentration result for local machine learning methods, we prove that the proposed framework is asymptotically optimal for stochastic dynamic optimization with covariates. We also describe a general-purpose approximation for the proposed framework, based on overlapping linear decision rules, which is computationally tractable and produces high-quality solutions for dynamic problems with many stages. Across a variety of examples in shipment planning, inventory management, and finance, our method achieves improvements of up to 15% over alternatives and requires less than one minute of computation time on problems with twelve stages.
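As a concrete illustration of the weighting step, a k-nearest-neighbor scheme places uniform weight on the k historical observations whose covariates are closest to the current side information. The data, the choice of k, and the uniform-weight rule below are a hypothetical sketch, not the paper's implementation:

```python
import numpy as np

def knn_weights(covariates, query, k):
    """Uniform weight 1/k on the k historical observations whose
    covariates are nearest (in Euclidean distance) to the query."""
    d = np.linalg.norm(covariates - query, axis=1)
    idx = np.argsort(d)[:k]
    w = np.zeros(len(covariates))
    w[idx] = 1.0 / k
    return w

# Four past observations of a 1-D covariate; side information 0.5.
hist = np.array([[0.0], [1.0], [2.0], [10.0]])
w = knn_weights(hist, np.array([0.5]), k=2)
print(w)  # weight 0.5 on each of the two nearest points
```

These weights then multiply the corresponding historical uncertainty samples in the robust formulation, so observations with similar covariates count for more.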
A unified approach to mixed-integer optimization: Nonlinear formulations and scalable algorithms
Bertsimas, Dimitris, Cory-Wright, Ryan, Pauphilet, Jean
We propose a unified framework to address a family of classical mixed-integer optimization problems, including network design, facility location, unit commitment, sparse portfolio selection, binary quadratic optimization and sparse learning problems. These problems exhibit logical relationships between continuous and discrete variables, which are usually reformulated linearly using a big-M formulation. In this work, we challenge this longstanding modeling practice and express the logical constraints in a non-linear way. By imposing a regularization condition, we reformulate these problems as convex binary optimization problems, which are solvable using an outer-approximation procedure. In numerical experiments, we establish that a general-purpose numerical strategy, which combines cutting-plane, first-order and local search methods, solves these problems faster and at a larger scale than state-of-the-art mixed-integer linear or second-order cone methods. Our approach successfully solves network design problems with 100s of nodes and provides solutions up to 40\% better than the state-of-the-art; sparse portfolio selection problems with up to 3,200 securities compared with 400 securities for previous attempts; and sparse regression problems with up to 100,000 covariates.
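In the simplest scalar case, the contrast between the two modeling styles can be written out explicitly. This is a minimal illustration assuming a ridge regularizer, in the spirit of the perspective reformulation literature rather than a formulation taken from the paper:

```latex
\text{big-}M:\qquad -M z \le x \le M z, \quad z \in \{0,1\};
\qquad\qquad
\text{nonlinear:}\qquad x (1 - z) = 0, \quad z \in \{0,1\}.
```

Adding a regularization term $x^2 / (2\gamma)$ to the objective and convexifying the nonlinear constraint yields the perspective term $x^2 / (2\gamma z)$ (with the convention $0/0 := 0$), which is jointly convex in $(x, z)$ and forces $x = 0$ whenever $z = 0$ without any big-$M$ constant.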