magdon-ismail
Column Selection via Adaptive Sampling
Saurabh Paul, Malik Magdon-Ismail, Petros Drineas
Selecting a good column (or row) subset of massive data matrices has found many applications in data analysis and machine learning. We propose a new adaptive sampling algorithm that can be used to improve any relative-error column selection algorithm. Our algorithm delivers a tighter theoretical bound on the approximation error which we also demonstrate empirically using two well known relative-error column subset selection algorithms. Our experimental results on synthetic and real-world data show that our algorithm outperforms non-adaptive sampling as well as prior adaptive sampling approaches.
Column Selection via Adaptive Sampling
Selecting a good column (or row) subset of massive data matrices has found many applications in data analysis and machine learning. We propose a new adaptive sampling algorithm that can be used to improve any relative-error column selection algorithm. Our algorithm delivers a tighter theoretical bound on the approximation error which we also demonstrate empirically using two well known relative-error column subset selection algorithms. Our experimental results on synthetic and real-world data show that our algorithm outperforms non-adaptive sampling as well as prior adaptive sampling approaches.
Some Inapproximability Results of MAP Inference and Exponentiated Determinantal Point Processes
We study the computational complexity of two hard problems on determinantal point processes (DPPs). One is maximum a posteriori (MAP) inference, i.e., to find a principal submatrix having the maximum determinant. The other is probabilistic inference on exponentiated DPPs (E-DPPs), which can sharpen or weaken the diversity preference of DPPs with an exponent parameter p. We present several complexity-theoretic hardness results that explain the difficulty in approximating MAP inference and the normalizing constant for E-DPPs. We first prove that unconstrained MAP inference for an n × n matrix is NP-hard to approximate within a factor of 2βn, where β = 10−1013 . This result improves upon the best-known inapproximability factor of (9/8 − ϵ), and rules out the existence of any polynomial-factor approximation algorithm assuming P ≠ NP. We then show that log-determinant maximization is NP-hard to approximate within a factor of 5/4 for the unconstrained case and within a factor of 1 + 10−1013 for the size-constrained monotone case. In particular, log-determinant maximization does not admit a polynomial-time approximation scheme unless P = NP. As a corollary of the first result, we demonstrate that the normalizing constant for E-DPPs of any (fixed) constant exponent p ≥ β-1 = 101013 is NP-hard to approximate within a factor of 2βpn, which is in contrast to the case of p ≤ 1 admitting a fully polynomial-time randomized approximation scheme.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States > Pennsylvania (0.04)
Machine Learning Models Predict COVID-19 Impact in Smaller Cities
According to a robust machine learning model that can predict pandemic impact even in smaller cities, with 75% of the population in the Capital Region in New York remaining at home, the COVID-19 pandemic will peak locally in the second half of May. If the rate of people staying home drops to 50%, it will peak in early June. Rensselaer Polytechnic Institute researcher Malik Magdon-Ismail tailored the models he is developing to work with sparse data points, like those available during the early phase in a pandemic or in smaller cities, which ordinarily make trend-spotting difficult. "There are no simple, robust, general tools that, for example, officials in Albany could use to make projections," said Magdon-Ismail, a professor of computer science, and expert in machine learning, data mining, and pattern recognition. "These models show that the projections vary enormously from one city to another. This knowledge could relieve some of the uncertainty that is around in developing policy."
Machine Learning the Phenomenology of COVID-19 From Early Infection Dynamics
We present a robust data-driven machine learning analysis of the COVID-19 pandemic from its early infection dynamics, specifically infection counts over time. The goal is to extract actionable public health insights. These insights include the infectious force, the rate of a mild infection becoming serious, estimates for asymtomatic infections and predictions of new infections over time. We focus on USA data starting from the first confirmed infection on January 20 2020. Our methods reveal significant asymptomatic (hidden) infection, a lag of about 10 days, and we quantitatively confirm that the infectious force is strong with about a 0.14% transition from mild to serious infection. Our methods are efficient, robust and general, being agnostic to the specific virus and applicable to different populations or cohorts.
- Research Report > Experimental Study (0.46)
- Research Report > New Finding (0.46)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Epidemiology (1.00)
Column Selection via Adaptive Sampling
Paul, Saurabh, Magdon-Ismail, Malik, Drineas, Petros
Selecting a good column (or row) subset of massive data matrices has found many applications in data analysis and machine learning. We propose a new adaptive sampling algorithm that can be used to improve any relative-error column selection algorithm. Our algorithm delivers a tighter theoretical bound on the approximation error which we also demonstrate empirically using two well known relative-error column subset selection algorithms. Our experimental results on synthetic and real-world data show that our algorithm outperforms non-adaptive sampling as well as prior adaptive sampling approaches.