Tygert, Mark


Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds

arXiv.org Machine Learning

Protecting privacy during inference with deep neural networks is possible by adding noise to the activations in the last layers prior to the final classifiers or other task-specific layers. The activations in such layers are known as "features" (or, less commonly, as "embeddings" or "feature embeddings"). The added noise helps prevent reconstruction of the inputs from the noisy features. Lower bounding the variance of every possible unbiased estimator of the inputs quantifies the confidentiality arising from such added noise. Convenient, computationally tractable bounds are available from classic inequalities of Hammersley and of Chapman and Robbins -- the HCR bounds. Numerical experiments indicate that the HCR bounds are on the precipice of being effectual for small neural nets with the data sets, "MNIST" and "CIFAR-10," which contain 10 classes each for image classification. The HCR bounds appear to be insufficient on their own to guarantee confidentiality of the inputs to inference with standard deep neural nets, "ResNet-18" and "Swin-T," pre-trained on the data set, "ImageNet-1000," which contains 1000 classes. Supplementing the addition of noise to features with other methods for providing confidentiality may be warranted in the case of ImageNet. In all cases, the results reported here limit consideration to amounts of added noise that incur little degradation in the accuracy of classification from the noisy features. Thus, the added noise enhances confidentiality without much reduction in the accuracy on the task of image classification.
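
To make the bound concrete, here is a minimal sketch (not code from the paper) of evaluating a Chapman-Robbins-style lower bound when i.i.d. Gaussian noise of standard deviation sigma is added to the features; the one-layer ReLU map `f` below is a hypothetical stand-in for the last layers of a trained network:

```python
import numpy as np

def hcr_lower_bound(f, x, delta, sigma):
    """Chapman-Robbins-style lower bound on the variance of any
    unbiased estimator of the input x given f(x) + N(0, sigma^2 I):
    for Gaussian noise, the chi-squared divergence between the noisy
    features at x + delta and at x is
    expm1(||f(x + delta) - f(x)||^2 / sigma^2), so the bound for the
    perturbation delta is ||delta||^2 divided by that divergence."""
    gap = np.linalg.norm(f(x + delta) - f(x)) ** 2
    return np.dot(delta, delta) / np.expm1(gap / sigma ** 2)

# A one-layer ReLU map as a hypothetical stand-in for the "features."
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64)) / 8.0
f = lambda x: np.maximum(W @ x, 0.0)

x = rng.standard_normal(64)
delta = 1e-2 * rng.standard_normal(64)
for sigma in (0.1, 0.5, 1.0):
    print(sigma, hcr_lower_bound(f, x, delta, sigma))
```

Larger sigma yields a larger lower bound; that is, more added noise forces any unbiased reconstruction of the input to be less precise.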


Calibration of P-values for calibration and for deviation of a subpopulation from the full population

arXiv.org Artificial Intelligence

The author's recent research papers, "Cumulative deviation of a subpopulation from the full population" and "A graphical method of cumulative differences between two subpopulations" (both published in volume 8 of Springer's open-access "Journal of Big Data" during 2021), propose graphical methods and summary statistics, without extensively calibrating formal significance tests. The summary metrics and methods can measure the calibration of probabilistic predictions and can assess differences in responses between a subpopulation and the full population while controlling for a covariate or score via conditioning on it. These recently published papers construct significance tests based on the scalar summary statistics, but only sketch how to calibrate the attained significance levels (also known as "P-values") for the tests. The present article reviews and synthesizes work spanning many decades in order to detail how to calibrate the P-values. The present paper presents computationally efficient, easily implemented numerical methods for evaluating properly calibrated P-values, together with rigorous mathematical proofs guaranteeing their accuracy, and illustrates and validates the methods with open-source software and numerical examples.
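
As a simple illustration of the kind of calibration involved, the following sketch evaluates a P-value via a classic, rapidly converging alternating series, under the simplifying assumption (far cruder than the paper's treatment) that the test statistic is asymptotically distributed as the maximum absolute value of a standard Brownian motion over [0, 1]:

```python
import math

def pvalue_max_abs_brownian(x, terms=100):
    """P-value P(max_{0 <= t <= 1} |B(t)| >= x) for a standard Brownian
    motion B, via the classic alternating series
    P(max |B| <= x) = (4/pi) sum_{k >= 0} (-1)^k / (2k + 1)
                      * exp(-pi^2 (2k + 1)^2 / (8 x^2)),
    which converges very rapidly unless x is tiny."""
    if x <= 0:
        return 1.0
    cdf = sum((-1) ** k / (2 * k + 1)
              * math.exp(-math.pi ** 2 * (2 * k + 1) ** 2 / (8 * x ** 2))
              for k in range(terms))
    return 1.0 - 4.0 / math.pi * cdf

for x in (1.0, 2.0, 3.0):
    print(x, pvalue_max_abs_brownian(x))
```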


Plots of the cumulative differences between observed and expected values of ordered Bernoulli variates

arXiv.org Machine Learning

Many predictions are probabilistic in nature; for example, a prediction could be for precipitation tomorrow, but with only a 30 percent chance. Given both the predictions and the actual outcomes, "reliability diagrams" (also known as "calibration plots") help detect and diagnose statistically significant discrepancies between the predictions and the outcomes. The canonical reliability diagrams are based on histogramming the observed and expected values of the predictions; several variants of the standard reliability diagrams propose to replace the hard histogram binning with soft kernel density estimation using smooth convolutional kernels of widths similar to the widths of the bins. In all cases, an important question naturally arises: which widths are best (or are multiple plots with different widths better)? Rather than answering this question, plots of the cumulative differences between the observed and expected values largely avoid the question, by displaying miscalibration directly as the slopes of secant lines for the graphs. Slope is easy to perceive with quantitative precision even when the constant offsets of the secant lines are irrelevant. There is no need to bin or perform kernel density estimation with a somewhat arbitrary kernel.
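
Below is a minimal numpy/matplotlib sketch of the basic construction, omitting the normalizations and significance bands that a full treatment would add; the synthetic predictions are deliberately miscalibrated by 0.05, which shows up in the plot as secant lines of slope roughly 0.05:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 10_000
predicted = rng.uniform(0.05, 0.95, n)  # predicted probabilities
observed = rng.binomial(1, np.clip(predicted + 0.05, 0.0, 1.0)).astype(float)

# Sort by the predictions and accumulate the observed-minus-expected gaps.
order = np.argsort(predicted)
cumulative = np.cumsum(observed[order] - predicted[order]) / n
abscissae = np.arange(1, n + 1) / n

plt.plot(abscissae, cumulative, "k")
plt.xlabel("fraction of data, ordered by predicted probability")
plt.ylabel("cumulative difference (observed minus expected)")
plt.title("slope of a secant line = miscalibration over that range")
plt.savefig("cumulative_differences.png")
```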


An optimizable scalar objective value cannot be objective and should not be the sole objective

arXiv.org Artificial Intelligence

The morality of algorithms and their potential for bias and discrimination are important concerns. A popular approach to machine learning and artificial intelligence is via the numerical optimization of objective functions, and adapting such an approach to handle ethics could seem natural: with a hammer in hand, everything looks like a nail. The hammer of much artificial intelligence is the optimization of objective values, so some might like to treat morality solely through such objective functions. However, relying solely on the optimization of scalar objective values is fraught with unavoidable flaws when dealing with real people.


Secure multiparty computations in floating-point arithmetic

arXiv.org Artificial Intelligence

Secure multiparty computations enable the distribution of so-called shares of sensitive data to multiple parties such that the multiple parties can effectively process the data while being unable to glean much information about the data (at least not without collusion among all parties to put back together all the shares). Thus, the parties may conspire to send all their processed results to a trusted third party (perhaps the data provider) at the conclusion of the computations, with only the trusted third party being able to view the final results. Secure multiparty computations for privacy-preserving machine learning turn out to be possible using solely standard floating-point arithmetic, at least with a carefully controlled leakage of information less than the loss of accuracy due to roundoff, all backed by rigorous mathematical proofs of worst-case bounds on information loss and numerical stability in finite-precision arithmetic. Numerical examples illustrate the high performance attained on commodity off-the-shelf hardware for generalized linear models, including ordinary linear least-squares regression, binary and multinomial logistic regression, probit regression, and Poisson regression.
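
A toy sketch of the basic ingredient, additive secret sharing over floating-point numbers, follows; large random masks hide the data and summation cancels them up to roundoff. The paper's actual protocols, leakage analysis, and proofs go well beyond this:

```python
import numpy as np

def share(x, num_parties, scale, rng):
    """Split x into additive shares: num_parties - 1 large random masks
    plus one balancing share, so that the shares sum back to x up to
    roundoff, while any proper subset of the shares looks like noise."""
    masks = scale * rng.standard_normal((num_parties - 1,) + x.shape)
    balance = x - masks.sum(axis=0)
    return np.concatenate([masks, balance[np.newaxis]], axis=0)

rng = np.random.default_rng(2)
x = rng.standard_normal(5)
shares = share(x, num_parties=3, scale=1e3, rng=rng)

# Each party applies the same linear map to its own share; summing the
# processed shares recovers the result up to floating-point roundoff.
A = rng.standard_normal((4, 5))
processed = sum(A @ s for s in shares)
print(np.max(np.abs(processed - A @ x)))  # tiny but nonzero: roundoff
```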


Regression-aware decompositions

arXiv.org Artificial Intelligence

Linear least-squares regression with a "design" matrix A approximates a given matrix B via minimization of the spectral- or Frobenius-norm discrepancy ||AX-B|| over every conformingly sized matrix X. Another popular approximation is low-rank approximation via principal component analysis (PCA) -- which is essentially singular value decomposition (SVD) -- or interpolative decomposition (ID). Classically, PCA/SVD and ID operate solely with the matrix B being approximated, not supervised by any auxiliary matrix A. However, linear least-squares regression models can inform the ID, yielding regression-aware ID. As a bonus, this provides an interpretation as regression-aware PCA for a kind of canonical correlation analysis between A and B. The regression-aware decompositions effectively enable supervision to inform classical dimensionality reduction, which classically has been totally unsupervised. The regression-aware decompositions reveal the structure inherent in B that is relevant to regression against A.
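
Here is a sketch of one standard construction of such a regression-aware low-rank approximation (minimizing ||AX - B||_F over all rank-k X, in the style of reduced-rank regression, assuming A has full column rank); the paper's regression-aware ID involves more, but this projection-then-truncation pattern conveys the core idea:

```python
import numpy as np

def regression_aware_lowrank(A, B, k):
    """Minimize ||A X - B||_F over all matrices X of rank at most k,
    assuming A has full column rank: project B onto the column space
    of A, truncate the projection to rank k, and pull the truncation
    back through the least-squares solve."""
    Q, R = np.linalg.qr(A)                  # A = Q R
    U, s, Vt = np.linalg.svd(Q.T @ B, full_matrices=False)
    core = (U[:, :k] * s[:k]) @ Vt[:k]      # rank-k approx of Q^T B
    return np.linalg.solve(R, core)         # X = R^{-1} [Q^T B]_k

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 20))
B = rng.standard_normal((100, 30))
for k in (1, 5, 10):
    X = regression_aware_lowrank(A, B, k)
    print(k, np.linalg.matrix_rank(X), np.linalg.norm(A @ X - B))
```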


Hierarchical loss for classification

arXiv.org Machine Learning

Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called "loss" or "win") used in textual or visual classification/recognition via neural networks seldom view a sheepdog as more similar to a poodle than to a skyscraper. We define a metric that, inter alia, can penalize failure to distinguish between a sheepdog and a skyscraper more than failure to distinguish between a sheepdog and a poodle. Unlike previously employed possibilities, this metric is based on an ultrametric tree associated with any given organization of a classifier's classes into a semantically meaningful hierarchy.
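
A toy illustration (not the paper's exact construction) of how an ultrametric tree can grade failures follows: the distance halves with each additional level of shared ancestry, so confusing a sheepdog with a poodle costs less than confusing a sheepdog with a skyscraper:

```python
# Toy hierarchy: each class maps to its path of ancestors from the root.
HIERARCHY = {
    "sheepdog":   ["entity", "animal", "dog"],
    "poodle":     ["entity", "animal", "dog"],
    "tabby":      ["entity", "animal", "cat"],
    "skyscraper": ["entity", "building"],
}

def tree_distance(class_a, class_b):
    """Ultrametric-style distance: zero for identical classes and
    otherwise halving with each additional level of shared ancestry."""
    if class_a == class_b:
        return 0.0
    shared = 0
    for u, v in zip(HIERARCHY[class_a], HIERARCHY[class_b]):
        if u != v:
            break
        shared += 1
    return 2.0 ** (-shared)

print(tree_distance("sheepdog", "poodle"))      # 0.125
print(tree_distance("sheepdog", "tabby"))       # 0.25
print(tree_distance("sheepdog", "skyscraper"))  # 0.5
```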


Poor starting points in machine learning

arXiv.org Machine Learning

Poor (even random) starting points for learning/training/optimization are common in machine learning. In many settings, the method of Robbins and Monro (online stochastic gradient descent) is known to be optimal for good starting points, but may not be optimal for poor starting points -- indeed, for poor starting points Nesterov acceleration can help during the initial iterations, even though Nesterov methods not designed for stochastic approximation could hurt during later iterations. The common practice of training with nontrivial minibatches enhances the advantage of Nesterov acceleration.
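
A toy sketch contrasting plain momentum with Nesterov-accelerated momentum in stochastic gradient descent from a deliberately poor starting point appears below; the objective, noise model, and hyperparameters are illustrative assumptions, not the paper's experiments:

```python
import numpy as np

def sgd(grad, x0, lr, steps, momentum=0.9, nesterov=False, rng=None):
    """Stochastic gradient descent with optional Nesterov momentum."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        lookahead = x + momentum * v if nesterov else x  # Nesterov looks ahead
        v = momentum * v - lr * grad(lookahead, rng)
        x = x + v
    return x

# Noisy gradients of f(x) = ||x||^2 / 2, from a deliberately poor start.
grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
x0 = 100.0 * np.ones(10)

for nesterov in (False, True):
    x = sgd(grad, x0, lr=0.05, steps=200, nesterov=nesterov,
            rng=np.random.default_rng(4))
    print("nesterov =", nesterov, " final distance =", np.linalg.norm(x))
```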


A mathematical motivation for complex-valued convolutional networks

arXiv.org Machine Learning

A complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with complex-valued vectors followed by (2) taking the absolute value of every entry of the resulting vectors followed by (3) local averaging. For processing real-valued random vectors, complex-valued convnets can be viewed as "data-driven multiscale windowed power spectra," "data-driven multiscale windowed absolute spectra," "data-driven multiwavelet absolute values," or (in their most general configuration) "data-driven nonlinear multiwavelet packets." Indeed, complex-valued convnets can calculate multiscale windowed spectra when the convnet filters are windowed complex-valued exponentials. Standard real-valued convnets, using rectified linear units (ReLUs), sigmoidal (for example, logistic or tanh) nonlinearities, max. pooling, etc., do not obviously exhibit the same exact correspondence with data-driven wavelets (whereas for complex-valued convnets, the correspondence is much more than just a vague analogy). Courtesy of the exact correspondence, the remarkably rich and rigorous body of mathematical analysis for wavelets applies directly to (complex-valued) convnets.
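
Below is a minimal one-dimensional numpy sketch of a single stage of the composition described above (the paper, of course, concerns trained, multi-stage networks). With windowed complex exponentials (Gabor-like filters) as the convolutional filters, the stage computes a windowed absolute spectrum of the input:

```python
import numpy as np

def convnet_stage(signal, filters, window):
    """One stage of a complex-valued convnet: (1) convolve with each
    complex-valued filter, (2) take absolute values entrywise, and
    (3) locally average; the output is again nonnegative and real."""
    outputs = []
    for h in filters:
        conv = np.convolve(signal, h, mode="valid")   # (1) convolution
        mag = np.abs(conv)                            # (2) absolute value
        pooled = np.convolve(mag, np.ones(window) / window,
                             mode="valid")[::window]  # (3) local averaging
        outputs.append(pooled)
    return np.stack(outputs)

# Windowed complex exponentials (Gabor-like filters) at several frequencies.
taps = 32
t = np.arange(taps)
gauss = np.exp(-0.5 * ((t - taps / 2) / (taps / 6)) ** 2)
filters = [gauss * np.exp(2j * np.pi * f * t / taps) for f in (2, 4, 8)]

# A nonnegative input oscillating at the frequency of the third filter.
signal = 1.0 + np.cos(2 * np.pi * 8 * np.arange(256) / taps)
stage = convnet_stage(signal, filters, window=8)
print(stage.mean(axis=1))  # the filter matched to the input dominates
```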