AITopics | Regression

Collaborating Authors

Regression

News Overviews Instructional Materials AI-Alerts Classics

Machine learning in front of statistical methods for prediction spread SARS-CoV-2 in Colombia

Estupiñán, A., Acuña, J., Rodriguez, A., Ayala, A., Estupiñán, C., Gonzalez, Ramon E. R., Triana-Camacho, D. A., Cristiano-Rodríguez, K. L., Morales, Carlos Andrés Collazos

arXiv.org Artificial IntelligenceSep-27-2022

Previous analysis has been performed on the daily number of cases, deaths, infected people, and people who were exposed to the virus, all of them in a timeline of 550 days. Moreover, it has made the fitting of infection spread detailing the most efficient and optimal methods with lower propagation error and the presence of statistical biases. Finally, four different prevention scenarios were proposed to evaluate the ratio of each one of the parameters related to the disease.

artificial intelligence, infection, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2208.0591

Country:

South America > Colombia > Santander Department > Bucaramanga (0.05)
South America > Brazil > Pernambuco > Recife (0.04)
South America > Colombia > Bogotá D.C. > Bogotá (0.04)
(5 more...)

Genre:

Research Report (0.64)
Workflow (0.47)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

On the inability of Gaussian process regression to optimally learn compositional functions

Giordano, Matteo, Ray, Kolyan, Schmidt-Hieber, Johannes

arXiv.org Artificial IntelligenceSep-27-2022

We rigorously prove that deep Gaussian process priors can outperform Gaussian process priors if the target function has a compositional structure. To this end, we study information-theoretic lower bounds for posterior contraction rates for Gaussian process regression in a continuous regression model. We show that if the true function is a generalized additive function, then the posterior based on any mean-zero Gaussian process can only recover the truth at a rate that is strictly slower than the minimax rate by a factor that is polynomially suboptimal in the sample size $n$.

artificial intelligence, machine learning, posterior mean, (17 more...)

arXiv.org Artificial Intelligence

2205.07764

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Towards Human-Compatible XAI: Explaining Data Differentials with Concept Induction over Background Knowledge

Widmer, Cara, Sarker, Md Kamruzzaman, Nadella, Srikanth, Fiechter, Joshua, Juvina, Ion, Minnery, Brandon, Hitzler, Pascal, Schwartz, Joshua, Raymer, Michael

arXiv.org Artificial IntelligenceSep-27-2022

Concept induction, which is based on formal logical reasoning over description logics, has been used in ontology engineering in order to create ontology (TBox) axioms from the base data (ABox) graph. In this paper, we show that it can also be used to explain data differentials, for example in the context of Explainable AI (XAI), and we show that it can in fact be done in a way that is meaningful to a human observer.

explanation, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2209.1371

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Oceania > Australia (0.04)
(14 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry:

Transportation (0.46)
Leisure & Entertainment (0.46)
Retail (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Latent Variable Method Demonstrator -- Software for Understanding Multivariate Data Analytics Algorithms

Schaeffer, Joachim, Braatz, Richard

arXiv.org Artificial IntelligenceSep-26-2022

The ever-increasing quantity of multivariate process data is driving a need for skilled engineers to analyze, interpret, and build models from such data. Multivariate data analytics relies heavily on linear algebra, optimization, and statistics and can be challenging for students to understand given that most curricula do not have strong coverage in the latter three topics. This article describes interactive software - the Latent Variable Demonstrator (LAVADE) - for teaching, learning, and understanding latent variable methods. In this software, users can interactively compare latent variable methods such as Partial Least Squares (PLS), and Principal Component Regression (PCR) with other regression methods such as Least Absolute Shrinkage and Selection Operator (lasso), Ridge Regression (RR), and Elastic Net (EN). LAVADE helps to build intuition on choosing appropriate methods, hyperparameter tuning, and model coefficient interpretation, fostering a conceptual understanding of the algorithms' differences. The software contains a data generation method and three chemical process datasets, allowing for comparing results of datasets with different levels of complexity. LAVADE is released as open-source software so that others can apply and advance the tool for use in teaching or research.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.compchemeng.2022.108014

2205.08132

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.72)

Add feedback

On Extending Amdahl's law to Learn Computer Performance

Poolla, Chaitanya, Saxena, Rahul

arXiv.org Artificial IntelligenceSep-26-2022

The problem of learning parallel computer performance is investigated in the context of multicore processors. Given a fixed workload, the effect of varying system configuration on performance is sought. Conventionally, the performance speedup due to a single resource enhancement is formulated using Amdahl's law. However, in case of multiple configurable resources the conventional formulation results in several disconnected speedup equations that cannot be combined together to determine the overall speedup. To solve this problem, we propose to (1) extend Amdahl's law to accommodate multiple configurable resources into the overall speedup equation, and (2) transform the speedup equation into a multivariable regression problem suitable for machine learning. Using experimental data from fifty-eight tests spanning two benchmarks (SPECCPU 2017 and PCMark 10) and four hardware platforms (Intel Xeon 8180M, AMD EPYC 7702P, Intel CoffeeLake 8700K, and AMD Ryzen 3900X), analytical models are developed and cross-validated. Findings indicate that in most cases, the models result in an average cross-validated accuracy higher than 95%, thereby validating the proposed extension of Amdahl's law. The proposed methodology enables rapid generation of multivariable analytical models to support future industrial development, optimization, and simulation needs.

amdahl, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2110.07822

Country:

North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback

Sampling Constrained Continuous Probability Distributions: A Review

Lan, Shiwei, Kang, Lulu

arXiv.org Machine LearningSep-25-2022

The problem of sampling constrained continuous distributions has frequently appeared in many machine/statistical learning models. Many Monte Carlo Markov Chain (MCMC) sampling methods have been adapted to handle different types of constraints on the random variables. Among these methods, Hamilton Monte Carlo (HMC) and the related approaches have shown significant advantages in terms of computational efficiency compared to other counterparts. In this article, we first review HMC and some extended sampling methods, and then we concretely explain three constrained HMC-based sampling methods, reflection, reformulation, and spherical HMC. For illustration, we apply these methods to solve three well-known constrained sampling problems, truncated multivariate normal distributions, Bayesian regularized regression, and nonparametric density estimation. In this review, we also connect constrained sampling with another similar problem in the statistical design of experiments of constrained design space. Keywords: constrained sampling; Hamilton Monte Carlo; Riemannian Monte Carlo; regularized regression; truncated multivariate Gaussian.

artificial intelligence, constraint, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1002/wics.1608

2209.12403

Country:

North America > United States > New York (0.04)
North America > United States > Illinois (0.04)
North America > United States > Arizona (0.04)
(3 more...)

Genre: Overview (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future

Klie, Jan-Christoph, Webber, Bonnie, Gurevych, Iryna

arXiv.org Artificial IntelligenceSep-25-2022

Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that several popular datasets contain a surprising amount of annotation errors or inconsistencies. To alleviate this issue, many methods for annotation error detection have been devised over the years. While researchers show that their approaches work well on their newly introduced datasets, they rarely compare their methods to previous work or on the same datasets. This raises strong concerns on methods' general performance and makes it difficult to asses their strengths and weaknesses. We therefore reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets for text classification as well as token and span labeling. In addition, we define a uniform evaluation setup including a new formalization of the annotation error detection task, evaluation protocol and general best practices. To facilitate future research and reproducibility, we release our datasets and implementations in an easy-to-use and open source software package.

machine learning, natural language, text classification, (18 more...)

arXiv.org Artificial Intelligence

2206.0228

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Greater London > London > Wimbledon (0.04)
(48 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (1.00)
Leisure & Entertainment > Sports (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.67)
(2 more...)

Add feedback

Interpretable Machine Learning Models for Modal Split Prediction in Transportation Systems

Brenner, Aron, Wu, Manxi, Amin, Saurabh

arXiv.org Artificial IntelligenceSep-24-2022

Modal split prediction in transportation networks has the potential to support network operators in managing traffic congestion and improving transit service reliability. We focus on the problem of hourly prediction of the fraction of travelers choosing one mode of transportation over another using high-dimensional travel time data. We use logistic regression as base model and employ various regularization techniques for variable selection to prevent overfitting and resolve multicollinearity issues. Importantly, we interpret the prediction accuracy results with respect to the inherent variability of modal splits and travelers' aggregate responsiveness to changes in travel time. By visualizing model parameters, we conclude that the subset of segments found important for predictive accuracy changes from hour-to-hour and include segments that are topologically central and/or highly congested. We apply our approach to the San Francisco Bay Area freeway and rapid transit network and demonstrate superior prediction accuracy and interpretability of our method compared to pre-specified variable selection methods.

artificial intelligence, machine learning, travel time, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ITSC55140.2022.9921938

2203.14191

Country:

North America > United States > California > San Francisco County > San Francisco (0.25)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.24)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.49)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback

Communication-Efficient {Federated} Learning Using Censored Heavy Ball Descent

Chen, Yicheng, Blum, Rick S., Sadler, Brian M.

arXiv.org Artificial IntelligenceSep-24-2022

Distributed machine learning enables scalability and computational offloading, but requires significant levels of communication. Consequently, communication efficiency in distributed learning settings is an important consideration, especially when the communications are wireless and battery-driven devices are employed. In this paper we develop a censoring-based heavy ball (CHB) method for distributed learning in a server-worker architecture. Each worker self-censors unless its local gradient is sufficiently different from the previously transmitted one. The significant practical advantages of the HB method for learning problems are well known, but the question of reducing communications has not been addressed. CHB takes advantage of the HB smoothing to eliminate reporting small changes, and provably achieves a linear convergence rate equivalent to that of the classical HB method for smooth and strongly convex objective functions. The convergence guarantee of CHB is theoretically justified for both convex and nonconvex cases. In addition we prove that, under some conditions, at least half of all communications can be eliminated without any impact on convergence rate. Extensive numerical results validate the communication efficiency of CHB on both synthetic and real datasets, for convex, nonconvex, and nondifferentiable cases. Given a target accuracy, CHB can significantly reduce the number of communications compared to existing algorithms, achieving the same accuracy without slowing down the optimization process.

artificial intelligence, communication, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2209.11944

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Washington > King County > Redmond (0.04)
North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Law > Civil Rights & Constitutional Law (0.75)
Leisure & Entertainment > Sports > Tennis (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.33)

Add feedback

Dead or Murdered? Predicting Responsibility Perception in Femicide News Reports

Minnema, Gosse, Gemelli, Sara, Zanchi, Chiara, Caselli, Tommaso, Nissim, Malvina

arXiv.org Artificial IntelligenceSep-24-2022

Different linguistic expressions can conceptualize the same event from different viewpoints by emphasizing certain participants over others. Here, we investigate a case where this has social consequences: how do linguistic expressions of gender-based violence (GBV) influence who we perceive as responsible? We build on previous psycholinguistic research in this area and conduct a large-scale perception survey of GBV descriptions automatically extracted from a corpus of Italian newspapers. We then train regression models that predict the salience of GBV participants with respect to different dimensions of perceived responsibility. Our best model (fine-tuned BERT) shows solid overall performance, with large differences between dimensions and participants: salient _focus_ is more predictable than salient _blame_, and perpetrators' salience is more predictable than victims' salience. Experiments with ridge regression models using different representations show that features based on linguistic theory similarly to word-based features. Overall, we show that different linguistic choices do trigger different perceptions of responsibility, and that such perceptions can be modelled automatically. This work can be a core instrument to raise awareness of the consequences of different perspectivizations in the general public and in news producers alike.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2209.1203

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy (0.05)
Africa > Togo > Maritime Region > Lome (0.04)
(5 more...)

Genre: Questionnaire & Opinion Survey (1.00)

Industry:

Law > Criminal Law (0.88)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.88)
Education > Educational Setting > Higher Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback