AITopics | least-square loss

least-square loss

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Residual-as-Teacher: Mitigating Bias Propagation in Student–Teacher Estimation

Yamamoto, Kakei, Wainwright, Martin J.

arXiv.org Machine Learning, Mar-27-2026

We study statistical estimation in a student–teacher setting, where predictions from a pre-trained teacher are used to guide a student model. A standard approach is to train the student to directly match the teacher's outputs, which we refer to as student soft matching (SM). This approach directly propagates any systematic bias or mis-specification present in the teacher, thereby degrading the student's predictions. We propose and analyze an alternative scheme, which we call residual-as-teacher (RaT), in which the teacher is used to estimate residuals in the student's predictions. Our analysis shows how the student can thereby emulate a proximal gradient scheme for solving an oracle optimization problem, which provably reduces the effect of teacher bias. For general student–teacher pairs, we establish non-asymptotic excess risk bounds for any RaT fixed point, along with convergence guarantees for the iterative student–teacher scheme. For kernel-based student–teacher pairs, we prove a sharp separation: RaT achieves the minimax-optimal rate, whereas SM incurs a constant prediction error for any sample size. Experiments on both synthetic data and ImageNette classification under covariate shift corroborate our theoretical findings.
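To make the contrast between SM and RaT concrete, here is a hypothetical toy sketch in Python (NumPy). The abstract does not give the RaT update rule, so everything below is an assumption: the teacher is a linear least-squares fit whose predictions are shrunk by a constant factor `shrink` (a stand-in for systematic teacher bias), SM copies the teacher's predictions of the labels, and the RaT-style loop repeatedly asks the teacher to fit the student's current residuals and adds that correction. This residual-fitting loop is a simple boosting-like iteration, not the paper's proximal gradient scheme, and all function names are illustrative only.

```python
import numpy as np

def biased_teacher(X, targets, shrink=0.5):
    # Hypothetical teacher: least-squares fit of `targets` on X, with the
    # predictions shrunk toward zero to mimic a systematic bias.
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return shrink * (X @ coef)

def soft_matching(X, y, shrink=0.5):
    # SM (toy version): the student simply copies the teacher's predictions of y.
    return biased_teacher(X, y, shrink)

def residual_as_teacher(X, y, shrink=0.5, n_iter=50):
    # RaT-style toy loop (assumption, not the paper's algorithm): the teacher
    # fits the student's current residuals and the student adds the correction.
    pred = np.zeros_like(y)
    for _ in range(n_iter):
        pred = pred + biased_teacher(X, y - pred, shrink)
    return pred

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5])  # noiseless linear target

sm_err = np.mean((soft_matching(X, y) - y) ** 2)
rat_err = np.mean((residual_as_teacher(X, y) - y) ** 2)
```

In this noiseless toy, the SM prediction stays shrunk by the teacher's factor no matter how much data is available, while the residual loop contracts the remaining error by a factor of `1 - shrink` per step. Its fixed point is where the teacher finds no residual left to correct, so the teacher's shrinkage no longer affects the final prediction.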

artificial intelligence, data mining, machine learning, (21 more...)

2603.25466

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Education (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Reviews: DAGs with NO TEARS: Continuous Optimization for Structure Learning

Neural Information Processing Systems, Oct-8-2024, 07:47:47 GMT

The authors study the problem of structure learning for Bayesian networks. Conventional methods for this task are either constraint-based or score-based; score-based methods optimize a discrete score function over the set of DAGs, subject to a combinatorial acyclicity constraint. Unlike these approaches, the authors formulate the problem as a continuous optimization problem over real matrices, which performs a global search and can be solved with standard numerical algorithms. The main idea is to enforce acyclicity of the estimated structure through an equality constraint on a smooth function of the weighted adjacency matrix. The paper is very well written and enjoyable to read.
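The review does not write out the smooth function. For reference, the original NOTEARS paper (Zheng et al., 2018) uses h(W) = tr(exp(W ∘ W)) − d, where W is the d × d weighted adjacency matrix (W[i, j] ≠ 0 meaning an edge i → j) and ∘ is the elementwise product; h(W) = 0 exactly when W has no directed cycle. The sketch below evaluates h and its gradient with NumPy and SciPy. The helper names and example matrices are hypothetical, and the full solver (the augmented Lagrangian loop in the paper) is not shown.

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    # NOTEARS constraint h(W) = tr(exp(W * W)) - d, with * the elementwise product.
    # h(W) >= 0, and h(W) == 0 exactly when the weighted graph W has no directed cycle.
    d = W.shape[0]
    return np.trace(expm(W * W)) - d

def acyclicity_grad(W):
    # Closed-form gradient of h: exp(W * W)^T multiplied elementwise by 2W.
    return expm(W * W).T * 2 * W

# Hypothetical example: node 0 -> node 1 -> node 2 (a DAG).
dag = np.array([[0.0, 1.5, 0.0],
                [0.0, 0.0, -0.7],
                [0.0, 0.0, 0.0]])
cyclic = dag.copy()
cyclic[2, 0] = 0.3  # adds the edge 2 -> 0, closing a directed cycle

h_dag = acyclicity(dag)
h_cyclic = acyclicity(cyclic)
```

Because h is smooth and its gradient is available in closed form, the acyclicity requirement can be handled by standard constrained-optimization machinery instead of a combinatorial search over DAGs.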

continuous optimization, graph, score function, (12 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.92)

On the Role of Entropy-based Loss for Learning Causal Structures with Continuous Optimization

Cai, Ruichu, Chen, Weilin, Qiao, Jie, Hao, Zhifeng

arXiv.org Artificial Intelligence, Jun-17-2021

Causal discovery from observational data is an important but challenging task in many scientific fields. Recently, NOTEARS [Zheng et al., 2018] formulated causal structure learning as a continuous optimization problem using a least-squares loss with an acyclicity constraint. Although the least-squares loss is well justified under the standard Gaussian noise assumption, it is limited when that assumption does not hold. In this work, we show theoretically that violating the Gaussian noise assumption hinders identification of the causal direction: the inferred orientation becomes fully determined by the causal strength and the noise variances in the linear case, and by strongly non-Gaussian noise in the nonlinear case. We therefore propose a more general entropy-based loss that is theoretically consistent with the likelihood score under any noise distribution. We run extensive empirical evaluations on both synthetic and real-world data to validate the effectiveness of the proposed method; it achieves the best Structural Hamming Distance, False Discovery Rate, and True Positive Rate.
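The claim that the least-squares loss orients edges by scale can be checked on a two-variable linear model. In the sketch below (Python/NumPy; all function names are hypothetical), the least-squares score of a candidate graph is half the sum of residual variances, as in NOTEARS, and for contrast an entropy score is computed under the Gaussian assumption H = ½ log(2πe σ²). This Gaussian-entropy score is only a special case used for illustration; it is not the paper's general entropy-based loss, which would need an entropy estimate for non-Gaussian residuals.

```python
import numpy as np

def residual_variances(root, child):
    # OLS fit of child on root (no intercept; inputs are assumed centered).
    coef = np.dot(root, child) / np.dot(root, root)
    return np.var(root), np.var(child - coef * root)

def least_squares_score(root, child):
    # Least-squares loss for the graph root -> child:
    # half the sum of residual variances (the root's "residual" is itself).
    v_root, v_res = residual_variances(root, child)
    return 0.5 * (v_root + v_res)

def gaussian_entropy_score(root, child):
    # Sum of residual entropies, assuming each residual is Gaussian:
    # H = 0.5 * log(2*pi*e*variance). Gaussian special case only.
    v_root, v_res = residual_variances(root, child)
    return sum(0.5 * np.log(2 * np.pi * np.e * v) for v in (v_root, v_res))

rng = np.random.default_rng(1)
n = 5000
a = rng.normal(size=n)
b = 2.0 * a + rng.normal(size=n)  # ground truth: A -> B
a -= a.mean()
b -= b.mean()

# Does least squares prefer A -> B (lower score) over B -> A?
ls_prefers_ab = least_squares_score(a, b) < least_squares_score(b, a)
# Rescale A (e.g. a change of units) so that Var(10*A) exceeds Var(B).
ls_prefers_ab_rescaled = least_squares_score(10 * a, b) < least_squares_score(b, 10 * a)
# Score gaps for the Gaussian-entropy score, before and after rescaling.
ent_gap = gaussian_entropy_score(a, b) - gaussian_entropy_score(b, a)
ent_gap_rescaled = gaussian_entropy_score(10 * a, b) - gaussian_entropy_score(b, 10 * a)
```

With Var(A) ≈ 1 and Var(B) ≈ 5, least squares picks the true direction A → B, but merely rescaling A by 10 flips its choice, because the score gap equals ½ρ²(Var(A) − Var(B)), with ρ the sample correlation. The Gaussian-entropy score ties in both cases, as expected for Markov-equivalent directions under Gaussian noise; telling the directions apart requires non-Gaussian residuals and a matching entropy estimate.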

additive noise model, causal direction, least-square loss, (10 more...)

2106.02835

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)