Promises and Pitfalls of Threshold-based Auto-labeling
Creating large-scale, high-quality labeled datasets is a major bottleneck in supervised machine learning workflows. Threshold-based auto-labeling (TBAL), where validation data obtained from humans is used to find a confidence threshold above which the data is machine-labeled, reduces reliance on manual annotation. TBAL is emerging as a widely used solution in practice. Given the long shelf-life and diverse usage of the resulting datasets, understanding when the data obtained by such auto-labeling systems can be relied on is crucial. This is the first work to analyze TBAL systems and derive sample complexity bounds on the amount of human-labeled validation data required for guaranteeing the quality of machine-labeled data. Our results provide two crucial insights. First, reasonable chunks of unlabeled data can be automatically and accurately labeled by seemingly bad models. Second, a hidden downside of TBAL systems is potentially prohibitive validation data usage. Together, these insights describe the promises and pitfalls of using such systems.
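To make the workflow concrete, here is a minimal sketch of the threshold-selection step the abstract describes, with hypothetical function and variable names of our own (`tbal_threshold`, `target_err`, and so on); the paper's actual algorithm and its guarantees are more involved.

```python
import numpy as np

def tbal_threshold(val_conf, val_correct, target_err=0.05):
    """Return the smallest confidence threshold at which the model's error on
    human-labeled validation data drops to target_err or below.

    val_conf    -- model confidence for each validation point
    val_correct -- 1 if the model's label matched the human label, else 0
    """
    for t in np.unique(val_conf):                 # candidate thresholds, ascending
        covered = val_conf >= t
        err = 1.0 - val_correct[covered].mean()
        if err <= target_err:
            return t                              # smallest t meeting the error target
    return np.inf                                 # no threshold qualifies: auto-label nothing

def auto_label(unl_conf, unl_pred, threshold):
    """Machine-label only the unlabeled points whose confidence clears the threshold."""
    keep = unl_conf >= threshold
    return unl_pred[keep], keep

# Toy usage: synthetic confidences where higher confidence means more often correct.
rng = np.random.default_rng(0)
val_conf = rng.uniform(0.5, 1.0, size=200)
val_correct = (rng.uniform(size=200) < val_conf).astype(int)
print("chosen threshold:", tbal_threshold(val_conf, val_correct))
```

The validation set does double duty here (estimating the error at every candidate threshold), which is exactly why the paper's bounds on validation-data usage matter.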
The promises and pitfalls of Stochastic Gradient Langevin Dynamics
Stochastic Gradient Langevin Dynamics (SGLD) has emerged as a key MCMC algorithm for Bayesian learning from large-scale datasets. While SGLD with decreasing step sizes converges weakly to the posterior distribution, the algorithm is often used with a constant step size in practice and has demonstrated spectacular successes in machine learning tasks. The current practice is to set the step size inversely proportional to N, where N is the number of training samples. As N becomes large, we show that the SGLD algorithm has an invariant probability measure which significantly departs from the target posterior and behaves like Stochastic Gradient Descent (SGD). This difference is inherently due to the high variance of the stochastic gradients. Several strategies have been suggested to reduce this effect; among them, SGLD Fixed Point (SGLDFP) uses carefully designed control variates to reduce the variance of the stochastic gradients. We show that SGLDFP gives approximate samples from the posterior distribution, with an accuracy comparable to the Langevin Monte Carlo (LMC) algorithm, for a computational cost sublinear in the number of data points. We provide a detailed analysis of the Wasserstein distances between LMC, SGLD, SGLDFP and SGD and explicit expressions of the means and covariance matrices of their invariant distributions. Our findings are supported by limited numerical experiments.
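For reference, a constant-step-size SGLD update of the kind analyzed here can be sketched as follows. The toy Gaussian model and all names are ours, and the step size is set to 1/N as in the practice the abstract describes.

```python
import numpy as np

def sgld(grad_log_prior, grad_log_lik, data, step, n_iter, batch=32, seed=0):
    """Constant-step-size SGLD for a 1-D parameter (a minimal sketch).

    grad_log_lik(theta, x) returns per-datum gradients of log p(x | theta).
    The minibatch gradient is rescaled by N/batch so it stays unbiased.
    """
    rng = np.random.default_rng(seed)
    N = len(data)
    theta, samples = 0.0, []
    for _ in range(n_iter):
        idx = rng.integers(0, N, size=batch)
        grad = grad_log_prior(theta) + (N / batch) * grad_log_lik(theta, data[idx]).sum()
        theta += 0.5 * step * grad + np.sqrt(step) * rng.standard_normal()
        samples.append(theta)
    return np.array(samples)

# Toy model: x_i ~ N(theta, 1) with a N(0, 10) prior; step ~ 1/N as in practice.
data = np.random.default_rng(1).normal(1.0, 1.0, size=1000)
samples = sgld(lambda th: -th / 10.0,          # gradient of the Gaussian log prior
               lambda th, x: x - th,           # per-datum gradient of the Gaussian log-lik
               data, step=1.0 / len(data), n_iter=5000)
print("posterior mean estimate:", samples[1000:].mean())
```

The rescaled minibatch term is the source of the high gradient variance the abstract refers to; the control-variate fix is sketched further below, after the review of this paper.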
Response to Promises and Pitfalls of Deep Kernel Learning
Wilson, Andrew Gordon, Hu, Zhiting, Salakhutdinov, Ruslan, Xing, Eric P.
This note responds to "Promises and Pitfalls of Deep Kernel Learning" (Ober et al., 2021). The marginal likelihood of a Gaussian process can be compartmentalized into a data fit term and a complexity penalty. Ober et al. (2021) show that if a kernel can be multiplied by a signal variance coefficient, then reparametrizing and substituting in the maximized value of this parameter sets a reparametrized data fit term to a fixed value. They use this finding to argue that the complexity penalty, a log determinant of the kernel matrix, then dominates in determining the values of the other kernel hyperparameters, which can lead to data overcorrelation. By contrast, we show that the reparametrization in fact introduces another data-fit term which influences all other kernel hyperparameters. Thus, a balance between data fit and complexity still plays a significant role in determining kernel hyperparameters.
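The decomposition at issue is easy to check numerically. Below is a minimal sketch (a toy setup of our own, not code from either paper) showing that, for a kernel scaled as sigma^2 * K, substituting the maximizing signal variance sigma^2 = y^T K^{-1} y / n pins the data-fit term at -n/2, which is the observation Ober et al. (2021) start from.

```python
import numpy as np

def log_marginal_likelihood(K, y):
    """Split the GP log marginal likelihood into data-fit and complexity terms."""
    n = len(y)
    alpha = np.linalg.solve(K, y)
    data_fit = -0.5 * y @ alpha                    # -1/2 * y^T K^{-1} y
    sign, logdet = np.linalg.slogdet(K)
    complexity = -0.5 * logdet                     # -1/2 * log |K|
    return data_fit, complexity, data_fit + complexity - 0.5 * n * np.log(2 * np.pi)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
K = np.exp(-0.5 * (X - X.T) ** 2) + 1e-6 * np.eye(40)   # RBF kernel plus jitter
y = rng.multivariate_normal(np.zeros(40), K)

# Closed-form maximizer of the signal variance for the kernel sigma2 * K.
sigma2_hat = (y @ np.linalg.solve(K, y)) / len(y)
fit, comp, _ = log_marginal_likelihood(sigma2_hat * K, y)
print("data fit at optimal signal variance:", fit, "(should equal -n/2 =", -len(y) / 2, ")")
```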
Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions
Anzenberg, Eitan, Samajpati, Arunava, Chandrasekar, Sivasankaran, Kacholia, Varun
The use of large language models (LLMs) in hiring promises to streamline candidate screening, but it also raises serious concerns regarding accuracy and algorithmic bias where sufficient safeguards are not in place. In this work, we benchmark several state-of-the-art foundational LLMs - including models from OpenAI, Anthropic, Google, Meta, and Deepseek - and compare them with our proprietary domain-specific hiring model (Match Score) for job candidate matching. We evaluate each model's predictive accuracy (ROC AUC, Precision-Recall AUC, F1-score) and fairness (impact ratio of cut-off analysis across declared gender, race, and intersectional subgroups). Our experiments on a dataset of roughly 10,000 real-world recent candidate-job pairs show that Match Score outperforms the general-purpose LLMs on accuracy (ROC AUC 0.85 vs 0.77) and achieves significantly more equitable outcomes across demographic groups. Notably, Match Score attains a minimum race-wise impact ratio of 0.957 (near-parity), versus 0.809 or lower for the best LLMs (0.906 vs 0.773, respectively, for intersectional subgroups). We discuss why pretraining biases may cause LLMs with insufficient safeguards to propagate societal biases in hiring scenarios, whereas a bespoke supervised model can more effectively mitigate these biases. Our findings highlight the importance of domain-specific modeling and bias auditing when deploying AI in high-stakes domains such as hiring, and caution against relying on off-the-shelf LLMs for such tasks without extensive fairness safeguards. Furthermore, we show with empirical evidence that there need not be a dichotomy between accuracy and fairness in hiring: a well-designed algorithm can achieve both accurate hiring and fair outcomes.
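For readers unfamiliar with the fairness metric, an impact ratio from a cut-off analysis can be computed as below. This is a generic sketch with made-up data and names of our own choosing, not the authors' evaluation code.

```python
import numpy as np

def impact_ratio(scores, groups, cutoff):
    """Minimum group-wise selection rate divided by the maximum, at a score cutoff.

    A value of 1.0 is parity; values below 0.8 fail the common 'four-fifths' rule
    used in employment-discrimination analysis.
    """
    selected = scores >= cutoff
    rates = [selected[groups == g].mean() for g in np.unique(groups)]
    return min(rates) / max(rates)

# Hypothetical match scores for two declared groups.
rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)
groups = rng.choice(["A", "B"], size=1000)
print("impact ratio at cutoff 0.5:", round(impact_ratio(scores, groups, 0.5), 3))
```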
Reviews: The promises and pitfalls of Stochastic Gradient Langevin Dynamics
Review after rebuttal: I thank the author(s) for their response. While I still believe that this paper is a minor increment beyond what has already been done on SGLD, I agree that the message might be useful for some. I also appreciate the effort the authors have made in improving the manuscript based on reviewers' suggestions, particularly their efforts to include numerical experiments relevant to ML scenarios, and recommendations beyond the CV approach, which has been studied to exhaustion and is rarely applicable in practice. Based on this, I've adjusted my decision to marginally above threshold. Original review: In the paper "The promises and pitfalls of Stochastic Gradient Langevin Dynamics" the authors revisit the Stochastic Gradient Langevin Dynamics (SGLD) approach to approximately sampling from a probability distribution using stochastic gradients (specifically, subsampling). The authors compare a number of different classes of approximate inference methods, including SGLD, LMC (known by some as the Unadjusted Langevin Algorithm, or ULA) and Stochastic Gradient Langevin Dynamics Fixed Point (SGLDFP) -- the latter being a variant of SGLD with a control variate exploiting the unimodality of the distribution, similar to what has been presented in [3, 25 and others].
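The control-variate construction the review refers to can be sketched as follows: center the minibatch gradient at a precomputed mode estimate, so the noise shrinks as the chain stays near the mode. The toy model (Gaussian likelihood, flat prior for simplicity) and all names are ours, not the paper's notation.

```python
import numpy as np

def grad_i(theta, x):
    """Per-datum gradient of log p(x | theta) for x_i ~ N(theta, 1)."""
    return x - theta

def sgldfp(data, theta_star, step, n_iter, batch=32, seed=0):
    """SGLDFP-style sampler: minibatch gradient centered at the mode theta_star."""
    rng = np.random.default_rng(seed)
    N = len(data)
    g_star = grad_i(theta_star, data)        # per-datum gradients at the mode, cached once
    theta, samples = theta_star, []
    for _ in range(n_iter):
        idx = rng.integers(0, N, size=batch)
        # Unbiased: the subtracted term has expectation equal to the full gradient
        # at theta_star, which is zero at the mode. In this Gaussian toy the
        # per-datum differences are identical, so the noise cancels exactly; in
        # general the variance shrinks like (theta - theta_star)^2 near the mode.
        grad = (N / batch) * (grad_i(theta, data[idx]) - g_star[idx]).sum()
        theta += 0.5 * step * grad + np.sqrt(step) * rng.standard_normal()
        samples.append(theta)
    return np.array(samples)

data = np.random.default_rng(1).normal(1.0, 1.0, size=1000)
theta_star = data.mean()                      # MLE = posterior mode under a flat prior
samples = sgldfp(data, theta_star, step=1.0 / len(data), n_iter=5000)
print("posterior std estimate:", samples[1000:].std())   # true value is 1/sqrt(N)
```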
Promises and pitfalls of artificial intelligence for legal applications
Kapoor, Sayash, Henderson, Peter, Narayanan, Arvind
Is AI set to redefine the legal profession? We argue that this claim is not supported by the current evidence. We dive into AI's increasingly prevalent roles in three types of legal tasks: information processing; tasks involving creativity, reasoning, or judgment; and predictions about the future. We find that the ease of evaluating legal applications varies greatly across legal tasks, based on the ease of identifying correct answers and the observability of information relevant to the task at hand. Tasks that would lead to the most significant changes to the legal profession are also the ones most prone to overoptimism about AI capabilities, as they are harder to evaluate. We make recommendations for better evaluation and deployment of AI in legal contexts.
Assessing the Promise and Pitfalls of ChatGPT for Automated Code Generation
Khan, Muhammad Fawad Akbar, Ramsdell, Max, Falor, Erik, Karimi, Hamid
This paper presents a comprehensive evaluation of the code generation capabilities of ChatGPT, a prominent large language model, compared to human programmers. A novel dataset of 131 code-generation prompts across 5 categories was curated to enable robust analysis. Code solutions were generated by both ChatGPT and humans for all prompts, resulting in 262 code samples. A meticulous manual assessment methodology prioritized evaluating correctness, comprehensibility, and security using 14 established code quality metrics. The key findings reveal ChatGPT's strengths in crafting concise, efficient code with advanced constructs, with particular strength in data analysis tasks (93.1% accuracy) but limitations in visual-graphical challenges. Comparative analysis with human code highlights ChatGPT's inclination towards modular design and superior error handling. Additionally, machine learning models effectively distinguished ChatGPT from human code with up to 88% accuracy, suggesting detectable coding style disparities. By providing profound insights into ChatGPT's code generation capabilities and limitations through quantitative metrics and qualitative analysis, this study makes valuable contributions toward advancing AI-based programming assistants. The curated dataset and methodology offer a robust foundation for future research in this nascent domain. All data and code are available at https://github.com/DSAatUSU/ChatGPT-promises-and-pitfalls.
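The paper's specific features and classifiers are not reproduced here, but the general recipe for distinguishing model-written from human-written code can be sketched with off-the-shelf tools; the toy corpus below is invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical corpus: code strings paired with author labels (1 = ChatGPT, 0 = human).
samples = [
    "def mean(xs):\n    return sum(xs) / len(xs)",
    "def mean(xs):\n    s = 0\n    for x in xs: s += x\n    return s / len(xs)",
] * 20
labels = [1, 0] * 20

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-grams capture style
    LogisticRegression(max_iter=1000),
)
print("CV accuracy:", cross_val_score(clf, samples, labels, cv=5).mean())
```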
Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization
Kristiadi, Agustinus, Immer, Alexander, Eschenhagen, Runa, Fortuin, Vincent
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks. It is theoretically compelling since it can be seen as a Gaussian process posterior with the mean function given by the neural network's maximum-a-posteriori predictive function and the covariance function induced by the empirical neural tangent kernel. However, while its efficacy has been studied in large-scale tasks like image classification, it has not been studied in sequential decision-making problems like Bayesian optimization where Gaussian processes -- with simple mean functions and kernels such as the radial basis function -- are the de facto surrogate models. In this work, we study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility. However, we also present some pitfalls that might arise and a potential problem with the LLA when the search space is unbounded.
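A sketch of the LLA predictive the abstract describes, using fixed random features in place of a network Jacobian so that the linearization is exact and easy to verify; for a real network, J would be the Jacobian of the output with respect to the weights at the MAP estimate. All names here are ours. How the predictive variance behaves far from the data is exactly what matters for the unbounded-search-space pitfall the abstract mentions.

```python
import numpy as np

def lla_predict(J_train, y, J_test, f_test_map, prior_prec=1.0, noise_var=0.1):
    """GP-style predictive: mean from the MAP function, covariance from the
    'tangent' features J under a GGN/Laplace approximation."""
    d = J_train.shape[1]
    H = J_train.T @ J_train / noise_var + prior_prec * np.eye(d)  # posterior precision
    cov = J_test @ np.linalg.solve(H, J_test.T)                   # predictive covariance
    return f_test_map, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
W = rng.normal(size=(1, 64))                  # random feature projection
phi = lambda x: np.tanh(x @ W)                # features = Jacobian of a linear-in-params model
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
w_map = np.linalg.solve(phi(X).T @ phi(X) + 0.1 * np.eye(64), phi(X).T @ y)

X_test = np.linspace(-6, 6, 5).reshape(-1, 1)  # includes points outside the data range
mean, std = lla_predict(phi(X), y, phi(X_test), phi(X_test) @ w_map)
print(np.round(std, 3))                        # predictive std, near and far from the data
```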
Three experts discuss Midjourney's promise and pitfalls
This summer, text-to-image AIs have captured the imagination of architects. The software is a powerful tool, but one that should be integrated into ongoing discussions of architectural image making, technology, representation, bias, education, and labor. AN gathered Kory Bieg, Shelby Doyle, and Andrew Kudless to discuss these issues. The Architect's Newspaper: To start, could you share how you've been using Midjourney and related AI platforms so far? What kinds of explorations have you done? What types of images have you been making? So far, it's been for open exploration. I'm trying to understand how to communicate with AI. On one hand, you can write a text and hope to get something that's related to the text.