AITopics

2501.1599

Country:

Europe > Spain > Galicia > Madrid (0.07)
North America > United States (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Greece (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.86)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

arXiv.org Artificial Intelligence, Jan-27-2025

Governing AI Beyond the Pretraining Frontier

Caputo, Nicholas A.

This year, jurisdictions worldwide, including the United States, the European Union, the United Kingdom, and China, are set to enact or revise laws governing frontier AI. Their efforts largely rely on the assumption that increasing model scale through pretraining is the path to more advanced AI capabilities. Yet growing evidence suggests that this "pretraining paradigm" may be hitting a wall and major AI companies are turning to alternative approaches, like inference-time "reasoning," to boost capabilities instead. This paradigm shift presents fundamental challenges for the frontier AI governance frameworks that target pretraining scale as a key bottleneck useful for monitoring, control, and exclusion, threatening to undermine this new legal order as it emerges. This essay seeks to identify these challenges and point to new paths forward for regulation. First, we examine the existing frontier AI regulatory regime and analyze some key traits and vulnerabilities. Second, we introduce the concept of the "pretraining frontier," the capabilities threshold made possible by scaling up pretraining alone, and demonstrate how it could make the regulatory field more diffuse and complex and lead to new forms of competition. Third, we lay out a regulatory approach that focuses on increasing transparency and leveraging new natural technical bottlenecks to effectively oversee changing frontier AI development while minimizing regulatory burdens and protecting fundamental rights. Our analysis provides concrete mechanisms for governing frontier AI systems across diverse technical paradigms, offering policymakers tools for addressing both current and future regulatory challenges in frontier AI.

frontier ai, paradigm, regulation, (15 more...)

2502.15719

Country:

Asia > China (0.89)
Europe > United Kingdom (0.66)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(8 more...)

Genre: Research Report (0.64)

Industry:

Law > Statutes (1.00)
Information Technology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Regional Government > Europe Government (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
(2 more...)

Neural Information Processing Systems, Jan-26-2025, 14:16:53 GMT

Review for NeurIPS paper: Investigating Gender Bias in Language Models Using Causal Mediation Analysis

Only the reporting clause is examined, while the that-clause containing the statement is ignored: In previous bias probing studies, the input is the entire sentence with its complete context. However, in this paper, only the prompt part (the reporting clause) is fed to the language model for analysis. Therefore, the proposed intervention setup effectively focuses only on word-level bias probing. In the templates shown in Figure 8 in the Appendix, the verb "cry" or "drive" could embody implicit bias. However, under the current framework, such potential biases are not investigated. Therefore, the conclusion drawn in this study, that gender bias effects are concentrated in specific components of the model, may not generalize well when more complex syntactic and semantic structures and interactions are considered.

causal mediation analysis, language model, neurips paper, (3 more...)

Industry: Law > Alternative Dispute Resolution (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Neural Information Processing Systems, Jan-26-2025, 14:16:46 GMT

Review for NeurIPS paper: Investigating Gender Bias in Language Models Using Causal Mediation Analysis

The paper studies bias in neural models using an approach based on causal mediation analysis. It focuses on pre-trained transformer language models, specifically GPT-2. The proposed method of using mediation analysis to analyze attention heads and neurons through interventions is novel and interesting, and can be generalized to other types of biases. The paper is well-written, and the experiments are thorough.

causal mediation analysis, language model, neurips paper, (1 more...)

Industry: Law > Alternative Dispute Resolution (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.38)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.38)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)

ESGSenticNet: A Neurosymbolic Knowledge Base for Corporate Sustainability Analysis

Ong, Keane, Mao, Rui, Xing, Frank, Satapathy, Ranjan, Sulaeman, Johan, Cambria, Erik, Mengaldo, Gianmarco

Evaluating corporate sustainability performance is essential to drive sustainable business practices, amid the need for a more sustainable economy. However, this is hindered by the complexity and volume of corporate sustainability data (i.e., sustainability disclosures), not least by the effectiveness of the NLP tools used to analyse them. To this end, we identify three primary challenges (immateriality, complexity, and subjectivity) that exacerbate the difficulty of extracting insights from sustainability disclosures. To address these issues, we introduce ESGSenticNet, a publicly available knowledge base for sustainability analysis. ESGSenticNet is constructed from a neurosymbolic framework that integrates specialised concept parsing, GPT-4o inference, and semi-supervised label propagation, together with a hierarchical taxonomy. This approach culminates in a structured knowledge base of 44k knowledge triplets, e.g. ('halve carbon emission', supports, 'emissions control'), for effective sustainability analysis. Experiments indicate that ESGSenticNet, when deployed as a lexical method, more effectively captures relevant and actionable sustainability information from sustainability disclosures compared to state-of-the-art baselines. Besides capturing a high number of unique ESG topic terms, ESGSenticNet outperforms baselines on the ESG relatedness and ESG action orientation of these terms by 26% and 31% respectively. These metrics describe the extent to which topic terms are related to ESG and depict an action toward ESG. Moreover, when deployed as a lexical method, ESGSenticNet does not require any training, which gives it a key advantage in simplicity for non-technical stakeholders.
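As a rough illustration of what using a triplet knowledge base as a lexical method could look like, the sketch below matches concept phrases in a sentence against a tiny in-memory table. Only the triplet ('halve carbon emission', supports, 'emissions control') comes from the abstract; the second entry, the `Triplet` type, the function name `match_concepts`, and the plain substring matching are assumptions made for illustration, not ESGSenticNet's actual concept parsing.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    concept: str
    relation: str
    topic: str


# The first triplet is quoted in the abstract; the second is a made-up placeholder.
TRIPLETS = [
    Triplet("halve carbon emission", "supports", "emissions control"),
    Triplet("reduce water usage", "supports", "water management"),  # hypothetical
]


def match_concepts(text: str, triplets: list[Triplet] = TRIPLETS) -> list[Triplet]:
    """Return every triplet whose concept phrase occurs verbatim in the text."""
    lowered = text.lower()
    return [t for t in triplets if t.concept in lowered]


hits = match_concepts("We pledge to halve carbon emission by 2030.")
```

No model is trained here, only a lookup is performed, which mirrors the abstract's point about simplicity. A realistic matcher would likely need to handle inflection and word order rather than exact substrings.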

large language model, machine learning, natural language, (22 more...)

2501.1572

Country: Asia > India (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Water & Waste Management > Solid Waste Management (1.00)
Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Beyond Benchmarks: On The False Promise of AI Regulation

Stanovsky, Gabriel, Keydar, Renana, Perl, Gadi, Habba, Eliya

The rapid advancement of artificial intelligence (AI) systems in critical domains like healthcare, justice, and social services has sparked numerous regulatory initiatives aimed at ensuring their safe deployment. Current regulatory frameworks, exemplified by recent US and EU efforts, primarily focus on procedural guidelines while presuming that scientific benchmarking can effectively validate AI safety, similar to how crash tests verify vehicle safety or clinical trials validate drug efficacy. However, this approach fundamentally misunderstands the unique technical challenges posed by modern AI systems. Through systematic analysis of successful technology regulation case studies, we demonstrate that effective scientific regulation requires a causal theory linking observable test outcomes to future performance - for instance, how a vehicle's crash resistance at one speed predicts its safety at lower speeds. We show that deep learning models, which learn complex statistical patterns from training data without explicit causal mechanisms, preclude such guarantees. This limitation renders traditional regulatory approaches inadequate for ensuring AI safety. Moving forward, we call for regulators to reckon with this limitation, and propose a preliminary two-tiered regulatory framework that acknowledges these constraints: mandating human oversight for high-risk applications while developing appropriate risk communication strategies for lower-risk uses. Our findings highlight the urgent need to reconsider fundamental assumptions in AI regulation and suggest a concrete path forward for policymakers and researchers.

artificial intelligence, machine learning, natural language, (15 more...)

2501.15693

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.04)
(9 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.66)

Industry:

Law > Statutes (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > Europe Government (0.68)
Government > Regional Government > North America Government > United States Government > FDA (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

Be Intentional About Fairness!: Fairness, Size, and Multiplicity in the Rashomon Set

Dai, Gordon, Ravishankar, Pavan, Yuan, Rachel, Neill, Daniel B., Black, Emily

This phenomenon, often called the Rashomon effect [7], predictive multiplicity [22], or model multiplicity [5], has wide-ranging implications for both understanding and improving fairness, as these equally accurate models often differ substantially in other properties such as fairness [21, 28] or model simplicity [29-31]. As prior work has pointed out, this multiplicity of models can be viewed as both a fairness opportunity and a concern [5, 10]. On the positive side, legal scholarship has pointed to the fact that model multiplicity is relevant to how to interpret and enforce U.S. anti-discrimination law, and specifically, can strengthen the disparate impact doctrine to more effectively combat algorithmic discrimination [3]. In a recent paper, Black et al. [3] suggest that the phenomenon of model multiplicity could support a reading of the disparate impact doctrine that requires companies to proactively search the set of equally accurate models for less discriminatory alternatives that have equivalent accuracy to a base model deemed acceptable for deployment from a model performance perspective. On the negative side, several scholars have pointed out that facially similar models, with equivalent accuracy but differences in their individual predictions, can suggest that some model decisions are arbitrary since they seem to be made on the basis of model choice that does not impact performance (e.g., a <1% change in a model's training set accuracy) [2, 17, 22]. This arbitrariness can impact model explanations and recourse as well: individuals with decisions that are unstable across small model changes may not receive reliable explanations for their model outcome, or ways to change it [4, 6, 25]. Further, if there is a group-based asymmetry of arbitrariness (e.g., if female loan applicants have more arbitrariness in their decisions than male loan applicants), this could lead to a group-based equity concern in and of itself.
Understanding the extent of the benefits and risks of model multiplicity relies upon an understanding of the properties of the Rashomon set, or the set of approximately equally accurate models for a given prediction task, i.e., equally accurate up to

artificial intelligence, machine learning, rashomon, (16 more...)

2501.15634

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Law > Civil Rights & Constitutional Law (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Assessing and Predicting Air Pollution in Asia: A Regional and Temporal Study (2018-2023)

Rahman, Anika, Khatun, Mst. Taskia

This study analyzes and predicts air pollution in Asia, focusing on PM2.5 levels from 2018 to 2023 across five regions: Central, East, South, Southeast, and West Asia. South Asia emerged as the most polluted region, with Bangladesh, India, and Pakistan consistently having the highest PM2.5 levels, and with death rates especially high in Nepal, Pakistan, and India. East Asia showed the lowest pollution levels. K-means clustering categorized countries into high, moderate, and low pollution groups. The ARIMA model effectively predicted 2023 PM2.5 levels (MAE: 3.99, MSE: 33.80, RMSE: 5.81, R: 0.86). The findings emphasize the need for targeted interventions to address severe pollution and health risks in South Asia.
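The reported error metrics are related in a standard way: RMSE is the square root of MSE, so the quoted MSE of 33.80 implies an RMSE of about 5.81, which matches the abstract. The sketch below computes MAE, MSE, and RMSE for a pair of series; the function name `forecast_errors` and the sample values are illustrative assumptions, not data from the study.

```python
import math


def forecast_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    mse = sum(r * r for r in residuals) / len(residuals)
    return mae, mse, math.sqrt(mse)


# Made-up observed and forecast values, for illustration only.
mae, mse, rmse = forecast_errors([50.0, 42.0, 30.0], [47.0, 46.0, 30.0])

# Consistency check on the figures quoted in the abstract.
implied_rmse = round(math.sqrt(33.80), 2)
```

Because MSE squares each residual, a few large misses raise RMSE well above MAE, which is one plausible reading of the gap between the reported 3.99 and 5.81.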

artificial intelligence, deep learning, machine learning, (18 more...)

2501.1559

Country:

Asia > India (0.46)
Asia > Pakistan (0.45)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.27)
(38 more...)

Genre: Research Report (1.00)

Industry:

Energy (0.93)
Health & Medicine > Public Health (0.73)
Law > Environmental Law (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

The Potential of Large Language Models in Supply Chain Management: Advancing Decision-Making, Efficiency, and Innovation

Aghaei, Raha, Kiaei, Ali A., Boush, Mahnaz, Vahidi, Javad, Barzegar, Zeynab, Rofoosheh, Mahan

The integration of large language models (LLMs) into supply chain management (SCM) is revolutionizing the industry by improving decision-making, predictive analytics, and operational efficiency. This white paper explores the transformative impact of LLMs on various SCM functions, including demand forecasting, inventory management, supplier relationship management, and logistics optimization. By leveraging advanced data analytics and real-time insights, LLMs enable organizations to optimize resources, reduce costs, and improve responsiveness to market changes. Key findings highlight the benefits of integrating LLMs with emerging technologies such as IoT, blockchain, and robotics, which together create smarter and more autonomous supply chains. Ethical considerations, including bias mitigation and data protection, are taken into account to ensure fair and transparent AI practices. In addition, the paper discusses the need to educate the workforce on how to manage new AI-driven processes and the long-term strategic benefits of adopting LLMs. Strategic recommendations for SCM professionals include investing in high-quality data management, promoting cross-functional collaboration, and aligning LLM initiatives with overall business goals. The findings highlight the potential of LLMs to drive innovation, sustainability, and competitive advantage in the ever-changing supply chain management landscape.

large language model, machine learning, natural language, (20 more...)

2501.15411

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
North America > United States > California (0.04)

Genre:

Research Report > Promising Solution (0.46)
Overview > Innovation (0.46)

Industry:

Transportation > Freight & Logistics Services (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

I-trustworthy Models. A framework for trustworthiness evaluation of probabilistic classifiers

Vashistha, Ritwik, Farahi, Arya

arXiv.org Machine Learning, Jan-26-2025

As probabilistic models continue to permeate various facets of our society and contribute to scientific advancements, it becomes necessary to go beyond traditional metrics such as predictive accuracy and error rates and assess their trustworthiness. Grounded in the competence-based theory of trust, this work formalizes the I-trustworthy framework, a novel framework for assessing the trustworthiness of probabilistic classifiers for inference tasks by linking local calibration to trustworthiness. To assess I-trustworthiness, we use the local calibration error (LCE) and develop a hypothesis-testing method. This method utilizes a kernel-based test statistic, Kernel Local Calibration Error (KLCE), to test local calibration of a probabilistic classifier. This study provides theoretical guarantees by offering convergence bounds for an unbiased estimator of KLCE. Additionally, we present a diagnostic tool designed to identify and measure biases in cases of miscalibration. The effectiveness of the proposed test statistic is demonstrated through its application to both simulated and real-world datasets. Finally, the LCE of related recalibration methods is studied, and we provide evidence that existing methods are insufficient to achieve I-trustworthiness.
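For readers unfamiliar with calibration, the sketch below computes a standard binned calibration error for the positive class of a binary classifier. This is a simpler, global stand-in chosen for illustration; it is not the local calibration error (LCE) or the kernel-based KLCE statistic described in the abstract, and the function name, bin count, and sample data are assumptions.

```python
def binned_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between predicted probability and observed positive rate per bin."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    error = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        error += len(members) / len(probs) * abs(mean_prob - pos_rate)
    return error


# Hypothetical predictions: the 0.9 group is correct only half the time (overconfident).
error = binned_calibration_error([0.9, 0.9, 0.1, 0.1], [1, 0, 0, 0])
```

A global average like this can hide regions of the input space where the classifier is badly miscalibrated, which is the kind of gap a local notion of calibration is meant to expose.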

calibration, data mining, machine learning, (17 more...)

2501.15617

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Michigan > Genesee County > Flint (0.04)
North America > United States > Florida > Broward County (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Industry:

Law (0.94)
Health & Medicine > Therapeutic Area (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)