
Collaborating Authors

 Taufiq, Muhammad Faaiz


Uncertainty Quantification and Causal Considerations for Off-Policy Decision Making

arXiv.org Machine Learning

Off-policy evaluation (OPE) is a central challenge in robust decision-making: it seeks to assess the performance of a new policy using data collected under a different policy. Existing OPE methodologies, however, suffer from several limitations arising from statistical uncertainty as well as causal considerations. In this thesis, we address these limitations through three contributions. First, we consider the problem of high variance in importance-sampling-based OPE estimators. We introduce the Marginal Ratio (MR) estimator, a novel OPE method that reduces variance by focusing on the shift in the marginal distribution of outcomes rather than the policy shift itself, improving robustness in contextual bandits. Next, we propose Conformal Off-Policy Prediction (COPP), a principled approach to uncertainty quantification in OPE that provides finite-sample predictive intervals, ensuring robust decision-making in risk-sensitive applications. Finally, we address causal unidentifiability in off-policy decision-making by developing novel bounds for sequential decision settings that remain valid under arbitrary unmeasured confounding. We apply these bounds to assess the reliability of digital twin models, introducing a falsification framework that identifies scenarios where model predictions diverge from real-world behaviour. Our contributions provide new insights into robust decision-making under uncertainty and establish principled methods for evaluating policies in both static and dynamic settings.
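
To make the COPP idea above concrete, here is a minimal Python sketch of weighted split-conformal prediction, where nonconformity scores from behaviour-policy data are reweighted by the target-to-behaviour policy ratio before taking the quantile. The function names, the absolute-residual score, and the interval construction are illustrative simplifications, not the exact procedure developed in the thesis.

import numpy as np

def weighted_quantile(scores, weights, alpha):
    # Smallest score s such that the weighted mass of {scores <= s} is at least 1 - alpha.
    order = np.argsort(scores)
    scores, weights = scores[order], weights[order]
    cumw = np.cumsum(weights) / np.sum(weights)
    idx = np.searchsorted(cumw, 1 - alpha)
    return scores[min(idx, len(scores) - 1)]

def copp_interval(y_cal, y_pred_cal, w_cal, y_pred_new, w_new, alpha=0.1):
    # y_cal, y_pred_cal : calibration outcomes and point predictions (behaviour-policy data)
    # w_cal, w_new      : importance weights pi_target(a|x) / pi_behaviour(a|x)
    scores = np.abs(y_cal - y_pred_cal)          # nonconformity scores
    scores = np.append(scores, np.inf)           # test point gets an "infinite" score, as in weighted split CP
    weights = np.append(w_cal, w_new)
    q = weighted_quantile(scores, weights, alpha)
    return y_pred_new - q, y_pred_new + q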


Understanding Chain-of-Thought in LLMs through Information Theory

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, from complex reasoning to code generation [Chowdhery et al., 2024, OpenAI et al., 2024, Bubeck et al., 2023, Anil et al., 2023]. Many of these advances can be attributed to Chain-of-Thought (CoT) reasoning [Wei et al., 2024, Nye et al., 2021, Li et al., 2024], which involves breaking down complex problems into a series of intermediate steps, mirroring human-like reasoning processes. The success of CoT reasoning, particularly in domains such as mathematics, logic, and multi-step decision-making, has led researchers and developers to incorporate CoT-like features directly into model training, e.g. the FLAN family of models [Chung et al., 2022, Wei et al., 2022]. This paper introduces a new formal framework for analyzing CoT in LLMs. We provide a rigorous method, grounded in information theory, for evaluating the quality of each step in a model's reasoning process, thereby offering insights beyond simple accuracy metrics and identifying areas for improvement.
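
As a rough illustration of the kind of per-step analysis described above (not the paper's exact estimator), one can score each reasoning step by how much it raises an auxiliary model's log-probability of the correct final answer; log_prob_answer below is an assumed scorer function, not a named component of the paper.

from typing import Callable, List

def stepwise_information_gain(
    question: str,
    steps: List[str],
    answer: str,
    log_prob_answer: Callable[[str, str], float],  # assumed scorer: log p(answer | context)
) -> List[float]:
    gains = []
    context = question
    prev_lp = log_prob_answer(context, answer)
    for step in steps:
        context = context + "\n" + step
        lp = log_prob_answer(context, answer)
        gains.append(lp - prev_lp)  # estimated information this step adds about the answer
        prev_lp = lp
    return gains

Steps with near-zero or negative gain are candidates for the "areas for improvement" the abstract refers to.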


Dataset Fairness: Achievable Fairness on Your Data With Utility Guarantees

arXiv.org Machine Learning

One of the key challenges in fairness for machine learning is to train models that minimize disparity across sensitive groups such as race or gender [Caton and Haas, 2020, Ustun et al., 2019, Celis et al., 2019]. This often comes at the cost of reduced model accuracy, a phenomenon termed the accuracy-fairness trade-off in the literature [Valdivia et al., 2021, Martinez et al., 2020]. In practice, this trade-off can differ significantly across datasets, depending on factors such as dataset biases and imbalances [Agarwal et al., 2018, Bendekgey and Sudderth, 2021, Celis et al., 2021]. To demonstrate how these trade-offs are inherently dataset-dependent, consider a simple example involving two distinct crime datasets. Dataset A has records from a community where crime rates are uniformly distributed across all racial groups, whereas Dataset B comes from a community where historical factors have resulted in a disproportionate crime rate among a specific racial group. Intuitively, training racially agnostic models is more challenging for Dataset B, due to the unequal distribution of crime rates across racial groups, and will result in a greater loss in model accuracy than for Dataset A. This example underscores that setting a uniform fairness requirement across diverse datasets (such as requiring the fairness violation metric to be below 10% for both), while also adhering to essential accuracy benchmarks, is impractical.
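
The fairness-violation metric mentioned above can be made concrete with, for instance, a demographic-parity gap; the 10% threshold mirrors the example in the text. This generic Python sketch is not the paper's specific metric or method.

import numpy as np

def demographic_parity_gap(y_pred: np.ndarray, group: np.ndarray) -> float:
    # Absolute difference in positive-prediction rates between two groups coded 0/1.
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def accuracy(y_pred: np.ndarray, y_true: np.ndarray) -> float:
    return (y_pred == y_true).mean()

A fixed requirement such as demographic_parity_gap(...) < 0.10 may be cheap to satisfy on the balanced Dataset A yet force a large accuracy drop on the skewed Dataset B, which is precisely why the paper argues for dataset-specific fairness targets.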


Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits

arXiv.org Machine Learning

Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new policies using existing data without costly experimentation. However, current OPE methods, such as Inverse Probability Weighting (IPW) and Doubly Robust (DR) estimators, suffer from high variance, particularly in cases of low overlap between target and behavior policies or large action and context spaces. In this paper, we introduce a new OPE estimator for contextual bandits, the Marginal Ratio (MR) estimator, which focuses on the shift in the marginal distribution of outcomes $Y$ instead of the policies themselves. Through rigorous theoretical analysis, we demonstrate the benefits of the MR estimator compared to conventional methods like IPW and DR in terms of variance reduction. Additionally, we establish a connection between the MR estimator and the state-of-the-art Marginalized Inverse Propensity Score (MIPS) estimator, proving that MR achieves lower variance among a generalized family of MIPS estimators. We further illustrate the utility of the MR estimator in causal inference settings, where it exhibits enhanced performance in estimating Average Treatment Effects (ATE). Our experiments on synthetic and real-world datasets corroborate our theoretical findings and highlight the practical advantages of the MR estimator in OPE for contextual bandits.
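
A minimal sketch of the contrast between IPW and the MR idea: IPW averages outcomes weighted by the per-sample policy ratio, whereas MR replaces that ratio with its conditional expectation given the outcome, i.e. the marginal density ratio of Y. The regression model below is an arbitrary illustrative choice, and the sketch omits the estimator's practical refinements.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def ipw_estimate(y, policy_ratio):
    # Standard IPW value estimate: mean of y * pi(a|x) / mu(a|x).
    return np.mean(y * policy_ratio)

def mr_estimate(y, policy_ratio):
    # MR-style estimate: regress the policy ratios on Y to approximate
    # E[pi(A|X)/mu(A|X) | Y = y], the marginal outcome density ratio, then reweight.
    reg = GradientBoostingRegressor().fit(y.reshape(-1, 1), policy_ratio)
    return np.mean(y * reg.predict(y.reshape(-1, 1)))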


Causal Falsification of Digital Twins

arXiv.org Artificial Intelligence

Digital twins are virtual systems designed to predict how a real-world process will evolve in response to interventions. This modelling paradigm holds substantial promise in many applications, but rigorous procedures for assessing their accuracy are essential in safety-critical settings. We consider how to assess the accuracy of a digital twin using real-world data. We formulate this as a causal inference problem, which leads to a precise definition, appropriate for many applications, of what it means for a twin to be "correct". Unfortunately, fundamental results from causal inference mean that observational data cannot be used to certify a twin as correct in this sense unless potentially tenuous assumptions are made, such as that the data are unconfounded. To avoid these assumptions, we propose instead to find situations in which the twin is not correct, and present a general-purpose statistical procedure for doing so. Our approach yields reliable and actionable information about the twin under only the assumption of an i.i.d. dataset of observational trajectories, and remains sound even if the data are confounded. We apply our methodology to a large-scale, real-world case study involving sepsis modelling within the Pulse Physiology Engine, which we assess using the MIMIC-III dataset of ICU patients.
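
The flavour of falsification-by-bounds can be illustrated with textbook Manski bounds for a single bounded outcome: any correct twin's interventional prediction must lie inside bounds that confounded observational data already imply. This toy check only gestures at the paper's procedure, which handles sequential trajectories and statistical uncertainty; all names below are illustrative.

import numpy as np

def manski_bounds(y, a, action, y_min=0.0, y_max=1.0):
    # Assumption-free bounds on E[Y(action)] from (possibly confounded) observational
    # data, valid whenever the outcome is known to lie in [y_min, y_max].
    taken = (a == action)
    p = taken.mean()
    observed = np.where(taken, y, 0.0).mean()   # E[Y * 1{A = action}]
    return observed + (1 - p) * y_min, observed + (1 - p) * y_max

def falsifies_twin(twin_prediction, y, a, action, **kwargs):
    # The twin is falsified (for this quantity) if its predicted interventional mean
    # falls outside bounds that every correct twin must satisfy.
    lo, hi = manski_bounds(y, a, action, **kwargs)
    return not (lo <= twin_prediction <= hi)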


Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment

arXiv.org Artificial Intelligence

Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, resulting in a total of 29 sub-categories. Additionally, a subset of 8 sub-categories is selected for further investigation, where corresponding measurement studies are designed and conducted on several widely-used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered. This highlights the importance of conducting more fine-grained analysis and testing, and of making continuous improvements to LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns will be crucial in achieving reliable and ethically sound deployment of LLMs in various applications.


Manifold Restricted Interventional Shapley Values

arXiv.org Artificial Intelligence

Shapley values are model-agnostic methods for explaining model predictions. Many commonly used methods of computing Shapley values, known as off-manifold methods, rely on model evaluations on out-of-distribution input samples. Consequently, explanations obtained are sensitive to model behaviour outside the data distribution, which may be irrelevant for all practical purposes. While on-manifold methods have been proposed which do not suffer from this problem, we show that such methods are overly dependent on the input data distribution, and therefore result in unintuitive and misleading explanations. To circumvent these problems, we propose ManifoldShap, which respects the model's domain of validity by restricting model evaluations to the data manifold. We show, theoretically and empirically, that ManifoldShap is robust to off-manifold perturbations of the model and leads to more accurate and intuitive explanations than existing state-of-the-art Shapley methods.
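
A crude Monte-Carlo sketch of the manifold-restriction idea: permutation-based interventional Shapley sampling in which imputed points whose density under a KDE fitted to the data falls below a quantile threshold are skipped. ManifoldShap's actual restriction is defined more carefully than this; model is assumed to be a callable taking a 2-D array, and the KDE and threshold choices are illustrative.

import numpy as np
from sklearn.neighbors import KernelDensity

def manifold_restricted_shapley(model, x, background, n_samples=200, density_q=0.05, seed=None):
    # Permutation-sampling Shapley estimate that only counts model evaluations
    # made at points the KDE deems on-manifold.
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    kde = KernelDensity().fit(background)
    threshold = np.quantile(kde.score_samples(background), density_q)
    phi, counts = np.zeros(d), np.zeros(d)
    for _ in range(n_samples):
        perm = rng.permutation(d)
        z = background[rng.integers(len(background))].copy()  # start from a real data point
        prev = model(z[None, :])[0]
        for j in perm:
            z[j] = x[j]                                        # add feature j to the coalition
            if kde.score_samples(z[None, :])[0] < threshold:
                continue                                       # skip off-manifold evaluations
            cur = model(z[None, :])[0]
            phi[j] += cur - prev
            counts[j] += 1
            prev = cur
    return phi / np.maximum(counts, 1)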