

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

Review Summary: The paper concerns multi-class problems with a large number of classes. It introduces a novel label tree classifier that learns and predicts in time logarithmic in the number of classes. Theoretical guarantees in the form of a boosting-like theorem are proven. Moreover, not only the node classifiers but also the structure of the tree is trained online. Additionally, the authors present a simple subtree-swapping procedure that ensures proper balancing of the tree.
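To make the logarithmic-time claim concrete, here is a minimal sketch of prediction with a label tree. The paper's method trains the node classifiers and the tree structure online; in this toy version the node "classifiers" are stand-in routing functions over hard-coded feature thresholds, so only the traversal cost, O(depth) = O(log K) for a balanced tree over K classes, is faithful to the idea.

```python
# Minimal sketch of logarithmic-time prediction with a label tree.
# Each internal node's "classifier" is a stand-in routing function;
# the real method learns these (and the tree structure) online.

class Node:
    def __init__(self, label=None, left=None, right=None, router=None):
        self.label = label        # set only at leaves
        self.left, self.right = left, right
        self.router = router      # maps a feature vector to False (left) / True (right)

def predict(root, x):
    """Descend from the root to a leaf: O(depth), i.e. O(log K) when balanced."""
    node = root
    while node.label is None:
        node = node.right if node.router(x) else node.left
    return node.label

# Toy balanced tree over 4 classes, routed by two feature thresholds.
leaves = [Node(label=k) for k in range(4)]
root = Node(
    router=lambda x: x[0] > 0,
    left=Node(router=lambda x: x[1] > 0, left=leaves[0], right=leaves[1]),
    right=Node(router=lambda x: x[1] > 0, left=leaves[2], right=leaves[3]),
)
print(predict(root, (1.0, -1.0)))  # class 2: right at the root, then left
```

The subtree-swapping procedure mentioned in the summary exists precisely to keep such a tree balanced, so that the traversal depth stays logarithmic.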


Defeaters and Eliminative Argumentation in Assurance 2.0

Bloomfield, Robin, Netkachova, Kate, Rushby, John

arXiv.org Artificial Intelligence

A traditional assurance case employs a positive argument in which reasoning steps, grounded on evidence and assumptions, sustain a top claim that has external significance. Human judgement is required to check the evidence, the assumptions, and the narrative justifications for the reasoning steps; if all are assessed good, then the top claim can be accepted. A valid concern about this process is that human judgement is fallible and prone to confirmation bias. The best defense against this concern is vigorous and skeptical debate and discussion in the manner of a dialectic or Socratic dialog. There is merit in recording aspects of this discussion for the benefit of subsequent developers and assessors. Defeaters are a means of doing this: they express doubts about aspects of the argument, can be developed into subcases that confirm or refute those doubts, and can be recorded as documentation to assist future consideration. This report describes how defeaters, and multiple levels of defeaters, should be represented and assessed in Assurance 2.0 and its Clarissa/ASCE tool support. These mechanisms also support eliminative argumentation, a contrary approach to assurance, favored by some, that uses a negative argument to refute all reasons why the top claim could be false.


Assessing Confidence with Assurance 2.0

Bloomfield, Robin, Rushby, John

arXiv.org Artificial Intelligence

An assurance case is intended to provide justifiable confidence in the truth of its top claim, which typically concerns safety or security. A natural question is then "how much" confidence does the case provide? We argue that confidence cannot be reduced to a single attribute or measurement. Instead, we suggest it should be based on attributes that draw on three different perspectives: positive, negative, and residual doubts. Positive Perspectives consider the extent to which the evidence and overall argument of the case combine to make a positive statement justifying belief in its claims. We set a high bar for justification, requiring it to be indefeasible. The primary positive measure for this is soundness, which interprets the argument as a logical proof. Confidence in evidence can be expressed probabilistically and we use confirmation measures to ensure that the "weight" of evidence crosses some threshold. In addition, probabilities can be aggregated from evidence through the steps of the argument using probability logics to yield what we call probabilistic valuations for the claims. Negative Perspectives record doubts and challenges to the case, typically expressed as defeaters, and their exploration and resolution. Assurance developers must guard against confirmation bias and should vigorously explore potential defeaters as they develop the case, and should record them and their resolution to avoid rework and to aid reviewers. Residual Doubts: the world is uncertain so not all potential defeaters can be resolved. We explore risks and may deem them acceptable or unavoidable. It is crucial however that these judgments are conscious ones and that they are recorded in the assurance case. This report examines the perspectives in detail and indicates how Clarissa, our prototype toolset for Assurance 2.0, assists in their evaluation.
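The abstract's "probabilistic valuations", probabilities aggregated from evidence through the steps of the argument, can be illustrated with a standard result from probability logic. The sketch below uses the classical Fréchet bounds for a conjunctive reasoning step; Assurance 2.0's actual aggregation rules are its own, and the threshold value here is illustrative only.

```python
# Hedged sketch: propagating evidential probabilities through a conjunctive
# reasoning step using the classical Frechet bounds from probability logic.
# The real Assurance 2.0 / Clarissa aggregation may differ in detail.

def conjunction_bounds(p_a, p_b):
    """Frechet bounds on P(A and B) given only P(A) and P(B)."""
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    return lower, upper

# Two subclaims of a reasoning step, each supported by its own evidence:
lo, hi = conjunction_bounds(0.95, 0.9)
print(lo, hi)

# A worst-case valuation for the parent claim still clears an
# (illustrative) confidence threshold:
threshold = 0.8
print(lo >= threshold)  # True
```

The lower bound plays the role of a conservative valuation: it is the least probability the parent claim can have, whatever the dependence between the two subclaims.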


Causal Confirmation Measures: From Simpson's Paradox to COVID-19

Lu, Chenguang

arXiv.org Artificial Intelligence

When we compare the influences of two causes on an outcome, if the conclusion from every group contradicts the conclusion from the aggregated data, we have Simpson's Paradox. The Existing Causal Inference Theory (ECIT) can make the overall conclusion consistent with the grouping conclusion by removing the confounder's influence to eliminate the paradox. The ECIT uses the relative risk difference Pd = max(0, (R - 1)/R) (R denotes the risk ratio) as the probability of causation. In contrast, the philosopher Fitelson uses confirmation measure D (posterior probability minus prior probability) to measure the strength of causation. Fitelson concludes that, from the perspective of Bayesian confirmation, we should directly accept the overall conclusion without considering the paradox. The author previously proposed a Bayesian confirmation measure b* similar to Pd. To overcome the contradiction between the ECIT and Bayesian confirmation, the author uses the semantic information method with the minimum cross-entropy criterion to deduce the causal confirmation measure Cc = (R - 1)/max(R, 1). Cc is like Pd but has the normalizing property (between -1 and 1) and cause symmetry. It especially fits cases where a cause restrains an outcome, such as the COVID-19 vaccine controlling the infection. Some examples (about kidney stone treatments and COVID-19) reveal that Pd and Cc are more reasonable than D, and Cc is more useful than Pd.
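The difference between Pd and Cc is easy to see numerically. Both are functions of the risk ratio R and agree when R ≥ 1, but only Cc goes negative when a cause restrains an outcome (R < 1), which is the vaccine case the abstract highlights. The R values below are illustrative.

```python
# The abstract's two measures as functions of the risk ratio R.
# Pd: ECIT's probability of causation; Cc: the paper's causal confirmation measure.

def Pd(R):
    return max(0.0, (R - 1.0) / R)

def Cc(R):
    return (R - 1.0) / max(R, 1.0)

for R in (4.0, 1.0, 0.25):   # promoting, neutral, restraining cause
    print(R, Pd(R), Cc(R))
# For R = 0.25, Pd is clipped to 0 while Cc = -0.75,
# registering the restraining effect with a negative value in [-1, 1].
```

This is the "normalizing property and cause symmetry" the abstract claims for Cc: it spans [-1, 1] and treats promotion and restraint symmetrically, where Pd is silent on restraint.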


Understanding Topic Coherence Measures

#artificialintelligence

Topic Modeling is one of the most important NLP fields. It aims to explain a textual dataset by decomposing it into two distributions: topics and words. So, a Topic Modeling Algorithm is a mathematical/statistical model used to infer which topics best represent the data. For simplicity, a topic can be described as a collection of words, like ['ball', 'cat', 'house'] and ['airplane', 'clouds'], but in practice, what an algorithm does is assign each word in our vocabulary a 'participation' value in a given topic. The words with the highest values can be considered as the true participants of a topic.
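The "participation value" view can be sketched in a few lines: a topic maps every vocabulary word to a weight, and reading off the top-weighted words recovers the word-list description. The weights below are made up for illustration.

```python
# A topic as a word -> participation-value mapping (weights are illustrative).
topic = {"ball": 0.41, "cat": 0.30, "house": 0.20,
         "airplane": 0.05, "clouds": 0.04}

def top_words(topic, n=3):
    """Return the n highest-participation words: the topic's usual description."""
    return [w for w, _ in sorted(topic.items(), key=lambda kv: -kv[1])[:n]]

print(top_words(topic))  # ['ball', 'cat', 'house']
```

Coherence measures then score how semantically related such top-word lists are, which is what distinguishes a readable topic from an arbitrary bag of high-weight words.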


Channels' Confirmation and Predictions' Confirmation: from the Medical Test to the Raven Paradox

Lu, Chenguang

arXiv.org Artificial Intelligence

After long arguments between positivism and falsificationism, the verification of universal hypotheses was replaced with the confirmation of uncertain major premises. Unfortunately, Hempel discovered the Raven Paradox (RP). Then, Carnap used the logical probability increment as the confirmation measure. So far, many confirmation measures have been proposed. Among them, measure F, proposed by Kemeny and Oppenheim, possesses the symmetries and asymmetries proposed by Eells and Fitelson, the monotonicity proposed by Greco et al., and the normalizing property suggested by many researchers. Based on the semantic information theory, a measure b* similar to F is derived from the medical test. Like the likelihood ratio, b* and F can only indicate the quality of channels or of testing means, rather than the quality of probability predictions. And it is still not easy to use b*, F, or another measure to clarify the RP. For this reason, measure c*, similar to the correct rate, is derived. The c* has the simple form (a - c)/max(a, c); it supports the Nicod Criterion and undermines the Equivalence Condition, and hence can be used to eliminate the RP. Some examples are provided to show why it is difficult to use any of the popular confirmation measures to eliminate the RP. Measures F, b*, and c* indicate that the absence of counterexamples is more essential than the abundance of positive examples, and hence are compatible with Popper's falsification thought.
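The behavior the abstract claims for c* = (a - c)/max(a, c) can be checked with a small sketch. Roughly, a counts positive examples and c counts counterexamples of a rule such as "ravens are black"; the exact definitions of a and c are in the paper, and the counts below are illustrative only.

```python
# Sketch of the abstract's measure c* = (a - c) / max(a, c).
# Roughly, a ~ positive examples, c ~ counterexamples; exact definitions
# are in the paper, and these counts are made up for illustration.

def c_star(a, c):
    return (a - c) / max(a, c)

# With no counterexamples, c* is maximal regardless of how many positives:
print(c_star(10, 0), c_star(1000, 0))    # 1.0 1.0
# A few counterexamples lower c* even against many positive examples:
print(c_star(1000, 10))                  # 0.99
```

This is the falsificationist flavor the abstract points to: adding positive examples cannot raise c* above the ceiling set by the counterexamples, so fewer counterexamples matter more than more positives.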