AITopics | trustscore

trustscore

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness

Zheng, Danna, Liu, Danyang, Lapata, Mirella, Pan, Jeff Z.

arXiv.org Artificial Intelligence | May-6-2024

Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications. However, concerns have arisen regarding the trustworthiness of LLMs' outputs, particularly in closed-book question-answering tasks, where non-experts may struggle to identify inaccuracies due to the absence of contextual or ground truth information. This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge. Additionally, TrustScore can seamlessly integrate with fact-checking methods, which assess alignment with external knowledge sources. The experimental results show that TrustScore achieves strong correlations with human judgments, surpassing existing reference-free metrics and achieving results on par with reference-based metrics.

Large-scale language models (LLMs) have recently been in the spotlight due to their impressive performance in various NLP tasks, sparking enthusiasm for potential applications (Kaddour et al., 2023; Bubeck et al., 2023). However, a notable concern has emerged regarding the tendency of LLMs to generate plausible yet incorrect responses (Tam et al., 2022; Liu et al., 2023; Devaraj et al., 2022), which is particularly challenging for users without specialized expertise. Consequently, users are often advised to employ LLMs only in scenarios where they can confidently assess the information provided.

arxiv preprint arxiv, secure and trustworthy, trustscore, (15 more...)

arXiv.org Artificial Intelligence

2402.12545

Country:

Europe > Spain (0.05)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)
North America > United States > West Virginia (0.04)
(10 more...)

Genre:

Research Report > New Finding (0.48)
Personal > Honors (0.30)

Industry: Government > Regional Government > North America Government > United States Government (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

To Softmax, or not to Softmax: that is the question when applying Active Learning for Transformer Models

Gonsior, Julius, Falkenberg, Christian, Magino, Silvio, Reusch, Anja, Thiele, Maik, Lehner, Wolfgang

arXiv.org Artificial Intelligence | Oct-6-2022

Despite achieving state-of-the-art results in nearly all Natural Language Processing applications, fine-tuning Transformer-based language models still requires a significant amount of labeled data to work. A well-known technique to reduce the amount of human effort in acquiring a labeled dataset is Active Learning (AL): an iterative process in which only a minimal number of samples is labeled. AL strategies require access to a quantified confidence measure of the model predictions. A common choice is the softmax activation function for the final layer. As the softmax function provides misleading probabilities, this paper compares eight alternatives on seven datasets. Our almost paradoxical finding is that most of the methods are too good at identifying the true most uncertain samples (outliers), and that labeling exclusively outliers therefore results in worse performance. As a heuristic we propose to systematically ignore samples, which results in improvements of various methods compared to the softmax function.
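
The selection step described in this abstract can be illustrated with a minimal sketch of softmax-based least-confidence sampling. This is not the authors' implementation: the function names, the toy logits, and the `skip_top` parameter are all assumptions made for illustration. In particular, the abstract does not say which samples are ignored, so `skip_top` simply drops the most uncertain candidates as one possible reading of the heuristic.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def least_confidence(logits: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the highest softmax probability."""
    return 1.0 - max(softmax(logits))


def select_for_labeling(
    pool_logits: Sequence[Sequence[float]],
    batch_size: int,
    skip_top: int = 0,
) -> List[int]:
    """Return indices of unlabeled pool samples to send for labeling.

    Samples are ranked from most to least uncertain. `skip_top` is a
    hypothetical knob (an assumption, not taken from the paper): it drops
    that many of the most uncertain samples, treated here as likely outliers.
    """
    ranked = sorted(
        range(len(pool_logits)),
        key=lambda i: least_confidence(pool_logits[i]),
        reverse=True,
    )
    return ranked[skip_top:skip_top + batch_size]


# Toy final-layer outputs for four unlabeled samples (made-up values).
pool = [
    [2.0, 1.9, 2.1],  # index 0: near-uniform, most uncertain
    [5.0, 0.0, 0.0],  # index 1: very confident
    [1.0, 0.5, 0.0],  # index 2: moderately uncertain
    [3.0, 0.0, 0.0],  # index 3: fairly confident
]

plain_batch = select_for_labeling(pool, batch_size=2)
filtered_batch = select_for_labeling(pool, batch_size=2, skip_top=1)
```

With these toy values, plain uncertainty sampling picks the two least confident samples, while `skip_top=1` passes over the single most uncertain one and moves on to the next candidates in the ranking.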

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2210.03005

Country:

Europe > Germany > Saxony > Dresden (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

CreditVidya

#artificialintelligence | Jan-3-2018, 06:56:36 GMT

Credit access is one of the fundamental enablers of faster economic growth and reduced income inequality. In India, the credit-to-GDP ratio stands at around 50%, which is significantly lower than in developed economies and other BRICS nations. Today, approximately 525 million customers in India have no access to formal credit from regulated financial institutions. These individuals depend on informal mechanisms for saving and protecting themselves against risk. Insufficient credit history: most unbanked consumers have either never taken a loan or have only a thin-file credit history.

artificial intelligence, creditvidya, financial institution, (11 more...)

#artificialintelligence

Country: Asia > India (0.91)

Industry: Banking & Finance > Credit (0.91)

Technology:

Information Technology > Communications (0.33)
Information Technology > Artificial Intelligence (0.33)
