ACORN: Aspect-wise Commonsense Reasoning Explanation Evaluation

Brassard, Ana, Heinzerling, Benjamin, Kudo, Keito, Sakaguchi, Keisuke, Inui, Kentaro

arXiv.org Artificial Intelligence

Evaluating free-text explanations is a multifaceted, subjective, and labor-intensive task. Large language models (LLMs) present an appealing alternative due to their potential for consistency, scalability, and cost-efficiency. In this work, we present ACORN, a new dataset of 3,500 free-text explanations with aspect-wise quality ratings, and use it to gain insights into how LLMs evaluate explanations. We observed that replacing one of the human raters with an LLM sometimes maintained, but more often lowered, inter-annotator agreement across different settings and quality aspects, suggesting that LLM judgments are not always consistent with those of human raters. We further quantified this difference by measuring the correlation between LLM-generated ratings and majority-voted human ratings across quality aspects. With the best system, Spearman's rank correlation ranged from 0.53 to 0.95, averaging 0.72 across aspects, indicating moderately high but imperfect alignment. Finally, we considered using an LLM as an additional rater when human raters are scarce: we measured how well majority-voted labels from a limited human pool plus an LLM rater correlated with the original gold labels. While GPT-4 improved the outcome when there were only two human raters, in all other observed cases LLMs were neutral to detrimental once there were three or more human raters. We publicly release the dataset to support future improvements in LLM-in-the-loop evaluation here: https://github.com/a-brassard/ACORN.
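The evaluation described above can be sketched in a few lines: majority-vote the human ratings per item, then compute Spearman's rank correlation between the LLM's ratings and the voted labels. The toy ratings below are illustrative only, not ACORN's actual rating schema.

```python
from collections import Counter

def majority_vote(ratings):
    # Majority-voted label among human raters; ties broken by smallest label.
    counts = Counter(ratings)
    top = max(counts.values())
    return min(r for r, c in counts.items() if c == top)

def rank(xs):
    # Average ranks with tie handling, as used by Spearman's rho.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    # Spearman's rho = Pearson correlation of the two rank vectors.
    ra, rb = rank(a), rank(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Toy data: per-item human rating triples and one LLM rating per item.
human = [[3, 3, 2], [1, 2, 1], [4, 4, 5], [2, 3, 3], [5, 5, 4]]
llm = [3, 1, 5, 2, 4]
gold = [majority_vote(h) for h in human]
rho = spearman(llm, gold)
```

Higher rho means the LLM's ranking of explanation quality agrees more closely with the majority-voted human ranking.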


FACTUAL: A Novel Framework for Contrastive Learning Based Robust SAR Image Classification

Wang, Xu, Ye, Tian, Kannan, Rajgopal, Prasanna, Viktor

arXiv.org Artificial Intelligence

Deep Learning (DL) models for Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR), while delivering improved performance, have been shown to be quite vulnerable to adversarial attacks. Existing works improve robustness by training models on adversarial samples; however, by focusing mostly on attacks that manipulate images randomly, they neglect the real-world feasibility of such attacks. In this paper, we propose FACTUAL, a novel contrastive learning framework for adversarial training and robust SAR classification. FACTUAL consists of two components: (1) a novel perturbation scheme that, unlike existing works, incorporates realistic physical adversarial attacks (such as OTSA) to build a supervised adversarial pre-training network, which uses class labels to cluster clean and perturbed images together into a more informative feature space; and (2) a linear classifier cascaded after the encoder that uses the computed representations to predict the target labels. By pre-training and fine-tuning our model on both clean and adversarial samples, we show that it achieves high prediction accuracy in both cases: 99.7% on clean samples and 89.6% on perturbed samples, both outperforming previous state-of-the-art methods.
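The pre-training step in component (1) belongs to the family of supervised contrastive objectives. The sketch below is a generic supervised contrastive loss, not FACTUAL's exact formulation: embeddings sharing a class label (e.g. a clean image and its perturbed counterpart) are pulled together, everything else is pushed apart.

```python
import math

def supcon_loss(embeddings, labels, tau=0.1):
    # Generic supervised contrastive loss: for each anchor, positives are
    # all other samples with the same label; all non-anchor samples appear
    # in the softmax denominator. tau is the temperature.
    n = len(embeddings)

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def normalize(v):
        s = math.sqrt(dot(v, v))
        return [x / s for x in v]

    z = [normalize(e) for e in embeddings]
    total, count = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        denom = sum(math.exp(dot(z[i], z[a]) / tau) for a in range(n) if a != i)
        for p in positives:
            total += -math.log(math.exp(dot(z[i], z[p]) / tau) / denom)
            count += 1
    return total / count

# Same labels, two geometries: same-class embeddings aligned vs. scattered.
labels = [0, 0, 1, 1]
tight = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loose = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
```

A feature space where same-class (clean and perturbed) samples cluster yields a lower loss, which is what makes the representations useful to the cascaded linear classifier.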


Test-time Augmentation for Factual Probing

Kamoda, Go, Heinzerling, Benjamin, Sakaguchi, Keisuke, Inui, Kentaro

arXiv.org Artificial Intelligence

Factual probing is a method that uses prompts to test if a language model "knows" certain world knowledge facts. A problem in factual probing is that small changes to the prompt can lead to large changes in model output. Previous work aimed to alleviate this problem by optimizing prompts via text mining or fine-tuning. However, such approaches are relation-specific and do not generalize to unseen relation types. Here, we propose to use test-time augmentation (TTA) as a relation-agnostic method for reducing sensitivity to prompt variations by automatically augmenting and ensembling prompts at test time. Experiments show improved model calibration, i.e., with TTA, model confidence better reflects prediction accuracy. Improvements in prediction accuracy are observed for some models, but for other models, TTA leads to degradation. Error analysis identifies the difficulty of producing high-quality prompt variations as the main challenge for TTA.
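The TTA procedure can be sketched as: generate prompt variants, query the model with each, and average the per-answer probabilities across variants. Here `paraphrase_fn` and `model_fn` are hypothetical stand-ins for a prompt paraphraser and a probed language model, with toy stubs for illustration.

```python
from collections import defaultdict

def ensemble_predict(prompt, paraphrase_fn, model_fn, n_variants=4):
    # Test-time augmentation: query the model with the original prompt plus
    # automatically generated paraphrases, then average the per-answer
    # probabilities over all variants.
    variants = [prompt] + [paraphrase_fn(prompt, i) for i in range(n_variants)]
    totals = defaultdict(float)
    for v in variants:
        for answer, prob in model_fn(v).items():
            totals[answer] += prob
    avg = {a: s / len(variants) for a, s in totals.items()}
    return max(avg, key=avg.get), avg

def toy_paraphrase(prompt, i):
    # Stand-in for an automatic paraphraser.
    return f"{prompt} (variant {i})"

def toy_model(variant):
    # Stand-in for a brittle LM: one phrasing flips the answer,
    # but the ensemble recovers the majority prediction.
    if "variant 1" in variant:
        return {"Paris": 0.2, "Lyon": 0.8}
    return {"Paris": 0.8, "Lyon": 0.2}

pred, probs = ensemble_predict("The capital of France is", toy_paraphrase, toy_model)
```

Averaging also smooths the model's confidence, which is the calibration effect the abstract reports; the downside, as the error analysis notes, is that low-quality paraphrases can drag the ensemble down.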


Forgetful Large Language Models: Lessons Learned from Using LLMs in Robot Programming

Chen, Juo-Tung, Huang, Chien-Ming

arXiv.org Artificial Intelligence

Large language models offer new ways of empowering people to program robot applications, namely code generation via prompting. However, the code generated by LLMs is susceptible to errors. This work reports a preliminary exploration that empirically characterizes common errors produced by LLMs in robot programming. We categorize these errors into two phases: interpretation and execution. In this work, we focus on errors in execution and observe that they are caused by LLMs being "forgetful" of key information provided in user prompts. Based on this observation, we propose prompt engineering tactics designed to reduce errors in execution. We then demonstrate the effectiveness of these tactics with three language models: ChatGPT, Bard, and LLaMA-2. Finally, we discuss lessons learned from using LLMs in robot programming and call for the benchmarking of LLM-powered end-user development of robot applications.
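The abstract does not spell out the specific tactics, but one plausible countermeasure to "forgetfulness", sketched here purely as a hypothetical illustration, is to restate the user's key constraints verbatim in every follow-up prompt rather than relying on the model to retain them from earlier turns:

```python
def build_prompt(task, constraints, history):
    # Hypothetical tactic: re-inject every constraint from the original
    # request into each follow-up, so it is always in the model's context.
    restated = "\n".join(f"- {c}" for c in constraints)
    return (
        "You are generating robot control code.\n"
        f"Constraints (always apply):\n{restated}\n\n"
        f"Conversation so far:\n{history}\n\n"
        f"Next request: {task}"
    )

prompt = build_prompt(
    "Move the arm to the bin.",
    ["max speed 0.1 m/s", "never exceed joint limits"],
    "User asked to pick up the red block; code was generated.",
)
```

The trade-off is longer prompts on every turn in exchange for not depending on the model's memory of earlier context.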


Factual, a Location Data Company Leverages Machine Learning to Update Its Data Insights Solution

#artificialintelligence

Factual, the location data company, today announced a significant update to its Audience product, adding Predictive and Loyalty audiences built using machine-learned predictive insights to its roster of targeting solutions for marketers. Beginning today, marketers will have access to new Predictive Audiences and Loyalty Audiences, both built on sophisticated visitation-pattern analysis. These will further enable marketers to construct highly scalable and accurate audience segments based on real-world consumer behavior and designed for ROI. The company has also added more than 100 ready-to-use audience segments in every vertical, including auto, retail, and quick-service restaurants (QSR). Factual builds its Predictive Audiences by developing an understanding of visitors to a place category and mapping their visitation patterns beforehand. Using Factual's Observation Graph, consumers most likely to visit a category based on these patterns can be segmented into audiences, giving marketers the ability to connect with consumers before they set foot in a brand's retail location.


SemEval-2019 Task 8: Fact Checking in Community Question Answering Forums

Mihaylova, Tsvetomila, Karadjov, Georgi, Atanasova, Pepa, Baly, Ramy, Mohtarami, Mitra, Nakov, Preslav

arXiv.org Machine Learning

We present SemEval-2019 Task 8 on Fact Checking in Community Question Answering Forums, which features two subtasks. Subtask A is about deciding whether a question asks for factual information vs. an opinion/advice vs. just socializing. Subtask B asks to predict whether an answer to a factual question is true, false, or not a proper answer. We received 17 official submissions for Subtask A and 11 official submissions for Subtask B. For Subtask A, all systems improved over the majority-class baseline. For Subtask B, all systems were below a majority-class baseline, but several systems were very close to it. The leaderboard and the data from the competition can be found at http://competitions.codalab.org/competitions/20022
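The majority-class baseline that both subtasks are measured against is simple: always predict the most frequent label in the training data. A minimal sketch with toy labels (the label names are illustrative):

```python
from collections import Counter

def majority_baseline(train_labels, test_labels):
    # Predict the most frequent training label for every test instance
    # and report its accuracy; this is the bar systems are compared to.
    majority = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for y in test_labels if y == majority)
    return majority, correct / len(test_labels)

train = ["factual", "opinion", "factual", "socializing", "factual"]
test = ["factual", "opinion", "factual", "factual"]
label, acc = majority_baseline(train, test)
```

That all Subtask B systems fell below this trivial baseline shows how hard answer-veracity prediction is relative to question classification.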