AITopics | Vlachos, Andreas

Collaborating Authors

Vlachos, Andreas

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Collaborative Evaluation of Deepfake Text with Deliberation-Enhancing Dialogue Systems

Lee, Jooyoung, Zhu, Xiaochen, Karadzhov, Georgi, Stafford, Tom, Vlachos, Andreas, Lee, Dongwon

arXiv.org Artificial IntelligenceMar-6-2025

The proliferation of generative models has presented significant challenges in distinguishing authentic human-authored content from deepfake content. Collaborative human efforts, augmented by AI tools, present a promising solution. In this study, we explore the potential of DeepFakeDeLiBot, a deliberation-enhancing chatbot, to support groups in detecting deepfake text. Our findings reveal that group-based problem-solving significantly improves the accuracy of identifying machine-generated paragraphs compared to individual efforts. While engagement with DeepFakeDeLiBot does not yield substantial performance gains overall, it enhances group dynamics by fostering greater participant engagement, consensus building, and the frequency and diversity of reasoning-based utterances. Additionally, participants with higher perceived effectiveness of group collaboration exhibited performance benefits from DeepFakeDeLiBot. These findings underscore the potential of deliberative chatbots in fostering interactive and productive group dynamics while ensuring accuracy in collaborative deepfake text detection. \textit{Dataset and source code used in this study will be made publicly available upon acceptance of the manuscript.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2503.04945

Country:

North America > United States (1.00)
Asia (1.00)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.77)
Health & Medicine (0.68)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Future Outcome Reasoning and Confidence Assessment Benchmark

Yuan, Zhangdie, Ding, Zifeng, Vlachos, Andreas

arXiv.org Artificial IntelligenceMar-3-2025

Forecasting is an important task in many domains, such as technology and economics. However existing forecasting benchmarks largely lack comprehensive confidence assessment, focus on limited question types, and often consist of artificial questions that do not align with real-world human forecasting needs. To address these gaps, we introduce FOReCAst (Future Outcome Reasoning and Confidence Assessment), a benchmark that evaluates models' ability to make predictions and their confidence in them. FOReCAst spans diverse forecasting scenarios involving Boolean questions, timeframe prediction, and quantity estimation, enabling a comprehensive evaluation of both prediction accuracy and confidence calibration for real-world applications.

large language model, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2502.19676

Country:

North America > United States (0.14)
Europe > Italy (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)

Add feedback

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Chen, Liang, Wang, Zekun, Ren, Shuhuai, Li, Lei, Zhao, Haozhe, Li, Yunshui, Cai, Zefan, Guo, Hongcheng, Zhang, Lei, Xiong, Yizhe, Zhang, Yichi, Wu, Ruoyu, Dong, Qingxiu, Zhang, Ge, Yang, Jian, Meng, Lingwei, Hu, Shujie, Chen, Yulong, Lin, Junyang, Bai, Shuai, Vlachos, Andreas, Tan, Xu, Zhang, Minjia, Xiao, Wen, Yee, Aaron, Liu, Tianyu, Chang, Baobao

arXiv.org Artificial IntelligenceDec-29-2024

Building on the foundations of language modeling in natural language processing, Next Token Prediction (NTP) has evolved into a versatile training objective for machine learning tasks across various modalities, achieving considerable success. As Large Language Models (LLMs) have advanced to unify understanding and generation tasks within the textual modality, recent research has shown that tasks from different modalities can also be effectively encapsulated within the NTP framework, transforming the multimodal information into tokens and predict the next one given the context. This survey introduces a comprehensive taxonomy that unifies both understanding and generation within multimodal learning through the lens of NTP. The proposed taxonomy covers five key aspects: Multimodal tokenization, MMNTP model architectures, unified task representation, datasets \& evaluation, and open challenges. This new taxonomy aims to aid researchers in their exploration of multimodal intelligence. An associated GitHub repository collecting the latest papers and repos is available at https://github.com/LMM101/Awesome-Multimodal-Next-Token-Prediction

large language model, machine learning, natural language, (25 more...)

arXiv.org Artificial Intelligence

2412.18619

Country:

Europe (1.00)
North America > United States (0.67)

Genre:

Research Report (1.00)
Overview (1.00)
Instructional Material (1.00)

Industry:

Leisure & Entertainment (0.92)
Information Technology (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.45)
Media > Music (0.45)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Add feedback

Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models

Zhu, Xiaochen, Karadzhov, Georgi, Whitehouse, Chenxi, Vlachos, Andreas

arXiv.org Artificial IntelligenceDec-15-2024

Diffusion models have shown promise in text generation but often struggle with generating long, coherent, and contextually accurate text. Token-level diffusion overlooks word-order dependencies and enforces short output windows, while passage-level diffusion struggles with learning robust representation for long-form text. To address these challenges, we propose Segment-Level Diffusion (SLD), a framework that enhances diffusion-based text generation through text segmentation, robust representation training with adversarial and contrastive learning, and improved latent-space guidance. By segmenting long-form outputs into separate latent representations and decoding them with an autoregressive decoder, SLD simplifies diffusion predictions and improves scalability. Experiments on XSum, ROCStories, DialogSum, and DeliData demonstrate that SLD achieves competitive or superior performance in fluency, coherence, and contextual compatibility across automatic and human evaluation metrics comparing with other diffusion and autoregressive baselines. Ablation studies further validate the effectiveness of our segmentation and representation learning strategies.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2412.11333

Country:

Europe (1.00)
Asia (0.93)
North America > United States > California (0.28)

Genre:

Research Report (0.82)
Overview (0.68)
Personal > Interview (0.46)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)

Add feedback

A Bayesian Optimization Approach to Machine Translation Reranking

Cheng, Julius, Züfle, Maike, Zouhar, Vilém, Vlachos, Andreas

arXiv.org Artificial IntelligenceNov-14-2024

Reranking a list of candidates from a machine translation system with an external scoring model and returning the highest-scoring candidate remains a simple and effective method for improving the overall output quality. Translation scoring models continue to grow in size, with the best models being comparable to generation models. Thus, reranking can add substantial computational cost to the translation pipeline. In this work, we pose reranking as a Bayesian optimization (BayesOpt) problem. By strategically selecting candidates to score based on a balance of exploration and exploitation, we show that it is possible to find top-scoring candidates when scoring only a fraction of the candidate list. For instance, our method achieves the same CometKiwi score using only 70 scoring evaluations compared a baseline system using 180. We present a multi-fidelity setting for BayesOpt, where the candidates are first scored with a cheaper but noisier proxy scoring model, which further improves the cost-performance tradeoff when using smaller but well-trained distilled proxy scorers.

artificial intelligence, computational linguistic, natural language, (16 more...)

arXiv.org Artificial Intelligence

2411.09694

Country: Europe (0.28)

Genre: Research Report > Experimental Study (0.46)

Industry: Energy > Oil & Gas > Upstream (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking

Akhtar, Mubashara, Schlichtkrull, Michael, Vlachos, Andreas

arXiv.org Artificial IntelligenceNov-8-2024

Current automated fact-checking (AFC) approaches commonly evaluate evidence either implicitly via the predicted verdicts or by comparing retrieved evidence with a predefined closed knowledge source, such as Wikipedia. However, these methods suffer from limitations, resulting from their reliance on evaluation metrics developed for different purposes and constraints imposed by closed knowledge sources. Recent advances in natural language generation (NLG) evaluation offer new possibilities for evidence assessment. In this work, we introduce Ev2R, an evaluation framework for AFC that comprises three types of approaches for evidence evaluation: reference-based, proxy-reference, and reference-less. We evaluate their effectiveness through agreement with human ratings and adversarial tests, and demonstrate that prompt-based scorers, particularly those leveraging LLMs and reference evidence, outperform traditional evaluation approaches.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2411.05375

Country:

Europe > United Kingdom (1.00)
Asia (1.00)
Africa (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Government (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

TabVer: Tabular Fact Verification with Natural Logic

Aly, Rami, Vlachos, Andreas

arXiv.org Artificial IntelligenceNov-1-2024

Fact verification on tabular evidence incentivises the use of symbolic reasoning models where a logical form is constructed (e.g. a LISP-style program), providing greater verifiability than fully neural approaches. However, these systems typically rely on well-formed tables, restricting their use in many scenarios. An emerging symbolic reasoning paradigm for textual evidence focuses on natural logic inference, which constructs proofs by modelling set-theoretic relations between a claim and its evidence in natural language. This approach provides flexibility and transparency but is less compatible with tabular evidence since the relations do not extend to arithmetic functions. We propose a set-theoretic interpretation of numerals and arithmetic functions in the context of natural logic, enabling the integration of arithmetic expressions in deterministic proofs. We leverage large language models to generate arithmetic expressions by generating questions about salient parts of a claim which are answered by executing appropriate functions on tables. In a few-shot setting on FEVEROUS, we achieve an accuracy of 71.4, outperforming both fully neural and symbolic reasoning models by 3.4 points. When evaluated on TabFact without any further training, our method remains competitive with an accuracy lead of 0.5 points.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2411.01093

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East (0.67)

Genre: Research Report (0.50)

Industry:

Government > Regional Government (0.47)
Government > Voting & Elections (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

The Automated Verification of Textual Claims (AVeriTeC) Shared Task

Schlichtkrull, Michael, Chen, Yulong, Whitehouse, Chenxi, Deng, Zhenyun, Akhtar, Mubashara, Aly, Rami, Guo, Zhijiang, Christodoulopoulos, Christos, Cocarascu, Oana, Mittal, Arpit, Thorne, James, Vlachos, Andreas

arXiv.org Artificial IntelligenceOct-31-2024

The Automated Verification of Textual Claims (AVeriTeC) shared task asks participants to retrieve evidence and predict veracity for real-world claims checked by fact-checkers. Evidence can be found either via a search engine, or via a knowledge store provided by the organisers. Submissions are evaluated using AVeriTeC score, which considers a claim to be accurately verified if and only if both the verdict is correct and retrieved evidence is considered to meet a certain quality threshold. The shared task received 21 submissions, 18 of which surpassed our baseline. The winning team was TUDA_MAI with an AVeriTeC score of 63%. In this paper we describe the shared task, present the full results, and highlight key takeaways from the shared task.

computational linguistic, large language model, machine learning, (23 more...)

arXiv.org Artificial Intelligence

2410.2385

Country:

Asia > China (0.93)
Asia > Middle East > UAE (0.14)
North America > United States > Louisiana (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Industry:

Media (0.68)
Government > Regional Government > Asia Government > China Government (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback

Conformity in Large Language Models

Zhu, Xiaochen, Zhang, Caiqi, Stafford, Tom, Collier, Nigel, Vlachos, Andreas

arXiv.org Artificial IntelligenceOct-16-2024

The conformity effect describes the tendency of individuals to align their responses with the majority. Studying this bias in large language models (LLMs) is crucial, as LLMs are increasingly used in various information-seeking and decision-making tasks as conversation partners to improve productivity. Thus, conformity to incorrect responses can compromise their effectiveness. In this paper, we adapt psychological experiments to examine the extent of conformity in state-of-the-art LLMs. Our findings reveal that all models tested exhibit varying levels of conformity toward the majority, regardless of their initial choice or correctness, across different knowledge domains. Notably, we are the first to show that LLMs are more likely to conform when they are more uncertain in their own prediction. We further explore factors that influence conformity, such as training paradigms and input characteristics, finding that instruction-tuned models are less susceptible to conformity, while increasing the naturalness of majority tones amplifies conformity. Finally, we propose two interventions--Devil's Advocate and Question Distillation--to mitigate conformity, providing insights into building more robust language models.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2410.12428

Country:

Asia (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

ALVIN: Active Learning Via INterpolation

Korakakis, Michalis, Vlachos, Andreas, Weller, Adrian

arXiv.org Artificial IntelligenceOct-11-2024

Active Learning aims to minimize annotation effort by selecting the most useful instances from a pool of unlabeled data. However, typical active learning methods overlook the presence of distinct example groups within a class, whose prevalence may vary, e.g., in occupation classification datasets certain demographics are disproportionately represented in specific classes. This oversight causes models to rely on shortcuts for predictions, i.e., spurious correlations between input attributes and labels occurring in well-represented groups. To address this issue, we propose Active Learning Via INterpolation (ALVIN), which conducts intra-class interpolations between examples from under-represented and well-represented groups to create anchors, i.e., artificial points situated between the example groups in the representation space. By selecting instances close to the anchors for annotation, ALVIN identifies informative examples exposing the model to regions of the representation space that counteract the influence of shortcuts. Crucially, since the model considers these examples to be of high certainty, they are likely to be ignored by typical active learning methods. Experimental results on six datasets encompassing sentiment analysis, natural language inference, and paraphrase detection demonstrate that ALVIN outperforms state-of-the-art active learning methods in both in-distribution and out-of-distribution generalization.

artificial intelligence, computational linguistic, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2410.08972

Country:

Europe (1.00)
North America > United States > Washington > King County > Seattle (0.28)
North America > United States > Minnesota (0.28)
North America > Canada > British Columbia (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback