AITopics | mistral


mistral

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Leave big tech behind! How to replace Amazon, Google, X, Meta, Apple – and more

The Guardian, Feb-26-2026, 10:00:25 GMT

Switching to big tech alternatives is easier than you might imagine. There's not much to love about big tech these days. So many ills can be laid at its door: social media harms, misinformation, polarisation, mining and misuse of personal data, environmental negligence, tax avoidance; the list goes on. Added to which, Silicon Valley's leaders seem all too keen to cosy up to the Trump administration, to shower the president with bribes (sorry, gifts) and to remain silent about his worsening political overreach. And that's before we get to the rampant "enshittification", as the tech writer Cory Doctorow calls it: by design, many big tech products have become less useful and more extractive than they were when we originally signed up to them.

artificial intelligence, machine learning, social media, (15 more...)

The Guardian

Country:

Europe > United Kingdom (0.48)
North America > United States > California (0.24)
Europe > France (0.06)
(10 more...)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Services (0.94)
(2 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Multi-LLM Debate: Framework, Principals, and Interventions

Neural Information Processing Systems, Feb-10-2026, 21:55:29 GMT

We first take a theoretical approach to analyzing debate and provide a framework through which debate can be mathematically examined. Building on this framework, we provide several theoretical results for multi-agent debate.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > West Virginia (0.04)
North America > United States > Virginia (0.04)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Media (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)


Mistral's New Ultra-Fast Translation Model Gives Big AI Labs a Run for Their Money

WIRED, Feb-4-2026, 15:32:45 GMT

"Too many GPUs makes you lazy," says the French startup's vice president of science operations, as the company carves out a different path than the major US AI companies. Mistral AI has released a new family of AI models that it claims will clear the path to seamless conversation between people speaking different languages. On Wednesday, the Paris-based AI lab released two new speech-to-text models: Voxtral Mini Transcribe V2 and Voxtral Realtime. The former is built to transcribe audio files in large batches, and the latter for near-real-time transcription, within 200 milliseconds; both can translate between 13 languages. Voxtral Realtime is freely available under an open-source license.

large language model, machine learning, real time system, (19 more...)

WIRED

Country:

North America > United States > California (0.16)
North America > United States > Minnesota (0.15)

Industry: Leisure & Entertainment > Sports (0.98)

Technology:

Information Technology > Architecture > Real Time Systems (0.61)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Machine Learning (0.48)


Large Language Model-Based Generation of Discharge Summaries

Rodrigues, Tiago, Lopes, Carla Teixeira

arXiv.org Artificial Intelligence, Dec-9-2025

Discharge summaries are documents written by medical professionals that detail a patient's visit to a care facility. They contain a wealth of information crucial for patient care, and automating their generation could significantly reduce the effort required from healthcare professionals, minimize errors, and ensure that critical patient information is easily accessible and actionable. In this work, we explore the use of five large language models on this task, from open-source models (Mistral, Llama 2) to proprietary systems (GPT-3, GPT-4, Gemini 1.5 Pro), leveraging MIMIC-III summaries and notes. We evaluate them using exact-match, soft-overlap, and reference-free metrics. Our results show that proprietary models, particularly Gemini with one-shot prompting, outperformed the others, producing summaries with the highest similarity to the gold-standard ones. Open-source models, while promising (especially Mistral after fine-tuning), lagged in performance, often struggling with hallucinations and repeated information. Human evaluation by a clinical expert confirmed the practical utility of the summaries generated by proprietary models. Despite challenges such as hallucinations and missing information, the findings suggest that LLMs, especially proprietary models, are promising candidates for automatic discharge summary generation as long as data privacy is ensured.
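The abstract names soft-overlap metrics without spelling them out. As an illustration only, the sketch below computes a ROUGE-1-style unigram precision, recall, and F1 between a generated summary and a reference. The whitespace tokenization, the `unigram_f1` name, and the example sentences are assumptions made for this sketch, not the metric implementation used in the paper.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> dict:
    """ROUGE-1-style unigram overlap between a generated and a reference summary.

    Illustrative sketch: lowercasing plus whitespace tokenization, clipped counts.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a token counts at most min(count in candidate, count in reference) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    generated = "patient admitted with pneumonia treated with antibiotics"
    gold = "patient admitted for pneumonia and treated with iv antibiotics"
    # 6 overlapping tokens: precision 6/7, recall 6/9, F1 0.75
    print(unigram_f1(generated, gold))
```

Because counts are clipped, the second "with" in the generated text adds to the denominator but earns no extra overlap, so it lowers precision; this is one way overlap metrics can register repeated content.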

information, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2512.06812

Country:

Europe (1.00)
North America > United States (0.28)
North America > Canada (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Going All-In on LLM Accuracy: Fake Prediction Markets, Real Confidence Signals

Todasco, Michael

arXiv.org Artificial Intelligence, Dec-9-2025

Large language models are increasingly used to evaluate other models, yet these judgments typically lack any representation of confidence. This pilot study tests whether framing an evaluation task as a betting game (a fictional prediction market with its own LLM currency) improves forecasting accuracy and surfaces calibrated confidence signals. We generated 100 math and logic questions with verifiable answers. Six baseline models (three current-generation, three prior-generation) answered all items. Three predictor models then forecast, for each question-baseline pair, whether the baseline would answer correctly. Each predictor completed matched runs in two conditions: Control (simple correct/incorrect predictions) and Incentive (predictions plus wagers of 1-100,000 LLMCoin under even odds, starting from a 1,000,000 LLMCoin bankroll). Across 5,400 predictions per condition, Incentive runs showed modestly higher accuracy (81.5% vs. 79.1%, p = .089, d = 0.86) and significantly faster learning across rounds (12.0 vs. 2.9 percentage-point improvement from Round 1 to Round 4, p = .011). Most notably, stake size tracked confidence. "Whale" bets of 40,000+ coins were correct ~99% of the time, while small bets (<1,000 coins) showed only ~74% accuracy. The key finding is not that fictional money makes models smarter; accuracy gains were modest and did not reach statistical significance (p = .089) in this pilot. Rather, the betting mechanic created a legible confidence signal absent from binary yes/no outputs. This suggests that simple financial framing may help transform LLMs into risk-aware forecasters, making their internal beliefs visible and usable. The protocol offers a foundation for future work on meta-evaluation systems and on what may become LLM-to-LLM prediction markets.
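Under even odds, a wager of w LLMCoin gains w when the forecast is right and loses w when it is wrong. The minimal sketch below assumes that payoff rule, together with the 1-100,000 stake range and 1,000,000 starting bankroll from the abstract, settles a list of bets, and then reports accuracy per stake bucket, the kind of breakdown behind the "whale" versus small-bet comparison. The `Bet`, `settle`, and `accuracy_by_bucket` names, the middle bucket, and the example bets are hypothetical and not taken from the paper's code.

```python
from dataclasses import dataclass

# Stake limits and starting bankroll as stated in the abstract.
MIN_STAKE, MAX_STAKE = 1, 100_000
START_BANKROLL = 1_000_000


@dataclass
class Bet:
    stake: int      # LLMCoin wagered alongside the forecast
    correct: bool   # whether the forecast about the baseline model was right


def settle(bets: list[Bet], bankroll: int = START_BANKROLL) -> int:
    """Apply even-odds payoffs: win the stake if correct, lose it otherwise.

    Bankruptcy and stakes larger than the remaining bankroll are not modelled.
    """
    for bet in bets:
        if not MIN_STAKE <= bet.stake <= MAX_STAKE:
            raise ValueError(f"stake {bet.stake} outside {MIN_STAKE}-{MAX_STAKE}")
        bankroll += bet.stake if bet.correct else -bet.stake
    return bankroll


def accuracy_by_bucket(bets: list[Bet]) -> dict[str, float]:
    """Forecast accuracy per stake bucket (hypothetical bucket edges)."""
    buckets: dict[str, list[bool]] = {"small (<1,000)": [], "medium": [], "whale (40,000+)": []}
    for bet in bets:
        if bet.stake < 1_000:
            key = "small (<1,000)"
        elif bet.stake >= 40_000:
            key = "whale (40,000+)"
        else:
            key = "medium"
        buckets[key].append(bet.correct)
    # Skip empty buckets rather than dividing by zero.
    return {k: sum(v) / len(v) for k, v in buckets.items() if v}


if __name__ == "__main__":
    bets = [Bet(50_000, True), Bet(500, False), Bet(500, True), Bet(5_000, True)]
    print(settle(bets))              # 1055000
    print(accuracy_by_bucket(bets))  # {'small (<1,000)': 0.5, 'medium': 1.0, 'whale (40,000+)': 1.0}
```

Bucketing by stake turns the wager into a graded confidence label, which is the signal the authors say binary yes/no outputs lack.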

accuracy, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.17605/OSF.IO/DC24T

2512.05998

Genre: Research Report > Experimental Study (1.00)

Industry: Banking & Finance > Trading > Prediction Market (0.81)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)


Mind Reading or Misreading? LLMs on the Big Five Personality Test

Di Cursi, Francesco, Boldrini, Chiara, Conti, Marco, Passarella, Andrea

arXiv.org Artificial Intelligence, Dec-1-2025

We evaluate large language models (LLMs) for automatic personality prediction from text (APPT) under the binary Five Factor Model (BIG5). Five models (including GPT-4 and lightweight open-source alternatives) are tested across three heterogeneous datasets (Essays, MyPersonality, Pandora) and two prompting strategies (minimal vs. enriched with linguistic and psychological cues). Enriched prompts reduce invalid outputs and improve class balance, but also introduce a systematic bias toward predicting trait presence. Performance varies substantially: Openness and Agreeableness are relatively easier to detect, while Extraversion and Neuroticism remain challenging. Although open-source models sometimes approach GPT-4 and prior benchmarks, no configuration yields consistently reliable predictions in zero-shot binary settings. Moreover, aggregate metrics such as accuracy and macro-F1 mask significant asymmetries, with per-class recall offering clearer diagnostic value. These findings show that current out-of-the-box LLMs are not yet suitable for APPT, and that careful coordination of prompt design, trait framing, and evaluation metrics is essential for interpretable results.
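To make the point about aggregate metrics concrete, here is a toy binary case (1 = trait present) in which a predictor always answers "present", mirroring the bias toward trait presence described above. The labels are invented for illustration, `per_class_metrics` is a hypothetical helper, and treating undefined precision (a class that is never predicted) as 0 is a choice made for this sketch.

```python
def per_class_metrics(y_true: list[int], y_pred: list[int]) -> dict:
    """Accuracy, macro-F1 and per-class recall for one binary trait (1 = present)."""
    labels = (0, 1)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recall, f1 = {}, {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        rec = tp / (tp + fn) if tp + fn else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0  # never-predicted class -> 0 (sketch choice)
        recall[c] = rec
        f1[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": accuracy, "macro_f1": sum(f1.values()) / len(labels), "recall": recall}


if __name__ == "__main__":
    y_true = [1] * 7 + [0] * 3   # trait present in 7 of 10 texts
    y_pred = [1] * 10            # model always predicts "present"
    m = per_class_metrics(y_true, y_pred)
    print(m["accuracy"], m["recall"])  # 0.7 {0: 0.0, 1: 1.0}
```

An accuracy of 0.7 looks acceptable on its own, while the per-class recall of 0.0 for "absent" exposes that the model never detects the trait's absence.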

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2511.23101

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Assessing the Capability of LLMs in Solving POSCOMP Questions

Viegas, Cayo, Gheyi, Rohit, Ribeiro, Márcio

arXiv.org Artificial Intelligence, Nov-19-2025

Recent advancements in Large Language Models (LLMs) have significantly expanded the capabilities of artificial intelligence in natural language processing tasks. Despite this progress, their performance in specialized domains such as computer science remains relatively unexplored. Understanding the proficiency of LLMs in these domains is critical for evaluating their practical utility and guiding future developments. The POSCOMP, a prestigious Brazilian examination used for graduate admissions in computer science and organized by the Brazilian Computer Society (SBC), provides a challenging benchmark. This study investigates whether LLMs can match or surpass human performance on the POSCOMP exam. Four LLMs (ChatGPT-4, Gemini 1.0 Advanced, Claude 3 Sonnet, and Le Chat Mistral Large) were initially evaluated on the 2022 and 2023 POSCOMP exams. The assessments measured the models' proficiency in handling complex questions typical of the exam. LLM performance was notably better on text-based questions than on image interpretation tasks. In the 2022 exam, ChatGPT-4 led with 57 correct answers out of 69 questions, followed by Gemini 1.0 Advanced (49), Le Chat Mistral (48), and Claude 3 Sonnet (44). Similar trends were observed in the 2023 exam. ChatGPT-4 achieved the highest performance, surpassing all students who took the POSCOMP 2023 exam. LLMs, particularly ChatGPT-4, show promise in text-based tasks on the POSCOMP exam, although image interpretation remains a challenge. Given the rapid evolution of LLMs, we expanded our analysis to include more recent models (o1, Gemini 2.5 Pro, Claude 3.7 Sonnet, and o3-mini-high), evaluated on the 2022-2024 POSCOMP exams. These newer models demonstrate further improvements and consistently surpass both the average and top-performing human participants across all three years.
The POSCOMP [1] is a prestigious assessment designed to test the knowledge of prospective computer science graduate students, organized by the Brazilian Computer Society (SBC). It serves as an entry criterion for many graduate programs across Brazil. Using this exam as a benchmark for evaluating Large Language Models (LLMs) allows for a direct comparison between AI capabilities and human standards, offering valuable insights into the strengths and limitations of current AI models. Recent advancements in LLMs [2], [3] have significantly expanded the capabilities of Artificial Intelligence (AI), particularly in natural language processing tasks.
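To put the 2022 raw scores on a common scale, the short snippet below converts each count of correct answers out of 69 questions into a percentage. The counts come from the abstract; the `as_percentages` helper is hypothetical, and rounding to one decimal place is an arbitrary presentation choice.

```python
# 2022 POSCOMP correct answers out of 69 questions, as reported in the abstract.
TOTAL_QUESTIONS = 69
scores_2022 = {
    "ChatGPT-4": 57,
    "Gemini 1.0 Advanced": 49,
    "Le Chat Mistral": 48,
    "Claude 3 Sonnet": 44,
}


def as_percentages(scores: dict[str, int], total: int) -> dict[str, float]:
    """Share of questions answered correctly, rounded to one decimal place."""
    return {model: round(100 * correct / total, 1) for model, correct in scores.items()}


if __name__ == "__main__":
    for model, pct in as_percentages(scores_2022, TOTAL_QUESTIONS).items():
        print(f"{model}: {pct}%")
    # ChatGPT-4: 82.6%, Gemini 1.0 Advanced: 71.0%, Le Chat Mistral: 69.6%, Claude 3 Sonnet: 63.8%
```

On this scale the gap between the leader and the last of the four models is roughly 19 percentage points.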

exam, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.5753/jbcs.2025.4493

2505.20338

Country:

Asia > Japan (0.28)
South America > Brazil (0.25)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Assessment & Standards (0.46)
Education > Educational Setting > Higher Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


From Passive to Persuasive: Steering Emotional Nuance in Human-AI Negotiation

Chebrolu, Niranjan, Yeo, Gerard Christopher, Jaidka, Kokil

arXiv.org Artificial Intelligence, Nov-18-2025

Large Language Models (LLMs) demonstrate increasing conversational fluency, yet imbuing them with nuanced, human-like emotional expression remains a significant challenge. Current alignment techniques often address surface-level output or require extensive fine-tuning. This paper demonstrates that targeted activation engineering can steer LLaMA 3.1-8B to exhibit more human-like emotional nuances. We first use attribution patching, observing activation patterns during diagnostic conversational tasks, to identify causally influential components and thereby locate a key intervention site. We then derive emotional expression vectors from the difference in the activations generated by contrastive text pairs (positive vs. negative examples of target emotions). Applying these vectors to new conversational prompts significantly enhances emotional characteristics: steered responses show increased positive sentiment (e.g., joy, trust) and more frequent first-person pronoun usage, indicative of greater personal engagement. Our findings offer a precise and interpretable framework and new directions for the study of conversational AI.
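The contrastive recipe described above is commonly implemented as a difference of mean activations, which is then added, with a scaling coefficient, to the hidden state at the chosen layer. The sketch below shows only that arithmetic on plain Python lists. The toy three-dimensional activations, the coefficient `alpha`, and the function names are hypothetical; in the paper's setting the vectors would come from the intervention site in LLaMA 3.1-8B located via attribution patching, which this sketch does not model.

```python
def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(pos_acts: list[list[float]], neg_acts: list[list[float]]) -> list[float]:
    """Difference of means: activations on positive-emotion minus negative-emotion examples."""
    pos, neg = mean_vector(pos_acts), mean_vector(neg_acts)
    return [p - q for p, q in zip(pos, neg)]


def steer(hidden: list[float], direction: list[float], alpha: float = 1.0) -> list[float]:
    """Add the scaled steering direction to one hidden state at the chosen layer."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    pos = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]  # toy activations for positive examples
    neg = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]  # toy activations for negative examples
    v = steering_vector(pos, neg)
    print(v)                                    # [2.0, -2.0, 0.0]
    print(steer([0.5, 0.5, 0.5], v, alpha=0.5))  # [1.5, -0.5, 0.5]
```

Dimensions that do not differ between the contrastive sets (the third one here) cancel out, so the added vector only moves the hidden state along directions that separate the two emotions.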

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2511.12832

Country: North America > United States (0.46)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)


Where Should I Study? Biased Language Models Decide! Evaluating Fairness in LMs for Academic Recommendations

Shailya, Krithi, Mishra, Akhilesh Kumar, Krishnan, Gokul S, Ravindran, Balaraman

arXiv.org Artificial Intelligence, Nov-13-2025

Large Language Models (LLMs) are increasingly used as daily recommendation systems for tasks like education planning, yet their recommendations risk perpetuating societal biases. This paper empirically examines geographic, demographic, and economic biases in university and program suggestions from three open-source LLMs: LLaMA-3.1-8B, Gemma-7B, and Mistral-7B. Using 360 simulated user profiles varying by gender, nationality, and economic status, we analyze over 25,000 recommendations. Results show strong biases: institutions in the Global North are disproportionately favored, recommendations often reinforce gender stereotypes, and institutional repetition is prevalent. While LLaMA-3.1 achieves the highest diversity, recommending 481 unique universities across 58 countries, systemic disparities persist. To quantify these issues, we propose a novel, multi-dimensional evaluation framework that goes beyond accuracy by measuring demographic and geographic representation. Our findings highlight the urgent need for bias consideration in educational LMs to ensure equitable global access to higher education.
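The abstract reports diversity as counts of unique universities and countries, and flags institutional repetition and a Global North skew. The sketch below computes those summary numbers from a list of (university, country) recommendations pooled across profiles. It is not the paper's evaluation framework: the `diversity_report` name, the repetition definition (share of recommendations that repeat an already-listed institution), the caller-supplied Global North country set, and the example data are all assumptions.

```python
from collections import Counter


def diversity_report(recs: list[tuple[str, str]], global_north: set[str]) -> dict:
    """Summarize (university, country) recommendations pooled across user profiles."""
    if not recs:
        raise ValueError("no recommendations to summarize")
    unis = Counter(u for u, _ in recs)
    countries = {c for _, c in recs}
    total = len(recs)
    return {
        "unique_universities": len(unis),
        "unique_countries": len(countries),
        # Share of recommendations that repeat an institution already recommended.
        "repetition_rate": (total - len(unis)) / total,
        # Share of recommendations located in the supplied Global North country set.
        "global_north_share": sum(c in global_north for _, c in recs) / total,
    }


if __name__ == "__main__":
    recs = [
        ("MIT", "US"),
        ("MIT", "US"),
        ("Oxford", "UK"),
        ("Universidade de São Paulo", "Brazil"),
    ]
    print(diversity_report(recs, global_north={"US", "UK"}))
    # 3 universities, 3 countries, repetition_rate 0.25, global_north_share 0.75
```

Reporting representation shares alongside raw unique counts matters because a model can list many distinct institutions while still concentrating most recommendations in a few regions.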

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.04498

Country:

South America (1.00)
Oceania (1.00)
North America > United States (1.00)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Education > Educational Setting > Higher Education (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

Casademunt, Helena, Juang, Caden, Karvonen, Adam, Marks, Samuel, Rajamanoharan, Senthooran, Nanda, Neel

arXiv.org Artificial Intelligence, Nov-11-2025

Fine-tuning large language models (LLMs) can lead to unintended out-of-distribution generalization. Standard approaches to this problem rely on modifying training data, for example by adding data that better specify the intended generalization. However, this is not always practical. We introduce Concept Ablation Fine-Tuning (CAFT), a technique that leverages interpretability tools to control how LLMs generalize from fine-tuning, without needing to modify the training data or otherwise use data from the target distribution. Given a set of directions in an LLM's latent space corresponding to undesired concepts, CAFT works by ablating these concepts with linear projections during fine-tuning, steering the model away from unintended generalizations. We successfully apply CAFT to three fine-tuning tasks, including emergent misalignment, a phenomenon where LLMs fine-tuned on a narrow task generalize to give egregiously misaligned responses to general questions. Without any changes to the fine-tuning data, CAFT reduces misaligned responses by 10x without degrading performance on the training distribution. Overall, CAFT represents a novel approach for steering LLM generalization without modifying training data.
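Ablating a set of concept directions with a linear projection typically means removing from each activation its component along every direction, i.e. h' = h - V Vᵀ h when the columns of V are orthonormal. The sketch below first orthonormalizes the supplied directions with Gram-Schmidt and then applies that projection to a single activation vector. It only illustrates the projection step: how CAFT selects the directions and where it applies the ablation during fine-tuning are not modelled, and the function names are hypothetical.

```python
import math


def orthonormalize(directions: list[list[float]], eps: float = 1e-10) -> list[list[float]]:
    """Gram-Schmidt: build an orthonormal basis for the concept directions.

    Directions that are (numerically) linear combinations of earlier ones are dropped.
    """
    basis: list[list[float]] = []
    for d in directions:
        v = list(d)
        for b in basis:
            coef = sum(x * y for x, y in zip(v, b))
            v = [x - coef * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > eps:
            basis.append([x / norm for x in v])
    return basis


def ablate(h: list[float], basis: list[list[float]]) -> list[float]:
    """Project h onto the orthogonal complement of span(basis): h - sum_i (u_i . h) u_i."""
    out = list(h)
    for u in basis:
        coef = sum(x * y for x, y in zip(h, u))
        out = [o - coef * y for o, y in zip(out, u)]
    return out


if __name__ == "__main__":
    # The second direction is parallel to the first, so only one basis vector survives.
    basis = orthonormalize([[1.0, 1.0, 0.0], [2.0, 2.0, 0.0]])
    print(len(basis))                        # 1
    print(ablate([3.0, 1.0, 5.0], basis))    # approximately [1.0, -1.0, 5.0]
```

After ablation the activation has no component along the concept direction, while everything orthogonal to it (here the third coordinate) passes through unchanged, which is why such an intervention can leave the rest of the representation intact.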

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.16795

Country:

Asia > Middle East (0.46)
North America > United States (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.65)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Law (0.92)
Banking & Finance (0.67)
Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
