"Don't Teach Minerva": Guiding LLMs Through Complex Syntax for Faithful Latin Translation with RAG
Translating a morphology-rich, low-resource language like Latin poses significant challenges. This paper introduces a reproducible draft-based refinement pipeline that elevates open-source Large Language Models (LLMs) to a performance level statistically comparable to top-tier proprietary systems. Our method first uses a fine-tuned NLLB-1.3B model to generate a high-quality, structurally faithful draft. A zero-shot LLM (Llama-3.3 or Qwen3) then polishes this draft, a process that can be further enhanced by augmenting the context with retrieved translation examples (retrieval-augmented generation, RAG). We demonstrate the robustness of this approach on two distinct benchmarks: a standard in-domain test set (Rosenthal, 2023) and a new, challenging out-of-domain (OOD) set of 12th-century Latin letters (2025). Our central finding is that this open-source RAG system achieves performance statistically comparable to the GPT-5 baseline, without any task-specific LLM fine-tuning. We release the pipeline, the Chartres OOD set, and the evaluation scripts and models to facilitate replicability and further research.
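The draft-then-refine step can be illustrated as prompt assembly. This is a minimal sketch: the function name, prompt wording, and example pairs below are assumptions for illustration, not the authors' released code.

```python
def build_refine_prompt(source_latin, draft, retrieved_examples):
    """Assemble the zero-shot refinement prompt: the NLLB draft plus
    retrieved (Latin, English) pairs supplied as extra context (RAG)."""
    parts = ["Polish the draft translation so it is fluent and faithful to the Latin."]
    for latin, english in retrieved_examples:
        parts.append(f"Example Latin: {latin}\nExample English: {english}")
    parts.append(f"Latin source: {source_latin}")
    parts.append(f"Draft translation: {draft}")
    parts.append("Improved translation:")
    return "\n\n".join(parts)

prompt = build_refine_prompt(
    "Gallia est omnis divisa in partes tres.",
    "All Gaul is divided into three parts.",
    [("Veni, vidi, vici.", "I came, I saw, I conquered.")],
)
```

In the full pipeline, the draft would come from the fine-tuned NLLB-1.3B model and the assembled prompt would be sent to Llama-3.3 or Qwen3 for polishing.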
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Poland (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > North Carolina > Wake County > Morrisville (0.04)
- Research Report (0.70)
- Instructional Material (0.46)
Minerva: A Programmable Memory Test Benchmark for Language Models
Xia, Menglin, Ruehle, Victor, Rajmohan, Saravan, Shokri, Reza
How effectively can LLM-based AI assistants utilize their memory (context) to perform various tasks? Traditional data benchmarks, which are often manually crafted, suffer from several limitations: they are static, susceptible to overfitting, difficult to interpret, and lack actionable insights--failing to pinpoint the specific capabilities a model lacks when it does not pass a test. In this paper, we present a framework for automatically generating a comprehensive set of tests to evaluate models' abilities to use their memory effectively. Our framework extends the range of capability tests beyond the search tasks (passkey, key-value, needle-in-the-haystack) that dominate the literature. Specifically, we evaluate models on atomic tasks such as searching, recalling, editing, matching, and comparing information in context memory, and on performing basic operations when inputs are structured into distinct blocks, simulating real-world data. Additionally, we design composite tests to investigate the models' ability to maintain state while operating on memory. Our benchmark enables an interpretable, detailed assessment of the memory capabilities of LLMs.
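A programmable memory test of this kind can be generated automatically rather than hand-written. The sketch below (illustrative names, not the authors' framework) plants key-value facts in distinct blocks and asks for one of them, so a failure isolates a specific recall capability.

```python
import random

def make_recall_test(n_blocks=3, facts_per_block=2, seed=0):
    """Generate a block-structured context, a recall question, and its answer."""
    rng = random.Random(seed)
    blocks, answers = [], {}
    for b in range(n_blocks):
        lines = [f"# Block {b}"]
        for i in range(facts_per_block):
            key, val = f"key_{b}_{i}", rng.randint(100, 999)
            answers[key] = val
            lines.append(f"{key} = {val}")
        blocks.append("\n".join(lines))
    target = rng.choice(list(answers))
    question = f"What is the value of {target}?"
    return "\n\n".join(blocks), question, answers[target]

context, question, gold = make_recall_test(seed=0)
```

Because the generator is seeded, tests are reproducible yet can be resampled at will, which avoids the static-benchmark overfitting problem the abstract describes.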
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)
Leveraging Large Language Models to Geolocate Linguistic Variations in Social Media Posts
Savarro, Davide, Zago, Davide, Zoia, Stefano
Geolocalization of social media content is the task of determining the geographical location of a user based on textual data, which may show linguistic variations and informal language. In this project, we address the GeoLingIt challenge of geolocalizing tweets written in Italian by leveraging large language models (LLMs). GeoLingIt requires the prediction of both the region and the precise coordinates of the tweet. Our approach involves fine-tuning pre-trained LLMs to simultaneously predict these geolocalization aspects. By integrating innovative methodologies, we enhance the models' ability to understand the nuances of Italian social media text and improve the state-of-the-art in this domain. This work was conducted as part of the Large Language Models course at the Bertinoro International Spring School 2024. We make our code publicly available on GitHub: https://github.com/dawoz/geolingit-biss2024.
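Predicting region and coordinates simultaneously can be encoded as a single fine-tuning record. The field names and output formatting below are illustrative assumptions, not the GeoLingIt data format.

```python
def format_geolingit_example(tweet, region, lat, lon):
    """Build one joint-target fine-tuning record: the model must emit both
    the region label and the coordinates in a single output string."""
    prompt = f"Tweet: {tweet}\nPredict the Italian region and coordinates."
    target = f"region: {region}; lat: {lat:.4f}; lon: {lon:.4f}"
    return {"prompt": prompt, "target": target}

ex = format_geolingit_example("Andiamo a magnà!", "Lazio", 41.8931, 12.4828)
```

Serializing both targets into one sequence lets a standard causal-LM fine-tuning loop learn the two sub-tasks jointly, with no architectural changes.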
Igea: a Decoder-Only Language Model for Biomedical Text Generation in Italian
Buonocore, Tommaso Mario, Rancati, Simone, Parimbelli, Enea
The advent of probabilistic language models has revolutionized various domains, with biomedical natural language processing (NLP) standing out due to its significant impact on healthcare provision and medical research. The ability of these models to understand, process, and generate text from vast biomedical corpora has led to improvements in tasks such as entity recognition, relation extraction, and question answering. However, the majority of this progress has been focused on English-language texts, creating a notable disparity for other languages with fewer resources, such as Italian. In the Italian context, the scarcity of large and diverse training datasets presents a substantial challenge. General language models like Minerva and Maestrale have made strides in Italian NLP, but they lack the specialization required to handle the nuances of biomedical terminology effectively. Addressing this gap is crucial, as the precision and clarity needed in medical communications are paramount for clinical and research applications in such a high-stakes domain. In this paper, we introduce Igea, a biomedical language model (BLM) built from the ground up for the Italian language, effective at handling native Italian biomedical text while remaining efficient in terms of computational resources. We built upon the foundation model Minerva, which we then continually trained on native Italian biomedical text, with provisions to avoid disrupting what was learned during pre-training.
- Europe > Italy (0.04)
- Asia > Middle East > Jordan (0.04)
Knowledge Graph Reasoning with Self-supervised Reinforcement Learning
Ma, Ying, Burns, Owen, Wang, Mingqiu, Li, Gang, Du, Nan, Shafey, Laurent El, Wang, Liqiang, Shafran, Izhak, Soltau, Hagen
Reinforcement learning (RL) is an effective method of finding reasoning pathways in incomplete knowledge graphs (KGs). To overcome the challenges of a large action space, a self-supervised pre-training method is proposed to warm up the policy network before the RL training stage. To alleviate the distributional mismatch issue in general self-supervised RL (SSRL), in our supervised learning (SL) stage the agent selects actions based on the policy network and learns from the generated labels; this self-generation of labels is the intuition behind the name self-supervised. With this training framework, the information density of our SL objective is increased and the agent is prevented from getting stuck on early rewarded paths. Our SSRL method improves the performance of RL by pairing it with the wide coverage achieved by SL during pre-training, since the breadth of the SL objective makes it infeasible to train an agent with SL alone. We show that our SSRL model meets or exceeds current state-of-the-art results on all Hits@k and mean reciprocal rank (MRR) metrics on four large benchmark KG datasets. This SSRL method can be used as a plug-in for any RL architecture on a knowledge graph reasoning (KGR) task. We adopt two RL architectures, MINERVA and MultiHopKG, as our baseline RL models and experimentally show that our SSRL model consistently outperforms both baselines on all four KG reasoning tasks. Full code for the paper is available at https://github.com/owenonline/Knowledge-Graph-Reasoning-with-Self-supervised-Reinforcement-Learning.
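The self-generation of labels described above can be caricatured in a few lines: the agent walks the graph with its current policy, and the actions it actually took become the supervised targets for the next update. Names and the toy graph are illustrative assumptions, not the paper's code.

```python
import random

def rollout_labels(graph, policy, start, steps, rng):
    """Sample a path with the current policy; return the (state, action)
    pairs taken along the way as self-generated supervision labels."""
    labels, state = [], start
    for _ in range(steps):
        actions = graph.get(state, [])
        if not actions:
            break
        weights = [policy.get((state, a), 1.0) for a in actions]
        action = rng.choices(actions, weights=weights)[0]
        labels.append((state, action))
        state = action
    return labels

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
labels = rollout_labels(graph, {}, "A", steps=3, rng=random.Random(0))
```

Because the labels are sampled from the policy itself, the SL stage trains on the same action distribution the agent will face during RL, which is how this scheme addresses the distributional mismatch the abstract mentions.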
- Asia > Middle East > Palestine (0.28)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (11 more...)
In AI, is bigger always better?
Artificial-intelligence systems that can churn out fluent text, such as OpenAI's ChatGPT, are the newest darlings of the technology industry. But when faced with mathematical queries that require reasoning to answer, these large language models (LLMs) often stumble. A line parallel to y = 4x + 6 passes through (5, 10). What is the y-coordinate of the point where this line crosses the y-axis? Although LLMs can sometimes answer these types of question correctly, they more often get them wrong. In one early test of its reasoning abilities, ChatGPT scored just 26% when faced with a sample of questions from the 'MATH' data set of secondary-school-level mathematical problems. This is to be expected: given input text, an LLM simply generates new text in accordance with statistical regularities in the words, symbols and sentences that make up the model's training data.
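The sample problem works out in two steps: a parallel line shares the slope 4 of y = 4x + 6, and the point (5, 10) then fixes the intercept, which is exactly the y-coordinate where the line crosses the y-axis.

```python
# A line parallel to y = 4x + 6 has the same slope.
slope = 4
x0, y0 = 5, 10                 # the given point on the new line
intercept = y0 - slope * x0    # from y0 = slope * x0 + b
print(intercept)               # -10
```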
- North America > Canada > Quebec > Montreal (0.15)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Washington > King County > Redmond (0.04)
- (6 more...)
- Energy (1.00)
- Education > Educational Setting (0.54)
- Information Technology > Services (0.47)
- Government > Regional Government (0.46)
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
Jiang, Albert Q., Welleck, Sean, Zhou, Jin Peng, Li, Wenda, Liu, Jiacheng, Jamnik, Mateja, Lacroix, Timothée, Wu, Yuhuai, Lample, Guillaume
The formalization of existing mathematical proofs is a notoriously difficult process. Despite decades of research on automation and proof assistants, writing formal proofs remains arduous and only accessible to a few experts. While previous studies to automate formalization focused on powerful search algorithms, no attempts were made to take advantage of available informal proofs. In this work, we introduce Draft, Sketch, and Prove (DSP), a method that maps informal proofs to formal proof sketches, and uses the sketches to guide an automated prover by directing its search to easier sub-problems. We investigate two relevant setups where informal proofs are either written by humans or generated by a language model. Our experiments and ablation studies show that large language models are able to produce well-structured formal sketches that follow the same reasoning steps as the informal proofs. Guiding an automated prover with these sketches enhances its performance from 20.9% to 39.3% on a collection of mathematical competition problems.
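The sketch-guided proving loop can be caricatured in a few lines: the formal sketch leaves open subgoals, and the automated prover is asked to close each one, so the full proof succeeds only if every hole closes. The toy prover and goal encoding below are illustrative assumptions, not the paper's implementation.

```python
def prove_from_sketch(sketch_holes, automated_prover):
    """Try to close each open subgoal from a formal proof sketch; the
    sketch succeeds only if every subgoal is closed."""
    closed = [automated_prover(goal) for goal in sketch_holes]
    return all(closed), sum(closed)

# A toy "prover" that can only handle arithmetic subgoals.
def toy_prover(goal):
    return goal.startswith("arith:")

ok, n = prove_from_sketch(["arith: 2 + 2 = 4", "arith: 3 * 3 = 9"], toy_prover)
```

The point of the decomposition is visible even in this toy: each subgoal is far easier than the original theorem, so a prover that would fail on the whole problem can still succeed hole by hole.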
- North America > United States > North Carolina > Wake County > Morrisville (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (4 more...)
How will AI change mathematics? Rise of chatbots highlights discussion
AI tools have allowed researchers to solve complex mathematical problems. Credit: Fadel Senna/AFP/Getty

As interest in chatbots spreads like wildfire, mathematicians are beginning to explore how artificial intelligence (AI) could help them to do their work. Whether it's assisting with verifying human-written work or suggesting new ways to solve difficult problems, automation is beginning to change the field in ways that go beyond mere calculation, researchers say. "We're looking at a very specific question: will machines change math?" says Andrew Granville, a number theorist at the University of Montreal in Canada. A workshop at the University of California, Los Angeles (UCLA), this week explored this question, aiming to build bridges between mathematicians and computer scientists. "Most mathematicians are completely unaware of these opportunities," says one of the event's organizers, Marijn Heule, a computer scientist at Carnegie Mellon University in Pittsburgh, Pennsylvania.
- North America > United States > California > Los Angeles County > Los Angeles (0.55)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.25)
- North America > Canada > Quebec > Montreal (0.25)
- (6 more...)
Minerva: A File-Based Ransomware Detector
Hitaj, Dorjan, Pagnotta, Giulio, De Gaspari, Fabio, De Carli, Lorenzo, Mancini, Luigi V.
Ransomware is a rapidly evolving type of malware designed to encrypt user files on a device, making them inaccessible in order to exact a ransom. Ransomware attacks resulted in billions of dollars in damages in recent years and are expected to cause hundreds of billions more in the next decade. With current state-of-the-art process-based detectors being heavily susceptible to evasion attacks, no comprehensive solution to this problem is available today. This paper presents Minerva, a new approach to ransomware detection. Unlike current methods focused on identifying ransomware based on process-level behavioral modeling, Minerva detects ransomware by building behavioral profiles of files based on all the operations they receive in a time window. Minerva addresses some of the critical challenges associated with process-based approaches, specifically their vulnerability to complex evasion attacks. Our evaluation of Minerva demonstrates its effectiveness in detecting ransomware attacks, including those that are able to bypass existing defenses. Our results show that Minerva identifies ransomware activity with an average accuracy of 99.45% and an average recall of 99.66%, with 99.97% of ransomware detected within 1 second.
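The file-centric idea can be sketched as follows: group the operations each file receives inside a time window and flag files whose recent activity matches an encryption-like pattern. The operation names, window length, and flagging rule below are assumptions for illustration, not Minerva's actual detection model.

```python
from collections import defaultdict

def flag_files(events, window=1.0):
    """events: (timestamp, path, op) tuples. Flag files whose most recent
    time window contains the read -> write -> rename sequence typical of
    in-place encryption."""
    by_file = defaultdict(list)
    for ts, path, op in events:
        by_file[path].append((ts, op))
    flagged = set()
    for path, ops in by_file.items():
        ops.sort()
        last_ts = ops[-1][0]
        recent = {op for ts, op in ops if last_ts - ts <= window}
        if {"read", "write", "rename"} <= recent:
            flagged.add(path)
    return flagged

events = [(0.0, "a.doc", "read"), (0.2, "a.doc", "write"),
          (0.5, "a.doc", "rename"), (0.1, "b.doc", "read")]
```

Profiling files rather than processes means an attacker cannot evade detection by splitting the malicious behavior across many benign-looking processes, since the victim file still accumulates the telltale operation sequence.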
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.14)
- Europe > Italy > Lazio > Rome (0.04)