trove


Trove: A Flexible Toolkit for Dense Retrieval

Esfandiarpoor, Reza, Zuo, Max, Bach, Stephen H.

arXiv.org Artificial Intelligence

We introduce Trove, an easy-to-use open-source retrieval toolkit that simplifies research experiments without sacrificing flexibility or speed. For the first time, we introduce efficient data management features that load and process (filter, select, transform, and combine) retrieval datasets on the fly, with just a few lines of code. This gives users the flexibility to easily experiment with different dataset configurations without the need to compute and store multiple copies of large datasets. Trove is highly customizable: in addition to many built-in options, it allows users to freely modify existing components or replace them entirely with user-defined objects. It also provides a low-code and unified pipeline for evaluation and hard negative mining, which supports multi-node execution without any code changes. Trove's data management features reduce memory consumption by a factor of 2.6. Moreover, Trove's easy-to-use inference pipeline incurs no overhead, and inference times decrease linearly with the number of available nodes. Most importantly, we demonstrate how Trove simplifies retrieval experiments and allows for arbitrary customizations, thus facilitating exploratory research.
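
Trove's actual API is not reproduced here. As a rough illustration of the "process on the fly" pattern the abstract describes, the sketch below uses plain Python generators to filter, transform, and combine two toy corpora lazily, so the combined, filtered view is never materialized as a separate copy. All dataset names and fields are hypothetical stand-ins, not Trove objects.

```python
# Illustrative sketch only: NOT Trove's API. It shows the general on-the-fly
# pattern (filter, transform, combine) using lazy Python generators, so no
# merged or filtered copy of the data is ever stored in memory or on disk.
from itertools import chain, islice

# Two toy corpora standing in for large retrieval datasets (hypothetical data).
corpus_a = ({"id": f"a{i}", "text": f"Passage {i} about dense retrieval"} for i in range(1000))
corpus_b = ({"id": f"b{i}", "text": f"Doc {i} on hard negative mining"} for i in range(1000))

def keep(records, predicate):
    """Lazily filter records without copying the dataset."""
    return (r for r in records if predicate(r))

def transform(records):
    """Lazily normalize each record (here, lowercase the text)."""
    for r in records:
        yield {**r, "text": r["text"].lower()}

# Combine, filter, and transform without writing a merged dataset anywhere.
combined = chain(corpus_a, corpus_b)
view = transform(keep(combined, lambda r: "retrieval" in r["text"].lower()))

for record in islice(view, 3):  # only the items consumed are ever processed
    print(record)
```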


Biotech firm aims to create 'ChatGPT of biology' – will it work?

New Scientist

A British biotech firm called Basecamp Research has spent the past few years collecting troves of genetic data from microbes living in extreme environments around the world, identifying more than a million species and nearly 10 billion genes new to science. It claims that this massive database of the planet's biodiversity will help train a "ChatGPT of biology" that will answer questions about life on Earth – but there's no guarantee this will work. A hydrogen fuel revolution is coming – here's why we might not want it Jörg Overmann at the Leibniz Institute DSMZ in Germany, which houses one of the world's most diverse collections of microbial cultures, says increasing known genetic sequences is valuable, but may not result in useful findings for things like drug discovery or chemistry without more information about the organisms from which they were collected. "I'm not convinced that in the end the understanding of really novel functions will be accelerated by this brute-force increase in the sequence space," he says. Recent years have seen researchers develop a number of machine learning models trained to identify patterns and predict relationships amid vast amounts of biological data.


TROVE: A Challenge for Fine-Grained Text Provenance via Source Sentence Tracing and Relationship Classification

Zhu, Junnan, Xiao, Min, Wang, Yining, Zhai, Feifei, Zhou, Yu, Zong, Chengqing

arXiv.org Artificial Intelligence

LLMs have achieved remarkable fluency and coherence in text generation, yet their widespread adoption has raised concerns about content reliability and accountability. In high-stakes domains such as healthcare, law, and news, it is crucial to understand where and how the content is created. To address this, we introduce the Text pROVEnance (TROVE) challenge, designed to trace each sentence of a target text back to specific source sentences within potentially lengthy or multi-document inputs. Beyond identifying sources, TROVE annotates the fine-grained relationships (quotation, compression, inference, and others), providing a deep understanding of how each target sentence is formed. To benchmark TROVE, we construct our dataset by leveraging three public datasets covering 11 diverse scenarios (e.g., QA and summarization) in English and Chinese, spanning source texts of varying lengths (0-5k, 5-10k, 10k+), emphasizing the multi-document and long-document settings essential for provenance. To ensure high-quality data, we employ a three-stage annotation process: sentence retrieval, GPT provenance, and human provenance. We evaluate 11 LLMs under direct prompting and retrieval-augmented paradigms, revealing that retrieval is essential for robust performance, larger models perform better in complex relationship classification, and closed-source models often lead, yet open-source models show significant promise, particularly with retrieval augmentation.
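
The paper's three-stage pipeline is not reproduced here. As a minimal baseline sketch of the source-tracing half of the task, the snippet below maps each target sentence to its most similar source sentence using TF-IDF cosine similarity with scikit-learn; the sentences are toy examples, and real provenance (plus relationship classification) would require much stronger models.

```python
# Minimal baseline sketch for source-sentence tracing (not the TROVE pipeline):
# score each target sentence against all source sentences with TF-IDF cosine
# similarity and report the best-matching source index.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

source_sentences = [
    "The firm reported record revenue in the third quarter.",
    "Analysts attributed the growth to strong cloud demand.",
    "The CEO announced a new share buyback program.",
]
target_sentences = [
    "Revenue hit a record in the third quarter.",       # compression of source 0
    "Growth was driven by strong demand for cloud.",    # paraphrase of source 1
]

vectorizer = TfidfVectorizer().fit(source_sentences + target_sentences)
src_vecs = vectorizer.transform(source_sentences)
tgt_vecs = vectorizer.transform(target_sentences)

scores = cosine_similarity(tgt_vecs, src_vecs)  # shape: (targets, sources)
for i, row in enumerate(scores):
    best = row.argmax()
    print(f"target {i} -> source {best} (score={row[best]:.2f})")
```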


Library Learning Doesn't: The Curious Case of the Single-Use "Library"

Berlot-Attwell, Ian, Rudzicz, Frank, Si, Xujie

arXiv.org Artificial Intelligence

Advances in Large Language Models (LLMs) have spurred a wave of LLM library learning systems for mathematical reasoning. These systems aim to learn a reusable library of tools, such as formal Isabelle lemmas or Python programs, that are tailored to a family of tasks. Many of these systems are inspired by the human practice of structuring knowledge into reusable and extendable concepts, but do current methods actually learn reusable libraries of tools? We study two library learning systems for mathematics, both of which reported increased accuracy: LEGO-Prover and TroVE. We find that function reuse is extremely infrequent on miniF2F and MATH. Our follow-up ablation experiments suggest that, rather than reuse, self-correction and self-consistency are the primary drivers of the observed performance gains. Our code and data are available at https://github.com/ikb-a/curious-case
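
The paper's exact reuse metric is not shown here. As a rough sketch of how one might check cross-task reuse of learned library functions, the snippet below parses generated solutions with Python's `ast` module and tallies how many distinct tasks actually call each library function; the library names and solution strings are toy examples.

```python
# Rough sketch of a reuse check (not the paper's metric): for each library
# function, count how many distinct task solutions actually call it.
import ast
from collections import defaultdict

library_functions = {"gcd_list", "is_prime"}  # hypothetical learned "library"

solutions = {  # task id -> generated program (toy examples)
    "task1": "print(gcd_list([4, 8, 12]))",
    "task2": "print(is_prime(7))",
    "task3": "print(sum(range(10)))",  # solves the task without the library
}

calls_by_function = defaultdict(set)
for task_id, code in solutions.items():
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in library_functions:
                calls_by_function[node.func.id].add(task_id)

for fn in sorted(library_functions):
    n = len(calls_by_function[fn])
    print(f"{fn}: called in {n} task(s)")  # genuine reuse means n > 1
```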


Faux ScarJo and the Descent of the A.I. Vultures

The New Yorker

On May 13th, during a live event, the artificial-intelligence company OpenAI unveiled the next generation of its technology, GPT-4o, the successor to GPT-4. When OpenAI first released its product to the public in late 2022, as the text-based tool ChatGPT, it nearly single-handedly ushered in the A.I. era. The latest version is far more powerful still. The "o" in the name stands for "omni"; the model can communicate seamlessly across various forms of media at once, including text, audio, and video, receiving prompts in one medium and responding in another. It can maintain a memory of everything you tell it.


A Vast New Data Set Could Supercharge the AI Hunt for Crypto Money Laundering

WIRED

One task where AI tools have proven to be particularly superhuman is analyzing vast troves of data to find patterns that humans can't see, or automating and accelerating the discovery of those we can. That makes Bitcoin's blockchain, a public record of nearly a billion transactions between pseudonymous addresses, the perfect sort of puzzle for AI to solve. Now, a new study, along with a vast, newly released trove of crypto crime training data, may be about to trigger a leap forward in automated tools' ability to suss out illicit money flows across the Bitcoin economy. On Wednesday, researchers from cryptocurrency tracing firm Elliptic, MIT, and IBM published a paper that lays out a new approach to finding money laundering on Bitcoin's blockchain. Rather than try to identify cryptocurrency wallets or clusters of addresses associated with criminal entities such as dark-web black markets, thieves, or scammers, the researchers collected patterns of bitcoin transactions that led from one of those known bad actors to a cryptocurrency exchange where dirty crypto might be cashed out.
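
The study's actual features and model are not reproduced here. A toy sketch of the underlying idea, enumerating transaction paths that flow from known illicit addresses to exchange deposit addresses, might look like the following; it uses networkx on a hypothetical graph, with all address names invented.

```python
# Toy sketch (not the Elliptic/MIT/IBM method): enumerate short transaction
# paths that lead from known illicit addresses to known exchange addresses.
# Each such path is a candidate laundering pattern one could featurize.
import networkx as nx

g = nx.DiGraph()  # edges are bitcoin transfers between addresses (toy data)
g.add_edges_from([
    ("darkmarket_1", "mixer_a"),
    ("mixer_a", "peel_1"),
    ("peel_1", "exchange_x"),
    ("thief_1", "exchange_y"),
    ("honest_1", "exchange_x"),
])

illicit = {"darkmarket_1", "thief_1"}
exchanges = {"exchange_x", "exchange_y"}

for src in illicit:
    for dst in exchanges:
        for path in nx.all_simple_paths(g, src, dst, cutoff=4):
            print(" -> ".join(path))
```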


Even the CIA is developing an AI chatbot

Engadget

The CIA and other US intelligence agencies will soon have an AI chatbot similar to ChatGPT. The program, revealed on Tuesday by Bloomberg, will train on publicly available data and provide sources alongside its answers so agents can confirm their validity. The aim is for US spies to more easily sift through ever-growing troves of information, although the exact nature of what constitutes "public data" could spark some thorny privacy issues. "We've gone from newspapers and radio, to newspapers and television, to newspapers and cable television, to basic internet, to big data, and it just keeps going," Randy Nixon, the CIA's director of Open Source Enterprise, said in an interview with Bloomberg. "We have to find the needles in the needle field."


Artificial Intelligence and Extended Reality May Pose Security Risks, Expert Warns

#artificialintelligence

Payton predicted that "AI poisoning" would be something to be concerned about in 2021. As Towards Data Science notes, a "poisoning attack happens when the adversary is able to inject bad data into your model's training pool, and hence get it to learn something it shouldn't." In solidly built AI models, Payton noted, "your [AI] coach should be self-learning and contextually aware and almost become a black box to the engineer" once it gets up and running. "My prediction is that, as we're implementing more AI, hackers will hack in and change that algorithm undetected, so that the AI will do things not initially in the design," she said. "AI is going to be cybercriminals' weapon of choice, to help them crack into more accounts, networks and data stores."
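
As a concrete illustration of the quoted definition (not a model of any real attack), the sketch below flips labels on a fraction of a toy training set and compares test accuracy before and after, which generally degrades as the flip rate grows; the dataset and classifier are arbitrary scikit-learn choices.

```python
# Toy illustration of training-data poisoning via label flipping:
# corrupt a fraction of the training labels and compare test accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

def fit_and_score(y_train):
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_train)
    return accuracy_score(y_te, model.predict(X_te))

print("clean training labels:", round(fit_and_score(y_tr), 3))

rng = np.random.default_rng(0)
poisoned = y_tr.copy()
idx = rng.choice(len(poisoned), size=int(0.3 * len(poisoned)), replace=False)
poisoned[idx] = 1 - poisoned[idx]  # adversary flips 30% of the labels

print("30% labels flipped: ", round(fit_and_score(poisoned), 3))
```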


Italy slaps facial recognition firm Clearview AI with €20 million fine

Engadget

Italy's data privacy watchdog said it will fine the controversial facial recognition firm Clearview AI for breaching EU law. An investigation by Garante, Italy's data protection authority, found that the company's database of 10 billion images of faces includes those of Italians and residents in Italy. The New York City-based firm is being fined €20 million, and will also have to delete any facial biometrics it holds of Italian nationals. This isn't the first time that the beleaguered facial recognition tech company has faced legal consequences. The UK data protection authority last November fined the company £17 million after finding that its practices, which include collecting selfies of people without their consent from security camera footage or mugshots, violate the nation's data protection laws.


Meta Unveils New AI Supercomputer

WSJ.com: WSJD - Technology

Meta, which announced the news in a blog post Monday, said its research team currently is using the supercomputer to train AI models in natural-language processing and computer vision for research. The aim is to boost capabilities to one day train models with more than a trillion parameters on data sets as large as an exabyte, which is roughly equivalent to 36,000 years of high-quality video. "The experiences we're building for the metaverse require enormous compute power…and RSC will enable new AI models that can learn from trillions of examples, understand hundreds of languages, and more," Meta CEO Mark Zuckerberg said in a statement provided to The Wall Street Journal. By mid-summer, when the AI Research SuperCluster is fully built, it will house some 16,000 GPUs, becoming the fastest AI supercomputer in the world, Meta said.
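
As a quick sanity check of the exabyte comparison (the article does not state a bitrate, so the ~7 Mbit/s figure below is an assumed typical HD streaming rate):

```python
# Back-of-the-envelope check of "1 exabyte ~ 36,000 years of video".
# The bitrate is an assumption; ~7 Mbit/s is a common HD streaming rate.
EXABYTE_BITS = 1e18 * 8
BITRATE_BPS = 7e6                      # assumed "high-quality video" bitrate
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years = EXABYTE_BITS / BITRATE_BPS / SECONDS_PER_YEAR
print(f"~{years:,.0f} years of video")  # on the order of 36,000 years
```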