AITopics

2403.10237

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Asia > Taiwan (0.04)
Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.87)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Neural Information Processing Systems Mar-14-2024, 19:36:55 GMT

Query Complexity of Derivative-Free Optimization

This paper provides lower bounds on the convergence rate of Derivative Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access to only function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects. We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same.
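To make the idea of optimizing with only Boolean comparisons concrete, here is a minimal sketch, not the algorithm from the paper. It assumes a noise-free comparison oracle and a bounded search box, and it uses plain coordinate descent with a ternary line search. The names `compare`, `line_search`, `comparison_coordinate_descent`, and the test function `quadratic` are all hypothetical.

```python
from typing import Callable, List

Objective = Callable[[List[float]], float]


def compare(f: Objective, x: List[float], y: List[float]) -> bool:
    """Boolean comparison oracle: True if f(x) <= f(y).

    Assumption: the oracle is noise-free. The optimizer below never sees
    function values, only the outcome of this comparison.
    """
    return f(x) <= f(y)


def line_search(f: Objective, x: List[float], i: int,
                lo: float, hi: float, iters: int = 60) -> float:
    """Ternary search along coordinate i, driven only by comparisons."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        xa = x[:i] + [a] + x[i + 1:]
        xb = x[:i] + [b] + x[i + 1:]
        if compare(f, xa, xb):
            hi = b  # the minimizer along this coordinate lies in [lo, b]
        else:
            lo = a  # the minimizer along this coordinate lies in [a, hi]
    return (lo + hi) / 2.0


def comparison_coordinate_descent(f: Objective, x0: List[float],
                                  lo: float = -10.0, hi: float = 10.0,
                                  sweeps: int = 20) -> List[float]:
    """Cycle through the coordinates, minimizing each one by comparisons."""
    x = list(x0)
    for _ in range(sweeps):
        for i in range(len(x)):
            x[i] = line_search(f, x, i, lo, hi)
    return x


def quadratic(x: List[float]) -> float:
    """A strongly convex test function with coupled coordinates."""
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2 + 0.5 * x[0] * x[1]


x_min = comparison_coordinate_descent(quadratic, [0.0, 0.0])
```

For this test function the exact minimizer is (48/31, -68/31), which the sketch approaches closely. With a noisy oracle, each comparison would need to be repeated and aggregated, which this sketch does not attempt.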

apple, comparison oracle, oracle, (13 more...)

Country: North America > United States > Wisconsin > Dane County > Madison (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.40)

Neural Information Processing Systems Mar-14-2024, 11:01:56 GMT

Accuracy at the Top

We introduce a new notion of classification accuracy based on the top τ-quantile values of a scoring function, a relevant criterion in a number of problems arising for search engines. We define an algorithm optimizing a convex surrogate of the corresponding loss, and discuss its solution in terms of a set of convex optimization problems. We also present margin-based guarantees for this algorithm based on the top τ-quantile value of the scores of the functions in the hypothesis set. Finally, we report the results of several experiments in the bipartite setting evaluating the performance of our solution and comparing the results to several other algorithms seeking high precision at the top. In most examples, our solution achieves better performance in precision at the top.
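As a rough illustration of the evaluation criterion mentioned above, the sketch below computes precision among the highest-scoring fraction of items. It is not the paper's algorithm or loss. The function name `precision_at_top` and the parameter `tau` (the fraction of items treated as "the top") are assumptions made for this example.

```python
import math
from typing import Sequence


def precision_at_top(scores: Sequence[float], labels: Sequence[int],
                     tau: float) -> float:
    """Fraction of positives (label 1) among the top tau fraction by score."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    k = max(1, math.ceil(tau * len(scores)))  # number of items in the top
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(1 for _, label in ranked[:k] if label == 1) / k


# Eight items in the bipartite setting (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 1, 0, 0, 0, 0]
top_quarter_precision = precision_at_top(scores, labels, tau=0.25)
```

With `tau=0.25` only the two highest-scoring items count, so a ranking can look poor at the top even when it separates the classes reasonably well overall.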

accuracy, algorithm, quantile, (14 more...)

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.49)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)

Neural Information Processing Systems Mar-14-2024, 03:11:09 GMT

Human memory search as a random walk in a semantic network

The human mind has a remarkable ability to store a vast amount of information in memory, and an even more remarkable ability to retrieve these experiences when needed. Understanding the representations and algorithms that underlie human memory search could potentially be useful in other information retrieval settings, including internet search. Psychological studies have revealed clear regularities in how people search their memory, with clusters of semantically related items tending to be retrieved together. These findings have recently been taken as evidence that human memory search is similar to animals foraging for food in patchy environments, with people making a rational decision to switch away from a cluster of related information as it becomes depleted. We demonstrate that the results that were taken as evidence for this account also emerge from a random walk on a semantic network, much like the random web surfer model used in internet search engines. This offers a simpler and more unified account of how people search their memory, postulating a single process rather than one process for exploring a cluster and one process for switching between clusters.
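The random-walk account can be sketched in a few lines, under assumptions of our own: a toy undirected network with two clusters, a uniform choice among neighbors at each step, and the output taken to be the order in which items are first visited. The graph `GRAPH` and the function `first_visit_order` are hypothetical and are not taken from the paper.

```python
import random
from typing import Dict, List

# Toy semantic network: a "pets" cluster and a "wild animals" cluster,
# joined by a single cat-lion link.
GRAPH: Dict[str, List[str]] = {
    "dog": ["cat", "hamster"],
    "cat": ["dog", "hamster", "lion"],
    "hamster": ["dog", "cat"],
    "lion": ["cat", "tiger", "zebra"],
    "tiger": ["lion", "zebra"],
    "zebra": ["lion", "tiger"],
}


def first_visit_order(graph: Dict[str, List[str]], start: str,
                      steps: int, seed: int = 0) -> List[str]:
    """Run a uniform random walk and list nodes in order of first visit."""
    rng = random.Random(seed)
    current = start
    order = [start]
    for _ in range(steps):
        current = rng.choice(graph[current])  # move to a random neighbor
        if current not in order:
            order.append(current)  # record only the first visit
    return order


retrieved = first_visit_order(GRAPH, "dog", steps=200, seed=1)
```

Because most edges stay within a cluster, the walk tends to emit several items from one cluster before crossing the bridge, which gives clustered output without a separate switching process.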

irt, random walk, semantic network, (14 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Florida (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

arXiv.org Artificial Intelligence Mar-14-2024

Contextual Clarity: Generating Sentences with Transformer Models using Context-Reverso Data

Musaev, Ruslan

To create a dataset for training the T5 model, we harness the power of Context-Reverso, which provides usage examples for words. We prepared a dataset in the form of (query word, context or example usage) by parsing Context-Reverso webpages based on a query word. Additionally, we trained t5-small and t5-base models for generating context-sentences based on input words. This resource enables us to obtain diverse and contextually rich sentences that incorporate the target keywords. We have also developed an application for learning new English words with a generated context [Telegram bot]. Our method aims to address the challenges of generating extremely short contexts and mitigating ambiguity in sentence construction. Objective: To develop a model that can generate informative and contextually relevant sentence-contexts for a given set of keywords, benefiting natural language understanding and generation applications such as search engines, personal assistants, and content summarization.

context sentence, dataset, keyword, (14 more...)

2403.08103

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.50)

arXiv.org Artificial Intelligence Mar-14-2024

MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection

Li, Yupeng, He, Haorui, Bai, Jin, Wen, Dacheng

The prevalence of fake news across various online sources has had a significant influence on the public. Existing Chinese fake news detection datasets are limited to news sourced solely from Weibo. However, fake news originating from multiple sources exhibits diversity in various aspects, including its content and social context. Methods trained purely on a single news source are hardly applicable to real-world scenarios. Our pilot experiment demonstrates that the F1 score of the state-of-the-art method that learns from a large Chinese fake news detection dataset, Weibo-21, drops significantly from 0.943 to 0.470 when the test data is changed to multi-source news data, failing to identify more than one-third of the multi-source fake news. To address this limitation, we constructed the first multi-source benchmark dataset for Chinese fake news detection, termed MCFEND, which is composed of news we collected from diverse sources such as social platforms, messaging apps, and traditional online news outlets. Notably, such news has been fact-checked by 14 authoritative fact-checking agencies worldwide. In addition, various existing Chinese fake news detection methods are thoroughly evaluated on our proposed dataset in cross-source, multi-source, and unseen-source settings. MCFEND, as a benchmark dataset, aims to advance Chinese fake news detection approaches in real-world scenarios.

dataset, detection, fact-checking agency, (10 more...)

doi: 10.1145/3589334.3645385

2403.09092

Country:

Asia > China > Hong Kong (0.05)
Asia > Singapore > Central Region > Singapore (0.05)
Asia > Taiwan (0.04)
(5 more...)

Genre: Research Report > New Finding (0.67)

Industry: Media > News (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing Systems Mar-13-2024, 23:18:39 GMT

Beyond Pairwise: Provably Fast Algorithms for Approximate k-Way Similarity Search

We go beyond the notion of pairwise similarity and look into search problems with k-way similarity functions.
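One common example of a k-way similarity function is the k-way resemblance of sets: the size of the joint intersection divided by the size of the joint union. The sketch below computes it directly. The function name `k_way_resemblance` is hypothetical, and the snippet shows only the similarity itself, not any fast search algorithm.

```python
from typing import Hashable, Set


def k_way_resemblance(*sets: Set[Hashable]) -> float:
    """|S1 & S2 & ... & Sk| / |S1 | S2 | ... | Sk| for k >= 1 sets."""
    if not sets:
        raise ValueError("at least one set is required")
    union = set().union(*sets)
    if not union:
        return 0.0  # convention chosen here for all-empty input
    intersection = set(sets[0]).intersection(*sets[1:])
    return len(intersection) / len(union)


# For k = 2 this is the usual pairwise (Jaccard) resemblance.
pairwise = k_way_resemblance({1, 2}, {2, 3})
three_way = k_way_resemblance({1, 2, 3}, {2, 3, 4}, {3, 4, 5})
```

A brute-force search over all k-tuples of sets with this function grows combinatorially in k, which is the cost that approximate search methods aim to avoid.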

log 1, resemblance, similarity, (16 more...)

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)
North America > United States > Texas > Dallas County > Dallas (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
(9 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)

Neural Information Processing Systems Mar-13-2024, 01:45:39 GMT

Copeland Dueling Bandits

Zohar Karnin Informatics Institute

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed for small numbers of arms, while the second, Scalable Copeland Bandits (SCB), works better for large-scale problems. We provide theoretical results bounding the regret accumulated by CCB and SCB, both substantially improving existing results.
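The difference between the two notions of winner can be shown with a small sketch. It assumes a known preference matrix in which `pref[i][j]` is the probability that arm `i` beats arm `j`; in the bandit setting this matrix is unknown and has to be estimated. The helper names `matrix_from_wins`, `copeland_scores`, `copeland_winners`, and `condorcet_winner` are hypothetical, and this is not CCB or SCB.

```python
from typing import List, Optional, Tuple


def matrix_from_wins(n: int, wins: List[Tuple[int, int]],
                     p: float = 0.7) -> List[List[float]]:
    """Build a preference matrix where (i, j) in wins means i beats j w.p. p."""
    pref = [[0.5] * n for _ in range(n)]
    for i, j in wins:
        pref[i][j] = p
        pref[j][i] = 1.0 - p
    return pref


def copeland_scores(pref: List[List[float]]) -> List[int]:
    """Copeland score of an arm: how many other arms it beats."""
    n = len(pref)
    return [sum(1 for j in range(n) if j != i and pref[i][j] > 0.5)
            for i in range(n)]


def copeland_winners(pref: List[List[float]]) -> List[int]:
    """All arms with the maximal Copeland score (never empty)."""
    scores = copeland_scores(pref)
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]


def condorcet_winner(pref: List[List[float]]) -> Optional[int]:
    """The arm that beats every other arm, or None if there is none."""
    n = len(pref)
    for i, s in enumerate(copeland_scores(pref)):
        if s == n - 1:
            return i
    return None


# Five arms: arm 0 beats 1, 2, 3 but loses to 4; arms 1, 2, 3 form a cycle.
WINS = [(0, 1), (0, 2), (0, 3), (4, 0), (1, 4), (2, 4), (3, 4),
        (1, 2), (2, 3), (3, 1)]
PREF = matrix_from_wins(5, WINS)
```

In this example no arm beats all others, so `condorcet_winner(PREF)` returns `None`, while arm 0 has the highest Copeland score and is the unique Copeland winner.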

algorithm, bandit problem, dueling bandit problem, (15 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.70)
Information Technology > Data Science > Data Mining > Big Data (0.54)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

arXiv.org Artificial Intelligence Mar-13-2024

Science Checker Reloaded: A Bidirectional Paradigm for Transparency and Logical Reasoning

Rakotoson, Loïc, Massip, Sylvain, Laleye, Fréjus A. A.

Information retrieval is a rapidly evolving field. However, it still faces significant limitations when handling the vast amounts of scientific and industrial information, such as semantic divergence and vocabulary gaps in sparse retrieval, low precision and lack of interpretability in semantic search, or hallucination and outdated information in generative models. In this paper, we introduce a two-block approach to tackle these hurdles for long documents. The first block enhances language understanding in sparse retrieval by query expansion to retrieve relevant documents. The second block deepens the result by providing comprehensive and informative answers to the complex question using only the information spread in the long document, enabling bidirectional engagement. At various stages of the pipeline, intermediate results are presented to users to facilitate understanding of the system's reasoning. We believe this bidirectional approach brings significant advancements in terms of transparency, logical thinking, and comprehensive understanding in the field of scientific information retrieval.

arxiv preprint arxiv, information retrieval, retrieval, (13 more...)

2402.13897

Country:

Europe > France > Île-de-France > Paris > Paris (0.05)
North America > Dominican Republic (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre:

Research Report (0.51)
Overview (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.91)

arXiv.org Artificial Intelligence Mar-13-2024

Foundation Models and Information Retrieval in Digital Pathology

Tizhoosh, H. R.

The surge in adoption of digital pathology has the potential to revolutionize medical diagnosis by allowing computerized analysis of tissue images (Pantanowitz 2010; Aljanabi 2012; Hanna 2020). Central to this technology is the digitization of formalin-fixed, paraffin-embedded (FFPE) tissue sections mounted on glass slides. This process converts physical tissue samples into high-resolution, gigapixel digital images called whole slide images (WSIs) (Kumar 2020; Evans 2022). These WSI files contain detailed patterns of tissue morphology, enabling the application of computer-vision algorithms in diagnostic pathology. Pathologists can now analyze tissue images seamlessly on computer screens at various magnifications (Griffin 2017). This shift from light microscopes to digital displays allows for easier visual inspection of anatomic clues that may indicate specific diseases.

arxiv preprint arxiv, information retrieval, pathology, (11 more...)