Supplementary Material

Neural Information Processing Systems

S2.2 Variance of importance weights. The importance-sampled estimate of the log-likelihood used to retrain the oracle (Equation 17) is unbiased, but may have high variance due to the variance of the importance weights. Let L_β : X × R → R denote a pertinent loss function induced by the oracle parameters, β (e.g., the squared error L_β(x, y) = (E_β[y | x] - y)²). While the bound, L, on L_β may be restrictive in general, for any given application one may be able to use domain-specific knowledge to estimate L. CbAS naturally controls the importance weight variance. Design procedures that leverage a trust region can naturally bound the variance of the importance weights.
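To make the variance issue concrete, here is a minimal self-normalized importance-sampling sketch with toy Gaussian target and proposal distributions (hypothetical stand-ins for the training and search distributions): when the proposal drifts far from the target, the empirical weight variance blows up, which is exactly what a trust region guards against.

```python
import math
import random

def normal_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def importance_estimate(n, mu_target, mu_proposal, seed=0):
    """Self-normalized importance-sampling estimate of E_target[x^2]
    using samples from the proposal; also returns the empirical
    variance of the importance weights."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu_proposal, 1.0) for _ in range(n)]
    ws = [normal_pdf(x, mu_target) / normal_pdf(x, mu_proposal) for x in xs]
    est = sum(w * x * x for w, x in zip(ws, xs)) / sum(ws)
    mean_w = sum(ws) / n
    var_w = sum((w - mean_w) ** 2 for w in ws) / n
    return est, var_w

# A proposal near the target (a tight trust region) keeps the weight
# variance small; a distant proposal inflates it and the estimate degrades.
est_near, var_near = importance_estimate(5000, mu_target=0.0, mu_proposal=0.5)
est_far, var_far = importance_estimate(5000, mu_target=0.0, mu_proposal=3.0)
```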



Pre-Training Estimators for Structural Models: Application to Consumer Search

Wei, Yanhao 'Max', Jiang, Zhenling

arXiv.org Artificial Intelligence

We develop pre-trained estimators for structural econometric models. The estimator uses a neural net to recognize the structural model's parameters from data patterns. Once trained, the estimator can be shared and applied to different datasets at negligible cost and effort. Under sufficient training, the estimator converges to the Bayesian posterior given the data patterns. As an illustration, we construct a pre-trained estimator for a sequential search model (available at pnnehome.github.io). Estimation takes only seconds and achieves high accuracy on 12 real datasets. More broadly, pre-trained estimators can make structural models much easier to use and more accessible.
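The offline-then-online workflow can be sketched as follows. Everything here is illustrative: a toy exponential "search duration" model, a uniform prior, and a one-dimensional least-squares fit standing in for the paper's neural net.

```python
import random

def simulate_summary(theta, n=200, rng=None):
    """Simulate a dataset from a toy structural model (exponential
    durations with rate theta) and reduce it to a summary statistic."""
    rng = rng or random.Random(0)
    data = [rng.expovariate(theta) for _ in range(n)]
    return sum(data) / n  # mean duration: the 'data pattern'

def pretrain_estimator(num_draws=2000, seed=1):
    """Offline stage: draw parameters from a prior, simulate summaries,
    and fit summary -> parameter. A least-squares fit on 1/summary is
    a stand-in for the neural net used in the paper."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_draws):
        theta = rng.uniform(0.5, 2.0)   # prior over the parameter
        s = simulate_summary(theta, rng=rng)
        pairs.append((1.0 / s, theta))  # feature: inverse mean duration
    # one-dimensional least squares: theta ~ a * (1/s) + b
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    var = sum((x - mx) ** 2 for x, _ in pairs) / n
    a = cov / var
    b = my - a * mx
    return lambda summary: a / summary + b

estimator = pretrain_estimator()  # slow, done once, then shareable
# Online stage: applying the shared estimator to a new dataset is instant.
theta_hat = estimator(simulate_summary(1.5, rng=random.Random(42)))
```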


Emergent AI Surveillance: Overlearned Person Re-Identification and Its Mitigation in Law Enforcement Context

Nguyen, An Thi, Stoykova, Radina, Arazo, Eric

arXiv.org Artificial Intelligence

Generic instance search models can dramatically reduce the manual effort required to analyze vast surveillance footage during criminal investigations by retrieving specific objects of interest to law enforcement. However, our research reveals an unintended emergent capability: through overlearning, these models can single out specific individuals even when trained on datasets without human subjects. This capability raises concerns regarding identification and profiling of individuals based on their personal data, while there is currently no clear standard on how de-identification can be achieved. We evaluate two technical safeguards to curtail a model's person re-identification capacity: index exclusion and confusion loss. Our experiments demonstrate that combining these approaches can reduce person re-identification accuracy to below 2% while maintaining 82% of retrieval performance for non-person objects. However, we identify critical vulnerabilities in these mitigations, including potential circumvention using partial person images. These findings highlight urgent regulatory questions at the intersection of AI governance and data protection: How should we classify and regulate systems with emergent identification capabilities? And what technical standards should be required to prevent identification capabilities from developing in seemingly benign applications?
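As an illustration of what a confusion-style safeguard can look like, the snippet below uses cross-entropy against the uniform distribution over identities, so that minimizing it pushes person-identity predictions toward chance. This is one plausible form; the paper's exact loss may differ.

```python
import math

def confusion_loss(probs):
    """Cross-entropy between predicted identity probabilities and the
    uniform distribution: minimized when the model is maximally unsure
    which person it is looking at."""
    k = len(probs)
    return -sum((1.0 / k) * math.log(p + 1e-12) for p in probs)

uniform_pred = [0.25, 0.25, 0.25, 0.25]    # chance-level re-identification
confident_pred = [0.97, 0.01, 0.01, 0.01]  # model has singled someone out
```

Training against this term penalizes the confident prediction far more than the uniform one, which attains the minimum, log(k).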


Supplementary Material: S1 Pseudocode. Algorithm 1 gives pseudocode for autofocusing a broad class of model-based optimization (MBO) methods.

Neural Information Processing Systems

The procedure alternates between an "E-step" (Steps 1 and 2 in Algorithm 1) and a weighted maximum likelihood estimation (MLE) "M-step" (Step 3). One may use these in a number of different ways. The following observation is due to Chebyshev's inequality. One can use Proposition S2.1 to construct a confidence interval on, for example, the expected squared error. CbAS naturally controls the importance weight variance. Design procedures that leverage a trust region can naturally bound the variance of the importance weights. We used CbAS as follows.
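A minimal numeric sketch of the weighted-MLE M-step, assuming a Gaussian-mean "oracle" and Gaussian training/search distributions (toy stand-ins for the regression oracle and search model):

```python
import math
import random

def normal_pdf(x, mu):
    """Density of N(mu, 1) at x."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def refit_oracle_mean(train_x, search_mu):
    """Weighted-MLE M-step: reweight fixed training data (drawn from
    N(0, 1)) toward the current search model N(search_mu, 1) and refit.
    For a Gaussian-mean oracle the weighted MLE is just the weighted
    average, so the oracle tracks the region the search model favors."""
    ws = [normal_pdf(x, search_mu) / normal_pdf(x, 0.0) for x in train_x]
    return sum(w * x for w, x in zip(ws, train_x)) / sum(ws)

rng = random.Random(0)
train_x = [rng.gauss(0.0, 1.0) for _ in range(20000)]
mu_hat = refit_oracle_mean(train_x, search_mu=1.0)  # moves toward 1.0
```

In the real algorithm the oracle is a learned regression model and the weights are density ratios between successive search models, but the same reweight-then-refit structure applies.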



Hybrid Semantic Search: Unveiling User Intent Beyond Keywords

Ahluwalia, Aman, Sutradhar, Bishwajit, Ghosh, Karishma, Yadav, Indrapal, Sheetal, Arpan, Patil, Prashant

arXiv.org Artificial Intelligence

This paper addresses the limitations of traditional keyword-based search in understanding user intent and introduces a novel hybrid search approach that leverages the strengths of non-semantic search engines, Large Language Models (LLMs), and embedding models. The proposed system integrates keyword matching, semantic vector embeddings, and LLM-generated structured queries to deliver highly relevant results. At its core, semantic search hinges on two crucial components. The first, the search function, acts similarly to traditional search engines [1] by identifying and ranking documents relevant to a user's query within a vast collection of information (corpus). However, semantic search goes beyond this basic functionality with its second component: semantic understanding. This is where Transformers come into play, allowing the system to delve deeper than keyword matching.
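One way such a hybrid could be wired together, sketched with a naive lexical-overlap score in place of a production keyword engine and hypothetical 2-d vectors in place of real embeddings:

```python
import math

def keyword_score(query, doc):
    """Fraction of query tokens appearing in the document: a crude
    stand-in for a real lexical scorer such as BM25."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_rank(query, docs, q_vec, doc_vecs, alpha=0.5):
    """Blend lexical and semantic scores; alpha and the embeddings
    here are illustrative, not the paper's configuration."""
    scored = []
    for doc, vec in zip(docs, doc_vecs):
        score = alpha * keyword_score(query, doc) + (1 - alpha) * cosine(q_vec, vec)
        scored.append((score, doc))
    return [doc for _, doc in sorted(scored, reverse=True)]

docs = ["cheap flights to paris", "paris travel guide", "python flight simulator"]
doc_vecs = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9]]  # hypothetical embeddings
ranking = hybrid_rank("flights to paris", docs, [0.9, 0.2], doc_vecs)
```

The semantic term lets "paris travel guide" outrank a lexically closer but topically unrelated document even when keyword overlap is weak.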


Large Search Model: Redefining Search Stack in the Era of LLMs

Wang, Liang, Yang, Nan, Huang, Xiaolong, Yang, Linjun, Majumder, Rangan, Wei, Furu

arXiv.org Artificial Intelligence

Modern search engines are built on a stack of different components, including query understanding, retrieval, multi-stage ranking, and question answering, among others. These components are often optimized and deployed independently. In this paper, we introduce a novel conceptual framework called large search model, which redefines the conventional search stack by unifying search tasks with one large language model (LLM). All tasks are formulated as autoregressive text generation problems, allowing for the customization of tasks through the use of natural language prompts. This proposed framework capitalizes on the strong language understanding and reasoning capabilities of LLMs, offering the potential to enhance search result quality while simultaneously simplifying the existing cumbersome search stack. To substantiate the feasibility of this framework, we present a series of proof-of-concept experiments and discuss the potential challenges associated with implementing this approach within real-world search systems.
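The "tasks as prompts" idea can be illustrated with a few toy templates; these are made-up examples, not the prompts used in the paper:

```python
def format_task(task, **fields):
    """Render a search-stack task as a text prompt for a single LLM.
    Every stage becomes autoregressive text generation; only the
    prompt template changes between tasks."""
    templates = {
        "query_understanding": "Rewrite this query for retrieval: {query}",
        "ranking": "Query: {query}\nDocument: {doc}\nRelevant (yes/no):",
        "qa": "Answer using the document.\nDocument: {doc}\nQuestion: {query}\nAnswer:",
    }
    return templates[task].format(**fields)

# The same model serves every stage; only the prompt changes.
prompt = format_task("ranking", query="best gpu for llm inference", doc="A guide to GPUs...")
```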


Code Search Debiasing: Improve Search Results beyond Overall Ranking Performance

Zhang, Sheng, Li, Hui, Wang, Yanlin, Wei, Zhao, Xiu, Yong, Wang, Juhong, Ji, Rongrong

arXiv.org Artificial Intelligence

A code search engine is an essential tool in software development. Many code search methods have sprung up, focusing on the overall ranking performance of code search. In this paper, we study code search from another perspective by analyzing the bias of code search models. Biased code search engines provide a poor user experience, even though they show promising overall performance. Due to different development conventions (e.g., preferring long queries or abbreviations), some programmers will find the engine useful, while others may find it hard to get desirable search results. To mitigate biases, we develop a general debiasing framework that employs reranking to calibrate search results. It can be easily plugged into existing engines and can handle new code search biases discovered in the future. Experiments show that our framework can effectively reduce biases. Meanwhile, the overall ranking performance of code search improves after debiasing.


WebCPM: Interactive Web Search for Chinese Long-form Question Answering

Qin, Yujia, Cai, Zihan, Jin, Dian, Yan, Lan, Liang, Shihao, Zhu, Kunlun, Lin, Yankai, Han, Xu, Ding, Ning, Wang, Huadong, Xie, Ruobing, Qi, Fanchao, Liu, Zhiyuan, Sun, Maosong, Zhou, Jie

arXiv.org Artificial Intelligence

Long-form question answering (LFQA) aims at answering complex, open-ended questions with detailed, paragraph-length responses. The de facto paradigm of LFQA necessitates two procedures: information retrieval, which searches for relevant supporting facts, and information synthesis, which integrates these facts into a coherent answer. In this paper, we introduce WebCPM, the first Chinese LFQA dataset. One unique feature of WebCPM is that its information retrieval is based on interactive web search, which engages with a search engine in real time. Following WebGPT, we develop a web search interface. We recruit annotators to search for relevant information using our interface and then answer questions. Meanwhile, the web search behaviors of our annotators are recorded. In total, we collect 5,500 high-quality question-answer pairs, together with 14,315 supporting facts and 121,330 web search actions. We fine-tune pre-trained language models to imitate human behaviors for web search and to generate answers based on the collected facts. Our LFQA pipeline, built on these fine-tuned models, generates answers that are no worse than human-written ones in 32.5% and 47.5% of the cases on our dataset and DuReader, respectively.