AITopics | Feldman, Sergey

Collaborating Authors

Feldman, Sergey

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

Asai, Akari, He, Jacqueline, Shao, Rulin, Shi, Weijia, Singh, Amanpreet, Chang, Joseph Chee, Lo, Kyle, Soldaini, Luca, Feldman, Sergey, D'arcy, Mike, Wadden, David, Latzke, Matt, Tian, Minyang, Ji, Pan, Liu, Shengyan, Tong, Hao, Wu, Bohao, Xiong, Yanyu, Zettlemoyer, Luke, Neubig, Graham, Weld, Dan, Downey, Doug, Yih, Wen-tau, Koh, Pang Wei, Hajishirzi, Hannaneh

arXiv.org Artificial Intelligence, Nov-21-2024

Scientific progress depends on researchers' ability to synthesize the growing body of literature. Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses. To evaluate OpenScholar, we develop ScholarQABench, the first large-scale multi-domain benchmark for literature search, comprising 2,967 expert-written queries and 208 long-form answers across computer science, physics, neuroscience, and biomedicine. On ScholarQABench, OpenScholar-8B outperforms GPT-4o by 5% and PaperQA2 by 7% in correctness, despite being a smaller, open model. While GPT-4o hallucinates citations 78 to 90% of the time, OpenScholar achieves citation accuracy on par with human experts. OpenScholar's datastore, retriever, and self-feedback inference loop also improve off-the-shelf LMs: for instance, OpenScholar-GPT4o improves GPT-4o's correctness by 12%. In human evaluations, experts preferred OpenScholar-8B and OpenScholar-GPT4o responses over expert-written ones 51% and 70% of the time, respectively, compared to GPT-4o's 32%. We open-source all of our code, models, datastore, data, and a public demo.
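
OpenScholar's actual retriever and self-feedback inference loop are far more involved, but the basic contract of a citation-backed answer can be sketched in a few lines: retrieve passages, number them, and flag any bracketed citation in a generated answer that does not point at a retrieved passage. The token-overlap scorer, the `[n]` citation format, and all function names below are illustrative assumptions, not OpenScholar's code.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a deliberately crude stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages sharing the most tokens with the query (toy lexical retriever)."""
    q = Counter(tokenize(query))

    def overlap(passage: str) -> int:
        # Shared tokens, counting repeats up to the smaller of the two counts.
        return sum((q & Counter(tokenize(passage))).values())

    # sorted() is stable, so tied passages keep their original order.
    return sorted(passages, key=overlap, reverse=True)[:k]

def invalid_citations(answer: str, num_retrieved: int) -> list[int]:
    """Return 1-based citation markers like [3] that do not refer to a retrieved passage."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return sorted(n for n in cited if not 1 <= n <= num_retrieved)
```

A check like `invalid_citations` only catches citations to passages that were never retrieved; it cannot tell whether a cited passage actually supports the claim, which is the harder part of citation accuracy.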

large language model, machine learning, natural language, (18 more...)

arXiv ID: 2411.14199

Country:

Asia (0.67)
North America > United States > North Carolina (0.14)
North America > United States > Illinois (0.14)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TOPICAL: TOPIC Pages AutomagicaLly

Giorgi, John, Singh, Amanpreet, Downey, Doug, Feldman, Sergey, Wang, Lucy Lu

arXiv.org Artificial Intelligence, May-2-2024

Topic pages aggregate useful information about an entity or concept into a single succinct and accessible article. Automated creation of topic pages would enable their rapid curation as information resources, providing an alternative to traditional web search. While most prior work has focused on generating topic pages about biographical entities, in this work, we develop a completely automated process to generate high-quality topic pages for scientific entities, with a focus on biomedical concepts. We release TOPICAL, a web app and associated open-source code, comprising a model pipeline combining retrieval, clustering, and prompting, that makes it easy for anyone to generate topic pages for a wide variety of biomedical entities on demand. In a human evaluation of 150 diverse topic pages generated using TOPICAL, we find that the vast majority were considered relevant, accurate, and coherent, with correct supporting citations. We make all code publicly available and host a free-to-use web app at: https://s2-topical.apps.allenai.org
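
The retrieve, cluster, and prompt stages can be sketched as follows, starting from already-retrieved abstracts. The greedy Jaccard clustering, the 0.2 threshold, and the prompt wording are hypothetical choices for illustration; TOPICAL's actual retrieval, clustering method, and prompts are in its released code.

```python
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def greedy_cluster(docs: list[str], threshold: float = 0.2) -> list[list[str]]:
    """Single pass: add each doc to the first cluster whose seed doc is similar enough."""
    clusters: list[list[str]] = []
    seeds: list[set[str]] = []
    for doc in docs:
        t = tokens(doc)
        for i, seed in enumerate(seeds):
            if jaccard(t, seed) >= threshold:
                clusters[i].append(doc)
                break
        else:
            # No sufficiently similar cluster exists, so this doc seeds a new one.
            clusters.append([doc])
            seeds.append(t)
    return clusters

def build_prompt(entity: str, cluster: list[str]) -> str:
    """Hypothetical prompt asking an LLM to write one cited topic-page section."""
    sources = "\n".join(f"[{i}] {d}" for i, d in enumerate(cluster, start=1))
    return f"Write a short, cited section about {entity} using only these sources:\n{sources}"
```

Greedy single-pass clustering depends on input order; it is used here only because it is short.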

information retrieval, large language model, machine learning, (21 more...)

arXiv ID: 2405.01796

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.94)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
(2 more...)

On-the-fly Definition Augmentation of LLMs for Biomedical NER

Munnangi, Monica, Feldman, Sergey, Wallace, Byron C, Amir, Silvio, Hope, Tom, Naik, Aakanksha

arXiv.org Artificial Intelligence, Apr-23-2024

Despite their general capabilities, LLMs still struggle with biomedical NER tasks, which are difficult due to specialized terminology and a lack of training data. In this work we set out to improve LLM performance on biomedical NER in limited-data settings via a new knowledge augmentation approach that incorporates definitions of relevant concepts on the fly. During this process, to provide a test bed for knowledge augmentation, we perform a comprehensive exploration of prompting strategies. Our experiments show that definition augmentation is useful for both open-source and closed LLMs. For example, it leads to a relative improvement of 15% (on average) in GPT-4 performance (F1) across all six of our test datasets. We conduct extensive ablations and analyses to demonstrate that our performance improvements stem from adding relevant definitional knowledge. We find that careful prompting strategies also improve LLM performance, allowing them to outperform fine-tuned language models in few-shot settings. To facilitate future research in this direction, we release our code at https://github.com/allenai/beacon.
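
Definition augmentation amounts to looking up definitions for concepts that appear in the input and placing them in the prompt ahead of the extraction instruction. The sketch below assumes a small in-memory glossary and case-insensitive substring matching; the glossary, prompt template, and function names are hypothetical, and the paper's own definition sources and prompts (in the released code) may differ.

```python
def find_concepts(sentence: str, glossary: dict[str, str]) -> list[str]:
    """Return glossary terms mentioned in the sentence (case-insensitive substring match)."""
    lowered = sentence.lower()
    return [term for term in glossary if term.lower() in lowered]

def augmented_prompt(sentence: str, glossary: dict[str, str], entity_type: str) -> str:
    """Prepend on-the-fly definitions (if any match) to a hypothetical NER instruction."""
    lines = [f"- {t}: {glossary[t]}" for t in find_concepts(sentence, glossary)]
    defs = "Definitions:\n" + "\n".join(lines) + "\n\n" if lines else ""
    return (
        f"{defs}List every {entity_type} mentioned in the sentence below.\n"
        f"Sentence: {sentence}"
    )
```

Substring matching will miss abbreviations and inflected forms; it stands in here for whatever concept-recognition step supplies the terms to define.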

large language model, machine learning, natural language, (20 more...)

arXiv ID: 2404.00152

Country:

North America > United States (0.28)
Europe (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

RCT Rejection Sampling for Causal Estimation Evaluation

Keith, Katherine A., Feldman, Sergey, Jurgens, David, Bragg, Jonathan, Bhattacharya, Rohit

arXiv.org Artificial Intelligence, Jan-31-2024

Confounding is a significant obstacle to unbiased estimation of causal effects from observational data. For settings with high-dimensional covariates (such as text data, genomics, or the behavioral social sciences), researchers have proposed methods to adjust for confounding by adapting machine learning methods to the goal of causal estimation. However, empirical evaluation of these adjustment methods has been challenging and limited. In this work, we build on a promising empirical evaluation strategy that simplifies evaluation design and uses real data: subsampling randomized controlled trials (RCTs) to create confounded observational datasets while using the average causal effects from the RCTs as ground truth. We contribute a new sampling algorithm, which we call RCT rejection sampling, and provide theoretical guarantees that causal identification holds in the observational data to allow for valid comparisons to the ground-truth RCT. Using synthetic data, we show our algorithm indeed results in low bias when oracle estimators are evaluated on the confounded samples, which is not always the case for a previously proposed algorithm. In addition to this identification result, we highlight several finite data considerations for evaluation designers who plan to use RCT rejection sampling on their own datasets. As a proof of concept, we implement an example evaluation pipeline and walk through these finite data considerations with a novel, real-world RCT, which we release publicly, consisting of approximately 70k observations and text data as high-dimensional covariates. Together, these contributions build towards a broader agenda of improved empirical evaluation for causal estimation.
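
In an RCT, treatment T is independent of covariates C, so P(T | C) = P(T). Textbook rejection sampling that keeps each unit with probability proportional to P*(T | C) / P(T), for a researcher-chosen confounded propensity P*, therefore yields a sample in which treatment depends on C while the covariate distribution and P(Y | T, C) stay as in the RCT. The sketch below assumes a binary treatment, a known assignment probability, and a hypothetical logistic `p_star`; it illustrates the general idea rather than reproducing the paper's algorithm or its finite-sample safeguards.

```python
import math
import random

def p_star(c: float) -> float:
    # Hypothetical target propensity P*(T=1 | C=c): treatment grows more likely as c increases.
    return 1.0 / (1.0 + math.exp(-2.0 * c))

def rct_rejection_sample(rows, p_treat: float, seed: int = 0):
    """Subsample RCT rows (c, t, y), where P(T=1) = p_treat, so that T depends on C.

    Each row is kept with probability P*(t | c) / (M * P(t)). Since P*(t | c) <= 1,
    M = 1 / min(p_treat, 1 - p_treat) keeps every acceptance probability <= 1.
    """
    rng = random.Random(seed)
    m = 1.0 / min(p_treat, 1.0 - p_treat)
    kept = []
    for c, t, y in rows:
        target = p_star(c) if t == 1 else 1.0 - p_star(c)  # P*(t | c)
        base = p_treat if t == 1 else 1.0 - p_treat         # P(t) under randomization
        if rng.random() < target / (m * base):
            kept.append((c, t, y))
    return kept
```

Run on simulated RCT rows with outcomes of the form 2·t + c, the kept treated units should have noticeably higher c than the kept controls, so a naive difference in mean outcomes overstates the effect of 2 that the RCT supplies as ground truth.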

artificial intelligence, machine learning research, natural language, (17 more...)

arXiv ID: 2307.15176

Country: North America > United States (0.14)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.66)
Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

Singh, Amanpreet, D'Arcy, Mike, Cohan, Arman, Downey, Doug, Feldman, Sergey

arXiv.org Artificial Intelligence, Nov-13-2023

Learned representations of scientific documents can serve as valuable input features for downstream tasks without further fine-tuning. However, existing benchmarks for evaluating these representations fail to capture the diversity of relevant tasks. In response, we introduce SciRepEval, the first comprehensive benchmark for training and evaluating scientific document representations. It includes 24 challenging and realistic tasks, 8 of which are new, across four formats: classification, regression, ranking and search. We then use this benchmark to study and improve the generalization ability of scientific document representation models. We show how state-of-the-art models like SPECTER and SciNCL struggle to generalize across the task formats, and that simple multi-task training fails to improve them. However, a new approach that learns multiple embeddings per document, each tailored to a different format, can improve performance. We experiment with task-format-specific control codes and adapters and find they outperform the existing single-embedding state-of-the-art by over 2 points absolute. We release the resulting family of multi-format models, called SPECTER2, for the community to use and build on.
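
The multi-format idea can be illustrated with a toy store in which each document keeps one vector per task format and a query is scored only against the vector for its format. The formats, vectors, and dictionary layout below are invented for illustration; the paper produces format-specific embeddings with control codes or adapters on a shared model, which this sketch does not reproduce.

```python
import math

Vector = list[float]

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical store: document id -> {task format -> embedding}.
EMBEDDINGS: dict[str, dict[str, Vector]] = {
    "paper-a": {"proximity": [1.0, 0.0], "search": [0.2, 0.9]},
    "paper-b": {"proximity": [0.0, 1.0], "search": [0.9, 0.1]},
}

def rank(query_vec: Vector, fmt: str) -> list[str]:
    """Rank documents using only the embedding tailored to the requested task format."""
    return sorted(
        EMBEDDINGS,
        key=lambda doc: cosine(query_vec, EMBEDDINGS[doc][fmt]),
        reverse=True,
    )
```

The same query vector can produce different rankings depending on which format's embeddings are used, which is the behavior a single shared embedding cannot express.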

artificial intelligence, information retrieval, natural language, (19 more...)

arXiv ID: 2211.13308

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

ABNIRML: Analyzing the Behavior of Neural IR Models

MacAvaney, Sean, Feldman, Sergey, Goharian, Nazli, Downey, Doug, Cohan, Arman

arXiv.org Artificial Intelligence, Jul-20-2023

Pretrained contextualized language models such as BERT and T5 have established a new state of the art for ad-hoc search. However, it is not yet well understood why these methods are so effective, what makes some variants more effective than others, and what pitfalls they may have. We present a new comprehensive framework for Analyzing the Behavior of Neural IR ModeLs (ABNIRML), which includes new types of diagnostic probes that allow us to test several characteristics, such as writing style, factuality, and sensitivity to paraphrasing and word order, that are not addressed by previous techniques. To demonstrate the value of the framework, we conduct an extensive empirical study that yields insights into the factors that contribute to the neural models' gains, and identify potential unintended biases the models exhibit. Some of our results confirm conventional wisdom, for example that recent neural ranking models rely less on exact term overlap with the query and instead leverage richer linguistic information, as evidenced by their higher sensitivity to word and sentence order. Other results are more surprising, such as that some models (e.g., T5 and ColBERT) are biased towards factually correct (rather than simply relevant) texts. Further, some characteristics vary even for the same base language model, and other characteristics can appear due to random variations during model training.
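
A word-order probe of the kind described above can be illustrated by scoring a document before and after reversing its words: a scorer based purely on exact term overlap cannot tell the difference, while an order-aware scorer can. The two toy scorers below are stand-ins chosen for illustration, not the neural rankers or the probe implementations studied in ABNIRML.

```python
def unigram_score(query: str, doc: str) -> int:
    # Exact term overlap: number of document tokens that also occur in the query (order-insensitive).
    q = set(query.lower().split())
    return sum(tok in q for tok in doc.lower().split())

def bigram_score(query: str, doc: str) -> int:
    # Order-aware: number of adjacent query word pairs that also appear adjacently in the document.
    def bigrams(text: str) -> set[tuple[str, str]]:
        toks = text.lower().split()
        return set(zip(toks, toks[1:]))
    return len(bigrams(query) & bigrams(doc))

def word_order_probe(score, query: str, doc: str) -> int:
    """Score drop when the document's word order is reversed (0 means order-insensitive)."""
    reversed_doc = " ".join(reversed(doc.split()))
    return score(query, doc) - score(query, reversed_doc)
```

For the query "neural ranking models" and the document "neural ranking models for ad hoc search", the unigram probe reports no change, while the bigram probe reports a drop because both query bigrams disappear after reversal.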

information retrieval, machine learning, natural language, (17 more...)

doi: 10.1162/tacl_a_00457

arXiv ID: 2011.00696

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

S2abEL: A Dataset for Entity Linking from Scientific Tables

Lou, Yuze, Kuehl, Bailey, Bransom, Erin, Feldman, Sergey, Naik, Aakanksha, Downey, Doug

arXiv.org Artificial Intelligence, Apr-29-2023

Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications. When applied to tables in scientific papers, EL is a step toward large-scale scientific knowledge bases that could enable advanced scientific question answering and analytics. We present the first dataset for EL in scientific tables. EL for scientific tables is especially challenging because scientific knowledge bases can be very incomplete, and disambiguating table mentions typically requires understanding the paper's text in addition to the table. Our dataset, S2abEL, focuses on EL in machine learning results tables and includes hand-labeled cell types, attributed sources, and entity links from the Papers with Code taxonomy for 8,429 cells from 732 tables. We introduce a neural baseline method designed for EL on scientific tables containing many out-of-knowledge-base mentions, and show that it significantly outperforms a state-of-the-art generic table EL method. The best baselines fall below human performance, and our analysis highlights avenues for improvement.
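
Because many table mentions have no entry in the knowledge base, a linker needs a way to abstain. A minimal way to express that is to return `None` when no candidate is similar enough. The character-level similarity from Python's `difflib`, the 0.8 threshold, and the tiny knowledge base are illustrative assumptions; the paper's baseline is a neural method that also draws on the paper's text.

```python
from difflib import SequenceMatcher
from typing import Optional

def link(mention: str, kb: list[str], threshold: float = 0.8) -> Optional[str]:
    """Return the most similar KB entry, or None if the mention looks out-of-KB."""
    def sim(entry: str) -> float:
        return SequenceMatcher(None, mention.lower(), entry.lower()).ratio()

    best = max(kb, key=sim, default=None)
    return best if best is not None and sim(best) >= threshold else None
```

Surface similarity alone cannot separate, say, two datasets with near-identical names, which is why context from the paper's text matters for this task.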

data mining, knowledge management, machine learning, (22 more...)

arXiv ID: 2305.00366

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Knowledge Management > Knowledge Engineering (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.94)
(2 more...)

The Semantic Scholar Open Data Platform

Kinney, Rodney, Anastasiades, Chloe, Authur, Russell, Beltagy, Iz, Bragg, Jonathan, Buraczynski, Alexandra, Cachola, Isabel, Candra, Stefan, Chandrasekhar, Yoganand, Cohan, Arman, Crawford, Miles, Downey, Doug, Dunkelberger, Jason, Etzioni, Oren, Evans, Rob, Feldman, Sergey, Gorney, Joseph, Graham, David, Hu, Fangzhou, Huff, Regan, King, Daniel, Kohlmeier, Sebastian, Kuehl, Bailey, Langan, Michael, Lin, Daniel, Liu, Haokun, Lo, Kyle, Lochner, Jaron, MacMillan, Kelsey, Murray, Tyler, Newell, Chris, Rao, Smita, Rohatgi, Shaurya, Sayre, Paul, Shen, Zejiang, Singh, Amanpreet, Soldaini, Luca, Subramanian, Shivashankar, Tanaka, Amber, Wade, Alex D., Wagner, Linda, Wang, Lucy Lu, Wilhelm, Chris, Wu, Caroline, Yang, Jiangjiang, Zamarron, Angele, Van Zuylen, Madeleine, Weld, Daniel S.

arXiv.org Artificial Intelligence, Jan-24-2023

The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction to build the Semantic Scholar Academic Graph, the largest open scientific literature graph to date, with 200M+ papers, 80M+ authors, 550M+ paper-authorship edges, and 2.4B+ citation edges. The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings. In this paper, we describe the components of the S2 data processing pipeline and the associated APIs offered by the platform. We will update this living document to reflect changes as we add new data offerings and improve existing services.
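
The platform's APIs can be queried over plain HTTP. The sketch below builds a paper-lookup request for the Academic Graph API and fetches JSON with the standard library. The base URL, the `ARXIV:` ID prefix, and the field names are assumptions based on the public API documentation and should be checked against the current docs, including rate limits and API-key requirements.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL for the Academic Graph API; verify against the official docs.
BASE = "https://api.semanticscholar.org/graph/v1"

def paper_url(paper_id: str, fields: list[str]) -> str:
    """Build a paper-lookup URL for an ID such as 'ARXIV:2301.10140'."""
    # Keep ':' and '/' unescaped so prefixed IDs (including DOIs) stay readable in the path.
    return f"{BASE}/paper/{quote(paper_id, safe=':/')}?fields={','.join(fields)}"

def fetch_paper(paper_id: str, fields: list[str]) -> dict:
    # Performs a network request; unauthenticated calls are subject to rate limits.
    with urlopen(paper_url(paper_id, fields), timeout=10) as resp:
        return json.load(resp)
```

For example, `fetch_paper("ARXIV:2301.10140", ["title", "year", "citationCount"])` would request a few metadata fields for this paper, assuming the endpoint and field names above are current.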

data mining, machine learning, natural language, (20 more...)

arXiv ID: 2301.10140

Genre: Research Report (1.00)

Industry: Information Technology (0.35)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Literature-Augmented Clinical Outcome Prediction

Naik, Aakanksha, Parasa, Sravanthi, Feldman, Sergey, Wang, Lucy Lu, Hope, Tom

arXiv.org Artificial Intelligence, Nov-16-2021

Predictive models for medical outcomes hold great promise for enhancing clinical decision-making. These models are trained on rich patient data such as clinical notes, aggregating many patient signals into an outcome prediction. However, AI-based clinical models have typically been developed in isolation from the prominent paradigm of Evidence-Based Medicine (EBM), in which medical decisions are based on explicit evidence from existing literature. In this work, we introduce techniques to help bridge this gap between EBM and AI-based clinical models, and show that these methods can improve predictive accuracy. We propose a novel system that automatically retrieves patient-specific literature based on intensive care unit (ICU) patient information, aggregates relevant papers, and fuses them with internal admission notes to form outcome predictions. Our model is able to substantially boost predictive accuracy on three challenging tasks in comparison to strong recent baselines; for in-hospital mortality, we are able to boost top-10% precision by a large margin of over 25%.
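
Top-10% precision can be read as: rank patients by predicted risk, take the highest-scoring 10%, and measure what fraction of them actually had the outcome. A sketch under that reading follows; rounding the cutoff to the nearest integer, with at least one patient, is our assumption and may not match the paper's exact evaluation code.

```python
def precision_at_top_fraction(
    scores: list[float], labels: list[int], fraction: float = 0.10
) -> float:
    """Fraction of positive labels among the top `fraction` of examples ranked by score."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    k = max(1, round(fraction * len(scores)))  # assumed cutoff rule
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / k
```

Unlike accuracy or AUROC, this metric only looks at the highest-risk slice of patients, which is the group a clinician would flag first.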

endocrinology, machine learning, natural language, (26 more...)

arXiv ID: 2111.08374

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland > Montgomery County (0.14)

Genre: Research Report > Experimental Study (0.85)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (0.88)
Health & Medicine > Health Care Technology > Medical Record (0.68)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.54)

Multi-Task Averaging

Feldman, Sergey, Gupta, Maya, Frigyik, Bela

Neural Information Processing Systems, Dec-31-2012

We present a multi-task learning approach to jointly estimate the means of multiple independent data sets. The proposed multi-task averaging (MTA) algorithm results in a convex combination of the single-task averages. We derive the optimal amount of regularization, and show that it can be effectively estimated. Simulations and real data experiments demonstrate that MTA outperforms both maximum likelihood and James-Stein estimators, and that our approach to estimating the amount of regularization rivals cross-validation in performance but is more computationally efficient.
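
The convex-combination property can be checked numerically. As a sketch under our own assumptions (not necessarily the paper's exact objective or constants): let Ȳ_t be the sample mean of task t, σ_t² its noise variance, N_t its sample size, A a symmetric nonnegative task-similarity matrix with graph Laplacian L = D − A, and S = diag(σ_t²/N_t). Minimizing sum_t (N_t/σ_t²)(Ŷ_t − Ȳ_t)² + (γ/2) sum_r sum_s A_rs (Ŷ_r − Ŷ_s)² gives Ŷ = WȲ with W = (I + γSL)⁻¹. Because L·1 = 0, every row of W sums to one, and W is entrywise nonnegative, so each estimate is a convex combination of the single-task averages. The code takes A and γ as given; it does not implement the paper's estimate of the optimal regularization.

```python
def invert(m: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inverse with partial pivoting (intended for small dense matrices)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def mta_weights(sigma2: list[float], n: list[int], A: list[list[float]],
                gamma: float) -> list[list[float]]:
    """W = (I + gamma * S * L)^-1 with S = diag(sigma2 / n) and L = D - A."""
    T = len(n)
    L = [[(sum(A[r]) if r == s else 0.0) - A[r][s] for s in range(T)] for r in range(T)]
    M = [[(1.0 if r == s else 0.0) + gamma * sigma2[r] / n[r] * L[r][s] for s in range(T)]
         for r in range(T)]
    return invert(M)

def mta(samples: list[list[float]], sigma2: list[float], A: list[list[float]],
        gamma: float) -> list[float]:
    """Shrink each task's sample mean toward similar tasks' means via the weights W."""
    means = [sum(s) / len(s) for s in samples]
    W = mta_weights(sigma2, [len(s) for s in samples], A, gamma)
    return [sum(w * m for w, m in zip(row, means)) for row in W]
```

With γ = 0 the weights reduce to the identity and each task keeps its own sample mean; larger γ pulls tasks with few samples or high variance more strongly toward their neighbors.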

law enforcement, mta, public safety, (17 more...)

Country:

Asia > Middle East > Israel (0.29)
North America > United States > Washington > King County > Seattle (0.14)

Industry:

Law Enforcement & Public Safety > Terrorism (0.95)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.49)