AITopics | paragraph

Collaborating Authors

paragraph

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DSAS: AUniversal Plug-and-Play Framework for Attention Optimization in Multi-Document Question Answering

Neural Information Processing SystemsJun-23-2026, 04:35:46 GMT

While large language models (LLMs) show considerable promise across various fields, they have notable limitations in handling multi-document question answering (Multi-doc QA) tasks. The first challenge is long-range dependency modeling, where LLMs struggle to focus on key information in long texts, which weakens important semantic connections. Second, most LLMs suffer from the "lost-in-themiddle" issue, where they have difficulty processing information in the middle of long inputs. Current solutions either truncate global dependencies or demand costly finetuning, ultimately lacking a universal and simple solution for these challenges. To resolve these limitations, we propose Dual-Stage Adaptive Sharpening (DSAS) containing two modules.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States (0.67)
North America > Canada (0.46)
Asia > China (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media > Film (0.67)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ADataset for Distilling Knowledge Priors from Literature for Therapeutic Design

Neural Information Processing SystemsJun-17-2026, 03:48:01 GMT

AI-driven discovery can greatly reduce design time and enhance new therapeutics' effectiveness. Models using simulators explore broad design spaces but risk violating implicit constraints due to a lack of experimental priors. For example, in a new analysis across diverse models on the GuacaMol benchmark using supervised classifiers, over 60% of molecules proposed had a high probability of being mutagenic. In this work, we introduce Medex, a dataset of priors for design problems extracted from literature describing compounds used in lab settings. It is constructed with LLM pipelines for discovering therapeutic entities in relevant paragraphs and summarizing information in concise fair-use facts. Medex consists of 32.3 million pairs of natural language facts, and appropriate entity representations (i.e.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.92)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Government > Regional Government > North America Government > United States Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Add feedback

PANORAMA: ADataset and Benchmarks Capturing Decision Trails and Rationales in Patent Examination

Neural Information Processing SystemsJun-17-2026, 02:03:30 GMT

Patent examination remains an ongoing challenge in the NLP literature even after the advent of large language models (LLMs), as it requires an extensive yet nuanced human judgment on whether a submitted claim meets the statutory standards of novelty and non-obviousness against previously granted claims--prior art--in expert domains. Previous NLP studies have approached this challenge as a prediction task (e.g., forecasting grant outcomes) with high-level proxies such as similarity metrics or classifiers trained on historical labels. However, this approach often overlooks the step-by-step evaluations that examiners must make with profound information, including rationales for the decisions provided in office actions documents, which also makes it harder to measure the current state of techniques in patent review processes. To fill this gap, we construct PANORAMA, a dataset of 8,143 U.S. patent examination records that preserves the full decision trails, including original applications, all cited references, Non-Final Rejections, and Notices of Allowance. Also, PANORAMA decomposes the trails into sequential benchmarks that emulate patent professionals' patent review processes and allow researchers to examine large language models' capabilities at each step of them. Our findings indicate that, although LLMs are relatively effective at retrieving relevant prior art and pinpointing the pertinent paragraphs, they struggle to assess the novelty and non-obviousness of patent claims. We discuss these results and argue that advancing NLP, including LLMs, in the patent domain requires a deeper understanding of real-world patent examination.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
Asia (0.92)
North America > United States (0.68)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Law > Intellectual Property & Technology Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

e2cfb719f58585f779d0a4f9f07bd618-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-30-2026, 02:17:07 GMT

A.1 Creation of the Multimodal Web Document Dataset A.1.1 Collecting of a Large Number of HTMLFiles Our data collection process begins by considering the 25 most recent Common Crawl6 dumps available at the time of dataset creation. It contains webpages spanning from February 2020 to January/February 2023. We use a modified version of readability-lxml7 to extract the main text from the pages, discarding any pages that contain text of excessively high perplexity. This process yields a total of 41.2 billion documents. Selection of English content To identify non-English content, we apply the FastText classifier (Joulin et al., 2017) to the extracted text, e ectively filtering out 63.6% of the documents. Early text deduplication Often, a set of URLs is crawled repeatedly across di erent Common Crawl snapshots. However, the content of these websites may vary as web administrators make changes over time. Hence, at this stage, we refrain from deduplicating documents based on their URLs. Instead, we perform MinHash (Broder, 1997) deduplication with 16 hashes calculated over 5-grams. To further refine the data, we eliminate documents containing substantial proportions of repeated paragraphs and n-grams, employing the methodology described in MassiveText (Rae et al., 2022).

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Africa (1.00)
North America > Canada (0.93)
(2 more...)

Genre: Research Report > Experimental Study (0.46)

Industry:

Leisure & Entertainment > Sports > Martial Arts (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
(14 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Mobile (0.68)
(2 more...)

Add feedback

Appendix of Modeling

Neural Information Processing SystemsApr-25-2026, 13:47:11 GMT

To create a passage representation, the passage title and text are concatenated ([CLS]title [SEP]passage [SEP]), following common practice (Karpukhin et al., 2020). We retrieve top 10 passages and use them as input to mGEN. We differentiate those paragraphs from the question using special tokens (

vs. He graduated with a B.S. degree in Biology in 1957. As in the case of machine translation, we found that the language code does not need to be specified during inference as our model learns the question language automatically. Yet, we found that training with language codes is particularly useful to augment training data for Ltarget without any question data in Ltarget.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > United States > New York (0.14)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.34)

Add feedback

040ca38cefb1d9226d79c05dd25469cb-Supplemental.pdf

Neural Information Processing SystemsApr-24-2026, 11:08:19 GMT

If there is a bingo on mode-k, the m-th row of the mode-k expansion of P is a constant multiple of the (m 1)-th row, where mis a number determined by the bingo position. When a row is a constant multiple of another row, the rank of the matrix is reduced by a maximum of one, which means Rank(P(k)) Ik 1. In the same way, if there are bk bingos, then bk rows are constant multiple of the other rows, which means Rank(P(k)) Ik bk. For any positive tensor P, rank(P) = 1 if and only if its all many-body θparameters are 0. Proof. First, we show that rank(P) = 1 implies all many-body θ-parameters are 0. From the assumption of rank(P) = 1, the m-th row of the mode-k expansion of P have to be a constant multiple of the (m 1)-th row for all m= {2,...,Ik}and k [d].

artificial intelligence, machine learning, tensor, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

Add feedback

Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition

Theodore Bluche

Neural Information Processing SystemsMar-23-2026, 05:09:41 GMT

Neural Information Processing Systems http://nips.cc/

machine learning, natural language, segmentation, (17 more...)

Neural Information Processing Systems

Genre: Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition

Neural Information Processing SystemsMar-17-2026, 07:28:02 GMT

Offline handwriting recognition systems require cropped text line images for both training and recognition. On the one hand, the annotation of position and transcript at line level is costly to obtain. On the other hand, automatic line segmentation algorithms are prone to errors, compromising the subsequent recognition. In this paper, we propose a modification of the popular and efficient Multi-Dimensional Long Short-Term Memory Recurrent Neural Networks (MDLSTM-RNNs) to enable end-to-end processing of handwritten paragraphs. More particularly, we replace the collapse layer transforming the two-dimensional representation into a sequence of predictions by a recurrent version which can select one line at a time. In the proposed model, a neural network performs a kind of implicit line segmentation by computing attention weights on the image representation. The experiments on paragraphs of Rimes and IAM databases yield results that are competitive with those of networks trained at line level, and constitute a significant step towards end-to-end transcription of full documents.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)

Add feedback

Filters

Collaborating Authors

paragraph

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

DSAS: AUniversal Plug-and-Play Framework for Attention Optimization in Multi-Document Question Answering

ADataset for Distilling Knowledge Priors from Literature for Therapeutic Design

PANORAMA: ADataset and Benchmarks Capturing Decision Trails and Rationales in Patent Examination

fdba5e0a9b57fce03e89cc0cad0a24e9-Paper-Conference.pdf

e2cfb719f58585f779d0a4f9f07bd618-Supplemental-Datasets_and_Benchmarks.pdf

Appendix of Modeling

040ca38cefb1d9226d79c05dd25469cb-Supplemental.pdf

Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition

Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition

859b00aec8885efc83d1541b52a1220d-AuthorFeedback.pdf