Goto

Collaborating Authors

 patent


Wegovy maker sues rival over 'knock-off' weight-loss drugs

BBC News

The maker of Ozempic and Wegovy is suing a rival firm for selling what it says are unsafe, knock-off versions of its weight-loss drugs in the US. Danish company Novo Nordisk asked US courts on Monday to ban Hims & Hers' range of weight-loss pills and injections, which it says are not approved by US authorities and infringe on its patent. The legal drama began on Friday after Hims & Hers launched a new weight-loss pill, leading to an initial threat from Novo Nordisk. Over the weekend, Hims & Hers said it would stop selling the pill. On Monday, its share price slumped as it called Novo Nordisk's decision to press ahead with the lawsuit a blatant attack.


PINE: Pipeline for Important Node Exploration in Attributed Networks

Kovtun, Elizaveta, Makarenko, Maksim, Semenova, Natalia, Zaytsev, Alexey, Budennyy, Semen

arXiv.org Artificial Intelligence

A graph with semantically attributed nodes are a common data structure in a wide range of domains. It could be interlinked web data or citation networks of scientific publications. The essential problem for such a data type is to determine nodes that carry greater importance than all the others, a task that markedly enhances system monitoring and management. Traditional methods to identify important nodes in networks introduce centrality measures, such as node degree or more complex PageRank. However, they consider only the network structure, neglecting the rich node attributes. Recent methods adopt neural networks capable of handling node features, but they require supervision. This work addresses the identified gap--the absence of approaches that are both unsupervised and attribute-aware--by introducing a Pipeline for Important Node Exploration (PINE). At the core of the proposed framework is an attention-based graph model that incorporates node semantic features in the learning process of identifying the structural graph properties. The PINE's node importance scores leverage the obtained attention distribution. We demonstrate the superior performance of the proposed PINE method on various homogeneous and heterogeneous attributed networks. As an industry-implemented system, PINE tackles the real-world challenge of unsupervised identification of key entities within large-scale enterprise graphs.


A Viral Chinese Wristband Claims to Zap You Awake. The Public Says 'No Thanks'

WIRED

The Public Says'No Thanks' The maker of the eCoffee Energyband says it electrically stimulates your nerves to keep you alert. Researchers are skeptical, and critics see it as a way for China's bosses to keep workers productive. Forget coffee, you can now stay alert by strapping on a wristband that lightly zaps you awake. That's what eCoffee Energyband, a Chinese gadget that sells for just over $100, is claiming to do. First released in late 2023, the product is a lightweight wearable with two electrode pads that sit against the inner wrist.


IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents

Neural Information Processing Systems

Our dataset includes half a million design patents comprising 3.61 million figures along with captions from patents granted by the United States Patent and Trademark Office (USPTO) over a 16-year period from 2007 to 2022. We incorporate the metadata of each patent application with elaborate captions that are coherent with multiple viewpoints of designs.


Patent Language Model Pretraining with ModernBERT

Yousefiramandi, Amirhossein, Cooney, Ciaran

arXiv.org Artificial Intelligence

Transformer-based language models such as BERT have become foundational in NLP, yet their performance degrades in specialized domains like patents, which contain long, technical, and legally structured text. Prior approaches to patent NLP have primarily relied on fine-tuning general-purpose models or domain-adapted variants pretrained with limited data. In this work, we pretrain 3 domain-specific masked language models for patents, using the ModernBERT architecture and a curated corpus of over 60 million patent records. Our approach incorporates architectural optimizations, including FlashAttention, rotary embeddings, and GLU feed-forward layers. We evaluate our models on four downstream patent classification tasks. Our model, ModernBERT-base-PT, consistently outperforms the general-purpose ModernBERT baseline on three out of four datasets and achieves competitive performance with a baseline PatentBERT. Additional experiments with ModernBERT-base-VX and Mosaic-BERT-large demonstrate that scaling the model size and customizing the tokenizer further enhance performance on selected tasks. Notably, all ModernBERT variants retain substantially faster inference over - 3x that of PatentBERT - underscoring their suitability for time-sensitive applications. These results underscore the benefits of domain-specific pretraining and architectural improvements for patent-focused NLP tasks.


NoLBERT: A No Lookahead(back) Foundational Language Model

Kakhbod, Ali, Li, Peiyao

arXiv.org Artificial Intelligence

We present NoLBERT, a lightweight, timestamped foundational language model for empirical research -- particularly for forecasting in economics, finance, and the social sciences. By pretraining exclusively on text from 1976 to 1995, NoLBERT avoids both lookback and lookahead biases (information leakage) that can undermine econometric inference. It exceeds domain-specific baselines on NLP benchmarks while maintaining temporal consistency. Applied to patent texts, NoLBERT enables the construction of firm-level innovation networks and shows that gains in innovation centrality predict higher long-run profit growth.


Patent Representation Learning via Self-supervision

Zuo, You, Gerdes, Kim, de La Clergerie, Eric Villemonte, Sagot, Benoît

arXiv.org Artificial Intelligence

This paper presents a simple yet effective contrastive learning framework for learning patent embeddings by leveraging multiple views from within the same document. We first identify a patent-specific failure mode of SimCSE style dropout augmentation: it produces overly uniform embeddings that lose semantic cohesion. To remedy this, we propose section-based augmentation, where different sections of a patent (e.g., abstract, claims, background) serve as complementary views. This design introduces natural semantic and structural diversity, mitigating over-dispersion and yielding embeddings that better preserve both global structure and local continuity. On large-scale benchmarks, our fully self-supervised method matches or surpasses citation-and IPC-supervised baselines in prior-art retrieval and classification, while avoiding reliance on brittle or incomplete annotations. Our analysis further shows that different sections specialize for different tasks-claims and summaries benefit retrieval, while background sections aid classification-highlighting the value of patents' inherent discourse structure for representation learning. These results highlight the value of exploiting intra-document views for scalable and generalizable patent understanding.


Chain of Retrieval: Multi-Aspect Iterative Search Expansion and Post-Order Search Aggregation for Full Paper Retrieval

Park, Sangwoo, Baek, Jinheon, Jeong, Soyeong, Hwang, Sung Ju

arXiv.org Artificial Intelligence

Scientific paper retrieval, particularly framed as document-to-document retrieval, aims to identify relevant papers in response to a long-form query paper, rather than a short query string. Previous approaches to this task have focused exclusively on abstracts, embedding them into dense vectors as surrogates for full documents and calculating similarity between them. Yet, abstracts offer only sparse and high-level summaries, and such methods primarily optimize one-to-one similarity, overlooking the dynamic relations that emerge among relevant papers during the retrieval process. To address this, we propose Chain of Retrieval(COR), a novel iterative framework for full-paper retrieval. Specifically, CoR decomposes each query paper into multiple aspect-specific views, matches them against segmented candidate papers, and iteratively expands the search by promoting top-ranked results as new queries, thereby forming a tree-structured retrieval process. The resulting retrieval tree is then aggregated in a post-order manner: descendants are first combined at the query level, then recursively merged with their parent nodes, to capture hierarchical relations across iterations. To validate this, we present SCIFULLBENCH, a large-scale benchmark providing both complete and segmented contexts of full papers for queries and candidates, and results show that CoR significantly outperforms existing retrieval baselines. Our code and dataset is available at https://github.com/psw0021/Chain-of-Retrieval.git.


Doc2SAR: A Synergistic Framework for High-Fidelity Extraction of Structure-Activity Relationships from Scientific Documents

Zhuang, Jiaxi, Li, Kangning, Hou, Jue, Xu, Mingjun, Gao, Zhifeng, Cai, Hengxing

arXiv.org Artificial Intelligence

Extracting molecular structure-activity relationships (SARs) from scientific literature and patents is essential for drug discovery and materials research. However, this task remains challenging due to heterogeneous document formats and limitations of existing methods. Specifically, rule-based approaches relying on rigid templates fail to generalize across diverse document layouts, while general-purpose multimodal large language models (MLLMs) lack sufficient accuracy and reliability for specialized tasks, such as layout detection and optical chemical structure recognition (OCSR). To address these challenges, we introduce DocSAR-200, a rigorously annotated benchmark of 200 scientific documents designed specifically for evaluating SAR extraction methods. Additionally, we propose Doc2SAR, a novel synergistic framework that integrates domain-specific tools with MLLMs enhanced via supervised fine-tuning (SFT). Extensive experiments demonstrate that Doc2SAR achieves state-of-the-art performance across various document types, significantly outperforming leading end-to-end baselines. Specifically, Doc2SAR attains an overall Table Recall of 80.78% on DocSAR-200, exceeding end2end GPT -4o by 51.48%. Furthermore, Doc2SAR demonstrates practical usability through efficient inference and is accompanied by a web app. The code and data are provided in the supplementary materials.


Patentformer: A demonstration of AI-assisted automated patent drafting

Mudhiganti, Sai Krishna Reddy, Wang, Juanyan, Yang, Ruo, Sharma, Manali

arXiv.org Artificial Intelligence

Patent drafting presents significant challenges due to its reliance on the extensive experience and specialized expertise of patent attorneys, who must possess both legal acumen and technical understanding of an invention to craft patent applications in a formal legal writing style. This paper presents a demonstration of Patentformer, an AI-powered automated patent drafting platform designed to support patent attorneys by rapidly producing high-quality patent applications adhering to legal writing standards.