Google asserts that the patent industry stands to benefit from AI and machine learning models like BERT, a natural language processing algorithm that achieved state-of-the-art results when it was released in 2018. In a whitepaper published today, the tech giant outlines a methodology for training a BERT model with open-source tooling on over 100 million patent publications from the U.S. and other countries. The resulting model can then be used to help determine the novelty of patents and to generate classifications that assist with categorization.

The global patent corpus is large, with millions of new patents issued every year. Patent applications average around 10,000 words and are meticulously wordsmithed by inventors, lawyers, and patent examiners. Patent filings are also written in language that can be unintelligible to lay readers and is highly context-dependent; many terms mean completely different things in different patents.
Nov-21-2020, 19:40:36 GMT