Results


Text Mining with R

#artificialintelligence

This is the website for Text Mining with R! Visit the GitHub repository for this site, find the book at O'Reilly, or buy it on Amazon.


Unexpected Scientific Insights into COVID-19 From AI Machine Learning Tool

#artificialintelligence

A team of materials scientists at Lawrence Berkeley National Laboratory (Berkeley Lab) – scientists who normally spend their time researching things like high-performance materials for thermoelectrics or battery cathodes – have built a text-mining tool in record time to help the global scientific community synthesize the mountain of scientific literature on COVID-19 being generated every day. The tool is live at covidscholar.org. The hope is that the tool could eventually enable "automated science." "On Google and other search engines people search for what they think is relevant," said Berkeley Lab scientist Gerbrand Ceder, one of the project leads. "Our objective is to do information extraction so that people can find nonobvious information and relationships. That's the whole idea of machine learning and natural language processing that will be applied on these datasets."


Machine Learning Tool Could Provide Unexpected Scientific Insights into COVID-19

#artificialintelligence

Berkeley Lab researchers (clockwise from top left) Kristin Persson, John Dagdelen, Gerbrand Ceder, and Amalie Trewartha led development of COVIDScholar, a text-mining tool for COVID-19-related scientific literature. A team of materials scientists at Lawrence Berkeley National Laboratory (Berkeley Lab) – scientists who normally spend their time researching things like high-performance materials for thermoelectrics or battery cathodes – have built a text-mining tool in record time to help the global scientific community synthesize the mountain of scientific literature on COVID-19 being generated every day. The tool is live at covidscholar.org. The hope is that the tool could eventually enable "automated science." "On Google and other search engines people search for what they think is relevant," said Berkeley Lab scientist Gerbrand Ceder, one of the project leads.


Natural language processing for word sense disambiguation and information extraction

arXiv.org Artificial Intelligence

This research work deals with Natural Language Processing (NLP) and the extraction of essential information in an explicit form. The most common information management strategies are Document Retrieval (DR) and Information Filtering. DR systems may work as combine harvesters, which bring back useful material from the vast fields of raw material. With a large amount of potentially useful information in hand, an Information Extraction (IE) system can then transform the raw material by refining and reducing it to a germ of original text. A Document Retrieval system collects the relevant documents carrying the required information from the repository of texts. An IE system then transforms them into information that is more readily digested and analyzed. It isolates relevant text fragments, extracts relevant information from the fragments, and then arranges the targeted information in a coherent framework. The thesis presents a new approach for Word Sense Disambiguation using a thesaurus. The illustrative examples support the effectiveness of this approach for speedy and effective disambiguation. A Document Retrieval method based on Fuzzy Logic is described and its application is illustrated. A question-answering system describes the operation of information extraction from the retrieved text documents. The process of information extraction for answering a query is considerably simplified by using a Structured Description Language (SDL), which is based on cardinals of queries in the form of who, what, when, where and why. The thesis concludes with the presentation of a novel strategy, based on the Dempster-Shafer theory of evidential reasoning, for document retrieval and information extraction. This strategy permits relaxation of many limitations that are inherent in the Bayesian probabilistic approach.


LexNLP: Natural language processing and information extraction for legal and regulatory texts

#artificialintelligence

By accepting the Deed and closing the Transaction, Buyer, on behalf of itself and its successors and assigns, shall thereby release each of the Seller Parties from, and waive any and all Liabilities against each of the Seller Parties for, attributable to, or in connection with the Property, whether arising or accruing before, on or after the Closing and whether attributable to events or circumstances which arise or occur before, on or after the Closing, including, without limitation, the following: (a) any and all statements or opinions heretofore or hereafter made, or information furnished, by any Seller Parties to any Buyer's Representatives; and (b) any and all Liabilities with respect to the structural, physical, or environmental condition of the Property, including, without limitation, all Liabilities relating to the release, presence, discovery or removal of any hazardous or regulated substance, chemical, waste or material that may be located in, at, about or under the Property, or connected with or arising out of any and all claims or causes of action based upon CERCLA (Comprehensive Environmental Response, Compensation, and Liability Act of 1980, 42 U.S.C. Notwithstanding the foregoing, the foregoing release and waiver is not intended and shall not be construed as affecting or impairing any rights or remedies that Buyer may have against Seller with respect to (i) a breach of any of Seller's Warranties, (ii) a breach of any Surviving Covenants, or (iii) any acts constituting fraud by Seller.


Unwanted Advances in Higher Education: Uncovering Sexual Harassment Experiences in Academia with Text Mining

arXiv.org Machine Learning

Sexual harassment in academia is often a hidden problem because victims are usually reluctant to report their experiences. Recently, a web survey was developed to provide an opportunity to share thousands of sexual harassment experiences in academia. Using an efficient approach, this study collected and investigated more than 2,000 sexual harassment experiences to better understand these unwanted advances in higher education. This paper utilized text mining to disclose hidden topics and explore their weight across three variables: harasser gender, institution type, and victim's field of study. We mapped the topics on five themes drawn from the sexual harassment literature and found that more than 50% of the topics were assigned to the unwanted sexual attention theme. Fourteen percent of the topics were in the gender harassment theme, in which insulting, sexist, or degrading comments or behavior was directed towards women. Five percent of the topics involved sexual coercion (a benefit is offered in exchange for sexual favors), 5% involved sex discrimination, and 7% of the topics discussed retaliation against the victim for reporting the harassment, or for simply not complying with the harasser. Findings highlight the power differential between faculty and students, and the toll on students when professors abuse their power. While some topics did differ based on type of institution, there were no differences between the topics based on gender of harasser or field of study. This research can be beneficial to researchers in further investigation of this paper's dataset, and to policymakers in improving existing policies to create a safe and supportive environment in academia.


A Survey on Temporal Reasoning for Temporal Information Extraction from Text

Journal of Artificial Intelligence Research

Time is deeply woven into how people perceive and communicate about the world. Almost unconsciously, we provide our language utterances with temporal cues, like verb tenses, and we can hardly produce sentences without such cues. Extracting temporal cues from text, and constructing a global temporal view about the order of described events, is a major challenge of automatic natural language understanding. Temporal reasoning, the process of combining different temporal cues into a coherent temporal view, plays a central role in temporal information extraction. This article presents a comprehensive survey of the research from the past decades on temporal reasoning for automatic temporal information extraction from text, providing a case study on how combining symbolic reasoning with machine learning-based information extraction systems can improve performance. It gives a clear overview of the methodologies used for temporal reasoning, and explains how temporal reasoning can be, and has been, successfully integrated into temporal information extraction systems. Based on the distillation of existing work, this survey also suggests currently unexplored research areas. We argue that the level of temporal reasoning that current systems use is still incomplete for the full task of temporal information extraction, and that a deeper understanding of how the various types of temporal information can be integrated into temporal reasoning is required to drive future research in this area.


Event extraction based on open information extraction and ontology

arXiv.org Artificial Intelligence

The work presented in this master's thesis consists of extracting a set of events from texts written in natural language. For this purpose, we build on the basic notions of information extraction and open information extraction. First, we applied an open information extraction (OIE) system for relation extraction, to highlight the importance of OIE in event extraction, and we used an ontology for event modeling. We tested the results of our approach with test metrics. As a result, the two-level event extraction approach has shown good performance but requires a lot of expert intervention in the construction of classifiers, which takes time. In this context we have proposed an approach that reduces expert intervention in relation extraction, entity recognition, and reasoning, which are automatic and based on techniques of adaptation and correspondence. Finally, to prove the relevance of the extracted results, we conducted a set of experiments using different test metrics as well as a comparative study.


The New Legal Landscape for Text Mining and Machine Learning by Matthew Sag :: SSRN

#artificialintelligence

Individually and collectively, copyrighted works have the potential to generate information that goes far beyond what their individual authors expressed or intended. Various methods of computational and statistical analysis of text--usually referred to as text data mining ("TDM") or just text mining--can unlock that information. However, because almost every use of TDM involves making copies of the text to be mined, the legality of that copying has become a fraught issue in copyright law in the United States and around the world. One of the most fundamental questions for copyright law in the Internet age is whether the protection of the author's original expression should stand as an obstacle to the generation of insights about that expression. How this question is answered will have a profound influence on the future of research across the sciences and the humanities, and on the development of the next generation of information technology: machine learning and artificial intelligence.


Text Mining Fedspeak · Len Kiefer

#artificialintelligence

Text mining is an exciting topic. There is tremendous potential to gain insights from textual analysis. See for example Gentzkow, Kelly and Taddy's Text as Data. While text mining may be quite advanced in other fields, in finance and economics the application of these techniques is still in its infancy. In order to take advantage of text as data, economists and financial analysts need tools to help them.