
Academic literature


bnRep: A repository of Bayesian networks from the academic literature

arXiv.org Artificial Intelligence

Bayesian networks (BNs) are widely used for modeling complex systems with uncertainty, yet repositories of pre-built BNs remain limited. This paper introduces bnRep, an open-source R package offering a comprehensive collection of documented BNs, facilitating benchmarking, replicability, and education. With over 200 networks from academic publications, bnRep integrates seamlessly with bnlearn and other R packages, providing users with interactive tools for network exploration.


AceParse: A Comprehensive Dataset with Diverse Structured Texts for Academic Literature Parsing

arXiv.org Artificial Intelligence

With the development of data-centric AI, the focus has shifted from model-driven approaches to improving data quality. Academic literature, one of the crucial types of data, is predominantly stored in PDF format and must be parsed into text before further processing. However, parsing the diverse structured texts found in academic literature remains challenging because of the lack of datasets that cover various text structures. In this paper, we introduce AceParse, the first comprehensive dataset designed to support the parsing of a wide range of structured texts, including formulas, tables, lists, algorithms, and sentences with embedded mathematical expressions. Based on AceParse, we fine-tuned a multimodal model, named AceParser, which accurately parses various structured texts within academic literature. This model outperforms the previous state of the art by 4.1% in F1 score and by 5% in Jaccard similarity, demonstrating the potential of multimodal models for academic literature parsing. Our dataset is available at https://github.com/JHW5981/AceParse.
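The summary reports gains in F1 score and Jaccard similarity without defining them. Purely as an illustration, here is a minimal sketch that computes token-level versions of both metrics between a parsed string and its reference. Whitespace tokenization and these exact metric definitions are assumptions; AceParse's own evaluation protocol may differ.

```python
from collections import Counter


def token_f1(predicted: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over
    whitespace-separated tokens, counting repeated tokens (multiset overlap)."""
    pred_tokens = predicted.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Assumed convention: score 1.0 only when both sides are empty.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def jaccard_similarity(predicted: str, reference: str) -> float:
    """Jaccard similarity |A & B| / |A | B| over the sets of unique tokens."""
    a, b = set(predicted.split()), set(reference.split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    reference = "E = m c ^ { 2 }"  # hypothetical ground-truth LaTeX tokens
    predicted = "E = m c ^ 2"      # hypothetical parser output
    print(round(token_f1(predicted, reference), 3),
          round(jaccard_similarity(predicted, reference), 3))
    # prints: 0.857 0.75
```

F1 rewards partial multiset overlap with both precision and recall, while Jaccard ignores token counts and compares only the vocabularies, so the two can rank outputs differently.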


Large Language Models and Thematic Analysis: Human-AI Synergy in Researching Hate Speech on Social Media

arXiv.org Artificial Intelligence

In the dynamic field of artificial intelligence (AI), the development and application of Large Language Models (LLMs) for text analysis are of significant academic interest. Despite the promising capabilities of various LLMs in conducting qualitative analysis, their use in the humanities and social sciences has not been thoroughly examined. This article contributes to the emerging literature on LLMs in qualitative analysis by documenting an experimental study involving GPT-4. The study focuses on performing thematic analysis (TA) using a YouTube dataset derived from an EU-funded project, which was previously analyzed by other researchers. This dataset is about the representation of Roma migrants in Sweden during 2016, a period marked by the aftermath of the 2015 refugee crisis and preceding the Swedish national elections in 2017. Our study seeks to understand the potential of combining human intelligence with AI's scalability and efficiency, examining the advantages and limitations of employing LLMs in qualitative research within the humanities and social sciences. Additionally, we discuss future directions for applying LLMs in these fields.


SsciBERT: A Pre-trained Language Model for Social Science Texts

arXiv.org Artificial Intelligence

With the large-scale growth of academic literature in the social sciences, researchers urgently need ways to quickly find existing research on relevant issues. Previous studies, such as SciBERT, have shown that pre-training on domain-specific texts can improve performance on natural language processing tasks. However, no pre-trained language model for the social sciences has been available so far. In light of this, the present research proposes a pre-trained model based on abstracts published in Social Science Citation Index (SSCI) journals.


A Review on Method Entities in the Academic Literature: Extraction, Evaluation, and Application

arXiv.org Artificial Intelligence

In scientific research, the method is an indispensable means of solving scientific problems and a critical research object. As science advances, many scientific methods are proposed, modified, and used in academic literature. Authors describe the details of a method in the abstract and body text, and the key entities in academic literature that name methods are called method entities. Exploring the diverse method entities across a vast body of academic literature helps scholars understand existing methods, select appropriate methods for research tasks, and propose new ones. Furthermore, the evolution of method entities can reveal the development of a discipline and facilitate knowledge discovery. This article therefore offers a systematic review of methodological and empirical works on extracting method entities from full-text academic literature, and of efforts to build knowledge services using these extracted entities. We first propose definitions of the key concepts involved in this review. Based on these definitions, we systematically review the approaches and indicators used to extract and evaluate method entities, with a strong focus on the pros and cons of each approach. We also survey how extracted method entities are used to build new applications. Finally, we discuss limitations of existing work and potential next steps.


AI and IP: Building a Research Agenda – City Law Forum

#artificialintelligence

Artificial intelligence poses new questions for intellectual property (IP) law. Can machines be inventors for purposes of patent law? Is new legislation required to govern AI creativity? Courts, IP offices, and legislators in multiple jurisdictions are considering these questions. By now, there is a well-developed and comprehensive academic literature that analyses the interface between IP and AI. And while there will always be room for further analysis of such questions as technology progresses, there are diminishing marginal returns to such inquiries at this point in time.


A Natural Language Processing Pipeline for Detecting Informal Data References in Academic Literature

arXiv.org Artificial Intelligence

Discovering authoritative links between publications and the datasets that they use can be a labor-intensive process. We introduce a natural language processing pipeline that retrieves and reviews publications for informal references to research datasets, which complements the work of data librarians. We first describe the components of the pipeline and then apply it to expand an authoritative bibliography linking thousands of social science studies to the data-related publications in which they are used. The pipeline increases recall for literature to review for inclusion in data-related collections of publications and makes it possible to detect informal data references at scale. We contribute (1) a novel Named Entity Recognition (NER) model that reliably detects informal data references and (2) a dataset connecting items from social science literature with datasets they reference. Together, these contributions enable future work on data reference, data citation networks, and data reuse.
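The pipeline described relies on a trained NER model whose details are not given in this summary. To illustrate only the shape of the task (flagging informal dataset mentions in text), here is a deliberately crude, hypothetical stand-in: a regular expression that matches capitalized names ending in a data-related cue word, optionally followed by an acronym. The cue words, pattern, and function name are invented for illustration and are not the paper's method.

```python
import re

# Hypothetical heuristic: one or more capitalized words ending in a data-related
# cue word, optionally followed by a parenthesized acronym, e.g.
# "General Social Survey (GSS)". The cue words are illustrative assumptions.
INFORMAL_DATA_REF = re.compile(
    r"\b(?:[A-Z][\w-]*\s+)+"                          # capitalized words
    r"(?:Survey|Study|Panel|Census|Data(?:set)?)\b"   # assumed data cue word
    r"(?:\s+\([A-Z]{2,}\))?"                          # optional acronym
)


def find_candidate_data_references(text: str) -> list[str]:
    """Return spans that look like informal dataset mentions.

    A rule-based stand-in for illustration only; it is not the NER
    model described in the paper.
    """
    return [match.group(0) for match in INFORMAL_DATA_REF.finditer(text)]


if __name__ == "__main__":
    sentence = ("We use data from the General Social Survey (GSS) and the "
                "National Longitudinal Study of Youth.")
    print(find_candidate_data_references(sentence))
    # prints: ['General Social Survey (GSS)', 'National Longitudinal Study']
```

The output already shows the limits of such rules: the name is truncated at "of Youth", and mentions that do not follow the capitalization pattern are missed entirely. Gaps like these are plausibly part of why a learned model is used for this task.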


Exploring the future of AI in the education sector – EdExec

#artificialintelligence

Despite detailed analysis of the benefits of artificial intelligence (AI) in various industries, its effect on education has been relatively unexplored. Global innovation foundation Nesta has begun a research project to explore the future of AI in education and found a relatively modest, but fast-growing, bank of academic literature focusing on the topic. As the literature on AI in education grows, however, they also expect to see its scope widening. Early academic literature typically focused on how AI could be used to solve 'Bloom's 2-Sigma Problem' and replicate the 'gold standard' of education: one-to-one tutoring. However, academics, researchers and technologists are now describing experiments in which AI is focused on a whole range of different elements, from enabling collaboration between peers to assessing complicated skills, like creativity.


Deep Learning Application in Security and Privacy -- Theory and Practice: A Position Paper

arXiv.org Artificial Intelligence

Technology is shaping our lives in a multitude of ways. This is fuelled by a technology infrastructure, both legacy and state of the art, composed of a heterogeneous group of hardware, software, services and organisations. Such infrastructure faces a diverse range of challenges to its operations that include security, privacy, resilience, and quality of services. Among these, cybersecurity and privacy are taking centre stage, especially since the General Data Protection Regulation (GDPR) came into effect. Traditional security and privacy techniques are overstretched, and adversarial actors have evolved to design exploitation techniques that circumvent protection. With the ever-increasing complexity of technology infrastructure, security and privacy-preservation specialists have started to look for adaptable and flexible protection methods that can evolve (potentially autonomously) as the adversarial actor changes its techniques. For this, Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) were put forward as saviours. In this paper, we look at the promises of AI, ML, and DL stated in academic and industrial literature and evaluate how realistic they are. We also put forward potential challenges a DL-based security and privacy protection technique has to overcome. Finally, we conclude the paper with a discussion on what steps the DL and the security and privacy-preservation communities have to take to ensure that DL is not just going to be hype, but an opportunity to build a secure, reliable, and trusted technology infrastructure on which we can rely for so much in our lives.


Machine learning is way easier than it looks – Inside Intercom

#artificialintelligence

It's easy to believe that machine learning is hard. An arcane craft known only to a select few academics. After all, you're teaching machines that work in ones and zeros to reach their own conclusions about the world. You're teaching them how to think! However, it's not nearly as hard as the complex and formula-laden literature would have you believe.