Färber, Michael
SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples
Färber, Michael, Lamprecht, David, Krause, Johan, Aung, Linn, Haase, Peter
We present SemOpenAlex, an extensive RDF knowledge graph that contains over 26 billion triples about scientific publications and their associated entities, such as authors, institutions, journals, and concepts. SemOpenAlex is licensed under CC0, providing free and open access to the data. We offer the data through multiple channels, including RDF dump files, a SPARQL endpoint, and as a data source in the Linked Open Data cloud, complete with resolvable URIs and links to other data sources. Moreover, we provide embeddings for knowledge graph entities using high-performance computing. SemOpenAlex enables a broad range of use-case scenarios, such as exploratory semantic search via our website, large-scale scientific impact quantification, and other forms of scholarly big data analytics within and across scientific disciplines. Additionally, it enables academic recommender systems that recommend collaborators, publications, and venues, including explainability capabilities. Finally, SemOpenAlex can serve as a basis for RDF query optimization benchmarks, for creating scholarly knowledge-guided language models, and as a hub for semantic scientific publishing.
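Since the knowledge graph is exposed via a public SPARQL endpoint, the data can be queried programmatically. The following is a minimal sketch in Python using SPARQLWrapper; the endpoint URL and the example work URI are assumptions based on the resolvable-URI scheme described above and may need to be adjusted.

```python
# Minimal sketch of querying the SemOpenAlex SPARQL endpoint from Python.
# The endpoint URL and the example entity URI are assumptions based on the
# paper's description of resolvable URIs; adjust them to the published ones.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://semopenalex.org/sparql"  # assumed public endpoint

sparql = SPARQLWrapper(ENDPOINT)
sparql.setReturnFormat(JSON)

# List a few triples describing one (assumed) work entity.
sparql.setQuery("""
    SELECT ?p ?o
    WHERE { <https://semopenalex.org/work/W2741809807> ?p ?o }
    LIMIT 25
""")

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["p"]["value"], binding["o"]["value"])
```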
Evaluating Generative Models for Graph-to-Text Generation
Yuan, Shuzhou, Färber, Michael
Large language models (LLMs) have been widely employed for graph-to-text generation tasks. However, the process of finetuning LLMs requires significant training resources and annotation work. In this paper, we explore the capability of generative models to generate descriptive text from graph data in a zero-shot setting. Specifically, we evaluate GPT-3 and ChatGPT on two graph-to-text datasets and compare their performance with that of finetuned models such as T5 and BART. Our results demonstrate that generative models are capable of generating fluent and coherent text, achieving BLEU scores of 10.57 and 11.08 on the AGENDA and WebNLG datasets, respectively. However, our error analysis reveals that generative models still struggle with understanding the semantic relations between entities, and they also tend to generate text with hallucinations or irrelevant information. As part of the error analysis, we use BERT to detect machine-generated text and achieve high macro-F1 scores. We have made the text generated by the generative models publicly available.
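To illustrate the zero-shot setting, the following sketch linearizes a small set of triples into a prompt and sends it to a chat model via the OpenAI Python client. The prompt wording, the model name, and the client usage are illustrative assumptions, not the exact configuration evaluated in the paper.

```python
# Hedged sketch: zero-shot graph-to-text prompting with an OpenAI chat model.
# The prompt wording, model name, and client usage are illustrative assumptions,
# not the setup evaluated in the paper.
from openai import OpenAI

def linearize(triples):
    """Turn (subject, relation, object) triples into a flat textual form."""
    return " ".join(f"<H> {s} <R> {r} <T> {o}" for s, r, o in triples)

def graph_to_text(triples, model="gpt-3.5-turbo"):
    client = OpenAI()  # expects OPENAI_API_KEY in the environment
    prompt = (
        "Describe the following knowledge graph in fluent English:\n"
        f"{linearize(triples)}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content

triples = [("Alan Shepard", "birthPlace", "New Hampshire"),
           ("Alan Shepard", "occupation", "Test pilot")]
print(graph_to_text(triples))
```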
Impact, Attention, Influence: Early Assessment of Autonomous Driving Datasets
Bogdoll, Daniel, Hendl, Jonas, Schreyer, Felix, Gowda, Nishanth, Färber, Michael, Zöllner, J. Marius
Autonomous Driving (AD), the area of robotics with the greatest potential impact on society, has gained a lot of momentum in the last decade. As a result, the number of datasets in AD has increased rapidly. Creators and users of datasets can benefit from a better understanding of developments in the field. While scientometric analysis has been conducted in other fields, it rarely revolves around datasets. Thus, the impact, attention, and influence of datasets on autonomous driving remain rarely investigated. In this work, we provide a scientometric analysis of over 200 datasets in AD. We perform a rigorous evaluation of the relations between available metadata and citation counts based on linear regression. Subsequently, we propose an Influence Score to assess a dataset early on, without the need for a track record of citations, which only becomes available with a certain delay.
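As a rough illustration of the regression analysis, the following sketch fits a linear model relating metadata features to citation counts with scikit-learn. The feature set and the toy numbers are hypothetical, and the paper's Influence Score itself is not reproduced here.

```python
# Hedged sketch: relating dataset metadata to citation counts via linear
# regression, in the spirit of the analysis described above. The feature
# names and the toy data are hypothetical; the Influence Score is not
# reproduced here.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical metadata features per dataset:
# [number of sensor modalities, hours of recorded data, years since release]
X = np.array([
    [3, 5.5, 6],
    [1, 1.0, 2],
    [4, 20.0, 4],
    [2, 8.0, 3],
])
y = np.array([5200, 150, 2100, 600])  # citation counts (made up)

model = LinearRegression().fit(X, y)
print("coefficients:", model.coef_)
print("R^2 on training data:", model.score(X, y))
```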
unarXive 2022: All arXiv Publications Pre-Processed for NLP, Including Structured Full-Text and Citation Network
Saier, Tarek, Krause, Johan, Färber, Michael
Large-scale data sets on scholarly publications are the basis for a variety of bibliometric analyses and natural language processing (NLP) applications. Data sets derived from publications' full text have recently gained particular attention. While several such data sets already exist, we see key shortcomings in terms of their domain and time coverage, citation network completeness, and representation of full-text content. To address these points, we propose a new version of the data set unarXive. We base our data processing pipeline and output format on two existing data sets and improve on each of them. Our resulting data set comprises 1.9 M publications spanning multiple disciplines and 32 years. It furthermore has a more complete citation network than its predecessors and retains a richer representation of document structure as well as non-textual publication content such as mathematical notation. In addition to the data set, we provide ready-to-use training/test data for citation recommendation and IMRaD classification. All data and source code are publicly available at https://github.com/IllDepence/unarXive.
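A minimal sketch of iterating over the data set follows, assuming it is distributed as JSON Lines files with one publication record per line; the field names used here are assumptions, so the actual schema documented in the repository should be consulted.

```python
# Hedged sketch: iterating over unarXive records, assuming JSON Lines files
# (one publication per line). The field names ("paper_id", "metadata",
# "body_text") are assumptions; see the repository for the actual schema.
import json
from pathlib import Path

def iter_publications(data_dir):
    for path in Path(data_dir).glob("*.jsonl"):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                yield json.loads(line)

for paper in iter_publications("unarXive_sample/"):
    title = paper.get("metadata", {}).get("title", "<no title>")
    n_paragraphs = len(paper.get("body_text", []))
    print(f"{paper.get('paper_id')}: {title} ({n_paragraphs} paragraphs)")
    break  # just show the first record
```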
Biases in Scholarly Recommender Systems: Impact, Prevalence, and Mitigation
Färber, Michael, Coutinho, Melissa, Yuan, Shuzhou
With the remarkable increase in the number of scientific entities such as publications, researchers, and scientific topics, and the associated information overload in science, academic recommender systems have become increasingly important for millions of researchers and science enthusiasts. However, it is often overlooked that these systems are subject to various biases. In this article, we first break down the biases of academic recommender systems and characterize them according to their impact and prevalence. In doing so, we distinguish between biases originally caused by humans and biases induced by the recommender system. Second, we provide an overview of methods that have been used to mitigate these biases in the scholarly domain. Third, based on this overview, we present a framework that can be used by researchers and developers to mitigate biases in scholarly recommender systems and to evaluate recommender systems fairly. Finally, we discuss open challenges and possible research directions related to scholarly biases.
Predicting Companies' ESG Ratings from News Articles Using Multivariate Timeseries Analysis
Aue, Tanja, Jatowt, Adam, Färber, Michael
The environmental, social, and governance (ESG) engagement of companies has moved into the focus of public attention in recent years. With compulsory reporting requirements being implemented and investors incorporating sustainability into their investment decisions, the demand for transparent and reliable ESG ratings is increasing. However, automatic approaches for forecasting ESG ratings have been quite scarce despite the increasing importance of the topic. In this paper, we build a model to predict ESG ratings from news articles by combining multivariate timeseries construction and deep learning techniques. We also create and release a news dataset for about 3,000 US companies together with their ratings for training. Our experimental evaluation shows that our approach provides accurate results, outperforms the state of the art, and can be used in practice to support the manual determination or analysis of ESG ratings.
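As an illustration of combining a multivariate time series with deep learning, the following sketch defines a small LSTM classifier over news-derived features. The feature dimension, sequence length, number of rating classes, and architecture are assumptions and do not reproduce the model described in the paper.

```python
# Hedged sketch: a small LSTM classifier over a multivariate time series of
# news-derived features, illustrating the general approach. Dimensions and
# architecture are assumptions, not the paper's model.
import torch
import torch.nn as nn

class ESGRatingModel(nn.Module):
    def __init__(self, n_features=16, hidden_size=64, n_classes=7):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x):            # x: (batch, time_steps, n_features)
        _, (h_n, _) = self.lstm(x)   # h_n: (1, batch, hidden_size)
        return self.head(h_n[-1])    # logits over rating classes

model = ESGRatingModel()
dummy_batch = torch.randn(8, 52, 16)  # 8 companies, 52 weekly steps, 16 features
print(model(dummy_batch).shape)       # torch.Size([8, 7])
```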
Linked Crunchbase: A Linked Data API and RDF Data Set About Innovative Companies
Färber, Michael
Crunchbase is an online platform collecting information about startups and technology companies, including attributes and relations of companies, people, and investments. Data contained in Crunchbase is, to a large extent, not available elsewhere, making Crunchbase a unique data source. In this paper, we present how to bring Crunchbase to the Web of Data so that its data can be used in the machine-readable RDF format by anyone on the Web. First, we give insights into how we developed and hosted a Linked Data API for Crunchbase and how sameAs links to other data sources are integrated. Then, we present our method for crawling RDF data based on this API to build a custom Crunchbase RDF knowledge graph. We created an RDF data set with over 347 million triples, including 781k people, 659k organizations, and 343k investments. Our Crunchbase Linked Data API is available online at http://linked-crunchbase.org.
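A minimal sketch of dereferencing a Linked Data URI and parsing the returned RDF with rdflib follows; the example URI path and the requested content type are assumptions, so the actual endpoints documented at http://linked-crunchbase.org apply.

```python
# Hedged sketch: dereferencing a Linked Data URI and parsing the returned RDF
# with rdflib. The URI path and content type are assumptions; see
# http://linked-crunchbase.org for the actual API documentation.
import requests
from rdflib import Graph

uri = "http://linked-crunchbase.org/api/organizations/crunchbase"  # assumed path
resp = requests.get(uri, headers={"Accept": "text/turtle"}, timeout=30)
resp.raise_for_status()

g = Graph()
g.parse(data=resp.text, format="turtle")

for s, p, o in g:
    print(s, p, o)
```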
Which Knowledge Graph Is Best for Me?
Färber, Michael, Rettinger, Achim
In recent years, DBpedia, Freebase, OpenCyc, Wikidata, and YAGO have been published as noteworthy large, cross-domain, and freely available knowledge graphs. Although extensively used, these knowledge graphs are hard to compare against each other in a given setting. Thus, it is a challenge for researchers and developers to pick the best knowledge graph for their individual needs. In our recent survey, we devised and applied data quality criteria to the above-mentioned knowledge graphs. Furthermore, we proposed a framework for finding the most suitable knowledge graph for a given setting. With this paper, we intend to ease access to our in-depth survey by presenting simplified rules that map individual data quality requirements to specific knowledge graphs. However, this paper does not intend to replace our previously introduced decision-support framework. For an informed decision on which knowledge graph is best for you, we still refer to our in-depth survey.
Monte Carlo Connection Prover
Färber, Michael, Kaliszyk, Cezary, Urban, Josef
Monte Carlo Tree Search (MCTS) is a technique to guide search in a large decision space by taking random samples and evaluating their outcome. In this work, we study MCTS methods in the context of the connection calculus and implement them on top of the leanCoP prover. This includes proposing useful proof-state evaluation heuristics that are learned from previous proofs, as well as proposing and automatically improving suitable MCTS strategies in this context. The system is trained and evaluated on a large suite of related problems coming from the Mizar proof assistant, showing that it is capable of finding new and different proofs. To our knowledge, this is the first time MCTS has been applied to theorem proving.
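For orientation, the following is a generic UCT-style MCTS skeleton in Python showing the select, expand, simulate, and backpropagate cycle that such a system builds on; it is not the leanCoP-specific implementation, and the state interface and evaluation heuristic are placeholders.

```python
# Hedged sketch: a generic UCT-style Monte Carlo Tree Search loop, illustrating
# the select / expand / simulate / backpropagate cycle. This is not the
# leanCoP-specific implementation; the state interface and the evaluation
# heuristic are placeholders.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def mcts(root_state, successors, evaluate, iterations=1000):
    """successors(state) -> list of child states; evaluate(state) -> reward in [0, 1]."""
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend via UCB until reaching a node without children.
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: add all successor states as child nodes.
        for child_state in successors(node.state):
            node.children.append(Node(child_state, parent=node))
        # Simulation: here simply a heuristic evaluation of a random child.
        leaf = random.choice(node.children) if node.children else node
        reward = evaluate(leaf.state)
        # Backpropagation: update statistics along the path to the root.
        while leaf is not None:
            leaf.visits += 1
            leaf.value += reward
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).state if root.children else root_state
```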