Large expert-curated database for benchmarking document similarity detection in biomedical literature search


Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations.

Automatic Language Identification in Texts: A Survey

Journal of Artificial Intelligence Research

Language identification (“LI”) is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known. Research in this area has recently been especially active. This article provides a brief history of LI research, and an extensive survey of the features and methods used in the LI literature. We describe the features and methods using a unified notation, to make the relationships between methods clearer. We discuss evaluation methods, applications of LI, as well as off-the-shelf LI systems that do not require training by the end user. Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI.

An Evolutionary Hierarchical Interval Type-2 Fuzzy Knowledge Representation System (EHIT2FKRS) for Travel Route Assignment

Artificial Intelligence

Urban traffic networks are characterized by highly dynamic traffic flow and increased travel times, including waiting times, which makes road traffic management more complex. This paper proposes an advanced traffic management system based on a Hierarchical Interval Type-2 Fuzzy Logic model optimized by Particle Swarm Optimization (PSO). The system is designed to perform dynamic route assignment that relieves traffic congestion and limits the effects of unexpected fluctuations in traffic flow. The system is implemented and simulated using SUMO, a well-known microscopic traffic simulator. We tested it on four large and heterogeneous metropolitan areas located in the cities of Sfax, Luxembourg, Bologna and Cologne. The experimental results show that learning the Hierarchical Interval Type-2 Fuzzy Logic model with real-time PSO is effective for multiobjective optimization with respect to two criteria: the number of vehicles that reach their destination and the average travel time. The obtained results are encouraging and support the efficiency of the proposed system.
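For readers unfamiliar with the optimizer named in the abstract above, the following is a minimal, generic sketch of Particle Swarm Optimization in Python. It is not the authors' system: the function name `pso_minimize`, the toy `sphere` objective, and the parameter values (inertia 0.7, acceleration coefficients 1.5) are all assumptions chosen for illustration, and the fuzzy model and SUMO simulation are not represented.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Generic PSO sketch (hypothetical helper, illustrative parameters)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    dim = len(bounds)

    # Start particles at random positions inside the bounds, at rest.
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position; the swarm shares one.
    best_pos = [p[:] for p in positions]
    best_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=best_val.__getitem__)
    global_pos, global_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull toward the particle's own
                # best, and pull toward the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_cognitive * r1 * (best_pos[i][d] - positions[i][d])
                    + c_social * r2 * (global_pos[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Move, then clamp the coordinate back into the bounds.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            val = objective(positions[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, positions[i][:]
                if val < global_val:
                    global_val, global_pos = val, positions[i][:]
    return global_pos, global_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    pos, val = pso_minimize(sphere, [(-5.0, 5.0)] * 2)
    print("best position:", pos, "best value:", val)
```

In a setting like the one the abstract describes, the objective would presumably score a candidate set of fuzzy-system parameters by running a traffic simulation; here a simple analytic function stands in for that step.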

Knowledge Graphs

Artificial Intelligence

In this paper we provide a comprehensive introduction to knowledge graphs, which have recently garnered significant attention from both industry and academia in scenarios that require exploiting diverse, dynamic, large-scale collections of data. After a general introduction, we motivate and contrast various graph-based data models and query languages that are used for knowledge graphs. We discuss the roles of schema, identity, and context in knowledge graphs. We explain how knowledge can be represented and extracted using a combination of deductive and inductive techniques. We summarise methods for the creation, enrichment, quality assessment, refinement, and publication of knowledge graphs. We provide an overview of prominent open knowledge graphs and enterprise knowledge graphs, their applications, and how they use the aforementioned techniques. We conclude with high-level future research directions for knowledge graphs.

A Modern Retrospective on Probabilistic Numerics

Machine Learning

The field of probabilistic numerics (PN), loosely speaking, attempts to provide a statistical treatment of the errors and/or approximations that are made en route to the output of a deterministic numerical method, e.g. the approximation of an integral by quadrature, or the discretised solution of an ordinary or partial differential equation. This decade has seen a surge of activity in this field. In comparison with historical developments that can be traced back over more than a hundred years, the most recent developments are particularly interesting because they have been characterised by simultaneous input from multiple scientific disciplines: mathematics, statistics, machine learning, and computer science. The field has, therefore, advanced on a broad front, with contributions ranging from the building of overarching general theory to practical implementations in specific problems of interest. Over the same period of time, and because of increased interaction among researchers coming from different communities, the extent to which these developments were -- or were not -- presaged by twentieth-century researchers has also come to be better appreciated. Thus, the time appears to be ripe for an update of the 2014 Tübingen Manifesto on probabilistic numerics [Hennig, 2014, Osborne, 2014d,c,b,a] and the position paper [Hennig et al., 2015] to take account of the developments between 2014 and 2019, an improved awareness of the history of this field, and a clearer sense of its future directions. In this article, we aim to summarise some of the history of probabilistic perspectives on numerics (Section 2), to place more recent developments into context (Section 3), and to articulate a vision for future research in, and use of, probabilistic numerics (Section 4).