 knowledge area


Dialogue Systems Engineering: A Survey and Future Directions

arXiv.org Artificial Intelligence

This paper proposes to refer to the field of software engineering related to the life cycle of dialogue systems as Dialogue Systems Engineering, and surveys this field while also discussing its future directions. With the advancement of large language models, the core technologies underlying dialogue systems have significantly progressed. As a result, dialogue system technology is now expected to be applied to solving various societal issues and in business contexts. To achieve this, it is important to build, operate, and continuously improve dialogue systems correctly and efficiently. Accordingly, in addition to applying existing software engineering knowledge, it is becoming increasingly important to evolve software engineering tailored specifically to dialogue systems. In this paper, we enumerate the knowledge areas of dialogue systems engineering based on those of software engineering, as defined in the Software Engineering Body of Knowledge (SWEBOK) Version 4.0, and survey each area. Based on this survey, we identify unexplored topics in each area and discuss the future direction of dialogue systems engineering.


On the Biased Assessment of Expert Finding Systems

arXiv.org Artificial Intelligence

In large organisations, identifying experts on a given topic is crucial in leveraging the internal knowledge spread across teams and departments. So-called enterprise expert retrieval systems automatically discover and structure employees' expertise based on the vast amount of heterogeneous data available about them and the work they perform. Evaluating these systems requires comprehensive ground truth expert annotations, which are hard to obtain. Therefore, the annotation process typically relies on automated recommendations of knowledge areas to validate. This case study provides an analysis of how these recommendations can impact the evaluation of expert finding systems. We demonstrate on a popular benchmark that system-validated annotations lead to overestimated performance of traditional term-based retrieval models and even invalidate comparisons with more recent neural methods. We also augment knowledge areas with synonyms to uncover a strong bias towards literal mentions of their constituent words. Finally, we propose constraints to the annotation process to prevent these biased evaluations, and show that this still allows annotation suggestions of high utility. These findings should inform benchmark creation or selection for expert finding, to guarantee meaningful comparison of methods.


Undergraduate Computer Science Curricula

Communications of the ACM

There can be many conflicting goals for the design of a computer science curriculum, including: immediate employability in industry, preparation for long-term success in an ever-changing discipline, and preparation for graduate (that is, post-graduate) study. Emphasis on immediate employability may lead to prioritizing current tools and techniques at the expense of foundational and theoretical skills as well as broader liberal-arts studies that are crucial for long-term career success and graduate work. The implications of these conflicting goals include allocation of finite resources (time, courses in the curriculum), unwillingness of students to invest in the mathematics that they see as irrelevant to their immediate career goals, and reluctance of faculty to have their courses be driven by a continually evolving marketplace of tools and APIs. For example, if we ask graduates of computer science programs to reflect on the impact of their undergraduate education, explicitly focusing on short- and long-term impact, will there be enough meaningful data to significantly inform curricular design? A recent survey of industry professionals undertaken by the ACM/IEEE-CS/AAAI 2023 Computer Science Curricular Task Force (CS2023) points the way. This column presents one aspect of that survey, a comparison of short-term and long-term views, and calls for similar surveys of industry professionals to be conducted on an ongoing basis to refine our understanding of the role played by various elements of undergraduate computer science curricula in the success of graduates.


Semantic Similarity Measure of Natural Language Text through Machine Learning and a Keyword-Aware Cross-Encoder-Ranking Summarizer -- A Case Study Using UCGIS GIS&T Body of Knowledge

arXiv.org Artificial Intelligence

Initiated by the University Consortium for Geographic Information Science (UCGIS), the GIS&T Body of Knowledge (BoK) is a community-driven endeavor to define, develop, and document geospatial topics related to geographic information science and technologies (GIS&T). In recent years, the GIS&T BoK has undergone rigorous development in terms of topic reorganization and content updating, resulting in a new digital version of the project. While the BoK topics provide useful materials for researchers and students to learn about GIS, the semantic relationships among the topics, such as semantic similarity, should also be identified so that better, automated topic navigation can be achieved. Currently, related topics are defined manually by editors or authors, which may result in an incomplete assessment of topic relationships. To address this challenge, our research evaluates the effectiveness of multiple natural language processing (NLP) techniques in extracting semantics from text, including both deep neural networks and traditional machine learning approaches. In addition, a novel text summarizer, KACERS (Keyword-Aware Cross-Encoder-Ranking Summarizer), is proposed to generate a semantic summary of scientific publications. By identifying the semantic linkages among key topics, this work provides guidance for future development and content organization of the GIS&T BoK project. It also offers a new perspective on the use of machine learning techniques for analyzing scientific publications, and demonstrates the potential of the KACERS summarizer in semantic understanding of long text documents.


DINGO: an ontology for projects and grants linked data

arXiv.org Artificial Intelligence

Services and resources built around the Semantic Web, semantically-enabled applications and linked (open) data technologies have increasingly influenced research and research-related activities in recent years. Development has been intense along several directions, for instance in "semantic publishing" [36], but also in the aspects directed toward the reproducibility and attribution of research and scholarly outputs, which has also led to interest in having Open Science Graphs interconnected at the global level [21]. All this has become more and more essential to research practices, also in light of the so-called reproducibility crisis affecting a number of research fields (see, for instance, the extensive list of recent studies at https://reproduciblescience.org/2019). In fact, the demand for easily and automatically parsable, interoperable and processable data goes beyond the purely academic sphere. The research landscape comprises a vast number and variety of activities, with multiple and diverse stakeholders and actors, and with impact on several aspects and sectors of society.


Memozing - E-learning Network

#artificialintelligence

Learn faster, learn better, learn easier, and learn with more fun. Want to learn a new language, need to learn something for school or university, or want to train yourself in a profession? Then you are in the right place. Our patented learning project helps you structure your knowledge area and aims to commit every knowledge area to long-term memory. Learn exactly what you want.