Eisenstadt
Policies and Evaluation for Online Meeting Summarization
Schneider, Felix, Turchi, Marco, Waibel, Alex
With more and more meetings moving to a digital domain, meeting summarization has recently gained interest in both academic and commercial research. However, prior academic research focuses on meeting summarization as an offline task, performed after the meeting concludes. In this paper, we perform the first systematic study of online meeting summarization. For this purpose, we propose several policies for conducting online summarization. We discuss the unique challenges of this task compared to the offline setting and define novel metrics to evaluate latency and partial summary quality. The experiments on the AutoMin dataset show that 1) online models can produce strong summaries, 2) our metrics allow a detailed analysis of different systems' quality-latency trade-off, also taking into account intermediate outputs and 3) adaptive policies perform better than fixed scheduled ones. These findings provide a starting point for the wider research community to explore this important task.
Digital Phenotyping for Adolescent Mental Health: A Feasibility Study Employing Machine Learning to Predict Mental Health Risk From Active and Passive Smartphone Data
Kadirvelu, Balasundaram, Bel, Teresa Bellido, Freccero, Aglaia, Di Simplicio, Martina, Nicholls, Dasha, Faisal, A Aldo
Background: Adolescents are particularly vulnerable to mental disorders, with over 75% of cases manifesting before the age of 25. Research indicates that only 18 to 34% of young people experiencing high levels of depression or anxiety symptoms seek support. Digital tools leveraging smartphones offer scalable and early intervention opportunities. Objective: Using a novel machine learning framework, this study evaluated the feasibility of integrating active and passive smartphone data to predict mental disorders in non-clinical adolescents. Specifically, we investigated the utility of the Mindcraft app in predicting risks for internalising and externalising disorders, eating disorders, insomnia and suicidal ideation. Methods: Participants (N=103; mean age 16.1 years) were recruited from three London schools. Participants completed the Strengths and Difficulties Questionnaire, the Eating Disorders-15 Questionnaire, Sleep Condition Indicator Questionnaire and indicated the presence/absence of suicidal ideation. They used the Mindcraft app for 14 days, contributing active data via self-reports and passive data from smartphone sensors. A contrastive pretraining phase was applied to enhance user-specific feature stability, followed by supervised fine-tuning. The model evaluation employed leave-one-subject-out cross-validation using balanced accuracy as the primary metric. Results: The integration of active and passive data achieved superior performance compared to individual data sources, with mean balanced accuracies of 0.71 for SDQ-High risk, 0.67 for insomnia, 0.77 for suicidal ideation and 0.70 for eating disorders. The contrastive learning framework stabilised daily behavioural representations, enhancing predictive robustness. This study demonstrates the potential of integrating active and passive smartphone data with advanced machine-learning techniques for predicting mental health risks.
MCP-Solver: Integrating Language Models with Constraint Programming Systems
While Large Language Models (LLMs) perform exceptionally well at natural language tasks, they often struggle with precise formal reasoning and the rigorous specification of problems. We present MCP-Solver, a prototype implementation of the Model Context Protocol that demonstrates the potential for systematic integration between LLMs and constraint programming systems. Our implementation provides interfaces for the creation, editing, and validation of a constraint model. Through an item-based editing approach with integrated validation, the system ensures model consistency at every modification step and enables structured iterative refinement. The system handles concurrent solving sessions and maintains a persistent knowledge base of modeling insights. Initial experiments suggest that this integration can effectively combine LLMs' natural language understanding with constraint-solving capabilities. Our open-source implementation is proof of concept for integrating formal reasoning systems with LLMs through standardized protocols. While further research is needed to establish comprehensive formal guarantees, this work takes a first step toward principled integration of natural language processing with constraint-based reasoning.
Shaping AI's Impact on Billions of Lives
Cuéllar, Mariano-Florentino, Dean, Jeff, Doshi-Velez, Finale, Hennessy, John, Konwinski, Andy, Koyejo, Sanmi, Moiloa, Pelonomi, Pierson, Emma, Patterson, David
Artificial Intelligence (AI), like any transformative technology, has the potential to be a double-edged sword, leading either toward significant advancements or detrimental outcomes for society as a whole. As is often the case when it comes to widely-used technologies in market economies (e.g., cars and semiconductor chips), commercial interest tends to be the predominant guiding factor. The AI community is at risk of becoming polarized to either take a laissez-faire attitude toward AI development, or to call for government overregulation. Between these two poles we argue for the community of AI practitioners to consciously and proactively work for the common good. This paper offers a blueprint for a new type of innovation infrastructure including 18 concrete milestones to guide AI research in that direction. Our view is that we are still in the early days of practical AI, and focused efforts by practitioners, policymakers, and other stakeholders can still maximize the upsides of AI and minimize its downsides. We talked to luminaries such as recent Nobelist John Jumper on science, President Barack Obama on governance, former UN Ambassador and former National Security Advisor Susan Rice on security, philanthropist Eric Schmidt on several topics, and science fiction novelist Neal Stephenson on entertainment. This ongoing dialogue and collaborative effort has produced a comprehensive, realistic view of what the actual impact of AI could be, from a diverse assembly of thinkers with deep understanding of this technology and these domains. From these exchanges, five recurring guidelines emerged, which form the cornerstone of a framework for beginning to harness AI in service of the public good. They not only guide our efforts in discovery but also shape our approach to deploying this transformative technology responsibly and ethically.
Machines of Meaning
One goal of Artificial Intelligence is to learn meaningful representations for natural language expressions, but what this entails is not always clear. A variety of new linguistic behaviours present themselves embodied as computers, enhanced humans, and collectives with various kinds of integration and communication. But to measure and understand the behaviours generated by such systems, we must clarify the language we use to talk about them. Computational models are often confused with the phenomena they try to model and shallow metaphors are used as justifications for (or to hype) the success of computational techniques on many tasks related to natural language; thus implying their progress toward human-level machine intelligence without ever clarifying what that means. This paper discusses the challenges in the specification of "machines of meaning", machines capable of acquiring meaningful semantics from natural language in order to achieve their goals. We characterize "meaning" in a computational setting, while highlighting the need for detachment from anthropocentrism in the study of the behaviour of machines of meaning. The pressing need to analyse AI risks and ethics requires a proper measurement of its capabilities which cannot be productively studied and explained while using ambiguous language. We propose a view of "meaning" to facilitate the discourse around approaches such as neural language models and help broaden the research perspectives for technology that facilitates dialogues between humans and machines.
Is it me, or is A larger than B: Uncovering the determinants of relational cognitive dissonance resolution
Barak, Tomer, Loewenstein, Yonatan
This study explores the computational mechanisms underlying the resolution of cognitive dissonances. We focus on scenarios in which an observation violates the expected relationship between objects. For instance, an agent expects object A to be smaller than B in some feature space but observes the opposite. One solution is to adjust the expected relationship according to the new observation and change the expectation to A being larger than B. An alternative solution would be to adapt the representation of A and B in the feature space such that in the new representation, the relationship that A is smaller than B is maintained. While both pathways resolve the dissonance, they generalize differently to different tasks. Using Artificial Neural Networks (ANNs) capable of relational learning, we demonstrate the existence of these two pathways and show that the chosen pathway depends on the dissonance magnitude. Large dissonances alter the representation of the objects, while small dissonances lead to adjustments in the expected relationships. We show that this effect arises from the inherently different learning dynamics of relationships and representations and study the implications.
Prompting and Fine-Tuning of Small LLMs for Length-Controllable Telephone Call Summarization
Thulke, David, Gao, Yingbo, Jalota, Rricha, Dugast, Christian, Ney, Hermann
This paper explores the rapid development of a telephone call summarization system utilizing large language models (LLMs). Our approach involves initial experiments with prompting existing LLMs to generate summaries of telephone conversations, followed by the creation of a tailored synthetic training dataset utilizing stronger frontier models. We place special focus on the diversity of the generated data and on the ability to control the length of the generated summaries to meet various use-case specific requirements. The effectiveness of our method is evaluated using two state-of-the-art LLM-as-a-judge-based evaluation techniques to ensure the quality and relevance of the summaries. Our results show that fine-tuned Llama-2-7B-based summarization model performs on-par with GPT-4 in terms of factual accuracy, completeness and conciseness. Our findings demonstrate the potential for quickly bootstrapping a practical and efficient call summarization system.
On the Biased Assessment of Expert Finding Systems
Decorte, Jens-Joris, Van Hautte, Jeroen, Develder, Chris, Demeester, Thomas
In large organisations, identifying experts on a given topic is crucial in leveraging the internal knowledge spread across teams and departments. So-called enterprise expert retrieval systems automatically discover and structure employees' expertise based on the vast amount of heterogeneous data available about them and the work they perform. Evaluating these systems requires comprehensive ground truth expert annotations, which are hard to obtain. Therefore, the annotation process typically relies on automated recommendations of knowledge areas to validate. This case study provides an analysis of how these recommendations can impact the evaluation of expert finding systems. We demonstrate on a popular benchmark that system-validated annotations lead to overestimated performance of traditional term-based retrieval models and even invalidate comparisons with more recent neural methods. We also augment knowledge areas with synonyms to uncover a strong bias towards literal mentions of their constituent words. Finally, we propose constraints to the annotation process to prevent these biased evaluations, and show that this still allows annotation suggestions of high utility. These findings should inform benchmark creation or selection for expert finding, to guarantee meaningful comparison of methods.
What's Wrong? Refining Meeting Summaries with LLM Feedback
Kirstein, Frederic, Ruas, Terry, Gipp, Bela
Meeting summarization has become a critical task since digital encounters have become a common practice. Large language models (LLMs) show great potential in summarization, offering enhanced coherence and context understanding compared to traditional methods. However, they still struggle to maintain relevance and avoid hallucination. We introduce a multi-LLM correction approach for meeting summarization using a two-phase process that mimics the human review process: mistake identification and summary refinement. We release QMSum Mistake, a dataset of 200 automatically generated meeting summaries annotated by humans on nine error types, including structural, omission, and irrelevance errors. Our experiments show that these errors can be identified with high accuracy by an LLM. We transform identified mistakes into actionable feedback to improve the quality of a given summary measured by relevance, informativeness, conciseness, and coherence. This post-hoc refinement effectively improves summary quality by leveraging multiple LLMs to validate output quality. Our multi-LLM approach for meeting summarization shows potential for similar complex text generation tasks requiring robustness, action planning, and discussion towards a goal.
Revolutionising Role-Playing Games with ChatGPT
Stampfl, Rita, Geyer, Barbara, Deissl-O'Meara, Marie, Ivkić, Igor
Digitalisation in education and its influence on teaching methods is the focus of this study, which examines the use of ChatGPT in a role-playing game used in the Cloud Computing Engineering Master's programme at the University of Applied Sciences Burgenland. The aim of the study was to analyse the impact of AI-based simulations on students' learning experience. Based on Vygotsky's sociocultural theory, ChatGPT was used to give students a deeper understanding of strategic decision-making processes in simulated business scenarios. The methodological approach included role-playing and qualitative content analysis of 20 student reflections. The findings suggest that ChatGPT enhances students' engagement, critical thinking, and communication skills, in addition to contributing to the effective application of theoretical knowledge. Furthermore, simulations can contribute to the effective application of theoretical knowledge. The results underscore the significance of adaptive teaching approaches in promoting digital literacy and equipping learners for the digital workplace. The integration of AI into curricula and the need for ongoing innovation in higher education are also emphasised as a means of guaranteeing excellent, future-focused instruction. The findings highlight the potential of AI and ChatGPT in particular, as an innovative cutting-edge educational tool that can both enhance the learning experience and help achieve the Sustainable Development Goals (SDGs) through education.