
Collaborating Authors

 Paraskevopoulos, Georgios


BloomWise: Enhancing Problem-Solving Capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts

arXiv.org Artificial Intelligence

Despite the continuous progress of Large Language Models (LLMs) across various tasks, their performance on mathematical problems and reasoning tasks remains limited. This limitation can be attributed, among other factors, to the inherent difficulty of these problems and the fact that solutions often consist of multiple steps, potentially of varying nature, making it challenging for a single prompting technique to execute all required steps. To address this, we introduce BloomWise, a new prompting technique, inspired by Bloom's Taxonomy, aiming to improve LLMs' performance in solving such problems by encouraging them to approach the problem starting from simple cognitive skills, i.e., remembering, and progressing to higher ones, i.e., analyzing, until the correct solution is reached. Whether more sophisticated cognitive skills are needed is decided through self-evaluation performed by the LLM; thus, we encourage the LLM to deploy the appropriate cognitive processes. In extensive experiments across four popular math reasoning datasets, we demonstrate the effectiveness of our proposed approach. We also present extensive ablations analyzing the strengths of each module within our system.
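The abstract describes an iterative prompting loop with LLM self-evaluation. Below is a minimal sketch of that control flow, not the authors' implementation: `query_llm` is a hypothetical stand-in for any chat-completion API, and the prompt templates and stopping rule are illustrative.

```python
# Minimal sketch of a BloomWise-style prompting loop (illustrative only).
BLOOM_LEVELS = ["remembering", "understanding", "applying", "analyzing"]

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's API."""
    raise NotImplementedError

def bloomwise_solve(problem: str) -> str:
    for level in BLOOM_LEVELS:
        # Ask for a solution at the current cognitive level.
        answer = query_llm(
            f"Solve the following problem using the '{level}' cognitive skill "
            f"from Bloom's Taxonomy. Show your reasoning.\n\n{problem}"
        )
        # Self-evaluation decides whether a higher level is needed.
        verdict = query_llm(
            f"Problem: {problem}\nProposed solution: {answer}\n"
            "Is this solution correct? Answer YES or NO."
        )
        if verdict.strip().upper().startswith("YES"):
            return answer  # accepted at this level
    return answer  # fall back to the highest-level attempt
```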


The Greek podcast corpus: Competitive speech models for low-resourced languages with weakly supervised data

arXiv.org Artificial Intelligence

The development of speech technologies for languages with limited digital representation poses significant challenges, primarily due to the scarcity of available data. This issue is exacerbated in the era of large, data-intensive models. Recent research has underscored the potential of leveraging weak supervision to augment the pool of available data. In this study, we compile an 800-hour corpus of Modern Greek from podcasts and employ Whisper large-v3 to generate silver transcriptions. This corpus is utilized to fine-tune our models, aiming to assess the efficacy of this approach in enhancing ASR performance. Our analysis spans 16 distinct podcast domains, alongside evaluations on established datasets for Modern Greek. The findings indicate consistent WER improvements, correlating with increases in both data volume and model size. Our study confirms that assembling large, weakly supervised corpora serves as a cost-effective strategy for advancing speech technologies in under-resourced languages.
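A minimal sketch of the silver-transcription step described above, assuming the openai-whisper package; the file names and output format are placeholders:

```python
import json
import whisper  # pip install openai-whisper

# Load the checkpoint used for pseudo-labeling.
model = whisper.load_model("large-v3")

def transcribe_podcasts(audio_paths, out_file="silver_transcripts.jsonl"):
    """Write one {audio, text} record per episode for later fine-tuning."""
    with open(out_file, "w", encoding="utf-8") as f:
        for path in audio_paths:
            # Force Greek decoding so the model does not language-switch.
            result = model.transcribe(path, language="el")
            record = {"audio": path, "text": result["text"]}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

transcribe_podcasts(["episode_001.mp3", "episode_002.mp3"])
```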


Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens

arXiv.org Artificial Intelligence

While massively multilingual speech models like wav2vec 2.0 XLSR-128 can be directly fine-tuned for automatic speech recognition (ASR), downstream performance can still be relatively poor on languages that are under-represented in the pre-training data. Continued pre-training on 70-200 hours of untranscribed speech in these languages can help -- but what about languages without that much recorded data? For such cases, we show that supplementing the target language with data from a similar, higher-resource 'donor' language can help. For example, continued pre-training on only 10 hours of low-resource Punjabi supplemented with 60 hours of donor Hindi is almost as good as continued pre-training on 70 hours of Punjabi. By contrast, sourcing data from less similar donors like Bengali does not improve ASR performance. To inform donor language selection, we propose a novel similarity metric based on the sequence distribution of induced acoustic units: the Acoustic Token Distribution Similarity (ATDS). Across a set of typologically different target languages (Punjabi, Galician, Iban, Setswana), we show that the ATDS between the target language and its candidate donors precisely predicts target language ASR performance.
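The paper defines ATDS precisely over sequences of induced acoustic units; the sketch below shows one plausible instantiation (cosine similarity between unit n-gram distributions) purely to illustrate the idea. The unit IDs are made up, standing in for pseudo-tokens induced from, e.g., clustered XLSR features.

```python
from collections import Counter
import math

def ngram_counts(unit_seqs, n=3):
    """Count n-grams over a corpus of acoustic-unit ID sequences."""
    counts = Counter()
    for seq in unit_seqs:
        counts.update(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    return counts

def distribution_similarity(target_seqs, donor_seqs, n=3):
    """Cosine similarity between two n-gram frequency distributions."""
    a, b = ngram_counts(target_seqs, n), ngram_counts(donor_seqs, n)
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Toy usage with made-up unit IDs for a target and a candidate donor:
target = [[3, 7, 7, 1, 4], [2, 3, 7, 7, 9]]
donor = [[3, 7, 7, 1, 5], [2, 3, 7, 9, 9]]
print(distribution_similarity(target, donor))
```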


Investigating Personalization Methods in Text to Music Generation

arXiv.org Artificial Intelligence

In this work, we investigate the personalization of text-to-music diffusion models in a few-shot setting. Motivated by recent advances in the computer vision domain, we are the first to explore the combination of pre-trained text-to-audio diffusers with two established personalization methods. We experiment with the effect of audio-specific data augmentation on the overall system performance and assess different training strategies. For evaluation, we construct a novel dataset with prompts and music clips. We consider both embedding-based and music-specific metrics for quantitative evaluation, as well as a user study for qualitative evaluation. Our analysis shows that similarity metrics are in accordance with user preferences and that current personalization approaches tend to learn rhythmic music constructs more easily than melody. The code, dataset, and example material of this study are open to the research community.
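The abstract mentions audio-specific data augmentation; here is a small sketch of one common option (pitch shifting with torchaudio), noting that the paper's exact augmentation set is not specified here:

```python
import torch
import torchaudio

def augment(waveform: torch.Tensor, sample_rate: int) -> list:
    """Return the clip plus pitch-shifted variants for few-shot training."""
    variants = [waveform]
    for semitones in (-2, 2):
        shift = torchaudio.transforms.PitchShift(sample_rate, n_steps=semitones)
        variants.append(shift(waveform))
    return variants

# Toy usage: one second of silence at 16 kHz yields three training examples.
clips = augment(torch.zeros(1, 16000), 16000)
print(len(clips))
```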


Weakly-supervised forced alignment of disfluent speech using phoneme-level modeling

arXiv.org Artificial Intelligence

The study of speech disorders can benefit greatly from time-aligned data. However, audio-text mismatches in disfluent speech cause rapid performance degradation for modern speech aligners, hindering the use of automatic approaches. In this work, we propose a simple and effective modification of alignment graph construction of CTC-based models using Weighted Finite State Transducers. The proposed weakly-supervised approach alleviates the need for verbatim transcription of speech disfluencies for forced alignment. During the graph construction, we allow the modeling of common speech disfluencies, i.e., repetitions and omissions. Further, we show that by assessing the degree of audio-text mismatch through the use of Oracle Error Rate, our method can be effectively used in the wild. Our evaluation on a corrupted version of the TIMIT test set and the UCLASS dataset shows significant improvements, particularly for recall, achieving a 23-25% relative improvement over our baselines.
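A minimal sketch of the graph-construction idea: alongside the canonical word path, add skip arcs for omissions and self-loop arcs for repetitions. The arc costs are illustrative placeholders, and a real system would compile these arcs into a WFST rather than print them.

```python
def build_alignment_arcs(words, skip_cost=2.0, repeat_cost=1.5):
    """Return (src_state, dst_state, label, weight) arcs for a word sequence."""
    arcs = []
    for i, w in enumerate(words):
        arcs.append((i, i + 1, w, 0.0))              # canonical path
        arcs.append((i, i + 1, "<eps>", skip_cost))  # omission: skip the word
        arcs.append((i, i, w, repeat_cost))          # repetition: self-loop
    return arcs

for arc in build_alignment_arcs(["the", "cat", "sat"]):
    print(arc)
```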


Depression detection in social media posts using affective and social norm features

arXiv.org Artificial Intelligence

We propose a deep architecture for depression detection from social media posts. The proposed architecture builds upon BERT to extract language representations from social media posts and combines these representations using an attentive bidirectional GRU network. We incorporate affective information by augmenting the text representations with features extracted from a pretrained emotion classifier. Motivated by psychological literature, we propose to incorporate profanity and morality features of posts and words in our architecture.

Emotive language is also correlated with depression, as mental health issues affect the emotional state of people. It is empirically established that depressed individuals express more negative thoughts, emotions and perspectives [9, 10, 11]. Depression detection from social media can be performed either at the individual post level or at the user level, given a collection of posts by said user. In [12], authors classify depression-related LiveJournal posts, while in [5] authors focus on Twitter post classification. In [13], a shared task for CLPsych 2015 is proposed for clinical diagnoses from Twitter posts.
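A hedged sketch of the described architecture in PyTorch, with illustrative dimensions: per-post BERT embeddings are concatenated with affective features (e.g., emotion classifier scores) and pooled over a user's posts by an attentive bidirectional GRU.

```python
import torch
import torch.nn as nn

class DepressionClassifier(nn.Module):
    def __init__(self, text_dim=768, affect_dim=8, hidden=128):
        super().__init__()
        self.gru = nn.GRU(text_dim + affect_dim, hidden,
                          batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, 2)

    def forward(self, text_emb, affect_feats):
        # text_emb: (batch, posts, text_dim) BERT representations per post
        # affect_feats: (batch, posts, affect_dim) affective features per post
        x = torch.cat([text_emb, affect_feats], dim=-1)
        h, _ = self.gru(x)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over posts
        pooled = (weights * h).sum(dim=1)
        return self.out(pooled)

model = DepressionClassifier()
logits = model(torch.randn(4, 10, 768), torch.randn(4, 10, 8))
print(logits.shape)  # torch.Size([4, 2])
```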


Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems: A Case Study for Modern Greek

arXiv.org Artificial Intelligence

Modern speech recognition systems exhibit rapid performance degradation under domain shift. This issue is especially prevalent in data-scarce settings, such as low-resource languages, where the diversity of training data is limited. In this work we propose M2DS2, a simple and sample-efficient finetuning strategy for large pretrained speech models, based on mixed source and target domain self-supervision. We find that including source domain self-supervision stabilizes training and avoids mode collapse of the latent representations. For evaluation, we collect HParl, a 120-hour speech corpus for Greek, consisting of plenary sessions in the Greek Parliament. We merge HParl with two popular Greek corpora to create GREC-MD, a test-bed for multi-domain evaluation of Greek ASR systems. In our experiments we find that, while other Unsupervised Domain Adaptation baselines fail in this resource-constrained environment, M2DS2 yields significant improvements for cross-domain adaptation, even when only a few hours of in-domain audio are available. When we relax the problem to a weakly supervised setting, we find that independent adaptation of the audio using M2DS2 and of the language using simple LM augmentation techniques is particularly effective, yielding word error rates comparable to the fully supervised baselines.
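A minimal sketch of the mixed-self-supervision training step described above; `supervised_loss` and `self_supervised_loss` are hypothetical placeholders for the model's actual objectives (e.g., CTC and a contrastive loss), and the weights are illustrative.

```python
def m2ds2_step(model, src_batch, tgt_batch, alpha=0.1, beta=0.1):
    """One fine-tuning step mixing source supervision with dual self-supervision."""
    loss = model.supervised_loss(src_batch)                       # labeled source data
    loss = loss + alpha * model.self_supervised_loss(src_batch)   # stabilizes training
    loss = loss + beta * model.self_supervised_loss(tgt_batch)    # adapts to target domain
    return loss
```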


Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis

arXiv.org Artificial Intelligence

Multimodal learning pipelines have benefited from the success of pretrained language models. However, this comes at the cost of increased model parameters. In this work, we propose Adapted Multimodal BERT (AMB), a BERT-based architecture for multimodal tasks that uses a combination of adapter modules and intermediate fusion layers. The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations. During the adaptation process the pre-trained language model parameters remain frozen, allowing for fast, parameter-efficient training. In our ablations we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise. Our experiments on sentiment analysis with CMU-MOSEI show that AMB outperforms the current state-of-the-art across metrics, with a 3.4% relative reduction in error and a 2.1% relative improvement in 7-class classification accuracy.
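A hedged sketch of one adapter-plus-fusion block, with illustrative dimensions (74 is a stand-in for the audio-visual feature size); the surrounding pretrained BERT layer would stay frozen while only the adapter and fusion parameters train.

```python
import torch
import torch.nn as nn

class AdapterFusionLayer(nn.Module):
    def __init__(self, dim=768, bottleneck=64, av_dim=74):
        super().__init__()
        self.adapter = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))
        self.fuse = nn.Linear(dim + av_dim, dim)

    def forward(self, text_h, av_feats):
        h = text_h + self.adapter(text_h)                # residual adapter
        return self.fuse(torch.cat([h, av_feats], -1))   # layer-wise fusion

layer = AdapterFusionLayer()
out = layer(torch.randn(2, 20, 768), torch.randn(2, 20, 74))
print(out.shape)  # torch.Size([2, 20, 768])
```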


EmpBot: A T5-based Empathetic Chatbot focusing on Sentiments

arXiv.org Artificial Intelligence

In this paper, we introduce EmpBot: an end-to-end empathetic chatbot. Empathetic conversational agents should not only understand what is being discussed, but also acknowledge the implied feelings of the conversation partner and respond appropriately. To this end, we propose a method based on a transformer pretrained language model (T5). Specifically, during finetuning we propose to use three objectives: response language modeling, sentiment understanding, and empathy forcing. The first objective is crucial for generating relevant and coherent responses, while the latter two are significant for acknowledging the sentimental state of the conversational partner and for favoring empathetic responses. We evaluate our model on the EmpatheticDialogues dataset using both automated metrics and human evaluation. The inclusion of the sentiment understanding and empathy forcing auxiliary losses favors empathetic responses, as human evaluation results indicate, compared with the current state-of-the-art.
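A minimal sketch of how the three objectives could be combined during fine-tuning; the weighting scheme and loss names are assumptions, not the paper's reported configuration.

```python
def empbot_loss(lm_loss, sentiment_loss, empathy_loss, w_sent=0.5, w_emp=0.5):
    # lm_loss: response language modeling (relevant, coherent replies)
    # sentiment_loss: predicting the partner's sentiment from the encoder
    # empathy_loss: "empathy forcing" term that rewards empathetic responses
    return lm_loss + w_sent * sentiment_loss + w_emp * empathy_loss
```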


Integrating Recurrence Dynamics for Speech Emotion Recognition

arXiv.org Machine Learning

We investigate the performance of features that can capture nonlinear recurrence dynamics embedded in the speech signal for the task of Speech Emotion Recognition (SER). Reconstruction of the phase space of each speech frame and the computation of its respective Recurrence Plot (RP) reveals complex structures which can be measured by performing Recurrence Quantification Analysis (RQA). These measures are aggregated by using statistical functionals over segment and utterance periods. We report SER results for the proposed feature set on three databases using different classification methods. When fusing the proposed features with traditional feature sets, e.g., [1], we show an improvement in unweighted accuracy of up to 5.7% and 10.7% on Speaker-Dependent (SD) and Speaker-Independent (SI) SER tasks, respectively, over the baseline [1]. Following a segment-based approach we demonstrate state-of-the-art performance on IEMOCAP using a Bidirectional Recurrent Neural Network.
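A self-contained sketch of the first steps of this pipeline: time-delay embedding of a frame, its recurrence plot, and the recurrence rate (the simplest RQA measure). The embedding parameters here are illustrative, not tuned values from the paper.

```python
import numpy as np

def recurrence_plot(frame, dim=3, delay=2, eps=0.1):
    """Binary recurrence plot of a frame after phase-space reconstruction."""
    n = len(frame) - (dim - 1) * delay
    # Time-delay embedding: each row is a point in the reconstructed phase space.
    points = np.stack([frame[i:i + n] for i in range(0, dim * delay, delay)], axis=1)
    dists = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    return (dists < eps).astype(int)

frame = np.sin(np.linspace(0, 8 * np.pi, 200))  # toy stand-in for a speech frame
rp = recurrence_plot(frame)
print("recurrence rate:", rp.mean())  # fraction of recurrent point pairs
```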