AITopics | italic

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A First Look at Information Highlighting in Stack Overflow Answers

Ahmed, Shahla Shaan, Wang, Shaowei, Tian, Yuan, Chen, Tse-Hsun, Zhang, Haoxiang

arXiv.org Artificial Intelligence, Jan-31-2024

Context: Navigating the knowledge of Stack Overflow (SO) remains challenging. To make posts more vivid for users, SO allows users to write and edit posts with Markdown or HTML so that they can use various formatting styles (e.g., bold, italic, and code) to highlight important information. Nonetheless, there have been few studies on the highlighted information. Objective: In our recent study, we carried out the first large-scale exploratory study of the information highlighted in SO answers. To extend that study, we develop approaches to automatically recommend highlighted content with formatting styles, using neural network architectures initially designed for the Named Entity Recognition task. Method: In this paper, we studied 31,169,429 Stack Overflow answers. To train recommendation models, we chose CNN and BERT models for each type of formatting (i.e., Bold, Italic, Code, and Heading), using the information highlighting dataset we collected from SO answers. Results: Our CNN-based models achieve precision ranging from 0.71 to 0.82. The trained model for automatic code content highlighting achieves a recall of 0.73 and an F1 score of 0.71, outperforming the trained models for other formatting styles. The BERT models have even lower recall and F1 scores than the CNN models. Our analysis of failure cases indicates that the majority are missed identifications (i.e., the model misses content that is supposed to be highlighted), because the models tend to learn frequently highlighted words while struggling to learn less frequent ones. Conclusion: Our findings suggest that it is possible to develop recommendation models for highlighting information in Stack Overflow answers with different formatting styles.

information, italic, stack overflow, (13 more...)

arXiv.org Artificial Intelligence

2401.01472

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Manitoba (0.04)
Europe > Italy (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

ITALIC: An Italian Intent Classification Dataset

Koudounas, Alkis, La Quatra, Moreno, Vaiani, Lorenzo, Colomba, Luca, Attanasio, Giuseppe, Pastor, Eliana, Cagliero, Luca, Baralis, Elena

arXiv.org Artificial Intelligence, Jun-14-2023

Recent large-scale Spoken Language Understanding datasets focus predominantly on English and do not account for language-specific phenomena such as particular phonemes or words in different lects. We introduce ITALIC, the first large-scale speech dataset designed for intent classification in Italian. The dataset comprises 16,521 crowdsourced audio samples recorded by 70 speakers from various Italian regions and annotated with intent labels and additional metadata. We explore the versatility of ITALIC by evaluating current state-of-the-art speech and text models. Results on intent classification suggest that increasing scale and running language adaptation yield better speech models, monolingual text models outscore multilingual ones, and that speech recognition on ITALIC is more challenging than on existing Italian benchmarks. We release both the dataset and the annotation scheme to streamline the development of new Italian SLU models and language-specific datasets.

artificial intelligence, natural language, speech recognition, (16 more...)

arXiv.org Artificial Intelligence

2306.08502

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Piedmont > Turin Province > Turin (0.04)
Europe > Italy > Lombardy > Milan (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
