AITopics | schubert

Collaborating Authors

schubert

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CiMaTe: Citation Count Prediction Effectively Leveraging the Main Text

Hirako, Jun, Sasano, Ryohei, Takeda, Koichi

arXiv.org Artificial IntelligenceOct-6-2024

Prediction of the future citation counts of papers is increasingly important to find interesting papers among an ever-growing number of papers. Although a paper's main text is an important factor for citation count prediction, it is difficult to handle in machine learning models because the main text is typically very long; thus previous studies have not fully explored how to leverage it. In this paper, we propose a BERT-based citation count prediction model, called CiMaTe, that leverages the main text by explicitly capturing a paper's sectional structure. Through experiments with papers from computational linguistics and biology domains, we demonstrate the CiMaTe's effectiveness, outperforming the previous methods in Spearman's rank correlation coefficient; 5.1 points in the computational linguistics domain and 1.8 points in the biology domain.

citation count, main text, representation, (16 more...)

arXiv.org Artificial Intelligence

2410.04404

Country: South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

The Missing Curve Detectors of InceptionV1: Applying Sparse Autoencoders to InceptionV1 Early Vision

Gorton, Liv

arXiv.org Artificial IntelligenceJun-5-2024

Recent work on sparse autoencoders (SAEs) has shown promise in extracting interpretable features from neural networks and addressing challenges with polysemantic neurons caused by superposition. In this paper, we apply SAEs to the early vision layers of InceptionV1, a well-studied convolutional neural network, with a focus on curve detectors. Our results demonstrate that SAEs can uncover new interpretable features not apparent from examining individual neurons, including additional curve detectors that fill in previous gaps. We also find that SAEs can decompose some polysemantic neurons into more monosemantic constituent features. These findings suggest SAEs are a valuable tool for understanding InceptionV1, and convolutional neural networks more generally.

activation, inceptionv1, neuron, (14 more...)

arXiv.org Artificial Intelligence

2406.03662

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Add feedback

Medoid Silhouette clustering with automatic cluster number selection

Lenssen, Lars, Schubert, Erich

arXiv.org Machine LearningSep-7-2023

The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, provide two fast versions for the direct optimization, and discuss the use to choose the optimal number of clusters. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvements FasterPAM. One of the versions guarantees equal results to the original variant and provides a run speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$=100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm. Additionally, we provide a variant to choose the optimal number of clusters directly.

artificial intelligence, machine learning, silhouette, (18 more...)

arXiv.org Machine Learning

doi: 10.1016/j.is.2023.102290

2309.03751

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

SAPIEN: Affective Virtual Agents Powered by Large Language Models

Hasan, Masum, Ozel, Cengiz, Potter, Sammy, Hoque, Ehsan

arXiv.org Artificial IntelligenceAug-6-2023

Abstract--In this demo paper, we introduce SAPIEN, a platform for high-fidelity virtual agents driven by large language models that can hold open domain conversations with users in 13 different languages, and display emotions through facial expressions and voice. The platform allows users to customize their virtual agent's personality, background, and conversation premise, thus providing a rich, immersive interaction experience. Furthermore, after the virtual meeting, the user can choose to get the conversation analyzed and receive actionable feedback on their communication skills. This paper illustrates an overview of the platform and discusses the various application domains of this technology, ranging from entertainment to mental health, communication training, language learning, education, healthcare, and beyond. Additionally, we consider the ethical implications of such realistic virtual agent representations and the potential challenges in ensuring responsible use.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2308.03022

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > New York > Monroe County > Rochester (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Towards Resolving Word Ambiguity with Word Embeddings

Thurnbauer, Matthias, Reisinger, Johannes, Goller, Christoph, Fischer, Andreas

arXiv.org Artificial IntelligenceJul-25-2023

Ambiguity is ubiquitous in natural language. Resolving ambiguous meanings is especially important in information retrieval tasks. While word embeddings carry semantic information, they fail to handle ambiguity well. Transformer models have been shown to handle word ambiguity for complex queries, but they cannot be used to identify ambiguous words, e.g. for a 1-word query. Furthermore, training these models is costly in terms of time, hardware resources, and training data, prohibiting their use in specialized environments with sensitive data. Word embeddings can be trained using moderate hardware resources. This paper shows that applying DBSCAN clustering to the latent space can identify ambiguous words and evaluate their level of ambiguity. An automatic DBSCAN parameter selection leads to high-quality clusters, which are semantically coherent and correspond well to the perceived meanings of a given word.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2307.13417

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Add feedback

FastDiagP: An Algorithm for Parallelized Direct Diagnosis

Le, Viet-Man, Silva, Cristian Vidal, Felfernig, Alexander, Benavides, David, Galindo, José, Tran, Thi Ngoc Trang

arXiv.org Artificial IntelligenceMay-11-2023

Constraint-based applications attempt to identify a solution that meets all defined user requirements. If the requirements are inconsistent with the underlying constraint set, algorithms that compute diagnoses for inconsistent constraints should be implemented to help users resolve the "no solution could be found" dilemma. FastDiag is a typical direct diagnosis algorithm that supports diagnosis calculation without predetermining conflicts. However, this approach faces runtime performance issues, especially when analyzing complex and large-scale knowledge bases. In this paper, we propose a novel algorithm, so-called FastDiagP, which is based on the idea of speculative programming. This algorithm extends FastDiag by integrating a parallelization mechanism that anticipates and pre-calculates consistency checks requested by FastDiag. This mechanism helps to provide consistency checks with fast answers and boosts the algorithm's runtime performance. The performance improvements of our proposed algorithm have been shown through empirical results using the Linux-2.6.3.33 configuration knowledge base.

artificial intelligence, constraint-based reasoning, diagnosis, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1609/aaai.v37i5.25792

2305.06951

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Chile > Maule Region > Talca Province > Talca (0.04)
Europe > Austria > Styria > Graz (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)

Add feedback

Stop using the elbow criterion for k-means and how to choose the number of clusters instead

Schubert, Erich

arXiv.org Artificial IntelligenceDec-23-2022

A major challenge when using k-means clustering often is how to choose the parameter k, the number of clusters. In this letter, we want to point out that it is very easy to draw poor conclusions from a common heuristic, the "elbow method". Better alternatives have been known in literature for a long time, and we want to draw attention to some of these easy to use options, that often perform better. This letter is a call to stop using the elbow method altogether, because it severely lacks theoretic support, and we want to encourage educators to discuss the problems of the method -- if introducing it in class at all -- and teach alternatives instead, while researchers and reviewers should reject conclusions drawn from the elbow method.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3606274.3606278

2212.12189

Country: Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Improved proteasomal cleavage prediction with positive-unlabeled learning

Dorigatti, Emilio, Bischl, Bernd, Schubert, Benjamin

arXiv.org Artificial IntelligenceOct-28-2022

Accurate in silico modeling of the antigen processing pathway is crucial to enable personalized epitope vaccine design for cancer. An important step of such pathway is the degradation of the vaccine into smaller peptides by the proteasome, some of which are going to be presented to T cells by the MHC complex. While predicting MHC-peptide presentation has received a lot of attention recently, proteasomal cleavage prediction remains a relatively unexplored area in light of recent advancesin high-throughput mass spectrometry-based MHC ligandomics. Moreover, as such experimental techniques do not allow to identify regions that cannot be cleaved, the latest predictors generate decoy negative samples and treat them as true negatives when training, even though some of them could actually be positives. In this work, we thus present a new predictor trained with an expanded dataset and the solid theoretical underpinning of positive-unlabeled learning, achieving a new state-of-the-art in proteasomal cleavage prediction. The improved predictive capabilities will in turn enable more precise vaccine development improving the efficacy of epitope-based vaccines. Pretrained models are available on GitHub

artificial intelligence, machine learning, prediction, (15 more...)

arXiv.org Artificial Intelligence

2209.07527

Country:

Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.05)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
North America > United States > New York (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Vaccines (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Add feedback

Clustering by Direct Optimization of the Medoid Silhouette

Lenssen, Lars, Schubert, Erich

arXiv.org Artificial IntelligenceSep-26-2022

The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, and provide two fast versions for the direct optimization. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvements FasterPAM. One of the versions guarantees equal results to the original variant and provides a run speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$=100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm.

artificial intelligence, machine learning, silhouette, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-17849-8_15

2209.12553

Country: Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

AI and the Future of Music Creation

#artificialintelligenceSep-21-2022, 06:25:15 GMT

My father and my brothers are amateur musicians who play various instruments; thus, music runs in my family. I was raised in a household where music listening is a tradition and where kids are encouraged to sing, play, and listen to music early in our days in Brazil. My earliest memories are of music; they are the things that moved me the most and that life taught me to value the most. I put together some rock bands when I was a teenager, and even now, there is never a shortage of musical instruments at my house, both analog and now digital. In addition to my musical education, adult life led me along other routes, including those in computing and artificial intelligence: a really rich experience that allowed me to learn new languages and develop technical skills.

artificial intelligence, artist, music, (15 more...)

#artificialintelligence

Country: South America > Brazil (0.25)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.48)

Add feedback