AITopics | Sahoo, Satyajeet

A Novel Cholesky Kernel based Support Vector Classifier

Sahoo, Satyajeet, Maiti, Jhareswar

arXiv.org Machine Learning, Apr-6-2025

Support Vector Machine (SVM) is a popular supervised classification model that first finds the margin boundaries for the training data classes and then calculates the decision boundary, which is used to classify the test data. This study demonstrates limitations of traditional support vector classification, which uses Cartesian coordinate geometry to find the margin and decision boundaries in an input space using only a few support vectors, without considering data variance and correlation. The study then proposes a new Cholesky kernel that adjusts for the effects of the variance-covariance structure of the data in the decision boundary equation and margin calculations. The study demonstrates that the SVM model is valid only in Euclidean space, and that the Cholesky kernel, obtained by decomposing the covariance matrix, acts as a transformation matrix which, when applied to the original data, transforms the data from the input space to Euclidean space. The effectiveness of the Cholesky kernel based SVM classifier is demonstrated by classifying the Wisconsin Breast Cancer (Diagnostic) Dataset and comparing the results with traditional SVM approaches. The Cholesky kernel based SVM model shows marked improvement in precision, recall, and F1 scores compared to linear and other kernel SVMs.
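As a rough illustration of the idea in this abstract, the sketch below factors the sample covariance matrix as Σ = L Lᵀ and maps each point x to L⁻¹x, so that a plain dot product in the transformed space equals xᵀ Σ⁻¹ y in the original space. The function names (`cholesky_factor`, `to_euclidean`, `cholesky_kernel`) and the optional `reg` term are assumptions made for this example; the paper's exact kernel definition may differ.

```python
import numpy as np

def cholesky_factor(X, reg=0.0):
    """Lower-triangular L with cov(X) + reg*I = L @ L.T (rows of X are samples)."""
    cov = np.cov(X, rowvar=False)
    return np.linalg.cholesky(cov + reg * np.eye(cov.shape[0]))

def to_euclidean(X, L):
    """Map each row x to L^{-1} x; the mapped data has identity covariance."""
    return np.linalg.solve(L, X.T).T

def cholesky_kernel(A, B, L):
    """Gram matrix K[i, j] = a_i^T Sigma^{-1} b_j, computed via the transform."""
    return to_euclidean(A, L) @ to_euclidean(B, L).T

# Illustrative correlated data (synthetic, not from the paper).
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0], [1.5, 0.5]])
X_demo = rng.normal(size=(500, 2)) @ mixing.T
L_demo = cholesky_factor(X_demo)
Z_demo = to_euclidean(X_demo, L_demo)
```

A Gram matrix produced this way could be passed to any SVM implementation that accepts a precomputed kernel; equivalently, a linear SVM could be trained directly on the transformed data.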

artificial intelligence, kernel, machine learning, (16 more...)

2504.04371

Country: North America > United States > Wisconsin (0.25)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Multivariate Gaussian Topic Modelling: A novel approach to discover topics with greater semantic coherence

Sahoo, Satyajeet, Maiti, Jhareswar, Tewari, Virendra Kumar

arXiv.org Artificial Intelligence, Mar-19-2025

An important aspect of text mining is information retrieval in the form of discovering semantic themes (topics) in documents using topic modelling. While generative topic models like Latent Dirichlet Allocation (LDA) elegantly model topics as probability distributions and are useful in identifying latent topics from large document corpora with minimal supervision, they suffer from poor topic interpretability and reduced performance on shorter texts. Here we propose a novel Multivariate Gaussian Topic modelling (MGD) approach. In this approach, topics are represented as Multivariate Gaussian Distributions and documents as Gaussian Mixture Models. Using the EM algorithm, the constituent Multivariate Gaussian Distributions and their corresponding parameters are identified. Analysis of the parameters helps identify the keywords with the highest variance and mean contributions to each topic, and topics are annotated from these keywords. The approach is first applied to a synthetic dataset to demonstrate its interpretability benefits vis-à-vis LDA. A real-world application of the topic model is then demonstrated in the analysis of risks and hazards at a petrochemical plant, where the model is applied to safety incident reports to identify the major latent hazards plaguing the plant. The model achieves a higher mean topic coherence of 0.436 vis-à-vis 0.294 for LDA.
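The EM step described in this abstract can be sketched with a small full-covariance Gaussian mixture fit. This is a generic EM implementation, not the authors' code: the farthest-point initialisation, the `reg` ridge term, the helper names, and the choice to treat each row of `X` as one feature vector (for example, term weights) are all assumptions made for this example.

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log density of N(mean, cov) at each row of X, via a Cholesky solve."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)           # shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))

def farthest_point_init(X, k):
    """Deterministic start: first row, then repeatedly the row farthest from those chosen."""
    idx = [0]
    dist = np.linalg.norm(X - X[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))
        idx.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return X[idx].copy()

def fit_gmm(X, k, n_iter=100, reg=1e-6):
    """EM for a k-component Gaussian mixture; returns (weights, means, covs)."""
    n, d = X.shape
    means = farthest_point_init(X, k)
    covs = np.array([np.cov(X, rowvar=False) + reg * np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        log_r = np.stack(
            [np.log(weights[j]) + log_gaussian(X, means[j], covs[j]) for j in range(k)],
            axis=1,
        )
        log_r -= log_r.max(axis=1, keepdims=True)  # stabilise before exponentiating
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    return weights, means, covs

def top_features(mean, vocab, n=5):
    """Names of the n features with the largest mean in one component."""
    return [vocab[i] for i in np.argsort(-mean)[:n]]
```

Ranking features by component mean, as `top_features` does, is only one simple way to read keywords off the fitted parameters; the abstract also mentions variance contributions, which could be ranked similarly from the diagonal of each covariance matrix.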

artificial intelligence, natural language, topic model, (17 more...)

2503.15036

Country:

Asia > India (0.14)
Asia > China (0.14)

Genre:

Research Report > Promising Solution (0.40)
Overview > Innovation (0.40)

Industry:

Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.69)
Government (0.67)
Leisure & Entertainment > Sports > Cricket (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.92)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)

Variance-Adjusted Cosine Distance as Similarity Metric

Sahoo, Satyajeet, Maiti, Jhareswar

arXiv.org Machine Learning, Feb-4-2025

Cosine similarity is a popular measure of the similarity between two vectors in an inner product space. It is widely used in many algorithms, such as K-Nearest Neighbors and clustering. This study demonstrates limitations in the application of cosine similarity. In particular, it demonstrates that the traditional cosine similarity metric is valid only in Euclidean space, whereas the original data resides in a random variable space. When there is variance and correlation in the data, cosine distance is not a completely accurate measure of similarity. While new similarity and distance metrics have been developed to make up for the limitations of cosine similarity, these metrics are used as substitutes for cosine distance and do not modify cosine distance itself to overcome its limitations. We therefore propose a modified cosine similarity metric in which cosine distance is adjusted by the variance-covariance of the data. Applying variance-adjusted cosine distance gives better similarity performance than traditional cosine distance. KNN modelling on the Wisconsin Breast Cancer Dataset is performed using both the traditional and the modified cosine similarity measures, and the results are compared. The modified formula shows 100% test accuracy on the data.
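One plausible reading of the adjustment described in this abstract is to whiten both vectors with the Cholesky factor of the covariance matrix before taking the ordinary cosine, which gives xᵀ Σ⁻¹ y / (√(xᵀ Σ⁻¹ x) · √(yᵀ Σ⁻¹ y)). The sketch below follows that reading; the function names and the example covariance are assumptions for illustration, and the paper's formula may differ in detail.

```python
import numpy as np

def cosine_similarity(x, y):
    """Ordinary cosine similarity between two 1-D vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def variance_adjusted_cosine(x, y, cov):
    """Cosine similarity after mapping x and y through L^{-1}, where cov = L @ L.T."""
    L = np.linalg.cholesky(cov)
    zx = np.linalg.solve(L, x)
    zy = np.linalg.solve(L, y)
    return cosine_similarity(zx, zy)

# Two vectors that are orthogonal in raw coordinates, with strongly
# correlated features (illustrative covariance, not taken from the paper).
x_demo = np.array([1.0, 0.0])
y_demo = np.array([0.0, 1.0])
cov_demo = np.array([[1.0, 0.8], [0.8, 1.0]])

plain = cosine_similarity(x_demo, y_demo)                      # 0.0
adjusted = variance_adjusted_cosine(x_demo, y_demo, cov_demo)  # -0.8
```

With an identity covariance the adjusted value reduces to the ordinary cosine, so the two measures differ only when the features have unequal variance or are correlated.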

artificial intelligence, machine learning, natural language, (17 more...)

2502.02233

Country:

North America > United States > Wisconsin (0.26)
Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)
