AITopics | correct number

Number of Clusters in a Dataset: A Regularized K-means Approach

Kamgar-Parsi, Behzad, Kamgar-Parsi, Behrooz

arXiv.org Artificial Intelligence, May-30-2025

Finding the number of meaningful clusters in an unlabeled dataset is important in many applications. The regularized k-means algorithm is an approach frequently used to find the correct number of distinct clusters in datasets. The most common formulation of the regularization function is the additive linear term $λk$, where $k$ is the number of clusters and $λ$ is a positive coefficient. Currently, there are no principled guidelines for setting a value for the critical hyperparameter $λ$. In this paper, we derive rigorous bounds for $λ$ assuming clusters are ideal. Ideal clusters (defined as $d$-dimensional spheres with identical radii) are close proxies for k-means clusters ($d$-dimensional spherically symmetric distributions with identical standard deviations). Experiments show that the k-means algorithm with an additive regularizer often yields multiple solutions. Thus, we also analyze the k-means algorithm with a multiplicative regularizer. The consensus among k-means solutions with additive and multiplicative regularizations reduces the ambiguity of multiple solutions in certain cases. We also present selected experiments that demonstrate the performance of the regularized k-means algorithms as clusters deviate from the ideal assumption.
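As a rough illustration of the additive formulation described in this abstract, the sketch below picks the $k$ that minimizes the k-means sum of squared errors plus $λk$. It is not the authors' code. It assumes 1-D data and a deterministic quantile-based initialization for brevity, and the names `kmeans_sse_1d` and `select_k` are hypothetical.

```python
def kmeans_sse_1d(points, k, iters=100):
    """Run Lloyd's k-means on 1-D data and return the sum of squared errors.

    Assumption: centroids start at evenly spaced positions of the sorted
    data, so the result is deterministic (no random restarts).
    """
    pts = sorted(points)
    n = len(pts)
    centroids = [pts[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [sum(g) / len(g) if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sum(min((p - c) ** 2 for c in centroids) for p in pts)


def select_k(points, lam, k_max):
    """Return the k in 1..k_max that minimizes SSE(k) + lam * k."""
    costs = {k: kmeans_sse_1d(points, k) + lam * k
             for k in range(1, k_max + 1)}
    return min(costs, key=costs.get)


# Three tight, well-separated groups.
data = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2, 20.0, 20.1, 20.2]
chosen_small_lambda = select_k(data, lam=1.0, k_max=5)
chosen_large_lambda = select_k(data, lam=1000.0, k_max=5)
```

On this toy data a moderate $λ$ recovers three clusters, while a very large $λ$ collapses everything into one cluster, which is why the choice of $λ$ matters.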

algorithm, artificial intelligence, machine learning, (19 more...)

2505.22991

Country: North America > United States (0.67)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Ranked differences Pearson correlation dissimilarity with an application to electricity users time series clustering

Charoensuk, Chutiphan, Wiroonsri, Nathakhun

arXiv.org Machine Learning, May-8-2025

Time series clustering is an unsupervised learning method for classifying time series data into groups with similar behavior. It is used in applications such as healthcare, finance, economics, energy, and climate science. Several time series clustering methods have been introduced and used for over four decades. Most of them focus on measuring either Euclidean distances or association dissimilarities between time series. In this work, we propose a new dissimilarity measure called ranked Pearson correlation dissimilarity (RDPC), which combines a weighted average of a specified fraction of the largest element-wise differences with the well-known Pearson correlation dissimilarity. It is incorporated into hierarchical clustering. The performance is evaluated and compared with existing clustering algorithms. The results show that the RDPC algorithm outperforms others in complicated cases involving different seasonal patterns, trends, and peaks. Finally, we demonstrate our method by clustering a random sample of customers from a Thai electricity consumption time series dataset into seven groups with unique characteristics.
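The abstract does not give the RDPC formula, so the sketch below only illustrates the two ingredients it names: a Pearson correlation dissimilarity (taken here in the common form $1 - r$) and an average over a fraction of the largest element-wise differences. The unweighted mean, the convex combination through `weight`, and all function names are assumptions for illustration, not the authors' definition.

```python
import math


def pearson_dissimilarity(x, y):
    """One common form of Pearson correlation dissimilarity: 1 - r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return 1.0 - cov / (norm_x * norm_y)


def top_fraction_mean_abs_diff(x, y, fraction):
    """Mean of the largest `fraction` of element-wise absolute differences."""
    diffs = sorted((abs(a - b) for a, b in zip(x, y)), reverse=True)
    m = max(1, math.ceil(fraction * len(diffs)))
    return sum(diffs[:m]) / m


def combined_dissimilarity(x, y, fraction=0.5, weight=0.5):
    """Hypothetical combination: a convex mix of the two terms above."""
    return (weight * top_fraction_mean_abs_diff(x, y, fraction)
            + (1.0 - weight) * pearson_dissimilarity(x, y))


series_a = [1.0, 2.0, 3.0, 4.0]
series_b = [2.0, 4.0, 6.0, 8.0]
d = combined_dissimilarity(series_a, series_b)
```

Here `series_a` and `series_b` are perfectly correlated, so the correlation term is near zero and the result is driven by the difference term. This shows why a correlation-only measure can miss differences in magnitude.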

algorithm, artificial intelligence, machine learning, (16 more...)

2505.02173

Country:

Asia > Thailand (0.05)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Industry: Energy > Power Industry (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Teaching Large Language Models Number-Focused Headline Generation With Key Element Rationales

Qian, Zhen, Zhang, Xiuzhen, Xu, Xiaofei, Xia, Feng

arXiv.org Artificial Intelligence, Feb-5-2025

Number-focused headline generation is a summarization task requiring both high textual quality and precise numerical accuracy, which poses a unique challenge for Large Language Models (LLMs). Existing studies in the literature focus only on either textual quality or numerical reasoning and thus are inadequate to address this challenge. In this paper, we propose a novel chain-of-thought framework for using rationales comprising key elements of the Topic, Entities, and Numerical reasoning (TEN) in news articles to enhance the capability for LLMs to generate topic-aligned high-quality texts with precise numerical accuracy. Specifically, a teacher LLM is employed to generate TEN rationales as supervision data, which are then used to teach and fine-tune a student LLM. Our approach teaches the student LLM automatic generation of rationales with enhanced capability for numerical reasoning and topic-aligned numerical headline generation. Experiments show that our approach achieves superior performance in both textual quality and numerical accuracy.

large language model, machine learning, natural language, (19 more...)

2502.03129

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Mexico > Mexico City > Mexico City (0.04)
Oceania > Australia (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government (0.68)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Binyamin, Lital, Tewel, Yoad, Segev, Hilit, Hirsch, Eran, Rassin, Royi, Chechik, Gal

arXiv.org Artificial Intelligence, Jun-14-2024

Despite the unprecedented success of text-to-image diffusion models, controlling the number of depicted objects using text is surprisingly hard. This is important for various applications, from technical documents to children's books to illustrations of cooking recipes. Generating object-correct counts is fundamentally challenging because the generative model needs to keep a sense of separate identity for every instance of the object, even if several objects look identical or overlap, and then carry out a global computation implicitly during generation. It is still unknown if such representations exist. To address count-correct generation, we first identify features within the diffusion model that can carry the object identity information. We then use them to separate and count instances of objects during the denoising process and detect over-generation and under-generation. We fix the latter by training a model that predicts both the shape and location of a missing object, based on the layout of existing ones, and show how it can be used to guide denoising with the correct object count. Our approach, CountGen, does not depend on an external source to determine object layout, but rather uses the prior from the diffusion model itself, creating prompt-dependent and seed-dependent layouts. Evaluated on two benchmark datasets, we find that CountGen strongly outperforms existing baselines in count accuracy.

countgen, diffusion model, layout, (16 more...)

2406.10210

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > Canada > Quebec > Capitale-Nationale Region > Québec (0.04)
North America > Canada > Quebec > Capitale-Nationale Region > Quebec City (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Media (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.87)

Unsupervised Machine Learning to Classify the Confinement of Waves in Periodic Superstructures

Kozoň, Marek, Schrijver, Rutger, Schlottbom, Matthias, van der Vegt, Jaap J. W., Vos, Willem L.

arXiv.org Artificial Intelligence, Apr-26-2023

Abstract: We employ unsupervised machine learning to enhance the accuracy of our recently presented scaling method for wave confinement analysis [1]. We employ the standard k-means++ algorithm as well as our own model-based algorithm. We investigate cluster validity indices as a means to find the correct number of confinement dimensionalities to be used as an input to the clustering algorithms. Subsequently, we analyze the performance of the two clustering algorithms when compared to the direct application of the scaling method without clustering. We find that the clustering approach provides more physically meaningful results, but may struggle with identifying the correct set of confinement dimensionalities. We conclude that the most accurate outcome is obtained by first applying the direct scaling to find the correct set of confinement dimensionalities and subsequently employing clustering to refine the results. Moreover, our model-based algorithm outperforms the standard k-means++ clustering.

1. Introduction: Completely controlling wave propagation in periodic media is a key challenge that is essential for a large variety of applications [2-16]. An especially interesting type of control is wave confinement achieved by introducing disorder and functional defects into an otherwise periodic medium [17-20]. The interference of waves in such an altered structure may result in a strong concentration of the energy density inside a small sub-volume of the medium. Wave confinement has been investigated for different types of waves and in various settings, e.g., classical mechanics [21], photonics [10, 11, 22-24], solid state physics [25-29], or magnonics [30, 31]. Its applications include sensors, controlled spontaneous emission, and enhanced interactions between hybrid wave-types such as sound and light [32-40].

algorithm, confinement dimensionality, supercell, (11 more...)

doi: 10.1364/OE.492014

2304.11901

Country:

Europe > Netherlands (0.04)
North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)

Genre: Research Report (0.40)

Industry: Energy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters

Mohammadi, Seyed Omid, Kalhor, Ahmad, Bodaghi, Hossein

arXiv.org Artificial Intelligence, Oct-9-2021

This paper introduces k-splits, an improved hierarchical algorithm based on k-means to cluster data without prior knowledge of the number of clusters. K-splits starts from a small number of clusters and uses the most significant data distribution axis to split these clusters incrementally into better fits if needed. Accuracy and speed are the two main advantages of the proposed method. We experiment on six synthetic benchmark datasets plus two real-world datasets, MNIST and Fashion-MNIST, to show that our algorithm has excellent accuracy in finding the correct number of clusters under different conditions. We also show that k-splits is faster than similar methods and can even be faster than the standard k-means in lower dimensions. Finally, we suggest using k-splits to uncover the exact position of centroids and then inputting them as initial points to the k-means algorithm to fine-tune the results.

algorithm, centroid, dataset, (15 more...)

2110.04660

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Iran > Tehran Province > Tehran (0.05)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > California > Alameda County > Oakland (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

ThetA -- fast and robust clustering via a distance parameter

Garyfallidis, Eleftherios, Fadnavis, Shreyas, Park, Jong Sung, Chandio, Bramsh Qamar, Guaje, Javier, Koudoro, Serge, Anousheh, Nasim

arXiv.org Artificial Intelligence, Feb-13-2021

Clustering is a fundamental problem in machine learning where distance-based approaches have dominated the field for many decades. This set of problems is often tackled by partitioning the data into K clusters where the number of clusters is chosen apriori. While significant progress has been made on these lines over the years, it is well established that as the number of clusters or dimensions increase, current approaches dwell in local minima resulting in suboptimal solutions.

Based on this, one can further divide distance-based methods into three categories: 1) assuming number of clusters as known in advance, 2) a distance threshold as known or 3) by assuming a limiting number of data points belonging to each particular cluster. While clustering algorithms primarily focus on accurately partitioning the data, they also aimed at inferring information from a data exploration standpoint. In this work, we primarily focus on distance-based clustering given its broad adoption and propose a new framework, ThetA, which uses ...

algorithm, centroid, k-means, (17 more...)

2102.07028

Country:

North America > United States > Indiana (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Machine Learning Case Study

#artificialintelligence, Jul-7-2020, 04:45:24 GMT

We are living in the era of digital technologies. When was the last time you walked into a shop that didn't have a PayTM or BHIM UPI? These digital transaction technologies have quickly become a key part of our daily lives. And not just at an individual level, these digital technologies are at the core of every financial institution. Executing a payment transaction or fund transfer has become very smooth with multiple possible options (like internet banking, ATM, credit or debit cards, UPI, POS Machines, etc.) having reliable systems running at the backend.

artificial intelligence, customer, machine learning, (17 more...)

Country: Asia > India (0.05)

Industry:

Information Technology > Services (0.35)
Banking & Finance > Credit (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

A Neural Network for Determination of Latent Dimensionality in Nonnegative Matrix Factorization

Nebgen, Benjamin T., Vangara, Raviteja, Hombrados-Herrera, Miguel A., Kuksova, Svetlana, Alexandrov, Boian S.

arXiv.org Machine Learning, Jun-22-2020

Non-negative Matrix Factorization (NMF) has proven to be a powerful unsupervised learning method for uncovering hidden features in complex and noisy data sets with applications in data mining, text recognition, dimension reduction, face recognition, anomaly detection, blind source separation, and many other fields. An important input for NMF is the latent dimensionality of the data, that is, the number of hidden features, K, present in the explored data set. Unfortunately, this quantity is rarely known a priori. We utilize a supervised machine learning approach in combination with a recent method for model determination, called NMFk, to determine the number of hidden features automatically. NMFk performs a set of NMF simulations on an ensemble of matrices, obtained by bootstrapping the initial data set, and determines which K produces stable groups of latent features that reconstruct the initial data set well. We then train a Multi-Layer Perceptron (MLP) classifier network to determine the correct number of latent features utilizing the statistics and characteristics of the NMF solutions obtained from NMFk. In order to train the MLP classifier, a training set of 58,660 matrices with predetermined latent features was factorized with NMFk. The MLP classifier in conjunction with NMFk maintains a greater than 95% success rate when applied to a held-out test set. Additionally, when applied to two well-known benchmark data sets, the swimmer and MIT face data, NMFk/MLP correctly recovered the established number of hidden features. Finally, we compared the accuracy of our method to the ARD, AIC, and stability-based methods.

algorithm, artificial intelligence, machine learning, (14 more...)

2006.12402

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)
Asia (0.04)

Genre: Research Report (1.00)

Industry:

Energy (0.93)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Penalized k-means algorithms for finding the correct number of clusters in a dataset

Kamgar-Parsi, Behzad, Kamgar-Parsi, Behrooz

arXiv.org Machine Learning, Nov-15-2019

In many applications we want to find the number of clusters in a dataset. A common approach is to use the penalized k-means algorithm with an additive penalty term linear in the number of clusters. An open problem is estimating the value of the coefficient of the penalty term. Since estimating the value of the coefficient in a principled manner appears to be intractable for general clusters, we investigate "ideal clusters", i.e., identical spherical clusters with no overlaps and no outlier background noise. In this paper: (a) We derive, for the case of ideal clusters, rigorous bounds for the coefficient of the additive penalty. Unsurprisingly, the bounds depend on the correct number of clusters, which we want to find in the first place. We further show that the additive penalty, even for this simplest case of ideal clusters, typically produces a weak and often ambiguous signature for the correct number of clusters. (b) As an alternative, we examine k-means with a multiplicative penalty, and show that this parameter-free formulation has a stronger, and less often ambiguous, signature for the correct number of clusters. We also empirically investigate certain types of deviations from the ideal cluster assumption and show that a combination of k-means with additive and multiplicative penalties can resolve ambiguous solutions.

algorithm, correct number, penalty, (15 more...)

1911.06741

Country:

North America > United States > District of Columbia > Washington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
