AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.44)

Neural Information Processing Systems, Jan-27-2025, 13:08:30 GMT

Review for NeurIPS paper: From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering

Through the discussion among reviewers following the author response, all reviewers agree on the novelty and significance of the theoretical contribution of this paper, which provides approximation guarantees for the proposed embedding. While reviewers raised concerns about empirical performance with regard to computational cost and parameter tuning, these are common problems for other clustering approaches and are not critical problems for the proposal. Hence I recommend acceptance of this paper.

continuous embedding and back, hyperbolic hierarchical clustering, neurips paper, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)

Neural Information Processing Systems, Jan-27-2025, 11:27:33 GMT

Reviews: Subquadratic High-Dimensional Hierarchical Clustering

This paper proposes a new approach to approximating hierarchical agglomerative clustering (HAC) by requiring that, at each round, only a gamma-best merge be performed (gamma being the multiplicative approximation factor to the closest pair). Two algorithms are introduced to approximate HAC: one for Ward's linkage and one for average linkage. In both cases, the algorithms rely on approximate nearest neighbor (ANN) search as a black box. In addition, a bucketing data structure is used in the Ward's linkage algorithm, and a subsampling procedure is used for average linkage, to guarantee the subquadratic runtime. This is a new contribution to the theoretical literature on HAC: a provably subquadratic algorithm for (an approximation to) HAC in cases other than single linkage.
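The gamma-best merge rule described in this review can be illustrated with a minimal sketch. This is not the paper's algorithm: it scans all cluster pairs by brute force (so it is not subquadratic) where the paper uses ANN search as a black box, and the names `average_linkage` and `gamma_approx_hac` are hypothetical. It only shows what it means for each merge to be within a factor gamma of the best available merge under average linkage.

```python
from itertools import combinations


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between clusters a and b (lists of indices)."""
    total = 0.0
    for i in a:
        for j in b:
            total += sum((x - y) ** 2 for x, y in zip(points[i], points[j])) ** 0.5
    return total / (len(a) * len(b))


def gamma_approx_hac(points, gamma=1.5):
    """Merge until one cluster remains; every merge costs at most gamma * best.

    Assumption: the brute-force scan below stands in for the ANN black box,
    so this sketch illustrates the merge criterion only, not the runtime.
    """
    assert gamma >= 1.0
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        pairs = list(combinations(range(len(clusters)), 2))
        costs = [average_linkage(clusters[p], clusters[q], points) for p, q in pairs]
        best = min(costs)
        # Accept the first pair whose linkage cost is within a gamma factor
        # of the cheapest pair; the cheapest pair itself always qualifies.
        for (p, q), cost in zip(pairs, costs):
            if cost <= gamma * best:
                break
        merges.append((sorted(clusters[p]), sorted(clusters[q]), cost, best))
        merged = clusters[p] + clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return merges
```

With `gamma=1.0` the sketch reduces to exact average-linkage HAC, since only a cheapest pair can satisfy the acceptance test.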

algorithm, average linkage, subquadratic high-dimensional hierarchical clustering, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)

Neural Information Processing Systems, Jan-27-2025, 11:11:56 GMT

Reviews: Subquadratic High-Dimensional Hierarchical Clustering

This paper was very much a borderline paper, with two accept scores and one reject score. One of the concerns raised by the negative reviewer was that, while the algorithm can achieve an approximation to the best merge at each step, it is unclear how the final clustering results would compare to those of the standard algorithm. The authors addressed this in their rebuttal, which helped. There were also some issues raised about the experiments, as well as various minor suggestions (typos etc.). In general, the concerns seem mostly minor, and on the whole this paper seems to make an interesting and worthwhile contribution, so I am recommending that the paper be accepted.

algorithm, subquadratic high-dimensional hierarchical clustering

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)

Malinen, Mikko I., Fränti, Pasi

Fixed-sized clusters $k$-Means

arXiv.org Artificial Intelligence, Jan-27-2025

We present a $k$-means-based clustering algorithm that optimizes the mean square error for given cluster sizes. A straightforward application is balanced clustering, where all clusters have the same size. In the $k$-means assignment phase, the algorithm solves an assignment problem using the Hungarian algorithm, which makes the time complexity of the assignment phase $O(n^3)$. This enables clustering of datasets of more than 5000 points.
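The assignment phase described in this abstract can be sketched as follows, under stated assumptions. The abstract does not say how the assignment problem is set up; this sketch assumes each cluster is expanded into as many "slots" as its prescribed size, and that the cost of placing a point in a slot is its squared distance to that cluster's centroid. The function names `hungarian` and `balanced_assign` are hypothetical, and the solver is a textbook $O(n^3)$ Hungarian implementation with potentials, not the authors' code.

```python
def hungarian(cost):
    """Minimum-cost perfect matching on a square cost matrix in O(n^3).

    Returns a list mapping each row index to its assigned column index.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (n + 1)      # column potentials (1-based)
    match = [0] * (n + 1)    # match[j] = row matched to column j, 0 = free
    way = [0] * (n + 1)      # predecessor column on the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = match[j0]
            delta, j1 = inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        # Flip the matching along the augmenting path back to column 0.
        while j0:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[match[j] - 1] = j - 1
    return assignment


def balanced_assign(points, centroids, sizes):
    """Assign points to clusters so that cluster c receives exactly sizes[c] points."""
    slots = [c for c, s in enumerate(sizes) for _ in range(s)]
    assert len(slots) == len(points), "cluster sizes must sum to the number of points"
    cost = [
        [sum((a - b) ** 2 for a, b in zip(pt, centroids[c])) for c in slots]
        for pt in points
    ]
    return [slots[j] for j in hungarian(cost)]
```

A full clustering loop would alternate this assignment step with the usual centroid update; only the assignment step is shown here.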

algorithm, artificial intelligence, machine learning, (17 more...)

2501.16113

Country:

Europe > Finland > North Karelia > Joensuu (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.05)
North America > United States > California > San Francisco County > San Francisco (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Mata-Carballeira, Óscar, Gutiérrez-Zaballa, Jon, del Campo, Inés, Martínez, Victoria

An FPGA-Based Neuro-Fuzzy Sensor for Personalized Driving Assistance

arXiv.org Artificial Intelligence, Jan-27-2025

Depending on their sophistication level, sensors range from simple sensors that directly measure single physical parameters (e.g., ambient light sensors and temperature sensors) to complex intelligent sensors, which determine parameters of the surrounding environment through wide-spectrum signals (e.g., radio frequency/radar and light/video); besides measuring, they perform data processing and can carry out actuations. Intelligent sensors make use of data of a different underlying nature, in which complex and nonlinear behaviors are encoded; data-mining techniques used jointly with machine learning (ML) algorithms have shown adequate performance for modeling this hidden information. As intelligent sensors often rely on complex sensors and sensor fusion techniques, the data processing power they need can only be provided by high-performance computational platforms such as microprocessors, graphics-processing units (GPUs), or field-programmable gate arrays (FPGAs). In particular, FPGA-based implementations stand out due to the extremely high operational frequencies and low power consumption they can achieve, even for complex, multilayered algorithms [1]. In the automotive field, intelligent sensors are key components of current assistance systems.

artificial intelligence, machine learning, vehicle, (20 more...)

doi: 10.3390/s19184011

2501.16212

Country:

North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(23 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology (0.86)
(2 more...)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Data Science (1.00)
Information Technology > Communications (1.00)
(3 more...)

Neural Information Processing Systems, Jan-26-2025, 07:35:56 GMT

Reviews: Foundations of Comparison-Based Hierarchical Clustering

In this work the authors study hierarchical clustering under the quadruplet comparison framework. The authors show that single and complete linkage are inherently comparison based and propose two variants of average linkage clustering that exploit quadruplet comparisons. An exact hierarchy recovery guarantee is provided under a planted hierarchical partition model, along with an empirical evaluation. The meanings of the variables \mu, \delta, etc. are hard to interpret from the description. They have been nicely summarized (and explained) in Appendix A.1.
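The observation that single linkage is inherently comparison based can be illustrated with a small sketch: the merge order depends only on the ordering of pairwise distances, so an oracle answering "is d(i, j) smaller than d(k, l)?" is enough. The oracle `closer` and the function name `single_linkage_from_comparisons` are hypothetical, and the Kruskal-style construction below is a standard way to compute single linkage, not necessarily the formulation used in the paper.

```python
from functools import cmp_to_key
from itertools import combinations


def single_linkage_from_comparisons(n, closer):
    """Single-linkage merge order using only quadruplet comparisons.

    closer(i, j, k, l) must return True when d(i, j) < d(k, l).
    Returns the list of object pairs (i, j) whose link triggered each merge.
    """
    def compare(e, f):
        if closer(*e, *f):
            return -1
        if closer(*f, *e):
            return 1
        return 0

    # Sort all pairs by distance without ever looking at a distance value.
    edges = sorted(combinations(range(n), 2), key=cmp_to_key(compare))

    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    merges = []
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # the closest pair spanning two clusters
            parent[ri] = rj
            merges.append((i, j))
    return merges
```

For testing, the oracle can be backed by hidden coordinates, for example `closer = lambda i, j, k, l: abs(h[i] - h[j]) < abs(h[k] - h[l])` for a hypothetical list `h` of positions on a line.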

comparison-based hierarchical clustering, pure cluster, quadruplet comparison, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)

Neural Information Processing Systems, Jan-26-2025, 07:35:46 GMT

Reviews: Foundations of Comparison-Based Hierarchical Clustering

The authors have proposed two variants of average linkage hierarchical clustering using the quadruplet comparison framework. Theoretical results on hierarchy recovery are established under a suitable model. The reviewers are in agreement that the results are new and important. The authors should incorporate the suggestions made by the reviewers to further strengthen the paper.

artificial intelligence, comparison-based hierarchical clustering, machine learning, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.79)

arXiv.org Artificial Intelligence, Jan-26-2025

ESGSenticNet: A Neurosymbolic Knowledge Base for Corporate Sustainability Analysis

Ong, Keane, Mao, Rui, Xing, Frank, Satapathy, Ranjan, Sulaeman, Johan, Cambria, Erik, Mengaldo, Gianmarco

Evaluating corporate sustainability performance is essential to drive sustainable business practices, amid the need for a more sustainable economy. However, this is hindered by the complexity and volume of corporate sustainability data (i.e. sustainability disclosures), and not least by the limited effectiveness of the NLP tools used to analyse them. To this end, we identify three primary challenges (immateriality, complexity, and subjectivity) that exacerbate the difficulty of extracting insights from sustainability disclosures. To address these issues, we introduce ESGSenticNet, a publicly available knowledge base for sustainability analysis. ESGSenticNet is constructed from a neurosymbolic framework that integrates specialised concept parsing, GPT-4o inference, and semi-supervised label propagation, together with a hierarchical taxonomy. This approach culminates in a structured knowledge base of 44k knowledge triplets, e.g. ('halve carbon emission', supports, 'emissions control'), for effective sustainability analysis. Experiments indicate that ESGSenticNet, when deployed as a lexical method, more effectively captures relevant and actionable sustainability information from sustainability disclosures compared to state-of-the-art baselines. Besides capturing a high number of unique ESG topic terms, ESGSenticNet outperforms baselines on the ESG relatedness and ESG action orientation of these terms by 26% and 31% respectively. These metrics describe the extent to which topic terms are related to ESG and depict an action toward ESG. Moreover, when deployed as a lexical method, ESGSenticNet does not require any training, a key advantage in its simplicity for non-technical stakeholders.

large language model, machine learning, natural language, (22 more...)

2501.1572

Country: Asia > India (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Water & Waste Management > Solid Waste Management (1.00)
Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Dinh, Duy-Tai, Fujinami, Tsutomu, Huynh, Van-Nam

Estimating the Optimal Number of Clusters in Categorical Data Clustering by Silhouette Coefficient

arXiv.org Artificial Intelligence, Jan-26-2025

The problem of estimating the number of clusters (say k) is one of the major challenges in partitional clustering. This paper proposes an algorithm named k-SCC to estimate the optimal k in categorical data clustering. For the clustering step, the algorithm uses a kernel density estimation approach to define cluster centers. In addition, it uses an information-theoretic dissimilarity to measure the distance between centers and objects in each cluster. A silhouette-analysis-based approach is then used to evaluate the quality of the different clusterings obtained in the former step and to choose the best k. Comparative experiments were conducted on both synthetic and real datasets to compare the performance of k-SCC with three other algorithms. Experimental results show that k-SCC outperforms the compared algorithms in determining the number of clusters for each dataset.
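The silhouette-based selection step described in this abstract can be sketched as follows. This is not k-SCC: the abstract's kernel-density cluster centers and information-theoretic dissimilarity are not specified here, so the sketch substitutes a simple matching (Hamming) dissimilarity between categorical objects and assumes the candidate labelings for each k were produced elsewhere. The names `hamming`, `silhouette_score`, and `best_k` are hypothetical.

```python
def hamming(a, b):
    """Fraction of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def silhouette_score(data, labels, dist=hamming):
    """Mean silhouette coefficient over all objects for one labeling."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i in range(len(data)):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton clusters contribute s(i) = 0 by convention
        # a: mean dissimilarity to the other members of the object's cluster.
        a = sum(dist(data[i], data[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean dissimilarity to any other cluster.
        b = min(
            sum(dist(data[i], data[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(data)


def best_k(data, labelings):
    """Pick the k whose labeling has the highest mean silhouette.

    labelings maps each candidate k to a list of cluster labels.
    """
    scores = {k: silhouette_score(data, labels) for k, labels in labelings.items()}
    return max(scores, key=scores.get), scores
```

Any dissimilarity function with the same two-argument signature can be passed as `dist`, so a different measure can replace the Hamming stand-in without changing the selection logic.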

algorithm, artificial intelligence, machine learning, (13 more...)

doi: 10.1007/978-981-15-1209-4_1

2501.15542

Country:

Asia > Singapore (0.04)
Asia > Japan (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > United States > California > Alameda County > Oakland (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)