AITopics | similarity calculation

Collaborating Authors

similarity calculation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Accelerating spherical K-means clustering for large-scale sparse document data

Aoyama, Kazuo, Saito, Kazumi

arXiv.org Machine LearningNov-18-2024

This paper presents an accelerated spherical K-means clustering algorithm for large-scale and high-dimensional sparse document data sets. We design an algorithm working in an architecture-friendly manner (AFM), which is a procedure of suppressing performance-degradation factors such as the numbers of instructions, branch mispredictions, and cache misses in CPUs of a modern computer system. For the AFM operation, we leverage unique universal characteristics (UCs) of a data-object and a cluster's mean set, which are skewed distributions on data relationships such as Zipf's law and a feature-value concentration phenomenon. The UCs indicate that the most part of the number of multiplications for similarity calculations is executed regarding terms with high document frequencies (df) and the most part of a similarity between an object- and a mean-feature vector is obtained by the multiplications regarding a few high mean-feature values. Our proposed algorithm applies an inverted-index data structure to a mean set, extracts the specific region with high-df terms and high mean-feature values in the mean-inverted index by newly introduced two structural parameters, and exploits the index divided into three parts for efficient pruning. The algorithm determines the two structural parameters by minimizing the approximate number of multiplications related to that of instructions, reduces the branch mispredictions by sharing the index structure including the two parameters with all the objects, and suppressing the cache misses by keeping in the caches the frequently used data in the foregoing specific region, resulting in working in the AFM. We experimentally demonstrate that our algorithm efficiently achieves superior speed performance in large-scale documents compared with algorithms using the state-of-the-art techniques.

algorithm, centroid, similarity, (16 more...)

arXiv.org Machine Learning

2411.113

Country:

Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Mind Scramble: Unveiling Large Language Model Psychology Via Typoglycemia

Yu, Miao, Mao, Junyuan, Zhang, Guibin, Ye, Jingheng, Fang, Junfeng, Zhong, Aoxiao, Liu, Yang, Liang, Yuxuan, Wang, Kun, Wen, Qingsong

arXiv.org Artificial IntelligenceOct-23-2024

Research into the external behaviors and internal mechanisms of large language models (LLMs) has shown promise in addressing complex tasks in the physical world. Studies suggest that powerful LLMs, like GPT-4, are beginning to exhibit human-like cognitive abilities, including planning, reasoning, and reflection. In this paper, we introduce a research line and methodology called LLM Psychology, leveraging human psychology experiments to investigate the cognitive behaviors and mechanisms of LLMs. We migrate the Typoglycemia phenomenon from psychology to explore the "mind" of LLMs. Unlike human brains, which rely on context and word patterns to comprehend scrambled text, LLMs use distinct encoding and decoding processes. Through Typoglycemia experiments at the character, word, and sentence levels, we observe: (I) LLMs demonstrate human-like behaviors on a macro scale, such as lower task accuracy and higher token/time consumption; (II) LLMs exhibit varying robustness to scrambled input, making Typoglycemia a benchmark for model evaluation without new datasets; (III) Different task types have varying impacts, with complex logical tasks (e.g., math) being more challenging in scrambled form; (IV) Each LLM has a unique and consistent "cognitive pattern" across tasks, revealing general mechanisms in its psychology process. We provide an in-depth analysis of hidden layers to explain these phenomena, paving the way for future research in LLM Psychology and deeper interpretability.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.01677

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy > Oil & Gas (1.00)
Leisure & Entertainment > Sports > Football (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Linear Attention Based Deep Nonlocal Means Filtering for Multiplicative Noise Removal

Siyao, Xiao, Libing, Huang, Shunsheng, Zhang

arXiv.org Artificial IntelligenceJul-6-2024

Multiplicative noise widely exists in radar images, medical images and other important fields' images. Compared to normal noises, multiplicative noise has a generally stronger effect on the visual expression of images. Aiming at the denoising problem of multiplicative noise, we linearize the nonlocal means algorithm with deep learning and propose a linear attention mechanism based deep nonlocal means filtering (LDNLM). Starting from the traditional nonlocal means filtering, we employ deep channel convolution neural networks to extract the information of the neighborhood matrix and obtain representation vectors of every pixel. Then we replace the similarity calculation and weighted averaging processes with the inner operations of the attention mechanism. To reduce the computational overhead, through the formula of similarity calculation and weighted averaging, we derive a nonlocal filter with linear complexity. Experiments on both simulated and real multiplicative noise demonstrate that the LDNLM is more competitive compared with the state-of-the-art methods. Additionally, we prove that the LDNLM possesses interpretability close to traditional NLM.

ldnlm, multiplicative noise, noise, (14 more...)

arXiv.org Artificial Intelligence

2407.05087

Country: Asia > China > Sichuan Province > Chengdu (0.04)

Genre: Research Report (0.70)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

H3DFact: Heterogeneous 3D Integrated CIM for Factorization with Holographic Perceptual Representations

Wan, Zishen, Liu, Che-Kai, Ibrahim, Mohamed, Yang, Hanchen, Spetalnick, Samuel, Krishna, Tushar, Raychowdhury, Arijit

arXiv.org Artificial IntelligenceApr-5-2024

Disentangling attributes of various sensory signals is central to human-like perception and reasoning and a critical task for higher-order cognitive and neuro-symbolic AI systems. An elegant approach to represent this intricate factorization is via high-dimensional holographic vectors drawing on brain-inspired vector symbolic architectures. However, holographic factorization involves iterative computation with high-dimensional matrix-vector multiplications and suffers from non-convergence problems. In this paper, we present H3DFact, a heterogeneous 3D integrated in-memory compute engine capable of efficiently factorizing high-dimensional holographic representations. H3DFact exploits the computation-in-superposition capability of holographic vectors and the intrinsic stochasticity associated with memristive-based 3D compute-in-memory. Evaluated on large-scale factorization and perceptual problems, H3DFact demonstrates superior capability in factorization accuracy and operational capacity by up to five orders of magnitude, with 5.5x compute density, 1.2x energy efficiency improvements, and 5.9x less silicon footprint compared to iso-capacity 2D designs.

factorization, h3df act, vector, (15 more...)

arXiv.org Artificial Intelligence

2404.04173

Country: North America > United States > Virginia (0.04)

Genre: Research Report (0.50)

Industry: Semiconductors & Electronics (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Add feedback

Improving ICD-based semantic similarity by accounting for varying degrees of comorbidity

Schneider, Jan Janosch, Adler, Marius, Ammer-Herrmenau, Christoph, König, Alexander Otto, Sax, Ulrich, Hügel, Jonas

arXiv.org Artificial IntelligenceAug-14-2023

Finding similar patients is a common objective in precision medicine, facilitating treatment outcome assessment and clinical decision support. Choosing widely-available patient features and appropriate mathematical methods for similarity calculations is crucial. International Statistical Classification of Diseases and Related Health Problems (ICD) codes are used worldwide to encode diseases and are available for nearly all patients. Aggregated as sets consisting of primary and secondary diagnoses they can display a degree of comorbidity and reveal comorbidity patterns. It is possible to compute the similarity of patients based on their ICD codes by using semantic similarity algorithms. These algorithms have been traditionally evaluated using a single-term expert rated data set. However, real-word patient data often display varying degrees of documented comorbidities that might impair algorithm performance. To account for this, we present a scale term that considers documented comorbidity-variance. In this work, we compared the performance of 80 combinations of established algorithms in terms of semantic similarity based on ICD-code sets. The sets have been extracted from patients with a C25.X (pancreatic cancer) primary diagnosis and provide a variety of different combinations of ICD-codes. Using our scale term we yielded the best results with a combination of level-based information content, Leacock & Chodorow concept similarity and bipartite graph matching for the set similarities reaching a correlation of 0.75 with our expert's ground truth. Our results highlight the importance of accounting for comorbidity variance while demonstrating how well current semantic similarity algorithms perform.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2308.07359

Country:

Europe > Germany > Lower Saxony > Gottingen (0.16)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Ohio > Franklin County > Columbus (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A HRNet-based Rehabilitation Monitoring System

Hung, Yi-Ching, Jiang, Yu-Qing, Liou, Fong-Syuan, Tsao, Yu-Hsuan, Chiang, Zi-Cing, Sun, MIn-Te

arXiv.org Artificial IntelligenceJul-14-2023

The rehabilitation treatment helps to heal minor sports and occupational injuries. In a traditional rehabilitation process, a therapist will assign certain actions to a patient to perform in between hospital visits, and it will rely on the patient to remember actions correctly and the schedule to perform them. Unfortunately, many patients forget to perform actions or fail to recall actions in detail. As a consequence, the rehabilitation treatment is hampered or, in the worst case, the patient may suffer from additional injury caused by performing incorrect actions. To resolve these issues, we propose a HRNet-based rehabilitation monitoring system, which can remind a patient when to perform the actions and display the actions for the patient to follow via the patient's smartphone. In addition, it helps the therapist to monitor the progress of the rehabilitation for the patient. Our system consists of an iOS app and several components at the server side. The app is in charge of displaying and collecting action videos. The server computes the similarity score between the therapist's actions and the patient's in the videos to keep track of the number of repetitions of each action. Theses stats will be shown to both of the patient and therapist. The extensive experiments show that the F1-Score of the similarity calculation is as high as 0.9 and the soft accuracy of the number of repetitions is higher than 90%.

artificial intelligence, keypoint, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2306.10756

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Taiwan (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Mobile (0.66)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.31)

Add feedback

In-memory factorization of holographic perceptual representations

Langenegger, Jovin, Karunaratne, Geethan, Hersche, Michael, Benini, Luca, Sebastian, Abu, Rahimi, Abbas

arXiv.org Artificial IntelligenceFeb-16-2023

Disentanglement of constituent factors of a sensory signal is central to perception and cognition and hence is a critical task for future artificial intelligence systems. In this paper, we present a compute engine capable of efficiently factorizing holographic perceptual representations by exploiting the computation-in-superposition capability of brain-inspired hyperdimensional computing and the intrinsic stochasticity associated with analog in-memory computing based on nanoscale memristive devices. Such an iterative in-memory factorizer is shown to solve at least five orders of magnitude larger problems that cannot be solved otherwise, while also significantly lowering the computational time and space complexity. We present a large-scale experimental demonstration of the factorizer by employing two in-memory compute chips based on phase-change memristive devices. The dominant matrix-vector multiply operations are executed at O(1) thus reducing the computational time complexity to merely the number of iterations. Moreover, we experimentally demonstrate the ability to factorize visual perceptual representations reliably and efficiently.

artificial intelligence, in-memory factorizer, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1038/s41565-023-01357-8

2211.05052

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)

Add feedback

Understanding Graph Embeddings

#artificialintelligenceJul-18-2022, 21:55:21 GMT

In the last year, graph embeddings have become increasingly important in Enterprise Knowledge Graph (EKG) strategy. Graph embeddings will soon become the de facto way to quickly find similar items in large billion-vertex EKGs. And as we have discussed in our prior articles, real-time similarity calculations are critical to many areas such as recommendation, next best action, and cohort building. The goal of this article is to give you an intuitive feeling for what graph embeddings are and how they are used so you can decide if these are right for your EKG project. For those of you with a bit of data science background, we will also touch a bit on how they are calculated. For the most part, we will be using storytelling and metaphors to explain these concepts.

calculation, graph, mowgli, (16 more...)

#artificialintelligence

Industry:

Information Technology (0.69)
Health & Medicine > Therapeutic Area (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.71)

Add feedback

Structured Inverted-File k-Means Clustering for High-Dimensional Sparse Data

Aoyama, Kazuo, Saito, Kazumi

arXiv.org Machine LearningMar-30-2021

This paper presents an architecture-friendly k-means clustering algorithm called SIVF for a large-scale and high-dimensional sparse data set. Algorithm efficiency on time is often measured by the number of costly operations such as similarity calculations. In practice, however, it depends greatly on how the algorithm adapts to an architecture of the computer system which it is executed on. Our proposed SIVF employs invariant centroid-pair based filter (ICP) to decrease the number of similarity calculations between a data object and centroids of all the clusters. To maximize the ICP performance, SIVF exploits for a centroid set an inverted-file that is structured so as to reduce pipeline hazards. We demonstrate in our experiments on real large-scale document data sets that SIVF operates at higher speed and with lower memory consumption than existing algorithms. Our performance analysis reveals that SIVF achieves the higher speed by suppressing performance degradation factors of the number of cache misses and branch mispredictions rather than less similarity calculations.

algorithm, feature vector, similarity calculation, (14 more...)

arXiv.org Machine Learning

2103.16141

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
North America > United States > New Jersey (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Comparative Visual Analytics for Assessing Medical Records with Sequence Embedding

Guo, Rongchen, Fujiwara, Takanori, Li, Yiran, Lima, Kelly M., Sen, Soman, Tran, Nam K., Ma, Kwan-Liu

arXiv.org Machine LearningFeb-18-2020

Machine learning for data-driven diagnosis has been actively studied in medicine to provide better healthcare. Supporting analysis of a patient cohort similar to a patient under treatment is a key task for clinicians to make decisions with high confidence. However, such analysis is not straightforward due to the characteristics of medical records: high dimensionality, irregularity in time, and sparsity. To address this challenge, we introduce a method for similarity calculation of medical records. Our method employs event and sequence embeddings. While we use an autoencoder for the event embedding, we apply its variant with the self-attention mechanism for the sequence embedding. Moreover, in order to better handle the irregularity of data, we enhance the self-attention mechanism with consideration of different time intervals. We have developed a visual analytics system to support comparative studies of patient records. To make a comparison of sequences with different lengths easier, our system incorporates a sequence alignment method. Through its interactive interface, the user can quickly identify patients of interest and conveniently review both the temporal and multivariate aspects of the patient records. We demonstrate the effectiveness of our design and system with case studies using a real-world dataset from the neonatal intensive care unit of UC Davis.

medical record, neonate, sequence, (16 more...)

arXiv.org Machine Learning

2002.08356

Country: North America > United States > California > Yolo County > Davis (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Health Care Providers & Services (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Biomedical Informatics (1.00)
(4 more...)

Add feedback