AITopics | Park, Haesun

Plotting

Park, Haesun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Randomized Algorithms for Symmetric Nonnegative Matrix Factorization

Hayashi, Koby, Aksoy, Sinan G., Ballard, Grey, Park, Haesun

arXiv.org Artificial IntelligenceFeb-12-2024

We propose the first randomized algorithms for Symmetric Nonnegative Matrix Factorization (SymNMF). Nonnegative Matrix Factorization (NMF) is an important method in data analysis with applications to data visualization, text mining, feature learning, information fusion and more [28, 25, 46, 22, 11]. SymNMF is a variant of NMF where the input matrix is symmetric and the output low-rank approximation is also constrained to be symmetric [25, 49]. Applications of SymNMF include (hyper)graph clustering, image segmentation, and information fusion [44, 10, 19, 5, 6]. Several randomized algorithms for nonsymmetric NMF have been previously proposed and shown to be effective for dense and small sparse problems [43, 41, 13], but as far as we are aware there is no prior work on randomized algorithms for SymNMF. Our contributions in this work include a randomized algorithm for SymNMF we call "Low-rank Approximated Input SymNMF" (LAI-SymNMF), a randomized algorithm based on leverage score sampling for least squares problems we call LvS-SymNMF, novel theoretical analysis of leverage score sampling for the Nonnegative Least Squares problem and theoretical analysis of a hybrid sampling scheme for leverage score sampling. The rest of the paper is organized as follows. Section 2, which discusses background material including non-randomized SymNMF algorithms, reviews existing randomized NMF methods and other related work such as randomized methods for other low-rank matrix decompositions and tensor decompositions.

artificial intelligence, leverage score, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2402.08134

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.45)
Government > Regional Government > North America Government > United States Government (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

WellFactor: Patient Profiling using Integrative Embedding of Healthcare Data

Choi, Dongjin, Xiang, Andy, Ozturk, Ozgur, Shrestha, Deep, Drake, Barry, Haidarian, Hamid, Javed, Faizan, Park, Haesun

arXiv.org Artificial IntelligenceDec-21-2023

In the rapidly evolving healthcare industry, platforms now have access to not only traditional medical records, but also diverse data sets encompassing various patient interactions, such as those from healthcare web portals. To address this rich diversity of data, we introduce WellFactor: a method that derives patient profiles by integrating information from these sources. Central to our approach is the utilization of constrained low-rank approximation. WellFactor is optimized to handle the sparsity that is often inherent in healthcare data. Moreover, by incorporating task-specific label information, our method refines the embedding results, offering a more informed perspective on patients. One important feature of WellFactor is its ability to compute embeddings for new, previously unobserved patient data instantaneously, eliminating the need to revisit the entire data set or recomputing the embedding. Comprehensive evaluations on real-world healthcare data demonstrate WellFactor's effectiveness. It produces better results compared to other existing methods in classification performance, yields meaningful clustering of patients, and delivers consistent results in patient similarity searches and predictions.

data mining, information, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2312.14129

Country:

North America > United States > Wisconsin (0.14)
North America > United States > California (0.14)
Europe > United Kingdom > Scotland (0.14)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.88)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
(3 more...)

Add feedback

Patient Clustering via Integrated Profiling of Clinical and Digital Data

Choi, Dongjin, Xiang, Andy, Ozturk, Ozgur, Shrestha, Deep, Drake, Barry, Haidarian, Hamid, Javed, Faizan, Park, Haesun

arXiv.org Artificial IntelligenceAug-22-2023

We introduce a novel profile-based patient clustering model designed for clinical data in healthcare. By utilizing a method grounded on constrained low-rank approximation, our model takes advantage of patients' clinical data and digital interaction data, including browsing and search, to construct patient profiles. As a result of the method, nonnegative embedding vectors are generated, serving as a low-dimensional representation of the patients. Our model was assessed using real-world patient data from a healthcare web portal, with a comprehensive evaluation approach which considered clustering and recommendation capabilities. In comparison to other baselines, our approach demonstrated superior performance in terms of clustering coherence and recommendation accuracy.

data mining, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3583780.3615262

2308.11748

Country:

Europe (0.95)
North America > United States > New York (0.14)
North America > United States > Illinois (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Health Care Technology (0.68)
Health & Medicine > Health Care Providers & Services (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(2 more...)

Add feedback

Hypergraph Random Walks, Laplacians, and Clustering

Hayashi, Koby, Aksoy, Sinan G., Park, Cheong Hee, Park, Haesun

arXiv.org Machine LearningOct-27-2020

We propose a flexible framework for clustering hypergraph-structured data based on recently proposed random walks utilizing edge-dependent vertex weights. When incorporating edge-dependent vertex weights (EDVW), a weight is associated with each vertex-hyperedge pair, yielding a weighted incidence matrix of the hypergraph. Such weightings have been utilized in term-document representations of text data sets. We explain how random walks with EDVW serve to construct different hypergraph Laplacian matrices, and then develop a suite of clustering methods that use these incidence matrices and Laplacians for hypergraph clustering. Using several data sets from real-life applications, we compare the performance of these clustering algorithms experimentally against a variety of existing hypergraph clustering methods. We show that the proposed methods produce higher-quality clusters and conclude by highlighting avenues for future work.

artificial intelligence, hypergraph, machine learning, (17 more...)

arXiv.org Machine Learning

2006.16377

Country:

North America > United States > New York (0.14)
North America > United States > Pennsylvania (0.14)
North America > United States > California (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

TASTE: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records

Afshar, Ardavan, Perros, Ioakeim, Park, Haesun, deFilippi, Christopher, Yan, Xiaowei, Stewart, Walter, Ho, Joyce, Sun, Jimeng

arXiv.org Machine LearningNov-13-2019

Phenotyping electronic health records (EHR) focuses on defining meaningful patient groups (e.g., heart failure group and diabetes group) and identifying the temporal evolution of patients in those groups. Tensor factorization has been an effective tool for phenotyping. Most of the existing works assume either a static patient representation with aggregate data or only model temporal data. However, real EHR data contain both temporal (e.g., longitudinal clinical visits) and static information (e.g., patient demographics), which are difficult to model simultaneously. In this paper, we propose Temporal And Static TEnsor factorization (TASTE) that jointly models both static and temporal information to extract phenotypes. TASTE combines the PARAFAC2 model with non-negative matrix factorization to model a temporal and a static tensor. To fit the proposed model, we transform the original problem into simpler ones which are optimally solved in an alternating fashion. For each of the sub-problems, our proposed mathematical reformulations lead to efficient sub-problem solvers. Comprehensive experiments on large EHR data from a heart failure (HF) study confirmed that TASTE is up to 14x faster than several baselines and the resulting phenotypes were confirmed to be clinically meaningful by a cardiologist. Using 80 phenotypes extracted by TASTE, a simple logistic regression can achieve the same level of area under the curve (AUC) for HF prediction compared to a deep learning model using recurrent neural networks (RNN) with 345 features.

deep learning, phenotype, vascular disease, (25 more...)

arXiv.org Machine Learning

1911.05843

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.55)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CoDiNMF: Co-Clustering of Directed Graphs via NMF

Lim, Woosang (Georgia Institute of Technology) | Du, Rundong (Georgia Institute of Technology) | Park, Haesun (Georgia Institute of Technology)

AAAI ConferencesFeb-8-2018

Co-clustering computes clusters of data items and the related features concurrently, and it has been used in many applications such as community detection, product recommendation, computer vision, and pricing optimization. In this paper, we propose a new co-clustering method, called CoDiNMF, which improves the clustering quality and finds directional patterns among co-clusters by using multiple directed and undirected graphs. We design the objective function of co-clustering by using min-cut criterion combined with an additional term which controls the sum of net directional flow between different co-clusters. In addition, we show that a variant of Nonnegative Matrix Factorization (NMF) can solve the proposed objective function effectively. We run experiments on the US patents and BlogCatalog data sets whose ground truth have been known, and show that CoDiNMF improves clustering results compared to other co-clustering methods in terms of average F1 score, Rand index, and adjusted Rand index (ARI). Finally, we compare CoDiNMF and other co-clustering methods on the Wikipedia data set of philosophers, and we can find meaningful directional flow of influence among co-clusters of philosophers.

artificial intelligence, codinmf, machine learning, (19 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country:

North America > United States (0.47)
Europe (0.29)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.75)

Add feedback

Hybrid Clustering based on Content and Connection Structure using Joint Nonnegative Matrix Factorization

Du, Rundong, Drake, Barry, Park, Haesun

arXiv.org Machine LearningMar-28-2017

We present a hybrid method for latent information discovery on the data sets containing both text content and connection structure based on constrained low rank approximation. The new method jointly optimizes the Nonnegative Matrix Factorization (NMF) objective function for text clustering and the Symmetric NMF (SymNMF) objective function for graph clustering. We propose an effective algorithm for the joint NMF objective function, based on a block coordinate descent (BCD) framework. The proposed hybrid method discovers content associations via latent connections found using SymNMF. The method can also be applied with a natural conversion of the problem when a hypergraph formulation is used or the content is associated with hypergraph edges. Experimental results show that by simultaneously utilizing both content and connection structure, our hybrid method produces higher quality clustering results compared to the other NMF clustering methods that uses content alone (standard NMF) or connection structure alone (SymNMF). We also present some interesting applications to several types of real world data such as citation recommendations of papers. The hybrid method proposed in this paper can also be applied to general data expressed with both feature space vectors and pairwise similarities and can be extended to the case with multiple feature spaces or multiple similarity measures.

algorithm, optimization problem, us government, (18 more...)

arXiv.org Machine Learning

1703.09646

Country:

North America > United States > California (0.14)
North America > United States > Oregon (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Law (0.94)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

PIVE: Per-Iteration Visualization Environment for Real-Time Interactions with Dimension Reduction and Clustering

AAAI ConferencesFeb-14-2017

One of the key advantages of visual analytics is its capability to leverage both humans's visual perception and the power of computing. A big obstacle in integrating machine learning with visual analytics is its high computing cost. To tackle this problem, this paper presents PIVE (Per-Iteration Visualization Environment) that supports real-time interactive visualization with machine learning. By immediately visualizing the intermediate results from algorithm iterations, PIVE enables users to quickly grasp insights and interact with the intermediate output, which then affects subsequent algorithm iterations. In addition, we propose a widely-applicable interaction methodology that allows efficient incorporation of user feedback into virtually any iterative computational method without introducing additional computational cost. We demonstrate the application of PIVE for various dimension reduction algorithms such as multidimensional scaling and t-SNE and clustering and topic modeling algorithms such as k-means and latent Dirichlet allocation.

artificial intelligence, iteration, machine learning, (16 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States > Virginia (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Add feedback

Outlier Detection for Text Data : An Extended Version

Kannan, Ramakrishnan, Woo, Hyenkyun, Aggarwal, Charu C., Park, Haesun

arXiv.org Machine LearningJan-5-2017

The problem of outlier detection is extremely challenging in many domains such as text, in which the attribute values are typically non-negative, and most values are zero. In such cases, it often becomes difficult to separate the outliers from the natural variations in the patterns in the underlying data. In this paper, we present a matrix factorization method, which is naturally able to distinguish the anomalies with the use of low rank approximations of the underlying data. Our iterative algorithm TONMF is based on block coordinate descent (BCD) framework. We define blocks over the term-document matrix such that the function becomes solvable. Given most recently updated values of other matrix blocks, we always update one block at a time to its optimal. Our approach has significant advantages over traditional methods for text outlier detection. Finally, we present experimental results illustrating the effectiveness of our method over competing methods.

artificial intelligence, outlier, us government, (20 more...)

arXiv.org Machine Learning

1701.01325

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry: Government > Regional Government > North America Government > United States Government (0.69)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization

Kannan, Ramakrishnan, Ballard, Grey, Park, Haesun

arXiv.org Machine LearningSep-28-2016

Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors $W$ and $H$, for the given input matrix $A$, such that $A \approx W H$. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient parallel algorithms to solve the problem for big data sets. The main contribution of this work is a new, high-performance parallel computational framework for a broad class of NMF algorithms that iteratively solves alternating non-negative least squares (NLS) subproblems for $W$ and $H$. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). The framework is flexible and able to leverage a variety of NMF and NLS algorithms, including Multiplicative Update, Hierarchical Alternating Least Squares, and Block Principal Pivoting. Our implementation allows us to benchmark and compare different algorithms on massive dense and sparse data matrices of size that spans for few hundreds of millions to billions. We demonstrate the scalability of our algorithm and compare it with baseline implementations, showing significant performance improvements. The code and the datasets used for conducting the experiments are available online.

algorithm, optimization problem, us government, (21 more...)

arXiv.org Machine Learning

1609.09154

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback