Minimax Localization of Structural Information in Large Noisy Matrices

Mar-15-2024, 10:00:40 GMT–Neural Information Processing Systems

We consider the problem of identifying a sparse set of relevant columns and rows in a large data matrix with highly corrupted entries. This problem of identifying groups from a collection of bipartite variables, such as proteins and drugs, biological species and gene sequences, or malware and signatures, is commonly referred to as biclustering or co-clustering. Despite its great practical relevance, and although several ad hoc methods are available for biclustering, theoretical analysis of the problem is largely non-existent. The problem we consider is also closely related to structured multiple hypothesis testing, an area of statistics that has recently witnessed a flurry of activity. We make the following contributions:

1. We prove lower bounds on the minimum signal strength needed for successful recovery of a bicluster as a function of the noise variance and of the sizes of the matrix and of the bicluster of interest.
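To make the setting concrete, here is a minimal Python sketch. It assumes a signal model the abstract does not spell out: a single bicluster whose entries have their mean raised by a constant `mu`, independent Gaussian noise with standard deviation `sigma` everywhere, and known bicluster dimensions. The localizer simply keeps the rows and columns with the largest sums. It is a hypothetical illustration, not necessarily a procedure analyzed in the paper, and the helper names `plant_bicluster` and `localize_by_sums` are made up for this example.

```python
import random


def plant_bicluster(n, m, rows, cols, mu, sigma, seed=0):
    """Hypothetical generator: n x m N(0, sigma^2) noise with mean mu added on rows x cols."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, sigma) for _ in range(m)] for _ in range(n)]
    for i in rows:
        for j in cols:
            X[i][j] += mu  # elevated-mean block (assumed signal model)
    return X


def localize_by_sums(X, k_rows, k_cols):
    """Hypothetical localizer: keep the k_rows rows and k_cols columns with the largest sums."""
    n, m = len(X), len(X[0])
    row_sums = [sum(row) for row in X]
    col_sums = [sum(X[i][j] for i in range(n)) for j in range(m)]
    top_rows = sorted(range(n), key=lambda i: row_sums[i], reverse=True)[:k_rows]
    top_cols = sorted(range(m), key=lambda j: col_sums[j], reverse=True)[:k_cols]
    return set(top_rows), set(top_cols)


if __name__ == "__main__":
    # Plant a 20 x 20 bicluster in a 200 x 200 matrix with a strong signal.
    X = plant_bicluster(200, 200, range(20), range(50, 70), mu=8.0, sigma=1.0)
    rows, cols = localize_by_sums(X, 20, 20)
    print(sorted(rows) == list(range(20)), sorted(cols) == list(range(50, 70)))
```

With `mu` far above `sigma`, as in this usage, the row and column sums separate cleanly. The lower bounds described above concern the opposite end: how small the signal can be, relative to the noise variance and the matrix and bicluster sizes, before reliable recovery becomes impossible.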

artificial intelligence, machine learning, procedure

Country:
- North America > United States (0.28)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning (1.00)
  - Representation & Reasoning > Search (0.84)