Cross-Linked Unified Embedding for cross-modality representation learning

Neural Information Processing Systems

Multi-modal learning is essential for understanding information in the real world. Jointly learning from multi-modal data enables global integration of both shared and modality-specific information, but current strategies often fail when observations from certain modalities are incomplete or missing for part of the subjects. To learn comprehensive representations from such modality-incomplete data, we present a semi-supervised neural network model called CLUE (Cross-Linked Unified Embedding). Extending multi-modal VAEs, CLUE introduces cross-encoders to construct latent representations from modality-incomplete observations. Representation learning for modality-incomplete observations is common in genomics. For example, human cells are tightly regulated across multiple related but distinct modalities such as DNA, RNA, and protein, which jointly define a cell's function. We benchmark CLUE on multi-modal data from single-cell measurements, illustrating CLUE's superior performance in all assessed categories of the NeurIPS 2021 Multimodal Single-cell Data Integration Competition. While we focus on the analysis of single-cell genomic datasets, we note that the proposed cross-linked embedding strategy could be readily applied to other cross-modality representation learning problems.
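The core cross-encoder idea, that any available modality can be mapped into a shared latent space so that modality-incomplete observations still receive an embedding, can be illustrated with a toy sketch. Linear maps stand in for the paper's neural encoders here; all names, shapes, and the averaging rule are illustrative assumptions, not CLUE's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "encoders": each modality is projected into a shared 4-d latent space.
enc_rna = rng.normal(size=(20, 4))    # RNA features (20-d) -> latent
enc_prot = rng.normal(size=(10, 4))   # protein features (10-d) -> latent

def embed(rna=None, protein=None):
    """Embed a cell from whichever modalities are observed.

    Any available modality alone can produce the shared latent; when both
    are present, their embeddings are combined (here, simply averaged).
    """
    zs = []
    if rna is not None:
        zs.append(rna @ enc_rna)
    if protein is not None:
        zs.append(protein @ enc_prot)
    if not zs:
        raise ValueError("at least one modality is required")
    return np.mean(zs, axis=0)

cell_rna = rng.normal(size=20)
cell_prot = rng.normal(size=10)
z_full = embed(rna=cell_rna, protein=cell_prot)  # modality-complete cell
z_rna_only = embed(rna=cell_rna)                 # modality-incomplete cell
```

Both the complete and the incomplete observation land in the same latent space, which is what makes downstream integration across modality-incomplete datasets possible.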


CLUES: Collaborative Private-domain High-quality Data Selection for LLMs via Training Dynamics

Neural Information Processing Systems

Recent research has highlighted the importance of data quality in scaling large language models (LLMs). However, automated data quality control faces unique challenges in collaborative settings where data cannot be shared directly between silos. To tackle this issue, this paper proposes a novel data quality control technique based on the notion of data influence on the training dynamics of LLMs: high-quality data are more likely to have training dynamics similar to those of a high-quality anchor dataset. We then leverage this influence on the training dynamics to select high-quality data from different private domains, with centralized model updates on the server side in a collaborative training fashion, via either model merging or federated learning. As the data quality indicator, we compute the per-sample gradients with respect to the private data and the anchor dataset, and use the trace of the accumulated inner products as a measure of data quality.
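The scoring idea in the abstract, accumulating inner products between per-sample gradients and anchor gradients over training steps, can be sketched in a few lines of NumPy. The shapes, the use of a mean anchor gradient per step, and the function name are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def data_quality_scores(private_grads, anchor_grads):
    """Score each private sample by the inner product of its per-sample
    gradient with the anchor gradient, accumulated over training steps.

    private_grads: shape (steps, n_samples, dim), per-sample gradients
    anchor_grads:  shape (steps, dim), anchor-dataset gradient per step
    returns:       shape (n_samples,), one quality score per sample
    """
    # Inner product at every step for every sample, then sum over steps.
    per_step = np.einsum('tnd,td->tn', private_grads, anchor_grads)
    return per_step.sum(axis=0)

rng = np.random.default_rng(0)
anchor = rng.normal(size=(5, 8))  # 5 training steps, gradient dim 8

# Three "aligned" samples whose gradients track the anchor, three random ones.
aligned = anchor[:, None, :] + 0.1 * rng.normal(size=(5, 3, 8))
noisy = rng.normal(size=(5, 3, 8))
grads = np.concatenate([aligned, noisy], axis=1)  # 6 candidate samples

scores = data_quality_scores(grads, anchor)
```

Samples whose training dynamics track the anchor receive systematically higher scores, which is the signal the selection step would rank on.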


Online Bayesian Persuasion Without a Clue

Neural Information Processing Systems

We study online Bayesian persuasion problems in which an informed sender repeatedly faces a receiver with the goal of influencing their behavior through the provision of payoff-relevant information. Previous works assume that the sender has knowledge of either the prior distribution over states of nature or the receiver's utilities, or both. We relax these unrealistic assumptions by considering settings in which the sender knows nothing about the prior or the receiver. We design an algorithm that achieves regret sublinear in the number of rounds T with respect to an optimal signaling scheme, and we also provide a collection of lower bounds showing that the guarantees of such an algorithm are tight. Our algorithm works by searching a suitable space of signaling schemes in order to learn the receiver's best responses.


Pincus

AAAI Conferences

The agent has the ability to automatically generate clues and update its dialogue policy dynamically based on user input.


Sudoku Puzzles

AI Magazine

Each row, each column, and each 3x3 box must contain every number from 1 to 9. Additional number clues can be found by answering these questions. Each row, each column, and each 3x3 box must also contain each of the 9 different letters; if completed properly, a nine-letter word will be revealed. Puzzle solutions can be found on page 107.
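The row/column/box rule above is easy to check programmatically. The following is a minimal sketch (the function name and grid representation are illustrative choices, not from the article):

```python
def is_valid_solution(grid):
    """Check the Sudoku rule stated above: every row, every column, and
    every 3x3 box must contain each number from 1 to 9 exactly once."""
    digits = set(range(1, 10))
    rows = [list(row) for row in grid]
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [[grid[3 * br + r][3 * bc + c] for r in range(3) for c in range(3)]
             for br in range(3) for bc in range(3)]
    return all(set(unit) == digits for unit in rows + cols + boxes)

# A known-valid completed grid for demonstration.
solved = [
    [5, 3, 4, 6, 7, 8, 9, 1, 2],
    [6, 7, 2, 1, 9, 5, 3, 4, 8],
    [1, 9, 8, 3, 4, 2, 5, 6, 7],
    [8, 5, 9, 7, 6, 1, 4, 2, 3],
    [4, 2, 6, 8, 5, 3, 7, 9, 1],
    [7, 1, 3, 9, 2, 4, 8, 5, 6],
    [9, 6, 1, 5, 3, 7, 2, 8, 4],
    [2, 8, 7, 4, 1, 9, 6, 3, 5],
    [3, 4, 5, 2, 8, 6, 1, 7, 9],
]
```

Changing any single cell of a valid grid necessarily duplicates a digit in its row, column, and box, so the check fails.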


Countrywide Loan-Underwriting Expert System

AI Magazine

Loan underwriting is the process of evaluating a loan application to determine whether the loan should be funded. The process often starts with a potential borrower walking into a branch office and requesting a loan to purchase or refinance a home. A processor asks the borrower to fill out an application, setting in motion a lengthy information-gathering process in which as many as 1,500 data elements will eventually be collected. This loan information includes items about the borrower's employment, income, assets, liabilities, and monthly expenses. During the process, a credit report and an appraisal will be ordered from a third-party vendor.


Building Watson: An Overview of the DeepQA Project

AI Magazine

IBM Research undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show, Jeopardy. The extent of the challenge includes fielding a real-time automatic contestant on the show, not merely a laboratory exercise. The Jeopardy Challenge helped us address requirements that led to the design of the DeepQA architecture and the implementation of Watson. After three years of intense research and development by a core team of about 20 researchers, Watson is performing at human expert levels in terms of precision, confidence, and speed at the Jeopardy quiz show. Our results strongly suggest that DeepQA is an effective and extensible architecture that can be used as a foundation for combining, deploying, evaluating, and advancing a wide range of algorithmic techniques to rapidly advance the field of question answering (QA).


A Web-Based Agent Challenges Human Experts on Crosswords

AI Magazine

Since the birth of artificial intelligence (AI), games and puzzles have received much attention. The game that has captured most of the attention of computer scientists is chess. The founding fathers of AI such as McCarthy, Simon (Simon and Schaeffer 1992), Samuel, Shannon (Shannon 1950), Turing, and Von Neumann were all involved in automatic chess playing. After decades of unsuccessful attempts (Mittman and Newborn 1980, Munakata 1996), the IBM machine Deep Blue achieved the astonishing result of defeating world champion Garry Kasparov in May 1997 (Campbell, Hoane, and Hsu 2002). Games play the role of a laboratory where machines can safely be tested through direct competition with humans.