AITopics | Kang, Eun Yong

Hybrid Encoder: Towards Efficient and Precise Native Ads Recommendation via Hybrid Transformer Encoding Networks

Yang, Junhan, Liu, Zheng, Jin, Bowen, Lian, Jianxun, Lian, Defu, Soni, Akshay, Kang, Eun Yong, Wang, Yajun, Sun, Guangzhong, Xie, Xing

arXiv.org Artificial Intelligence, Apr-22-2021

Transformer encoding networks have proven to be a powerful tool for understanding natural language. They play a critical role in native ads services, which recommend appropriate ads based on a user's web browsing history. For efficient recommendation, conventional methods generate user and advertisement embeddings independently with a siamese transformer encoder, so that approximate nearest neighbour (ANN) search can be leveraged. Because the underlying semantics of users and ads can be complicated, such independently generated embeddings are prone to information loss, which leads to inferior recommendation quality. Another encoding strategy, the cross encoder, can be much more accurate, but it incurs a huge running cost and is infeasible for real-time services such as native ads recommendation. In this work, we propose the hybrid encoder, which makes efficient and precise native ads recommendations through two consecutive steps: retrieval and ranking. In the retrieval step, the user and ad are encoded with a siamese component, which enables relevant candidates to be retrieved via ANN search. In the ranking step, it further represents each ad with disentangled embeddings and each user with ad-related embeddings, which contributes to the fine-grained selection of high-quality ads from the candidate set. Both steps are lightweight, thanks to pre-computed and cached intermediate results. To optimize the hybrid encoder's performance in this two-stage workflow, a progressive training pipeline is developed, which builds up the model's capability in the retrieval and ranking tasks step by step. The hybrid encoder's effectiveness is experimentally verified: with very little additional cost, it significantly outperforms the siamese encoder and achieves recommendation quality comparable to the cross encoder.
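The retrieve-then-rank workflow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy embeddings, the exhaustive search standing in for ANN, and in particular the ranking score (sum over ad embeddings of the best-matching user embedding), which is not the scoring function of the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(user_vec, ad_vecs, k):
    """Retrieval step: top-k ads by inner product with one user embedding.

    Exhaustive search is used here as a stand-in for ANN search.
    """
    order = sorted(range(len(ad_vecs)),
                   key=lambda i: dot(user_vec, ad_vecs[i]),
                   reverse=True)
    return order[:k]


def rank_score(user_vecs, ad_vecs):
    """Ranking step with a hypothetical scoring rule (not the paper's):
    match each ad embedding to its best user embedding and sum the matches."""
    return sum(max(dot(u, a) for u in user_vecs) for a in ad_vecs)


def recommend(user_single, user_multi, ads_single, ads_multi, k):
    """Retrieve k candidates cheaply, then reorder them with the finer score."""
    candidates = retrieve(user_single, ads_single, k)
    return sorted(candidates,
                  key=lambda i: rank_score(user_multi, ads_multi[i]),
                  reverse=True)


# Toy data (hypothetical): one embedding per user/ad for retrieval,
# several embeddings per user/ad for ranking.
USER_SINGLE = [1.0, 0.0]
USER_MULTI = [[1.0, 0.0], [0.0, 1.0]]
ADS_SINGLE = [[0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
ADS_MULTI = [
    [[0.5, 0.0], [0.0, 0.1]],
    [[0.6, 0.0], [0.0, 0.9]],
    [[0.0, 1.0], [0.0, 1.0]],
]

print(recommend(USER_SINGLE, USER_MULTI, ADS_SINGLE, ADS_MULTI, k=2))
```

In this toy setup the retrieval step keeps ads 0 and 1, and the ranking step then places ad 1 ahead of ad 0 because its multiple embeddings match the user better. In a real system the ad-side vectors for both steps would be pre-computed and cached, as the abstract notes.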

artificial intelligence, encoder, natural language, (15 more...)

2104.10925

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Marketing (1.00)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

A powerful and efficient set test for genetic markers that handles confounders

Listgarten, Jennifer, Lippert, Christoph, Kang, Eun Yong, Xiang, Jing, Kadie, Carl M., Heckerman, David

arXiv.org Machine Learning, Apr-8-2013

Approaches for testing sets of variants, such as a set of rare or common variants within a gene or pathway, for association with complex traits are important. In particular, set tests allow for aggregation of weak signal within a set, can capture interplay among variants, and reduce the burden of multiple hypothesis testing. Until now, these approaches did not address confounding by family relatedness and population structure, a problem that is becoming more important as larger data sets are used to increase power. Results: We introduce a new approach for set tests that handles confounders. Our model is based on the linear mixed model and uses two random effects: one to capture the set association signal and one to capture confounders. We also introduce a computational speedup for two-random-effects models that makes this approach feasible even for extremely large cohorts. Using this model with both the likelihood ratio test and the score test, we find that the former yields more power while controlling type I error. Application of our approach to richly structured GAW14 data demonstrates that our method successfully corrects for population structure and family relatedness, while application of our method to a 15,000-individual Crohn's disease case-control cohort demonstrates that it additionally recovers genes not recoverable by univariate analysis. Availability: A Python-based library implementing our approach is available at http://mscompbio.codeplex.com

artificial intelligence, health & medicine, snp, (18 more...)

doi: 10.1093/bioinformatics/btt177

1205.0793

Country:

Europe (0.68)
North America > United States > New York > New York County > New York City (0.28)

Genre: Research Report > Experimental Study (0.73)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Respecting Markov Equivalence in Computing Posterior Probabilities of Causal Graphical Features

Kang, Eun Yong (University of California, Los Angeles) | Shpitser, Ilya (Harvard School of Public Health) | Eskin, Eleazar (University of California, Los Angeles)

AAAI Conferences, Jul-15-2010

There have been many efforts to identify causal graphical features such as directed edges between random variables from observational data. Recently, Tian et al. proposed a new dynamic programming algorithm which computes marginalized posterior probabilities of directed edge features over all the possible structures in $O(n 3^n)$ time when the number of parents per node is bounded by a constant, where $n$ is the number of variables of interest. However, the main drawback of this approach is that deciding a single appropriate threshold for the existence of the directed edge feature is difficult due to the scale difference of the posterior probabilities between the directed edges forming v-structures and the directed edges not forming v-structures. We claim that computing posterior probabilities of both adjacencies and v-structures is necessary and more effective for discovering causal graphical features, since it allows us to find a single appropriate decision threshold for the existence of the feature that we are testing. For efficient computation, we provide a novel dynamic programming algorithm which computes the posterior probabilities of all of the $n(n-1)/2$ adjacency and $n\binom{n-1}{2}$ v-structure features in $O(n^3 \cdot 3^n)$ time.
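The feature counts quoted in the abstract can be checked by direct enumeration. The sketch below only lists the candidate adjacency and v-structure features over n variables; it does not compute any posterior probabilities, and the function names and tuple layout are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def adjacency_features(n):
    """All unordered pairs {a, b} of distinct variables: n(n-1)/2 of them."""
    return list(combinations(range(n), 2))


def v_structure_features(n):
    """All triples (a, c, b) read as a -> c <- b.

    c is the collider and {a, b} is an unordered pair of its candidate
    parents, giving n * C(n-1, 2) features in total.
    """
    features = []
    for c in range(n):
        others = [v for v in range(n) if v != c]
        for a, b in combinations(others, 2):
            features.append((a, c, b))
    return features


n = 4
print(len(adjacency_features(n)), n * (n - 1) // 2)
print(len(v_structure_features(n)), n * comb(n - 1, 2))
```

For each n, the two numbers on each printed line should agree, matching the counts given in the abstract.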
