Goto

Collaborating Authors

 Yang, Yanwu


Why is the winner the best?

arXiv.org Artificial Intelligence

International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multi-center study covering all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. Statistical analyses based on comprehensive descriptions of the submitted algorithms and their underlying participation strategies, linked to the algorithms' final ranks, revealed common characteristics of winning solutions. These typically include the use of multi-task learning (63%) and/or multi-stage pipelines (61%), and a focus on augmentation (100%), image preprocessing (97%), data curation (79%), and postprocessing (66%). The "typical" lead of a winning team is a computer scientist with a doctoral degree, five years of experience in biomedical image analysis, and four years of experience in deep learning. Two core development strategies stood out for highly ranked teams: reflecting the evaluation metrics in the method design, and analyzing and handling failure cases. According to the organizers, 43% of the winning algorithms exceeded the state of the art, but only 11% completely solved the respective domain problem. The insights of our study could help researchers (1) improve algorithm development strategies when approaching new problems, and (2) focus on the open research questions revealed by this work.
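As a hedged illustration of one of the two strategies highlighted above, the sketch below shows what "reflecting the evaluation metric in the method design" might look like for a segmentation challenge ranked by the Dice coefficient: a soft Dice term is added to the training loss so the optimized objective tracks the ranking metric. The loss weighting and setup are hypothetical placeholders, not taken from any specific winning solution.

```python
# Hypothetical sketch: aligning the training loss with a Dice-based challenge metric.
# The weighting and usage are placeholders, not any team's actual code.
import torch
import torch.nn as nn

def soft_dice_loss(logits, targets, eps=1e-6):
    """Differentiable Dice loss for binary segmentation (1 - soft Dice)."""
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum(dim=(1, 2, 3))
    denominator = probs.sum(dim=(1, 2, 3)) + targets.sum(dim=(1, 2, 3))
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice.mean()

class MetricAwareLoss(nn.Module):
    """Combine pixel-wise BCE with the metric-aligned Dice term."""
    def __init__(self, dice_weight=0.5):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.dice_weight = dice_weight

    def forward(self, logits, targets):
        # logits, targets: (batch, channels, height, width)
        return (1 - self.dice_weight) * self.bce(logits, targets) \
            + self.dice_weight * soft_dice_loss(logits, targets)
```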


Keyword Decisions in Sponsored Search Advertising: A Literature Review and Research Agenda

arXiv.org Artificial Intelligence

In sponsored search advertising (SSA), keywords serve as the basic unit of the business model, linking three stakeholders: consumers, advertisers, and search engines. This paper presents an overarching framework for keyword decisions that highlights the touchpoints in search advertising management, spanning four levels of keyword decisions: domain-specific keyword pool generation, keyword targeting, keyword assignment and grouping, and keyword adjustment. Using this framework, we review the state-of-the-art research literature on keyword decisions with respect to techniques, input features, and evaluation metrics. Finally, we discuss evolving issues, identify gaps in the literature, and outline novel research perspectives for future exploration.
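To make the four-level structure of the framework concrete, the following minimal sketch models the decision levels as a simple Python data structure and pipeline; all names and values are illustrative placeholders, not an implementation from the reviewed literature.

```python
# Illustrative skeleton of the four keyword decision levels described above.
# All names, signatures, and values are hypothetical placeholders.
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List

class MatchType(Enum):
    BROAD = auto()
    PHRASE = auto()
    EXACT = auto()

@dataclass
class KeywordPlan:
    pool: List[str] = field(default_factory=list)                   # level 1: domain-specific keyword pool
    targeted: Dict[str, MatchType] = field(default_factory=dict)    # level 2: keyword targeting (selection + match type)
    ad_groups: Dict[str, List[str]] = field(default_factory=dict)   # level 3: assignment and grouping
    bids: Dict[str, float] = field(default_factory=dict)            # level 4: ongoing adjustment

def build_plan(seed_terms: List[str]) -> KeywordPlan:
    plan = KeywordPlan()
    plan.pool = [t.lower() for t in seed_terms]                      # generate the pool
    plan.targeted = {k: MatchType.PHRASE for k in plan.pool[:10]}    # select keywords and match types
    plan.ad_groups = {"default": list(plan.targeted)}                # group into ad groups
    plan.bids = {k: 0.5 for k in plan.targeted}                      # initial bids to adjust over time
    return plan
```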


Causality-based CTR Prediction using Graph Neural Networks

arXiv.org Artificial Intelligence

As a prevalent problem in online advertising, CTR prediction has attracted considerable attention from both academia and industry. Recent studies have established CTR prediction models in the graph neural network (GNN) framework. However, most GNN-based models handle feature interactions on a complete graph while ignoring causal relationships among features, which results in a substantial performance drop on out-of-distribution data. This paper develops a causality-based CTR prediction model in the GNN framework (Causal-GNN) that integrates representations of the feature graph, user graph, and ad graph in the context of online advertising. In our model, a structured representation learning method (GraphFwFM) is designed to capture high-order representations on the feature graph, based on causal discovery among field features, in gated graph neural networks (GGNNs), and GraphSAGE is employed to obtain graph representations of users and ads. Experiments conducted on three public datasets demonstrate the superiority of Causal-GNN in AUC and Logloss and the effectiveness of GraphFwFM in capturing high-order representations on the causal feature graph.
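A minimal sketch of this kind of architecture, assuming dense adjacency matrices and plain PyTorch rather than the authors' code: a gated message-passing block over a causally pruned feature adjacency stands in for GraphFwFM, mean neighbor aggregation stands in for GraphSAGE on the user and ad graphs, and the three pooled representations are concatenated for CTR prediction. All module names and dimensions are illustrative; the key difference from a complete feature graph is that messages only flow along the causal edges encoded in the adjacency matrix.

```python
# Hypothetical sketch of a Causal-GNN-style CTR model; not the authors' implementation.
import torch
import torch.nn as nn

class GatedFeatureGraph(nn.Module):
    """GGNN-style gated message passing over a causal feature adjacency matrix."""
    def __init__(self, dim, steps=2):
        super().__init__()
        self.steps = steps
        self.msg = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, x, adj):
        # x: (num_fields, dim); adj: (num_fields, num_fields) causal edge weights
        h = x
        for _ in range(self.steps):
            m = adj @ self.msg(h)            # aggregate messages along causal edges only
            h = self.gru(m, h)               # gated update, as in GGNN
        return h.mean(dim=0)                 # pooled feature-graph representation

class SAGEBlock(nn.Module):
    """GraphSAGE-style mean aggregation for user/ad graphs."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, x, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neigh = (adj @ x) / deg              # mean of neighbor embeddings
        return torch.relu(self.lin(torch.cat([x, neigh], dim=-1)))

class CausalGNNCTR(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.feat = GatedFeatureGraph(dim)
        self.user = SAGEBlock(dim)
        self.ad = SAGEBlock(dim)
        self.head = nn.Linear(3 * dim, 1)

    def forward(self, feat_x, feat_adj, user_x, user_adj, ad_x, ad_adj, uid, aid):
        f = self.feat(feat_x, feat_adj)          # feature-graph representation
        u = self.user(user_x, user_adj)[uid]     # representation of the target user
        a = self.ad(ad_x, ad_adj)[aid]           # representation of the target ad
        return torch.sigmoid(self.head(torch.cat([f, u, a], dim=-1)))
```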


Keyword Targeting Optimization in Sponsored Search Advertising: Combining Selection and Matching

arXiv.org Artificial Intelligence

In sponsored search advertising (SSA), advertisers need to select keywords and simultaneously determine matching types for the selected keywords, i.e., keyword targeting. An optimal keyword targeting strategy guarantees reaching the right population effectively. This paper aims to address the keyword targeting problem, which is challenging because of the incomplete information of historical advertising performance indices and the high uncertainty in SSA environments. First, we construct a data distribution estimation model and apply a Markov Chain Monte Carlo method to infer unobserved indices (i.e., impression and click-through rate) over three keyword matching types (i.e., broad, phrase, and exact). Second, we formulate a stochastic keyword targeting model (BB-KSM) combining keyword selection and keyword matching operations to maximize the expected profit under a chance constraint on the budget, and develop a branch-and-bound algorithm incorporating a stochastic simulation process for our keyword targeting model. Finally, based on a real-world dataset collected from field reports and logs of past SSA campaigns, computational experiments are conducted to evaluate the performance of our keyword targeting strategy. Experimental results show that (a) BB-KSM outperforms seven baselines in terms of profit; (b) BB-KSM shows its superiority as the budget increases, especially in situations with more keywords and keyword combinations; and (c) the proposed data distribution estimation approach can effectively address the problem of incomplete performance indices over the three matching types and in turn significantly improves the performance of keyword targeting decisions. This research makes important contributions to the SSA literature, and the results offer critical insights into keyword management for SSA advertisers.
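The following hedged sketch illustrates only the decision structure described above, not the paper's BB-KSM algorithm: for a toy set of keywords, candidate keyword/match-type assignments are evaluated by Monte Carlo simulation of uncertain impressions and click-through rates, and an assignment is kept only if it satisfies the budget chance constraint with the required probability. All distributions, costs, and thresholds are invented for illustration, and exhaustive enumeration replaces branch-and-bound.

```python
# Illustrative Monte Carlo evaluation of keyword-targeting decisions under a
# budget chance constraint; a toy stand-in for BB-KSM, with invented numbers.
import itertools
import numpy as np

rng = np.random.default_rng(0)
MATCH_TYPES = ["broad", "phrase", "exact"]

# Hypothetical per-(keyword, match type) parameters: (mean impressions, mean CTR).
keywords = {
    "running shoes": {"broad": (5000, 0.010), "phrase": (2000, 0.020), "exact": (600, 0.040)},
    "trail sneakers": {"broad": (3000, 0.008), "phrase": (1200, 0.018), "exact": (400, 0.035)},
}
CPC, VALUE, BUDGET, CONFIDENCE, N_SIM = 0.5, 2.0, 30.0, 0.9, 2000

def simulate(assignment):
    """Return simulated (profit, spend) samples for one keyword -> match-type assignment."""
    profit = np.zeros(N_SIM)
    spend = np.zeros(N_SIM)
    for kw, match in assignment.items():
        mean_imp, mean_ctr = keywords[kw][match]
        imps = rng.poisson(mean_imp, N_SIM)        # uncertain impressions
        clicks = rng.binomial(imps, mean_ctr)      # uncertain clicks via CTR
        spend += clicks * CPC
        profit += clicks * (VALUE - CPC)
    return profit, spend

best, best_profit = None, -np.inf
for matches in itertools.product(MATCH_TYPES, repeat=len(keywords)):
    assignment = dict(zip(keywords, matches))
    profit, spend = simulate(assignment)
    if (spend <= BUDGET).mean() >= CONFIDENCE:     # chance constraint on the budget
        if profit.mean() > best_profit:
            best, best_profit = assignment, profit.mean()

print(best, round(best_profit, 2))
```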


Uncertainty Quantification in Medical Image Segmentation with Multi-decoder U-Net

arXiv.org Artificial Intelligence

Accurate medical image segmentation is crucial for diagnosis and analysis. However, models without calibrated uncertainty estimates can lead to errors in downstream analysis and exhibit low robustness. Estimating the uncertainty in a measurement is vital to drawing definite, informed conclusions. In particular, it is difficult for both models and radiologists to make accurate predictions in ambiguous areas and at boundaries, and even harder to reach a consensus across multiple annotations. In this work, we study the uncertainty in these areas, which carries significant information about anatomical structure and is as important as segmentation performance. We approach uncertainty quantification in medical image segmentation by measuring segmentation performance against multiple annotations in a supervised learning manner, and propose a U-Net based architecture with multiple decoders, where the image representation is encoded by a shared encoder and the segmentation corresponding to each annotation is estimated by a separate decoder. In addition, a cross loss function is proposed to bridge the gap between the different branches. The proposed architecture is trained end-to-end and improves predictive uncertainty estimates. The model achieves performance comparable, with fewer parameters, to the integrated training model that ranked runner-up in the MICCAI-QUBIQ 2020 challenge.
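As a hedged sketch of the described design, assuming a toy encoder/decoder rather than a full U-Net: one shared encoder feeds several decoders (one per annotation), each supervised by its own annotation, while a pairwise consistency term over decoder outputs plays the role of the proposed cross loss. Module names, depths, and weights are illustrative, not the authors' code; the standard deviation across decoder outputs is used here as a simple per-pixel uncertainty map.

```python
# Minimal multi-decoder segmentation sketch (shared encoder, one decoder per annotation).
# Simplified stand-in for the paper's U-Net variant; architecture details are invented.
import itertools
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class TinyDecoder(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 1),             # per-pixel logits
        )
    def forward(self, z):
        return self.net(z)

class MultiDecoderSeg(nn.Module):
    def __init__(self, n_annotators=3, ch=16):
        super().__init__()
        self.encoder = TinyEncoder(ch)
        self.decoders = nn.ModuleList(TinyDecoder(ch) for _ in range(n_annotators))
    def forward(self, x):
        z = self.encoder(x)                  # shared image representation
        return [dec(z) for dec in self.decoders]

def multi_annotation_loss(logits_list, targets_list, cross_weight=0.1):
    bce = nn.BCEWithLogitsLoss()
    # Each decoder is supervised by its own annotator's mask.
    sup = sum(bce(lg, tg) for lg, tg in zip(logits_list, targets_list)) / len(logits_list)
    # "Cross" term: encourage pairwise consistency between decoder predictions.
    pairs = list(itertools.combinations(range(len(logits_list)), 2))
    cross = sum(torch.mean((torch.sigmoid(logits_list[i]) - torch.sigmoid(logits_list[j])) ** 2)
                for i, j in pairs) / max(len(pairs), 1)
    return sup + cross_weight * cross

# Example usage: disagreement across decoders gives a per-pixel uncertainty map.
model = MultiDecoderSeg()
x = torch.randn(2, 1, 64, 64)
preds = torch.stack([torch.sigmoid(p) for p in model(x)])
uncertainty = preds.std(dim=0)
```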