Jin, Ying
SPA: A Graph Spectral Alignment Perspective for Domain Adaptation
Xiao, Zhiqing, Wang, Haobo, Jin, Ying, Feng, Lei, Chen, Gang, Huang, Fei, Zhao, Junbo
Unsupervised domain adaptation (UDA) is a pivotal form in machine learning to extend the in-domain model to the distinctive target domains where the data distributions differ. Most prior works focus on capturing the inter-domain transferability but largely overlook rich intra-domain structures, which empirically results in even worse discriminability. In this work, we introduce a novel graph SPectral Alignment (SPA) framework to tackle the tradeoff. The core of our method is briefly condensed as follows: (i)-by casting the DA problem to graph primitives, SPA composes a coarse graph alignment mechanism with a novel spectral regularizer towards aligning the domain graphs in eigenspaces; (ii)-we further develop a fine-grained message propagation module -- upon a novel neighbor-aware self-training mechanism -- in order for enhanced discriminability in the target domain. On standardized benchmarks, the extensive experiments of SPA demonstrate that its performance has surpassed the existing cutting-edge DA methods. Coupled with dense model analysis, we conclude that our approach indeed possesses superior efficacy, robustness, discriminability, and transferability. Code and data are available at: https://github.com/CrownX/SPA.
Topological properties and organizing principles of semantic networks
Budel, Gabriel, Jin, Ying, Van Mieghem, Piet, Kitsak, Maksim
Interpreting natural language is an increasingly important task in computer algorithms due to the growing availability of unstructured textual data. Natural Language Processing (NLP) applications rely on semantic networks for structured knowledge representation. The fundamental properties of semantic networks must be taken into account when designing NLP algorithms, yet they remain to be structurally investigated. We study the properties of semantic networks from ConceptNet, defined by 7 semantic relations from 11 different languages. We find that semantic networks have universal basic properties: they are sparse, highly clustered, and many exhibit power-law degree distributions. Our findings show that the majority of the considered networks are scale-free. Some networks exhibit language-specific properties determined by grammatical rules, for example networks from highly inflected languages, such as e.g. Latin, German, French and Spanish, show peaks in the degree distribution that deviate from a power law. We find that depending on the semantic relation type and the language, the link formation in semantic networks is guided by different principles. In some networks the connections are similarity-based, while in others the connections are more complementarity-based. Finally, we demonstrate how knowledge of similarity and complementarity in semantic networks can improve NLP algorithms in missing link inference.
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers
Shi, Dachuan, Tao, Chaofan, Jin, Ying, Yang, Zhendong, Yuan, Chun, Wang, Jiaqi
Real-world data contains a vast amount of multimodal information, among which vision and language are the two most representative modalities. Moreover, increasingly heavier models, \textit{e}.\textit{g}., Transformers, have attracted the attention of researchers to model compression. However, how to compress multimodal models, especially vison-language Transformers, is still under-explored. This paper proposes the \textbf{U}nified and \textbf{P}r\textbf{o}gressive \textbf{P}runing (\textbf{\emph{UPop}}) as a universal vison-language Transformer compression framework, which incorporates 1) unifiedly searching multimodal subnets in a continuous optimization space from the original model, which enables automatic assignment of pruning ratios among compressible modalities and structures; 2) progressively searching and retraining the subnet, which maintains convergence between the search and retrain to attain higher compression ratios. Experiments on various tasks, datasets, and model architectures demonstrate the effectiveness and versatility of the proposed UPop framework. The code is available at https://github.com/sdc17/UPop.
Selection by Prediction with Conformal p-values
Jin, Ying, Candès, Emmanuel J.
Decision making or scientific discovery pipelines such as job hiring and drug discovery often involve multiple stages: before any resource-intensive step, there is often an initial screening that uses predictions from a machine learning model to shortlist a few candidates from a large pool. We study screening procedures that aim to select candidates whose unobserved outcomes exceed user-specified values. We develop a method that wraps around any prediction model to produce a subset of candidates while controlling the proportion of falsely selected units. Building upon the conformal inference framework, our method first constructs p-values that quantify the statistical evidence for large outcomes; it then determines the shortlist by comparing the p-values to a threshold introduced in the multiple testing literature. In many cases, the procedure selects candidates whose predictions are above a data-dependent threshold. Our theoretical guarantee holds under mild exchangeability conditions on the samples, generalizing existing results on multiple conformal p-values. We demonstrate the empirical performance of our method via simulations, and apply it to job hiring and drug discovery datasets.
Upper bounds on the Natarajan dimensions of some function classes
Jin, Ying
The Natarajan dimension is a fundamental tool for characterizing multi-class PAC learnability, generalizing the Vapnik-Chervonenkis (VC) dimension from binary to multi-class classification problems. This work establishes upper bounds on Natarajan dimensions for certain function classes, including (i) multi-class decision tree and random forests, and (ii) multi-class neural networks with binary, linear and ReLU activations. These results may be relevant for describing the performance of certain multi-class learning algorithms.
Policy learning "without'' overlap: Pessimism and generalized empirical Bernstein's inequality
Jin, Ying, Ren, Zhimei, Yang, Zhuoran, Wang, Zhaoran
This paper studies offline policy learning, which aims at utilizing observations collected a priori (from either fixed or adaptively evolving behavior policies) to learn the optimal individualized decision rule in a given class. Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics are lower bounded in the offline dataset. In other words, the performance of these methods depends on the worst-case propensity in the offline dataset. As one has no control over the data collection process, this assumption can be unrealistic in many situations, especially when the behavior policies are allowed to evolve over time with diminishing propensities. In this paper, we propose a new algorithm that optimizes lower confidence bounds (LCBs) -- instead of point estimates -- of the policy values. The LCBs are constructed by quantifying the estimation uncertainty of the augmented inverse propensity weighted (AIPW)-type estimators using knowledge of the behavior policies for collecting the offline data. Without assuming any uniform overlap condition, we establish a data-dependent upper bound for the suboptimality of our algorithm, which depends only on (i) the overlap for the optimal policy, and (ii) the complexity of the policy class. As an implication, for adaptively collected data, we ensure efficient policy learning as long as the propensities for optimal actions are lower bounded over time, while those for suboptimal ones are allowed to diminish arbitrarily fast. In our theoretical analysis, we develop a new self-normalized concentration inequality for IPW estimators, generalizing the well-known empirical Bernstein's inequality to unbounded and non-i.i.d. data.
Is Object Detection Necessary for Human-Object Interaction Recognition?
Jin, Ying, Chen, Yinpeng, Wang, Lijuan, Wang, Jianfeng, Yu, Pei, Liu, Zicheng, Hwang, Jenq-Neng
This paper revisits human-object interaction (HOI) recognition at image level without using supervisions of object location and human pose. We name it detection-free HOI recognition, in contrast to the existing detection-supervised approaches which rely on object and keypoint detections to achieve state of the art. With our method, not only the detection supervision is evitable, but superior performance can be achieved by properly using image-text pre-training (such as CLIP) and the proposed Log-Sum-Exp Sign (LSE-Sign) loss function. Specifically, using text embeddings of class labels to initialize the linear classifier is essential for leveraging the CLIP pre-trained image encoder. In addition, LSE-Sign loss facilitates learning from multiple labels on an imbalanced dataset by normalizing gradients over all classes in a softmax format. Surprisingly, our detection-free solution achieves 60.5 mAP on the HICO dataset, outperforming the detection-supervised state of the art by 13.4 mAP
Is Pessimism Provably Efficient for Offline RL?
Jin, Ying, Yang, Zhuoran, Wang, Zhaoran
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori. Due to the lack of further interactions with the environment, offline RL suffers from the insufficient coverage of the dataset, which eludes most existing theoretical analysis. In this paper, we propose a pessimistic variant of the value iteration algorithm (PEVI), which incorporates an uncertainty quantifier as the penalty function. Such a penalty function simply flips the sign of the bonus function for promoting exploration in online RL, which makes it easily implementable and compatible with general function approximators. Without assuming the sufficient coverage of the dataset, we establish a data-dependent upper bound on the suboptimality of PEVI for general Markov decision processes (MDPs). When specialized to linear MDPs, it matches the information-theoretic lower bound up to multiplicative factors of the dimension and horizon. In other words, pessimism is not only provably efficient but also minimax optimal. In particular, given the dataset, the learned policy serves as the ``best effort'' among all policies, as no other policies can do better. Our theoretical analysis identifies the critical role of pessimism in eliminating a notion of spurious correlation, which emerges from the ``irrelevant'' trajectories that are less covered by the dataset and not informative for the optimal policy.
Incorporating planning intelligence into deep learning: A planning support tool for street network design
Fang, Zhou, Jin, Ying, Yang, Tianren
With the emergence of deep learning techniques, procedural and example-based modeling have been increasingly applied to support automatic content generation and visualization for planning decisions (Hartmann et al., 2017). Procedural modeling relies on manually designated rule sets to produce proposals. Parish and Müller (2001) made one of the first attempts to generate three-dimensional city models for visualization using procedural approaches, where a Lindenmayer system was used to grow road networks and buildings conditioned on global goals and local constraints. Given an initial and a final road point, Galin et al. (2010) developed a cost minimization function to automate path creation, considering the slope of the terrain and natural obstacles. The function was then extended to generate hierarchical road networks between towns at a regional level (Galin et al., 2011). Similar procedural principles can also be applied to allocate land use, subdivide blocks and generate buildings (see, e.g., Chen et al., 2008; Lyu et al., 2015). In comparison, example-based approaches learn from real-world cases in a preprocessing step to extract features and adopt them as templates. Hartmann et al. (2017) developed an automatic road generation tool, StreetGAN, using a generative adversarial network (GAN) to synthesize street networks in a fix-sized region that can maintain the consistency of urban layouts learned from the training data set. Similarly, Kempinska and Murcio (2019) trained Variational Autoencoders (VAEs) using images of street networks derived from OpenStreetMap to capture urban configurations using lowdimensional vectors and generating new street networks by controlling the encoded vectors.
DeepStreet: A deep learning powered urban street network generation module
Fang, Zhou, Yang, Tianren, Jin, Ying
In countries experiencing unprecedented waves of urbanization, there is a need for rapid and high-quality urban street design. Our study presents a novel deep learning powered approach, DeepStreet (DS), for automatic street network generation that can be applied to the urban street design with local characteristics. DS is driven by a Convolutional Neural Network (CNN) that enables the interpolation of streets based on the areas of immediate vicinity. Specifically, the CNN is firstly trained to detect, recognize and capture the local features as well as the patterns of the existing street network sourced from the OpenStreetMap. With the trained CNN, DS is able to predict street networks' future expansion patterns within the predefined region conditioned on its surrounding street networks. To test the performance of DS, we apply it to an area in and around the Eixample area in the City of Barcelona, a well-known example in the fields of urban and transport planning with iconic grid-like street networks in the centre and irregular road alignments farther afield. The results show that DS can (1) detect and self-cluster different types of complex street patterns in Barcelona; (2) predict both gridiron and irregular street and road networks. DS proves to have a great potential as a novel tool for designers to efficiently design the urban street network that well maintains the consistency across the existing and newly generated urban street network. Furthermore, the generated networks can serve as a benchmark to guide the local plan-making especially in rapidly-developing cities. Keywords: Urban street network, machine learning, deep learning, Convolutional Neural Network (CNN), Generative Adversarial Network (GAN), image completion, image inpainting