What Shape Is Optimal for Masks in Text Removal?

Nakada, Hyakka, Kubota, Marika

arXiv.org Artificial Intelligence

The advent of generative models has dramatically improved the accuracy of image inpainting. In particular, removing specific text from document images and reconstructing the original image is extremely important for industrial applications. However, most existing text-removal methods focus on deleting simple scene text that appears in images captured by a camera in an outdoor environment. There is little research dedicated to complex, practical images with dense text. Therefore, we created benchmark data for text removal from images that contain a large amount of text. From the data, we found that text-removal performance is sensitive to perturbations of the mask profile. Thus, for practical text-removal tasks, precise tuning of the mask shape is essential. This study developed a method to model highly flexible mask profiles and learn their parameters using Bayesian optimization. The resulting profiles were found to be character-wise masks. It was also found that the minimum cover of a text region is not optimal. Our research is expected to pave the way for a user-friendly guideline for manual masking.



A Qualitative comparison for ablation study

Neural Information Processing Systems

The results confirm that the post-processing helps to improve the resolution of the attribution. We provide a simple implementation of our algorithm in Python. In this section, we provide an ablation study on (1) the usage of ReLU and (2) WC/EPC masks. To achieve better performance in both metrics, we suggest using both masks. We also provide a quantitative evaluation on different attribution methods.


Test-time GNN Model Evaluation on Dynamic Graphs

Li, Bo, Zheng, Xin, Jin, Ming, Wang, Can, Pan, Shirui

arXiv.org Artificial Intelligence

Dynamic graph neural networks (DGNNs) have emerged as a leading paradigm for learning from dynamic graphs, which are commonly used to model real-world systems and applications. However, due to the evolving nature of dynamic graph data distributions over time, well-trained DGNNs often face significant performance uncertainty when inferring on unseen and unlabeled test graphs in practical deployment. In this case, evaluating the performance of deployed DGNNs at test time is crucial to determine whether a well-trained DGNN is suited for inference on an unseen dynamic test graph. In this work, we introduce a new research problem: DGNN model evaluation, which aims to assess the performance of a specific DGNN model trained on observed dynamic graphs by estimating its performance on unseen dynamic graphs during test time. Specifically, we propose a Dynamic Graph neural network Evaluator, dubbed DyGEval, to address this new problem. The proposed DyGEval involves a two-stage framework: (1) test-time dynamic graph simulation, which captures the training-test distributional differences as supervision signals and trains an evaluator; and (2) DyGEval development and training, which accurately estimates the performance of the well-trained DGNN model on the test-time dynamic graphs. Extensive experiments demonstrate that the proposed DyGEval serves as an effective evaluator for assessing various DGNN backbones across different dynamic graphs under distribution shifts.


Hessian-guided Perturbed Wasserstein Gradient Flows for Escaping Saddle Points

Yamamoto, Naoya, Kim, Juno, Suzuki, Taiji

arXiv.org Machine Learning

Wasserstein gradient flow (WGF) is a common method to perform optimization over the space of probability measures. While WGF is guaranteed to converge to a first-order stationary point, for nonconvex functionals the converged solution does not necessarily satisfy the second-order optimality condition; i.e., it could converge to a saddle point. In this work, we propose a new algorithm for probability measure optimization, perturbed Wasserstein gradient flow (PWGF), that achieves second-order optimality for general nonconvex objectives. PWGF enhances WGF by injecting noisy perturbations near saddle points via a Gaussian process-based scheme. By pushing the measure forward along a random vector field generated from a Gaussian process, PWGF helps the solution escape saddle points efficiently by perturbing the solution towards the smallest eigenvalue direction of the Wasserstein Hessian. We theoretically derive the computational complexity for PWGF to achieve a second-order stationary point. Furthermore, we prove that PWGF converges to a global optimum in polynomial time for strictly benign objectives.





Optimal Multi-Objective Best Arm Identification with Fixed Confidence

Chen, Zhirui, Karthik, P. N., Chee, Yeow Meng, Tan, Vincent Y. F.

arXiv.org Machine Learning

We consider a multi-armed bandit setting with finitely many arms, in which each arm yields an $M$-dimensional vector reward upon selection. We assume that the reward of each dimension (a.k.a. objective) is generated independently of the others. The best arm of any given objective is the arm with the largest component of mean corresponding to the objective. The end goal is to identify the best arm of every objective in the shortest (expected) time subject to an upper bound on the probability of error (i.e., fixed-confidence regime). We establish a problem-dependent lower bound on the limiting growth rate of the expected stopping time, in the limit of vanishing error probabilities. This lower bound, we show, is characterised by a max-min optimisation problem that is computationally expensive to solve at each time step. We propose an algorithm that uses the novel idea of surrogate proportions to sample the arms at each time step, eliminating the need to solve the max-min optimisation problem at each step. We demonstrate theoretically that our algorithm is asymptotically optimal. In addition, we provide extensive empirical studies to substantiate the efficiency of our algorithm. While existing works on pure exploration with multi-objective multi-armed bandits predominantly focus on Pareto frontier identification, our work fills the gap in the literature by conducting a formal investigation of the multi-objective best arm identification problem.
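
To make the problem statement above concrete, the following is a minimal sketch of what "the best arm of every objective" means. It is not the surrogate-proportions algorithm from the abstract: it uses plain uniform sampling with a fixed number of pulls, and the names `best_arms` and `uniform_sampling_estimate`, the Gaussian noise model, and the example means are all assumptions made for illustration.

```python
import numpy as np

def best_arms(means):
    """For each objective (column), return the index of the arm (row) with the largest mean."""
    return np.argmax(np.asarray(means, dtype=float), axis=0)

def uniform_sampling_estimate(true_means, pulls_per_arm, noise_std=1.0, seed=0):
    """Pull every arm the same number of times and return the empirical best arm per objective.

    Illustrative baseline only: a fixed budget and uniform allocation, with no
    stopping rule and no confidence guarantee.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n_arms, n_objectives = true_means.shape
    # Each pull of an arm yields an M-dimensional reward with independent noise per objective.
    noise = noise_std * rng.standard_normal((pulls_per_arm, n_arms, n_objectives))
    empirical_means = (true_means + noise).mean(axis=0)
    return best_arms(empirical_means)

# Hypothetical instance: 3 arms (rows), 2 objectives (columns).
true_means = [[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5]]

true_best = best_arms(true_means)          # arm 0 for objective 0, arm 1 for objective 1
estimated_best = uniform_sampling_estimate(true_means, pulls_per_arm=2000)
print(true_best, estimated_best)
```

Uniform allocation spends the same number of pulls on arms that are clearly suboptimal as on arms that are hard to separate, which is the inefficiency an adaptive sampling rule is meant to avoid.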


Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization

Kawaharazuka, Kento, Obinata, Yoshiki, Kanazawa, Naoaki, Okada, Kei, Inaba, Masayuki

arXiv.org Artificial Intelligence

State recognition of the environment and objects, such as the open/closed state of doors and the on/off state of lights, is indispensable for robots that perform daily life support and security tasks. Until now, state recognition methods have been based on training neural networks from manual annotations, preparing special sensors for the recognition, or manually programming feature extraction from point clouds or raw images. In contrast, we propose a robotic state recognition method using a pre-trained vision-language model, which is capable of Image-to-Text Retrieval (ITR) tasks. We prepare several kinds of language prompts in advance, calculate the similarity between these prompts and the current image by ITR, and perform state recognition. By applying the optimal weighting to each prompt using black-box optimization, state recognition can be performed with higher accuracy. Experiments show that this method enables a variety of state recognitions by simply preparing multiple prompts, without retraining neural networks or manual programming. In addition, since only prompts and their weights need to be prepared for each recognizer, there is no need to prepare multiple models, which facilitates resource management. Through language, it is possible to recognize the open/closed state of transparent doors, whether water is running from a faucet, and even the qualitative state of whether a kitchen is clean, all of which have been challenging so far.
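
The prompt-weighting step described above can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the image-to-prompt similarities are assumed to be precomputed by some vision-language model, the decision rule (weighted sum compared to a threshold) is a guess at one reasonable form, and plain random search stands in for whatever black-box optimizer the paper uses. All function names and the toy numbers are hypothetical.

```python
import numpy as np

def predict_state(similarities, weights, threshold=0.0):
    """Binary state per image: True when the weighted sum of prompt similarities exceeds the threshold."""
    return (np.asarray(similarities, dtype=float) @ np.asarray(weights, dtype=float)) > threshold

def accuracy(similarities, labels, weights):
    """Fraction of images whose predicted state matches the label."""
    predictions = predict_state(similarities, weights)
    return float(np.mean(predictions == np.asarray(labels, dtype=bool)))

def random_search_weights(similarities, labels, n_trials=500, seed=0):
    """Black-box search over prompt weights; random search is an assumed stand-in optimizer."""
    rng = np.random.default_rng(seed)
    n_prompts = np.asarray(similarities).shape[1]
    best_w = np.ones(n_prompts)                      # baseline: equal weights
    best_acc = accuracy(similarities, labels, best_w)
    for _ in range(n_trials):
        w = rng.uniform(-1.0, 1.0, size=n_prompts)   # candidate weights, only scored, never differentiated
        acc = accuracy(similarities, labels, w)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc

# Toy data: 4 images x 2 prompts (say, "the door is open" and "the door is closed").
similarities = [[0.8, 0.2],
                [0.7, 0.3],
                [0.2, 0.9],
                [0.3, 0.6]]
labels = [1, 1, 0, 0]  # 1 = open, 0 = closed

baseline_acc = accuracy(similarities, labels, np.ones(2))
best_w, best_acc = random_search_weights(similarities, labels)
print(baseline_acc, best_acc, best_w)
```

Because the optimizer only needs an accuracy score for each candidate weight vector, the vision-language model itself is never retrained, which matches the abstract's point that only prompts and their weights are stored per recognizer.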


An evolutionary approach for discovering non-Gaussian stochastic dynamical systems based on nonlocal Kramers-Moyal formulas

Li, Yang, Xu, Shengyuan, Duan, Jinqiao

arXiv.org Machine Learning

Discovering explicit governing equations of stochastic dynamical systems with both (Gaussian) Brownian noise and (non-Gaussian) Lévy noise from data is challenging due to possibly intricate functional forms and the inherent complexity of Lévy motion. This research develops an evolutionary symbol sparse regression (ESSR) approach to extract non-Gaussian stochastic dynamical systems from sample path data, based on nonlocal Kramers-Moyal formulas, genetic programming, and sparse regression. More specifically, genetic programming is employed to generate a diverse array of candidate functions, the sparse regression technique learns the coefficients associated with these candidates, and the nonlocal Kramers-Moyal formulas serve as the foundation for constructing the fitness measure in genetic programming and the loss function in sparse regression. The efficacy and capabilities of this approach are showcased through its application to several illustrative models. This approach stands out as a potent instrument for deciphering non-Gaussian stochastic dynamics from available datasets, indicating a wide range of applications across different fields.