South America
RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation
Vasilev, Viacheslav, Agafonova, Julia, Gerasimenko, Nikolai, Kapitanov, Alexander, Mikhailova, Polina, Mironova, Evelina, Dimitrov, Denis
Text-to-image generation models have gained popularity among users around the world. However, many of these models exhibit a strong bias toward English-speaking cultures, ignoring or misrepresenting the unique characteristics of other language groups, countries, and nationalities. This lack of cultural awareness can reduce generation quality and lead to undesirable consequences such as unintentional insults and the spread of prejudice. In contrast to the field of natural language processing, cultural awareness in computer vision has not been explored as extensively. In this paper, we strive to reduce this gap. We propose RusCode, a benchmark for evaluating the quality of text-to-image generation containing elements of the Russian cultural code. To do this, we form a list of 19 categories that best represent the features of Russian visual culture. Our final dataset consists of 1250 text prompts in Russian and their translations into English. The prompts cover a wide range of topics, including complex concepts from art, popular culture, folk traditions, famous people's names, natural objects, scientific achievements, etc. We present the results of a human evaluation based on side-by-side comparison of how popular generative models represent Russian visual concepts.
Target-Augmented Shared Fusion-based Multimodal Sarcasm Explanation Generation
Goel, Palaash, Chauhan, Dushyant Singh, Akhtar, Md Shad
Sarcasm is a linguistic phenomenon that intends to ridicule a target (e.g., entity, event, or person) in an inherent way. Multimodal Sarcasm Explanation (MuSE) aims at revealing the intended irony in a sarcastic post using a natural language explanation. Existing systems, however, overlook the significance of the target of sarcasm in generating explanations. In this paper, we propose a Target-aUgmented shaRed fusion-Based sarcasm explanatiOn model, a.k.a. TURBO. We design a novel shared-fusion mechanism to leverage the inter-modality relationships between an image and its caption. TURBO assumes the target of the sarcasm and guides the multimodal shared fusion mechanism in learning intricacies of the intended irony for explanations. We evaluate our proposed TURBO model on the MORE+ dataset. Comparison against multiple baselines and state-of-the-art models shows that TURBO improves performance by an average margin of $+3.3\%$. Moreover, we explore LLMs in zero-shot and one-shot settings for our task and observe that LLM-generated explanations, though remarkable, often fail to capture the critical nuances of the sarcasm. Furthermore, we supplement our study with extensive human evaluation of TURBO's generated explanations and find them to be comparatively better than those of other systems.
Optimistic Interior Point Methods for Sequential Hypothesis Testing by Betting
The technique of "testing by betting" frames nonparametric sequential hypothesis testing as a multiple-round game, where a player bets on future observations that arrive in a streaming fashion, accumulates wealth that quantifies evidence against the null hypothesis, and rejects the null once the wealth exceeds a specified threshold while controlling the false positive error. Designing an online learning algorithm that achieves a small regret in the game can help rapidly accumulate the bettor's wealth, which in turn can shorten the time to reject the null hypothesis under the alternative $H_1$. However, many of the existing works employ the Online Newton Step (ONS) to update within a halved decision space to avoid a gradient explosion issue, which is potentially conservative for rapid wealth accumulation. In this paper, we introduce a novel strategy utilizing interior-point methods in optimization that allows updates across the entire interior of the decision space without the risk of gradient explosion. Our approach not only maintains strong statistical guarantees but also facilitates faster null hypothesis rejection in critical scenarios, overcoming the limitations of existing approaches.
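The betting game described above can be illustrated with a minimal sketch. It shows the baseline that the abstract contrasts against (an Online Newton Step bettor restricted to a halved decision space), not the interior-point strategy the paper proposes. The assumptions here are ours: payoffs $g_t \in [-1, 1]$, a null hypothesis $H_0$ that the payoffs have conditional mean zero, and the hypothetical function name `betting_test`.

```python
import math

def betting_test(payoffs, alpha=0.05):
    """Sequentially test H0: E[g_t | past] = 0 for payoffs g_t in [-1, 1].

    Returns the 1-based round at which H0 is rejected, or None if the
    wealth never reaches the threshold 1 / alpha.
    """
    c = 2.0 / (2.0 - math.log(3.0))   # commonly used ONS step-size constant
    wealth, lam, a = 1.0, 0.0, 1.0    # initial wealth, bet fraction, curvature sum
    for t, g in enumerate(payoffs, start=1):
        wealth *= 1.0 + lam * g       # wealth update with the bet chosen before seeing g
        if wealth >= 1.0 / alpha:     # Ville's inequality bounds the false positive rate by alpha
            return t
        z = -g / (1.0 + lam * g)      # gradient of the loss -log(1 + lam * g)
        a += z * z
        # ONS step, clipped to the halved decision space [-1/2, 1/2]
        lam = max(-0.5, min(0.5, lam - c * z / a))
    return None
```

Because the bet fraction is clipped to $[-1/2, 1/2]$, the factor $1 + \lambda_t g_t$ stays at least $1/2$ and the gradient stays bounded; the price is that the wealth can grow by at most a factor of $1.5$ per round, which is the conservativeness the abstract refers to.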
Coupling Agent-Based Simulations and VR universes: the case of GAMA and Unity
Drogoul, Alexis, Taillandier, Patrick, Brugière, Arthur, Martinez, Louis, Sillano, Léon, Lesquoy, Baptiste, Nghi, Huynh Quang
Agent-based models (ABMs) and video games, including those taking advantage of virtual reality (VR), have undergone a remarkable parallel evolution, achieving impressive levels of complexity and sophistication. This paper argues that while ABMs prioritize scientific analysis and understanding and VR aims for immersive entertainment, they both simulate artificial worlds and can benefit from closer integration. Coupling both approaches indeed opens interesting possibilities for research and development in various fields, and in particular education, at the heart of the SIMPLE project, an EU-funded project on the development of digital tools for awareness raising on environmental issues. However, existing tools often present limitations, including technical complexity, limited functionalities, and lack of interoperability. To address these challenges, we introduce a novel framework for linking GAMA, a popular ABM platform, with Unity, a widely used game engine. This framework enables seamless data exchange, real-time visualization, and user interaction within VR environments, allowing researchers to leverage the strengths of both ABMs and VR for more impactful and engaging simulations. We demonstrate the capabilities of our framework through two prototypes built to highlight its potential in representing and interacting with complex socio-environmental system models. We conclude by emphasizing the importance of continued collaboration between the ABM and VR communities to develop robust, user-friendly tools, paving the way for a new era of collaborative research and immersive experiences in simulations.
Exoplanet Transit Candidate Identification in TESS Full-Frame Images via a Transformer-Based Algorithm
Salinas, Helem, Brahm, Rafael, Olmschenk, Greg, Barry, Richard K., Pichara, Karim, Silva, Stela Ishitani, Araujo, Vladimir
The Transiting Exoplanet Survey Satellite (TESS) is surveying a large fraction of the sky, generating a vast database of photometric time series data that requires thorough analysis to identify exoplanetary transit signals. Automated learning approaches have been successfully applied to identify transit signals. However, most existing methods focus on the classification and validation of candidates, while few efforts have explored new techniques for the search of candidates. To search for new exoplanet transit candidates, we propose an approach to identify exoplanet transit signals without the need for phase folding or assuming periodicity in the transit signals, such as those observed in multi-transit light curves. To achieve this, we implement a new neural network inspired by Transformers to directly process Full Frame Image (FFI) light curves to detect exoplanet transits. Transformers, originally developed for natural language processing, have recently demonstrated significant success in capturing long-range dependencies compared to previous approaches focused on sequential data. This ability allows us to employ multi-head self-attention to identify exoplanet transit signals directly from the complete light curves, combined with background and centroid time series, without requiring prior transit parameters. The network is trained to learn characteristics of the transit signal, like the dip shape, which helps distinguish planetary transits from other variability sources. Our model successfully identified 214 new planetary system candidates, including 122 multi-transit light curves, 88 single-transit and 4 multi-planet systems from TESS sectors 1-26 with a radius > 0.27 $R_{\mathrm{Jupiter}}$, demonstrating its ability to detect transits regardless of their periodicity.
O1 Embedder: Let Retrievers Think Before Action
Yan, Ruiran, Liu, Zheng, Lian, Defu
The growing power of large language models (LLMs) has revolutionized how people access and utilize information. Notably, LLMs excel at performing fine-grained data representation, which facilitates precise retrieval of information. They also generate high-quality answers based on external references, enabling the production of useful knowledge. The recent introduction of reasoning models, like OpenAI O1 and DeepSeek R1, marks another leap forward, highlighting LLMs' ability to think progressively before delivering final answers. This breakthrough significantly improves the ability to address complex tasks, e.g., coding and math proofs. Inspired by this progress, we aim to develop similar capabilities for retrieval models, which hold great promise for tackling critical challenges in the field, including multi-task retrieval, zero-shot retrieval, and tasks requiring intensive reasoning about complex relationships. With this motivation, we propose a novel approach called O1 Embedder, which generates useful thoughts for the input query before retrieving the target documents. To realize this objective, we overcome two technical difficulties. First, we design a data synthesis workflow, creating training signals for O1 Embedder by generating initial thoughts from an LLM-expert and subsequently refining them using a retrieval committee. Second, we optimize the training process, enabling a pre-trained model to be jointly fine-tuned to generate retrieval thoughts via behavior cloning and perform dense retrieval through contrastive learning. Our approach is evaluated by comprehensive experiments, where substantial improvements are achieved across 12 popular datasets, spanning both in-domain and out-of-domain scenarios. These results highlight O1 Embedder's remarkable accuracy and generalizability, paving the way for the development of next-generation IR foundation models.
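The contrastive-learning half of the training objective can be pictured with a generic InfoNCE-style loss. The abstract does not give the exact loss, so the form below, the temperature value, and the names `dot` and `info_nce` are all assumptions made for illustration; the behavior-cloning term for thought generation is not shown.

```python
import math

def dot(u, v):
    """Inner product of two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))

def info_nce(query, docs, positive_index, temperature=0.05):
    """InfoNCE loss for one query against a list of candidate documents.

    The loss is -log softmax(similarities / temperature)[positive_index],
    computed with the log-sum-exp trick for numerical stability.
    """
    logits = [dot(query, d) / temperature for d in docs]
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[positive_index]
```

The loss is close to zero when the query embedding is far more similar to the positive document than to the other candidates, and grows as a negative document becomes the closest match.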
The Combined Problem of Online Task Assignment and Lifelong Path Finding in Logistics Warehouses: A Case Study
Zhu, Fengming, Lin, Fangzhen, Xu, Weijia, Guo, Yifei
We study the combined problem of online task assignment and lifelong path finding, which is crucial for the logistics industry. However, most literature either (1) focuses on lifelong path finding assuming a given task assigner, or (2) studies the offline version of this problem where tasks are known in advance. We argue that, to maximize the system throughput, the online version that integrates these two components should be tackled directly. To this end, we introduce a formal framework of the combined problem and its solution concept. Then, we design a rule-based lifelong planner under a practical robot model that works well even in environments with severe local congestion. Building on that, we automate the search for the task assigner with respect to the underlying path planner. Simulation experiments conducted in warehouse scenarios at \textit{Meituan}, one of the largest shopping platforms in China, demonstrate that (a)~\textit{in terms of time efficiency}, our system requires only 83.77\% of the execution time needed for the currently deployed system at Meituan, outperforming other SOTA algorithms by 8.09\%; (b)~\textit{in terms of economic efficiency}, ours can achieve the same throughput with only 60\% of the agents currently in use.
Efficient Sparsification of Simplicial Complexes via Local Densities of States
Savostianov, Anton, Schaub, Michael T., Guglielmi, Nicola, Tudisco, Francesco
Simplicial complexes (SCs), a generalization of graph models for relational data that account for higher-order relations between data items, have become a popular abstraction for analyzing complex data using tools from topological data analysis or topological signal processing. However, the analysis of many real-world datasets leads to dense SCs with a large number of higher-order interactions. Unfortunately, analyzing such large SCs often has a prohibitive cost in terms of computation time and memory consumption. The sparsification of such complexes, i.e., the approximation of an original SC with a sparser simplicial complex with only a log-linear number of high-order simplices while maintaining a spectrum close to the original SC, is of broad interest. In this work, we develop a novel method for the probabilistic sparsification of SCs. At its core lies the efficient computation of the sparsifying sampling probability through local densities of states as functional descriptors of the spectral information. To avoid pathological structures in the spectrum of the corresponding Hodge Laplacian operators, we suggest a "kernel-ignoring" decomposition for approximating the sampling probability; additionally, we exploit error estimates to show asymptotically prevailing algorithmic complexity of the developed method. The performance of the framework is demonstrated on the family of Vietoris--Rips filtered simplicial complexes.
Optimality in importance sampling: a gentle survey
Llorente, Fernando, Martino, Luca
Monte Carlo (MC) methods are powerful tools for numerical inference and optimization, widely employed in statistics, signal processing, and machine learning (Liu, 2004; Robert and Casella, 2004). They are mainly used for approximating the solution of definite integrals and, by extension, of differential equations (for this reason, MC schemes can be considered stochastic quadrature rules). Although exact analytical solutions to integrals are always desirable, such unicorns are rarely available, especially in real-world systems. Many applications inevitably require the approximation of intractable integrals. Specifically, Bayesian methods need the computation of expectations with respect to a posterior probability density function (pdf), which are generally analytically intractable (Gelman et al., 2013). MC methods can be divided into four main families: direct methods (based on transformations of random variables), accept-reject techniques, Markov chain Monte Carlo (MCMC) algorithms, and importance sampling (IS) schemes (Luengo et al., 2020; Martino et al., 2018). The last two families are the most popular owing to the ease and universality of their application (Liang et al., 2010; Liu, 2004; Robert and Casella, 2004). All MC methods require the choice of a suitable proposal density, which is crucial for their performance (Luengo et al., 2020; Robert and Casella, 2004).
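As a concrete picture of importance sampling, the sketch below estimates $E_p[f(x)]$ by drawing from a proposal $q$ and weighting each draw by $p(x)/q(x)$. The target (a standard normal), the wider normal proposal, the test function $f(x) = x^2$, and all function names are illustrative assumptions, chosen because the exact answer is known to be 1.

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def importance_sampling(f, target_pdf, proposal_pdf, proposal_sampler, n):
    """Estimate E_target[f(x)] using n draws from the proposal."""
    total = 0.0
    for _ in range(n):
        x = proposal_sampler()
        w = target_pdf(x) / proposal_pdf(x)   # importance weight p(x) / q(x)
        total += w * f(x)
    return total / n

rng = random.Random(0)                         # fixed seed for reproducibility
estimate = importance_sampling(
    f=lambda x: x * x,                         # E_p[x^2] = 1 for p = N(0, 1)
    target_pdf=normal_pdf,
    proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    proposal_sampler=lambda: rng.gauss(0.0, 2.0),
    n=20000,
)
```

The proposal here has heavier tails than the target, which keeps the weights bounded. A proposal narrower than the target would leave the estimator unbiased but could inflate its variance considerably, which is why the choice of proposal matters so much.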
Data Assetization via Resources-decoupled Federated Learning
Zhao, Jianzhe, Zhu, Feida, He, Lingyan, Tang, Zixin, Gao, Mingce, Yang, Shiyu, Guo, Guibing
With the development of the digital economy, data is increasingly recognized as an essential resource for both work and life. However, due to privacy concerns, data owners tend to maximize the value of data through the circulation of information rather than direct data transfer. Federated learning (FL) provides an effective approach to collaboratively training models while preserving privacy. However, as model parameters and training data grow, there are not only real differences in data resources between different data owners, but also mismatches between data and computing resources. These challenges lead to inadequate collaboration among data owners, compute centers, and model owners, reducing the global utility of the three parties and the effectiveness of data assetization. In this work, we first propose a framework for resource-decoupled FL involving three parties. Then, we design a Tripartite Stackelberg Model and theoretically analyze the Stackelberg-Nash equilibrium (SNE) for participants to optimize global utility. Next, we propose the Quality-aware Dynamic Resources-decoupled FL algorithm (QD-RDFL), in which we derive and solve the optimal strategies of all parties to achieve SNE using backward induction. We also design a dynamic optimization mechanism to improve the optimal strategy profile by evaluating the contribution of data quality from data owners to the global model during real training. Finally, our extensive experiments demonstrate that our method effectively encourages the linkage of the three parties involved, maximizing the global utility and value of data assets.
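Backward induction in a Stackelberg game can be shown on a toy two-party example, which is much simpler than the tripartite model in the paper and is not taken from it. The utilities below are hypothetical: a leader (say, a model owner) posts a payment rate, and a follower (a data owner) supplies a quantity $q$ at quadratic cost. The follower's best response is solved first, then substituted into the leader's problem.

```python
def follower_best_response(rate, cost):
    """Follower maximizes rate * q - 0.5 * cost * q**2, giving q = rate / cost."""
    return rate / cost

def leader_utility(rate, value, cost):
    """Leader earns (value - rate) per unit, anticipating the follower's response."""
    q = follower_best_response(rate, cost)
    return (value - rate) * q

def solve_leader(value, cost, steps=1000):
    """Grid search over payment rates in [0, value] for the leader's optimum."""
    candidates = [value * i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda r: leader_utility(r, value, cost))

best_rate = solve_leader(value=10.0, cost=2.0)
best_quantity = follower_best_response(best_rate, cost=2.0)
```

For these utilities the leader's objective is $(v - r)\,r / c$, which is maximized at $r = v/2$, so the grid search should land on half of `value`.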