cascade model
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > United States > California (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Hong Kong (0.04)
- Research Report (0.34)
- Overview (0.34)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.47)
- North America > Canada > Alberta (0.14)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.69)
- Information Technology > Security & Privacy (1.00)
- Government > Military (1.00)
- Education > Educational Setting > Online (0.41)
3eb65004054f5d21fca4087f5658c727-AuthorFeedback.pdf
Thanks for the insightful and helpful reviews, which will significantly improve our paper. R1, R2, and R3 indicate which reviewer raised each concern. Ground truth is shown in red, predictions in blue, and the predicted eye-gaze point of the gaze-based model in green. SVMo bridges global and local context both spatially (e.g., whole frame vs. anchor […]) Other contributions include exhaustive experiments that may be useful for future studies. In (c), the predicted gaze falls on the intersection of three objects, slightly closer to the center of the rabbit.
PRIM: Towards Practical In-Image Multilingual Machine Translation
Tian, Yanzhi, Liu, Zeming, Liu, Zhengyang, Feng, Chong, Li, Xin, Huang, Heyan, Guo, Yuhang
In-Image Machine Translation (IIMT) aims to translate images containing text from one language to another. Current research on end-to-end IIMT is mainly conducted on synthetic data with simple backgrounds, a single font, fixed text positions, and bilingual translation, which cannot fully reflect the real world and causes a significant gap between research and practical conditions. To facilitate research on IIMT in real-world scenarios, we explore Practical In-Image Multilingual Machine Translation (IIMMT). To address the lack of publicly available data, we annotate the PRIM dataset, which contains real-world captured one-line text images with complex backgrounds, various fonts, and diverse text positions, and supports multilingual translation directions. We propose an end-to-end model, VisTrans, to handle the challenges of practical conditions in PRIM; it processes the visual text and the background information in the image separately, ensuring multilingual translation capability while improving visual quality. Experimental results indicate that VisTrans achieves better translation quality and visual effects than other models. The code and dataset are available at: https://github.com/BITHLP/PRIM.
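The abstract does not detail how VisTrans recombines the separately processed text and background, so the following is only a generic illustration of that idea: a minimal sketch of a final recomposition step, assuming the pipeline has already produced a text-free background, a rendered target-language text layer, and a per-pixel alpha mask. The function `composite_text_layer` and the grayscale list-of-lists image type are hypothetical and are not taken from the PRIM code.

```python
from typing import List

# Hypothetical image type: grayscale pixels in [0.0, 1.0], row-major.
Image = List[List[float]]


def composite_text_layer(background: Image, text: Image, alpha: Image) -> Image:
    """Alpha-blend a rendered text layer over a text-free background.

    Illustrative sketch only (not VisTrans itself):
    out = alpha * text + (1 - alpha) * background, applied per pixel.
    All three images must have the same shape.
    """
    if not (len(background) == len(text) == len(alpha)):
        raise ValueError("images must have the same number of rows")
    out: Image = []
    for bg_row, tx_row, a_row in zip(background, text, alpha):
        if not (len(bg_row) == len(tx_row) == len(a_row)):
            raise ValueError("images must have the same number of columns")
        # Where alpha is 1 the glyph pixel wins; where it is 0 the background shows through.
        out.append([a * t + (1.0 - a) * b for b, t, a in zip(bg_row, tx_row, a_row)])
    return out


if __name__ == "__main__":
    bg = [[0.8, 0.8], [0.8, 0.8]]    # light, text-free background
    txt = [[0.0, 0.0], [0.0, 0.0]]   # black rendered glyphs
    mask = [[1.0, 0.0], [0.5, 0.0]]  # glyph coverage, with a soft edge at 0.5
    print(composite_text_layer(bg, txt, mask))
```

Keeping the background and the text as separate layers means the translated text can be re-rendered (e.g., for a different target language) without touching the background pixels. That is one plausible reading of why separating them helps visual quality.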
- Europe > Austria > Vienna (0.15)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
From Dark Matter to Galaxies with Convolutional Neural Networks
Yip, Jacky H. T., Zhang, Xinyue, Wang, Yanfang, Zhang, Wei, Sun, Yueqiu, Contardo, Gabriella, Villaescusa-Navarro, Francisco, He, Siyu, Genel, Shy, Ho, Shirley
Cosmological simulations play an important role in the interpretation of astronomical data, in particular in comparing observed data to our theoretical expectations. However, to compare data with these simulations, the simulations in principle need to include gravity, magnetohydrodynamics, radiative transfer, etc. These ideal large-volume (gravo-magneto-hydrodynamical) simulations are extremely computationally expensive and can cost tens of millions of CPU hours to run. In this paper, we propose a deep learning approach to map from the dark-matter-only simulation (computationally cheaper) to the galaxy distribution (from the much costlier cosmological simulation). The main challenge of this task is the high sparsity of the target galaxy distribution: most of the space is empty. We propose a cascade architecture composed of a classification filter followed by a regression procedure. We show that our results outperform a state-of-the-art model used in the astronomical community and provide a good trade-off between computational cost and prediction accuracy.
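The classification-then-regression cascade can be sketched as follows, assuming per-voxel features derived from the dark-matter field and two already-trained models. The names `cascade_predict`, `toy_classifier`, and `toy_regressor`, the 0.5 threshold, and the single scalar "density" feature are hypothetical. The paper's stages are convolutional networks, and this pure-Python sketch only shows how the two stages are chained.

```python
from typing import Callable, List, Sequence

Features = Sequence[float]  # hypothetical per-voxel features from the dark-matter field


def cascade_predict(
    voxels: List[Features],
    classify: Callable[[Features], float],
    regress: Callable[[Features], float],
    threshold: float = 0.5,
) -> List[float]:
    """Two-stage cascade for a sparse target field.

    Stage 1 (classification filter): score how likely a voxel contains a galaxy.
    Stage 2 (regression): predict a galaxy count only for voxels that pass the
    filter. Every other voxel is set to 0, matching the mostly empty target.
    """
    predictions: List[float] = []
    for x in voxels:
        if classify(x) >= threshold:
            predictions.append(max(0.0, regress(x)))  # counts cannot be negative
        else:
            predictions.append(0.0)
    return predictions


def toy_classifier(x: Features) -> float:
    # Hypothetical stand-in: dense dark-matter voxels are likely to host a galaxy.
    return 1.0 if x[0] > 2.0 else 0.0


def toy_regressor(x: Features) -> float:
    # Hypothetical stand-in: galaxy count grows linearly with density.
    return 0.5 * x[0]


if __name__ == "__main__":
    voxels = [[0.1], [3.0], [0.0], [6.0]]
    print(cascade_predict(voxels, toy_classifier, toy_regressor))  # [0.0, 1.5, 0.0, 3.0]
```

One motivation for this split, in line with the sparsity challenge the abstract names, is that the regressor only has to model the few occupied voxels. The filter absorbs the "empty or not" decision for the many empty ones.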
- North America > United States (0.04)
- North America > Canada (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Asia > China > Hong Kong (0.04)
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems
Wei, Chengwei, Wang, Bin, Kim, Jung-jae, Chen, Nancy F.
Recent advances in large language models (LLMs) and multimodal LLMs (MLLMs) have led to strong reasoning abilities across a wide range of tasks. However, their ability to perform mathematical reasoning from spoken input remains underexplored. Prior studies on the speech modality have mostly focused on factual speech understanding or simple audio reasoning tasks, providing limited insight into logical step-by-step reasoning, such as that required for mathematical problem solving. To address this gap, we introduce Spoken Math Question Answering (Spoken-MQA), a new benchmark designed to evaluate the mathematical reasoning capabilities of speech-based models, including both cascade models (ASR + LLMs) and end-to-end speech LLMs. Spoken-MQA covers a diverse set of math problems, including pure arithmetic, single-step and multi-step contextual reasoning, and knowledge-oriented reasoning problems, all presented in unambiguous natural spoken language. Through extensive experiments, we find that: (1) while some speech LLMs perform competitively on contextual reasoning tasks involving basic arithmetic, they still struggle with direct arithmetic problems; (2) current LLMs exhibit a strong bias toward symbolic mathematical expressions written in LaTeX and have difficulty interpreting verbalized mathematical expressions; and (3) mathematical knowledge reasoning abilities are significantly degraded in current speech LLMs.
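By contrast with an end-to-end speech LLM, a cascade model (ASR + LLM) can be sketched as a simple composition of two stages. `CascadeModel`, `fake_asr`, `fake_llm`, and the prompt wording are hypothetical stand-ins, not Spoken-MQA code. A real system would plug in an actual ASR model and an actual LLM. The toy "LLM" here only handles "X plus Y" with number words.

```python
from dataclasses import dataclass
from typing import Callable

AudioClip = bytes  # hypothetical placeholder for raw audio

# Number words understood by the toy LLM stub below.
WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


@dataclass
class CascadeModel:
    """Cascade speech model: ASR transcribes, then a text-only LLM reasons."""
    asr: Callable[[AudioClip], str]
    llm: Callable[[str], str]

    def answer(self, audio: AudioClip) -> str:
        transcript = self.asr(audio)  # stage 1: speech -> text
        prompt = f"Solve the following math problem.\n{transcript}"
        return self.llm(prompt)       # stage 2: text -> answer


def fake_asr(audio: AudioClip) -> str:
    # Hypothetical stub: pretend the "audio" bytes are already the spoken words.
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    # Hypothetical stub: adds two verbalized numbers joined by "plus".
    tokens = prompt.lower().replace("?", "").split()
    nums = [WORDS[t] for t in tokens if t in WORDS]
    if "plus" in tokens and len(nums) == 2:
        return str(nums[0] + nums[1])
    return "unknown"


if __name__ == "__main__":
    model = CascadeModel(asr=fake_asr, llm=fake_llm)
    print(model.answer(b"what is three plus four?"))  # 7
```

In this design, the text model sees only the transcript. Any ASR error, and any verbalized form such as "three plus four" instead of 3 + 4, is passed directly to the second stage. An end-to-end speech LLM instead consumes the audio directly.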