waterfall
Yosemite's glowing, golden waterfall is flowing again
Environment Yosemite's glowing, golden waterfall is flowing again The annual natural phenomenon event is expected to last until February 26. Breakthroughs, discoveries, and DIY tips sent six days a week. As if California's Yosemite National Park wasn't magical enough, its famous El Capitan rock formation is graced with an exhilarating phenomenon every February. The Horsetail Fall--a waterfall that falls across the formation's eastern side--can take on an otherworldly gleam during the latter part of the month. During those gorgeous moments, the water flow looks like a stream of gold hurtling down the rock face, and it's hard to believe that it's just nature doing its thing.
- North America > United States > Maryland (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > Maryland (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- North America > United States > Texas > Loving County (0.04)
- North America > United States > Maine (0.04)
- North America > United States > Colorado (0.04)
- (3 more...)
- Transportation > Passenger (1.00)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Air (1.00)
- (6 more...)
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data
Chen, Haonan, Wang, Liang, Yang, Nan, Zhu, Yutao, Zhao, Ziliang, Wei, Furu, Dou, Zhicheng
Multimodal embedding models have gained significant attention for their ability to map data from different modalities, such as text and images, into a unified representation space. However, the limited labeled multimodal data often hinders embedding performance. Recent approaches have leveraged data synthesis to address this problem, yet the quality of synthetic data remains a critical bottleneck. In this work, we identify three criteria for high-quality synthetic multimodal data. First, broad scope ensures that the generated data covers diverse tasks and modalities, making it applicable to various downstream scenarios. Second, robust cross-modal alignment makes different modalities semantically consistent. Third, high fidelity ensures that the synthetic data maintains realistic details to enhance its reliability. Guided by these principles, we synthesize datasets that: (1) cover a wide range of tasks, modality combinations, and languages, (2) are generated via a deep thinking process within a single pass of a multimodal large language model, and (3) incorporate real-world images with accurate and relevant texts, ensuring fidelity through self-evaluation and refinement. Leveraging these high-quality synthetic and labeled datasets, we train a multimodal multilingual E5 model mmE5. Extensive experiments demonstrate that mmE5 achieves state-of-the-art performance on the MMEB Benchmark and superior multilingual performance on the XTD benchmark. Our codes, datasets and models are released in https://github.com/haon-chen/mmE5.
- Europe > Austria > Vienna (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (10 more...)
Robust Multi-bit Text Watermark with LLM-based Paraphrasers
Xu, Xiaojun, Jia, Jinghan, Yao, Yuanshun, Liu, Yang, Li, Hang
We propose an imperceptible multi-bit text watermark embedded by paraphrasing with LLMs. We fine-tune a pair of LLM paraphrasers that are designed to behave differently so that their paraphrasing difference reflected in the text semantics can be identified by a trained decoder. To embed our multi-bit watermark, we use two paraphrasers alternatively to encode the pre-defined binary code at the sentence level. Then we use a text classifier as the decoder to decode each bit of the watermark. Through extensive experiments, we show that our watermarks can achieve over 99.99\% detection AUC with small (1.1B) text paraphrasers while keeping the semantic information of the original sentence. More importantly, our pipeline is robust under word substitution and sentence paraphrasing perturbations and generalizes well to out-of-distributional data. We also show the stealthiness of our watermark with LLM-based evaluation. We open-source the code: https://github.com/xiaojunxu/multi-bit-text-watermark.
- North America > United States > Nevada (0.04)
- North America > United States > Michigan (0.04)
- North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
- Europe > Germany > Bremen > Bremen (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Deep Learning Evidence for Global Optimality of Gerver's Sofa
Leng, Kuangdai, Bi, Jia, Cha, Jaehoon, Pinilla, Samuel, Thiyagalingam, Jeyan
The Moving Sofa Problem, formally proposed by Leo Moser in 1966, seeks to determine the largest area of a two-dimensional shape that can navigate through an $L$-shaped corridor with unit width. The current best lower bound is about 2.2195, achieved by Joseph Gerver in 1992, though its global optimality remains unproven. In this paper, we investigate this problem by leveraging the universal approximation strength and computational efficiency of neural networks. We report two approaches, both supporting Gerver's conjecture that his shape is the unique global maximum. Our first approach is continuous function learning. We drop Gerver's assumptions that i) the rotation of the corridor is monotonic and symmetric and, ii) the trajectory of its corner as a function of rotation is continuously differentiable. We parameterize rotation and trajectory by independent piecewise linear neural networks (with input being some pseudo time), allowing for rich movements such as backward rotation and pure translation. We then compute the sofa area as a differentiable function of rotation and trajectory using our "waterfall" algorithm. Our final loss function includes differential terms and initial conditions, leveraging the principles of physics-informed machine learning. Under such settings, extensive training starting from diverse function initialization and hyperparameters is conducted, unexceptionally showing rapid convergence to Gerver's solution. Our second approach is via discrete optimization of the Kallus-Romik upper bound, which converges to the maximum sofa area from above as the number of rotation angles increases. We uplift this number to 10000 to reveal its asymptotic behavior. It turns out that the upper bound yielded by our models does converge to Gerver's area (within an error of 0.01% when the number of angles reaches 2100). We also improve their five-angle upper bound from 2.37 to 2.3337.
Poetry2Image: An Iterative Correction Framework for Images Generated from Chinese Classical Poetry
Jiang, Jing, Ling, Yiran, Li, Binzhu, Li, Pengxiang, Piao, Junming, Zhang, Yu
Text-to-image generation models often struggle with key element loss or semantic confusion in tasks involving Chinese classical poetry.Addressing this issue through fine-tuning models needs considerable training costs. Additionally, manual prompts for re-diffusion adjustments need professional knowledge. To solve this problem, we propose Poetry2Image, an iterative correction framework for images generated from Chinese classical poetry. Utilizing an external poetry dataset, Poetry2Image establishes an automated feedback and correction loop, which enhances the alignment between poetry and image through image generation models and subsequent re-diffusion modifications suggested by large language models (LLM). Using a test set of 200 sentences of Chinese classical poetry, the proposed method--when integrated with five popular image generation models--achieves an average element completeness of 70.63%, representing an improvement of 25.56% over direct image generation. In tests of semantic correctness, our method attains an average semantic consistency of 80.09%. The study not only promotes the dissemination of ancient poetry culture but also offers a reference for similar non-fine-tuning methods to enhance LLM generation.