
IF-Font: Ideographic Description Sequence-Following Font Generation

Neural Information Processing Systems

Few-shot font generation (FFG) aims to learn the target style from a limited number of reference glyphs and generate the remaining glyphs in the target font. Previous works focus on disentangling the content and style features of glyphs, combining the content features of a source glyph with the style features of a reference glyph to generate new glyphs. However, this disentanglement is challenging due to the complexity of glyphs, and often yields glyphs that inherit the style of the source glyph and are prone to artifacts. We propose IF-Font, a novel paradigm that incorporates the Ideographic Description Sequence (IDS), instead of a source glyph, to control the semantics of generated glyphs. To achieve this, we quantize the reference glyphs into tokens and model the token distribution of target glyphs conditioned on the corresponding IDS and reference tokens. The proposed method excels at synthesizing glyphs with neat, correct strokes, and enables the creation of new glyphs from a provided IDS. Extensive experiments demonstrate that our method greatly outperforms state-of-the-art methods in both one-shot and few-shot settings, particularly when the target styles differ significantly from the training font styles.
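The conditioning scheme the abstract describes can be sketched very simply: the semantics of the target character come from its IDS decomposition, and the style comes from codebook indices quantized from reference glyphs. The sketch below is illustrative only; the separator tokens, the toy codebook indices, and the function name are hypothetical, not from the IF-Font code.

```python
# Hypothetical sketch of IF-Font-style conditioning: an autoregressive
# model would predict target-glyph tokens given this combined sequence.

def build_condition(ids_tokens, style_tokens):
    """Concatenate IDS (semantics) and reference-style tokens into one
    conditioning sequence, with separators marking each segment."""
    return ["<ids>"] + list(ids_tokens) + ["<style>"] + list(style_tokens)

# e.g. U+597D ("good") decomposes left-right into "woman" + "child"
ids = ["⿰", "女", "子"]
style = [17, 42, 5]  # illustrative codebook indices from a quantized reference glyph
cond = build_condition(ids, style)
# cond == ["<ids>", "⿰", "女", "子", "<style>", 17, 42, 5]
```

Because the semantics are supplied as an IDS rather than a rendered source glyph, unseen characters can be generated by composing a new IDS, which is the property the abstract highlights.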


Glyph: Fast and Accurately Training Deep Neural Networks on Encrypted Data

Neural Information Processing Systems

Because they lack the expertise, average users who want to benefit from their data have to upload their private data to cloud servers they may not trust. Due to legal or privacy constraints, most users are willing to contribute only encrypted data, and lack the interest or resources to join deep neural network (DNN) training in the cloud.



We sincerely thank all the reviewers for their insightful suggestions

Neural Information Processing Systems

We sincerely thank all the reviewers for their insightful suggestions. We will add them back in the updated version, which will have one more page. Finally, we relax BERT and fine-tune the two models jointly. Results are shown in Table 1. As can be seen, this auxiliary training objective yields a +0.8 F1 performance boost.




We thank the reviewers for their careful reading of the manuscript and their constructive suggestions

Neural Information Processing Systems

We thank the reviewers for their careful reading of the manuscript and their constructive suggestions. Chimera supports switching between BFV and TFHE, while Glyph enables switching between BGV and TFHE. Some users may not have such large network bandwidth. In contrast, Glyph first trains a CNN model on a plaintext public dataset. Apart from sending the encrypted input data, the client is not involved in Glyph's training.


A Further Details on NetHack

Neural Information Processing Systems

We support different padding strategies and alphabet sizes, but by default we choose an alphabet size of 96, where the last character is used for padding.
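The default scheme described above can be sketched in a few lines: with an alphabet of 96 symbols, the last index is reserved as the padding character. The function and variable names below are illustrative, not taken from the NetHack environment code.

```python
# Hedged sketch of the default padding scheme: alphabet of 96 symbols,
# with the last index (95) reserved as the padding character.

ALPHABET_SIZE = 96
PAD_ID = ALPHABET_SIZE - 1  # last character used for padding

def pad_message(char_ids, max_len):
    """Right-pad (or truncate) a sequence of character ids to a fixed length."""
    char_ids = list(char_ids)[:max_len]
    return char_ids + [PAD_ID] * (max_len - len(char_ids))

padded = pad_message([7, 2, 9], 6)  # → [7, 2, 9, 95, 95, 95]
```

Reserving a dedicated index keeps padding distinguishable from every real character, so a downstream model can mask it out cleanly.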


An MLP Baseline for Handwriting Recognition Using Planar Curvature and Gradient Orientation

Nouri, Azam

arXiv.org Artificial Intelligence

This study investigates whether second-order geometric cues - planar curvature magnitude, curvature sign, and gradient orientation - are sufficient on their own to drive a multilayer perceptron (MLP) classifier for handwritten character recognition (HCR), offering an alternative to convolutional neural networks (CNNs). Using these three handcrafted feature maps as inputs, our curvature-orientation MLP achieves 97 percent accuracy on MNIST digits and 89 percent on EMNIST letters. These results underscore the discriminative power of curvature-based representations for handwritten character images and demonstrate that the advantages of deep learning can be realized even with interpretable, hand-engineered features.
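The three feature maps named in the abstract can be approximated with finite differences. The sketch below uses the standard level-set curvature formula for an image; the paper's exact formulas and preprocessing may differ, so this only illustrates the kind of hand-engineered input such an MLP would consume.

```python
import numpy as np

def feature_maps(img):
    """Return (gradient orientation, curvature magnitude, curvature sign)
    computed with simple finite differences - an illustrative stand-in
    for the paper's handcrafted features."""
    gy, gx = np.gradient(img.astype(float))  # first derivatives
    gxx = np.gradient(gx, axis=1)            # second derivatives
    gyy = np.gradient(gy, axis=0)
    gxy = np.gradient(gx, axis=0)
    orientation = np.arctan2(gy, gx)         # gradient direction per pixel
    # planar curvature of image level sets:
    # (gxx*gy^2 - 2*gxy*gx*gy + gyy*gx^2) / (gx^2 + gy^2)^(3/2)
    denom = (gx**2 + gy**2) ** 1.5 + 1e-8    # epsilon avoids division by zero
    curv = (gxx * gy**2 - 2 * gxy * gx * gy + gyy * gx**2) / denom
    return orientation, np.abs(curv), np.sign(curv)

img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0                          # toy "character" image
ori, mag, sgn = feature_maps(img)            # three 8x8 feature maps
```

Flattening and stacking these three maps yields the per-image feature vector an MLP classifier would train on.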


Glyph: Scaling Context Windows via Visual-Text Compression

Cheng, Jiale, Liu, Yusen, Zhang, Xinyu, Fei, Yulin, Hong, Wenyi, Lyu, Ruiliang, Wang, Weihan, Su, Zhe, Gu, Xiaotao, Liu, Xiao, Bai, Yushi, Tang, Jie, Wang, Hongning, Huang, Minlie

arXiv.org Artificial Intelligence

Large language models (LLMs) increasingly rely on long-context modeling for tasks such as document understanding, code analysis, and multi-step reasoning. However, scaling context windows to the million-token level brings prohibitive computational and memory costs, limiting the practicality of long-context LLMs. In this work, we take a different perspective, visual context scaling, to tackle this challenge. Instead of extending token-based sequences, we propose Glyph, a framework that renders long texts into images and processes them with vision-language models (VLMs). This approach substantially compresses textual input while preserving semantic information, and we further design an LLM-driven genetic search to identify optimal visual rendering configurations for balancing accuracy and compression. Through extensive experiments, we demonstrate that our method achieves 3-4x token compression while maintaining accuracy comparable to leading LLMs such as Qwen3-8B on various long-context benchmarks. This compression also leads to around 4x faster prefilling and decoding, and approximately 2x faster SFT training. Furthermore, under extreme compression, a 128K-context VLM could scale to handle 1M-token-level text tasks. In addition, the rendered text data benefits real-world multimodal tasks, such as document understanding. Our code and model are released at https://github.com/thu-coai/Glyph.
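The compression claim above reduces to simple arithmetic: if one rendered page carries T text tokens but costs the VLM only V vision tokens, the effective context multiplies by T / V. The numbers below are illustrative placeholders, not figures from the paper.

```python
# Back-of-envelope sketch of visual-text compression (numbers illustrative).

def compression_ratio(text_tokens_per_page, vision_tokens_per_page):
    """Effective context multiplier when text is rendered into page images."""
    return text_tokens_per_page / vision_tokens_per_page

# e.g. ~3000 text tokens rendered on one page that the VLM encodes as
# ~800 vision tokens lands in the 3-4x range the abstract reports
ratio = compression_ratio(3000, 800)       # 3.75
effective_context = int(128_000 * ratio)   # a 128K-token VLM covers 480000 text tokens
```

Under more aggressive rendering (smaller fonts, denser layout) the ratio grows, which is how a 128K-context VLM could reach million-token-level text tasks.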