MambaIRv2: Attentive State Space Restoration
Guo, Hang, Guo, Yong, Zha, Yaohua, Zhang, Yulun, Li, Wenbo, Dai, Tao, Xia, Shu-Tao, Li, Yawei
Mamba-based image restoration backbones have recently demonstrated significant potential in balancing global receptive fields against computational efficiency. However, the inherent causal modeling limitation of Mamba, where each token depends solely on its predecessors in the scanned sequence, restricts the full utilization of pixels across the image and thus presents new challenges for image restoration. In this work, we propose MambaIRv2, which equips Mamba with a non-causal modeling ability similar to that of ViTs, yielding an attentive state-space restoration model. Specifically, the proposed attentive state-space equation allows attending beyond the scanned sequence and facilitates image unfolding with a single scan. Moreover, we introduce a semantic-guided neighboring mechanism to encourage interaction between distant but similar pixels. Extensive experiments show that MambaIRv2 outperforms SRFormer by \textbf{up to 0.35dB} PSNR on lightweight SR with \textbf{9.3\% fewer} parameters, and surpasses HAT on classic SR by \textbf{up to 0.29dB}. Code is available at \url{https://github.com/csguoh/MambaIR}.
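The causal limitation the abstract describes can be illustrated with a toy scalar state-space recurrence (a minimal sketch for intuition only; the scalars `a`, `b` and the function name are illustrative and not the paper's actual formulation):

```python
import numpy as np

def causal_scan(x, a=0.9, b=1.0):
    """Toy 1-D state-space recurrence: h[t] = a*h[t-1] + b*x[t].

    Each output depends only on inputs at positions <= t, so
    information from later pixels in the scanned sequence can
    never reach earlier ones -- the causal limitation that
    non-causal (attention-like) modeling removes.
    """
    h = 0.0
    out = []
    for xt in x:
        h = a * h + b * xt
        out.append(h)
    return np.array(out)

x = np.zeros(5)
x[-1] = 1.0          # a "pixel" only at the end of the scan
y = causal_scan(x)
print(y[:4])         # all zeros: earlier tokens never see the last one
```

A non-causal model, by contrast, would let every position respond to that final input in one pass.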
Towards Scalable Semantic Representation for Recommendation
Zhang, Taolin, Pan, Junwei, Wang, Jinpeng, Zha, Yaohua, Dai, Tao, Chen, Bin, Luo, Ruisheng, Deng, Xiaoxiang, Wang, Yuan, Yue, Ming, Jiang, Jie, Xia, Shu-Tao
With recent advances in large language models (LLMs), a growing body of research has developed LLM-based Semantic IDs to enhance the performance of recommendation systems. An intuitive practice is to simply project the LLM embeddings into low-dimensional embeddings via MLPs before feeding them into the recommendation system for feature interaction. However, the dimension of these embeddings must match that of the ID embeddings in the recommender, which is usually much smaller than the original length. Such dimension compression incurs inevitable losses in the discriminability and dimension robustness of the LLM embeddings, which motivates us to scale up the semantic representation. In this paper, we propose Mixture-of-Codes, which first constructs multiple independent codebooks for the LLM representation in the indexing stage, and then passes the resulting semantic representation through a fusion module in the downstream recommendation stage. Extensive analysis and experiments demonstrate that our method achieves superior discriminability and dimension robustness, leading to the best scale-up performance in recommendation.
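The two-stage pipeline can be sketched as follows (a hypothetical, heavily simplified illustration: the codebooks here are random centroids with nearest-neighbor lookup rather than the paper's learned indexing, the fusion module is a single linear map, and all sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes -- not taken from the paper.
d_llm, n_books, book_size, d_id = 64, 4, 16, 8

# Stage 1 (indexing): several independent codebooks over the LLM
# embedding space; each codebook assigns one discrete code per item.
codebooks = rng.normal(size=(n_books, book_size, d_llm))

def semantic_ids(llm_emb):
    """Nearest-centroid code from every codebook."""
    dists = np.linalg.norm(codebooks - llm_emb, axis=-1)  # (n_books, book_size)
    return dists.argmin(axis=-1)                          # one code per book

# Stage 2 (recommendation): each code gets its own small ID-sized
# embedding; a fusion layer (here one linear map) merges them into a
# single vector matching the recommender's ID-embedding dimension.
code_embs = rng.normal(size=(n_books, book_size, d_id))
W_fuse = rng.normal(size=(n_books * d_id, d_id))

def fused_representation(llm_emb):
    ids = semantic_ids(llm_emb)
    parts = [code_embs[b, ids[b]] for b in range(n_books)]
    return np.concatenate(parts) @ W_fuse                 # (d_id,)

item = rng.normal(size=d_llm)
print(fused_representation(item).shape)                   # (8,)
```

The point of the sketch is the scaling mechanism: adding codebooks grows the discrete semantic capacity without changing the final `d_id`-sized output consumed by the recommender.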