Chen, Ren
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Reka Team, Ormazabal, Aitor, Zheng, Che, d'Autume, Cyprien de Masson, Yogatama, Dani, Fu, Deyu, Ong, Donovan, Chen, Eric, Lamprecht, Eugenie, Pham, Hai, Ong, Isaac, Aleksiev, Kaloyan, Li, Lei, Henderson, Matthew, Bain, Max, Artetxe, Mikel, Relan, Nishant, Padlewski, Piotr, Liu, Qi, Chen, Ren, Phua, Samuel, Yang, Yazheng, Tay, Yi, Wang, Yuqi, Zhu, Zhongkai, Xie, Zhihui
We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, image, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but also outperform many much larger models, delivering outsized value for their respective compute class. Meanwhile, our most capable and largest model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g., MMMU, VQAv2), Core performs competitively with GPT4-V. On multimodal chat, Core ranks as the second most preferred model under a blind third-party human evaluation setup, outperforming other models such as Claude 3 Opus. On text benchmarks, Core not only performs competitively with other frontier models on a set of well-established benchmarks (e.g., MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation. On video question answering (Perception-Test), Core outperforms Gemini Ultra. Models are shipped in production at http://chat.reka.ai. A showcase of non-cherry-picked qualitative examples can also be found at http://showcase.reka.ai.
LLM-Rec: Personalized Recommendation via Prompting Large Language Models
Lyu, Hanjia, Jiang, Song, Zeng, Hanqing, Wang, Qifan, Zhang, Si, Chen, Ren, Leung, Chris, Tang, Jiajie, Xia, Yinglong, Luo, Jiebo
We investigate various prompting strategies for enhancing personalized recommendation performance with large language models (LLMs) through input augmentation. Our proposed approach, termed LLM-Rec, encompasses four distinct prompting strategies: (1) basic prompting, (2) recommendation-driven prompting, (3) engagement-guided prompting, and (4) recommendation-driven + engagement-guided prompting. Our empirical experiments show that incorporating the augmented input text generated by the LLM leads to improved recommendation performance. Recommendation-driven and engagement-guided prompting strategies are found to elicit the LLM's understanding of global and local item characteristics. This finding highlights the importance of leveraging diverse prompts and input augmentation techniques to enhance recommendation capabilities with LLMs.
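To make the difference between the four strategies concrete, here is a minimal Python sketch that builds one prompt per strategy for a single item. Everything in it is an illustrative assumption rather than LLM-Rec's actual method: the prompt wording, the hypothetical `top_engaged_neighbors` and `build_prompts` helpers, and the use of simple co-engagement counts (how many users engaged with both items) to choose the related items for the engagement-guided prompts. In the assumed pipeline, each prompt would be sent to an LLM and its response appended to the original description to form the augmented item text.

```python
from collections import Counter


def top_engaged_neighbors(item, engagements, k):
    """Hypothetical neighbor selection: rank other items by co-engagement count.

    `engagements` maps user id -> set of item ids the user engaged with.
    """
    counts = Counter()
    for items in engagements.values():
        if item in items:
            for other in items:
                if other != item:
                    counts[other] += 1
    # Highest count first; ties broken by item id so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


def build_prompts(item, descriptions, engagements, k=2):
    """Return one prompt per strategy for `item` (hypothetical wording)."""
    desc = descriptions[item]
    neighbor_text = "; ".join(
        descriptions[n] for n in top_engaged_neighbors(item, engagements, k)
    )
    # Recommendation-driven variants add an explicit recommendation intent.
    rec_suffix = " What would you say to recommend this item to others?"
    basic = f"Paraphrase this item description: '{desc}'"
    # Engagement-guided variants add descriptions of co-engaged items.
    engagement = (
        "Summarize what this item has in common with items that the same "
        f"users engaged with. Item: '{desc}' Related items: '{neighbor_text}'"
    )
    return {
        "basic": basic,
        "recommendation": basic + rec_suffix,
        "engagement": engagement,
        "recommendation+engagement": engagement + rec_suffix,
    }


if __name__ == "__main__":
    descriptions = {
        "a": "A sci-fi film.",
        "b": "A space opera.",
        "c": "A cooking show.",
        "d": "A robot documentary.",
    }
    engagements = {"u1": {"a", "b"}, "u2": {"a", "b", "d"}, "u3": {"c"}}
    for name, prompt in build_prompts("a", descriptions, engagements).items():
        print(f"{name}: {prompt}")
```

The co-engagement count is only one simple way to pick "important" neighbors; a graph-based ranking over the user-item interaction graph could be swapped in without changing the prompt-building step.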
Decoupling the Depth and Scope of Graph Neural Networks
Zeng, Hanqing, Zhang, Muhan, Xia, Yinglong, Srivastava, Ajitesh, Malevich, Andrey, Kannan, Rajgopal, Prasanna, Viktor, Jin, Long, Chen, Ren
State-of-the-art Graph Neural Networks (GNNs) have limited scalability with respect to the graph and model sizes. On large graphs, increasing the model depth often means exponential expansion of the scope (i.e., receptive field). Beyond just a few layers, two fundamental challenges emerge: (1) degraded expressivity due to oversmoothing, and (2) expensive computation due to neighborhood explosion. We propose a design principle to decouple the depth and scope of GNNs: to generate the representation of a target entity (i.e., a node or an edge), we first extract a localized subgraph as the bounded-size scope, and then apply a GNN of arbitrary depth on top of the subgraph. A properly extracted subgraph consists of a small number of critical neighbors, while excluding irrelevant ones. The GNN, no matter how deep it is, smooths the local neighborhood into an informative representation rather than oversmoothing the global graph into "white noise". Theoretically, decoupling improves the GNN expressive power from the perspectives of graph signal processing (GCN), function approximation (GraphSAGE) and topological learning (GIN). Empirically, on seven graphs (with up to 110M nodes) and six backbone GNN architectures, our design achieves significant accuracy improvement with orders of magnitude reduction in computation and hardware cost.
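The decoupling principle can be illustrated with a small, dependency-free Python sketch: a bounded-hop breadth-first search extracts the scope around a target node, and an arbitrary number of propagation rounds then runs only inside that subgraph. The plain k-hop extractor, the weight-free mean aggregation with a self loop (standing in for trained GNN layers), and the function names are all assumptions made for illustration, not the paper's implementation.

```python
from collections import deque


def extract_scope(adj, target, max_hops):
    """Bounded-size scope: all nodes within `max_hops` of `target` (BFS).

    Assumption: a plain k-hop extractor; other extractors could be swapped in.
    """
    dist = {target: 0}
    queue = deque([target])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand past the hop budget
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def propagate_in_scope(adj, features, nodes, depth):
    """Run `depth` rounds of mean aggregation restricted to `nodes`.

    Assumption: weight-free mean over in-scope neighbors plus a self loop,
    standing in for trained GNN layers.
    """
    h = {u: list(features[u]) for u in nodes}
    for _ in range(depth):
        new_h = {}
        for u in nodes:
            # Edges leaving the scope are ignored, so depth never widens the scope.
            group = [v for v in adj.get(u, ()) if v in nodes] + [u]
            dim = len(h[u])
            new_h[u] = [sum(h[v][i] for v in group) / len(group) for i in range(dim)]
        h = new_h
    return h


def decoupled_embedding(adj, features, target, scope_hops, depth):
    """Scope (`scope_hops`) and depth (`depth`) are chosen independently."""
    nodes = extract_scope(adj, target, scope_hops)
    return propagate_in_scope(adj, features, nodes, depth)[target]


if __name__ == "__main__":
    # Toy path graph 0-1-2-3-4 with scalar features.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    features = {i: [float(i)] for i in range(5)}
    print(sorted(extract_scope(adj, 0, 1)))  # [0, 1]
    print(decoupled_embedding(adj, features, 0, 1, 10))  # [0.5]
```

Because propagation never leaves the extracted scope, raising `depth` only changes how thoroughly the local neighborhood is smoothed, not how many nodes are touched: in the toy example, ten rounds over the 1-hop scope of node 0 average just nodes 0 and 1 and settle at 0.5, independent of the rest of the graph.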