Goto

Collaborating Authors

 Hu, Wenjie


DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

arXiv.org Artificial Intelligence

The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed a dataset that currently consists of 2 trillion tokens and is continuously expanding. We further conduct supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the creation of DeepSeek Chat models. Our evaluation results demonstrate that DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in the domains of code, mathematics, and reasoning. Furthermore, open-ended evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance compared to GPT-3.5.


Breaking the Curse of Quality Saturation with User-Centric Ranking

arXiv.org Artificial Intelligence

A key puzzle in search, ads, and recommendation is that the ranking model can only utilize a small portion of the vastly available user interaction data. As a result, increasing data volume, model size, or computation FLOPs will quickly suffer from diminishing returns. We examined this problem and found that one of the root causes may lie in the so-called ``item-centric'' formulation, which has an unbounded vocabulary and thus uncontrolled model complexity. To mitigate quality saturation, we introduce an alternative formulation named ``user-centric ranking'', which is based on a transposed view of the dyadic user-item interaction data. We show that this formulation has a promising scaling property, enabling us to train better-converged models on substantially larger data sets.


Students' Voices on Generative AI: Perceptions, Benefits, and Challenges in Higher Education

arXiv.org Artificial Intelligence

This study explores university students' perceptions of generative AI (GenAI) technologies, such as ChatGPT, in higher education, focusing on familiarity, their willingness to engage, potential benefits and challenges, and effective integration. A survey of 399 undergraduate and postgraduate students from various disciplines in Hong Kong revealed a generally positive attitude towards GenAI in teaching and learning. Students recognized the potential for personalized learning support, writing and brainstorming assistance, and research and analysis capabilities. However, concerns about accuracy, privacy, ethical issues, and the impact on personal development, career prospects, and societal values were also expressed. According to John Biggs' 3P model, student perceptions significantly influence learning approaches and outcomes. By understanding students' perceptions, educators and policymakers can tailor GenAI technologies to address needs and concerns while promoting effective learning outcomes. Insights from this study can inform policy development around the integration of GenAI technologies into higher education. By understanding students' perceptions and addressing their concerns, policymakers can create well-informed guidelines and strategies for the responsible and effective implementation of GenAI tools, ultimately enhancing teaching and learning experiences in higher education.


Modeling Combinatorial Evolution in Time Series Prediction

arXiv.org Machine Learning

For instance, earthquake wave is the observation Time series modeling aims to capture the intrinsic factors underpinning of crustal movements, while different actions like running and observed data and its evolution. However, most existing studies walking will cause differences in observations of a fitness-tracking ignore the evolutionary relations among these factors, which are device. Moreover, in practice, we often observe the combinatorial what cause the combinatorial evolution of a given time series. For evolution of data; that is, the observed time series being covered example, personal interests are intrinsic factors hidden behind users' by the influence of multiple factors, and especially the relations observable online shopping behaviors; consequently, a precise item among these factors. For example, an earthquake is the result of recommendation depends not only on discovering the item-interest quick transitions from smooth movements in the Earth's crust to relationship, but also on an understanding of how user interests intense ones, which cause a sudden release of energy in the Earth's shift over time. In this paper, we propose to represent complex and crust. Meanwhile, observing one's online shopping logs, precise dynamic relations among intrinsic factors of time series data by item recommendations rely on tracing and understanding the shift means of an evolutionary state graph structure.


Capturing Evolution Genes for Time Series Data

arXiv.org Machine Learning

The modeling of time series is becoming increasingly critical in a wide variety of applications. Overall, data evolves by following different patterns, which are generally caused by different user behaviors. Given a time series, we define the evolution gene to capture the latent user behaviors and to describe how the behaviors lead to the generation of time series. In particular, we propose a uniform framework that recognizes different evolution genes of segments by learning a classifier, and adopt an adversarial generator to implement the evolution gene by estimating the segments' distribution. Experimental results based on a synthetic dataset and five real-world datasets show that our approach can not only achieve a good prediction results (e.g., averagely +10.56% in terms of F1), but is also able to provide explanations of the results.


Representation Learning for Scale-Free Networks

AAAI Conferences

Network embedding aims to learn the low-dimensional representations of vertexes in a network, while structure and inherent properties of the network is preserved. Existing network embedding works primarily focus on preserving the microscopic structure, such as the first- and second-order proximity of vertexes, while the macroscopic scale-free property is largely ignored. Scale-free property depicts the fact that vertex degrees follow a heavy-tailed distribution (i.e., only a few vertexes have high degrees) and is a critical property of real-world networks, such as social networks. In this paper, we study the problem of learning representations for scale-free networks. We first theoretically analyze the difficulty of embedding and reconstructing a scale-free network in the Euclidean space, by converting our problem to the sphere packing problem. Then, we propose the "degree penalty" principle for designing scale-free property preserving network embedding algorithm: punishing the proximity between high-degree vertexes. We introduce two implementations of our principle by utilizing the spectral techniques and a skip-gram model respectively. Extensive experiments on six datasets show that our algorithms are able to not only reconstruct heavy-tailed distributed degree distribution, but also outperform state-of-the-art embedding models in various network mining tasks, such as vertex classification and link prediction.


Representation Learning for Scale-free Networks

arXiv.org Machine Learning

Network embedding aims to learn the low-dimensional representations of vertexes in a network, while structure and inherent properties of the network is preserved. Existing network embedding works primarily focus on preserving the microscopic structure, such as the first- and second-order proximity of vertexes, while the macroscopic scale-free property is largely ignored. Scale-free property depicts the fact that vertex degrees follow a heavy-tailed distribution (i.e., only a few vertexes have high degrees) and is a critical property of real-world networks, such as social networks. In this paper, we study the problem of learning representations for scale-free networks. We first theoretically analyze the difficulty of embedding and reconstructing a scale-free network in the Euclidean space, by converting our problem to the sphere packing problem. Then, we propose the "degree penalty" principle for designing scale-free property preserving network embedding algorithm: punishing the proximity between high-degree vertexes. We introduce two implementations of our principle by utilizing the spectral techniques and a skip-gram model respectively. Extensive experiments on six datasets show that our algorithms are able to not only reconstruct heavy-tailed distributed degree distribution, but also outperform state-of-the-art embedding models in various network mining tasks, such as vertex classification and link prediction.