GitHub Releases Dataset of Six Million Open-Source Methods for Code Search Research
Regular web search engines like Google may be great for finding a restaurant, but they are lousy for locating a snippet of code. In a bid to help software developers and foster innovative code search research, GitHub last week announced the CodeSearchNet Challenge in a joint effort with Weights & Biases, a California-based startup that builds machine learning development tools. A large dataset and several baseline models showing the current state of the art in code search have been released to help scientists build models for the challenge. Faced with unsatisfactory code search results from natural language processing engines, researchers have in recent years been applying machine learning techniques to improve code search. They quickly realized, however, that unlike natural language understanding, which has the GLUE benchmark, code search had no standard datasets suitable for evaluation.
Oct-1-2019, 18:34:24 GMT