coco 1
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness
Manav Singhal, Tushar Aggarwal, Abhijeet Awasthi, Nagarajan Natarajan, Aditya Kanade
Existing evaluation benchmarks of language models of code (code LMs) focus almost exclusively on whether the LMs can generate functionally correct code. In real-world software engineering, developers think beyond functional correctness. They have requirements on "how" a functionality should be implemented to meet overall system design objectives like efficiency, security, and maintainability. They would also trust the code LMs more if the LMs demonstrated a robust understanding of requirements and code semantics. We propose a new benchmark, NoFunEval, to evaluate code LMs on non-functional requirements and on simple classification instances for both functional and non-functional requirements. We propose a prompting method, Coding Concepts (CoCo), as a way for a developer to communicate domain knowledge to the LMs. We conduct an extensive evaluation of twenty-two code LMs. Our finding is that they generally falter when tested on our benchmark, hinting at fundamental blind spots in their training setups. Surprisingly, even the classification accuracy on functional-correctness instances derived from the popular HumanEval benchmark is low, calling into question the depth of their comprehension and the source of their success in generating functionally correct code in the first place. We will release our benchmark and evaluation scripts publicly at https://aka.ms/NoFunEval.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)
Remotely-Piloted Delivery Service Expands Its Capabilities
Coco, the robot-based delivery service, announced the official launch of COCO 1, a larger, more advanced version of its signature pink bot. The COCO 1 is a first-of-its-kind delivery robot designed and manufactured in partnership with Segway, the largest micromobility hardware manufacturer. Coco is deploying thousands of COCO 1 robots over the next few months to serve local merchants in multiple cities. With its increased carrying capacity, the COCO 1 will deliver larger orders for a wider range of merchants, further reducing the need for car-based delivery. Compared to the current model, the COCO 1 offers a number of added features, including a more efficient drivetrain and a larger battery that allows for an increased delivery radius of up to three miles, nearly double the radius of the original model.
- Transportation (0.87)
- Energy > Energy Storage (0.57)
- Electrical Industrial Apparatus (0.57)