goliath
David vs. Goliath: A comparative study of different-sized LLMs for code generation in the domain of automotive scenario generation
Bauerfeind, Philipp; Salarpour, Amir; Fernandez, David; MohajerAnsari, Pedram; Reschke, Johannes; Pesé, Mert D.
Scenario simulation is central to testing autonomous driving systems. Scenic, a domain-specific language (DSL) for CARLA, enables precise and reproducible scenarios, but NL-to-Scenic generation with large language models (LLMs) suffers from scarce data, limited reproducibility, and inconsistent metrics. We introduce NL2Scenic, an open dataset and framework with 146 NL/Scenic pairs, a difficulty-stratified 30-case test split, an Example Retriever, and 14 prompting variants (ZS, FS, CoT, SP, MoT). We evaluate 13 models: four proprietary (GPT-4o, GPT-5, Claude-Sonnet-4, Gemini-2.5-pro) and nine open-source code models (Qwen2.5Coder 0.5B-32B; CodeLlama 7B/13B/34B), using text metrics (BLEU, ChrF, EDIT-SIM, CrystalBLEU) and execution metrics (compilation and generation), and compare them with an expert study (n=11). EDIT-SIM correlates best with human judgments; we also propose EDIT-COMP (F1 of EDIT-SIM and compilation) as a robust dataset-level proxy that improves ranking fidelity. GPT-4o performs best overall, while Qwen2.5Coder-14B reaches about 88 percent of its expert score on local hardware. Retrieval-augmented prompting, Few-Shot with Example Retriever (FSER), consistently boosts smaller models, and scaling shows diminishing returns beyond mid-size, with Qwen2.5Coder outperforming CodeLlama at comparable scales. NL2Scenic and EDIT-COMP offer a standardized, reproducible basis for evaluating Scenic code generation and indicate that mid-size open-source models are practical, cost-effective options for autonomous-driving scenario programming.
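The abstract describes EDIT-COMP only as the "F1 of EDIT-SIM and compilation". A minimal sketch of how such a dataset-level score could be computed is shown here. It assumes that EDIT-SIM is a length-normalized Levenshtein similarity averaged over the test cases, that "compilation" is the fraction of generated programs that compile, and that F1 means the harmonic mean of the two. The function names `levenshtein`, `edit_sim`, and `edit_comp` are hypothetical and are not taken from the NL2Scenic framework, whose exact definitions may differ.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_sim(candidate: str, reference: str) -> float:
    """Assumed EDIT-SIM: 1 minus edit distance divided by the longer length."""
    longest = max(len(candidate), len(reference))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(candidate, reference) / longest


def edit_comp(mean_edit_sim: float, compile_rate: float) -> float:
    """Assumed EDIT-COMP: harmonic mean (F1) of the two dataset-level scores."""
    total = mean_edit_sim + compile_rate
    if total == 0:
        return 0.0
    return 2 * mean_edit_sim * compile_rate / total
```

Under these assumptions, a model with high text similarity whose outputs rarely compile is penalized heavily, because the harmonic mean is pulled toward the lower of the two values.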
Baby Yoda may rule Disney Plus, but this hidden gem is worth a look
The first time I watched Gargoyles, a Disney cartoon about stone-winged creatures that come alive at night to fight evil, I was enraptured. This was something so dramatically different from anything I'd seen to that point. Thanks to Disney Plus, I had a chance to rewatch the show. My initial response 25 years later: How the hell did this show even get made? That isn't meant as a slight.
Robotics innovations at CES 2018
The 2018 Nissan Leaf receives the CES 2018 Tech For a Better World Innovation Award. CES 2018, the Consumer Technology Association's massive annual expo, was full of self-driving, electric, and augmented cars. Every hardware startup should visit CES before they build anything. It has to be the most humbling experience any small robotics startup could have. CES 2018 is what big marketing budgets look like. And as robotics shifts more and more toward consumer-facing products, this is what the competition looks like.
Welcome to Goliath, our science-backed assessment.
Goliath takes a holistic view of people and has four scales covering mindset, physical fitness, professional performance, and energy management. Each scale is concise, consisting of 10 items, and the entire assessment takes about ten minutes to complete. The visual presentation of results on our wheel gives an individual a quick snapshot of how they measure up against our criteria. The overall objective is to move towards a position of balance across the four domains, and in particular across the individual elements. More specifically, the closer one is to the outside edge of the wheel, the better the result.
Understanding Ontological Levels
Masolo, Claudio (Laboratory for Applied Ontology, ISTC-CNR)
In this paper, I defend a multiplicative approach that distinguishes statues from amounts of matter, political entities from physical ones, qua entities (e.g. John qua Alitalia passenger) from players (e.g. John), etc. I develop a theory of levels which is based on the primitive notions of level, parthood, and grounding (a kind of existential dependence) and that is used to characterize more specific relations like constitution, inherence, and abstraction. I neither aim to propose a 'definitive' theory of levels nor to commit to their ontological or conceptual nature. Hence, the adjective 'ontological' used in the title does not qualify the nature of the entities that belong to levels but the way the notion of level is characterized, i.e. in terms of general and philosophically well-founded notions. By keeping away from a purely realist attitude, I can then discuss the adequacy of some alternative first-order theories to account for three puzzling scenarios.