- Europe > United Kingdom > North Sea > Central North Sea (0.04)
- North America > United States > Arizona (0.04)
- Research Report (0.93)
- Workflow (0.71)
- Education (0.93)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.70)
- Leisure & Entertainment > Games > Computer Games (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization
Olson, Matthew Lyle, Ratzlaff, Neale, Hinck, Musashi, Luo, Man, Yu, Sungduk, Xue, Chendi, Lal, Vasudev
DeepSeek-R1, the largest open-source Mixture-of-Experts (MoE) model, has demonstrated reasoning capabilities comparable to proprietary frontier models. Prior research has explored expert routing in MoE models, but findings suggest that expert selection is often token-dependent rather than semantically driven. Given DeepSeek-R1's enhanced reasoning abilities, we investigate whether its routing mechanism exhibits greater semantic specialization than previous MoE models. To explore this, we conduct two key experiments: (1) a word sense disambiguation task, where we examine expert activation patterns for words with differing senses, and (2) a cognitive reasoning analysis, where we assess DeepSeek-R1's structured thought process in the interactive task setting of DiscoveryWorld. We conclude that DeepSeek-R1's routing mechanism is more semantically aware than that of previous MoE models and that the model engages in structured cognitive processes.
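To make the idea of comparing expert activation patterns concrete, here is a minimal sketch of generic top-k routing together with a set-overlap measure. It is an illustration only, not DeepSeek-R1's actual router or the authors' analysis: the functions `top_k_experts` and `expert_overlap`, the choice of Jaccard overlap, and the logit values are all assumptions made up for this example.

```python
import math

def top_k_experts(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax restricted to the selected experts (shifted for numerical stability).
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def expert_overlap(a, b):
    """Jaccard overlap between two collections of selected expert indices."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical router logits for the same word used in two different senses.
logits_sense_1 = [0.1, 2.0, -1.0, 1.5]
logits_sense_2 = [1.8, 0.2, 1.9, -0.5]

picked_1 = [i for i, _ in top_k_experts(logits_sense_1)]
picked_2 = [i for i, _ in top_k_experts(logits_sense_2)]
overlap = expert_overlap(picked_1, picked_2)
```

Under this sketch, a low overlap for the same token in different senses would be consistent with semantically driven routing, while a high overlap would point to token-dependent routing.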
SM3-Text-to-Query: Synthetic Multi-Model Medical Text-to-Query Benchmark
Sivasubramaniam, Sithursan, Osei-Akoto, Cedric, Zhang, Yi, Stockinger, Kurt, Fuerst, Jonathan
Electronic health records (EHRs) are stored in various database systems with different database models on heterogeneous storage architectures, such as relational databases, document stores, or graph databases. These different database models have a big impact on query complexity and performance. While this has been a known fact in database research, its implications for the growing number of Text-to-Query systems have surprisingly not been investigated so far. In this paper, we present SM3-Text-to-Query, the first multi-model medical Text-to-Query benchmark based on synthetic patient data from Synthea, following the SNOMED-CT taxonomy -- a widely used knowledge graph ontology covering medical terminology. SM3-Text-to-Query provides data representations for relational databases (PostgreSQL), document stores (MongoDB), and graph databases (Neo4j and GraphDB (RDF)), allowing the evaluation across four popular query languages, namely SQL, MQL, Cypher, and SPARQL. We systematically and manually develop 408 template questions, which we augment to construct a benchmark of 10K diverse natural language question/query pairs for these four query languages (40K pairs overall). On our dataset, we evaluate several common in-context-learning (ICL) approaches for a set of representative closed and open-source LLMs. Our evaluation sheds light on the trade-offs between database models and query languages for different ICL strategies and LLMs. Last, SM3-Text-to-Query is easily extendable to additional query languages or real, standard-based patient databases.
- North America > United States (0.93)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Czechia > Prague (0.04)
Text2BIM: Generating Building Models Using a Large Language Model-based Multi-Agent Framework
Du, Changyu, Esser, Sebastian, Nousias, Stavros, Borrmann, André
The conventional BIM authoring process typically requires designers to master complex and tedious modeling commands in order to materialize their design intentions within BIM authoring tools. This additional cognitive burden complicates the design process and hinders the adoption of BIM and model-based design in the AEC (Architecture, Engineering, and Construction) industry. To facilitate the expression of design intentions more intuitively, we propose Text2BIM, an LLM-based multi-agent framework that can generate 3D building models from natural language instructions. This framework orchestrates multiple LLM agents to collaborate and reason, transforming textual user input into imperative code that invokes the BIM authoring tool's APIs, thereby generating editable BIM models with internal layouts, external envelopes, and semantic information directly in the software. Furthermore, a rule-based model checker is introduced into the agentic workflow, utilizing predefined domain knowledge to guide the LLM agents in resolving issues within the generated models and iteratively improving model quality. Extensive experiments were conducted to compare and analyze the performance of three different LLMs under the proposed framework. The evaluation results demonstrate that our approach can effectively generate high-quality, structurally rational building models that are aligned with the abstract concepts specified by user input. Finally, an interactive software prototype was developed to integrate the framework into the BIM authoring software Vectorworks, showcasing the potential of modeling by chatting.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- North America > United States (0.04)
- Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Asia > China > Fujian Province (0.04)
- Construction & Engineering (1.00)
- Leisure & Entertainment > Games (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents
Jansen, Peter, Côté, Marc-Alexandre, Khot, Tushar, Bransom, Erin, Mishra, Bhavana Dalvi, Majumder, Bodhisattwa Prasad, Tafjord, Oyvind, Clark, Peter
Automated scientific discovery promises to accelerate progress across scientific domains. However, developing and evaluating an AI agent's capacity for end-to-end scientific reasoning is challenging, as running real-world experiments is often prohibitively expensive or infeasible. In this work we introduce DISCOVERYWORLD, the first virtual environment for developing and benchmarking an agent's ability to perform complete cycles of novel scientific discovery. DISCOVERYWORLD contains a variety of challenges, covering topics as diverse as radioisotope dating, rocket science, and proteomics, to encourage development of general discovery skills rather than task-specific solutions. DISCOVERYWORLD itself is an inexpensive, simulated, text-based environment (with optional 2D visual overlay). It includes 120 different challenge tasks, spanning eight topics, each with three levels of difficulty and several parametric variations. Each task requires an agent to form hypotheses, design and run experiments, analyze results, and act on conclusions. DISCOVERYWORLD further provides three automatic metrics for evaluating performance, based on (a) task completion, (b) task-relevant actions taken, and (c) the discovered explanatory knowledge. We find that strong baseline agents that perform well in previously published environments struggle on most DISCOVERYWORLD tasks, suggesting that DISCOVERYWORLD captures some of the novel challenges of discovery, and thus that DISCOVERYWORLD may help accelerate near-term development and assessment of scientific discovery competency in agents. Code available at: www.github.com/allenai/discoveryworld
- Education (0.93)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.90)
- Leisure & Entertainment > Games > Computer Games (0.67)
VALERIE22 -- A photorealistic, richly metadata annotated dataset of urban environments
The VALERIE tool pipeline is a synthetic data generator developed to help understand the domain-specific factors that influence the perception performance of DNNs (deep neural networks). This work was carried out under the German research project KI Absicherung to develop a methodology for validating DNNs in the context of pedestrian detection in urban environments for automated driving. The VALERIE22 dataset was generated with the VALERIE procedural tools pipeline, providing a photorealistic sensor simulation rendered from automatically synthesized scenes. The dataset provides a uniquely rich set of metadata, allowing extraction of specific scene and semantic features (such as pixel-accurate occlusion rates, positions in the scene, and distance and angle to the camera). This enables a multitude of possible tests on the data, and we hope to stimulate research on understanding the performance of DNNs. Based on performance metrics, a comparison with several other publicly available datasets is provided, demonstrating that VALERIE22 is one of the best-performing synthetic datasets currently available in the open domain.
- North America > United States > New York > New York County > New York City (0.05)
- Asia > India (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Transportation > Ground > Road (1.00)
- Information Technology (1.00)
- Automobiles & Trucks (1.00)
General-Purpose Computing on a Semantic Network Substrate
A semantic network is a directed labeled graph (Sowa, 1991). The thesis of this article is that the state of a computing machine, its low-level instructions, and the executing program can be represented as a semantic network. The computational model that is presented can be instantiated using any semantic network representation. However, given the existence of the Resource Description Framework (RDF) (Manola & Miller, 2004) and the popular Web Ontology Language (OWL) (McGuinness & Harmelen, 2004), this article presents the theory and the application in terms of these constructs. The proposed computing model is perhaps simple in theory but, in application, requires a relatively strong background in computer science.
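As a rough illustration of the thesis, the sketch below stores a tiny machine's state, its instructions, and its program in one set of (subject, predicate, object) triples, the basic shape of a directed labeled graph. It is not the article's computing model and does not use RDF or OWL tooling: the node names (`machine`, `i1`, `i2`), the predicates (`programCounter`, `accumulator`, `opcode`, `next`), and the single `increment` instruction are all hypothetical.

```python
def objects(graph, subject, predicate):
    """Return every object linked from `subject` by an edge labeled `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def step(graph):
    """Run the instruction the program counter points at; return False once halted."""
    current = objects(graph, "machine", "programCounter")
    if not current:
        return False
    instr = current[0]
    if objects(graph, instr, "opcode") == ["increment"]:
        # Rewrite the accumulator edge so it points at the incremented value.
        (old,) = objects(graph, "machine", "accumulator")
        graph.remove(("machine", "accumulator", old))
        graph.add(("machine", "accumulator", old + 1))
    # Advance the program counter along the instruction's "next" edge, if any.
    graph.remove(("machine", "programCounter", instr))
    for nxt in objects(graph, instr, "next"):
        graph.add(("machine", "programCounter", nxt))
    return True

# Machine state, instructions, and program all live in the same graph.
graph = {
    ("machine", "programCounter", "i1"),
    ("machine", "accumulator", 0),
    ("i1", "opcode", "increment"),
    ("i1", "next", "i2"),
    ("i2", "opcode", "increment"),
}

while step(graph):
    pass

result = objects(graph, "machine", "accumulator")
```

Because execution only adds and removes edges, the same loop could in principle run over any triple store; that portability is the point of the sketch, not the specific vocabulary.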
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- North America > United States > New York > New York County > New York City (0.04)