
Llamarine: Open-source Maritime Industry-specific Large Language Model

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated substantial potential in addressing complex reasoning tasks, yet their general-purpose nature often limits their effectiveness in specialized domains such as maritime navigation. To bridge this gap, we introduce Llamarine, the first open-source LLM designed specifically for maritime navigation. Llamarine 1.0 is developed through continued pretraining and fine-tuning on a high-quality corpus comprising maritime textbooks, research publications, and web text from Wikipedia. This domain-specific training enables the model to acquire expert-level knowledge in navigational principles, collision avoidance, route optimization, and regulatory compliance. Our key contributions include (a) the curation of a comprehensive maritime dataset from authoritative sources, ensuring depth and reliability in the model's knowledge base; (b) the development of a foundational model capable of reasoning about complex navigational challenges with greater accuracy than general-purpose LLMs; and (c) the establishment of a benchmark to evaluate performance in maritime-specific decision-making tasks. Experimental results demonstrate that Llamarine outperforms both general-purpose and commercial LLMs in critical navigation-related tasks, such as trajectory planning, risk assessment, and compliance with maritime regulations. By providing an open-source foundation model trained exclusively on high-quality maritime literature, Llamarine paves the way for AI-driven advancements in maritime safety, efficiency, and operational decision-making.
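The abstract's contribution (a) is the curation of a maritime corpus from authoritative sources. As a minimal illustration of that kind of step, the sketch below filters raw documents down to a deduplicated, trusted subset before continued pretraining; all names and thresholds here are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical corpus-curation step: keep documents from trusted maritime
# source types (textbooks, research papers, Wikipedia, per the abstract),
# drop near-empty fragments, and remove exact duplicates.

TRUSTED_SOURCES = {"textbook", "research_paper", "wikipedia"}

def curate_corpus(documents, min_words=50):
    """Return a deduplicated list of documents from trusted sources.

    Each document is a dict with at least "source" and "text" keys.
    """
    seen = set()
    curated = []
    for doc in documents:
        text = doc.get("text", "").strip()
        if doc.get("source") not in TRUSTED_SOURCES:
            continue  # skip untrusted web scrapes
        if len(text.split()) < min_words:
            continue  # skip near-empty fragments
        key = text.lower()
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        curated.append(doc)
    return curated
```

A real pipeline would add near-duplicate detection and quality scoring, but the shape — source whitelisting, length filtering, deduplication — is the standard recipe for domain-specific pretraining corpora.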


DANA: Domain-Aware Neurosymbolic Agents for Consistency and Accuracy

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown remarkable capabilities, but their inherent probabilistic nature often leads to inconsistency and inaccuracy in complex problem-solving tasks. This paper introduces DANA (Domain-Aware Neurosymbolic Agent), an architecture that addresses these issues by integrating domain-specific knowledge with neurosymbolic approaches. We begin by analyzing current AI architectures, including AutoGPT, LangChain ReAct, and OpenAI's ChatGPT, through a neurosymbolic lens, highlighting how their reliance on probabilistic inference contributes to inconsistent outputs. In response, DANA captures and applies domain expertise in both natural-language and symbolic forms, enabling more deterministic and reliable problem-solving behaviors. We implement a variant of DANA using Hierarchical Task Plans (HTPs) in the open-source OpenSSA framework. This implementation achieves over 90% accuracy on the FinanceBench financial-analysis benchmark, significantly outperforming current LLM-based systems in both consistency and accuracy. Applying DANA in physical industries such as semiconductor manufacturing shows that its flexible architecture for incorporating knowledge is effective in mitigating the probabilistic limitations of LLMs, with potential for tackling complex, real-world problems that require reliability and precision.
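To make the Hierarchical Task Plan idea concrete, here is a minimal sketch of the structure: a task either decomposes into subtasks or is a leaf solved by a concrete, deterministic function. This is an illustrative assumption about the general HTP pattern, not the actual OpenSSA API, whose class names and execution model differ.

```python
# Toy Hierarchical Task Plan: leaves are solved by deterministic functions,
# internal nodes just sequence their subtasks. Determinism at the leaves is
# what restores consistency relative to free-form LLM inference.

from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class HTPNode:
    task: str
    solve: Optional[Callable[[dict], float]] = None  # leaf solver, if any
    subtasks: list = field(default_factory=list)

    def execute(self, context: dict) -> dict:
        """Depth-first execution: solve leaves, record results in context."""
        if self.solve is not None:
            context[self.task] = self.solve(context)
        else:
            for sub in self.subtasks:
                sub.execute(context)
        return context

# Hypothetical financial-analysis plan: compute a quick ratio from filings.
plan = HTPNode(
    task="quick_ratio_plan",
    subtasks=[
        HTPNode("liquid_assets",
                solve=lambda c: c["cash"] + c["receivables"]),
        HTPNode("ratio",
                solve=lambda c: c["liquid_assets"] / c["liabilities"]),
    ],
)
result = plan.execute({"cash": 100.0, "receivables": 50.0, "liabilities": 75.0})
```

Run on the same inputs, this plan always produces the same answer, which is the property the abstract contrasts with the run-to-run variability of purely probabilistic agents.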


Enhancing Q&A with Domain-Specific Fine-Tuning and Iterative Reasoning: A Comparative Study

arXiv.org Artificial Intelligence

AI-powered question-answering (Q&A) systems have emerged as important tools, alongside established search technologies, to enable quick access to relevant information and knowledge from large digital sources that are complex and time-consuming for humans to navigate. Advancements in large language models (LLMs) have revolutionized the field of Q&A, with models like GPT-3 (Brown et al. 2020), BERT (Devlin et al. 2018), and RoBERTa (Liu et al. 2019) demonstrating remarkable abilities in understanding and generating human-like text. However, the effectiveness of such models in handling domain-specific questions that require specialized knowledge is limited. Retrieval-augmented generation (RAG) techniques, which combine information retrieval and generative models (Lewis et al. 2021), have shown promise in boosting the quality of LLM output in Q&A tasks. RAG systems leverage the strengths of both retrieval and generation components to provide contextually relevant and informative responses. While RAG accuracy has not been systematically quantified, early findings suggest that generic RAG does not perform well in complex domains such as finance. In one instance, RAG based on generic LLMs such as GPT-4-Turbo fails to answer 81% of the questions derived from Securities and Exchange Commission (SEC) financial filings (Islam et al. 2023).

Aitomatic, Inc. (except as noted, all authors are from Aitomatic)
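For readers new to RAG, the retrieval half can be sketched in a few lines: score candidate documents against the question and prepend the best match as context for the generator. This is a toy bag-of-words illustration under stated assumptions; production systems use dense embeddings, a vector index, and an LLM for the generation step, and all names here are hypothetical.

```python
# Toy retrieval step for a RAG pipeline: rank documents by cosine similarity
# of word counts against the question, then stitch the winner into a prompt.

import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, documents: list) -> str:
    """Return the document most similar to the question."""
    q = Counter(question.lower().split())
    return max(documents, key=lambda d: cosine(q, Counter(d.lower().split())))

def build_prompt(question: str, documents: list) -> str:
    """Prepend the retrieved context, ready to hand to a generative model."""
    context = retrieve(question, documents)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"
```

The finance failure mode the abstract cites arises exactly here: when lexical or embedding similarity misses the filing section that actually answers the question, the generator has nothing relevant to condition on.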