Prompt Orchestration Markup Language

Zhang, Yuge, Chen, Nan, Xu, Jiahang, Yang, Yuqing

arXiv.org Artificial Intelligence

Large Language Models (LLMs) require sophisticated prompting, yet current practices face challenges in structure, data integration, format sensitivity, and tooling. Existing methods lack comprehensive solutions for organizing complex prompts involving diverse data types (documents, tables, images) or managing presentation variations systematically. To address these gaps, we introduce POML (Prompt Orchestration Markup Language). POML employs component-based markup for logical structure (roles, tasks, examples), specialized tags for seamless data integration, and a CSS-like styling system to decouple content from presentation, reducing formatting sensitivity. It includes templating for dynamic prompts and a comprehensive developer toolkit (IDE support, SDKs) to improve version control and collaboration. We validate POML through two case studies demonstrating its impact on complex application integration (PomLink) and on task accuracy (TableQA), as well as a user study assessing its effectiveness in real-world development scenarios.
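The central idea above, declaring prompt content as logical components while supplying presentation separately, can be sketched in plain Python. This is not POML's actual syntax or SDK, only a minimal illustration of the CSS-like decoupling the abstract describes; the component names and style keys are assumptions.

```python
# Toy sketch of the decoupling idea: logical components (role, task, example)
# are fixed, while a separate "style" decides how they are rendered, so
# formatting can vary without touching the prompt's content.
# NOT POML's real syntax; illustrative only.

def render(components, style):
    """Render logical prompt components under a given presentation style."""
    lines = []
    for role, text in components:
        if style["layout"] == "markdown":
            lines.append(f"## {role.capitalize()}\n{text}")
        else:  # plain "LABEL:" layout
            lines.append(f"{role.upper()}: {text}")
    return "\n\n".join(lines)

components = [
    ("role", "You are a data analyst."),
    ("task", "Summarize the quarterly sales table."),
    ("example", "Input: a small table. Output: a one-line summary."),
]

# The same content rendered two ways; the wording stays identical,
# only the surface format changes.
markdown_prompt = render(components, {"layout": "markdown"})
plain_prompt = render(components, {"layout": "plain"})
```

Because the components never change, one can systematically test how rendering variations affect a model, which is the formatting-sensitivity concern the abstract raises.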


Translation of Multifaceted Data without Re-Training of Machine Translation Systems

Moon, Hyeonseok, Lee, Seungyoon, Hong, Seongtae, Lee, Seungjun, Park, Chanjun, Lim, Heuiseok

arXiv.org Artificial Intelligence

Translating major-language resources to build minor-language resources has become a widely used approach. In particular, when translating complex data points composed of multiple components, it is common to translate each component separately. However, we argue that this practice often overlooks the interrelation between components within the same data point. To address this limitation, we propose a novel MT pipeline that accounts for intra-data relations when applying MT to training data. In our MT pipeline, all the components of a data point are concatenated into a single translation sequence and subsequently reconstructed into their respective components after translation. We introduce a Catalyst Statement (CS) to enhance the intra-data relation, and an Indicator Token (IT) to assist the decomposition of a translated sequence into its respective data components. Through our approach, we achieve a considerable improvement in translation quality itself, along with improved effectiveness as training data. Compared with the conventional approach that translates each data component separately, our method yields better training data that improves the performance of the trained model by 2.690 points on the web page ranking (WPR) task and 0.845 points on the question generation (QG) task in the XGLUE benchmark.
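The concatenate-then-decompose pipeline described above can be sketched as follows. The exact Catalyst Statement wording and Indicator Token string are illustrative assumptions, not the paper's choices, and an identity function stands in for the actual MT system.

```python
# Hedged sketch of the proposed pipeline: a data point's components are joined
# into ONE sequence (so the MT system sees their interrelation), translated
# together, then split back into components.

IT = "<sep>"  # Indicator Token: marks component boundaries (assumed string)
CS = "The following fields describe one example:"  # Catalyst Statement (assumed wording)

def compose(components):
    """Join a data point's components into a single translation sequence."""
    return CS + " " + f" {IT} ".join(components)

def decompose(translated):
    """Recover the individual components after translation."""
    body = translated.split(CS, 1)[-1]  # drop the catalyst statement
    return [part.strip() for part in body.split(IT)]

point = ["What is the capital of France?", "Paris", "Geography"]
sequence = compose(point)

# A real MT system would translate `sequence` here; we pass it through
# unchanged to keep the sketch self-contained.
recovered = decompose(sequence)
```

The round trip `decompose(compose(point)) == point` is what the Indicator Token guarantees in the ideal case; the paper's contribution is making this hold reliably after real translation.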


Towards automation of threat modeling based on a semantic model of attack patterns and weaknesses

Brazhuk, Andrei

arXiv.org Artificial Intelligence

This work considers the challenges of building and using a formal knowledge base (model) that unites the ATT&CK, CAPEC, CWE, and CVE security enumerations. The proposed model can be used to learn relations between attack techniques, attack patterns, weaknesses, and vulnerabilities in order to build various threat landscapes, in particular for threat modeling. The model is created as an ontology from freely available datasets in the OWL and RDF formats. The use of ontologies is an alternative to structural and graph-based approaches for integrating the security enumerations. In this work we consider an approach to threat modeling with the data components of ATT&CK, based on the knowledge base and an ontology-driven threat modeling framework. We also evaluate how the ontological approach to threat modeling can be applied in practice and which challenges it faces.
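The traversal that uniting the enumerations enables, from an attack technique down to concrete vulnerabilities, can be illustrated with a toy sketch. The cross-references below mirror real mappings (ATT&CK T1110 maps to CAPEC-112, which references CWE-307), but the CVE identifier is a made-up placeholder, and the real model expresses these links as an OWL/RDF ontology rather than Python dicts.

```python
# Toy sketch: walking the linked enumerations to assemble a threat landscape
# for one attack technique. Real data lives in an OWL/RDF ontology; plain
# dicts stand in here to keep the example self-contained.

attack_to_capec = {"T1110 Brute Force": ["CAPEC-112"]}   # ATT&CK -> CAPEC
capec_to_cwe = {"CAPEC-112": ["CWE-307"]}                # CAPEC -> CWE
cwe_to_cve = {"CWE-307": ["CVE-0000-00000"]}             # CWE -> CVE (placeholder id)

def threat_landscape(technique):
    """Collect patterns, weaknesses, and vulnerabilities reachable from a technique."""
    patterns = attack_to_capec.get(technique, [])
    weaknesses = [w for p in patterns for w in capec_to_cwe.get(p, [])]
    vulns = [v for w in weaknesses for v in cwe_to_cve.get(w, [])]
    return {"patterns": patterns, "weaknesses": weaknesses, "vulnerabilities": vulns}

landscape = threat_landscape("T1110 Brute Force")
```

In the ontological version, the same traversal becomes a class- or property-path query, which is what lets a reasoner infer threat landscapes instead of requiring hand-coded joins.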


Primer on TensorFlow and how PerceptiLabs Makes it Easier - KDnuggets

#artificialintelligence

In A New Visual Approach to Machine Learning Modeling, we talked about how TensorFlow is one of the most popular machine learning (ML) frameworks today, but it's not necessarily an easy one for beginners to start building ML models with. That's why we decided to create a GUI on top of TensorFlow. With PerceptiLabs, beginners can get started building a model more quickly, and those with more experience can still dive into the code. Both types of users benefit from PerceptiLabs' rich set of visualizations, which include the ability to see a model's architecture, experiment and see how parameter and code changes affect models in real time, and view a rich set of training and validation stats. Given that PerceptiLabs runs TensorFlow behind the scenes, we thought we'd walk through the framework so you can understand its basics and how PerceptiLabs utilizes it.


Embed Ethical Guidelines in Autonomous Weapons

Communications of the ACM

As a combat veteran and more recently an industry technologist and university professor, I have observed with concern the increasing automation--and dehumanization--of warfare. Sarah Underwood's discussion of autonomous weapons in her news story "Potential and Peril" (June 2017), which highlights this trend, also reminded me of the current effort to update the ACM Code of Ethics, which says nothing about the responsibilities of ACM members in defense industries building the software and hardware in weapons systems. Underwood said understanding the limitations, dangers, and potential of autonomous and other warfare technologies must be a priority for those designing such systems, in order to minimize the "collateral damage" of civilian casualties and property/infrastructure destruction. Defense technologists must be aware of and follow appropriate ethical guidelines for creating and managing automated weapons systems of any kind. Removing human control and moral reasoning from weapons will not make wars less likely or less harmful to humans.