technical document
Do Chatbots Walk the Talk of Responsible AI?
Aaronson, Susan Ariel, Moreno, Michael
Introduction. In April 2025, sixteen-year-old Adam Raine died by suicide. Over the course of several months, the teen had confided his suicidal thoughts to OpenAI's ChatGPT chatbot. ChatGPT is not designed or developed to provide therapy, but it did not respond to Adam's prompts with suggestions that he obtain professional help. Moreover, when Adam expressed concern that his parents would blame themselves if he died, ChatGPT reportedly responded, "That doesn't mean you owe them survival," and offered to help draft his suicide note. Adam's death was not the only example of chatbot misbehavior. OpenAI claims it does not permit ChatGPT "to generate hateful, harassing, violent, or adult content." Yet in July 2025, a reporter documented ChatGPT providing users with detailed instructions for self-mutilation, murder, and satanic rituals. OpenAI has also acknowledged that individuals can misuse its systems, though the company has taken only some responsibility.
- North America > Canada (0.15)
- North America > United States (0.14)
- Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)
- Law (1.00)
- Government (1.00)
- Information Technology > Security & Privacy (0.68)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.41)
Contextual Graph Transformer: A Small Language Model for Enhanced Engineering Document Information Extraction
Standard transformer-based language models, while powerful for general text, often struggle with the fine-grained syntax and entity relationships in complex technical and engineering documents. To address this, we propose the Contextual Graph Transformer (CGT), a hybrid neural architecture that combines Graph Neural Networks (GNNs) and Transformers for domain-specific question answering. CGT constructs a dynamic graph over input tokens using sequential, skip-gram, and semantic similarity edges, which is processed by GATv2Conv layers for local structure learning. These enriched embeddings are then passed to a Transformer encoder to capture global dependencies. Technical domains often require specialized language models with stronger contextualization and structure awareness than generic large models provide, and CGT offers a parameter-efficient solution for such use cases. Integrated into a Retrieval-Augmented Generation (RAG) pipeline, CGT outperforms baselines like GPT-2 and BERT, achieving 24.7% higher accuracy than GPT-2 with 62.4% fewer parameters. This gain stems from CGT's ability to jointly model structural token interactions and long-range semantic coherence. The model is trained from scratch using a two-phase approach: pretraining on general text followed by fine-tuning on domain-specific manuals. This highlights CGT's adaptability to technical language, enabling better grounding, entity tracking, and retrieval-augmented responses in real-world applications.
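The token-graph construction the abstract describes can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the window size, similarity threshold, and function names are assumptions, and the resulting edge list is what would be handed to GATv2Conv layers (e.g., via a library such as PyTorch Geometric).

```python
def build_token_edges(tokens, skip_window=2, sim=None, sim_threshold=0.8):
    """Build a (src, dst) edge list over token positions using the three
    edge types named in the abstract: sequential, skip-gram, and
    semantic-similarity edges (the last via an optional pairwise sim fn)."""
    edges = []
    n = len(tokens)
    for i in range(n - 1):
        edges.append((i, i + 1))            # sequential edge
    for i in range(n):
        for j in range(i + 2, min(i + skip_window + 1, n)):
            edges.append((i, j))            # skip-gram edge within window
    if sim is not None:
        for i in range(n):
            for j in range(i + 1, n):
                if sim(tokens[i], tokens[j]) >= sim_threshold:
                    edges.append((i, j))    # semantic-similarity edge
    return edges
```

With `tokens = ["flow", "rate", "valve", "spec"]` and `skip_window=2`, this yields three sequential edges plus skip-gram edges (0, 2) and (1, 3).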
- Research Report (0.50)
- Overview (0.48)
GAI: Generative Agents for Innovation
This study examines whether collective reasoning among generative agents can facilitate novel and coherent thinking that leads to innovation. To achieve this, it proposes GAI, a new LLM-empowered framework designed for reflection and interaction among multiple generative agents to replicate the process of innovation. The core of the GAI framework lies in an architecture that dynamically processes the internal states of agents and a dialogue scheme specifically tailored to facilitate analogy-driven innovation. The framework's functionality is evaluated using Dyson's invention of the bladeless fan as a case study, assessing the extent to which the core ideas of the innovation can be replicated through a set of fictional technical documents. The experimental results demonstrate that models with internal states significantly outperformed those without, achieving higher average scores and lower variance. Notably, the model with five heterogeneous agents equipped with internal states successfully replicated the key ideas underlying Dyson's invention. This indicates that the internal state enables agents to refine their ideas, resulting in the construction and sharing of more coherent and comprehensive concepts.
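The internal-state mechanism described above can be sketched as a toy agent loop. This is an illustrative sketch only: the class, the `toy_llm` stand-in, and the reflection format are assumptions, with a callable replacing the real LLM call.

```python
class GenerativeAgent:
    """Toy agent that keeps an internal state (accumulated reflections)
    and conditions each new reply on that state."""

    def __init__(self, name, llm):
        self.name = name
        self.llm = llm          # callable: (state, message) -> reply
        self.state = []         # internal state: list of reflections

    def reflect(self, message):
        # Distill the latest dialogue turn into a stored reflection.
        self.state.append(f"noted: {message}")

    def respond(self, message):
        reply = self.llm(self.state, message)
        self.reflect(message)
        return reply

def toy_llm(state, message):
    # Stand-in for an LLM call: shows how much state conditions the reply.
    return f"[{len(state)} reflections] re: {message}"
```

In a multi-agent dialogue, each agent's replies are conditioned on a growing internal state, which is the property the paper credits for more coherent, refined ideas.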
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Japan > Honshū > Tōhoku > Miyagi Prefecture > Sendai (0.04)
Telco-DPR: A Hybrid Dataset for Evaluating Retrieval Models of 3GPP Technical Specifications
Saraiva, Thaina, Sousa, Marco, Vieira, Pedro, Rodrigues, António
This paper proposes a Question-Answering (QA) system for the telecom domain using 3rd Generation Partnership Project (3GPP) technical documents. Alongside it, we present Telco-DPR, a hybrid dataset consisting of a curated 3GPP corpus in a hybrid format combining text and tables. Additionally, the dataset includes a set of synthetic question/answer pairs designed to evaluate the retrieval performance of QA systems on this type of data. The retrieval models, including the sparse model Best Matching 25 (BM25) as well as dense models such as Dense Passage Retriever (DPR) and Dense Hierarchical Retrieval (DHR), are evaluated and compared using top-K accuracy and Mean Reciprocal Rank (MRR). The results show that DHR, a retriever model utilising hierarchical passage selection through fine-tuning at both the document and passage levels, outperforms traditional methods in retrieving relevant technical information, achieving a Top-10 accuracy of 86.2%. Additionally, the Retrieval-Augmented Generation (RAG) technique used in the proposed QA system is evaluated to demonstrate the benefits of using the hybrid dataset and the DHR. The proposed QA system, using the developed RAG model and the Generative Pretrained Transformer (GPT)-4, achieves a 14% improvement in answer accuracy when compared to a previous benchmark on the same dataset.
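The two retrieval metrics named above, top-K accuracy and Mean Reciprocal Rank, are standard and can be implemented in a few lines. A minimal sketch, assuming each query has exactly one relevant passage id:

```python
def top_k_accuracy(ranked_ids, relevant_id, k):
    """1.0 if the relevant passage appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mean_reciprocal_rank(results):
    """results: list of (ranked_ids, relevant_id) pairs.
    Each query contributes 1/rank of its relevant passage (0 if missing)."""
    total = 0.0
    for ranked_ids, relevant_id in results:
        if relevant_id in ranked_ids:
            total += 1.0 / (ranked_ids.index(relevant_id) + 1)
    return total / len(results)
```

For example, a query whose relevant passage is ranked second contributes 0.5 to the MRR sum; averaging over all queries gives the reported metric.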
- Europe > Portugal > Lisbon > Lisbon (0.14)
- North America > United States (0.05)
Evaluation of Table Representations to Answer Questions from Tables in Documents: A Case Study using 3GPP Specifications
Roychowdhury, Sujoy, Soman, Sumit, Ranjani, HG, Sharma, Avantika, Gunda, Neeraj, Bala, Sai Krishna
With the ubiquitous use of document corpora for question answering, one important aspect which is especially relevant for technical documents is the ability to extract information from tables which are interspersed with text. The major challenge is that, unlike free-flow text or an isolated set of tables, the representation of a table in terms of what constitutes a relevant chunk is not obvious. We conduct a series of experiments examining various representations of tabular data interspersed with text to understand the relative benefits of different representations. We choose a corpus of 3rd Generation Partnership Project (3GPP) documents since they are heavily interspersed with tables. We create an expert-curated dataset of question-answer pairs to evaluate our approach. We conclude that row-level representations with corresponding table header information included in every cell improve the performance of the retrieval, thus leveraging the structural information present in the tabular data.
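The winning representation, one retrieval chunk per table row with the column header repeated alongside every cell, can be sketched as follows. The function name and the separator format are illustrative assumptions, not the paper's exact serialization.

```python
def row_level_chunks(headers, rows):
    """Turn a table into one retrieval chunk per row, attaching the column
    header to every cell value so each chunk is self-describing."""
    chunks = []
    for row in rows:
        cells = [f"{header}: {value}" for header, value in zip(headers, row)]
        chunks.append("; ".join(cells))
    return chunks
```

A row like `["Max bitrate", "10 Mbps"]` under headers `["Parameter", "Value"]` becomes the chunk `"Parameter: Max bitrate; Value: 10 Mbps"`, which remains interpretable even when retrieved in isolation from the rest of the table.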
Deep Learning for Technical Document Classification
Jiang, Shuo, Hu, Jie, Magee, Christopher L., Luo, Jianxi
In large technology companies, the requirements for managing and organizing technical documents created by engineers and managers have increased dramatically in recent years, which has led to a higher demand for more scalable, accurate, and automated document classification. Prior studies have focused only on processing text for classification, whereas technical documents often contain multimodal information. To leverage multimodal information for document classification and improve model performance, this paper presents a novel multimodal deep learning architecture, TechDoc, which utilizes three types of information: natural language texts and descriptive images within documents, and the associations among the documents. The architecture synthesizes the convolutional neural network, recurrent neural network, and graph neural network through an integrated training process. We applied the architecture to a large multimodal technical document database and trained the model to classify documents based on the hierarchical International Patent Classification system. Our results show that TechDoc achieves greater classification accuracy than unimodal methods and other state-of-the-art benchmarks. The trained model can potentially be scaled to millions of real-world multimodal technical documents, which is useful for data and knowledge management in large technology companies and organizations.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Asia > Singapore (0.05)
- Asia > China > Shanghai > Shanghai (0.05)
- Information Technology (1.00)
- Law > Intellectual Property & Technology Law (0.68)
Fake It to Make It: Companies Beef Up AI Models With Synthetic Data
Companies rely on real-world data to train artificial-intelligence models that can identify anomalies, make predictions and generate insights. To detect credit-card fraud, for example, researchers train AI models to look for specific patterns of known suspicious behavior, gleaned from troves of data. But unique, or rare, types of fraud are difficult to detect when there isn't enough data to support the algorithm's training. To get around that, companies are learning to fake it, building so-called synthetic data sets designed to augment training data. At American Express Co., machine-learning and data scientists have been experimenting with synthetic data for nearly two years in hopes of improving the company's AI-based fraud-detection models, said Dmitry Efimov, head of the company's Machine Learning Center of Excellence. The credit-card company uses an advanced form of AI to generate fake fraud patterns aimed at bolstering the real training data.
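The augmentation idea the article describes, generating synthetic rare-class records to pad out scarce training data, can be sketched very simply by resampling real records with noise. This is a stand-in for the far more sophisticated generative models such companies use; the field names and jitter parameter are invented for illustration.

```python
import random

def synthesize(records, n, jitter=0.1, rng=None):
    """Create n synthetic numeric records by resampling real ones and
    perturbing each field with multiplicative noise of up to +/- jitter."""
    rng = rng or random.Random(0)
    fake = []
    for _ in range(n):
        base = rng.choice(records)
        fake.append({key: value * (1 + rng.uniform(-jitter, jitter))
                     for key, value in base.items()})
    return fake
```

Each synthetic record stays close to a real one, so a fraud-detection model sees more examples of the rare pattern without any field drifting outside a plausible range.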
- Information Technology (1.00)
- Banking & Finance (0.96)
- Law Enforcement & Public Safety > Fraud (0.78)
Cybersecurity Researchers Build a Better 'Canary Trap'
During World War II, British intelligence agents planted false documents on a corpse to fool Nazi Germany into preparing for an assault on Greece. "Operation Mincemeat" was a success, and covered the actual Allied invasion of Sicily. The "canary trap" technique in espionage spreads multiple versions of false documents to conceal a secret. Canary traps can be used to sniff out information leaks, or as in WWII, to create distractions that hide valuable information. WE-FORGE, a new data protection system designed in the Department of Computer Science, uses artificial intelligence to build on the canary trap concept.
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.43)
10 Google Patents to Boost Your SEO Effort
Learning about SEO is a bit of a challenge, isn't it? On the one hand, there is no single body of knowledge, and the information has to be collected bit by bit from many different places. On the other hand, the information is often misinterpreted, giving rise to fake ranking factors and far-fetched theories. That's why, to learn the truth about SEO, it's best to go to the very source -- Google itself. In the past, I have already discussed a few sources of SEO information at Google, namely the SEO Starter Guide and the Quality Raters Guidelines.
- Law > Intellectual Property & Technology Law (0.49)
- Information Technology > Services (0.35)
Developing AI-enabled database system for technical documents
It has been announced that Showa Denko and Cinnamon will jointly develop a database system equipped with artificial intelligence (AI) to register technical documents. This development program has been accredited as a project to be subsidized by the New Energy and Industrial Technology Development Organization (NEDO) as a part of NEDO's 'Program to Support Joint Development of AI Systems'. Technical documents accumulated by Japan's manufacturing industry contain a massive amount of knowledge. However, most of those documents are stored as data on paper. Showa Denko believes that if we want to make the most of pre-existing technical knowledge and utilize it as a source of new value, we should convert 'analog data' into 'digital data' and store it on electronic databases. It is difficult to manually convert massive amounts of paper-based data.