Goto

Collaborating Authors

 Generative AI


Arabic Large Language Models for Medical Text Generation

arXiv.org Artificial Intelligence

Efficient hospital management systems (HMS) are critical worldwide to address challenges such as overcrowding, limited resources, and poor availability of urgent health care. Existing methods often lack the ability to provide accurate, real-time medical advice, particularly for irregular inputs and underrepresented languages. To overcome these limitations, this study proposes an approach that fine-tunes large language models (LLMs) for Arabic medical text generation. The system is designed to assist patients by providing accurate medical advice, diagnoses, drug recommendations, and treatment plans based on user input. The research methodology required the collection of a unique dataset from social media platforms, capturing real-world medical conversations between patients and doctors. The dataset, which includes patient complaints together with medical advice, was properly cleaned and preprocessed to account for multiple Arabic dialects. Fine-tuning state-of-the-art generative models, such as Mistral-7B-Instruct-v0.2, LLaMA-2-7B, and GPT-2 Medium, optimized the system's ability to generate reliable medical text. Results from evaluations indicate that the fine-tuned Mistral-7B model outperformed the other models, achieving average BERT (Bidirectional Encoder Representations from Transformers) Score values in precision, recall, and F1-scores of 68.5\%, 69.08\%, and 68.5\%, respectively. Comparative benchmarking and qualitative assessments validate the system's ability to produce coherent and relevant medical replies to informal input. This study highlights the potential of generative artificial intelligence (AI) in advancing HMS, offering a scalable and adaptable solution for global healthcare challenges, especially in linguistically and culturally diverse environments.


A Modular and Multimodal Generative AI Framework for Urban Building Energy Data: Generating Synthetic Homes

arXiv.org Artificial Intelligence

Computational models have emerged as powerful tools for energy modeling research, touting scalability and quantitative results. However, these models require a plethora of data, some of which is inaccessible, expensive, or raises privacy concerns. We introduce a modular multimodal framework to produce this data from publicly accessible residential information and images using generative artificial intelligence (AI). Additionally, we provide a pipeline demonstrating this framework, and we evaluate its generative AI components. Our experiments show that our framework's use of AI avoids common issues with generative models. Our framework produces realistic, labeled data. By reducing dependence on costly or restricted data sources, we pave a path towards more accessible and reproducible research.


Imposing AI: Deceptive design patterns against sustainability

arXiv.org Artificial Intelligence

Generative AI is being massively deployed in digital services, at a scale that will result in significant environmental harm. We document how tech companies are transforming established user interfaces to impose AI use and show how and to what extent these strategies fit within established deceptive pattern categories. We identify two main design strategies that are implemented to impose AI use in both personal and professional contexts: imposing AI features in interfaces at the expense of existing non-AI features and promoting narratives about AI that make it harder to resist using it. We discuss opportunities for regulating the imposed adoption of AI features, which would inevitably lead to negative environmental effects.


'It's going to be a life skill': educators discuss the impact of AI on university education

The Guardian

'It's going to be a life skill': educators discuss the impact of AI on university education Artificial intelligence is changing how students learn and the world they'll graduate into. OpenAI CEO Sam Altman recently told a US podcast that if he was graduating today, "I would feel like the luckiest kid in all of history." Altman, whose company developed and released ChatGPT in November 2022, believes the transformative power of AI offers unprecedented opportunities for young people. Yes, there will be job displacement, but "this always happens," says Altman, "and young people are the best at adapting to this." New, more exciting jobs will emerge, full of greater possibilities. For UK sixth-formers and their families looking at universities, trying to make the best possible choices about what to study - and where - in the age of generative AI, Altman's words may offer some comfort.


Automated Classification of Tutors' Dialogue Acts Using Generative AI: A Case Study Using the CIMA Corpus

arXiv.org Artificial Intelligence

First submitted: 30 Oct 2023. The final version will be available open access via the journal. Abstract This study explores the use of generative AI for automating the classification of tutors' Dialogue Acts (DAs), aiming to reduce the time and effort required by traditional manual coding. This case study uses the open - source CIMA corpus, in which tutors' re sponses are pre - annotated into four DA categories. Both GPT - 3.5 - turbo and GPT - 4 models were tested using tailored prompts. Results show that GPT - 4 achieved 80% accuracy, a weighted F1 - score of 0.81, and a Cohen's Kappa of 0.74, surpassing baseline performa nce and indicating substantial agreement with human annotations. These findings suggest that generative AI has strong potential to provide an efficient and accessible approach to DA classification, with meaningful implications for educational dialogue analysis. The study also highlights the importance of task - specific label definitions and contextual information in enhanc ing the quality of automated annotation. Finally, it underscores the ethical considerations associated with the use of generative AI and the need for responsible and transparent research practices.


Can Large Language Models Understand As Well As Apply Patent Regulations to Pass a Hands-On Patent Attorney Test?

arXiv.org Artificial Intelligence

The legal field already uses various large language models (LLMs) in actual applications, but their quantitative performance and reasons for it are underexplored. We evaluated several open-source and proprietary LLMs -- including GPT-series, Anthropic, Deepseek and Llama-3, variants -- on parts of the European Qualifying Examination (EQE) for future European Patent Attorneys. OpenAI o1 led with 0.82 accuracy and 0.81 F1 score, whereas (Amazon Web Services) AWS Llama 3.1 8B lagged at 0.50 accuracy, and a Python-deployed Llama 3.1 8B scored 0.55. The latter two are within the range of mere guessing for the two-answer forced-choice design. None of the evaluated models could have passed the examination fully, as accuracy never exceeded the average threshold of 0.90 required for professional-level standards -- also not models that are regularly promoted for their assumed beyond-PhD- and bar-admitted-lawyer-level performance. GPT-4o excelled at integrating text and graphics, while Claude 3 Opus often lost formatting coherence. Human patent experts evaluated the textual justifications and uncovered various critical shortcomings of each model. They valued clarity and legal rationale over the raw correctness of the answers, which revealed misalignment between automatic metrics and expert judgment. Model outputs were sensitive to modest temperature changes and prompt wording, which underscores the remaining necessity of expert oversight. Future work should target logical consistency, robust multimodality, and adaptive prompting to approach human-level patent proficiency. In summary, despite the outstanding performance of recent large models, the general public might overestimate their performance. The field has a long way to go to develop a virtual patent attorney. This paper wants to point out several specific limitations that need solutions.


Partnering with generative AI in the finance function

MIT Technology Review

CFOs are experimenting with AI use cases to free up capacity for business-critical work. Generative AI has the potential to transform the finance function. By taking on some of the more mundane tasks that can occupy a lot of time, generative AI tools can help free up capacity for more high-value strategic work. For chief financial officers, this could mean spending more time and energy on proactively advising the business on financial strategy as organizations around the world continue to weather ongoing geopolitical and financial uncertainty. CFOs can use large language models (LLMs) and generative AI tools to support everyday tasks like generating quarterly reports, communicating with investors, and formulating strategic summaries, says Andrew W. Lo, Charles E. and Susan T. Harris professor and director of the Laboratory for Financial Engineering at the MIT Sloan School of Management. "LLMs can't replace the CFO by any means, but they can take a lot of the drudgery out of the role by providing first drafts of documents that summarize key issues and outline strategic priorities."


Data-driven generative simulation of SDEs using diffusion models

arXiv.org Machine Learning

This paper introduces a new approach to generating sample paths of unknown stochastic differential equations (SDEs) using diffusion models, a class of generative AI models commonly employed in image and video applications. Unlike the traditional Monte Carlo methods for simulating SDEs, which require explicit specifications of the drift and diffusion coefficients, our method takes a model-free, data-driven approach. Given a finite set of sample paths from an SDE, we utilize conditional diffusion models to generate new, synthetic paths of the same SDE. To demonstrate the effectiveness of our approach, we conduct a simulation experiment to compare our method with alternative benchmark ones including neural SDEs. Furthermore, in an empirical study we leverage these synthetically generated sample paths to enhance the performance of reinforcement learning algorithms for continuous-time mean-variance portfolio selection, hinting promising applications of diffusion models in financial analysis and decision-making.


A Structured Review of Underwater Object Detection Challenges and Solutions: From Traditional to Large Vision Language Models

arXiv.org Artificial Intelligence

Despite its significance, the underwater world remains largely overlooked as a result of the challenging conditions that hinder traditional research methods. Historically, the study of marine ecosystems relied on labor intensive research [1], which provided limited data and had a high error margin. In recent years, advances in autonomous and remotely operated vehicles (AUVs and ROVs) have revolutionized underwater exploration. These technologies, equipped with object detection systems, now allow real-time monitoring, which includes capturing images of marine organisms, environmental conditions, and even assessing biodiversity [2], [3]. However, the quality of images and videos captured underwater remains a significant obstacle. Light absorption, scattering, and water-related distortions, such as haze and color shifts [4], create noisy low-contrast images, further compounded by complex underwater backgrounds and camera motion. These challenges call for advanced detection techniques capable of accurately identifying and localizing objects despite underwater noise. Efficient underwater object detection (UOD) is crucial for a variety of marine applications, including biodiversity monitoring, conservation efforts, and resource management.


Prompt-Driven Image Analysis with Multimodal Generative AI: Detection, Segmentation, Inpainting, and Interpretation

arXiv.org Artificial Intelligence

Prompt-driven image analysis converts a single natural-language instruction into multiple steps: locate, segment, edit, and describe. We present a practical case study of a unified pipeline that combines open-vocabulary detection, promptable segmentation, text-conditioned inpainting, and vision-language description into a single workflow. The system works end to end from a single prompt, retains intermediate artifacts for transparent debugging (such as detections, masks, overlays, edited images, and before and after composites), and provides the same functionality through an interactive UI and a scriptable CLI for consistent, repeatable runs. We highlight integration choices that reduce brittleness, including threshold adjustments, mask inspection with light morphology, and resource-aware defaults. In a small, single-word prompt segment, detection and segmentation produced usable masks in over 90% of cases with an accuracy above 85% based on our criteria. On a high-end GPU, inpainting makes up 60 to 75% of total runtime under typical guidance and sampling settings, which highlights the need for careful tuning. The study offers implementation-guided advice on thresholds, mask tightness, and diffusion parameters, and details version pinning, artifact logging, and seed control to support replay. Our contribution is a transparent, reliable pattern for assembling modern vision and multimodal models behind a single prompt, with clear guardrails and operational practices that improve reliability in object replacement, scene augmentation, and removal.