Weak Supervision Performance Evaluation via Partial Identification
Programmatic Weak Supervision (PWS) enables supervised model training without direct access to ground truth labels, utilizing weak labels from heuristics, crowdsourcing, or pre-trained models. However, the absence of ground truth complicates model evaluation, as traditional metrics such as accuracy, precision, and recall cannot be directly calculated. In this work, we present a novel method to address this challenge by framing model evaluation as a partial identification problem and estimating performance bounds using Fréchet bounds. Our approach derives reliable bounds on key metrics without requiring labeled data, overcoming core limitations in current weak supervision evaluation techniques. Through scalable convex optimization, we obtain accurate and computationally efficient bounds for metrics including accuracy, precision, recall, and F1-score, even in high-dimensional settings. This framework offers a robust approach to assessing model quality without ground truth labels, enhancing the practicality of weakly supervised learning for real-world applications.
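The abstract does not give the paper's optimization formulation, but the core idea of bounding a metric from marginals alone can be illustrated in a special case. Below is a minimal sketch, assuming only binary labels and known marginals p = P(model predicts 1) and q = P(true label is 1); it applies the closed-form Fréchet-Hoeffding bounds to the joint cell and propagates them to accuracy. The paper's method, by contrast, uses scalable convex optimization and handles further metrics.

```python
# Illustrative sketch (not the paper's method): closed-form Fréchet-Hoeffding
# bounds on binary accuracy from marginals alone. Assumes only
# p = P(Y_hat = 1) and q = P(Y = 1) are known; no labeled data is used.

def accuracy_bounds(p: float, q: float) -> tuple[float, float]:
    """Bounds on P(Y_hat == Y) given marginals p = P(Y_hat=1), q = P(Y=1)."""
    # Fréchet-Hoeffding bounds on the joint cell P(Y_hat=1, Y=1):
    joint_lo = max(0.0, p + q - 1.0)
    joint_hi = min(p, q)
    # accuracy = P(1,1) + P(0,0) = 1 - p - q + 2 * P(1,1)
    return 1 - p - q + 2 * joint_lo, 1 - p - q + 2 * joint_hi

print(accuracy_bounds(0.6, 0.55))  # (0.15, 0.95)
```

Without any side information the interval can be wide, which is why the paper's convex-optimization formulation, incorporating the structure of the weak labels, is needed to tighten the bounds.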
Synthetic Data for Robust AI Model Development in Regulated Enterprises
In today's business landscape, organizations must balance using their customers' data ethically to power AI solutions with complying with data privacy and data usage regulations. In this paper, we discuss synthetic data as a possible solution to this dilemma. Synthetic data is simulated data that mimics real data. We explore how organizations in heavily regulated industries, such as financial institutions or healthcare organizations, can leverage synthetic data to build robust AI solutions while remaining compliant. We demonstrate that synthetic data offers two significant advantages: it allows AI models to learn from more diverse data, and it helps organizations comply with data privacy laws by substituting synthetic data for customer information. We present case studies showing how synthetic data can be used effectively in the finance and healthcare sectors, and we discuss the challenges of using synthetic data and some of the ethical questions it raises. Our research finds that synthetic data could be a game-changer for AI in regulated industries; this potential can be realized when industry, academia, and regulators collaborate to build solutions. We aim to initiate discussions on the use of synthetic data to build ethical, responsible, and effective AI systems in regulated enterprise industries.
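As a concrete (and deliberately simplistic) illustration of "simulated data that mimics real data", the sketch below fits a multivariate normal to numeric columns and samples from it. This is only one of many generators and is an assumption on our part; production systems in regulated settings use richer models and typically add formal privacy guarantees such as differential privacy.

```python
# Minimal sketch of one simple synthetic-data generator: fit the mean and
# covariance of real numeric data, then sample new rows from that fit.
# Real deployments use richer generators plus formal privacy guarantees.
import numpy as np

def synthesize(real: np.ndarray, n: int, seed: int = 0) -> np.ndarray:
    """Sample n synthetic rows matching the mean/covariance of `real`."""
    rng = np.random.default_rng(seed)
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n)

real = np.random.default_rng(1).normal(size=(500, 4))  # stand-in for customer data
synthetic = synthesize(real, n=1000)  # shares aggregate statistics, not records
```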
PenTest++: Elevating Ethical Hacking with AI and Automation
Al-Sinani, Haitham S., Mitchell, Chris J.
Traditional ethical hacking relies on skilled professionals and time-intensive command management, which limits its scalability and efficiency. To address these challenges, we introduce PenTest++, an AI-augmented system that integrates automation with generative AI (GenAI) to optimise ethical hacking workflows. Developed in a controlled virtual environment, PenTest++ streamlines critical penetration testing tasks, including reconnaissance, scanning, enumeration, exploitation, and documentation, while maintaining a modular and adaptable design. The system balances automation with human oversight, ensuring informed decision-making at key stages, and offers significant benefits such as enhanced efficiency, scalability, and adaptability. However, it also raises ethical considerations, including privacy concerns and the risks of AI-generated inaccuracies (hallucinations). This research underscores the potential of AI-driven systems like PenTest++ to complement human expertise in cybersecurity by automating routine tasks, enabling professionals to focus on strategic decision-making. By incorporating robust ethical safeguards and promoting ongoing refinement, PenTest++ demonstrates how AI can be responsibly harnessed to address operational and ethical challenges in the evolving cybersecurity landscape.
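The abstract describes a modular pipeline with human oversight at key stages. The following hypothetical sketch shows that human-in-the-loop pattern: each stage proposes a command and a person approves it before execution. The stage names come from the abstract; the command suggestions and function names are illustrative, not PenTest++'s actual implementation.

```python
# Hypothetical sketch of the human-in-the-loop workflow the abstract describes:
# each stage proposes a command, and a human approves before execution.
import subprocess

STAGES = ["reconnaissance", "scanning", "enumeration", "exploitation", "documentation"]

def propose_command(stage: str, target: str) -> str:
    # In PenTest++ this would come from a GenAI backend; hard-coded here.
    examples = {
        "reconnaissance": f"whois {target}",
        "scanning": f"nmap -sV {target}",
    }
    return examples.get(stage, f"echo 'no suggestion for {stage}'")

def run_stage(stage: str, target: str) -> None:
    cmd = propose_command(stage, target)
    # Human oversight: nothing runs without explicit confirmation.
    if input(f"[{stage}] run `{cmd}`? [y/N] ").lower() == "y":
        subprocess.run(cmd, shell=True, check=False)

for stage in STAGES:
    run_stage(stage, "10.0.0.5")  # lab target in a controlled virtual environment
```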
AI-Enhanced Ethical Hacking: A Linux-Focused Experiment
Al-Sinani, Haitham S., Mitchell, Chris J.
This technical report investigates the integration of generative AI (GenAI), specifically ChatGPT, into the practice of ethical hacking through a comprehensive experimental study and conceptual analysis. Conducted in a controlled virtual environment, the study evaluates GenAI's effectiveness across the key stages of penetration testing on Linux-based target machines operating within a virtual local area network (LAN), including reconnaissance, scanning and enumeration, gaining access, maintaining access, and covering tracks. The findings confirm that GenAI can significantly enhance and streamline the ethical hacking process while underscoring the importance of balanced human-AI collaboration rather than the complete replacement of human input. The report also critically examines potential risks such as misuse, data biases, hallucination, and over-reliance on AI. This research contributes to the ongoing discussion on the ethical use of AI in cybersecurity and highlights the need for continued innovation to strengthen security defences.
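To make the GenAI side of such an experiment concrete, here is a hedged sketch of the kind of query involved: asking a chat model to suggest the next step at a given penetration-testing stage. The study used ChatGPT interactively; the model name, prompt, and helper function below are placeholders of our own, not the report's setup.

```python
# Hedged sketch of a GenAI query of the kind the study evaluates. The prompt,
# helper name, and model are illustrative assumptions, not the report's code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def suggest_next_step(stage: str, context: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": "You assist an authorized penetration test in an isolated lab LAN."},
            {"role": "user",
             "content": f"Stage: {stage}\nContext: {context}\nSuggest one next command."},
        ],
    )
    return response.choices[0].message.content

print(suggest_next_step("scanning", "target 192.168.56.101, Linux host"))
```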
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
Gaido, Marco, Papi, Sara, Bentivogli, Luisa, Brutti, Alessio, Cettolo, Mauro, Gretter, Roberto, Matassoni, Marco, Nabih, Mohamed, Negri, Matteo
The rise of foundation models (FMs), coupled with regulatory efforts addressing their risks and impacts, has sparked significant interest in open-source models. However, existing speech FMs (SFMs) fall short of full compliance with the open-source principles, even if claimed otherwise, as no existing SFM has model weights, code, and training data publicly available under open-source terms. In this work, we take the first step toward filling this gap by focusing on the 24 official languages of the European Union (EU). We collect suitable training data by surveying automatic speech recognition datasets and unlabeled speech corpora under open-source compliant licenses, for a total of 950k hours. Additionally, we release automatic transcripts for 441k hours of unlabeled data under the permissive CC-BY license, thereby facilitating the creation of open-source SFMs for the EU languages.
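The survey-and-filter step the abstract describes can be pictured with a small sketch: keep only corpora whose licenses are open-source compatible, then total the usable hours per language. The catalog entries and license list below are made-up examples, not MOSEL's actual sources or criteria.

```python
# Illustrative sketch (not the MOSEL pipeline): filter a catalog of speech
# corpora to open-source-compatible licenses and total hours per language.
OPEN_LICENSES = {"CC0", "CC-BY", "CC-BY-SA", "Apache-2.0", "MIT"}  # assumed set

catalog = [  # made-up entries standing in for surveyed datasets
    {"name": "corpus_a", "language": "it", "hours": 1200.0, "license": "CC-BY"},
    {"name": "corpus_b", "language": "de", "hours": 800.0, "license": "proprietary"},
    {"name": "corpus_c", "language": "it", "hours": 300.0, "license": "CC0"},
]

usable = [c for c in catalog if c["license"] in OPEN_LICENSES]
hours_per_language: dict[str, float] = {}
for c in usable:
    hours_per_language[c["language"]] = hours_per_language.get(c["language"], 0.0) + c["hours"]
print(hours_per_language)  # {'it': 1500.0}
```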
An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging
Khan, Sulaiman, Biswas, Md. Rafiul, Murad, Alina, Ali, Hazrat, Shah, Zubair
Recent developments in multimodal large language models (MLLMs) have spurred significant interest in their potential applications across various medical imaging domains. On the one hand, there is a temptation to use these generative models to synthesize realistic-looking medical image data; on the other hand, the ability to identify synthetic images within a pool of data is also critically important. In this study, we explore the potential of the Gemini (gemini-1.0-pro-vision-latest) and GPT-4V (gpt-4-vision-preview) models for medical image analysis using two modalities of medical image data. Utilizing synthetic and real imaging data, both Gemini AI and GPT-4V are first used to classify real versus synthetic images, followed by an interpretation and analysis of the input images. Experimental results demonstrate that both Gemini and GPT-4V could perform some interpretation of the input images. In this specific experiment, Gemini performed slightly better than GPT-4V on the classification task, whereas the responses produced by GPT-4V were mostly generic in nature. Our early investigation provides insights into the potential of MLLMs to assist with the classification and interpretation of retinal fundoscopy and lung X-ray images. We also identify key limitations of this early investigation into MLLMs for specialized tasks in medical image analysis.
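A classification query of the kind described can be issued through the standard vision-chat API. The sketch below uses the gpt-4-vision-preview model named in the abstract; the prompt wording and filename are our own assumptions, since the study's exact prompts are not given in the abstract.

```python
# Hedged sketch of a GPT-4V-style real-vs-synthetic query. The prompt and
# filename are illustrative; only the model name comes from the abstract.
import base64
from openai import OpenAI

client = OpenAI()

def classify_image(path: str) -> str:
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # model named in the abstract
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Is this medical image real or synthetic? "
                         "Answer 'real' or 'synthetic' and explain briefly."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(classify_image("fundus_001.png"))  # hypothetical filename
```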
Visual Hallucination: Definition, Quantification, and Prescriptive Remediations
Rani, Anku, Rawte, Vipula, Sharma, Harshad, Anand, Neeraj, Rajbangshi, Krishnav, Sheth, Amit, Das, Amitava
The troubling rise of hallucination presents perhaps the most significant impediment to the advancement of responsible AI. In recent times, considerable research has focused on detecting and mitigating hallucination in Large Language Models (LLMs). However, hallucination is also quite prevalent in Vision-Language Models (VLMs). In this paper, we offer a fine-grained discourse on profiling VLM hallucination based on two tasks: i) image captioning, and ii) Visual Question Answering (VQA). We delineate eight fine-grained orientations of visual hallucination: i) Contextual Guessing, ii) Identity Incongruity, iii) Geographical Erratum, iv) Visual Illusion, v) Gender Anomaly, vi) VLM as Classifier, vii) Wrong Reading, and viii) Numeric Discrepancy. We curate Visual HallucInation eLiciTation (VHILT), a publicly available dataset comprising 2,000 samples generated by eight VLMs across the two tasks of captioning and VQA, together with human annotations for the categories listed above.
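A VHILT-style annotated sample could be represented as below. The eight category names come from the paper; the record's field names and layout are assumptions for illustration, not the dataset's actual schema.

```python
# Sketch of how a VHILT-style sample might be represented. The eight
# hallucination categories are from the paper; field names are assumed.
from dataclasses import dataclass
from enum import Enum

class HallucinationType(Enum):
    CONTEXTUAL_GUESSING = "Contextual Guessing"
    IDENTITY_INCONGRUITY = "Identity Incongruity"
    GEOGRAPHICAL_ERRATUM = "Geographical Erratum"
    VISUAL_ILLUSION = "Visual Illusion"
    GENDER_ANOMALY = "Gender Anomaly"
    VLM_AS_CLASSIFIER = "VLM as Classifier"
    WRONG_READING = "Wrong Reading"
    NUMERIC_DISCREPANCY = "Numeric Discrepancy"

@dataclass
class VHILTSample:
    image_id: str
    task: str                   # "captioning" or "vqa"
    vlm: str                    # which of the eight VLMs produced the output
    output: str                 # generated caption or VQA answer
    label: HallucinationType    # human annotation
```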
Improving Sequence-to-Sequence Models for Abstractive Text Summarization Using Meta Heuristic Approaches
Saxena, Aditya, Ranjan, Ashutosh
As human society transitions into the information age, attention spans are shrinking: fewer people take the time to read lengthy news articles, and the need for succinct information is higher than ever. It is therefore essential to provide a quick overview of important news by concisely summarizing the top news articles under intuitive headlines. When humans write summaries, they extract the essential information from the source and add useful phrases and grammatical connectives of their own; this capacity for abstraction is uniquely human, and automatic summarization remains a difficult problem. Sequence-to-sequence (seq2seq) models for neural abstractive text summarization have been growing in prevalence, and numerous innovative strategies have been proposed to develop them further, addressing issues such as saliency, fluency, and human readability to produce high-quality summaries. In this article, we aim to enhance present architectures and models for abstractive text summarization. Our modifications focus on fine-tuning hyper-parameters and trying specific encoder-decoder combinations. We conducted extensive experiments on the widely used CNN/DailyMail dataset to assess the effectiveness of the various models.
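For readers unfamiliar with the setup, a minimal seq2seq summarization run on CNN/DailyMail-style text looks like the sketch below. The checkpoint choice and generation settings are our own illustrative assumptions; the paper's experiments vary encoder-decoder combinations and hyper-parameters beyond this.

```python
# Minimal sketch of a seq2seq abstractive summarizer on CNN/DailyMail-style
# text. The checkpoint and decoding settings are illustrative, not the
# paper's specific configurations.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "The city council voted on Tuesday to expand the bus network, "
    "adding twelve new routes and extending service hours to midnight."
)
summary = summarizer(
    article,
    max_length=40,   # decoder-side hyper-parameters of the kind the paper tunes
    min_length=10,
    num_beams=4,
)
print(summary[0]["summary_text"])
```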
Taxi dispatching strategies with compensations
Billhardt, Holger, Fernández, Alberto, Ossowski, Sascha, Palanca, Javier, Bajo, Javier
Urban mobility efficiency is of utmost importance in big cities. Taxi vehicles are key elements in daily traffic activity. The advance of ICT and geo-positioning systems has given rise to new opportunities for improving the efficiency of taxi fleets in terms of passenger waiting times, cost and time for drivers, traffic density, CO2 emissions, etc., through more informed, intelligent dispatching. Still, the explicit spatial and temporal components, as well as the scale and, in particular, the dynamicity of the problem of pairing passengers and taxis in big cities, render traditional approaches for solving the standard assignment problem useless for this purpose, and call for intelligent approximation strategies based on domain-specific heuristics. Furthermore, taxi drivers are often autonomous actors and may not agree to participate in assignments that, though globally efficient, are not sufficiently beneficial for them individually. This paper presents a new heuristic algorithm for assigning taxis to customers that considers taxi reassignments whenever these may lead to globally better solutions. Since such reassignments may reduce the expected revenues of individual drivers, we also propose an economic compensation scheme that makes individually rational drivers agree to the proposed modifications in their assigned clients. We carried out a set of experiments in which several commonly used assignment strategies are compared to three different instantiations of our heuristic algorithm. The results indicate that our proposal has the potential to reduce customer waiting times in fleets of autonomous taxis while also being beneficial from an economic point of view.
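The interaction between reassignment and compensation can be captured in a toy decision rule: reassign only when the global gain exceeds the affected driver's individual loss, and pay that loss back as compensation. This is a simplified illustration of the incentive logic, not the paper's algorithm; the cost values and function below are hypothetical.

```python
# Toy sketch of the reassignment-plus-compensation idea: reassign a taxi only
# if the global gain exceeds the driver's individual loss, which the fleet
# then pays back as compensation. Costs here are illustrative.
def maybe_reassign(cost_current: float, cost_new_for_driver: float,
                   global_gain: float) -> tuple[bool, float]:
    """Return (do_reassign, compensation owed to the driver)."""
    driver_loss = max(0.0, cost_new_for_driver - cost_current)
    # Individually rational: the driver's loss is fully compensated.
    # Globally rational: proceed only if the system still gains after paying.
    if global_gain > driver_loss:
        return True, driver_loss
    return False, 0.0

print(maybe_reassign(cost_current=3.0, cost_new_for_driver=5.0, global_gain=4.0))
# (True, 2.0): reassign and pay the driver 2.0 in compensation
```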
Modeling Considerations for Developing Deep Space Autonomous Spacecraft and Simulators
Agia, Christopher, Vila, Guillem Casadesus, Bandyopadhyay, Saptarshi, Bayard, David S., Cheung, Kar-Ming, Lee, Charles H., Wood, Eric, Aenishanslin, Ian, Ardito, Steven, Fesq, Lorraine, Pavone, Marco, Nesnas, Issa A. D.
To extend the limited scope of autonomy used in prior missions for operation in distant and complex environments, there is a need to further develop and mature autonomy that jointly reasons over multiple subsystems, which we term system-level autonomy. System-level autonomy establishes situational awareness that resolves conflicting information across subsystems, which may necessitate the refinement and interconnection of the underlying onboard models of the spacecraft and its environment. However, with a limited understanding of the assumptions and tradeoffs of modeling to arbitrary extents, designing onboard models to support system-level capabilities presents a significant challenge. In this paper, we provide a detailed analysis of increasing levels of model fidelity for several key spacecraft subsystems, with the goal of informing future spacecraft functional- and system-level autonomy algorithms and the physics-based simulators on which they are validated. We do not argue for the adoption of a particular fidelity class of models but, instead, highlight the potential tradeoffs and opportunities associated with the use of models for onboard autonomy and in physics-based simulators at various fidelity levels. We ground our analysis in the context of deep space exploration of small bodies, an emerging frontier for autonomous spacecraft operation, where the choice of models employed onboard the spacecraft may determine mission success. We conduct our experiments in the Multi-Spacecraft Concept and Autonomy Tool (MuSCAT), a software suite for developing spacecraft autonomy algorithms.