 Generative AI


Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models

arXiv.org Artificial Intelligence

Recent advancements in generative AI systems have raised concerns about academic integrity among educators. Beyond excelling at solving programming problems and text-based multiple-choice questions, recent research has also found that large multimodal models (LMMs) can solve Parsons problems based only on an image. However, such problems are still inherently text-based and rely on the capabilities of the models to convert the images of code blocks to their corresponding text. In this paper, we further investigate the capabilities of LMMs to solve graph and tree data structure problems based only on images. To achieve this, we computationally construct and evaluate a novel benchmark dataset comprising 9,072 samples of diverse graph and tree data structure tasks to assess the performance of the GPT-4o, GPT-4v, Gemini 1.5 Pro, Gemini 1.5 Flash, Gemini 1.0 Pro Vision, and Claude 3 model families. GPT-4o and Gemini 1.5 Flash performed best on trees and graphs, respectively: GPT-4o achieved 87.6% accuracy on tree samples, while Gemini 1.5 Flash achieved 56.2% accuracy on graph samples. Our findings highlight the influence of structural and visual variations on model performance. This research not only introduces an LMM benchmark to facilitate replication and further exploration but also underscores the potential of LMMs in solving complex computing problems, with important implications for pedagogy and assessment practices.


Now Meta is trying to stop OpenAI's for-profit conversion too

Engadget

Meta sent a letter to California's attorney general on Thursday urging him to stop OpenAI from converting to a for-profit company, a move that Meta says would be "wrong" and "could lead to a proliferation of similar start-up ventures that are notionally charitable until they are potentially profitable." The letter from Meta Platforms to Attorney General Rob Bonta, first reported on by The Wall Street Journal, comes on the heels of an injunction filed by Elon Musk at the end of November that also asked for OpenAI's conversion to be blocked. Meta argues in its letter, which The Verge has published in full, that OpenAI was able to raise billions of dollars from investors under its original nonprofit mission and now "wants to change its status while retaining all of the benefits that enabled it to reach the point it has today." It goes on to say, "OpenAI should not be allowed to flout the law by taking and reappropriating assets it built as a charity and using them for potentially enormous private gains." The letter also calls upon the attorney general to look into OpenAI's past practices as a nonprofit.


OpenAI whistleblower found dead in San Francisco apartment

BBC News

OpenAI says its models are "trained on publicly available data". Mr Balaji left the company in August, telling the New York Times he had since been working on personal projects. He grew up in Cupertino, California, before going to study computer science at the University of California, Berkeley. A spokesperson for OpenAI said in a statement cited by CNBC News that it was "devastated to learn of this incredibly sad news today and our hearts go out to Suchir's loved ones during this difficult time". US and Canadian news publishers, including the New York Times, and a group of best-selling writers, including John Grisham, have filed lawsuits claiming the company was illegally using news articles to train its software.


OpenAI whistleblower found dead in San Francisco apartment from apparent suicide

FOX News

If you or someone you know is having thoughts of suicide, please contact the Suicide & Crisis Lifeline at 988 or 1-800-273-TALK (8255). A former OpenAI employee and whistleblower, Suchir Balaji, was recently found dead in his apartment in San Francisco, California. The San Francisco Office of the Chief Medical Examiner has identified Balaji, 26, as the deceased person, according to the San Jose Mercury News.


Big Tech's new AI obsession: 'Agents' that do your work for you

The Japan Times

If you're just getting up to speed on chatbots and copilots, you're already falling behind. Talk in Silicon Valley now is squarely focused on "agents" -- artificial intelligence that can handle multistep chores like onboarding clients, approving expenses and not just routing but actually responding to customer-service requests, all with minimal human supervision. OpenAI CEO Sam Altman calls agents "the next giant breakthrough." Salesforce has already signed deals to install AI agents at more than 200 companies including Accenture, Adecco Group, FedEx, International Business Machines, and RBC Wealth Management. "We're really at the edge of a revolutionary transformation," Salesforce CEO Marc Benioff said on the software company's most recent earnings call.


Advancing Vehicle Plate Recognition: Multitasking Visual Language Models with VehiclePaliGemma

arXiv.org Artificial Intelligence

License plate recognition (LPR) involves automated systems that utilize cameras and computer vision to read vehicle license plates. Plates read through LPR can then be compared against databases to identify stolen vehicles, uninsured drivers, crime suspects, and more. LPR systems play a significant role in saving time for institutions such as the police force. In the past, LPR relied heavily on Optical Character Recognition (OCR), which has been widely explored to recognize characters in images. Collected plate images usually suffer from various limitations, including noise, blurring, weather conditions, and closely spaced characters, which make recognition difficult. Existing LPR methods still require significant improvement, especially for distorted images. To fill this gap, we propose utilizing visual language models (VLMs) such as OpenAI GPT4o, Google Gemini 1.5, Google PaliGemma (Pathways Language and Image model + Gemma model), Meta Llama 3.2, Anthropic Claude 3.5 Sonnet, LLaVA, NVIDIA VILA, and moondream2 to recognize such unclear plates with closely spaced characters. This paper evaluates the capability of VLMs to address the aforementioned problems. Additionally, we introduce "VehiclePaliGemma", a fine-tuned open-source PaliGemma VLM designed to recognize plates under challenging conditions. We compared our proposed VehiclePaliGemma with state-of-the-art methods and other VLMs using a dataset of Malaysian license plates collected under complex conditions. The results indicate that VehiclePaliGemma achieved superior performance with an accuracy of 87.6%. Moreover, it is able to predict the car's plate at a speed of 7 frames per second using an A100-80GB GPU. Finally, we explored the multitasking capability of the VehiclePaliGemma model to accurately identify plates in images containing multiple cars of various models and colors, with plates positioned and oriented in different directions.


Superhuman performance of a large language model on the reasoning tasks of a physician

arXiv.org Artificial Intelligence

Performance of large language models (LLMs) on medical tasks has traditionally been evaluated using multiple choice question benchmarks. However, such benchmarks are highly constrained, saturated with repeated impressive performance by LLMs, and have an unclear relationship to performance in real clinical scenarios. Clinical reasoning, the process by which physicians employ critical thinking to gather and synthesize clinical data to diagnose and manage medical problems, remains an attractive benchmark for model performance. Prior LLMs have shown promise in outperforming clinicians in routine and complex diagnostic scenarios. We sought to evaluate OpenAI's o1-preview model, a model developed to increase run-time via chain of thought processes prior to generating a response. We characterize the performance of o1-preview with five experiments including differential diagnosis generation, display of diagnostic reasoning, triage differential diagnosis, probabilistic reasoning, and management reasoning, adjudicated by physician experts with validated psychometrics. Our primary outcome was comparison of the o1-preview output to identical prior experiments that have historical human controls and benchmarks of previous LLMs. Significant improvements were observed with differential diagnosis generation and quality of diagnostic and management reasoning. No improvements were observed with probabilistic reasoning or triage differential diagnosis. This study highlights o1-preview's ability to perform strongly on tasks that require complex critical thinking such as diagnosis and management while its performance on probabilistic reasoning tasks was similar to past models. New robust benchmarks and scalable evaluation of LLM capabilities compared to human physicians are needed along with trials evaluating AI in real clinical settings.


Optimizing AI-Assisted Code Generation

arXiv.org Artificial Intelligence

In recent years, the rise of AI-assisted code-generation tools has significantly transformed software development. While code generators have mainly been used to support conventional software development, their use will be extended to powerful and secure AI systems. Systems capable of generating code, such as ChatGPT, OpenAI Codex, GitHub Copilot, and AlphaCode, take advantage of advances in machine learning (ML) and natural language processing (NLP) enabled by large language models (LLMs). However, it must be borne in mind that these models work probabilistically, which means that although they can generate complex code from natural language input, there is no guarantee for the functionality and security of the generated code. To fully exploit the considerable potential of this technology, the security, reliability, functionality, and quality of the generated code must therefore be guaranteed. This paper examines the implementation of these goals to date and explores strategies to optimize them. In addition, we explore how these systems can be optimized to create safe, high-performance, and executable artificial intelligence (AI) models, and consider how to improve their accessibility to make AI development more inclusive and equitable.


Generative AI: A Pix2pix-GAN-Based Machine Learning Approach for Robust and Efficient Lung Segmentation

arXiv.org Artificial Intelligence

Chest radiography is critical in identifying different pulmonary diseases, yet radiologist workload and inefficiency can lead to misdiagnoses. Automatic, accurate, and efficient segmentation of the lungs from chest X-ray (CXR) images is paramount for early disease detection. This study develops a deep learning framework using a Pix2pix Generative Adversarial Network (GAN) to segment pulmonary abnormalities from CXR images. The framework combines image preprocessing and augmentation techniques with a U-Net-inspired generator-discriminator architecture. It first loads the CXR images and manual masks from the Montgomery and Shenzhen datasets, then preprocesses and resizes them. A U-Net generator is applied to the processed CXR images to yield segmented masks; a discriminator network then differentiates between the generated and real masks. The Montgomery dataset served as the model's training set, and the Shenzhen dataset, used here for the first time, served to test its robustness. An adversarial loss and an L1 distance were used to optimize the model in training. The evaluation metrics (precision, recall, F1 score, and Dice coefficient) demonstrate the effectiveness of this framework in pulmonary abnormality segmentation. It therefore lays the groundwork for future studies on diverse datasets that could further confirm its clinical applicability in medical imaging.
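
The abstract above names an adversarial loss combined with an L1 distance, and the Dice coefficient as an evaluation metric, without giving formulas. The following is a minimal Python sketch of both under stated assumptions, not the authors' implementation: masks are flattened lists of pixel values, the function names `dice_coefficient` and `generator_loss` are hypothetical, and the default weight `lam=100.0` follows the original pix2pix paper rather than anything stated in this abstract.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int],
                     eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for flattened binary masks (1 = foreground)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps keeps the ratio defined (and equal to 1) when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


def generator_loss(adv_loss: float, pred: Sequence[float],
                   target: Sequence[float], lam: float = 100.0) -> float:
    """Pix2pix-style generator objective: adversarial term + lam * mean L1 distance.

    adv_loss is assumed to be computed elsewhere from the discriminator output;
    lam = 100 is an assumption taken from the original pix2pix setup.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally sized")
    l1 = sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
    return adv_loss + lam * l1
```

For example, `dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])` compares a two-pixel prediction with a one-pixel ground truth that overlap in one pixel. The L1 term pulls the generated mask toward the manual mask pixel by pixel, while the adversarial term rewards masks the discriminator cannot tell from real ones.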


Generative Modeling with Diffusion

arXiv.org Machine Learning

We introduce the diffusion model as a method to generate new samples. Generative models have been recently adopted for tasks such as art generation (Stable Diffusion, Dall-E) and text generation (ChatGPT). Diffusion models in particular apply noise to sample data and then "reverse" this noising process to generate new samples. We will formally define the noising and denoising processes, then introduce algorithms to train and generate with a diffusion model. Finally, we will explore a potential application of diffusion models in improving classifier performance on imbalanced data.
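
As a rough illustration of the noising process described above, the sketch below samples a noised version of a data point in closed form. It assumes a DDPM-style Gaussian forward process with a linear variance schedule, which the abstract does not specify; the function names and the schedule endpoints (`1e-4` to `0.02`) are assumptions chosen for illustration, and the learned denoising ("reverse") step is not shown.

```python
import math
import random
from typing import List, Sequence


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Per-step noise variances beta_0..beta_{T-1}, linearly spaced (assumed schedule)."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas: Sequence[float], t: int) -> float:
    """Cumulative product of (1 - beta_s) for s = 0..t: the signal fraction left at step t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def forward_noise(x0: Sequence[float], t: int, betas: Sequence[float],
                  rng: random.Random) -> List[float]:
    """Sample x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I)."""
    abar = alpha_bar(betas, t)
    signal, noise = math.sqrt(abar), math.sqrt(1.0 - abar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]
```

Calling `forward_noise(x0, t, betas, random.Random(0))` with a larger `t` yields a sample that retains less of `x0`; by the final step the result is close to pure Gaussian noise. A trained model would then be asked to undo this corruption step by step to generate new samples.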