AITopics

2407.11987

Genre: Research Report > New Finding (0.35)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

arXiv.org Artificial Intelligence Jun-4-2024

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

Nasiriany, Soroush, Maddukuri, Abhiram, Zhang, Lance, Parikh, Adeet, Lo, Aaron, Joshi, Abhishek, Mandlekar, Ajay, Zhu, Yuke

Recent advancements in Artificial Intelligence (AI) have largely been propelled by scaling. In Robotics, scaling is hindered by the lack of access to massive robot datasets. We advocate using realistic physical simulation as a means to scale environments, tasks, and datasets for robot learning methods. We present RoboCasa, a large-scale simulation framework for training generalist robots in everyday environments. RoboCasa features realistic and diverse scenes focusing on kitchen environments. We provide thousands of 3D assets across over 150 object categories and dozens of interactable furniture and appliances. We enrich the realism and diversity of our simulation with generative AI tools, such as object assets from text-to-3D models and environment textures from text-to-image models. We design a set of 100 tasks for systematic evaluation, including composite tasks generated by the guidance of large language models. To facilitate learning, we provide high-quality human demonstrations and integrate automated trajectory generation methods to substantially enlarge our datasets with minimal human burden. Our experiments show a clear scaling trend in using synthetically generated robot data for large-scale imitation learning and show great promise in harnessing simulation data in real-world tasks. Videos and open-source code are available at https://robocasa.ai/

dataset, demonstration, simulation, (16 more...)

2406.02523

Country: North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report (0.64)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Hetz, Martin J., Carl, Nicolas, Haggenmüller, Sarah, Wies, Christoph, Michel, Maurice Stephan, Wessels, Frederik, Brinker, Titus J.

Superhuman performance in urology board questions by an explainable large language model enabled for context integration of the European Association of Urology guidelines: the UroBot study

arXiv.org Artificial Intelligence Jun-4-2024

Large Language Models (LLMs) are revolutionizing medical Question-Answering (medQA) through extensive use of medical literature. However, their performance is often hampered by outdated training data and a lack of explainability, which limits clinical applicability. This study aimed to create and assess UroBot, a urology-specialized chatbot, by comparing it with state-of-the-art models and the performance of urologists on urological board questions, ensuring full clinician-verifiability. UroBot was developed using OpenAI's GPT-3.5, GPT-4, and GPT-4o models, employing retrieval-augmented generation (RAG) and the latest 2023 guidelines from the European Association of Urology (EAU). The evaluation included ten runs of 200 European Board of Urology (EBU) In-Service Assessment (ISA) questions, with performance assessed by the mean Rate of Correct Answers (RoCA). UroBot-4o achieved an average RoCA of 88.4%, surpassing GPT-4o (77.6%) by 10.8 percentage points. It was also clinician-verifiable and exhibited the highest run agreement as indicated by Fleiss' Kappa (κ = 0.979). By comparison, the average performance of urologists on board questions, as reported in the literature, is 68.7%. UroBot's clinician-verifiable nature and superior accuracy compared to both existing models and urologists on board questions highlight its potential for clinical integration. The study also provides the necessary code and instructions for further development of UroBot.
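The two evaluation quantities named in this abstract, the mean Rate of Correct Answers over repeated runs and Fleiss' Kappa as a measure of run-to-run agreement, can be illustrated with a minimal Python sketch. This is not the study's code: the function names, the toy answer key, and the choice to treat each run as a "rater" and each question as a "subject" are assumptions made for illustration only.

```python
from collections import Counter


def mean_roca(runs, answer_key):
    """Mean rate of correct answers across runs (hypothetical helper)."""
    rates = [
        sum(given == correct for given, correct in zip(run, answer_key)) / len(answer_key)
        for run in runs
    ]
    return sum(rates) / len(rates)


def fleiss_kappa(runs):
    """Fleiss' kappa, treating each run as a rater and each question as a subject."""
    n_raters = len(runs)
    n_items = len(runs[0])
    # counts[i][c] = number of runs that gave answer c to question i
    counts = [Counter(run[i] for run in runs) for i in range(n_items)]
    # Observed agreement per question, then averaged over questions
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall share of each answer option
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    if p_e == 1.0:
        return 1.0  # every run gave the same single answer everywhere
    return (p_bar - p_e) / (1 - p_e)


# Toy data (invented): three runs over four multiple-choice questions
answer_key = ["A", "B", "C", "D"]
runs = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "A"],
    ["A", "B", "D", "D"],
]
print(mean_roca(runs, answer_key), fleiss_kappa(runs))
```

A kappa near 1, such as the value reported above, means the runs almost always give the same answer to the same question, independent of whether that answer is correct.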

european association, superhuman performance, urology board question, (12 more...)

2406.01428

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
North America > United States (0.04)
Europe > Switzerland (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Urology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

arXiv.org Machine Learning Jun-4-2024

Guiding a Diffusion Model with a Bad Version of Itself

Karras, Tero, Aittala, Miika, Kynkäänniemi, Tuomas, Lehtinen, Jaakko, Aila, Timo, Laine, Samuli

The primary axes of interest in image-generating diffusion models are image quality, the amount of variation in the results, and how well the results align with a given condition, e.g., a class label or a text prompt. The popular classifier-free guidance approach uses an unconditional model to guide a conditional model, leading to simultaneously better prompt alignment and higher-quality images at the cost of reduced variation. These effects seem inherently entangled, and thus hard to control. We make the surprising observation that it is possible to obtain disentangled control over image quality without compromising the amount of variation by guiding generation using a smaller, less-trained version of the model itself rather than an unconditional model. This leads to significant improvements in ImageNet generation, setting record FIDs of 1.01 for 64×64 and 1.25 for 512×512, using publicly available networks. Furthermore, the method is also applicable to unconditional diffusion models, drastically improving their quality.
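Both guidance schemes described in this abstract can be read as the same extrapolation step with a different choice of guiding model. The Python sketch below assumes the commonly used form, guided = guide + w · (main − guide), applied element-wise to the two models' predictions; the abstract does not state the formula, so the function name, argument names, and toy numbers are illustrative assumptions rather than the authors' implementation.

```python
def guided_prediction(main_pred, guide_pred, w):
    """Extrapolate from the guiding model's prediction past the main model's.

    Assumed form: guide + w * (main - guide). With w = 1 the main prediction
    is returned unchanged; w > 1 pushes further away from the guiding model.
    """
    return [g + w * (m - g) for m, g in zip(main_pred, guide_pred)]


# Toy predictions (invented). Under classifier-free guidance the guiding
# prediction would come from an unconditional model; under the approach in
# the abstract it would come from a smaller, less-trained version of the
# main model. Only the source of guide_pred differs between the two.
main_pred = [1.0, 2.0]
guide_pred = [0.5, 1.0]
print(guided_prediction(main_pred, guide_pred, w=2.0))
```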

diffusion model, guidance, proc, (16 more...)


2406.02507

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

McCormack, Jon, Wilson, Elliott, Rajcic, Nina, Llano, Maria Teresa

Mimetic Poet

This paper presents the design and initial assessment of a novel device that uses generative AI to facilitate creative ideation, inspiration, and reflective thought. Inspired by magnetic poetry, which was originally designed to help overcome writer's block, the device allows participants to compose short poetic texts from a limited vocabulary by physically placing words on the device's surface. Upon composing the text, the system employs a large language model (LLM) to generate a response, displayed on an e-ink screen. We explored various strategies for internally sequencing prompts to foster creative thinking, including analogy, allegorical interpretations, and ideation. We installed the device in our research laboratory for two weeks and held a focus group at the conclusion to evaluate the design. The design choice to limit interactions with the LLM to poetic text, coupled with the tactile experience of assembling the poem, fostered a deeper and more enjoyable engagement with the LLM compared to traditional chatbot or screen-based interactions. This approach gives users the opportunity to reflect on the AI-generated responses in a manner conducive to creative thought.

interaction, participant, poem, (14 more...)

2407.11984

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Greater London > London (0.14)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

The Life Cycle of Large Language Models: A Review of Biases in Education

Lee, Jinsook, Hicke, Yann, Yu, Renzhe, Brooks, Christopher, Kizilcec, René F.

Large Language Models (LLMs) are increasingly adopted in educational contexts to provide personalized support to students and teachers. The unprecedented capacity of LLM-based applications to understand and generate natural language can potentially improve instructional effectiveness and learning outcomes, but the integration of LLMs in education technology has renewed concerns over algorithmic bias, which may exacerbate educational inequities. In this review, building on prior work on mapping the traditional machine learning life cycle, we provide a holistic map of the LLM life cycle from the initial development of LLMs to customizing pre-trained models for various applications in educational settings. We explain each step in the LLM life cycle and identify potential sources of bias that may arise in the context of education. We discuss why current measures of bias from traditional machine learning fail to transfer to LLM-generated content in education, such as tutoring conversations, because the text is high-dimensional, there can be multiple correct responses, and tailoring responses may be pedagogically desirable rather than unfair. This review aims to clarify the complex nature of bias in LLM applications and provide practical guidance for their evaluation to promote educational equity.

arxiv preprint arxiv, language model, proceedings, (12 more...)

2407.11203

Country:

North America > Central America (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(10 more...)

Genre:

Overview (1.00)
Instructional Material > Online (0.67)
Instructional Material > Course Syllabus & Notes (0.67)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
Education > Assessment & Standards > Student Performance (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

White, Matt, Haddad, Ibrahim, Osborne, Cailean, Yanglet, Xiao-Yang Liu, Abdelmonsef, Ahmed, Varghese, Sachin

The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency, and Usability in Artificial Intelligence

Generative AI (GAI) offers unprecedented opportunities for research and innovation, but its commercialization has raised concerns about transparency, reproducibility, and safety. Many open GAI models lack the necessary components for full understanding and reproducibility, and some use restrictive licenses whilst claiming to be "open-source". To address these concerns, we propose the Model Openness Framework (MOF), a ranked classification system that rates machine learning models based on their completeness and openness, following principles of open science, open source, open data, and open access. The MOF requires specific components of the model development lifecycle to be included and released under appropriate open licenses. This framework aims to prevent misrepresentation of models claiming to be open, guide researchers and developers in providing all model components under permissive licenses, and help individuals and organizations identify models that can be safely adopted without restrictions. By promoting transparency and reproducibility, the MOF combats "openwashing" practices and establishes completeness and openness as primary criteria alongside the core tenets of responsible AI. Wide adoption of the MOF will foster a more open AI ecosystem, benefiting research, innovation, and adoption of state-of-the-art models.

license, model openness framework white, model producer, (12 more...)

2403.13784

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Law (1.00)
Government (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.49)

Jones, Keenan, Zahrah, Fatima, Nurse, Jason R. C.

Embedding Privacy in Computational Social Science and Artificial Intelligence Research

Privacy is a human right. It ensures that individuals are free to engage in discussions, participate in groups, and form relationships online or offline without fear of their data being inappropriately harvested, analyzed, or otherwise used to harm them. Preserving privacy has emerged as a critical factor in research, particularly in the computational social science (CSS), artificial intelligence (AI) and data science domains, given their reliance on individuals' data for novel insights. The increasing use of advanced computational models stands to exacerbate privacy concerns because, if inappropriately used, they can quickly infringe privacy rights and lead to adverse effects for individuals (especially vulnerable groups) and society. We have already witnessed a host of privacy issues emerge with the advent of large language models (LLMs), such as ChatGPT, which further demonstrate the importance of embedding privacy from the start. This article contributes to the field by discussing the role of privacy and the issues that researchers working in CSS, AI, data science and related domains are likely to face. It then presents several key considerations for researchers to ensure participant privacy is best preserved in their research design, data collection and use, analysis, and dissemination of research results.

dataset, information, privacy, (13 more...)

doi: 10.36190/2024.18

2404.11515

Country:

Europe > Italy (0.04)
South America > Brazil (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)

von Däniken, Pius, Deriu, Jan, Tuggener, Don, Cieliebak, Mark

Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation

Generative AI systems have become ubiquitous for all kinds of modalities, which makes the issue of the evaluation of such models more pressing. One popular approach is preference ratings, where the generated outputs of different systems are shown to evaluators who choose their preferences. In recent years, the field has shifted towards the development of automated (trained) metrics to assess generated outputs, which can be used to create preference ratings automatically. In this work, we investigate the evaluation of the metrics themselves, which currently rely on measuring the correlation to human judgments or computing sign accuracy scores. These measures only assess how well the metric agrees with the human ratings. However, our research shows that this does not tell the whole story. Most metrics exhibit a disagreement with human system assessments that is often skewed in favor of particular text generation systems, exposing a degree of favoritism in automated metrics. This paper introduces a formal definition of favoritism in preference metrics, and derives the Favi-Score, which measures this phenomenon. In particular, we show that favoritism is strongly related to errors in final system rankings. Thus, we propose that preference-based metrics ought to be evaluated on both sign accuracy scores and favoritism.
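To make the distinction in this abstract concrete, the Python sketch below computes a sign accuracy (the share of pairwise comparisons where the automated metric prefers the same system as the human raters) alongside a simple count of which system the disagreements favor. The second quantity is a hypothetical illustration of skewed disagreement only; it is not the Favi-Score, whose formal definition is given in the paper and is not reproduced here. All names and data are invented.

```python
def sign_accuracy(human, metric):
    """Share of comparisons where metric and human prefer the same system.

    Preferences are encoded as +1 (system A preferred) or -1 (system B).
    """
    return sum(h == m for h, m in zip(human, metric)) / len(human)


def disagreement_skew(human, metric):
    """Illustrative only (NOT the Favi-Score): direction of the disagreements.

    Returns a value in [-1, 1]: +1 if every disagreement favors system A,
    -1 if every disagreement favors system B, 0 if balanced or none.
    """
    toward_a = sum(1 for h, m in zip(human, metric) if h != m and m == 1)
    toward_b = sum(1 for h, m in zip(human, metric) if h != m and m == -1)
    total = toward_a + toward_b
    return 0.0 if total == 0 else (toward_a - toward_b) / total


# Toy preferences (invented): the metric agrees with humans on four of six
# comparisons, and both of its errors favor system A.
human = [1, 1, -1, -1, 1, -1]
metric = [1, 1, 1, 1, 1, -1]
print(sign_accuracy(human, metric), disagreement_skew(human, metric))
```

Two metrics with the same sign accuracy can differ in how one-sided their errors are, which is the gap the abstract argues agreement measures alone do not expose.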

computational linguistic, evaluation, favi-score, (13 more...)

2406.01131

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(10 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.70)

Han, Xizewen, Zhou, Mingyuan

Diffusion Boosted Trees

arXiv.org Machine Learning Jun-3-2024

A series of pivotal works in recent years (Song and Ermon, 2019; Ho et al., 2020; Song et al., 2021; Dhariwal and Nichol, 2021; Rombach et al., 2022; Karras et al., 2022) has propelled diffusion-based generative models (Sohl-Dickstein et al., 2015) to the forefront of generative AI, capturing a significant amount of academic and industrial interest through the success of this class of models in content generation. Meanwhile, another line of work, Classification and Regression Diffusion Models (CARD) (Han et al., 2022), has been proposed to tackle supervised learning problems with a denoising diffusion probabilistic modeling framework, shedding new light on both the foundational machine learning paradigm and the new elite in the generative AI family. More specifically, CARD learns the target conditional distribution of the response variable y given the covariates x, p(y | x), without imposing explicit parametric assumptions on its probability density function, and makes predictions by utilizing the stochastic nature of its output to directly generate samples that resemble y from this target distribution. This framework has demonstrated outstanding results on both regression and image classification tasks: in regression, it shows the capability of modeling conditional distributions with flexible statistical attributes, and achieves state-of-the-art metrics on real-world datasets; for image classification, it introduces a novel paradigm to evaluate instance-level prediction confidence besides improving the prediction accuracy by a deterministic classifier. However, CARD models are parameterized by deep neural networks. The work of Grinsztajn et al. (2022) has illustrated that tree-based models remain the state-of-the-art function choice for modeling tabular data, and could outperform neural networks by a wide margin.
Tabular data is a crucial type of dataset for many supervised learning tasks, characterized by its table-format structure similar to a spreadsheet or a relational database, where each row represents an individual record or observation, and each column represents a feature or attribute of that record.

gradient, proceedings, timestep, (13 more...)


2406.01813

Country:

North America > United States > Texas > Travis County > Austin (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.54)