AITopics | Generative AI

Generative AI

AIhub monthly digest: October 2024 – Nobel Prizes, the AI Song Contest, and towards safe and reliable AI agents

AIHub, Oct-29-2024, 10:07:00 GMT

Welcome to our monthly digest, where you can catch up with any AIhub stories you may have missed, peruse the latest news, recap recent events, and more. This month, we learn about research towards safe and reliable AI agent behaviour, discuss generative AI hype, congratulate the Nobel Prize winners in physics and chemistry, and take a tour of recent conferences. In the latest in our series of interviews featuring the AAAI/ACM SIGAI doctoral consortium participants, we heard from Pulkit Verma about his research on safe and reliable behavior of AI agents. He is currently investigating the minimal set of requirements in an AI system that would enable a user to assess and understand the limits of its safe operability. There has been a string of articles recently about the end of generative AI hype.

ai song contest, monthly digest, nobel prize, (6 more...)

AIHub

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.16)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
North America > United States > California > Santa Clara County > San Jose (0.05)
(2 more...)

Genre: Personal > Honors > Award (0.36)

Industry:

Media > Music (0.41)
Leisure & Entertainment (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)

'I readily prefer this one': What I learned testing Copilot Pro and ChatGPT Plus side by side

PCWorld, Oct-29-2024, 10:00:00 GMT

It feels like artificial intelligence is in everything these days -- TVs, laptops, phones, websites, even PDF editors -- and that big boom can be attributed to the wild success of OpenAI's ChatGPT. Microsoft is doing its best to compete with its own Copilot AI chatbot, and both ChatGPT and Copilot are pretty good at providing answers, generating text and images, and holding conversations. Most importantly, they're both free to use. But ChatGPT and Copilot both offer paid plans in the form of ChatGPT Plus and Copilot Pro, respectively. Why would you pay for them when they're freely available?

chatgpt plus, copilot, microsoft 365, (13 more...)

PCWorld

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.36)

Generative AI Enabled Matching for 6G Multiple Access

Wang, Xudong, Du, Hongyang, Niyato, Dusit, Zhou, Lijie, Feng, Lei, Yang, Zhixiang, Zhou, Fanqin, Li, Wenjing

arXiv.org Artificial Intelligence, Oct-29-2024

In wireless networks, applying deep learning models to solve matching problems between different entities has become a mainstream and effective approach. However, the complex network topology in 6G multiple access presents significant challenges for the real-time performance and stability of matching generation. Generative artificial intelligence (GenAI) has demonstrated strong capabilities in graph feature extraction, exploration, and generation, offering potential for graph-structured matching generation. In this paper, we propose a GenAI-enabled matching generation framework to support 6G multiple access. Specifically, we first summarize the classical matching theory, discuss common GenAI models and applications from the perspective of matching generation. Then, we propose a framework based on generative diffusion models (GDMs) that iteratively denoises toward reward maximization to generate a matching strategy that meets specific requirements. Experimental results show that, compared to decision-based AI approaches, our framework can generate more effective matching strategies based on given conditions and predefined rewards, helping to solve complex problems in 6G multiple access, such as task allocation.

application, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2411.04137

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Beijing > Beijing (0.04)
Europe > Greece (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.84)

Industry: Health & Medicine (0.97)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.85)

A Novel Psychometrics-Based Approach to Developing Professional Competency Benchmark for Large Language Models

Kardanova, Elena, Ivanova, Alina, Tarasova, Ksenia, Pashchenko, Taras, Tikhoniuk, Aleksei, Yusupova, Elen, Kasprzhak, Anatoly, Kuzminov, Yaroslav, Kruchinskaia, Ekaterina, Brun, Irina

arXiv.org Artificial Intelligence, Oct-29-2024

The era of large language models (LLMs) raises questions not only about how to train models, but also about how to evaluate them. Despite numerous existing benchmarks, insufficient attention is often given to creating assessments that test LLMs in a valid and reliable manner. To address this challenge, we adapt the Evidence-centered design (ECD) methodology and propose a comprehensive approach to benchmark development based on rigorous psychometric principles. In this paper, we make the first attempt to illustrate this approach by creating a new benchmark in the field of pedagogy and education, highlighting the limitations of existing benchmark development approaches and taking into account the development of LLMs. We conclude that a new approach to benchmarking is required to match the growing complexity of AI applications in the educational context. We construct a novel benchmark guided by Bloom's taxonomy and rigorously designed by a consortium of education experts trained in test development. Thus the current benchmark provides an academically robust and practical assessment tool tailored for LLMs, rather than human participants. Tested empirically on the GPT model in the Russian language, it evaluates model performance across varied task complexities, revealing critical gaps in current LLM capabilities. Our results indicate that while generative AI tools hold significant promise for education (potentially supporting tasks such as personalized tutoring, real-time feedback, and multilingual learning), their reliability as autonomous teachers' assistants currently remains rather limited, particularly in tasks requiring deeper cognitive engagement.

benchmark, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2411.00045

Country:

Asia > Russia (0.14)
North America > United States > New York (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Genre:

Instructional Material (1.00)
Research Report > New Finding (0.66)

Industry:

Education > Educational Setting (1.00)
Education > Assessment & Standards (1.00)
Education > Curriculum > Subject-Specific Education (0.93)
Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

CurateGPT: A flexible language-model assisted biocuration tool

Caufield, Harry, Kroll, Carlo, O'Neil, Shawn T, Reese, Justin T, Joachimiak, Marcin P, Hegde, Harshad, Harris, Nomi L, Krishnamurthy, Madan, McLaughlin, James A, Smedley, Damian, Haendel, Melissa A, Robinson, Peter N, Mungall, Christopher J

arXiv.org Artificial Intelligence, Oct-29-2024

Effective data-driven biomedical discovery requires data curation: a time-consuming process of finding, organizing, distilling, integrating, interpreting, annotating, and validating diverse information into a structured form suitable for databases and knowledge bases. Accurate and efficient curation of these digital assets is critical to ensuring that they are FAIR, trustworthy, and sustainable. Unfortunately, expert curators face significant time and resource constraints. The rapid pace of new information being published daily is exceeding their capacity for curation. Generative AI, exemplified by instruction-tuned large language models (LLMs), has opened up new possibilities for assisting human-driven curation. The design philosophy of agents combines the emerging abilities of generative AI with more precise methods. A curator's tasks can be aided by agents for performing reasoning, searching ontologies, and integrating knowledge across external sources, all efforts otherwise requiring extensive manual effort. Our LLM-driven annotation tool, CurateGPT, melds the power of generative AI together with trusted knowledge bases and literature sources. CurateGPT streamlines the curation process, enhancing collaboration and efficiency in common workflows. Compared to direct interaction with an LLM, CurateGPT's agents enable access to information beyond that in the LLM's training data and they provide direct links to the data supporting each claim. This helps curators, researchers, and engineers scale up curation efforts to keep pace with the ever-increasing volume of scientific data.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2411.00046

Country:

Europe > Germany > Berlin (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.89)

Can Knowledge Editing Really Correct Hallucinations?

Huang, Baixiang, Chen, Canyu, Xu, Xiongxiao, Payani, Ali, Shu, Kai

arXiv.org Artificial Intelligence, Oct-29-2024

Large Language Models (LLMs) suffer from hallucinations, referring to the nonfactual information in generated content, despite their superior capacities across tasks. Meanwhile, knowledge editing has been developed as a new popular paradigm to correct the erroneous factual knowledge encoded in LLMs with the advantage of avoiding retraining from scratch. However, one common issue of existing evaluation datasets for knowledge editing is that they do not ensure LLMs actually generate hallucinated answers to the evaluation questions before editing. When LLMs are evaluated on such datasets after being edited by different techniques, it is hard to directly adopt the performance to assess the effectiveness of different knowledge editing methods in correcting hallucinations. Thus, the fundamental question remains insufficiently validated: Can knowledge editing really correct hallucinations in LLMs? We proposed HalluEditBench to holistically benchmark knowledge editing methods in correcting real-world hallucinations. First, we rigorously construct a massive hallucination dataset with 9 domains, 26 topics and more than 6,000 hallucinations. Then, we assess the performance of knowledge editing methods in a holistic way on five dimensions including Efficacy, Generalization, Portability, Locality, and Robustness. Through HalluEditBench, we have provided new insights into the potentials and limitations of different knowledge editing methods in correcting hallucinations, which could inspire future improvements and facilitate the progress in the field of knowledge editing.

Considering the high cost of retraining LLMs from scratch, knowledge editing has been designed as a new paradigm to correct erroneous or outdated factual knowledge.

Table 1: Performance measured by Accuracy (%) of Llama2-7B before editing ("Pre-edit") and after applying typical knowledge editing methods ("Post-edit") on common existing evaluation datasets.

When such datasets are adopted to evaluate the performance of LLMs after being edited, it is hard to directly use the scores to judge the effectiveness of different knowledge editing techniques in correcting hallucinations, which is the motivation of applying knowledge editing to LLMs. To better illustrate this point, following the evaluation setting in (Zhang et al., 2024e), we conducted a preliminary study to examine the pre-edit and post-edit performances of Llama2-7B on the aforementioned datasets.

Example query (from a figure): "Who is the Chief Scientist of OpenAI?"
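The evaluation concern described above can be illustrated with a minimal sketch. It is not HalluEditBench's code: the `EvalItem` record, the exact-match `is_correct` check, and the toy answers are all hypothetical, chosen only to show why post-edit accuracy is easier to interpret when every evaluation question was answered incorrectly before editing.

```python
from dataclasses import dataclass


@dataclass
class EvalItem:
    """Hypothetical record: one question with answers before and after editing."""
    question: str
    truth: str
    pre_edit_answer: str
    post_edit_answer: str


def is_correct(answer: str, truth: str) -> bool:
    # Assumption: simple case-insensitive exact match stands in for a real judge.
    return answer.strip().lower() == truth.strip().lower()


def hallucinated_only(items):
    # Keep only questions the unedited model actually got wrong.
    return [it for it in items if not is_correct(it.pre_edit_answer, it.truth)]


def accuracy(items, which: str) -> float:
    # Accuracy in percent over the given answer field; 0.0 for an empty list.
    if not items:
        return 0.0
    hits = sum(is_correct(getattr(it, which), it.truth) for it in items)
    return 100.0 * hits / len(items)


# Toy data (made up): q1 was already correct before editing.
items = [
    EvalItem("q1", "alpha", "alpha", "alpha"),
    EvalItem("q2", "beta", "gamma", "beta"),
    EvalItem("q3", "delta", "omega", "sigma"),
]
subset = hallucinated_only(items)

print("pre-edit, all items:     ", accuracy(items, "pre_edit_answer"))
print("pre-edit, filtered items:", accuracy(subset, "pre_edit_answer"))
print("post-edit, filtered items:", accuracy(subset, "post_edit_answer"))
```

On the filtered subset the pre-edit accuracy is zero by construction, so any post-edit accuracy there can be attributed to the edit rather than to knowledge the model already had.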

arxiv preprint, editing, knowledge editing, (14 more...)

arXiv.org Artificial Intelligence

2410.16251

Country:

North America > Canada (0.04)
Europe > Poland (0.04)
South America > Venezuela > Gulf of Paria (0.04)
(12 more...)

Genre:

Overview (0.67)
Research Report (0.56)

Industry: Health & Medicine > Therapeutic Area (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.75)

OpenAI's Whisper invents parts of transcriptions -- a lot

Engadget, Oct-28-2024, 12:00:39 GMT

Imagine going to the doctor, telling them exactly how you're feeling and then a transcription later adds false information and alters your story. That could be the case in medical centers that use Whisper, OpenAI's transcription tool. Over a dozen developers, software engineers and academic researchers have found evidence that Whisper creates hallucinations -- invented text -- that includes made up medications, racial commentary and violent remarks, ABC News reports. Yet, in the last month, open-source AI platform HuggingFace saw 4.2 million downloads of Whisper's latest version. The tool is also built into Oracle and Microsoft's cloud computing platforms, along with some versions of ChatGPT.

large language model, machine learning, natural language, (10 more...)

Engadget

Country:

North America > United States > Virginia (0.06)
North America > United States > Michigan (0.06)

Industry: Health & Medicine > Health Care Providers & Services (0.74)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.62)

Large Language Models for Manufacturing

Li, Yiwei, Zhao, Huaqin, Jiang, Hanqi, Pan, Yi, Liu, Zhengliang, Wu, Zihao, Shu, Peng, Tian, Jie, Yang, Tianze, Xu, Shaochen, Lyu, Yanjun, Blenk, Parker, Pence, Jacob, Rupram, Jason, Banu, Eliza, Liu, Ninghao, Wang, Linbing, Song, Wenzhan, Zhai, Xiaoming, Song, Kenan, Zhu, Dajiang, Li, Beiwen, Wang, Xianqiao, Liu, Tianming

arXiv.org Artificial Intelligence, Oct-28-2024

The rapid advances in Large Language Models (LLMs) have the potential to transform the manufacturing industry, offering new opportunities to optimize processes, improve efficiency, and drive innovation. This paper provides a comprehensive exploration of the integration of LLMs into the manufacturing domain, focusing on their potential to automate and enhance various aspects of manufacturing, from product design and development to quality control, supply chain optimization, and talent management. Through extensive evaluations across multiple manufacturing tasks, we demonstrate the remarkable capabilities of state-of-the-art LLMs, such as GPT-4V, in understanding and executing complex instructions, extracting valuable insights from vast amounts of data, and facilitating knowledge sharing. We also delve into the transformative potential of LLMs in reshaping manufacturing education, automating coding processes, enhancing robot control systems, and enabling the creation of immersive, data-rich virtual environments through the industrial metaverse. By highlighting the practical applications and emerging use cases of LLMs in manufacturing, this paper aims to provide a valuable resource for professionals, researchers, and decision-makers seeking to harness the power of these technologies to address real-world challenges, drive operational excellence, and unlock sustainable growth in an increasingly competitive landscape.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.21418

Country: North America > United States (1.00)

Genre:

Overview (1.00)
Instructional Material (1.00)
Workflow (0.93)
Research Report > Promising Solution (0.92)

Industry:

Semiconductors & Electronics (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)
(10 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.68)

CaloChallenge 2022: A Community Challenge for Fast Calorimeter Simulation

Krause, Claudius, Giannelli, Michele Faucci, Kasieczka, Gregor, Nachman, Benjamin, Salamani, Dalila, Shih, David, Zaborowska, Anna, Amram, Oz, Borras, Kerstin, Buckley, Matthew R., Buhmann, Erik, Buss, Thorsten, Cardoso, Renato Paulo Da Costa, Caterini, Anthony L., Chernyavskaya, Nadezda, Corchia, Federico A. G., Cresswell, Jesse C., Diefenbacher, Sascha, Dreyer, Etienne, Ekambaram, Vijay, Eren, Engin, Ernst, Florian, Favaro, Luigi, Franchini, Matteo, Gaede, Frank, Gross, Eilam, Hsu, Shih-Chieh, Jaruskova, Kristina, Käch, Benno, Kalagnanam, Jayant, Kansal, Raghav, Kim, Taewoo, Kobylianskii, Dmitrii, Korol, Anatolii, Korcari, William, Krücker, Dirk, Krüger, Katja, Letizia, Marco, Li, Shu, Liu, Qibin, Liu, Xiulong, Loaiza-Ganem, Gabriel, Madula, Thandikire, McKeown, Peter, Melzer-Pellmann, Isabell-A., Mikuni, Vinicius, Nguyen, Nam, Ore, Ayodele, Schweitzer, Sofia Palacios, Pang, Ian, Pedro, Kevin, Plehn, Tilman, Pokorski, Witold, Qu, Huilin, Raikwar, Piyush, Raine, John A., Reyes-Gonzalez, Humberto, Rinaldi, Lorenzo, Ross, Brendan Leigh, Scham, Moritz A. W., Schnake, Simon, Shimmin, Chase, Shlizerman, Eli, Soybelman, Nathalie, Srivatsa, Mudhakar, Tsolaki, Kalliopi, Vallecorsa, Sofia, Yeo, Kyongmin, Zhang, Rui

arXiv.org Artificial Intelligence, Oct-28-2024

We present the results of the "Fast Calorimeter Simulation Challenge 2022" -- the CaloChallenge. We study state-of-the-art generative models on four calorimeter shower datasets of increasing dimensionality, ranging from a few hundred voxels to a few tens of thousands of voxels. The 31 individual submissions span a wide range of current popular generative architectures, including Variational AutoEncoders (VAEs), Generative Adversarial Networks (GANs), Normalizing Flows, Diffusion models, and models based on Conditional Flow Matching. We compare all submissions in terms of quality of generated calorimeter showers, as well as shower generation time and model size. To assess the quality we use a broad range of different metrics including differences in 1-dimensional histograms of observables, KPD/FPD scores, AUCs of binary classifiers, and the log-posterior of a multiclass classifier. The results of the CaloChallenge provide the most complete and comprehensive survey of cutting-edge approaches to calorimeter fast simulation to date. In addition, our work provides a uniquely detailed perspective on the important problem of how to evaluate generative models. As such, the results presented here should be applicable for other domains that use generative AI and require fast and faithful generation of samples in a large phase space.
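Two of the metric families named above (differences between 1-dimensional histograms, and the AUC of a binary classifier separating real from generated samples) can be sketched in a few lines. This is an illustrative sketch, not the challenge's evaluation code: the function names, the specific histogram distance, and the toy inputs are assumptions made here for illustration.

```python
def separation_power(hist_ref, hist_gen):
    """Chi-square-style distance between two histograms with identical binning.

    Both histograms are normalised to unit sum first. The result is 0.0 for
    identical shapes and 1.0 for histograms with no overlapping bins.
    Assumes each histogram has a positive total count.
    """
    n_ref, n_gen = sum(hist_ref), sum(hist_gen)
    total = 0.0
    for a, b in zip(hist_ref, hist_gen):
        p, q = a / n_ref, b / n_gen
        if p + q > 0:
            total += (p - q) ** 2 / (p + q)
    return 0.5 * total


def auc(scores_ref, scores_gen):
    """AUC from classifier scores via pairwise comparison.

    Equals the probability that a random reference sample receives a higher
    score than a random generated sample, counting ties as one half.
    A value near 0.5 means the classifier cannot tell the two sets apart.
    """
    wins = 0.0
    for r in scores_ref:
        for g in scores_gen:
            if r > g:
                wins += 1.0
            elif r == g:
                wins += 0.5
    return wins / (len(scores_ref) * len(scores_gen))


# Toy inputs (made up): same shape at different statistics, then disjoint bins.
print(separation_power([1, 2, 3], [2, 4, 6]))
print(separation_power([1, 0], [0, 1]))
print(auc([0.9, 0.8], [0.1, 0.2]))
print(auc([0.5], [0.5]))
```

For a generative model, lower values of the histogram distance and AUCs closer to 0.5 both indicate generated samples that are harder to distinguish from the reference data.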

log-posterior multiclass log-posterior, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.21611

Country:

Asia (0.67)
Europe > Germany (0.67)
North America > United States > California (0.45)
North America > United States > Wisconsin (0.27)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.87)

Industry:

Education (1.00)
Government > Regional Government (0.67)
Energy > Oil & Gas > Upstream (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Auto-assessment of assessment: A conceptual framework towards fulfilling the policy gaps in academic assessment practices

Khan, Wasiq, Topham, Luke K., Atherton, Peter, Al-Shabandar, Raghad, Kolivand, Hoshang, Khan, Iftikhar, Hussain, Abir

arXiv.org Artificial Intelligence, Oct-28-2024

Education is being transformed by rapid advances in Artificial Intelligence (AI), including emerging Generative Artificial Intelligence (GAI). Such technology can significantly support academics and students by automating monotonous tasks and making personalised suggestions. However, despite the potential of the technology, there are significant concerns regarding AI misuse, particularly by students in assessments. There are two schools of thought: one advocates for a complete ban on it, while the other views it as a valuable educational tool, provided it is governed by a robust usage policy. This contradiction clearly indicates a major policy gap in academic practices, and new policies are required to uphold academic standards while enabling staff and students to benefit from technological advancements. We surveyed 117 academics from three countries (UK, UAE, and Iraq), and identified that most academics retain positive opinions regarding AI in education. For example, the majority of experienced academics do not favour complete bans, and they see the potential benefits of AI for students, teaching staff, and academic institutions. Importantly, academics specifically identified the particular benefits of AI for autonomous assessment (71.79% of respondents agreed). Therefore, for the first time, we propose a novel AI framework for autonomously evaluating students' work (e.g., reports, coursework, etc.) and automatically assigning grades based on their knowledge and in-depth understanding of the submitted content. The survey results further highlight a significant lack of awareness of modern AI-based tools (e.g., ChatGPT) among experienced academics, a gap that must be addressed to uphold educational standards.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.08892

Country:

Asia > Middle East > UAE (0.25)
Asia > Middle East > Iraq (0.25)
North America > United States > Virginia (0.04)
(4 more...)

Genre:

Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.94)

Industry:

Education > Educational Technology > Educational Software (0.68)
Education > Educational Setting > Online (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
