AITopics | D'Souza, Jennifer

D'Souza, Jennifer

OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment

Giglou, Hamed Babaei, D'Souza, Jennifer, Karras, Oliver, Auer, Sören

arXiv.org Artificial Intelligence, Mar-27-2025

Ontology Alignment (OA) is fundamental for achieving semantic interoperability across diverse knowledge systems. We present OntoAligner, a comprehensive, modular, and robust Python toolkit for ontology alignment, designed to address the limitations that practitioners face with existing tools. Existing tools are limited in scalability, modularity, and ease of integration with recent AI advances. OntoAligner provides a flexible architecture that integrates existing lightweight OA techniques such as fuzzy matching, and goes beyond them by supporting contemporary methods based on retrieval-augmented generation and large language models. The framework prioritizes extensibility, enabling researchers to integrate custom alignment algorithms and datasets. This paper details the design principles, architecture, and implementation of OntoAligner, demonstrating its utility through benchmarks on standard OA tasks. Our evaluation highlights OntoAligner's ability to handle large-scale ontologies efficiently with a few lines of code while delivering high alignment quality. By making OntoAligner open-source, we aim to provide a resource that fosters innovation and collaboration within the OA community, empowering researchers and practitioners with a toolkit for reproducible OA research and real-world applications.
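The lightweight fuzzy matching mentioned in the abstract can be illustrated with a minimal sketch. This is not OntoAligner's API: the function name `fuzzy_align`, the 0.8 threshold, and the use of the standard library's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_align(source_labels, target_labels, threshold=0.8):
    """Pair each source label with its most similar target label.

    Hypothetical helper: keeps a pair only if the similarity ratio
    reaches `threshold` (an assumed cut-off, not taken from the paper).
    """
    alignments = []
    for src in source_labels:
        best_target, best_score = None, 0.0
        for tgt in target_labels:
            # Case-insensitive character-level similarity in [0, 1].
            score = SequenceMatcher(None, src.lower(), tgt.lower()).ratio()
            if score > best_score:
                best_target, best_score = tgt, score
        if best_target is not None and best_score >= threshold:
            alignments.append((src, best_target, round(best_score, 3)))
    return alignments


if __name__ == "__main__":
    print(fuzzy_align(["Heart", "Lung"], ["heart", "Kidney"]))
```

A real alignment pipeline would compare richer concept descriptions (synonyms, parents, definitions) rather than bare labels; the sketch only shows the string-similarity step.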

large language model, machine learning, natural language, (20 more...)

2503.21902

Country: Europe > Germany (0.46)

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bridging the Evaluation Gap: Leveraging Large Language Models for Topic Model Evaluation

Tan, Zhiyin, D'Souza, Jennifer

arXiv.org Artificial Intelligence, Feb-11-2025

This study presents a framework for automated evaluation of dynamically evolving topic taxonomies in scientific literature using Large Language Models (LLMs). In digital library systems, topic modeling plays a crucial role in efficiently organizing and retrieving scholarly content, guiding researchers through complex knowledge landscapes. As research domains proliferate and shift, traditional human-centric and static evaluation methods struggle to maintain relevance. The proposed approach harnesses LLMs to measure key quality dimensions, such as coherence, repetitiveness, diversity, and topic-document alignment, without heavy reliance on expert annotators or narrow statistical metrics. Tailored prompts guide LLM assessments, ensuring consistent and interpretable evaluations across various datasets and modeling techniques. Experiments on benchmark corpora demonstrate the method's robustness, scalability, and adaptability, underscoring its value as a more holistic and dynamic alternative to conventional evaluation strategies.
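The idea of guiding an LLM with a tailored prompt and turning its reply into a score can be sketched as follows. The prompt wording, the 1 to 3 scale, and the names `build_coherence_prompt`, `parse_rating`, and `rate_topic` are hypothetical and are not taken from the paper; `llm` is any callable that maps a prompt string to a reply string.

```python
import re


def build_coherence_prompt(topic_words):
    """Build an illustrative coherence-rating prompt (assumed wording)."""
    words = ", ".join(topic_words)
    return (
        "Rate how coherent the following topic words are as a single theme "
        "on a scale from 1 (unrelated) to 3 (highly coherent). "
        "Reply with the number only.\n"
        f"Topic words: {words}"
    )


def parse_rating(response, low=1, high=3):
    """Return the first integer in the reply, or None if absent or out of range."""
    match = re.search(r"\d+", response)
    if match is None:
        return None
    rating = int(match.group())
    return rating if low <= rating <= high else None


def rate_topic(topic_words, llm):
    """Ask the supplied LLM callable for a rating and parse its reply."""
    return parse_rating(llm(build_coherence_prompt(topic_words)))


if __name__ == "__main__":
    # Stand-in for a real model call, used only to make the sketch runnable.
    fake_llm = lambda prompt: "3"
    print(rate_topic(["gene", "protein", "cell"], fake_llm))
```

Returning None for unparseable replies keeps malformed model output from silently entering an aggregate score.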

computational linguistic, large language model, natural language, (17 more...)

2502.07352

Country:

Europe (1.00)
North America > United States > New York (0.28)

Genre: Research Report (0.64)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation

Eger, Steffen, Cao, Yong, D'Souza, Jennifer, Geiger, Andreas, Greisinger, Christian, Gross, Stephanie, Hou, Yufang, Krenn, Brigitte, Lauscher, Anne, Li, Yizhi, Lin, Chenghua, Moosavi, Nafise Sadat, Zhao, Wei, Miller, Tristan

arXiv.org Artificial Intelligence, Feb-7-2025

With the advent of large multimodal language models, science is now at the threshold of an AI-based technological transformation. Recently, a plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently. This includes all aspects of the research cycle, especially (1) searching for relevant literature; (2) generating research ideas and conducting experimentation; generating (3) text-based and (4) multimodal content (e.g., scientific figures and diagrams); and (5) AI-based automatic peer review. In this survey, we provide an in-depth overview of these exciting recent developments, which promise to fundamentally alter the scientific research process for good. Our survey covers the five aspects outlined above, indicating relevant datasets, methods and results (including evaluation) as well as limitations and scope for future research. Ethical concerns regarding shortcomings of these tools and potential for misuse (fake science, plagiarism, harms to research integrity) take a particularly prominent place in our discussion. We hope that our survey will not only become a reference guide for newcomers to the field but also a catalyst for new AI-based initiatives in the area of "AI4Science".

information retrieval, large language model, machine learning, (22 more...)

2502.05151

Country:

North America > United States > Maryland (0.27)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > United Kingdom > England > Greater Manchester (0.14)
(3 more...)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Government (1.00)
Education (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(9 more...)

Mining for Species, Locations, Habitats, and Ecosystems from Scientific Papers in Invasion Biology: A Large-Scale Exploratory Study with Large Language Models

D'Souza, Jennifer, Laubach, Zachary, Mustafa, Tarek Al, Zarrieß, Sina, Frühstückl, Robert, Illari, Phyllis

arXiv.org Artificial Intelligence, Jan-30-2025

This paper presents an exploratory study that harnesses the capabilities of large language models (LLMs) to mine key ecological entities from invasion biology literature. Specifically, we focus on extracting species names, their locations, associated habitats, and ecosystems, information that is critical for understanding species spread, predicting future invasions, and informing conservation efforts. Traditional text mining approaches often struggle with the complexity of ecological terminology and the subtle linguistic patterns found in these texts. By applying general-purpose LLMs without domain-specific fine-tuning, we uncover both the promise and limitations of using these models for ecological entity extraction. In doing so, this study lays the groundwork for more advanced, automated knowledge extraction tools that can aid researchers and practitioners in understanding and managing biological invasions.

ecosystem, large language model, natural language, (18 more...)

2501.18287

Country:

Europe (1.00)
Africa (0.68)
North America > United States > Louisiana (0.14)
North America > United States > Colorado (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

Zimmermann, Yoel, Bazgir, Adib, Afzal, Zartashia, Agbere, Fariha, Ai, Qianxiang, Alampara, Nawaf, Al-Feghali, Alexander, Ansari, Mehrad, Antypov, Dmytro, Aswad, Amro, Bai, Jiaru, Baibakova, Viktoriia, Biswajeet, Devi Dutta, Bitzek, Erik, Bocarsly, Joshua D., Borisova, Anna, Bran, Andres M, Brinson, L. Catherine, Calderon, Marcel Moran, Canalicchio, Alessandro, Chen, Victor, Chiang, Yuan, Circi, Defne, Charmes, Benjamin, Chaudhary, Vikrant, Chen, Zizhang, Chiu, Min-Hsueh, Clymo, Judith, Dabhadkar, Kedar, Daelman, Nathan, Datar, Archit, de Jong, Wibe A., Evans, Matthew L., Fard, Maryam Ghazizade, Fisicaro, Giuseppe, Gangan, Abhijeet Sadashiv, George, Janine, Gonzalez, Jose D. Cojal, Götte, Michael, Gupta, Ankur K., Harb, Hassan, Hong, Pengyu, Ibrahim, Abdelrahman, Ilyas, Ahmed, Imran, Alishba, Ishimwe, Kevin, Issa, Ramsey, Jablonka, Kevin Maik, Jones, Colin, Josephson, Tyler R., Juhasz, Greg, Kapoor, Sarthak, Kang, Rongda, Khalighinejad, Ghazal, Khan, Sartaaj, Klawohn, Sascha, Kuman, Suneel, Ladines, Alvin Noe, Leang, Sarom, Lederbauer, Magdalena, Liao, Sheng-Lun, Liu, Hao, Liu, Xuefeng, Lo, Stanley, Madireddy, Sandeep, Maharana, Piyush Ranjan, Maheshwari, Shagun, Mahjoubi, Soroush, Márquez, José A., Mills, Rob, Mohanty, Trupti, Mohr, Bernadette, Moosavi, Seyed Mohamad, Moßhammer, Alexander, Naghdi, Amirhossein D., Naik, Aakash, Narykov, Oleksandr, Näsström, Hampus, Nguyen, Xuan Vu, Ni, Xinyi, O'Connor, Dana, Olayiwola, Teslim, Ottomano, Federico, Ozhan, Aleyna Beste, Pagel, Sebastian, Parida, Chiku, Park, Jaehee, Patel, Vraj, Patyukova, Elena, Petersen, Martin Hoffmann, Pinto, Luis, Pizarro, José M., Plessers, Dieter, Pradhan, Tapashree, Pratiush, Utkarsh, Puli, Charishma, Qin, Andrew, Rajabi, Mahyar, Ricci, Francesco, Risch, Elliot, Ríos-García, Martiño, Roy, Aritra, Rug, Tehseen, Sayeed, Hasan M, Scheidgen, Markus, Schilling-Wilhelmi, Mara, Schloz, Marcel, Schöppach, Fabian, Schumann, Julia, Schwaller, Philippe, Schwarting, Marcus, Sharlin, Samiha, Shen, Kevin, Shi, Jiale, Si, Pradip, D'Souza, Jennifer, Sparks, Taylor, Sudhakar, Suraj, Talirz, Leopold, Tang, Dandan, Taran, Olga, Terboven, Carla, Tropin, Mark, Tsymbal, Anastasiia, Ueltzen, Katharina, Unzueta, Pablo Andres, Vasan, Archit, Vinchurkar, Tirtha, Vo, Trung, Vogel, Gabriel, Völker, Christoph, Weinreich, Jan, Yang, Faradawn, Zaki, Mohd, Zhang, Chi, Zhang, Sylvester, Zhang, Weijie, Zhu, Ruijie, Zhu, Shang, Janssen, Jan, Li, Calvin, Foster, Ian, Blaiszik, Ben

arXiv.org Artificial Intelligence, Jan-2-2025

Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants across global hybrid locations, resulting in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for applications in (1) molecular and material property prediction; (2) molecular and material design; (3) automation and novel interfaces; (4) scientific communication and education; (5) research data management and automation; (6) hypothesis generation and evaluation; and (7) knowledge extraction and reasoning from scientific literature. Each team submission is presented in a summary table with links to the code and as brief papers in the appendix. Beyond team results, we discuss the hackathon event and its hybrid format, which included physical hubs in Toronto, Montreal, San Francisco, Berlin, Lausanne, and Tokyo, alongside a global online hub to enable local and virtual collaboration. Overall, the event highlighted significant improvements in LLM capabilities since the previous year's hackathon, suggesting continued expansion of LLMs for applications in materials science and chemistry research. These outcomes demonstrate the dual utility of LLMs as both multipurpose models for diverse machine learning tasks and platforms for rapid prototyping of custom applications in scientific research.

large language model, machine learning, natural language, (20 more...)

2411.15221

Country:

North America > United States > Maryland (0.45)
North America > Canada > Ontario > Toronto (0.34)
North America > Canada > Quebec > Montreal (0.34)
(3 more...)

Genre:

Workflow (1.00)
Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
(3 more...)

Industry:

Materials > Construction Materials (1.00)
Information Technology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis

Giglou, Hamed Babaei, D'Souza, Jennifer, Auer, Sören

arXiv.org Artificial Intelligence, Sep-27-2024

In response to the growing complexity and volume of scientific literature, this paper introduces the LLMs4Synthesis framework, designed to enhance the capabilities of Large Language Models (LLMs) in generating high-quality scientific syntheses. This framework addresses the need for rapid, coherent, and contextually rich integration of scientific insights, leveraging both open-source and proprietary LLMs. It also examines the effectiveness of LLMs in evaluating the integrity and reliability of these syntheses, alleviating inadequacies in current quantitative metrics. Our study contributes to this field by developing a novel methodology for processing scientific papers, defining new synthesis types, and establishing nine detailed quality criteria for evaluating syntheses. The integration of LLMs with reinforcement learning and AI feedback is proposed to optimize synthesis quality, ensuring alignment with established criteria. The LLMs4Synthesis framework and its components are made available, promising to enhance both the generation and evaluation processes in scientific research synthesis.

large language model, machine learning, natural language, (21 more...)

2409.18812

Country:

Asia > China (0.16)
Europe > Germany (0.14)
North America > United States (0.14)
Europe > Spain (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

Exploring the Latest LLMs for Leaderboard Extraction

Kabongo, Salomon, D'Souza, Jennifer, Auer, Sören

arXiv.org Artificial Intelligence, Jul-8-2024

The rapid advancements in Large Language Models (LLMs) have opened new avenues for automating complex tasks in AI research. This paper investigates the efficacy of different LLMs (Mistral 7B, Llama-2, GPT-4-Turbo, and GPT-4.o) in extracting leaderboard information from empirical AI research articles. We explore three types of contextual inputs to the models: DocTAET (Document Title, Abstract, Experimental Setup, and Tabular Information), DocREC (Results, Experiments, and Conclusions), and DocFULL (entire document). Our comprehensive study evaluates the performance of these models in generating (Task, Dataset, Metric, Score) quadruples from research papers. The findings reveal significant insights into the strengths and limitations of each model and context type, providing valuable guidance for future AI research automation efforts.
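The (Task, Dataset, Metric, Score) quadruple can be represented as a small record type. The sketch below assumes the model is asked to reply with a JSON list of objects keyed by "Task", "Dataset", "Metric", and "Score"; that output format and the names `LeaderboardEntry` and `parse_quadruples` are illustrative assumptions, not the paper's actual setup.

```python
import json
from typing import List, NamedTuple


class LeaderboardEntry(NamedTuple):
    """One (Task, Dataset, Metric, Score) quadruple."""
    task: str
    dataset: str
    metric: str
    score: str  # kept as text, since reported scores vary in format


def parse_quadruples(model_output: str) -> List[LeaderboardEntry]:
    """Parse an assumed JSON reply into entries, skipping incomplete items."""
    try:
        items = json.loads(model_output)
    except json.JSONDecodeError:
        return []  # malformed reply: treat as no extraction
    if not isinstance(items, list):
        return []
    entries = []
    for item in items:
        if not isinstance(item, dict):
            continue
        try:
            entries.append(LeaderboardEntry(
                task=str(item["Task"]),
                dataset=str(item["Dataset"]),
                metric=str(item["Metric"]),
                score=str(item["Score"]),
            ))
        except KeyError:
            continue  # drop items missing any of the four fields
    return entries


if __name__ == "__main__":
    reply = '[{"Task": "QA", "Dataset": "SQuAD", "Metric": "F1", "Score": "90.1"}]'
    print(parse_quadruples(reply))
```

Dropping incomplete items rather than guessing missing fields keeps partially hallucinated output out of the resulting leaderboard.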

large language model, machine learning, natural language, (17 more...)

2406.04383

Country:

Europe > Germany (0.28)
Europe > Middle East > Malta (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Large Language Models as Evaluators for Scientific Synthesis

Evans, Julia, D'Souza, Jennifer, Auer, Sören

arXiv.org Artificial Intelligence, Jul-3-2024

Our study explores how well state-of-the-art Large Language Models (LLMs), such as GPT-4 and Mistral, can assess the quality of scientific summaries or, more fittingly, scientific syntheses, comparing their evaluations to those of human annotators. We used a dataset of 100 research questions and their syntheses made by GPT-4 from abstracts of five related papers, checked against human quality ratings. The study evaluates the ability of both the closed-source GPT-4 and the open-source Mistral model to rate these summaries and provide reasons for their judgments. Preliminary results show that LLMs can offer logical explanations that somewhat match the quality ratings, yet a deeper statistical analysis shows a weak correlation between LLM and human ratings, suggesting both the potential and the current limitations of LLMs in scientific synthesis evaluation.
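Agreement between LLM and human ratings is commonly checked with a rank correlation. The abstract does not say which statistic was used, so the sketch below assumes Spearman's coefficient, computed with the standard library only; the function names and the example ratings are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    human = [5, 3, 4, 2, 1]  # made-up ratings for illustration only
    llm = [4, 4, 5, 3, 3]
    print(round(spearman(human, llm), 3))
```

Ordinal quality ratings tend to contain many ties, which is why the ranking step averages tied positions instead of breaking ties arbitrarily.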

large language model, machine learning, natural language, (18 more...)

2407.02977

Country:

Europe (1.00)
North America > United States (0.69)

Genre: Research Report > New Finding (1.00)

Industry:

Education (1.00)
Materials > Chemicals (0.70)
Energy (0.70)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Scholarly Question Answering using Large Language Models in the NFDI4DataScience Gateway

Giglou, Hamed Babaei, Taffa, Tilahun Abedissa, Abdullah, Rana, Usmanova, Aida, Usbeck, Ricardo, D'Souza, Jennifer, Auer, Sören

arXiv.org Artificial Intelligence, Jun-11-2024

This paper introduces a scholarly Question Answering (QA) system on top of the NFDI4DataScience Gateway, employing a Retrieval-Augmented Generation (RAG)-based approach. The NFDI4DS Gateway, as a foundational framework, offers a unified and intuitive interface for querying various scientific databases using federated search. The RAG-based scholarly QA, powered by a Large Language Model (LLM), facilitates dynamic interaction with search results, enhancing filtering capabilities and fostering a conversational engagement with the Gateway search. The effectiveness of both the Gateway and the scholarly QA system is demonstrated through experimental analysis.
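The retrieve-then-generate pattern behind a RAG-based QA system can be sketched in a few lines. This is not the Gateway's implementation: the token-overlap retriever stands in for its federated search, and the prompt wording and the names `retrieve` and `build_prompt` are assumptions. The final call to an LLM is left out.

```python
def tokenize(text):
    """Lowercase whitespace tokenization; a deliberately crude stand-in."""
    return set(text.lower().split())


def retrieve(query, documents, k=2):
    """Return the k documents sharing the most tokens with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages):
    """Assemble a grounded prompt from numbered passages (assumed wording)."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    docs = [
        "ontology alignment toolkit",
        "topic model evaluation",
        "leaderboard extraction from papers",
    ]
    question = "topic model"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

Numbering the passages in the prompt lets the generated answer cite which search result supports it.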

gateway, large language model, machine learning, (19 more...)

2406.07257

Country:

Europe > Germany (0.68)
North America > United States > Pennsylvania (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Effective Context Selection in LLM-based Leaderboard Generation: An Empirical Study

Kabongo, Salomon, D'Souza, Jennifer, Auer, Sören

arXiv.org Artificial Intelligence, Jun-6-2024

This paper explores the impact of context selection on the efficiency of Large Language Models (LLMs) in generating Artificial Intelligence (AI) research leaderboards, a task defined as the extraction of (Task, Dataset, Metric, Score) quadruples from scholarly articles. By framing this challenge as a text generation objective and employing instruction finetuning with the FLAN-T5 collection, we introduce a novel method that surpasses traditional Natural Language Inference (NLI) approaches in adapting to new developments without a predefined taxonomy. Through experimentation with three distinct context types of varying selectivity and length, our study demonstrates the importance of effective context selection in enhancing LLM accuracy and reducing hallucinations, providing a new pathway for the reliable and efficient generation of AI leaderboards. This contribution not only advances the state of the art in leaderboard generation but also sheds light on strategies to mitigate common challenges in LLM-based information extraction.

large language model, leaderboard, natural language, (13 more...)

2407.02409

Country: Europe > Germany (0.14)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
