 nomenclature



Evaluation of the Automated Labeling Method for Taxonomic Nomenclature Through Prompt-Optimized Large Language Model

arXiv.org Artificial Intelligence

Scientific names of organisms consist of a genus name and a species epithet, with the latter often reflecting aspects such as morphology, ecology, distribution, and cultural background. Traditionally, researchers have manually labeled species names by carefully examining taxonomic descriptions, a process that demands substantial time and effort when dealing with large datasets. This study evaluates the feasibility of automatic species name labeling using large language models (LLMs) by leveraging their text classification and semantic extraction capabilities. Using the spider name dataset compiled by Mammola et al., we compared LLM-based labeling results, enhanced through prompt engineering, with human annotations. The results indicate that LLM-based classification achieved high accuracy in the Morphology, Geography, and People categories. However, classification accuracy was lower in Ecology & Behavior and Modern & Past Culture, revealing challenges in interpreting animal behavior and cultural contexts. Future research will focus on improving accuracy through optimized few-shot learning and retrieval-augmented generation techniques, while also expanding the applicability of LLM-based labeling to diverse biological taxa. Humans have long sought to construct systematic classification methods to understand the complexity of natural phenomena and objects. These efforts serve as a foundation for uncovering patterns and interrelationships in nature, facilitating the accumulation of scientific knowledge.


SLIM-RAFT: A Novel Fine-Tuning Approach to Improve Cross-Linguistic Performance for Mercosur Common Nomenclature

arXiv.org Artificial Intelligence

Natural language processing (NLP) has seen significant advancements with the advent of large language models (LLMs). However, substantial improvements are still needed for languages other than English, especially for specific domains like the applications of Mercosur Common Nomenclature (NCM), a Brazilian Harmonized System (HS). To address this gap, this study uses TeenyTineLLaMA, a foundational Portuguese LLM, as an LLM source to implement the NCM application processing. Additionally, a simplified Retrieval-Augmented Fine-Tuning (RAFT) technique, termed SLIM-RAFT, is proposed for task-specific fine-tuning of LLMs. This approach retains the chain-of-thought (CoT) methodology for prompt development in a more concise and streamlined manner, utilizing brief and focused documents for training. The proposed model demonstrates an efficient and cost-effective alternative for fine-tuning smaller LLMs, significantly outperforming TeenyTineLLaMA and ChatGPT-4 in the same task. Although the research focuses on NCM applications, the methodology can be easily adapted for HS applications worldwide.


Exploring Federated Deep Learning for Standardising Naming Conventions in Radiotherapy Data

arXiv.org Artificial Intelligence

Standardising structure volume names in radiotherapy (RT) data is necessary to enable data mining and analyses, especially across multi-institutional centres. This process is time and resource intensive, which highlights the need for new automated and efficient approaches to handle the task. Several machine learning-based methods have been proposed and evaluated to standardise nomenclature. However, no studies have considered that RT patient records are distributed across multiple data centres. This paper introduces a method that emulates real-world environments to establish standardised nomenclature. This is achieved by integrating decentralised real-time data and federated learning (FL). A multimodal deep artificial neural network was proposed to standardise RT data in federated settings. Three types of possible attributes were extracted from the structures to train the deep learning models: tabular, visual, and volumetric. Simulated experiments were carried out to train the models across several scenarios including multiple data centres, input modalities, and aggregation strategies. The models were compared against models developed with single modalities in federated settings, in addition to models trained in centralised settings. Categorical classification accuracy was calculated on hold-out samples to assess the models' performance. Our results highlight the need for fusing multiple modalities when training such models, with better performance reported with tabular-volumetric models. In addition, we report accuracy comparable to that of models built in centralised settings. This demonstrates the suitability of FL for handling the standardisation task. Additional ablation analyses showed that the total number of samples in the data centres and the number of data centres strongly affect the training process and should be carefully considered when building standardisation models.


Decision support system for distributed manufacturing based on input-output analysis and economic complexity

arXiv.org Artificial Intelligence

The disruption of supplies during the Covid-19 crisis has led to shortages but has also shown the adaptability of some companies, which have succeeded in adapting their production chains quickly to produce goods experiencing shortages: hydroalcoholic gel, masks, and medical gowns. These productive jumps from product A to product B are feasible because of the know-how proximity between the two classes of products. The proximities were computed from the analysis of co-exports and resulted in the construction of the product space. Based on the product space, as well as the customer-supplier relationships resulting from the input-output matrices, we propose a recommender system for companies. The goal is to promote distributed manufacturing by recommending a list of local suppliers to each company. As there is not always a local supplier for a desired product class, we consider the proximity between products to identify, in the absence of a supplier, a substitute supplier able to adapt its production tools to provide the required product. Our experiments are based on French data, from which we build a graph of synergies illustrating the potential productive links between companies. Finally, we show that our approach offers new perspectives to determine the level of territories' industrial resilience considering potential productive jumps.


Why Are AI/ML Job Titles So Vague?

#artificialintelligence

Artificial intelligence and machine learning jobs are among the most sought after by both freshers and experienced candidates. Seemingly glamorous work, the thrill of exploring a (relatively) new domain, a good pay package, and fancy job titles -- these are some of the reasons driving this trend. AI and ML are vast fields and often encompass certain subfields. Some of the more common job titles include machine learning engineers, AI engineers, data analysts, and data scientists. Then there are a few unconventional ones like Intelligence Designer, Data Curator, Digital Knowledge Manager, and Machine Learning Data Scientist.


13 Artificial Intelligence Trends for Investors to Watch

#artificialintelligence

"As soon as it works, nobody calls it AI anymore." Those were the words of John McCarthy, a computer scientist who is considered one of the founding fathers of artificial intelligence. It makes you wonder when artificial intelligence (AI) will stop being a disruptive technology and just become something everyone uses to do things more efficiently. One way to gauge the maturity of any given technology is to see where it sits on the Gartner Hype Cycle. As it turns out, artificial intelligence has spawned its own Gartner Hype Cycle.


Making sense of machine learning

#artificialintelligence

AI has become so pervasive that almost every software vendor is laying claim to today's most hyped technology. In fact, Gartner's latest Hype Cycle for Emerging Technologies unceremoniously drops machine learning from its infamous curve. Hang on -- see what I did there? I used "AI" and "machine learning" interchangeably, which should get me busted by the artificial thought police. The first thing you need to know about AI (and machine learning) is that it's full of confusing, overlapping terminology, not to mention algorithms with functions that are opaque to all but a select few.


AI, VR, AR: Is innovation creating a new Tower of Babel?

#artificialintelligence

William Shakespeare once wrote, "What's in a name? That which we call a rose by any other name would smell as sweet." For the star-crossed lovers of Romeo and Juliet, it meant that a name is nothing but an artificial and meaningless convention. Many panelists at SXSW seemed to have a similar mindset. There are two kinds: chatbots that rely on scripted query and response systems and those that are powered by artificial intelligence.


Making sense of machine learning 7wData

#artificialintelligence

As Matt Asay observed last week, AI appears to be reaching "peak ludicrous mode," with almost every software vendor laying claim to today's most hyped technology. Hang on -- see what I did there? I used "AI" and "machine learning" interchangeably, which should get me busted by the artificial thought police. The first thing you need to know about AI (and machine learning) is that it's full of confusing, overlapping terminology, not to mention algorithms with functions that are opaque to all but a select few. This combination of hype and nearly impenetrable nomenclature can get pretty irritating.