Goto

Collaborating Authors

 Beltsville


Appendix A V ariational Paragraph Embedder A.1 Selection of substitution rate p

Neural Information Processing Systems

Figure 4: Impact of the proportion of injected noise for learning Paragraph Em-beddings on XSum dataset. (Figure 4). The results of the ablation study are presented in Table 5. Embedder in providing clean and denoised reconstructions. In general, it has been observed that generations progress in a coarse-to-fine manner. The early time step, which is close to 1, tends to be less fluent and generic. This was the nicest stay we have ever had. Turtle Bay was a great resort. This was the nicest stay we have ever had.



Machine learning and natural language processing models to predict the extent of food processing

arXiv.org Artificial Intelligence

The dramatic increase in consumption of ultra-processed food has been associated with numerous adverse health effects. Given the public health consequences linked to ultra-processed food consumption, it is highly relevant to build computational models to predict the processing of food products. We created a range of machine learning, deep learning, and NLP models to predict the extent of food processing by integrating the FNDDS dataset of food products and their nutrient profiles with their reported NOVA processing level. Starting with the full nutritional panel of 102 features, we further implemented coarse-graining of features to 65 and 13 nutrients by dropping flavonoids and then by considering the 13-nutrient panel of FDA, respectively. LGBM Classifier and Random Forest emerged as the best model for 102 and 65 nutrients, respectively, with an F1-score of 0.9411 and 0.9345 and MCC of 0.8691 and 0.8543. For the 13-nutrient panel, Gradient Boost achieved the best F1-score of 0.9284 and MCC of 0.8425. We also implemented NLP based models, which exhibited state-of-the-art performance.


War Elephants: Rethinking Combat AI and Human Oversight

arXiv.org Artificial Intelligence

This paper explores the changes that pervasive AI is having on the nature of combat. We look beyond the substitution of AI for experts to an approach where complementary human and machine abilities are blended. Using historical and modern examples, we show how autonomous weapons systems can be effectively managed by teams of human "AI Operators" combined with AI/ML "Proxy Operators." By basing our approach on the principles of complementation, we provide for a flexible and dynamic approach to managing lethal autonomous systems. We conclude by presenting a path to achieving an integrated vision of machine-speed combat where the battlefield AI is operated by AI Operators that watch for patterns of behavior within battlefield to assess the performance of lethal autonomous systems. This approach enables the development of combat systems that are likely to be more ethical, operate at machine speed, and are capable of responding to a broader range of dynamic battlefield conditions than any purely autonomous AI system could support.


The World of Generative AI: Deepfakes and Large Language Models

arXiv.org Artificial Intelligence

The latest development in artificial intelligence (AI), chatbots, the product of generative AI, has captivated the public in the last two years. But it similarly poses an unprecedented challenge and can have potentially unwanted effects on our lives. OpenAI released the chatbot ChatGPT on November 30, 2022. The overwhelming response of the public towards ChatGPT usage pushed Google to release Bard, ChatGPT's rival, and Microsoft to release AI-powered Bing. But the recent GPT-4 topped the list as it has more capabilities than any other existing chatbot. Being LLM-based, these chatbots create synthetic media with the intention of creating better content, enhanced quality, or professional voices. The capabilities of such chatbots raise questions on the ethical use of AI. In the meantime, deepfakes, which are high-quality AI-generated fake videos, have been circulating online. Synthetically generated deepfake videos have exceeded acceptable limits in terms of reality distortion.


Cotton Yield Prediction Using Random Forest

arXiv.org Artificial Intelligence

The cotton industry in the United States is committed to sustainable production practices that minimize water, land, and energy use while improving soil health and cotton output. Climate-smart agricultural technologies are being developed to boost yields while decreasing operating expenses. Crop yield prediction, on the other hand, is difficult because of the complex and nonlinear impacts of cultivar, soil type, management, pest and disease, climate, and weather patterns on crops. To solve this issue, we employ machine learning (ML) to forecast production while considering climate change, soil diversity, cultivar, and inorganic nitrogen levels. From the 1980s to the 1990s, field data were gathered across the southern cotton belt of the United States. To capture the most current effects of climate change over the previous six years, a second data source was produced using the process-based crop model, GOSSYM. We concentrated our efforts on three distinct areas inside each of the three southern states: Texas, Mississippi, and Georgia. To simplify the amount of computations, accumulated heat units (AHU) for each set of experimental data were employed as an analogy to use time-series weather data. The Random Forest Regressor yielded a 97.75% accuracy rate, with a root mean square error of 55.05 kg/ha and an R2 of around 0.98. These findings demonstrate how an ML technique may be developed and applied as a reliable and easy-to-use model to support the cotton climate-smart initiative.


PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model

arXiv.org Artificial Intelligence

Autoregressive models for text sometimes generate repetitive and low-quality output because errors accumulate during the steps of generation. This issue is often attributed to exposure bias - the difference between how a model is trained, and how it is used during inference. Denoising diffusion models provide an alternative approach in which a model can revisit and revise its output. However, they can be computationally expensive and prior efforts on text have led to models that produce less fluent output compared to autoregressive models, especially for longer text and paragraphs. In this paper, we propose PLANNER, a model that combines latent semantic diffusion with autoregressive generation, to generate fluent text while exercising global control over paragraphs. The model achieves this by combining an autoregressive "decoding" module with a "planning" module that uses latent diffusion to generate semantic paragraph embeddings in a coarse-to-fine manner. The proposed method is evaluated on various conditional generation tasks, and results on semantic generation, text completion and summarization show its effectiveness in generating high-quality long-form text in an efficient manner.


Exploring New Frontiers in Agricultural NLP: Investigating the Potential of Large Language Models for Food Applications

arXiv.org Artificial Intelligence

This paper explores new frontiers in agricultural natural language processing by investigating the effectiveness of using food-related text corpora for pretraining transformer-based language models. In particular, we focus on the task of semantic matching, which involves establishing mappings between food descriptions and nutrition data. To accomplish this, we fine-tune a pre-trained transformer-based language model, AgriBERT, on this task, utilizing an external source of knowledge, such as the FoodOn ontology. To advance the field of agricultural NLP, we propose two new avenues of exploration: (1) utilizing GPT-based models as a baseline and (2) leveraging ChatGPT as an external source of knowledge. ChatGPT has shown to be a strong baseline in many NLP tasks, and we believe it has the potential to improve our model in the task of semantic matching and enhance our model's understanding of food-related concepts and relationships. Additionally, we experiment with other applications, such as cuisine prediction based on food ingredients, and expand the scope of our research to include other NLP tasks beyond semantic matching. Overall, this paper provides promising avenues for future research in this field, with potential implications for improving the performance of agricultural NLP applications.


Down the Rabbit Hole: Detecting Online Extremism, Radicalisation, and Politicised Hate Speech

arXiv.org Artificial Intelligence

Social media is a modern person's digital voice to project and engage with new ideas and mobilise communities $\unicode{x2013}$ a power shared with extremists. Given the societal risks of unvetted content-moderating algorithms for Extremism, Radicalisation, and Hate speech (ERH) detection, responsible software engineering must understand the who, what, when, where, and why such models are necessary to protect user safety and free expression. Hence, we propose and examine the unique research field of ERH context mining to unify disjoint studies. Specifically, we evaluate the start-to-finish design process from socio-technical definition-building and dataset collection strategies to technical algorithm design and performance. Our 2015-2021 51-study Systematic Literature Review (SLR) provides the first cross-examination of textual, network, and visual approaches to detecting extremist affiliation, hateful content, and radicalisation towards groups and movements. We identify consensus-driven ERH definitions and propose solutions to existing ideological and geographic biases, particularly due to the lack of research in Oceania/Australasia. Our hybridised investigation on Natural Language Processing, Community Detection, and visual-text models demonstrates the dominating performance of textual transformer-based algorithms. We conclude with vital recommendations for ERH context mining researchers and propose an uptake roadmap with guidelines for researchers, industries, and governments to enable a safer cyberspace.


POSTDOCTORAL RESEARCH ASSOCIATE (Research Biologist (Computational / Bioinformatics / Geneticist)

#artificialintelligence

The USDA, Agricultural Research Service, Animal Biosciences and Biotechnology, in Beltsville, Maryland, is seeking a POSTDOCTORAL RESEARCH ASSOCIATE, (Research Biologist (Computational/Bioinformatics)/Geneticist) for a TWO YEAR APPOINTMENT. Salary is commensurate with experience (starting at $74,950 per annum) plus benefits. As antimicrobial resistance becomes a more significant problem worldwide, the USDA is actively investigating alternative to antibiotics in swine that promote growth performance and disease resistance. The incumbent will work as a part of a multi-disciplinary team (microbiology, immunology, physiology, and bioinformatics) to identify and develop alternatives to in-feed antibiotics. The participant's specific project will involve the application of machine learning models to microbiome datasets with the end goal of identifying biomarkers associated with improved swine growth.