Mugla
Enhancing Cryptocurrency Market Forecasting: Advanced Machine Learning Techniques and Industrial Engineering Contributions
Pinky, Jannatun Nayeem, Akula, Ramya
Cryptocurrencies, as decentralized digital assets, have experienced rapid growth and adoption, with over 23,000 cryptocurrencies and a market capitalization nearing \$1.1 trillion (about \$3,400 per person in the US) as of 2023. This dynamic market presents significant opportunities and risks, highlighting the need for accurate price prediction models to manage volatility. This chapter comprehensively reviews machine learning (ML) techniques applied to cryptocurrency price prediction from 2014 to 2024. We explore various ML algorithms, including linear models, tree-based approaches, and advanced deep learning architectures such as transformers and large language models. Additionally, we examine the role of sentiment analysis in capturing market sentiment from textual data like social media posts and news articles to anticipate price fluctuations. With expertise in optimizing complex systems and processes, industrial engineers are pivotal in enhancing these models. They contribute by applying principles of process optimization, efficiency, and risk mitigation to improve computational performance and data management. This chapter highlights the evolving landscape of cryptocurrency price prediction, the integration of emerging technologies, and the significant role of industrial engineers in refining predictive models. By addressing current limitations and exploring future research directions, this chapter aims to advance the development of more accurate and robust prediction systems, supporting better-informed investment decisions and more stable market behavior.
Association rule mining with earthquake data collected from Turkiye region
Earthquakes are among the most destructive disasters for human beings, as has also been experienced in the Turkiye region. Data science can discover hidden patterns when a sufficient volume of data is supplied. The time dependency of events, defined here as co-occurrence within a specific time window, can be handled as an association rule mining task, similar to a market-basket analysis application. In this regard, we treated each day's seismic activity as a single basket of events, which allows association patterns between these events to be discovered. Consequently, this study presents the most prominent association rules for the earthquakes recorded in the Turkiye region in the last 5 years, with each year presented separately. The results indicate statistical associations between events recorded in regions at various distances from one another, which could be further verified with geologic evidence from the field. We therefore believe that the current study may form a statistical basis for future work that applies machine learning algorithms to association rule mining.
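The daily-basket framing described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the region labels, thresholds, and the function name `pairwise_rules` are assumptions made for illustration, and only rules between pairs of regions are considered. Support, confidence, and lift are computed with their standard definitions.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs.

    Each basket is the collection of regions with recorded events on one day.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated events within a day
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Lift is symmetric in the two items, so compute it once per pair.
        lift = count * n / (item_counts[a] * item_counts[b])
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence, lift))
    # Strongest lift first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[4], r[0], r[1]))


# Hypothetical data: one list of active regions per day (not real records).
daily_baskets = [
    ["region_A", "region_B"],
    ["region_A", "region_B", "region_C"],
    ["region_C"],
    ["region_A", "region_B"],
    ["region_B"],
]

rules = pairwise_rules(daily_baskets)
for antecedent, consequent, support, confidence, lift in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two regions appear in the same daily basket more often than independence would suggest. As the abstract notes, such a rule is a statistical observation that would still need geologic verification.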
Towards Foundation Models for Learning on Tabular Data
Zhang, Han, Wen, Xumeng, Zheng, Shun, Xu, Wei, Bian, Jiang
Learning on tabular data underpins numerous real-world applications. Despite considerable efforts in developing effective learning models for tabular data, current transferable tabular models remain in their infancy, limited by either the lack of support for direct instruction following in new tasks or the neglect of acquiring foundational knowledge and capabilities from diverse tabular datasets. In this paper, we propose Tabular Foundation Models (TabFMs) to overcome these limitations. TabFMs harness the potential of generative tabular learning, employing a pre-trained large language model (LLM) as the base model and fine-tuning it using purpose-designed objectives on an extensive range of tabular datasets. This approach endows TabFMs with a profound understanding and universal capabilities essential for learning on tabular data. Our evaluations underscore TabFM's effectiveness: not only does it significantly excel in instruction-following tasks like zero-shot and in-context inference, but it also showcases performance that approaches, and in some instances even surpasses, that of renowned yet opaque closed-source LLMs like GPT-4. Furthermore, when fine-tuning with scarce data, our model achieves remarkable efficiency and maintains competitive performance with abundant training data. Finally, while our results are promising, we also delve into TabFM's limitations and potential opportunities, aiming to stimulate and expedite future research on developing more potent TabFMs.
The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations
Rawte, Vipula, Chakraborty, Swagata, Pathak, Agnibh, Sarkar, Anubhav, Tonmoy, S. M Towhidul Islam, Chadha, Aman, Sheth, Amit P., Das, Amitava
The recent advancements in Large Language Models (LLMs) have garnered widespread acclaim for their remarkable emerging capabilities. However, the issue of hallucination has emerged in parallel as a by-product, posing significant concerns. While some recent endeavors have been made to identify and mitigate different types of hallucination, there has been limited emphasis on the nuanced categorization of hallucination and associated mitigation methods. To address this gap, we offer a fine-grained discourse on profiling hallucination based on its degree, orientation, and category, along with strategies for alleviation. As such, we define two overarching orientations of hallucination: (i) factual mirage (FM) and (ii) silver lining (SL). To provide a more comprehensive understanding, both orientations are further sub-categorized into intrinsic and extrinsic, with three degrees of severity: (i) mild, (ii) moderate, and (iii) alarming. We also meticulously categorize hallucination into six types: (i) acronym ambiguity, (ii) numeric nuisance, (iii) generated golem, (iv) virtual voice, (v) geographic erratum, and (vi) time wrap. Furthermore, we curate HallucInation eLiciTation (HILT), a publicly available dataset comprising 75,000 samples generated using 15 contemporary LLMs along with human annotations for the aforementioned categories. Finally, to quantify hallucination and to offer a comparative spectrum that allows us to evaluate and rank LLMs based on their vulnerability to producing hallucinations, we propose the Hallucination Vulnerability Index (HVI). We firmly believe that HVI holds significant value as a tool for the wider NLP community, with the potential to serve as a rubric in AI-related policy-making. In conclusion, we propose two solution strategies for mitigating hallucinations.
Twitter Data Analysis: Izmir Earthquake Case
Agrali, Özgür, Sökün, Hakan, Karaarslan, Enis
Türkiye is located on a fault line, and earthquakes of both large and small scale occur often. There is a need for effective solutions for gathering current information during disasters. Social media can be used to gain insight into public opinion, and this insight can be used in public relations and disaster management. In this study, Twitter posts on the Izmir Earthquake that took place in October 2020 are analyzed. We ask whether this analysis can be used to make timely social inferences. Data mining and natural language processing (NLP) methods are used for this analysis. NLP is used for sentiment analysis and topic modelling. The Latent Dirichlet Allocation (LDA) algorithm is used for topic modelling. For sentiment analysis, we used the Bidirectional Encoder Representations from Transformers (BERT) model, which is built on the Transformer architecture. It is shown that the users shared their goodwill wishes and aimed to contribute to the aid activities initiated after the earthquake. The users wanted to make their voices heard by competent institutions and organizations. The proposed methods work effectively. Future studies are also discussed.
Automatic autism spectrum disorder detection using artificial intelligence methods with MRI neuroimaging: A review
Moridian, Parisa, Ghassemi, Navid, Jafari, Mahboobeh, Salloum-Asfar, Salam, Sadeghi, Delaram, Khodatars, Marjane, Shoeibi, Afshin, Khosravi, Abbas, Ling, Sai Ho, Subasi, Abdulhamit, Alizadehsani, Roohallah, Gorriz, Juan M., Abdulla, Sara A, Acharya, U. Rajendra
Autism spectrum disorder (ASD) is a brain condition characterized by diverse signs and symptoms that appear in early childhood. ASD is also associated with communication deficits and repetitive behavior in affected individuals. Various ASD detection methods have been developed, including neuroimaging modalities and psychological tests. Among these methods, magnetic resonance imaging (MRI) modalities are of paramount importance to physicians. Clinicians rely on MRI modalities to diagnose ASD accurately. The MRI modalities are non-invasive methods that include functional (fMRI) and structural (sMRI) neuroimaging methods. However, diagnosing ASD with fMRI and sMRI is often laborious and time-consuming for specialists; therefore, several computer-aided design systems (CADS) based on artificial intelligence (AI) have been developed to assist specialist physicians. Conventional machine learning (ML) and deep learning (DL) are the most popular schemes of AI used for diagnosing ASD. This study aims to review the automated detection of ASD using AI. We review several CADS that have been developed using ML techniques for the automated diagnosis of ASD using MRI modalities. There has been very limited work on the use of DL techniques to develop automated diagnostic models for ASD. A summary of the studies developed using DL is provided in the Supplementary Appendix. Then, the challenges encountered during the automated diagnosis of ASD using MRI and AI techniques are described in detail. Additionally, a graphical comparison of studies using ML and DL to diagnose ASD automatically is discussed. We suggest future approaches to detecting ASD using AI techniques and MRI neuroimaging.
Digital Twin Based Disaster Management System Proposal: DT-DMS
Dogan, Özgür, Sahin, Oguzhan, Karaarslan, Enis
The damage and impact of natural disasters are becoming more destructive with the increase of urbanization. Today's metropolitan cities are not sufficiently prepared for pre- and post-disaster situations. Digital Twin technology can provide a solution. A virtual copy of the physical city could be created by collecting data from the sensors of Internet of Things (IoT) devices and storing it on cloud infrastructure. This virtual copy is kept up to date by the continuous flow of data coming from the sensors. We propose a disaster management system utilizing machine learning, called DT-DMS, to support decision-making mechanisms. This study aims to show how to educate and prepare emergency center staff by simulating potential disaster situations on the virtual copy. A disaster event will be simulated, allowing emergency center staff to make decisions and showing the potential outcomes of these decisions. A rescue operation after an earthquake is simulated. Test results are promising, and the simulation scope is planned to be extended.
Solving The Exam Scheduling Problems in Central Exams With Genetic Algorithms
An exam scheduling application is expected to use resources efficiently. There are various criteria for the efficient use of resources and for carrying out all tests at minimum cost in the shortest possible time. The aim is for educational institutions subject to such criteria to organize central examinations successfully. In this study, a two-stage genetic algorithm was developed. In the first stage, courses were assigned to sessions. In the second stage, the students participating in each test session were assigned to examination rooms. The objectives of the study are to increase the number of joint students participating in sessions, to use the minimum number of buildings in the same session, and to reduce the number of supervisors by using the minimum number of classrooms possible. In this study, a general-purpose exam scheduling solution for educational institutions was presented. The developed system can be used in different central examinations to create originality. The results of the sample application show that the proposed genetic algorithm gives successful results.
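To make the first stage (assigning courses to sessions) more concrete, the following is a generic genetic algorithm sketch, not the authors' algorithm. The fitness function (counting students who have more than one exam in the same session), the operators, the parameter values, and the names `count_conflicts` and `schedule_courses` are all assumptions chosen for illustration. The abstract's own objectives (buildings, supervisors, classrooms) are not modelled here.

```python
import random


def count_conflicts(assignment, enrollments):
    """Count clashes: extra exams a student has within a single session."""
    clashes = 0
    for courses in enrollments.values():
        sessions = [assignment[c] for c in courses]
        clashes += len(sessions) - len(set(sessions))
    return clashes


def schedule_courses(course_count, session_count, enrollments,
                     generations=200, population_size=30,
                     mutation_rate=0.1, seed=0):
    """Evolve a list where index = course and value = assigned session."""
    rng = random.Random(seed)  # seeded so that runs are repeatable

    def fitness(individual):
        return count_conflicts(individual, enrollments)

    population = [
        [rng.randrange(session_count) for _ in range(course_count)]
        for _ in range(population_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness)
        if fitness(population[0]) == 0:
            break  # a clash-free assignment has been found
        survivors = population[: population_size // 2]  # truncation selection
        children = []
        while len(survivors) + len(children) < population_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, course_count)  # one-point crossover
            child = mother[:cut] + father[cut:]
            for i in range(course_count):  # per-gene random mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(session_count)
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)


# Hypothetical enrollments: student id -> indices of the courses they take.
example_enrollments = {"s1": [0, 1], "s2": [1, 2], "s3": [2, 3], "s4": [0, 3]}
best = schedule_courses(course_count=4, session_count=2,
                        enrollments=example_enrollments)
print(best, count_conflicts(best, example_enrollments))
```

A second stage, in the spirit of the abstract, would take each session's students and assign them to rooms. That could reuse the same loop with a different chromosome and fitness function.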
Optimization of Project Scheduling Activities in Dynamic CPM and PERT Networks Using Genetic Algorithms
Calp, Muhammed Hanefi, Akcayol, Muhammet Ali
Projects consist of interconnected dimensions such as objective, time, resource and environment. Using these dimensions in a controlled way and scheduling them effectively brings project success. The project scheduling process includes defining project activities and estimating the time and resources to be used for the activities. Project resource-scheduling problems began to attract more attention after the Program Evaluation and Review Technique (PERT) and the Critical Path Method (CPM) were developed one after the other. However, the complexity and difficulty of CPM and PERT processes led to the use of these techniques through artificial intelligence methods such as the Genetic Algorithm (GA). In this study, an algorithm was proposed and developed that determines the critical path, critical activities and project completion duration by using GA instead of the CPM and PERT techniques used for network analysis within the scope of project management. GA was used because it is an effective method for solving complex optimization problems. Correct decisions can therefore be made for implemented project activities by using the obtained results. Optimum results were obtained in a shorter time than with the CPM and PERT techniques by using the model based on the dynamic algorithm. It is expected that this study will contribute to the performance (time, speed, low error, etc.) of other studies.
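For reference, the quantities this abstract says the GA determines (critical path, critical activities, and project completion duration) can be computed on a small network with the classical CPM forward and backward passes. The sketch below shows that classical baseline, not the authors' GA. The activity names, durations, and the function name `critical_path` are hypothetical.

```python
def critical_path(durations, predecessors):
    """Classical CPM: return (project duration, critical activities in order)."""
    successors = {a: [] for a in durations}
    indegree = {a: 0 for a in durations}
    for activity, preds in predecessors.items():
        for p in preds:
            successors[p].append(activity)
            indegree[activity] += 1

    # Topological order (Kahn's algorithm) so predecessors are handled first.
    order = []
    ready = sorted(a for a in durations if indegree[a] == 0)
    while ready:
        a = ready.pop(0)
        order.append(a)
        for s in successors[a]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(order) != len(durations):
        raise ValueError("activity network contains a cycle")

    # Forward pass: earliest finish time of every activity.
    earliest_finish = {}
    for a in order:
        start = max((earliest_finish[p] for p in predecessors.get(a, [])),
                    default=0)
        earliest_finish[a] = start + durations[a]
    project_duration = max(earliest_finish.values())

    # Backward pass: latest finish time that does not delay the project.
    latest_finish = {}
    for a in reversed(order):
        latest_finish[a] = min(
            (latest_finish[s] - durations[s] for s in successors[a]),
            default=project_duration,
        )

    # Activities with zero slack form the critical path.
    critical = [a for a in order if latest_finish[a] == earliest_finish[a]]
    return project_duration, critical


# Hypothetical network: activity -> duration, activity -> its predecessors.
durations = {"A": 3, "B": 2, "C": 5, "D": 2, "E": 1}
predecessors = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["C", "D"]}

duration, critical = critical_path(durations, predecessors)
print(duration, critical)  # 9 ['A', 'C', 'E']
```

In this example the path A, C, E takes 3 + 5 + 1 = 9 time units, while A, B, D, E takes 8, so activities B and D each have one unit of slack.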
Medical Diagnosis with a Novel SVM-CoDOA Based Hybrid Approach
Machine Learning is an important sub-field of Artificial Intelligence, and training Machine Learning techniques with effective methods has become a critical task. Recently, researchers have tried alternative techniques to improve the ability of Machine Learning techniques. Accordingly, the objective of this study is to introduce a novel SVM-CoDOA (Cognitive Development Optimization Algorithm trained Support Vector Machines) system for general medical diagnosis. In detail, the system consists of an SVM that is trained by CoDOA, a newly developed optimization algorithm. Optimization algorithms are known to play an essential role in training and improving Machine Learning techniques. In this sense, the study applies the SVM-CoDOA hybrid formation to a medical diagnosis problem in order to show its effectiveness.