Supplementary Materials for PERFOGRAPH: A Numerical Aware Program Graph Representation for Performance Optimization and Program Analysis

Neural Information Processing Systems

We investigated the effectiveness of Digit Embedding. We can see that the numbers in the (100090-100140) range are clustered together. We also investigated more ranges. Figure 3 shows the 2-d embedding of decimal numbers in the ranges [1.0, 10.0] and [20.0, 31.0]. Numbers with larger differences, such as (1.6478, 30.7010) and (5.339, 30.5113), are far from ... So, the above examples clearly demonstrate the effectiveness of Digit Embedding for generating the ... Please note that in this setup, the Digit Embedding is still applied.


Optimal Design of a Walking Robot: Analytical, Numerical, and Machine Learning Methods for Multicriteria Synthesis

Ibrayeva, Arman, Omarov, Batyrkhan

arXiv.org Artificial Intelligence

This paper addresses several critical stages of designing a walking robot, including optimal structural synthesis. It introduces a novel 'rational' mechanical structure aimed at enhancing efficiency and simplifying the control system while addressing practical limitations observed in existing designs. The study includes the development of novel multicriteria synthesis methods for achieving optimal leg design, integrating analytical and numerical methods. In addition, a method based on the Non-dominated Sorting Genetic Algorithm II is presented. Turning modes are investigated, and for the first time, the isotropy criterion, typically applied to parallel manipulators, is used for optimizing walking robot parameters to ensure optimal force and motion transfer in all directions. Several physical prototypes are developed to experimentally validate the functionality of different mechanisms of the robot, including adaptation to surface irregularities and navigation using LiDAR.

  Genre: Research Report (0.40)
  Industry: Energy (0.35)

GroundHog: Revolutionizing GLDAS Groundwater Storage Downscaling for Enhanced Recharge Estimation in Bangladesh

Ahmed, Saleh Sakib, Zzaman, Rashed Uz, Jony, Saifur Rahman, Himel, Faizur Rahman, Sharmin, Afroza, Rahman, A. H. M. Khalequr, Rahman, M. Sohel, Nowreen, Sara

arXiv.org Artificial Intelligence

Long-term groundwater level (GWL) measurement is vital for effective policymaking and recharge estimation using annual maxima and minima. However, current methods prioritize short-term predictions and lack multi-year applicability, limiting their utility. Moreover, sparse in-situ measurements lead to reliance on low-resolution satellite data like GLDAS as the ground truth for Machine Learning models, further constraining accuracy. To overcome these challenges, we first develop an ML model to mitigate data gaps, achieving $R^2$ scores of 0.855 and 0.963 for maximum and minimum GWL predictions, respectively. Subsequently, using these predictions and well observations as ground truth, we train an Upsampling Model that uses low-resolution (25 km) GLDAS data as input to produce high-resolution (2 km) GWLs, achieving an excellent $R^2$ score of 0.96. Our approach successfully upscales GLDAS data for 2003-2024, allowing high-resolution recharge estimations and revealing critical trends for proactive resource management. Our method allows upsampling of groundwater storage (GWS) from GLDAS to high-resolution GWLs at any point, independently of officially curated piezometer data, making it a valuable tool for decision-making.
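
For readers unfamiliar with the metric quoted above, the coefficient of determination can be computed as in the following minimal sketch. The function name `r_squared` and the sample groundwater levels are made up for illustration; they are not taken from the paper.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared errors of the predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical groundwater levels in metres (illustrative values only).
observed_gwl = [4.0, 5.0, 6.0, 7.0, 8.0]
predicted_gwl = [4.5, 5.0, 6.0, 7.0, 7.5]
score = r_squared(observed_gwl, predicted_gwl)
print(f"R^2 = {score:.3f}")
```

A score of 1 means the predictions reproduce the observations exactly; a score near 0 means they do no better than always predicting the observed mean.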


Implementing LLMs in industrial process modeling: Addressing Categorical Variables

Koronaki, Eleni D., Suntaxi, Geremy Loachamin, Papavasileiou, Paris, Giovanis, Dimitrios G., Kathrein, Martin, Boudouvis, Andreas G., Bordas, Stéphane P. A.

arXiv.org Machine Learning

Important process variables are often categorical, i.e., names or labels representing, for example, categories of inputs, types of reactors, or a sequence of steps. In this work, we use Large Language Models (LLMs) to derive embeddings of such inputs that represent their actual meaning, or reflect the "distances" between categories, i.e., how similar or dissimilar they are. This is a marked difference from the current standard practice of using binary or one-hot encoding to replace categorical variables with sequences of ones and zeros. Combined with dimensionality reduction techniques, either linear such as Principal Components Analysis (PCA), or nonlinear such as Uniform Manifold Approximation and Projection (UMAP), the proposed approach leads to a meaningful, low-dimensional feature space. The significance of obtaining meaningful embeddings is illustrated in the context of an industrial coating process for cutting tools that includes both numerical and categorical inputs. The proposed approach enables feature importance analysis, which is a marked improvement compared to the current state-of-the-art (SotA) in the encoding of categorical variables.
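
The contrast with one-hot encoding can be illustrated with a small sketch. One-hot codes place every pair of categories at the same distance, so they carry no notion of similarity, whereas dense embeddings can. The category names and the two-dimensional vectors below are hypothetical placeholders chosen by hand; they are not LLM output and not data from the paper.

```python
import math


def one_hot(index, size):
    """Return a one-hot vector of the given size with a 1 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise(codes):
    """Distances between every unordered pair of encoded categories."""
    names = list(codes)
    return {
        (a, b): euclidean(codes[a], codes[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical category labels (assumption, for illustration only).
categories = ["reactor_A", "reactor_A_mk2", "furnace_B"]
one_hot_codes = {
    name: one_hot(i, len(categories)) for i, name in enumerate(categories)
}

# Hand-picked stand-ins for embeddings; NOT produced by an actual LLM.
dense_codes = {
    "reactor_A": [0.90, 0.10],
    "reactor_A_mk2": [0.85, 0.15],
    "furnace_B": [0.10, 0.95],
}

one_hot_dist = pairwise(one_hot_codes)
dense_dist = pairwise(dense_codes)

for pair in one_hot_dist:
    print(pair, round(one_hot_dist[pair], 3), round(dense_dist[pair], 3))
```

In the one-hot column every distance is identical, while in the dense column the two reactor variants sit much closer to each other than either does to the furnace.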


Exploring Fairness in Educational Data Mining in the Context of the Right to be Forgotten

Qian, Wei, Chen, Aobo, Zhao, Chenxu, Li, Yangyi, Huai, Mengdi

arXiv.org Artificial Intelligence

Student data, which is a critical component in educational data mining (EDM) research, can contain personal information, such as age and gender, as well as academic performance and activity data from online learning systems [24]. By offering valuable insights into student learning, EDM supports the development of more effective educational practices and policies, ultimately improving student outcomes. One of the most popular approaches in previous work is to incorporate machine learning techniques, which have achieved remarkable success in discovering intricate structures within educational datasets. However, in recent years, concerns about the fairness of deploying algorithmic decision-making in the educational context have emerged [2, 22, 27, 49]. In particular, machine learning models can produce biased and unfair outcomes for certain student groups, significantly affecting their educational opportunities and achievements. Given that the data empowering EDM research can often contain personally identifiable and other sensitive information, there has been increased attention to privacy protection in recent years [37, 43]. Additionally, privacy legislation such as the California Consumer Privacy Act [39] and the former Right to be Forgotten [17] has granted users the right to erase the impact of their sensitive information from trained models to protect their privacy. One approach to protecting users' privacy involves enabling the trained machine learning model to entirely forget ... Both authors contributed equally to this research.


Neighboring Extremal Optimal Control Theory for Parameter-Dependent Closed-loop Laws

Rai, Ayush, Mou, Shaoshuai, Anderson, Brian D. O.

arXiv.org Artificial Intelligence

This study introduces an approach to obtain a neighboring extremal optimal control (NEOC) solution for a closed-loop optimal control problem, applicable to a wide array of nonlinear systems and not necessarily quadratic performance indices. The approach involves investigating the variation incurred in the functional form of a known closed-loop optimal control law due to small, known parameter variations in the system equations or the performance index. The NEOC solution can formally be obtained by solving a linear partial differential equation, akin to those encountered in the iterative solution of a nonlinear Hamilton-Jacobi equation. Motivated by numerical procedures for solving these latter equations, we also propose a numerical algorithm based on the Galerkin algorithm, leveraging the use of basis functions to solve the underlying Hamilton-Jacobi equation of the original optimal control problem. The proposed approach simplifies the NEOC problem by reducing it to the solution of a simple set of linear equations, thereby eliminating the need for a full re-solution of the adjusted optimal control problem. Furthermore, the variation to the optimal performance index can be obtained as a function of both the system state and small changes in parameters, allowing the determination of the adjustment to an optimal control law given a small adjustment of parameters in the system or the performance index. Moreover, in order to handle large known parameter perturbations, we propose a homotopic approach that breaks down the single calculation of NEOC into a finite set of multiple steps. Finally, the validity of the claims and theory is supported by theoretical analysis and numerical simulations.
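
To give a feel for the idea of correcting a known closed-loop law instead of re-solving the problem, here is a toy sketch for a scalar linear-quadratic case. This is a far narrower setting than the nonlinear, non-quadratic problems the abstract targets, and it is not the authors' Galerkin-based algorithm. The system dx/dt = a x + b u with cost integrand q x^2 + r u^2, the function names, and the parameter values are all assumptions made for illustration.

```python
import math


def lqr_gain(a, b, q, r):
    """Optimal gain K (control u = -K x) for dx/dt = a*x + b*u and
    infinite-horizon cost integrand q*x^2 + r*u^2.

    Comes from the stabilizing root of the scalar Riccati equation
    2*a*P - (b^2 / r) * P^2 + q = 0, with K = b*P / r.
    Assumes b != 0, q >= 0, r > 0.
    """
    return (a + math.sqrt(a * a + b * b * q / r)) / b


def lqr_gain_sensitivity(a, b, q, r):
    """Analytic derivative dK/da of the optimal gain with respect to a."""
    return (1.0 + a / math.sqrt(a * a + b * b * q / r)) / b


# Hypothetical nominal parameters and a small, known perturbation of a.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
delta_a = 0.05

nominal_gain = lqr_gain(a, b, q, r)
# First-order correction of the known law, without re-solving.
corrected_gain = nominal_gain + lqr_gain_sensitivity(a, b, q, r) * delta_a
# Full re-solution of the perturbed problem, for comparison.
resolved_gain = lqr_gain(a + delta_a, b, q, r)
error = abs(corrected_gain - resolved_gain)

print(f"corrected={corrected_gain:.5f} resolved={resolved_gain:.5f}")
```

The gap between the corrected and re-solved gains shrinks quadratically with the size of the perturbation, which is why larger perturbations are better handled in several small steps, in the spirit of the homotopic approach mentioned above.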


Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems

Davis, Ernest, Aaronson, Scott

arXiv.org Artificial Intelligence

Our test sets were too small and too haphazard to support statistically valid conclusions, but they were suggestive of a number of conclusions. We summarize these here, and discuss them at greater length in section 7. Over the kinds of problems tested, GPT-4 with either plug-in is significantly stronger than GPT-4 by itself, or, almost certainly, than any AI that existed a year ago. However, it is still far from reliable; it often outputs a wrong answer or fails to output any answer. In terms of overall score, we would judge that these systems perform at the level of a middling undergraduate student. However, their capacities and weaknesses do not align with those of a human student; the systems solve some problems that even capable students would find challenging, whereas they fail on some problems that even middling high school students would find easy.


Quantifying Outlierness of Funds from their Categories using Supervised Similarity

Desai, Dhruv, Dhiman, Ashmita, Sharma, Tushar, Sharma, Deepika, Mehta, Dhagash, Pasquali, Stefano

arXiv.org Artificial Intelligence

Mutual fund categorization has become a standard tool for the investment management industry and is extensively used by allocators for portfolio construction and manager selection, as well as by fund managers for peer analysis and competitive positioning. As a result, an (unintended) miscategorization or lack of precision can significantly impact allocation decisions and investment fund managers. Here, we aim to quantify the effect of miscategorization of funds using a machine learning based approach. We formulate the problem of miscategorization of funds as a distance-based outlier detection problem, where the outliers are the data points that are far from the rest of the data points in the given feature space. We implement and employ a Random Forest (RF) based method of distance metric learning, and compute the so-called class-wise outlier measures for each data point to identify outliers in the data. We test our implementation on various publicly available data sets, and then apply it to mutual fund data. We show that there is a strong relationship between the outlier measures of the funds and their future returns and discuss the implications of our findings.
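
A simplified sketch of a class-wise, distance-based outlier score follows. It swaps in plain Euclidean distance for the learned Random Forest distance described in the abstract, so it illustrates the general idea only and is not the authors' method. The function name, labels, feature values, and the flagging threshold are all hypothetical.

```python
import math
from statistics import median


def classwise_outlier_scores(points, labels):
    """Score each point by its mean distance to the other members of its
    own class, then centre and scale by the class median and the median
    absolute deviation (MAD) of those mean distances."""
    scores = [0.0] * len(points)
    for cls in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        if len(idx) < 2:
            continue  # a single-member class has no peers to compare with
        raw = {
            i: sum(math.dist(points[i], points[j]) for j in idx if j != i)
            / (len(idx) - 1)
            for i in idx
        }
        med = median(raw.values())
        mad = median(abs(value - med) for value in raw.values())
        scale = mad if mad > 0 else 1.0  # avoid division by zero
        for i in idx:
            scores[i] = (raw[i] - med) / scale
    return scores


# Made-up two-feature "funds"; the fifth one is labelled "equity" but
# sits among the "bond" funds in feature space.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (5.0, 5.0),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
]
labels = ["equity"] * 5 + ["bond"] * 3

scores = classwise_outlier_scores(points, labels)
# The threshold of 10 is an arbitrary choice for this toy data.
flagged = [i for i, s in enumerate(scores) if s > 10]
print(flagged)
```

Because the score is computed within each labelled class, a point that would look ordinary in another category can still stand out as a candidate miscategorization in its own.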


Crime Prediction using Machine Learning with a Novel Crime Dataset

Shohan, Faisal Tareque, Akash, Abu Ubaida, Ibrahim, Muhammad, Alam, Mohammad Shafiul

arXiv.org Artificial Intelligence

Crime is an unlawful act that carries legal repercussions. Bangladesh has a high crime rate due to poverty, population growth, and many other socio-economic issues. For law enforcement agencies, understanding crime patterns is essential for preventing future criminal activity. For this purpose, these agencies need a structured crime database. This paper introduces a novel crime dataset that contains temporal, geographic, weather, and demographic data about 6574 crime incidents in Bangladesh. We manually gather crime news articles spanning seven years from a daily newspaper archive and extract basic features from this raw text. Using these basic features, we then consult standard providers of geo-location and weather data to gather this information for the collected crime incidents. Furthermore, we collect demographic information from Bangladesh National Census data. All this information is combined, resulting in a standard machine learning dataset. In total, 36 features are engineered for the crime prediction task. Five supervised machine learning classification algorithms are then evaluated on this newly built dataset, and satisfactory results are achieved. We also conduct exploratory analysis on various aspects of the dataset. This dataset is expected to serve as the foundation for crime incidence prediction systems for Bangladesh and other countries. The findings of this study will help law enforcement agencies to forecast and contain crime as well as to ensure optimal resource allocation for crime patrol and prevention.


Kaggle Master with Heart Attack Prediction Kaggle Project

#artificialintelligence

Kaggle is a machine learning and data science community. Become a Kaggle master with a real machine learning Kaggle project. Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle is a platform where data scientists can compete in machine learning challenges. These challenges can be anything from predicting housing prices to detect ... Machine learning describes systems that make predictions using a model trained on real-world data. Machine learning is constantly being applied to new industries and ... Data science includes preparing, analyzing, and processing data.