Jambi
LLM for Everyone: Representing the Underrepresented in Large Language Models
Natural language processing (NLP) has witnessed a profound impact of large language models (LLMs) that excel in a multitude of tasks. However, the limitation of LLMs in multilingual settings, particularly in underrepresented languages, remains a significant hurdle. This thesis aims to bridge the gap in NLP research and development by focusing on underrepresented languages. A comprehensive evaluation of LLMs is conducted to assess their capabilities in these languages, revealing the challenges of multilingual and multicultural generalization. Addressing the multilingual generalization gap, this thesis proposes data-and-compute-efficient methods to mitigate the disparity in LLM ability in underrepresented languages, allowing better generalization on underrepresented languages without the loss of task generalization ability. The proposed solutions cover cross-lingual continual instruction tuning, retrieval-based cross-lingual in-context learning, and in-context query alignment. Furthermore, a novel method to measure cultural values alignment between LLMs operating in different languages is proposed, ensuring cultural sensitivity and inclusivity. These contributions aim to enhance the multilingual and multicultural alignment of LLMs in underrepresented languages, ultimately advancing the NLP field toward greater equality and inclusiveness.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Indonesia > Bali (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Promising Solution (1.00)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Instructional Material (1.00)
- Law (1.00)
- Government (1.00)
- Education (1.00)
An Agent-Based Model of Elephant Crop Raid Dynamics in the Periyar-Agasthyamalai Complex, India
Purathekandy, Anjali, Oommen, Meera Anna, Wikelski, Martin, Subramani, Deepak N
Human-wildlife conflict challenges conservation worldwide and requires innovative management solutions. We developed a prototype Agent-Based Model (ABM) to simulate interactions between humans and solitary bull Asian elephants in the Periyar-Agasthyamalai complex of the Western Ghats in Kerala, India. The main challenges were the complex behavior of elephants and insufficient movement data from the region. Using literature, expert insights, and field surveys, we created a prototype behavior model that incorporates crop habituation, thermoregulation, and aggression. We designed a four-step calibration method to adapt relocation data from radio-tagged elephants in Indonesia to model elephant movements in the model domain. The ABM's structure, including the assumptions, submodels, and data usage, is detailed following the Overview, Design concepts, Details (ODD) protocol. The ABM simulates various food availability scenarios to study elephant behavior and environmental impact on space use and conflict patterns. The results indicate that the wet months increase conflict and that thermoregulation significantly influences elephant movements and crop raiding. Starvation and crop habituation intensify these patterns. This prototype ABM is an initial model that offers information on the development of a decision support system in wildlife management and will be further enhanced with layers of complexity and subtlety across various dimensions. Access the ABM at https://github.com/quest-lab-iisc/abm-elephant-project.
- Asia > Indonesia > Sumatra > Jambi > Jambi (0.04)
- Africa > South Africa (0.04)
- Europe > Germany (0.04)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (0.93)
- Research Report > Experimental Study (0.67)
- Health & Medicine (1.00)
- Food & Agriculture > Agriculture (1.00)
- Education (0.67)
- Leisure & Entertainment (0.67)
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
Cahyawijaya, Samuel, Lovenia, Holy, Aji, Alham Fikri, Winata, Genta Indra, Wilie, Bryan, Mahendra, Rahmad, Wibisono, Christian, Romadhony, Ade, Vincentio, Karissa, Koto, Fajri, Santoso, Jennifer, Moeljadi, David, Wirawan, Cahya, Hudi, Frederikus, Parmonangan, Ivan Halim, Alfina, Ika, Wicaksono, Muhammad Satrio, Putra, Ilham Firdausi, Rahmadani, Samsul, Oenang, Yulianti, Septiandri, Ali Akbar, Jaya, James, Dhole, Kaustubh D., Suryani, Arie Ardiyanti, Putri, Rifki Afina, Su, Dan, Stevens, Keith, Nityasya, Made Nindyatama, Adilazuarda, Muhammad Farid, Ignatius, Ryan, Diandaru, Ryandito, Yu, Tiezheng, Ghifari, Vito, Dai, Wenliang, Xu, Yan, Damapuspita, Dyah, Tho, Cuk, Karo, Ichwanul Muslim Karo, Fatyanosa, Tirana Noor, Ji, Ziwei, Fung, Pascale, Neubig, Graham, Baldwin, Timothy, Ruder, Sebastian, Sujaini, Herry, Sakti, Sakriani, Purwarianti, Ayu
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd enables the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.
- North America > United States > Texas > Dallas County > Dallas (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Timor-Leste (0.14)
- Law (0.67)
- Government (0.67)
- Information Technology > Services (0.67)
Functional Time Series Forecasting: Functional Singular Spectrum Analysis Approaches
Trinka, Jordan, Haghbin, Hossein, Maadooliat, Mehdi
In this paper, we propose two nonparametric methods for forecasting functional time-dependent data: functional singular spectrum analysis recurrent forecasting and vector forecasting. Both algorithms use the results of functional singular spectrum analysis and past observations to predict future data points; recurrent forecasting predicts one function at a time, while vector forecasting makes predictions using functional vectors. We compare our forecasting methods to a gold standard algorithm for predicting functional, time-dependent data by way of simulation and real data, and we find that our techniques do better for periodic stochastic processes.
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- North America > United States > California (0.04)
- Indian Ocean > Arabian Gulf (0.04)
- Overview (0.67)
- Research Report (0.64)
SLIC-UAV: A Method for monitoring recovery in tropical restoration projects through identification of signature species using UAVs
Williams, Jonathan, Schönlieb, Carola-Bibiane, Swinfield, Tom, Irawan, Bambang, Achmad, Eva, Zudhi, Muhammad, Habibi, Gemita, Elva, Coomes, David A.
Logged forests cover four million square kilometres of the tropics and restoring these forests is essential if we are to avoid the worst impacts of climate change, yet monitoring recovery is challenging. Tracking the abundance of visually identifiable, early-successional species enables successional status and thereby restoration progress to be evaluated. Here we present a new pipeline, SLIC-UAV, for processing Unmanned Aerial Vehicle (UAV) imagery to map early-successional species in tropical forests. The pipeline is novel because it comprises: (a) a time-efficient approach for labelling crowns from UAV imagery; (b) machine learning of species based on spectral and textural features within individual tree crowns, and (c) automatic segmentation of orthomosaiced UAV imagery into 'superpixels', using Simple Linear Iterative Clustering (SLIC). Creating superpixels reduces the dataset's dimensionality and focuses prediction onto clusters of pixels, greatly improving accuracy. To demonstrate SLIC-UAV, support vector machines and random forests were used to predict the species of hand-labelled crowns in a restoration concession in Indonesia. Random forests were most accurate at discriminating species for whole crowns, with accuracy ranging from 79.3% when mapping five common species, to 90.5% when mapping the three most visually-distinctive species. In contrast, support vector machines proved better for labelling automatically segmented superpixels, with accuracy ranging from 74.3% to 91.7% for the same species. Models were extended to map species across 100 hectares of forest. The study demonstrates the power of SLIC-UAV for mapping characteristic early-successional tree species as an indicator of successional stage within tropical forest restoration areas. Continued effort is needed to develop easy-to-implement and low-cost technology to improve the affordability of project management.
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Southeast Asia (0.04)
- Energy (1.00)
- Information Technology (0.88)
- Materials > Paper & Forest Products (0.66)
- Media > Photography (0.45)
Multivariate Functional Singular Spectrum Analysis Over Different Dimensional Domains
Trinka, Jordan, Haghbin, Hossein, Maadooliat, Mehdi
A common problem in time series analysis is the detection, extraction, and exploration of mean, seasonal, trend, and noise components in time series data. A technique known as singular spectrum analysis (SSA) has been developed as a nonparametric, exploratory method that can be used to identify such components in ordinary time series where observations are scalars (Golyandina et al., 2001). Oftentimes, many variables are observed as a result of a single stochastic process, and investigation of time series components can be made richer by performing a multivariate analysis of these vector observations. The multivariate SSA (MSSA) algorithm is a technique that has seen success over its univariate SSA counterpart in decomposing a multidimensional time series into components if the covariates are moderately correlated (Golyandina and Stepanov, 2012). MSSA has also been split into two approaches, vertical MSSA (VMSSA) and horizontal MSSA (HMSSA), where VMSSA involves the vertical stacking of univariate Hankel trajectory matrices while HMSSA works with the horizontal stacking of the same elements (Hassani and Mahmoudvand, 2018). Over the course of the last 15 years, MSSA has seen significant success in various areas of application; see Groth and Ghil (2011); Golyandina and Stepanov (2012); Silva et al. (2018); Hassani et al. (2019). Functional data analysis embodies the evaluation and exploration of data composed of functions such as curves or surfaces (Ramsay and Silverman, 2005). Functional PCA (FPCA) is a technique that is used to find the most informative directions in a time-independent collection of functional subjects (Ramsay and Silverman, 2005). Univariate Functional Singular Spectrum Analysis (FSSA) was developed by Haghbin et al. (2019) as a novel technique that is used to decompose a time-dependent collection of functional
- North America > United States > Montana (0.28)
- Asia > Indonesia > Sumatra > Jambi > Jambi (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
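The vertical and horizontal stacking of Hankel trajectory matrices mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the standard SSA embedding, in which a series of length N and window length L gives an L x K matrix with K = N - L + 1 and entry (i, j) equal to the series value at position i + j. The function names, the two sample series, and the window length are illustrative assumptions, not taken from the paper or from any SSA package.

```python
from typing import List

Matrix = List[List[float]]


def trajectory_matrix(series: List[float], window: int) -> Matrix:
    """Build the L x K Hankel trajectory matrix, with K = N - L + 1.

    Entry (i, j) is series[i + j], so each anti-diagonal is constant.
    """
    n = len(series)
    if not 1 <= window <= n:
        raise ValueError("window must satisfy 1 <= window <= len(series)")
    k = n - window + 1
    return [[series[i + j] for j in range(k)] for i in range(window)]


def stack_vertical(blocks: List[Matrix]) -> Matrix:
    """VMSSA-style stacking: place the blocks one under another."""
    if len({len(b[0]) for b in blocks}) != 1:
        raise ValueError("vertical stacking needs equal column counts")
    return [row[:] for b in blocks for row in b]


def stack_horizontal(blocks: List[Matrix]) -> Matrix:
    """HMSSA-style stacking: place the blocks side by side."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("horizontal stacking needs equal row counts")
    rows = len(blocks[0])
    return [sum((b[i] for b in blocks), []) for i in range(rows)]


# Two hypothetical univariate series observed over the same time points.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
L = 3  # assumed window length

X = trajectory_matrix(x, L)
Y = trajectory_matrix(y, L)
V = stack_vertical([X, Y])    # shape 2L x K
H = stack_horizontal([X, Y])  # shape L x 2K
```

In this sketch the vertical stack has 2L rows and K columns, while the horizontal stack has L rows and 2K columns. Either stacked matrix would then be passed to a singular value decomposition, which is left out here.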
Deep learning for Aerosol Forecasting
Hoyne, Caleb, Mukkavilli, S. Karthik, Meger, David
Reanalysis datasets, which combine numerical physics models and limited observations to generate a synthesised estimate of variables in an Earth system, are prone to biases against ground truth. Biases identified in previous studies between the NASA Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2) aerosol optical depth (AOD) dataset and the Aerosol Robotic Network (AERONET) ground measurements motivated the development of a global, deep learning based AOD prediction model. This study combines a convolutional neural network (CNN) with MERRA-2, tested against all AERONET sites. When validated against AERONET ground truth, the new hybrid CNN-based model provides better estimates than MERRA-2 reanalysis alone.
- Asia > Southeast Asia (0.14)
- Asia > Indonesia > Sumatra > Jambi > Jambi (0.05)
- North America > Canada > Quebec > Montreal (0.05)