AITopics | Thayasivam, Uthayasanker

Thayasivam, Uthayasanker

Adapting the Tesseract Open-Source OCR Engine for Tamil and Sinhala Legacy Fonts and Creating a Parallel Corpus for Tamil-Sinhala-English

Vasantharajan, Charangan, Tharmalingam, Laksika, Thayasivam, Uthayasanker

arXiv.org Artificial Intelligence, Dec-15-2022

Most low-resource languages do not have the necessary resources to create even a substantial monolingual corpus. Text in these languages can often be found in government proceedings, but mainly in Portable Document Format (PDF) files that contain legacy fonts. Extracting text from these documents to create a monolingual corpus is challenging because of legacy font usage and printer-friendly encoding, neither of which is optimized for text extraction. Therefore, we propose a simple, automatic, and novel idea that can scale to Tamil, Sinhala, and English, and to many documents, along with parallel corpora. Since Tamil and Sinhala are low-resource languages, we improved the performance of Tesseract by employing LSTM-based training on more than 20 legacy fonts to recognize printed characters in these languages. In particular, our model detects code-mixed text, numbers, and special characters in printed documents. This approach reduces the character-level error rate of Tesseract from 6.03 to 2.61 for Tamil (an absolute change of -3.42) and from 7.61 to 4.74 for Sinhala (an absolute change of -2.87), and the word-level error rate from 39.68 to 20.61 for Tamil (an absolute change of -19.07) and from 35.04 to 26.58 for Sinhala (an absolute change of -8.46) on the test set. Also, our newly created parallel corpus consists of 185.4k, 168.9k, and 181.04k sentences and 2.11M, 2.22M, and 2.33M words in Tamil, Sinhala, and English, respectively. This study shows that fine-tuning Tesseract models on multiple new fonts helps the models recognize the text and enhances OCR performance. We have made the newly trained models and the source code for fine-tuning Tesseract freely available.
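The abstract does not show how the error rates were computed. A minimal sketch, assuming the common definition (edit distance between the OCR output and the reference, divided by the reference length and expressed as a percentage), is given below. The function names are illustrative and are not taken from the authors' released code. The last two helpers show that the differences quoted in the abstract match the plain difference between the before and after rates, which is not the same as the change relative to the starting rate.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance as a percentage of the reference length (assumed definition)."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def cer(ref_text, hyp_text):
    """Character-level error rate: compare the strings character by character."""
    return error_rate(ref_text, hyp_text)


def wer(ref_text, hyp_text):
    """Word-level error rate: compare whitespace-separated tokens."""
    return error_rate(ref_text.split(), hyp_text.split())


def absolute_change(before, after):
    """Plain difference between two rates."""
    return after - before


def relative_change(before, after):
    """Difference as a percentage of the starting rate."""
    return 100.0 * (after - before) / before


# Tamil character-level error rate reported in the abstract: 6.03 -> 2.61
tamil_abs = absolute_change(6.03, 2.61)
tamil_rel = relative_change(6.03, 2.61)
```

For the reported Tamil character-level figures, `tamil_abs` rounds to -3.42, the value quoted in the abstract, while `tamil_rel` is a much larger drop of more than half the starting error rate.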

corpus, machine learning, natural language, (19 more...)

doi: 10.1109/IALP57159.2022.9961304

arXiv: 2109.05952

Country: Asia > Sri Lanka (0.47)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.56)

Towards Offensive Language Identification for Tamil Code-Mixed YouTube Comments and Posts

Vasantharajan, Charangan, Thayasivam, Uthayasanker

arXiv.org Artificial Intelligence, Aug-26-2021

Offensive language detection on social media platforms has been an active field of research in recent years. In countries where English is not the native language, social media users mostly write their posts and comments in a code-mixed form of text. This poses several challenges for offensive content identification, and the low resources available for Tamil make the task much harder. The current study presents extensive experiments using multiple deep learning and transfer learning models to detect offensive content on YouTube. We propose a novel and flexible approach of selective translation and transliteration to get better results from fine-tuning and ensembling multilingual transformer networks such as BERT, DistilBERT, and XLM-RoBERTa. The experimental results showed that ULMFiT is the best model for this task. The best-performing models for this Tamil code-mixed dataset were ULMFiT and mBERT-BiLSTM, rather than more popular transfer learning models such as DistilBERT and XLM-RoBERTa or hybrid deep learning models. The proposed models, ULMFiT and mBERT-BiLSTM, yielded good results and are promising for effective offensive speech identification in low-resource languages.

artificial intelligence, machine learning, proceedings, (13 more...)

doi: 10.1007/s42979-021-00977-y

arXiv: 2108.10939

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Data-Driven Simulation of Ride-Hailing Services using Imitation and Reinforcement Learning

Jayasinghe, Haritha, Jayatilaka, Tarindu, Gunawardena, Ravin, Thayasivam, Uthayasanker

arXiv.org Artificial Intelligence, Apr-6-2021

The rapid growth of ride-hailing platforms has created a highly competitive market where businesses struggle to make profits, creating a need for better operational strategies. However, real-world experiments are risky and expensive for these platforms, as they deal with millions of users daily. Thus, a need arises for a simulated environment where they can predict users' reactions to changes in platform-specific parameters such as trip fares and incentives. Building such a simulation is challenging, as these platforms exist within dynamic environments where thousands of users regularly interact with one another. This paper presents a framework to mimic and predict the behavior of users, specifically drivers, in ride-hailing services. We use a data-driven hybrid approach that combines reinforcement learning and imitation learning. First, the agent uses behavioral cloning to mimic driver behavior from a real-world data set. Next, reinforcement learning is applied on top of the pre-trained agents in a simulated environment to allow them to adapt to changes in the platform. Our framework provides an ideal playground for ride-hailing platforms to experiment with platform-specific parameters and predict drivers' behavioral patterns.
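Behavioral cloning, the first stage described above, treats imitation as supervised learning: a policy is fitted to recorded (state, action) pairs. The sketch below is a deliberately tiny tabular version and is not the authors' model. The state and action labels and the demonstration data are hypothetical, made up only for illustration; the abstract does not describe the actual features or the learner used.

```python
from collections import Counter, defaultdict


def fit_behavioral_cloning(demonstrations):
    """Fit a tabular policy: for each state, pick the action seen most often.

    `demonstrations` is an iterable of (state, action) pairs recorded from
    the behavior being imitated.
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: actions.most_common(1)[0][0]
            for state, actions in counts.items()}


# Hypothetical toy demonstrations (not real data): a driver's response
# to trip offers in two made-up situations.
toy_demonstrations = [
    ("high_demand", "accept"),
    ("high_demand", "accept"),
    ("high_demand", "reject"),
    ("low_fare", "reject"),
    ("low_fare", "reject"),
]

policy = fit_behavioral_cloning(toy_demonstrations)
```

In the framework the abstract describes, a cloned policy like this would serve only as the starting point; reinforcement learning in a simulated environment would then adjust it when platform parameters change.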

agent, artificial intelligence, ground transportation, (20 more...)

arXiv: 2104.02661

Genre: Research Report (0.82)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Adaptation of Multivariate Concept to Multi-Way Agglomerative Clustering for Hierarchical Aspect Aggregation

Malepathirana, Tamasha (University of Moratuwa) | Perera, Rashindrie (University of Moratuwa) | Abeysinghe, Yasasi (University of Moratuwa) | Albar, Yumna (University of Moratuwa) | Thayasivam, Uthayasanker (University of Moratuwa)

AAAI Conferences, May-15-2019

Hierarchical review aspect aggregation is an important challenge in review summarization. Currently, agglomerative clustering is widely used for hierarchical aspect aggregation. We identify an important but less studied issue in using agglomerative clustering for the aforementioned task. This paper proposes a novel approach to generate a multi-way hierarchy by adaptation of the multivariate concept. Furthermore, we propose a novel experimentation approach to evaluate the acceptability of the aspect relations obtained from the hierarchy generated.

artificial intelligence, data mining, hierarchy, (19 more...)

The Thirty-Second International Flairs Conference

Country:

North America > United States (0.14)
Asia (0.14)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Improved Convergence of Iterative Ontology Alignment using Block-Coordinate Descent

Thayasivam, Uthayasanker (University of Georgia) | Doshi, Prashant (University of Georgia)

AAAI Conferences, Jul-21-2012

A wealth of ontologies, many of which overlap in their scope, has made aligning ontologies an important problem for the semantic Web. Consequently, several algorithms now exist for automatically aligning ontologies, with mixed success. Crucial challenges for these algorithms involve scaling to large ontologies and, as applications of ontology alignment evolve, performing the alignment in a reasonable amount of time without compromising the quality of the alignment. One class of alignment algorithms is iterative and often consumes more time than others while delivering solutions of high quality. We present a novel and general approach for speeding up the multivariable optimization process used by these algorithms. Specifically, we use the technique of block-coordinate descent to potentially improve the speed of convergence of the iterative alignment techniques. We integrate this approach into three well-known alignment systems and show that the enhanced systems generate similar or improved alignments in significantly less time on a comprehensive testbed of ontology pairs. This represents an important step toward making alignment techniques computationally more feasible.
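Block-coordinate descent updates one group of variables at a time while holding the rest fixed, instead of updating all variables together. The sketch below illustrates the general technique on a small convex quadratic; it is not the ontology-alignment objective from the paper. The matrix `A`, vector `b`, the block partition, and the step size are all assumptions chosen for the example.

```python
def block_coordinate_descent(grad, x0, blocks, step, sweeps):
    """Minimize a function by gradient steps on one block of variables at a time.

    grad   -- function returning the full gradient (a list) at a point
    x0     -- starting point
    blocks -- list of index lists that partition the variables
    step   -- fixed step size used for every block update
    sweeps -- number of passes over all blocks
    """
    x = list(x0)
    for _ in range(sweeps):
        for block in blocks:
            g = grad(x)  # gradient at the current point
            for i in block:
                # only the variables in this block move; the others stay fixed
                x[i] -= step * g[i]
    return x


# Toy objective (an assumption for illustration): f(x) = 0.5 * x^T A x - b^T x,
# with A symmetric and strictly diagonally dominant, so f is strongly convex.
A = [
    [4.0, 1.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 1.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 3.0],
]
b = [1.0, 2.0, 0.0, -1.0]


def quad_grad(x):
    """Gradient of the toy quadratic: A x - b."""
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]


solution = block_coordinate_descent(
    quad_grad, x0=[0.0] * 4, blocks=[[0, 1], [2, 3]], step=0.2, sweeps=500
)
residual = max(abs(g) for g in quad_grad(solution))
```

The step size 0.2 is the reciprocal of the largest absolute row sum of `A`, which bounds its largest eigenvalue, so each block update is a safe descent step. After the sweeps, `residual` should be close to zero, meaning the gradient has essentially vanished at `solution`.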

alignment, artificial intelligence, ontology, (18 more...)

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country: North America > United States > Georgia > Clarke County > Athens (0.14)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
