mitra
Stemming -- The Evolution and Current State with a Focus on Bangla
Paul, Abhijit, Farin, Mashiat Amin, Abdullah, Sharif Md., Kabir, Ahmedul, Masud, Zarif, Rayana, Shebuti
Bangla, the seventh most widely spoken language worldwide with 300 million native speakers, faces digital under-representation due to limited resources and a lack of annotated datasets. Stemming, a critical preprocessing step in language analysis, is essential for low-resource, highly inflectional languages like Bangla, because it can reduce the complexity of algorithms and models by significantly reducing the number of words the algorithm needs to consider. This paper conducts a comprehensive survey of stemming approaches, emphasizing the importance of handling morphological variants effectively. While exploring the landscape of Bangla stemming, it becomes evident that there is a significant gap in the existing literature. The paper highlights the discontinuity from previous research and the scarcity of accessible implementations for replication. Furthermore, it critiques the evaluation methodologies, stressing the need for more relevant metrics. The paper acknowledges the challenges posed by Bangla's rich morphology and diverse dialects. To address these challenges, the paper suggests directions for Bangla stemmer development. It concludes by advocating for robust Bangla stemmers and continued research in the field to enhance language analysis and processing.
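The abstract's claim that stemming shrinks the number of distinct words a model must consider can be illustrated with a minimal sketch. This is not any stemmer surveyed in the paper: the suffix list, the `stem` function, and the `min_stem` threshold are hypothetical, and toy English suffixes stand in for Bangla inflections.

```python
# Toy suffix-stripping stemmer (illustrative assumption, not a surveyed method).
SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical English suffixes

def stem(word, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

words = ["walk", "walks", "walked", "walking", "talk", "talks", "talked"]
vocab = set(words)                 # distinct surface forms
stems = {stem(w) for w in words}   # distinct stems after stripping

print(len(vocab), "surface forms ->", len(stems), "stems")
```

In this toy vocabulary, seven surface forms collapse to two stems. A highly inflectional language has many more variants per stem, so the reduction matters more there.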
Civil Society in the Loop: Feedback-Driven Adaptation of (L)LM-Assisted Classification in an Open-Source Telegram Monitoring Tool
Pustet, Milena, Steffen, Elisabeth, Mihaljević, Helena, Stanjek, Grischa, Illies, Yannis
The role of civil society organizations (CSOs) in monitoring harmful online content is increasingly crucial, especially as platform providers reduce their investment in content moderation. AI tools can assist in detecting and monitoring harmful content at scale. However, few open-source tools offer seamless integration of AI models and social media monitoring infrastructures. Given their thematic expertise and contextual understanding of harmful content, CSOs should be active partners in co-developing technological tools, providing feedback, helping to improve models, and ensuring alignment with stakeholder needs and values, rather than passive 'consumers'. However, collaborations between the open-source community, academia, and civil society remain rare, and research on harmful content seldom translates into practical tools usable by civil society actors. This work in progress explores how CSOs can be meaningfully involved in an AI-assisted open-source tool for monitoring anti-democratic movements on Telegram, which we are currently developing in collaboration with CSO stakeholders.
Vertically rolling ball 'challenges our basic understanding of physics'
Gravity seems like a predictable, even mundane, aspect of existence. The physics dictating one of the universe's four fundamental forces is relatively straightforward to understand and calculate (most of the time, at least). Even so, the relationships between objects with mass and energy continue to surprise physical engineers. Take recent observations made by a team at the University of Waterloo, for example.
Why are 'driverless' cars still hitting things? Depends on how they 'see.'
Late last month, a Tesla owner shared shocking dashcam footage of his Model 3 appearing to collide with and drive through a deer at high speed. The car, which the driver says was engaged in Tesla's driver-assist Full Self-Driving (FSD) mode, never detected the deer standing in the middle of the road and didn't hit the brakes or maneuver to avoid it. That case came just a few months after a vehicle from Waymo, a leading self-driving company, reportedly ran over and killed a pet dog in a collision the company says was "unavoidable." Neither driverless car, according to reports detailing the incidents, spotted the animal on the road fast enough to avoid it.
Clustering Mixtures of Discrete Distributions: A Note on Mitra's Algorithm
Clustering is a critical challenge in network science, pivotal for detecting underlying patterns and structures in unlabeled data. To explore the boundaries of this challenge, stochastic block models (SBMs) have been effectively utilized as a mathematical framework to assess the performance of clustering algorithms. Specifically, an SBM is a statistical model developed to reveal the structural dynamics of networks or graphs, where nodes represent individual entities and edges symbolize the connections between them. In a typical SBM, nodes are categorized into blocks or communities according to their connectivity patterns, with the probability of an edge existing between any two nodes depending on the blocks to which they belong [3]. For example, in a social network using an SBM, nodes might be organized by attributes such as age, gender, or geographic location, with friendship probabilities determined by their block memberships [1, 6]. The Bipartite Stochastic Block Model (B-SBM) [2] extends the conventional SBM to accommodate networks comprising two distinct node types, forming a bipartite graph structure. This adaptation is particularly beneficial in contexts such as recommendation systems, where nodes represent users and products, or in particular social networks, where nodes might denote individuals and the groups or events they participate in. In B-SBMs, the connections between nodes from different sets are governed by an "affinity matrix" that specifies the likelihood of linkage based on group affiliations. This matrix is integral to capturing interaction patterns within the network, allowing for a sophisticated estimation of model parameters from observed connections.
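The generative side of a B-SBM described above can be sketched in a few lines. This only samples a bipartite graph from given block labels and an affinity matrix; it is not Mitra's clustering algorithm, and the function name, arguments, and example values are hypothetical.

```python
import random

def sample_bipartite_sbm(left_blocks, right_blocks, affinity, seed=0):
    """Sample a biadjacency matrix for a bipartite SBM (illustrative sketch).

    left_blocks[i]  : block label of left node i (e.g. a user group)
    right_blocks[j] : block label of right node j (e.g. a product group)
    affinity[a][b]  : probability of an edge between a left node in block a
                      and a right node in block b
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < affinity[a][b] else 0 for b in right_blocks]
        for a in left_blocks
    ]

# Hypothetical example: 4 users in 2 groups, 3 products in 2 groups.
users = [0, 0, 1, 1]
products = [0, 1, 1]
# Degenerate affinity matrix chosen so the sample is deterministic:
# a user links to a product only when their block labels match.
affinity = [[1.0, 0.0],
            [0.0, 1.0]]

graph = sample_bipartite_sbm(users, products, affinity)
for row in graph:
    print(row)
```

With affinities strictly between 0 and 1, each edge is an independent Bernoulli draw. A clustering algorithm then sees only `graph` and tries to recover the block labels.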
How to Guarantee the Safety of Autonomous Vehicles
The original version of this story appeared in Quanta Magazine. Driverless cars and planes are no longer the stuff of the future. In the city of San Francisco alone, two taxi companies have collectively logged 8 million miles of autonomous driving through August 2023. And more than 850,000 autonomous aerial vehicles, or drones, are registered in the United States--not counting those owned by the military. But there are legitimate concerns about safety.
AI, 23 new forensic standards in new CA curriculum - Telugu Bullet
The Institute of Chartered Accountants of India (ICAI) will introduce Artificial Intelligence and forensic science in its curriculum for Chartered Accountants to detect financial fraud at a much earlier stage. In most cases, fraud is detected only when it reaches a substantial volume. The new curriculum aims to track such irregularities at a much earlier stage so that big scams either do not happen or are detected at the initial stages. This is the first time the institute will bring such big technological changes to its international courses. President of ICAI, Debashish Mitra, said: "We are introducing artificial intelligence, data analytics and new forensic standards in the new curriculum. The mission of ICAI is to provide a strong foundation of knowledge, skill, and professional value that enables students to grow as wholesome professionals and adapt to change throughout their professional career."
Mitra
In this paper, we study Bayesian techniques for entity discovery and temporal segmentation of videos. Existing temporal video segmentation techniques are based on low-level features, and are usually suitable for discovering short, homogeneous shots rather than diverse scenes, each of which contains several such shots. We define scenes in terms of semantic entities (eg.
Easier And Faster Ways To Train AI
Mixing and matching. Some companies may use combinations of these techniques. MicroAI, for instance, does a combination of unsupervised, simple reinforcement, and incremental learning. Its application deals with monitoring data streams to look for anomalies in applications like security or preventive maintenance. To create the model, MicroAI starts with unsupervised learning, which clusters similar feature sets according to a provided schema. The reinforcement comes through a human in the loop who will label and reinforce certain patterns for future recognition.