 Indian Ocean


PiC: A Phrase-in-Context Dataset for Phrase Understanding and Semantic Search

arXiv.org Artificial Intelligence

While contextualized word embeddings have been a de-facto standard, learning contextualized phrase embeddings is less explored and is hindered by the lack of a human-annotated benchmark that tests machine understanding of phrase semantics given a context sentence or paragraph (instead of phrases alone). To fill this gap, we propose PiC, a dataset of ~28K noun phrases accompanied by their contextual Wikipedia pages, and a suite of three tasks for training and evaluating phrase embeddings. Training on PiC improves ranking models' accuracy and remarkably pushes span-selection (SS) models (i.e., predicting the start and end index of the target phrase) to near-human accuracy, which is 95% Exact Match (EM) on semantic search given a query phrase and a passage. Interestingly, we find evidence that such impressive performance is because the SS models learn to better capture the common meaning of a phrase regardless of its actual context. SotA models perform poorly in distinguishing two senses of the same phrase in two contexts (~60% EM) and in estimating the similarity between two different phrases in the same context (~70% EM).
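The Exact Match figures quoted above can be made concrete with a minimal sketch. It assumes that a prediction counts as correct only when both the start and end index equal the annotated ones; the function names `exact_match` and `em_score` and the toy spans are illustrative and do not come from the PiC release.

```python
def exact_match(pred_span, gold_span):
    """Return 1 if the predicted (start, end) pair equals the gold pair, else 0."""
    return int(tuple(pred_span) == tuple(gold_span))


def em_score(predictions, references):
    """Percentage of examples whose predicted span matches the gold span exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(exact_match(p, g) for p, g in zip(predictions, references))
    return 100.0 * hits / len(references)


# Toy (start, end) index pairs; the second prediction is off by one at the end.
preds = [(3, 5), (10, 12), (0, 1), (7, 9)]
golds = [(3, 5), (10, 13), (0, 1), (7, 9)]
score = em_score(preds, golds)  # 3 of 4 spans match exactly
```

Because the match is all-or-nothing, a span that is off by a single token scores the same as a completely wrong one, which is why EM is a strict metric.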


Iran Says It Thwarted a Drone Attack on a Munitions Facility

NYT > Middle East

But some Telegram channels, including that of Sepah Cyberi, which is affiliated with Iran's Revolutionary Guards Corps, accused Israel and its agents inside the country of being behind the attack and warned "experience has shown that Iran will retaliate." "Wait for rogue drones hitting Zionist oil tankers," its posting said. Iran and Israel have been engaged in a shadow war on land, sea, air and in cyberspace for the past three years, with Israel carrying out strikes on Iranian military and nuclear facilities and assassinating scientists and a senior military official. During the tenure of Prime Minister Naftali Bennett, Israel also started targeting Iranian defense and military officials and key infrastructure. Mr. Bennett called it the "octopus doctrine" of striking inside Iran to damage its capacity to arm proxy militias in the region hostile to the Jewish state.


Explainable deep learning for insights in El Niño and river flows

arXiv.org Artificial Intelligence

The El Niño Southern Oscillation (ENSO) is a semi-periodic fluctuation in sea surface temperature (SST) over the tropical central and eastern Pacific Ocean that influences interannual variability in regional hydrology across the world through long-range dependence or teleconnections. Recent research has demonstrated the value of Deep Learning (DL) methods for improving ENSO prediction as well as Complex Networks (CN) for understanding teleconnections. However, gaps in predictive understanding of ENSO-driven river flows include the black box nature of DL, the use of simple ENSO indices to describe a complex phenomenon and translating DL-based ENSO predictions to river flow predictions. Here we show that eXplainable DL (XDL) methods, based on saliency maps, can extract interpretable predictive information contained in global SST and discover SST information regions and dependence structures relevant for river flows which, in tandem with climate network constructions, enable improved predictive understanding. Our results reveal additional information content in global SST beyond ENSO indices, develop understanding of how SSTs influence river flows, and generate improved river flow prediction, including uncertainty estimation. Observations, reanalysis data, and earth system model simulations are used to demonstrate the value of the XDL-CN based methods for future interannual and decadal scale climate projections.


Earthquake Magnitude and b value prediction model using Extreme Learning Machine

arXiv.org Artificial Intelligence

Earthquake prediction, which aims to forecast the future occurrence of this highly uncertain calamity, has been a challenging research area for many decades. In this paper, several parametric and non-parametric features were calculated, where the non-parametric features were calculated using the parametric features. Eight seismic features were calculated using the Gutenberg-Richter law, the total recurrence, and the seismic energy release. Additionally, criteria such as Maximum Relevance and Maximum Redundancy were applied to choose the pertinent features. These features along with others were used as input for an Extreme Learning Machine (ELM) Regression Model. Magnitude and time data of five decades from the Assam-Guwahati region were used to create this model for magnitude prediction. The Testing Accuracy and Testing Speed were computed taking the Root Mean Squared Error (RMSE) as the parameter for evaluating the model. As confirmed by the results, ELM shows better scalability with much faster training and testing speed (up to a thousand times faster) than traditional Support Vector Machines. The testing RMSE came out to be around 0.097. To further test the model's robustness, magnitude-time data from California was used to calculate the seismic indicators, which were then fed into an ELM and then tested on the Assam-Guwahati region. The model proves to be robust and can be implemented in early warning systems, as earthquake prediction continues to be a major part of disaster response and management.
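A generic ELM regressor of the kind named above can be sketched in a few lines of NumPy. This is not the authors' model: the hidden-layer size, the tanh activation, the `ELMRegressor` class name and the synthetic eight-feature data are all assumptions made for illustration. The defining idea is that the hidden-layer weights are drawn at random and never trained, so fitting reduces to one least-squares solve for the output weights.

```python
import numpy as np


class ELMRegressor:
    """Single-hidden-layer network with random, fixed hidden weights."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random projection followed by a nonlinearity (tanh is an assumption).
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        n_features = X.shape[1]
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in for eight seismic features and a magnitude target.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 8))
y = X @ data_rng.normal(size=8) + 0.1 * data_rng.normal(size=200)

model = ELMRegressor(n_hidden=50).fit(X, y)
train_rmse = rmse(y, model.predict(X))
```

Since no iterative optimization is involved, training cost is dominated by a single pseudoinverse, which is consistent with the speed advantage the abstract reports over Support Vector Machines.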


Combining Self-labeling with Selective Sampling

arXiv.org Artificial Intelligence

Since data is the fuel that drives machine learning models, and access to labeled data is generally expensive, semi-supervised methods remain consistently popular. They enable the acquisition of large datasets without the need for too many expert labels. This work combines self-labeling techniques with active learning in a selective sampling scenario. We propose a new method that builds an ensemble classifier. Based on an evaluation of the inconsistency of the decisions of the individual base classifiers for a given observation, a decision is made on whether to request a new label or use self-labeling. In preliminary studies, we show that naive application of self-labeling can harm performance by introducing bias towards selected classes and consequently lead to skewed class distribution. Hence, we also propose mechanisms to reduce this phenomenon. Experimental evaluation shows that the proposed method matches current selective sampling methods or achieves better results.
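The decision rule described above, choosing between an expert query and self-labeling based on ensemble inconsistency, might look roughly like the following. The inconsistency measure (one minus the majority-vote share), the threshold value and the `decide` function name are assumptions for illustration; the abstract does not specify how the authors measure inconsistency.

```python
from collections import Counter


def decide(votes, threshold=0.3):
    """Choose between querying an expert and self-labeling.

    votes: class predictions from the base classifiers for one observation.
    Returns (action, label, disagreement); label is None when a query is needed.
    """
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    # Share of base classifiers that disagree with the majority class.
    disagreement = 1.0 - top / len(votes)
    if disagreement > threshold:
        return ("query", None, disagreement)
    return ("self_label", label, disagreement)


unanimous = decide([1, 1, 1, 1, 1])  # all base classifiers agree
split = decide([0, 1, 2, 1, 0])      # votes spread over three classes
```

A rule like this self-labels only where the ensemble is confident, which is also where the bias toward already well-represented classes mentioned in the abstract could arise.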


An Efficient Drifters Deployment Strategy to Evaluate Water Current Velocity Fields

arXiv.org Artificial Intelligence

Water current prediction is essential for understanding ecosystems and for shedding light on the role of the ocean in the global climate context. Solutions range from physical modeling and long-term observations to short-term measurements. In this paper, we consider a common approach that uses Lagrangian floaters for water current prediction by interpolating the trajectories of the floaters to reflect the velocity field. Here, an important aspect that has not been addressed before is where to initially deploy the drifting elements such that the acquired velocity field efficiently represents the water current. To that end, we use a clustering approach that relies on a physical model of the velocity field. Our method segments the modeled map and determines the deployment locations as those that will lead the floaters to 'visit' the center of the different segments. This way, we ensure that the area covered by the floaters captures the inhomogeneity of the velocity field. Exploration over a dataset of velocity field maps spanning a year demonstrates the applicability of our approach and shows a considerable improvement over the common approach of choosing the initial deployment sites uniformly at random. Finally, our implementation code can be found in [1].
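The segmentation step can be illustrated with a small, deterministic k-means over a modeled velocity field. This is only a sketch under stated assumptions: the choice of k-means, the (x, y, u, v) feature vector, the farthest-point initialization and the synthetic two-regime field are all illustrative and are not taken from the paper. It also stops at finding segment centers; tracing back to deployment sites whose trajectories visit those centers is not shown.

```python
import numpy as np


def farthest_point_init(F, k):
    """Deterministic init: start at the first point, then add the farthest ones."""
    centers = [F[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(F - c, axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(d)])
    return np.array(centers)


def kmeans(F, k, n_iter=50):
    """Plain k-means (Lloyd's algorithm) on the rows of F; returns labels and centers."""
    centers = farthest_point_init(F, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(F[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            F[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic modeled field: eastward flow on the left half, westward on the right.
xs, ys = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10))
u = np.where(xs < 0.5, 1.0, -1.0)
v = np.zeros_like(u)
features = np.column_stack([xs.ravel(), ys.ravel(), u.ravel(), v.ravel()])

labels, centers = kmeans(features, k=2)
segment_centers = centers[:, :2]  # (x, y) center of each segment
```

Including position in the feature vector keeps segments spatially compact; weighting position against velocity differently would change the segmentation and is a tunable design choice.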


Categorical Tools for Natural Language Processing

arXiv.org Artificial Intelligence

This thesis develops the translation between category theory and computational linguistics as a foundation for natural language processing. The three chapters deal with syntax, semantics and pragmatics. First, string diagrams provide a unified model of syntactic structures in formal grammars. Second, functors compute semantics by turning diagrams into logical, tensor, neural or quantum computation. Third, the resulting functorial models can be composed to form games where equilibria are the solutions of language processing tasks. This framework is implemented as part of DisCoPy, the Python library for computing with string diagrams. We describe the correspondence between categorical, linguistic and computational structures, and demonstrate their applications in compositional natural language processing.


Forty Years After 'Tron,' Storytellers Are Moving onto the Metaverse - Variety

#artificialintelligence

To create engagement, you have to have a story. In the metaverse, the creators will create the community, and the stories they tell will create the community, just like at the beginning of time. The king's storyteller kept people engaged, Shakespeare kept people engaged,


Bivariate Causal Discovery for Categorical Data via Classification with Optimal Label Permutation

arXiv.org Artificial Intelligence

Causal discovery for quantitative data has been extensively studied but less is known for categorical data. We propose a novel causal model for categorical data based on a new classification model, termed classification with optimal label permutation (COLP). By design, COLP is a parsimonious classifier, which gives rise to a provably identifiable causal model. A simple learning algorithm via comparing likelihood functions of causal and anti-causal models suffices to learn the causal direction. Through experiments with synthetic and real data, we demonstrate the favorable performance of the proposed COLP-based causal model compared to state-of-the-art methods. We also make available an accompanying R package COLP, which contains the proposed causal discovery algorithm and a benchmark dataset of categorical cause-effect pairs.


Tecno's Phantom X2 Pro phone has a pop-out portrait lens for 'pure' bokeh

Engadget

Many smartphones these days offer artificial bokeh in their portrait photography modes, but with the help of a retractable camera, you can achieve true optical bokeh without missing any edges. Chinese brand Tecno is now bringing such a feature to its latest flagship device, the Phantom X2 Pro 5G, which packs a "world-first" pop-out portrait lens. This comes just a little over two years after Xiaomi showed off a retractable 120mm-equivalent wide aperture lens, which never left the prototype stage. Tecno's intriguing portrait camera has a 50-megapixel resolution with a relatively large 1/2.7-inch sensor. Optically, this 2.5x zoom lens offers an f/1.49