AITopics | Azer, Erfan Sadeqi

ParsiNLU: A Suite of Language Understanding Challenges for Persian

Khashabi, Daniel, Cohan, Arman, Shakeri, Siamak, Hosseini, Pedram, Pezeshkpour, Pouya, Alikhani, Malihe, Aminnaseri, Moin, Bitaab, Marzieh, Brahman, Faeze, Ghazarian, Sarik, Gheini, Mozhdeh, Kabiri, Arman, Mahabadi, Rabeeh Karimi, Memarrast, Omid, Mosallanezhad, Ahmadreza, Noury, Erfan, Raji, Shahab, Rasooli, Mohammad Sadegh, Sadeghi, Sepideh, Azer, Erfan Sadeqi, Samghabadi, Niloofar Safi, Shafaei, Mahsa, Sheybani, Saber, Tazarv, Ali, Yaghoobzadeh, Yadollah

arXiv.org Artificial Intelligence Dec-11-2020

Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, most of this progress remains concentrated on resource-rich languages like English. This work focuses on the Persian language, one of the widely spoken languages in the world, for which few NLU datasets are available. The availability of high-quality evaluation datasets is a necessity for reliable assessment of progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in the Persian language that includes a range of high-level tasks (Reading Comprehension, Textual Entailment, etc.). These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. In addition, we present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.

dataset, machine translation, survey article, (17 more...)

arXiv.org Artificial Intelligence

2012.06154

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Maryland (0.28)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.93)
Education > Assessment & Standards > Student Performance (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)

A Practical Algorithm for Distributed Clustering and Outlier Detection

Chen, Jiecao, Azer, Erfan Sadeqi, Zhang, Qin

Neural Information Processing Systems Feb-14-2020, 10:15:21 GMT

We study the classic k-means/median clustering problems, which are fundamental in unsupervised learning, in the setting where data are partitioned across multiple sites, and where we are allowed to discard a small portion of the data by labeling them as outliers. We propose a simple approach based on constructing a small summary of the original dataset. The proposed method is time- and communication-efficient, has good approximation guarantees, and can identify the global outliers effectively. To the best of our knowledge, this is the first practical algorithm with theoretical guarantees for distributed clustering with outliers. Our experiments on both real and synthetic data demonstrate the clear superiority of our algorithm over all the baseline algorithms in almost all metrics.

artificial intelligence, data mining, machine learning, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.96)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.40)

On the Capabilities and Limitations of Reasoning for Natural Language Understanding

Khashabi, Daniel, Azer, Erfan Sadeqi, Khot, Tushar, Sabharwal, Ashish, Roth, Dan

arXiv.org Artificial Intelligence Jan-8-2019

Recent systems for natural language understanding are strong at overcoming linguistic variability for lookup-style reasoning. Yet, their accuracy drops dramatically as the number of reasoning steps increases. We present the first formal framework to study such empirical observations, addressing the ambiguity, redundancy, incompleteness, and inaccuracy that the use of language introduces when representing a hidden conceptual space. Our formal model uses two interrelated spaces: a conceptual meaning space that is unambiguous and complete but hidden, and a linguistic symbol space that captures a noisy grounding of the meaning space in the symbols or words of a language. We apply this framework to study the connectivity problem in undirected graphs, a core reasoning problem that forms the basis for more complex multi-hop reasoning. We show that it is indeed possible to construct a high-quality algorithm for detecting connectivity in the (latent) meaning graph, based on an observed noisy symbol graph, as long as the noise is below our quantified noise level and only a few hops are needed. On the other hand, we also prove an impossibility result: if a query requires a large number (specifically, logarithmic in the size of the meaning graph) of hops, no reasoning system operating over the symbol graph is likely to recover any useful property of the meaning graph. This highlights a fundamental barrier for a class of reasoning problems and systems, and suggests the need to limit the distance between the two spaces, rather than investing in multi-hop reasoning with "many" hops.

graph, neural network, us government, (21 more...)

arXiv.org Artificial Intelligence

1901.02522

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.82)

Industry: Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.86)
Information Technology > Artificial Intelligence > Natural Language > Understanding (0.60)

A Practical Algorithm for Distributed Clustering and Outlier Detection

Chen, Jiecao, Azer, Erfan Sadeqi, Zhang, Qin

Neural Information Processing Systems Dec-31-2018

We study the classic k-means/median clustering problems, which are fundamental in unsupervised learning, in the setting where data are partitioned across multiple sites, and where we are allowed to discard a small portion of the data by labeling them as outliers. We propose a simple approach based on constructing a small summary of the original dataset. The proposed method is time- and communication-efficient, has good approximation guarantees, and can identify the global outliers effectively. To the best of our knowledge, this is the first practical algorithm with theoretical guarantees for distributed clustering with outliers. Our experiments on both real and synthetic data demonstrate the clear superiority of our algorithm over all the baseline algorithms in almost all metrics.

artificial intelligence, data mining, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Indiana > Monroe County > Bloomington (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

A Practical Algorithm for Distributed Clustering and Outlier Detection

Chen, Jiecao, Azer, Erfan Sadeqi, Zhang, Qin

Neural Information Processing Systems Dec-31-2018

We study the classic k-means/median clustering problems, which are fundamental in unsupervised learning, in the setting where data are partitioned across multiple sites, and where we are allowed to discard a small portion of the data by labeling them as outliers. We propose a simple approach based on constructing a small summary of the original dataset. The proposed method is time- and communication-efficient, has good approximation guarantees, and can identify the global outliers effectively. To the best of our knowledge, this is the first practical algorithm with theoretical guarantees for distributed clustering with outliers. Our experiments on both real and synthetic data demonstrate the clear superiority of our algorithm over all the baseline algorithms in almost all metrics.

algorithm, artificial intelligence, health & medicine, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.40)

A Practical Algorithm for Distributed Clustering and Outlier Detection

Chen, Jiecao, Azer, Erfan Sadeqi, Zhang, Qin

arXiv.org Artificial Intelligence May-23-2018

We study the classic $k$-means/median clustering problems, which are fundamental in unsupervised learning, in the setting where data are partitioned across multiple sites, and where we are allowed to discard a small portion of the data by labeling them as outliers. We propose a simple approach based on constructing a small summary of the original dataset. The proposed method is time- and communication-efficient, has good approximation guarantees, and can identify the global outliers effectively. To the best of our knowledge, this is the first practical algorithm with theoretical guarantees for distributed clustering with outliers. Our experiments on both real and synthetic data demonstrate the clear superiority of our algorithm over all the baseline algorithms in almost all metrics.

algorithm, artificial intelligence, data mining, (18 more...)

arXiv.org Artificial Intelligence

1805.09495

Country: North America > United States > Indiana (0.15)

Genre: Research Report (0.82)

Technology: