Momeni, Elaheh
Speeding Up Question Answering Task of Language Models via Inverted Index
Ji, Xiang, Sungu-Eryilmaz, Yesim, Momeni, Elaheh, Rawassizadeh, Reza
Natural language processing applications, such as conversational agents and their question-answering capabilities, are widely used in the real world. Despite the wide popularity of large language models (LLMs), few real-world conversational agents take advantage of them, because the extensive resources LLMs consume prevent developers from integrating them into end-user applications. In this study, we leverage an inverted indexing mechanism combined with LLMs to improve the efficiency of question-answering models for closed-domain questions. Our experiments show that using the index improves the average response time by 97.44%. In addition, due to the reduced search scope, the average BLEU score improved by 0.23 when the inverted index was used.
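A minimal sketch of the underlying idea of narrowing the search scope with an inverted index before the (expensive) model is queried; the toy passages, whitespace tokenization, and function names below are illustrative assumptions, not the paper's implementation:

```python
from collections import defaultdict

# Build a toy inverted index: token -> set of passage ids.
def build_inverted_index(passages):
    index = defaultdict(set)
    for pid, text in enumerate(passages):
        for token in text.lower().split():
            index[token].add(pid)
    return index

# Keep only passages sharing tokens with the question, so the language
# model only has to answer over a much smaller set of candidates.
def candidate_passages(question, index, passages):
    ids = set()
    for token in question.lower().split():
        ids |= index.get(token, set())
    return [passages[i] for i in sorted(ids)]

passages = [
    "The warranty covers parts and labor for two years.",
    "Returns are accepted within 30 days of purchase.",
]
idx = build_inverted_index(passages)
print(candidate_passages("how long is the warranty", idx, passages))
```

With the reduced candidate set, only the retained passages would be passed on to the question-answering model, which is what shortens the response time.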
Modeling Evolution of Topics in Large-Scale Temporal Text Corpora
Momeni, Elaheh (University of Vienna) | Karunasekera, Shanika (University of Melbourne) | Goyal, Palash (University of Southern California) | Lerman, Kristina (University of Southern California)
Large temporal text collections provide insights into social and cultural change over time. To quantify how topics in these corpora change, embedding methods have been used as a diachronic tool; however, the stochastic nature of their training limits their utility for modeling topic change. We propose a new computational approach for tracking and detecting the temporal evolution of topics in a large collection of texts. The approach identifies dynamic topics and models their evolution by combining the advantages of two methods: (1) word embeddings, which learn contextual semantic representations of words from temporal snapshots of the data, and (2) dynamic network analysis, which identifies dynamic topics from dynamic semantic similarity networks built on the embedding models. Experimenting with two large temporal data sets from the legal and real estate domains, we show that this approach runs faster (because snapshots can be processed in parallel), uncovers more coherent topics than available dynamic topic modeling approaches, and effectively enables modeling of topic evolution by leveraging the network structure.
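A rough sketch of the two ingredients for a single temporal snapshot; the tiny corpus, similarity threshold, and the use of connected components as topic clusters are simplifying assumptions for illustration, not the paper's exact procedure:

```python
import itertools

import networkx as nx
from gensim.models import Word2Vec

# One temporal snapshot: a tiny tokenized corpus (illustrative only).
snapshot = [
    ["court", "ruling", "appeal", "judge"],
    ["property", "listing", "price", "mortgage"],
    ["judge", "appeal", "verdict"],
]

# (1) Learn word embeddings for this snapshot.
model = Word2Vec(snapshot, vector_size=50, min_count=1, seed=1)

# (2) Build a semantic similarity network: connect word pairs whose
# cosine similarity exceeds a threshold, then read topic clusters
# off the graph structure.
G = nx.Graph()
vocab = list(model.wv.index_to_key)
for w1, w2 in itertools.combinations(vocab, 2):
    if model.wv.similarity(w1, w2) > 0.2:
        G.add_edge(w1, w2)

topics = [sorted(component) for component in nx.connected_components(G)]
print(topics)
```

Repeating this per snapshot and linking the resulting networks over time is what allows the evolution of a topic to be traced, and the per-snapshot steps are independent, which is where the parallel speedup comes from.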
Properties, Prediction, and Prevalence of Useful User-Generated Comments for Descriptive Annotation of Social Media Objects
Momeni, Elaheh (University of Vienna) | Cardie, Claire (Cornell University) | Ott, Myle (Cornell University)
User-generated comments in online social media have recently been gaining increasing attention as a viable source of general-purpose descriptive annotations for digital objects like photos or videos. Because users have different levels of expertise, however, the quality of their comments can vary from very useful to entirely useless. Our aim is to provide automated support for the curation of useful user-generated comments from public collections of digital objects. After constructing a crowd-sourced gold standard of useful and not useful comments, we use standard machine learning methods to develop a usefulness classifier, exploring the impact of surface-level, syntactic, semantic, and topic-based features in addition to extra-linguistic attributes of the author and his or her social media activity. We then adapt an existing model of prevalence detection that uses the learned classifier to investigate patterns in the commenting culture of two popular social media platforms. We find that the prevalence of useful comments is platform-specific and is further influenced by the entity type of the media object being commented on (person, place, event), its time period (e.g., year of an event), and the degree of polarization among commenters.
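A bare-bones illustration of such a usefulness classifier using only surface-level lexical features; the toy comments, labels, and the TF-IDF plus logistic regression pipeline are stand-ins for the paper's crowd-sourced gold standard and its richer surface, syntactic, semantic, topic, and author features:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy gold standard: 1 = useful (descriptive), 0 = not useful.
comments = [
    "Taken at the Eiffel Tower during the 2012 summer festival.",
    "This shows the opening ceremony of the games in London.",
    "lol nice",
    "first!!!",
]
labels = [1, 1, 0, 0]

# Surface-level lexical features feeding a linear classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(comments, labels)

print(clf.predict(["Photo of the cathedral taken in 1998", "cool pic"]))
```

In the paper's setting, the predictions of such a classifier over a platform's comment stream are then fed into a prevalence-detection model to estimate how common useful comments are per platform and per entity type.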
Semantic Web-Based Integration of Heterogeneous Web Resources
Momeni, Elaheh (University of Vienna)
Vast volumes of information on public Web portals are readily accessible from virtually any computer in the world. This constitutes an enormous repository of information with significant business value for companies engaged in e-commerce. However, three main problems arise when using this information: (I) it is published in various non-machine-processable formats, (II) services that match and store information from different sources in a homogeneous structure are lacking, and (III) the accessible datasets are rarely provided with e-commerce concepts in mind. These problems make the information difficult for e-commerce applications to use. The main goal of this paper is to propose a methodology, and an analysis of the required components, for combining and integrating information from different Web data sources into a machine-processable dataset based on a suitable e-commerce ontology. To demonstrate the proposed methodology, the process of wrapping and matching data from two public datasets is discussed as an example.
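As a loose illustration of the wrapping step, the sketch below maps two hypothetical source records with different field names onto a single machine-processable RDF structure using rdflib; the vocabulary URIs, record fields, and property names are invented for the example and are not the ontology or datasets used in the paper:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

# Hypothetical vocabulary standing in for a suitable e-commerce ontology.
EC = Namespace("http://example.org/ecommerce#")
EX = Namespace("http://example.org/offer/")

# Two records describing similar offers, published in different source formats.
source_a = {"id": "a-42", "title": "USB-C cable", "price_eur": "7.99"}
source_b = {"sku": "b-17", "name": "USB-C cable 1m", "cost": "6.50"}

def wrap(record, id_key, name_key, price_key, graph):
    # Map one source-specific record onto the common, machine-processable schema.
    offer = EX[record[id_key]]
    graph.add((offer, RDF.type, EC.Offer))
    graph.add((offer, EC.name, Literal(record[name_key])))
    graph.add((offer, EC.price, Literal(record[price_key], datatype=XSD.decimal)))

g = Graph()
wrap(source_a, "id", "title", "price_eur", g)
wrap(source_b, "sku", "name", "cost", g)
print(g.serialize(format="turtle"))
```

Once both sources are expressed against the same vocabulary, matching and querying across them can be done uniformly, which is the integration goal the paper describes.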