TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations at Twitter
Zhang, Xinyang, Malkov, Yury, Florez, Omar, Park, Serim, McWilliams, Brian, Han, Jiawei, El-Kishky, Ahmed
Pre-trained language models (PLMs) are fundamental for natural language processing applications. Most existing PLMs are not tailored to the noisy user-generated text on social media, and their pre-training does not factor in the valuable social engagement logs available in a social network. We present TwHIN-BERT, a multilingual language model productionized at Twitter, trained on in-domain data from the popular social network. TwHIN-BERT differs from prior pre-trained language models as it is trained not only with text-based self-supervision, but also with a social objective based on the rich social engagements within a Twitter heterogeneous information network (TwHIN). Our model is trained on 7 billion tweets covering over 100 distinct languages, providing valuable representations for modeling short, noisy, user-generated text. We evaluate our model on various multilingual social recommendation and semantic understanding tasks and demonstrate significant metric improvements over established pre-trained language models. We open-source TwHIN-BERT and our curated hashtag prediction and social engagement benchmark datasets to the research community.
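Since the abstract notes the model is open-sourced, here is a minimal sketch of how one might embed tweets with it via the Hugging Face transformers library. The checkpoint name "Twitter/twhin-bert-base" and the mean-pooling step are assumptions for illustration, not usage prescribed by the paper; check the released artifacts for the actual identifier.

```python
# Minimal sketch: embedding tweets with an open-sourced TwHIN-BERT checkpoint.
# The checkpoint name "Twitter/twhin-bert-base" is an assumption; verify the
# actual identifier on the Hugging Face Hub before use.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Twitter/twhin-bert-base")
model = AutoModel.from_pretrained("Twitter/twhin-bert-base")

tweets = ["just landed in tokyo!! #travel", "gm frens"]
batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Mean-pool token embeddings (masking out padding) to get one vector per tweet;
# mean pooling is one common convention, not necessarily the paper's.
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
print(embeddings.shape)  # (2, hidden_size)
```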
Non-Parametric Temporal Adaptation for Social Media Topic Classification
Mireshghallah, Fatemehsadat, Vogler, Nikolai, He, Junxian, Florez, Omar, El-Kishky, Ahmed, Berg-Kirkpatrick, Taylor
User-generated social media data is constantly changing as new trends influence online discussion and personal information is deleted due to privacy concerns. However, most current NLP models are static and rely on fixed training data, so they are unable to adapt to temporal change (both test distribution shift and deleted training data) without frequent, costly re-training. In this paper, we study temporal adaptation through the task of longitudinal hashtag prediction and propose a non-parametric dense retrieval technique, which does not require re-training, as a simple but effective solution. In experiments on a newly collected, publicly available, year-long Twitter dataset exhibiting temporal distribution shift, our method improves by 64.12% over the best parametric baseline without any of its costly gradient-based updating. Our dense retrieval approach is also particularly well-suited to handling dynamic deletion of user data in line with data privacy laws, with negligible computational cost and performance loss.
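To make the retrieval idea concrete, here is a minimal sketch of non-parametric hashtag prediction by k-nearest-neighbor lookup over a datastore of tweet embeddings. The embed function is a hypothetical stand-in for any frozen tweet encoder, and the whole snippet illustrates the general technique rather than the paper's exact pipeline. Because prediction only reads the datastore, adapting to new trends means appending rows and honoring deletion requests means dropping them; no gradient updates are involved.

```python
# Minimal sketch of non-parametric dense retrieval for hashtag prediction:
# the model weights never change, so temporal adaptation and data deletion
# only mutate the datastore.
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)

def embed(texts):
    # Hypothetical stand-in for a frozen encoder; replace with real embeddings.
    return rng.normal(size=(len(texts), 128))

# Datastore: tweet embeddings paired with their hashtags.
corpus = ["tweet about the game", "new phone drop", "match highlights"]
labels = [["#sports"], ["#tech"], ["#sports"]]
keys = embed(corpus)
keys /= np.linalg.norm(keys, axis=1, keepdims=True)

def predict_hashtags(query, k=2):
    q = embed([query])[0]
    q /= np.linalg.norm(q)
    sims = keys @ q                  # cosine similarity to all stored tweets
    nearest = np.argsort(-sims)[:k]  # indices of the top-k neighbors
    votes = Counter(tag for i in nearest for tag in labels[i])
    return votes.most_common(1)[0][0]

print(predict_hashtags("who won the match?"))

# Deletion under privacy law: drop row idx from the datastore, no re-training.
# keys = np.delete(keys, idx, axis=0); labels.pop(idx)
```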