Goto

Collaborating Authors

 serp


Digital Gatekeepers: Google's Role in Curating Hashtags and Subreddits

arXiv.org Artificial Intelligence

Search engines play a crucial role as digital gatekeepers, shaping the visibility of Web and social media content through algorithmic curation. This study investigates how search engines like Google selectively promotes or suppresses certain hashtags and subreddits, impacting the information users encounter. By comparing search engine results with nonsampled data from Reddit and Twitter/X, we reveal systematic biases in content visibility. Google's algorithms tend to suppress subreddits and hashtags related to sexually explicit material, conspiracy theories, advertisements, and cryptocurrencies, while promoting content associated with higher engagement. These findings suggest that Google's gatekeeping practices influence public discourse by curating the social media narratives available to users.


Algorithmic Behaviors Across Regions: A Geolocation Audit of YouTube Search for COVID-19 Misinformation between the United States and South Africa

arXiv.org Artificial Intelligence

Despite being an integral tool for finding health-related information online, YouTube has faced criticism for disseminating COVID-19 misinformation globally to its users. Yet, prior audit studies have predominantly investigated YouTube within the Global North contexts, often overlooking the Global South. To address this gap, we conducted a comprehensive 10-day geolocation-based audit on YouTube to compare the prevalence of COVID-19 misinformation in search results between the United States (US) and South Africa (SA), the countries heavily affected by the pandemic in the Global North and the Global South, respectively. For each country, we selected 3 geolocations and placed sock-puppets, or bots emulating "real" users, that collected search results for 48 search queries sorted by 4 search filters for 10 days, yielding a dataset of 915K results. We found that 31.55% of the top-10 search results contained COVID-19 misinformation. Among the top-10 search results, bots in SA faced significantly more misinformative search results than their US counterparts. Overall, our study highlights the contrasting algorithmic behaviors of YouTube search between two countries, underscoring the need for the platform to regulate algorithmic behavior consistently across different regions of the Globe.


LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation

arXiv.org Artificial Intelligence

The emergence of deep learning techniques has advanced the image segmentation task, especially for medical images. Many neural network models have been introduced in the last decade bringing the automated segmentation accuracy close to manual segmentation. However, cutting-edge models like Transformer-based architectures rely on large scale annotated training data, and are generally designed with densely consecutive layers in the encoder, decoder, and skip connections resulting in large number of parameters. Additionally, for better performance, they often be pretrained on a larger data, thus requiring large memory size and increasing resource expenses. In this study, we propose a new lightweight but efficient model, namely LiteNeXt, based on convolutions and mixing modules with simplified decoder, for medical image segmentation. The model is trained from scratch with small amount of parameters (0.71M) and Giga Floating Point Operations Per Second (0.42). To handle boundary fuzzy as well as occlusion or clutter in objects especially in medical image regions, we propose the Marginal Weight Loss that can help effectively determine the marginal boundary between object and background. Furthermore, we propose the Self-embedding Representation Parallel technique, that can help augment the data in a self-learning manner. Experiments on public datasets including Data Science Bowls, GlaS, ISIC2018, PH2, and Sunnybrook data show promising results compared to other state-of-the-art CNN-based and Transformer-based architectures. Our code will be published at: https://github.com/tranngocduvnvp/LiteNeXt.


Enhanced Facet Generation with LLM Editing

arXiv.org Artificial Intelligence

In information retrieval, facet identification of a user query is an important task. If a search service can recognize the facets of a user's query, it has the potential to offer users a much broader range of search results. Previous studies can enhance facet prediction by leveraging retrieved documents and related queries obtained through a search engine. However, there are challenges in extending it to other applications when a search engine operates as part of the model. First, search engines are constantly updated. Therefore, additional information may change during training and test, which may reduce performance. The second challenge is that public search engines cannot search for internal documents. Therefore, a separate search system needs to be built to incorporate documents from private domains within the company. We propose two strategies that focus on a framework that can predict facets by taking only queries as input without a search engine. The first strategy is multi-task learning to predict SERP. By leveraging SERP as a target instead of a source, the proposed model deeply understands queries without relying on external modules. The second strategy is to enhance the facets by combining Large Language Model (LLM) and the small model. Overall performance improves when small model and LLM are combined rather than facet generation individually.


Navigating the Post-API Dilemma Search Engine Results Pages Present a Biased View of Social Media Data

arXiv.org Artificial Intelligence

Recent decisions to discontinue access to social media APIs are having detrimental effects on Internet research and the field of computational social science as a whole. This lack of access to data has been dubbed the Post-API era of Internet research. Fortunately, popular search engines have the means to crawl, capture, and surface social media data on their Search Engine Results Pages (SERP) if provided the proper search query, and may provide a solution to this dilemma. In the present work we ask: does SERP provide a complete and unbiased sample of social media data? Is SERP a viable alternative to direct API-access? To answer these questions, we perform a comparative analysis between (Google) SERP results and nonsampled data from Reddit and Twitter/X. We find that SERP results are highly biased in favor of popular posts; against political, pornographic, and vulgar posts; are more positive in their sentiment; and have large topical gaps. Overall, we conclude that SERP is not a viable alternative to social media API access.


Evaluating Generative Ad Hoc Information Retrieval

arXiv.org Artificial Intelligence

Recent advances in large language models have enabled the development of viable generative information retrieval systems. A generative retrieval system returns a grounded generated text in response to an information need instead of the traditional document ranking. Quantifying the utility of these types of responses is essential for evaluating generative retrieval systems. As the established evaluation methodology for ranking-based ad hoc retrieval may seem unsuitable for generative retrieval, new approaches for reliable, repeatable, and reproducible experimentation are required. In this paper, we survey the relevant information retrieval and natural language processing literature, identify search tasks and system architectures in generative retrieval, develop a corresponding user model, and study its operationalization. This theoretical analysis provides a foundation and new insights for the evaluation of generative ad hoc retrieval systems.


6 Ways Search Marketers Can Leverage ChatGPT- AI for SEO Today - Great Learning

#artificialintelligence

Search Engine Optimization (SEO) is a critical aspect of digital marketing, and the use of Artificial Intelligence (AI) technologies like ChatGPT can help search marketers improve their SEO efforts. ChatGPT, a large language model developed by OpenAI, can be leveraged by search marketers in various ways to enhance their SEO strategies. Content creation: ChatGPT can be used to generate high-quality, unique, and relevant content for a website or blog. By providing ChatGPT with a topic or keyword, it can generate an article or blog post that is optimized for SEO. This can save search marketers significant time and effort in creating and optimizing content.


Quantifying Political Bias in News Articles

arXiv.org Artificial Intelligence

Search bias analysis is getting more attention in recent years since search results could affect In this work, we aim to establish an automated model for evaluating ideological bias in online news articles. The dataset is composed of news articles in search results as well as the newspaper articles. The current automated model results show that model capability is not sufficient to be exploited for annotating the documents automatically, thereby computing bias in search results.


What is a search engine result page?

#artificialintelligence

With the development of the internet, the world is rapidly going towards digitalization. Search engine optimization (SEO) has become an important digital skill in the present day. Our website Digital Skills PK is continuing to share articles about basic knowledge of digital skills. Beginners can benefit from our articles. In this article, we will talk about the Google search engine results page(SERPs).


Is SEO Dead? A Fresh Look At The Age-Old Search Industry Question

#artificialintelligence

The march of time is inevitable. Whether the horse and buggy are replaced by the automobile or the slide rule is replaced by the calculator, everything eventually becomes obsolete. And if you listen to the rumors, this time around, it's search engine optimization. Rest in peace, SEO: 1997-2022. SEO is still alive and kicking. It's just as relevant today as it has ever been.