 autocomplete


Synthetic Prefixes to Mitigate Bias in Real-Time Neural Query Autocomplete

Rajan, Adithya, Liu, Xiaoyu, Verma, Prateek, Arora, Vibhu

arXiv.org Artificial Intelligence

We introduce a data-centric approach for mitigating presentation bias in real-time neural query autocomplete systems through the use of synthetic prefixes. These prefixes are generated from complete user queries collected during regular search sessions where autocomplete was not active. This allows us to enrich the training data for learning-to-rank models with more diverse and less biased examples. This method addresses the inherent bias in engagement signals collected from live query autocomplete interactions, where model suggestions influence user behavior. Our neural ranker is optimized for real-time deployment under strict latency constraints and incorporates a rich set of features, including query popularity, seasonality, fuzzy match scores, and contextual signals such as department affinity, device type, and vertical alignment with previous user queries. To support efficient training, we introduce a task-specific simplification of the listwise loss, reducing computational complexity from $O(n^2)$ to $O(n)$ by leveraging the query autocomplete structure of having only one ground-truth selection per prefix. Deployed in a large-scale e-commerce setting, our system demonstrates statistically significant improvements in user engagement, as measured by mean reciprocal rank and related metrics. Our findings show that synthetic prefixes not only improve generalization but also provide a scalable path toward bias mitigation in other low-latency ranking tasks, including related searches and query recommendations.
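
To make the two ideas in this abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact prefix-generation rule or the form of the simplified loss. The function names `synthetic_prefixes` and `one_positive_listwise_loss`, the `min_len` parameter, and the choice of a softmax cross-entropy loss are all assumptions made for illustration. The sketch only shows that, when exactly one candidate per prefix is the ground truth, a softmax-style listwise loss can be computed in a single pass over the n candidate scores.

```python
import math
from typing import List, Tuple


def synthetic_prefixes(query: str, min_len: int = 1) -> List[Tuple[str, str]]:
    """Build (prefix, target_query) training pairs from one complete query.

    Assumption: every proper prefix of at least `min_len` characters is used.
    The abstract does not say how prefixes are actually sampled.
    """
    query = query.strip()
    return [(query[:i], query) for i in range(min_len, len(query))]


def one_positive_listwise_loss(scores: List[float], positive_index: int) -> float:
    """Softmax cross-entropy over candidate scores with a single positive.

    With a one-hot target the loss reduces to
        logsumexp(scores) - scores[positive_index],
    which needs one pass over the n scores (O(n)).
    Assumption: this stands in for the paper's simplified listwise loss.
    """
    m = max(scores)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[positive_index]


if __name__ == "__main__":
    # A complete query typed without autocomplete yields several prefix examples.
    for prefix, target in synthetic_prefixes("shoes", min_len=2):
        print(repr(prefix), "->", repr(target))

    # Hypothetical ranker scores for four candidates; candidate 0 was selected.
    print(one_positive_listwise_loss([2.0, 0.5, 0.1, -1.0], positive_index=0))
```

In this sketch each synthetic pair would be scored against a candidate list by the ranker, with the original complete query treated as the single selected item.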


Mark Zuckerberg 'predicts' AI will write most of Meta's code within 12 to 18 months

Engadget

Mark Zuckerberg says he believes most of Meta's code will be written by AI agents sometime within the next year-and-a-half. Zuckerberg made the prediction during an hour-long interview with podcaster Dwarkesh Patel. "I would guess sometime in the next 12 to 18 months, we'll reach the point where most of the code that's going towards these efforts is written by AI," said Zuckerberg, referring to the company's efforts to build internal AI agents. "And I don't mean like autocomplete... I'm talking more like you give it a goal, it can run tests, it can improve things, it can find issues, it writes higher quality code than the average very good person on the team already." It won't just be autocomplete.


The Morning After: Google dismisses Elon Musk's claim that autocomplete interfered in the election

Engadget

Google has responded to allegations it "censored" searches about Donald Trump after Elon Musk baselessly claimed the company had imposed a "search ban" on the former president. Google explained that bugs in its autocomplete feature caused the issues. But Musk's tweet, viewed more than 118 million times, has forced the search giant to publicly explain one of its most basic features. Google added that the strange suggestions for "president donald" were due to a "bug that spanned the political spectrum." It also affected searches related to former President Barack Obama and other political figures.


Why are conservatives claiming Google is covering up the shooting of Trump?

Al Jazeera

Google has come under fire from conservatives in the United States amid claims that the tech giant is suppressing information about the attempted assassination of Donald Trump in order to influence the presidential election. Trump, who is running for a second term in the White House on the Republican Party ticket, narrowly escaped being killed when a lone gunman opened fire at a campaign rally in Pennsylvania on July 13. The attack, which killed one rally attendee, injured two others, and bloodied the former president's ear, has spawned a number of unsubstantiated claims and conspiracy theories. The latest revolves around Google Search's autocomplete feature, which is designed to help users save time by predicting their search query based on the opening letters or words that are inputted. Over the weekend, some internet users noticed that writing about assassination attempts in the Google search bar did not automatically prompt search queries about the shooting of Trump.


AI chatbots are intruding into online communities where people are trying to connect with other humans

AIHub

A parent asked a question in a private Facebook group in April 2024: Does anyone with a child who is both gifted and disabled have any experience with New York City public schools? The parent received a seemingly helpful answer that laid out some characteristics of a specific school, beginning with the context that "I have a child who is also 2e," meaning twice exceptional. On a Facebook group for swapping unwanted items near Boston, a user looking for specific items received an offer of a "gently used" Canon camera and an "almost-new portable air conditioning unit that I never ended up using." Both of these responses were lies. That child does not exist and neither do the camera or air conditioner.


How the quest to type Chinese on a QWERTY keyboard created autocomplete

MIT Technology Review

These 44 keystrokes marked the first steps in a process known as "input" or shuru: the act of getting Chinese characters to appear on a computer monitor or other digital device using a QWERTY keyboard or trackpad. Across all computational and digital media, Chinese text entry relies on software programs known as "input method editors"--better known as "IMEs" or simply "input methods" (shurufa). IMEs are a form of "middleware," so named because they operate in between the hardware of the user's device and the software of its program or application. Whether a person is composing a Chinese document in Microsoft Word, searching the web, sending text messages, or otherwise, an IME is always at work, intercepting all of the user's keystrokes and trying to figure out which Chinese characters the user wants to produce. Input, simply put, is the way ymiw2klt4pwyy … becomes a string of Chinese characters.


CAMRA: Copilot for AMR Annotation

Cai, Jon Z., Ahmed, Shafiuddin Rehan, Bonn, Julia, Wright-Bettner, Kristin, Palmer, Martha, Martin, James H.

arXiv.org Artificial Intelligence

In this paper, we introduce CAMRA (Copilot for AMR Annotation), a cutting-edge web-based tool designed for constructing Abstract Meaning Representation (AMR) from natural language text. CAMRA offers a novel approach to deep lexical semantics annotation such as AMR, treating AMR annotation akin to coding in programming languages. Leveraging the familiarity of programming paradigms, CAMRA encompasses all essential features of existing AMR editors, including example lookup, while going a step further by integrating Propbank roleset lookup as an autocomplete feature within the tool. Notably, CAMRA incorporates AMR parser models as coding co-pilots, greatly enhancing the efficiency and accuracy of AMR annotators. To demonstrate the tool's capabilities, we provide a live demo accessible at: https://camra.colorado.edu


XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

Ruder, Sebastian, Clark, Jonathan H., Gutkin, Alexander, Kale, Mihir, Ma, Min, Nicosia, Massimo, Rijhwani, Shruti, Riley, Parker, Sarr, Jean-Michel A., Wang, Xinyi, Wieting, John, Gupta, Nitish, Katanova, Anna, Kirov, Christo, Dickinson, Dana L., Roark, Brian, Samanta, Bidisha, Tao, Connie, Adelani, David I., Axelrod, Vera, Caswell, Isaac, Cherry, Colin, Garrette, Dan, Ingle, Reeve, Johnson, Melvin, Panteleev, Dmitry, Talukdar, Partha

arXiv.org Artificial Intelligence

Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP research is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot; its focus on user-centric tasks -- tasks with broad adoption by speakers of high-resource languages; and its focus on under-represented languages where this scarce-data scenario tends to be most realistic. XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies including ASR, OCR, MT, and information access tasks that are of general utility. We create new datasets for OCR, autocomplete, semantic parsing, and transliteration, and build on and refine existing datasets for other tasks. XTREME-UP provides methodology for evaluating many modeling scenarios including text-only, multi-modal (vision, audio, and text), supervised parameter tuning, and in-context learning. We evaluate commonly used models on the benchmark. We release all code and scripts to train and evaluate models.


AI Chatbots Are Doing Something a Lot Like Improv

TIME - Tech

For weeks after his bizarre conversation with Bing's new chatbot went viral, New York Times columnist Kevin Roose wasn't sure what had happened. "The explanations you get for how these language models work, they're not that satisfying," Roose said at one point. "No one can tell me why this chatbot tried to break up my marriage." He's not alone in feeling confused. Powered by a relatively new form of AI called large language models, this new generation of chatbots defies our intuitions about how to interact with computers.


How AI Could Transform Email

WIRED

What if your inbox were jam-packed with AI-generated emails? You may already be on the receiving end of emails written by artificial intelligence, with the help of a human prompter. Austin Distel, a senior director of marketing at Jasper, is one of those humans. Austin smiles as he demonstrates Jasper's knack for email composition. "These are tools in my tool belt that helped me perform faster, but also better," he says before sharing that he often uses generative AI to rewrite work emails so they sound like Jerry Seinfeld.