Ingle, Reeve
OCR Language Models with Custom Vocabularies
Garst, Peter, Ingle, Reeve, Fujii, Yasuhisa
Language models are useful adjuncts to optical models for producing accurate optical character recognition (OCR) results. One factor that limits the power of language models in this context is the existence of many specialized domains with language statistics very different from those implied by a general language model: think of checks, medical prescriptions, and many other specialized document classes. This paper introduces an algorithm for efficiently generating and attaching a domain-specific, word-based language model at run time to a general language model in an OCR system. To make the best use of this model, the paper also introduces a modified CTC beam search decoder that effectively allows hypotheses to remain in contention based on possible future completion of vocabulary words. The result is a substantial reduction in word error rate when recognizing material from specialized domains.
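The abstract does not spell out the decoder's mechanics, but the core idea of keeping hypotheses alive based on possible future word completions can be sketched with a character trie over the domain vocabulary. The following is a minimal, hypothetical illustration (all function names are invented, and CTC blank/repeat handling is simplified to a best-path approximation), not the paper's actual implementation:

```python
import math

def build_trie(words):
    """Character trie over the domain vocabulary; '$' marks end-of-word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def is_valid_prefix(trie, prefix):
    """True if some vocabulary word starts with `prefix` (empty prefix counts)."""
    node = trie
    for ch in prefix:
        if ch not in node:
            return False
        node = node[ch]
    return True

def ctc_beam_search(frame_log_probs, symbols, trie, beam_width=4):
    """Simplified CTC beam search with vocabulary lookahead.

    symbols[0] is the CTC blank. Each beam is keyed by (emitted text,
    last label index) so that repeated labels collapse and a blank resets
    the repeat state, as in standard CTC decoding (competing paths to the
    same key keep only the max score). A hypothesis is pruned as soon as
    its trailing partial word can no longer complete to a vocabulary word.
    """
    beams = {("", 0): 0.0}
    for frame in frame_log_probs:
        cand = {}
        for (text, last), lp in beams.items():
            for idx, p in enumerate(frame):
                score = lp + p
                if idx == 0 or idx == last:
                    key = (text, idx)        # blank or collapsed repeat
                else:
                    new_text = text + symbols[idx]
                    # Lookahead: drop hypotheses whose current partial
                    # word cannot grow into any vocabulary word.
                    if not is_valid_prefix(trie, new_text.split(" ")[-1]):
                        continue
                    key = (new_text, idx)
                if key not in cand or score > cand[key]:
                    cand[key] = score
        beams = dict(sorted(cand.items(), key=lambda kv: -kv[1])[:beam_width])
    return max(beams.items(), key=lambda kv: kv[1])[0][0]

# Tiny demo: 'x' is the likeliest first label, but it starts no vocabulary
# word, so that hypothesis is pruned and the decoder recovers "cat".
symbols = ["", "c", "a", "t", "x"]   # index 0 = blank
trie = build_trie(["cat", "cot"])
logp = lambda ps: [math.log(p) for p in ps]
frames = [logp([0.05, 0.40, 0.05, 0.05, 0.45]),
          logp([0.05, 0.05, 0.80, 0.05, 0.05]),
          logp([0.05, 0.05, 0.05, 0.80, 0.05])]
decoded = ctc_beam_search(frames, symbols, trie)   # → "cat"
```

This captures only the lookahead pruning; the paper's contribution also covers how the domain word model is generated and attached to the general language model at run time, which this sketch does not attempt to reproduce.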
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages
Ruder, Sebastian, Clark, Jonathan H., Gutkin, Alexander, Kale, Mihir, Ma, Min, Nicosia, Massimo, Rijhwani, Shruti, Riley, Parker, Sarr, Jean-Michel A., Wang, Xinyi, Wieting, John, Gupta, Nitish, Katanova, Anna, Kirov, Christo, Dickinson, Dana L., Roark, Brian, Samanta, Bidisha, Tao, Connie, Adelani, David I., Axelrod, Vera, Caswell, Isaac, Cherry, Colin, Garrette, Dan, Ingle, Reeve, Johnson, Melvin, Panteleev, Dmitry, Talukdar, Partha
Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP research is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot; its focus on user-centric tasks -- tasks with broad adoption by speakers of high-resource languages; and its focus on under-represented languages where this scarce-data scenario tends to be most realistic. XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies including ASR, OCR, MT, and information access tasks that are of general utility. We create new datasets for OCR, autocomplete, semantic parsing, and transliteration, and build on and refine existing datasets for other tasks. XTREME-UP provides methodology for evaluating many modeling scenarios including text-only, multi-modal (vision, audio, and text), supervised parameter tuning, and in-context learning. We evaluate commonly used models on the benchmark. We release all code and scripts to train and evaluate models.