AITopics | okapi

0918183ced31affb7ce0345e45ac1943-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-24-2026, 11:08:30 GMT

We evaluate Okapi using three datasets (iWildCam, PovertyMap, and CivilComments) taken from the WILDS 2.0 benchmark [63]. These datasets were chosen specifically because [63] reports poor performance for semi-supervised and domain adaptation methods across the board, relative to the ERM baselines. For PovertyMap in particular, ERM was found to vastly outperform any competing methods utilising the unlabelled data and/or domain labels. For iWildCam, the task is multiclass species classification of animals in camera-trap images. The dataset contains 1022K images of animals, each annotated with the domain, s, that identifies the camera trap that captured it.

artificial intelligence, encoder, machine learning, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

0918183ced31affb7ce0345e45ac1943-Paper-Conference.pdf

Neural Information Processing Systems | Apr-24-2026, 11:08:27 GMT

artificial intelligence, machine learning, natural language, (16 more...)

Country: Asia (0.28)

Genre: Research Report (0.94)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

0918183ced31affb7ce0345e45ac1943-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-7-2026, 08:56:28 GMT

dataset, encoder, okapi, (14 more...)

Country: Africa (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

0918183ced31affb7ce0345e45ac1943-Paper-Conference.pdf

Neural Information Processing Systems | Feb-7-2026, 08:56:25 GMT

dataset, international conference, learning, (13 more...)

Country:

Africa (0.04)
Europe > France (0.04)
Asia > Nepal (0.04)
Asia > Indonesia (0.04)

Genre: Research Report (0.94)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Okapi: Generalising Better by Making Statistical Matches Match

Neural Information Processing Systems | Dec-23-2025, 17:52:17 GMT

We propose Okapi, a simple, efficient, and general method for robust semi-supervised learning based on online statistical matching. Our method uses a nearest-neighbours-based matching procedure to generate cross-domain views for a consistency loss, while eliminating statistical outliers. In order to perform the online matching in a runtime- and memory-efficient way, we draw upon the self-supervised literature and combine a memory bank with a slow-moving momentum encoder. The consistency loss is applied within the feature space, rather than on the predictive distribution, making the method agnostic to both the modality and the task in question. We experiment on the WILDS 2.0 datasets (Sagawa et al.), which significantly expand the range of modalities, applications, and shifts available for studying and benchmarking real-world unsupervised adaptation. Contrary to Sagawa et al., we show that it is in fact possible to leverage additional unlabelled data to improve upon empirical risk minimisation (ERM) results with the right method. Our method outperforms the baseline methods in terms of out-of-distribution (OOD) generalisation on the iWildCam (a multi-class classification task) and PovertyMap (a regression task) image datasets as well as the CivilComments (a binary classification task) text dataset. Furthermore, from a qualitative perspective, we show the matches obtained from the learned encoder are strongly semantically related. Code for our paper is publicly available at https://github.com/wearepal/okapi/.
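The ingredients named in the abstract (a momentum encoder, a memory bank, cross-domain nearest-neighbour matching with outlier rejection, and a feature-space consistency loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the FIFO bank, the Euclidean metric, and the fixed distance threshold `max_dist` standing in for outlier removal are all assumptions made for illustration.

```python
import numpy as np


def ema_update(target_params, online_params, momentum=0.99):
    """Momentum encoder update: the target slowly tracks the online weights."""
    return momentum * target_params + (1.0 - momentum) * online_params


def enqueue(bank, bank_domains, feats, domains, max_size):
    """FIFO memory bank (assumed design): newest first, oldest dropped."""
    bank = np.concatenate([feats, bank])[:max_size]
    bank_domains = np.concatenate([domains, bank_domains])[:max_size]
    return bank, bank_domains


def cross_domain_match(queries, query_domains, bank, bank_domains, max_dist):
    """Find, for each query, the nearest bank entry from a different domain.

    Returns (idx, keep). keep is False where the best match lies farther
    than max_dist, a simple stand-in for rejecting statistical outliers.
    """
    # Pairwise Euclidean distances, shape (n_queries, n_bank).
    dists = np.linalg.norm(queries[:, None, :] - bank[None, :, :], axis=-1)
    same_domain = query_domains[:, None] == bank_domains[None, :]
    dists = np.where(same_domain, np.inf, dists)  # forbid same-domain matches
    idx = dists.argmin(axis=1)
    keep = dists[np.arange(len(queries)), idx] <= max_dist
    return idx, keep


def consistency_loss(queries, matches, keep):
    """Mean squared feature distance over the retained (query, match) pairs."""
    if not keep.any():
        return 0.0
    diff = queries[keep] - matches[keep]
    return float(np.mean(np.sum(diff ** 2, axis=1)))


# Toy features: in practice the queries would come from the online encoder
# and the bank entries from the momentum encoder.
queries = np.array([[0.0, 0.0], [10.0, 10.0]])
query_domains = np.array([0, 0])
bank = np.array([[0.0, 1.0], [0.0, 0.1], [50.0, 50.0]])
bank_domains = np.array([1, 0, 1])

idx, keep = cross_domain_match(queries, query_domains, bank, bank_domains, max_dist=2.0)
loss = consistency_loss(queries, bank[idx], keep)
print(idx, keep, loss)
```

In the toy example the closest bank entry to the first query shares its domain and is therefore skipped, while the second query has no cross-domain neighbour within the threshold and is dropped from the loss. Because the loss compares features rather than predictions, the same code applies regardless of whether the downstream task is classification or regression.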

name change, okapi, statistical match match, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Penedo, Guilherme, Kydlíček, Hynek, Sabolčec, Vinko, Messmer, Bettina, Foroutan, Negar, Kargaran, Amir Hossein, Raffel, Colin, Jaggi, Martin, Von Werra, Leandro, Wolf, Thomas

arXiv.org Artificial Intelligence | Jun-27-2025

Pre-training state-of-the-art large language models (LLMs) requires vast amounts of clean and diverse text data. While the open development of large high-quality English pre-training datasets has seen substantial recent progress, training performant multilingual LLMs remains a challenge, in large part due to the inherent difficulty of tailoring filtering and deduplication pipelines to a large number of languages. In this work, we introduce a new pre-training dataset curation pipeline based on FineWeb that can be automatically adapted to support any language. We extensively ablate our pipeline design choices on a set of nine diverse languages, guided by a set of meaningful and informative evaluation tasks that were chosen through a novel selection process based on measurable criteria. Ultimately, we show that our pipeline can be used to create non-English corpora that produce more performant models than prior datasets. We additionally introduce a straightforward and principled approach to rebalance datasets that takes into consideration both duplication count and quality, providing an additional performance uplift. Finally, we scale our pipeline to over 1000 languages using almost 100 Common Crawl snapshots to produce FineWeb2, a new 20 terabyte (5 billion document) multilingual dataset which we release along with our pipeline, training, and evaluation codebases.

large language model, machine learning, natural language, (18 more...)

2506.2092

Country:

Europe (1.00)
Asia > Middle East (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Education (0.46)
Information Technology > Software (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Okapi: Generalising Better by Making Statistical Matches Match

Neural Information Processing Systems | Oct-9-2024, 13:09:27 GMT

classification task, okapi, statistical match match, (2 more...)

Genre: Play > Prospect (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

Lai, Viet Dac, Van Nguyen, Chien, Ngo, Nghia Trung, Nguyen, Thuat, Dernoncourt, Franck, Rossi, Ryan A., Nguyen, Thien Huu

arXiv.org Artificial Intelligence | Aug-1-2023

A key technology for the development of large language models (LLMs) is instruction tuning, which helps align the models' responses with human expectations to realize impressive learning abilities. The two major approaches to instruction tuning are supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), which are currently applied to produce the best commercial LLMs (e.g., ChatGPT). To improve the accessibility of LLMs for research and development efforts, various instruction-tuned open-source LLMs have also been introduced recently, e.g., Alpaca and Vicuna. However, existing open-source LLMs have only been instruction-tuned for English and a few popular languages, limiting their impact and accessibility for many other languages in the world. In the few very recent works that explore instruction tuning for LLMs in multiple languages, SFT has been the only approach used. This has left a significant gap for fine-tuned LLMs based on RLHF in diverse languages and raised important questions on how RLHF can boost the performance of multilingual instruction tuning. To overcome this issue, we present Okapi, the first system with instruction-tuned LLMs based on RLHF for multiple languages. Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research. We also present benchmark datasets to enable the evaluation of generative LLMs in multiple languages. Our experiments demonstrate the advantages of RLHF for multilingual instruction over SFT for different base models and datasets. Our framework and resources are released at https://github.com/nlp-uoregon/Okapi.
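The abstract does not say how the response-ranked data is consumed. A common choice in RLHF pipelines is to train a reward model with a pairwise ranking loss over (preferred, rejected) response pairs, and the sketch below illustrates that general technique only. The helper names and the toy reward scores are hypothetical and are not taken from the Okapi codebase.

```python
from itertools import combinations

import numpy as np


def ranked_to_pairs(responses_best_first):
    """Expand a ranking (best first) into all (better, worse) pairs."""
    return list(combinations(responses_best_first, 2))


def pairwise_ranking_loss(score_preferred, score_rejected):
    """Mean of -log(sigmoid(s_preferred - s_rejected)).

    Uses the identity -log(sigmoid(d)) = log(1 + exp(-d)) = logaddexp(0, -d)
    for numerical stability.
    """
    diff = np.asarray(score_preferred, dtype=float) - np.asarray(score_rejected, dtype=float)
    return float(np.mean(np.logaddexp(0.0, -diff)))


# Hypothetical reward-model scores for three responses to one instruction,
# ranked best to worst by annotators.
scores = {"a": 2.0, "b": 0.5, "c": -1.0}
pairs = ranked_to_pairs(["a", "b", "c"])
preferred = [scores[better] for better, _ in pairs]
rejected = [scores[worse] for _, worse in pairs]
print(pairs, pairwise_ranking_loss(preferred, rejected))
```

The loss shrinks as the reward model scores preferred responses above rejected ones, and equals log 2 per pair when it cannot tell them apart.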

large language model, machine learning, natural language, (20 more...)

2307.16039

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
Asia > China > Hong Kong (0.04)
(7 more...)

Genre: Research Report (0.64)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Okapi: Generalising Better by Making Statistical Matches Match

Bartlett, Myles, Romiti, Sara, Sharmanska, Viktoriia, Quadrianto, Novi

arXiv.org Artificial Intelligence | Nov-7-2022

We propose Okapi, a simple, efficient, and general method for robust semi-supervised learning based on online statistical matching. Our method uses a nearest-neighbours-based matching procedure to generate cross-domain views for a consistency loss, while eliminating statistical outliers. In order to perform the online matching in a runtime- and memory-efficient way, we draw upon the self-supervised literature and combine a memory bank with a slow-moving momentum encoder. The consistency loss is applied within the feature space, rather than on the predictive distribution, making the method agnostic to both the modality and the task in question. We experiment on the WILDS 2.0 datasets (Sagawa et al.), which significantly expand the range of modalities, applications, and shifts available for studying and benchmarking real-world unsupervised adaptation. Contrary to Sagawa et al., we show that it is in fact possible to leverage additional unlabelled data to improve upon empirical risk minimisation (ERM) results with the right method. Our method outperforms the baseline methods in terms of out-of-distribution (OOD) generalisation on the iWildCam (a multi-class classification task) and PovertyMap (a regression task) image datasets as well as the CivilComments (a binary classification task) text dataset. Furthermore, from a qualitative perspective, we show the matches obtained from the learned encoder are strongly semantically related. Code for our paper is publicly available at https://github.com/wearepal/okapi/.

artificial intelligence, inductive learning, machine learning, (18 more...)

2211.05236

Country:

Africa (0.14)
Europe > France (0.04)
Asia > Nepal (0.04)
Asia > Indonesia (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

CBRE tech boss joins board of AI platform | Real Estate Weekly

#artificialintelligence | Oct-3-2019, 16:38:15 GMT

Okapi, a commercial real estate-focused artificial intelligence (AI) platform, has raised $5.5 million in Series-A financing. The funding round was led by Marius Nacht, the co-founder and chairman of Check Point Software Technologies and a hi-tech entrepreneur, and brings the total amount of capital Okapi has raised to $8.4 million. Founded by Iris Tsidon and Maya Gal, Okapi is a machine learning-powered software platform that analyzes disparate streams of property-related data to provide building professionals with predictive, targeted insights that improve tenant comfort and increase landlords' income opportunities. "After beginning North American operations in 2017, we quickly gained traction with Canada's largest landlords, helping to improve operations for their portfolios while increasing NOIs by 1-3 percent," said Tsidon, Okapi's CEO. "Just a few months after launching in the U.S., this funding round enables us to expand our team and increase our market penetration. We have found that there is incredible demand for artificial intelligence tools to analyze the vast troves of data that owners and operators are neglecting, and now we have the resources to add industry veterans to our staff and advisory board to help facilitate our expansion."

cbre tech boss join board, largest landlord, okapi, (5 more...)

Country:

North America > United States (0.27)
North America > Canada (0.27)

Industry: Banking & Finance > Real Estate (1.00)

Technology: Information Technology > Artificial Intelligence (1.00)
