
Collaborating Authors: louis


She didn't get an apartment because of an AI-generated score – and sued to help others avoid the same fate

The Guardian

That was the score Mary Louis was given by an AI-powered tenant screening tool. The software, SafeRent, didn't explain in its 11-page report how the score was calculated or how it weighed various factors. It didn't say what the score actually signified. It just displayed Louis's number and determined it was too low. Louis, who works as a security guard, had applied for an apartment in an eastern Massachusetts suburb.


Benchmarking Benchmark Leakage in Large Language Models

Xu, Ruijie, Wang, Zengzhi, Fan, Run-Ze, Liu, Pengfei

arXiv.org Artificial Intelligence

Amid the expanding use of pre-training data, the phenomenon of benchmark dataset leakage has become increasingly prominent, exacerbated by opaque training processes and the often undisclosed inclusion of supervised data in contemporary Large Language Models (LLMs). This issue skews benchmark effectiveness and fosters potentially unfair comparisons, impeding the field's healthy development. To address this, we introduce a detection pipeline utilizing Perplexity and N-gram accuracy, two simple and scalable metrics that gauge a model's prediction precision on a benchmark, to identify potential data leakage. By analyzing 31 LLMs in the context of mathematical reasoning, we reveal substantial instances of training set, and even test set, misuse, resulting in potentially unfair comparisons. These findings prompt us to offer several recommendations regarding model documentation, benchmark setup, and future evaluations. Notably, we propose the "Benchmark Transparency Card" to encourage clear documentation of benchmark utilization, promoting transparency and the healthy development of LLMs. We have made our leaderboard, pipeline implementation, and model predictions publicly available, fostering future research.
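The n-gram accuracy metric described in the abstract can be sketched roughly as follows: if a model reproduces benchmark text verbatim from its preceding context, the benchmark may have leaked into training. This is a minimal illustration, not the paper's pipeline; `predict_next` is a hypothetical stand-in for a real LLM's greedy next-token prediction, and the toy "leaked" model simply memorized the text.

```python
def ngram_accuracy(tokens, predict_next, n=5, stride=5):
    """Fraction of n-grams the model reproduces exactly from the
    preceding context. High accuracy suggests memorization/leakage."""
    hits, total = 0, 0
    for start in range(n, len(tokens) - n, stride):
        context = tokens[:start]
        target = tokens[start:start + n]
        hits += predict_next(context, n) == target
        total += 1
    return hits / total if total else 0.0

# Toy "leaked" model that memorized the benchmark verbatim.
benchmark = "the quick brown fox jumps over the lazy dog again and again".split()
leaked_model = lambda ctx, k: benchmark[len(ctx):len(ctx) + k]
print(ngram_accuracy(benchmark, leaked_model))  # 1.0 -> strong leakage signal
```

A clean model would only rarely complete benchmark n-grams verbatim, so its score would sit far below 1.0.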


MultiPoT: Multilingual Program of Thoughts Harnesses Multiple Programming Languages

Luo, Xianzhen, Zhu, Qingfu, Zhang, Zhiming, Qin, Libo, Wang, Xu, Yang, Qing, Xu, Dongliang, Che, Wanxiang

arXiv.org Artificial Intelligence

Program of Thoughts (PoT) is an approach characterized by its executable intermediate steps, which ensure the accuracy of the numerical calculations in the reasoning process. Currently, PoT primarily uses Python. However, relying solely on a single language may result in suboptimal solutions and overlook the potential benefits of other programming languages. In this paper, we conduct comprehensive experiments on the programming languages used in PoT and find that no single language consistently delivers optimal performance across all tasks and models. The effectiveness of each language varies depending on the specific scenarios. Inspired by this, we propose a task- and model-agnostic approach called MultiPoT, which harnesses the strengths and diversity of multiple languages. Experimental results reveal that it significantly outperforms Python Self-Consistency. Furthermore, it achieves comparable or superior performance to the best monolingual PoT in almost all tasks across all models. In particular, MultiPoT achieves more than 4.6% improvement on average on both Starcoder and ChatGPT (gpt-3.5-turbo).
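The core aggregation step behind a multi-language PoT approach can be sketched as a majority vote over the results of candidate programs written in different languages. This is an illustrative sketch under assumptions, not the paper's implementation: the per-language execution results are simulated here, whereas in practice each program would be run by its own interpreter.

```python
from collections import Counter

def multipot_vote(results_by_language):
    """Pick the most common answer across languages; programs that
    failed to execute contribute None and are ignored."""
    answers = [r for r in results_by_language.values() if r is not None]
    return Counter(answers).most_common(1)[0][0] if answers else None

# Simulated execution results: Python and R agree, JavaScript made an
# arithmetic error, and the Java program failed to run.
results = {"python": 42, "r": 42, "javascript": 41, "java": None}
print(multipot_vote(results))  # 42
```

The intuition matches the abstract: since no single language is best everywhere, pooling answers across languages hedges against any one language's failure mode.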


Dallas County man gets 3 years for $1.2M online romance scam

FOX News

A Texas man who was part of a romance scam that bilked a Missouri woman out of $1.2 million was sentenced on Tuesday to three years in federal prison and ordered to repay the money. Rotimi Oladimeji, 38, of Richardson, Texas, was sentenced one year after he pleaded guilty to two counts of mail fraud, two counts of wire fraud and one count of conspiracy to commit mail fraud and wire fraud, the U.S. Attorney's office in St. Louis said in a news release. Oladimeji and two others spotted the victim on the "Silver Singles" online dating site, prosecutors said.


Unsupervised Summarization Re-ranking

Ravaut, Mathieu, Joty, Shafiq, Chen, Nancy

arXiv.org Artificial Intelligence

With the rise of task-specific pre-training objectives, abstractive summarization models like PEGASUS offer appealing zero-shot performance on downstream summarization tasks. However, the performance of such unsupervised models still lags significantly behind their supervised counterparts. As in the supervised setup, we observe very high variance in quality among summary candidates from these models, yet only one candidate is kept as the summary output. In this paper, we propose to re-rank summary candidates in an unsupervised manner, aiming to close the performance gap between unsupervised and supervised models. Our approach improves the unsupervised PEGASUS by up to 7.27% and ChatGPT by up to 6.86% relative mean ROUGE across four widely adopted summarization benchmarks, and achieves relative gains of 7.51% (up to 23.73% from XSum to WikiHow) averaged over 30 zero-shot transfer setups (fine-tuning on one dataset, evaluating on another).
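The re-ranking idea can be sketched as: instead of keeping a single decoded summary, score every candidate against the source document and keep the best. This is a hedged illustration only; the unigram-overlap scorer below is a simple stand-in for the paper's actual unsupervised ranking features.

```python
def overlap_score(candidate, source):
    """Fraction of candidate tokens that also appear in the source."""
    cand, src = candidate.lower().split(), set(source.lower().split())
    return sum(t in src for t in cand) / len(cand) if cand else 0.0

def rerank(candidates, source):
    """Return the candidate summary best supported by the source."""
    return max(candidates, key=lambda c: overlap_score(c, source))

source = "the model was trained on news articles and evaluated on xsum"
candidates = [
    "a study of fish migration",        # off-topic beam
    "the model was evaluated on xsum",  # faithful candidate
]
print(rerank(candidates, source))  # the model was evaluated on xsum
```

Because the scorer needs no reference summaries, the whole selection step stays unsupervised, which is the property the paper exploits.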


Internship – Data Engineering and Data Science at Xplor - St. Louis, MO, United States

#artificialintelligence

Take a seat on the rocket ship and join us as a summer intern within our technology department. We're a global team of builders, listeners and problem-solvers who are relentlessly focused on making life simple, so our customers can get back to growing their business, engaging consumers and doing what they love. At Xplor, the Central Technology Team has one main purpose: to enable and complement the business strategies and goals while solving real problems for our customers and users. We have dozens of applications in our everyday-life verticals that all have their technology uniqueness and their individual purpose. We also use some of the latest technology in Microsoft Azure, AWS, and Containers and are constantly looking to find innovative new ways to meet the challenges of running a unique global business.



Hyperparameter Tuning with Python: Boost your machine learning model's performance via hyperparameter tuning: Owen, Louis: 9781803235875: Amazon.com: Books

#artificialintelligence

You'll start with an introduction to hyperparameter tuning and understand why it's important. Next, you'll learn the best methods for hyperparameter tuning for a variety of use cases and specific algorithm types. This book will not only cover the usual grid or random search but also other powerful underdog methods. Individual chapters are also dedicated to the four main groups of hyperparameter tuning methods: exhaustive search, heuristic search, Bayesian optimization, and multi-fidelity optimization. Later, you will learn about top frameworks like Scikit, Hyperopt, Optuna, NNI, and DEAP to implement hyperparameter tuning.
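The random-search method the blurb mentions can be sketched in a few lines: sample hyperparameters at random and keep the best-scoring configuration. This is a toy sketch, not an excerpt from the book; the quadratic "validation score" stands in for real model training, and the parameter names and ranges are made up for illustration.

```python
import random

def validation_score(lr, depth):
    # Toy objective that peaks near lr=0.1, depth=6.
    return -((lr - 0.1) ** 2) - 0.01 * (depth - 6) ** 2

def random_search(n_trials=200, seed=0):
    """Sample configurations at random, keep the best one found."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {"lr": rng.uniform(0.001, 0.3), "depth": rng.randint(2, 12)}
        score = validation_score(**params)
        if score > best_score:
            best, best_score = params, score
    return best

best = random_search()
print(best)  # best configuration found, with lr near 0.1
```

Grid search would evaluate every point on a fixed lattice instead; random search often finds comparable optima with far fewer trials when only a few hyperparameters matter, which is one of the comparisons such books typically develop.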


GitHub - jeffheaton/t81_558_deep_learning: Washington University (in St. Louis) Course T81-558: Applications of Deep Neural Networks

#artificialintelligence

The content of this course changes as technology evolves; to keep up to date with changes, follow me on GitHub. Deep learning is a group of exciting new technologies for neural networks. Through a combination of advanced training techniques and neural network architectural components, it is now possible to create neural networks that can handle tabular data, images, text, and audio as both input and output. Deep learning allows a neural network to learn hierarchies of information in a way that is like the function of the human brain. This course will introduce the student to classic neural network structures, Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), Generative Adversarial Networks (GAN), and reinforcement learning.


Origami mini-robot does gymnastics for a good cause

Stanford Engineering

Despite its small size, this soft robot can manoeuvre on solid ground and through water (pictured). A pea-sized origami robot can fold, unfold and perform a range of acrobatic moves -- potentially making it useful for many biomedical applications.