 Instructional Material


Human Feedback is not Gold Standard

arXiv.org Artificial Intelligence

Human feedback has become the de facto standard for evaluating the performance of Large Language Models, and is increasingly being used as a training objective. However, it is not clear which properties of a generated output this single 'preference' score captures. We hypothesise that preference scores are subjective and open to undesirable biases. We critically analyse the use of human feedback for both training and evaluation, to verify whether it fully captures a range of crucial error criteria. We find that while preference scores have fairly good coverage, they under-represent important aspects like factuality. We further hypothesise that both preference scores and error annotation may be affected by confounders, and leverage instruction-tuned models to generate outputs that vary along two possible confounding dimensions: assertiveness and complexity. We find that the assertiveness of an output skews the perceived rate of factuality errors, indicating that human annotations are not a fully reliable evaluation metric or training objective. Finally, we offer preliminary evidence that using human feedback as a training objective disproportionately increases the assertiveness of model outputs. We encourage future work to carefully consider whether preference scores are well aligned with the desired objective.


We and AI free online course "Living with AI" is back – tell your friends!

AIHub

The second run of a free five-week course on AI, open to anyone, starts on the 15th January 2023. We and AI were delighted to work with The Scottish AI Alliance on designing content for their course, which gives the general public of Scotland (and beyond!) an introduction to the world of AI. "Living with AI" is a great and rare commitment to making general-level AI literacy attainable to everyone, enabling a greater understanding of the uses of, and questions posed by, the range of tools and technologies, and exploring their future potential. Although creating materials was challenging due to the rapid changes and innovation, proliferating hype and platform restrictions, feedback from the first run of the course shows how worthwhile the effort was. Each week has a specific topic and the course is made up of written articles, videos from AI experts across Scotland, audio clips, reflective and activity-based exercises, quizzes and opportunities to collaborate with other learners. All content has been refreshed for 2024 and will be released at once, so learners can go at their own pace in their own time.


How does self-supervised pretraining improve robustness against noisy labels across various medical image classification datasets?

arXiv.org Artificial Intelligence

Noisy labels can significantly impact medical image classification, particularly in deep learning, by corrupting learned features. Self-supervised pretraining, which doesn't rely on labeled data, can enhance robustness against noisy labels. However, this robustness varies based on factors like the number of classes, dataset complexity, and training size. In medical images, subtle inter-class differences and modality-specific characteristics add complexity. Previous research hasn't comprehensively explored the interplay between self-supervised learning and robustness against noisy labels in medical image classification, considering all these factors. In this study, we address three key questions: i) How does label noise impact various medical image classification datasets? ii) Which types of medical image datasets are more challenging to learn and more affected by label noise? iii) How do different self-supervised pretraining methods enhance robustness across various medical image datasets? Our results show that DermNet, among five datasets (Fetal plane, DermNet, COVID-DU-Ex, MURA, NCT-CRC-HE-100K), is the most challenging but exhibits greater robustness against noisy labels. Additionally, contrastive learning stands out among the eight self-supervised methods as the most effective approach to enhance robustness against noisy labels.


Curriculum for Crowd Counting -- Is it Worthy?

arXiv.org Artificial Intelligence

Recent advances in deep learning techniques have achieved remarkable performance in several computer vision problems. A notably intuitive technique called Curriculum Learning (CL) has been introduced recently for training deep learning models. Surprisingly, curriculum learning achieves significantly improved results in some tasks but marginal or no improvement in others. Hence, there is still a debate about its adoption as a standard method to train supervised learning models. In this work, we investigate the impact of curriculum learning in crowd counting using the density estimation method. We performed detailed investigations, conducting 112 experiments with six different CL settings and eight different crowd models. Our experiments show that curriculum learning improves the model learning performance and shortens the convergence time.


GACE: Learning Graph-Based Cross-Page Ads Embedding For Click-Through Rate Prediction

arXiv.org Artificial Intelligence

Predicting click-through rate (CTR) is the core task of many online ad recommendation systems, which helps improve user experience and increase platform revenue. In this type of recommendation system, we often encounter two main problems: the joint usage of multi-page historical advertising data and the cold start of new ads. In this paper, we propose GACE, a graph-based cross-page ads embedding generation method. It can warm up and generate the representation embedding of cold-start and existing ads across various pages. Specifically, we carefully build linkages and a weighted undirected graph model considering semantic and page-type attributes to guide the direction of feature fusion and generation. We design a variational auto-encoding task as a pre-training module and generate embedding representations for new and old ads based on this task. Results on the public dataset AliEC from RecBole and a real-world industry dataset from Alipay show that our GACE method is significantly superior to the SOTA method. In the online A/B test, the click-through rate on three real-world pages from Alipay increased by 3.6%, 2.13%, and 3.02%, respectively. In the cold-start task in particular, the CTR increased by 9.96%, 7.51%, and 8.97%, respectively.


Get a comprehensive ChatGPT education for just $25

PCWorld

ChatGPT took the world by storm in 2023, and while it hasn't exactly changed work the way some may have thought it would, it's still a useful tool to know. This four-course bundle is taught by leading online instructors Mike Wheeler (4.6/5-star instructor rating), John Elder (4.4/5-star rating), and Alex Genadinik (4.4/5-star rating). They'll give you an introduction to ChatGPT, showing you the basic applications and hacks. From there, you'll also learn how to build your own chatbots powered by ChatGPT and how to use ChatGPT to scale your business. Get familiar with ChatGPT in 2024.


Extending LLMs' Context Window with 100 Samples

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window, constraining their application in downstream tasks with lengthy inputs. Recent studies have sought to extend LLMs' context window by modifying rotary position embedding (RoPE), a popular position encoding method adopted by well-known LLMs such as LLaMA, PaLM, and GPT-NeoX. However, prior works like Position Interpolation (PI) and YaRN are resource-intensive and lack comparative experiments to assess their applicability. In this work, we identify the inherent need for LLMs' attention entropy (i.e. the information entropy of attention scores) to maintain stability and introduce a novel extension to RoPE which combines adjusting RoPE's base frequency and scaling the attention logits to help LLMs efficiently adapt to a larger context window. We validate the superiority of our method in both fine-tuning performance and robustness across different context window sizes on various context-demanding tasks. Notably, our method extends the context window of LLaMA-2-7B-Chat to 16,384 with only 100 samples and 6 training steps, showcasing extraordinary efficiency. Finally, we also explore how data compositions and training curricula affect context window extension for specific downstream tasks, suggesting fine-tuning LLMs with lengthy conversations as a good starting point. We release our code and SFT data at https://github.com/GAIR-NLP/Entropy-ABF.
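The two quantities named in this abstract can be illustrated with a minimal sketch. It computes attention entropy as defined above (the information entropy of the softmaxed attention scores) and the standard RoPE inverse frequencies for a given base. This is not the authors' implementation, which is in the linked repository. The function names, the toy logits, the scale factor, and the larger base value are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(logits, scale=1.0):
    """Entropy (in nats) of one row of attention scores.

    `scale` multiplies the logits before the softmax; it is a
    hypothetical stand-in for "scaling the attention logits".
    """
    probs = softmax([scale * x for x in logits])
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def rope_inv_freq(head_dim, base=10000.0):
    """Standard RoPE inverse frequencies: base ** (-2i / head_dim)."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


if __name__ == "__main__":
    row = [1.0, 2.0, 3.0]  # toy attention logits for one query
    print("entropy, unscaled:", attention_entropy(row))
    print("entropy, scale=2 :", attention_entropy(row, scale=2.0))
    # Illustrative larger base; the value is an assumption, not from the abstract.
    print("inv_freq, base=1e4:", rope_inv_freq(8))
    print("inv_freq, base=5e5:", rope_inv_freq(8, base=500000.0))
```

In this toy setting, scaling the logits up sharpens the attention distribution and lowers its entropy, while a larger base lowers every rotation frequency except the first.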


Generative Artificial Intelligence in Higher Education: Evidence from an Analysis of Institutional Policies and Guidelines

arXiv.org Artificial Intelligence

The release of ChatGPT in November 2022 prompted a massive uptake of generative artificial intelligence (GenAI) across higher education institutions (HEIs). HEIs scrambled to respond to its use, especially by students, looking first to regulate it and then arguing for its productive integration within teaching and learning. In the year since the release, HEIs have increasingly provided policies and guidelines to direct GenAI. In this paper we examined documents produced by 116 US universities categorized as high research activity or R1 institutions to comprehensively understand GenAI related advice and guidance given to institutional stakeholders. Through an extensive analysis, we found the majority of universities (N=73, 63%) encourage the use of GenAI and many provide detailed guidance for its use in the classroom (N=48, 41%). More than half of all institutions provided sample syllabi (N=65, 56%) and half (N=58, 50%) provided sample GenAI curriculum and activities that would help instructors integrate and leverage GenAI in their classroom. Notably, most guidance for activities focused on writing, whereas code and STEM-related activities were mentioned half the time and vaguely even when they were (N=58, 50%). Finally, more than one half of institutions talked about the ethics of GenAI on a range of topics broadly, including Diversity, Equity and Inclusion (DEI) (N=60, 52%). Overall, based on our findings we caution that guidance for faculty can become burdensome as extensive revision of pedagogical approaches is often recommended in the policies.
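The percentages quoted in this abstract follow from the counts and the sample of 116 universities. The short sketch below recomputes them, assuming each percentage is the count divided by 116 and rounded to the nearest whole number. The dictionary labels are shorthand introduced here, not terms from the paper.

```python
TOTAL_UNIVERSITIES = 116

# (count, percentage as reported in the abstract)
reported = {
    "encourage GenAI use": (73, 63),
    "detailed classroom guidance": (48, 41),
    "sample syllabi": (65, 56),
    "sample curriculum and activities": (58, 50),
    "ethics including DEI": (60, 52),
}


def percent(count, total=TOTAL_UNIVERSITIES):
    """Share of institutions, rounded to the nearest whole percent."""
    return round(100 * count / total)


if __name__ == "__main__":
    for label, (count, stated) in reported.items():
        status = "matches" if percent(count) == stated else "differs"
        print(f"{label}: {count}/{TOTAL_UNIVERSITIES} = {percent(count)}% ({status})")
```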


Temporal and Between-Group Variability in College Dropout Prediction

arXiv.org Artificial Intelligence

Large-scale administrative data is a common input in early warning systems for college dropout in higher education. Still, the terminology and methodology vary significantly across existing studies, and the implications of different modeling decisions are not fully understood. This study provides a systematic evaluation of contributing factors and predictive performance of machine learning models over time and across different student groups. Drawing on twelve years of administrative data at a large public university in the US, we find that dropout prediction at the end of the second year has a 20% higher AUC than at the time of enrollment in a Random Forest model. Also, most predictive factors at the time of enrollment, including demographics and high school performance, are quickly superseded in predictive importance by college performance and in later stages by enrollment behavior. Regarding variability across student groups, college GPA has more predictive value for students from traditionally disadvantaged backgrounds than their peers. These results can help researchers and administrators understand the comparative value of different data sources when building early warning systems and optimizing decisions under specific policy goals.
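For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen student who dropped out receives a higher risk score than a randomly chosen student who did not. The sketch below computes it directly from that pairwise definition. It is a generic illustration on made-up labels and scores, not the study's Random Forest pipeline or data.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (here, dropout), 0 otherwise.
    scores: predicted risk, higher meaning more likely positive.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up example: two dropouts (1) and two non-dropouts (0).
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(auc(labels, scores))
```

The double loop is quadratic in the number of students, which is acceptable for a small illustration; rank-based formulas give the same value more efficiently on large datasets.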


AI and Education: Will Chatbots Soon Tutor Your Children?

NYT > Business Day

Mr. Khan's vision of tutoring bots tapped into a decades-old Silicon Valley dream: automated teaching platforms that instantly customize lessons for each student. Proponents argue that developing such systems would help close achievement gaps in schools by delivering relevant, individualized instruction to children faster and more efficiently than human teachers ever could. In pursuit of such ideals, tech companies and philanthropists over the years have urged schools to purchase a laptop for each child, championed video tutorial platforms and financed learning apps that customize students' lessons. Some online math and literacy interventions have reported positive effects. But many education technology efforts have not proved to significantly close academic achievement gaps or improve student results like high school graduation rates.