 attainment


Evaluating undergraduate mathematics examinations in the era of generative AI: a curriculum-level case study

arXiv.org Artificial Intelligence

Generative artificial intelligence (GenAI) tools such as OpenAI's ChatGPT are transforming the educational landscape, prompting reconsideration of traditional assessment practices. In parallel, universities are exploring alternatives to in-person, closed-book examinations, raising concerns about academic integrity and pedagogical alignment in uninvigilated settings. This study investigates whether traditional closed-book mathematics examinations retain their pedagogical relevance when hypothetically administered in uninvigilated, open-book settings with GenAI access. Adopting an empirical approach, we generate, transcribe, and blind-mark GenAI submissions to eight undergraduate mathematics examinations at a Russell Group university, spanning the entirety of the first-year curriculum. By combining independent GenAI responses to individual questions, we enable a meaningful evaluation of GenAI performance, both at the level of modules and across the first-year curriculum. We find that GenAI attainment is at the level of a first-class degree, though current performance can vary between modules. Further, we find that GenAI performance is remarkably consistent when viewed across the entire curriculum, significantly more so than that of students in invigilated examinations. Our findings evidence the need for redesigning assessments in mathematics for unsupervised settings, and highlight the potential reduction in pedagogical value of current standards in the era of generative artificial intelligence.
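The aggregation step described above (combining independently marked answers into module marks, then looking across the curriculum) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's marking scheme: the module names and marks are hypothetical, every module is assumed to carry equal weight, and degree classes use the conventional UK boundaries (70% and above for a first).

```python
from statistics import mean, pstdev

def module_percentage(question_marks):
    """Combine independently marked answers into a single module percentage.

    question_marks: list of (marks_awarded, marks_available) pairs, one per question.
    """
    awarded = sum(a for a, _ in question_marks)
    available = sum(b for _, b in question_marks)
    return 100.0 * awarded / available

def curriculum_summary(modules):
    """Mean module percentage and its population standard deviation.

    Assumes every module carries equal weight; a smaller standard deviation means
    performance is more consistent across the curriculum.
    """
    percentages = [module_percentage(marks) for marks in modules.values()]
    return mean(percentages), pstdev(percentages)

def uk_classification(percentage):
    """Map a percentage onto the conventional UK degree-class boundaries."""
    if percentage >= 70:
        return "First"
    if percentage >= 60:
        return "Upper second (2:1)"
    if percentage >= 50:
        return "Lower second (2:2)"
    if percentage >= 40:
        return "Third"
    return "Fail"

# Illustrative marks only; these are not results from the study.
modules = {
    "Calculus": [(18, 20), (15, 20)],
    "Linear Algebra": [(20, 25), (12, 15)],
}
avg, spread = curriculum_summary(modules)
print(f"{avg:.2f}% ({uk_classification(avg)}), spread {spread:.2f}")
```

Comparing the spread of module marks for GenAI submissions against the spread for students is one simple way to express the consistency comparison the abstract describes.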


HyperFlexis: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling

arXiv.org Artificial Intelligence

Modern large language model (LLM) serving systems face challenges from highly variable requests with diverse lengths, priorities, and stage-specific service-level objectives (SLOs). Meeting these objectives requires real-time scheduling, rapid and cost-effective scaling, and support for both collocated and disaggregated Prefill/Decode (P/D) architectures. We present HyperFlexis, a unified LLM serving system that integrates algorithmic and system-level innovations to jointly optimize scheduling and scaling under multiple SLOs. It features a multi-SLO-aware scheduler that leverages budget estimation and request prioritization to ensure proactive SLO compliance for both new and ongoing requests. The system supports prefill- and decode-stage multi-SLO scheduling for P/D-disaggregated architectures and KV cache transfers. It also enables cost-effective scaling decisions, prefill-decode instance linking during scaling, and rapid P/D role transitions. To accelerate scaling and reduce cold-start latency, we propose a device-to-device (D2D) weight transfer mechanism that lowers weight loading overhead by up to 19.39×. These optimizations allow the system to achieve up to 4.44× higher SLO attainment, 65.82% lower request latency, and cost parity with state-of-the-art baselines. The code will be released soon.
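As a rough illustration of budget estimation combined with request prioritization, the sketch below admits requests into the next batch in least-slack-first order under a token budget. It is not HyperFlexis's scheduler: the linear cost model (`SECONDS_PER_TOKEN`), the `Request` fields, and the admission rule are all hypothetical assumptions.

```python
from dataclasses import dataclass

# Hypothetical linear cost model: seconds of GPU time per token processed in one step.
SECONDS_PER_TOKEN = 0.001

@dataclass
class Request:
    rid: str
    tokens: int       # tokens needed in the next step (prompt chunk for prefill, 1 for decode)
    deadline: float   # absolute time by which this step must finish to meet the request's SLO

def estimated_step_time(tokens: int) -> float:
    """Budget estimate: time for one batched step over `tokens` tokens."""
    return tokens * SECONDS_PER_TOKEN

def schedule(requests, now, token_budget):
    """Least-slack-first admission into the next batch.

    Slack is the deadline minus the finish time the request would have if it ran alone.
    A request is admitted only if it fits the token budget and the batch's estimated
    finish time still meets every admitted request's deadline. Requests that are not
    admitted are left for the caller to retry in a later step.
    """
    ordered = sorted(requests, key=lambda r: r.deadline - (now + estimated_step_time(r.tokens)))
    batch, used = [], 0
    for r in ordered:
        if used + r.tokens > token_budget:
            continue
        finish = now + estimated_step_time(used + r.tokens)
        if all(finish <= x.deadline for x in batch + [r]):
            batch.append(r)
            used += r.tokens
    return batch

pending = [
    Request("A", tokens=1, deadline=0.6),     # ongoing decode
    Request("B", tokens=500, deadline=0.6),   # new prefill with a tight target
    Request("C", tokens=800, deadline=2.0),   # new prefill with a loose target
    Request("D", tokens=300, deadline=0.1),   # cannot finish by its deadline even alone
]
print([r.rid for r in schedule(pending, now=0.0, token_budget=1000)])
```

Admitting by least slack serves the tightest feasible deadlines first, while re-checking every admitted request keeps a large prefill from pushing an ongoing decode past its own target.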


On the attainment of the Wasserstein–Cramér–Rao lower bound

arXiv.org Machine Learning

Recently, a Wasserstein analogue of the Cramér–Rao inequality has been developed using the Wasserstein information matrix (Otto metric). This inequality provides a lower bound on the Wasserstein variance of an estimator, which quantifies its robustness against additive noise. In this study, we investigate conditions for an estimator to attain the Wasserstein–Cramér–Rao lower bound (asymptotically), a property we call (asymptotic) Wasserstein efficiency. We identify a condition under which Wasserstein efficient estimators exist for one-parameter statistical models. This condition corresponds to a recently proposed Wasserstein analogue of one-parameter exponential families (e-geodesics). We also show that the Wasserstein estimator, a Wasserstein analogue of the maximum likelihood estimator based on the Wasserstein score function, is asymptotically Wasserstein efficient in location-scale families.
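To make the bound concrete, here is a worked one-parameter example. It assumes the one-dimensional forms commonly used in Wasserstein information geometry, where the Wasserstein information is expressed through the distribution function $F(x;\theta)$ and the Wasserstein variance of an estimator $T$ is the expected squared derivative of $T$ (its sensitivity to a small additive shift of the data); the paper's own definitions may be stated more generally.

```latex
% Assumed one-dimensional forms of the Wasserstein information and variance
G_W(\theta) = \int_{\mathbb{R}} \frac{\bigl(\partial_\theta F(x;\theta)\bigr)^2}{p(x;\theta)}\,dx,
\qquad
\operatorname{Var}_W[T] = \mathbb{E}_\theta\bigl[T'(X)^2\bigr]
  \;\ge\; \frac{\bigl(\partial_\theta\,\mathbb{E}_\theta[T(X)]\bigr)^2}{G_W(\theta)}.

% Location family p(x;\theta) = f(x - \theta), so F(x;\theta) = F_0(x - \theta)
\partial_\theta F(x;\theta) = -f(x-\theta)
\quad\Longrightarrow\quad
G_W(\theta) = \int_{\mathbb{R}} \frac{f(x-\theta)^2}{f(x-\theta)}\,dx = 1.

% Estimator T(X) = X (f assumed to have a finite mean)
\operatorname{Var}_W[T] = \mathbb{E}_\theta[1^2] = 1,
\qquad
\partial_\theta\,\mathbb{E}_\theta[X] = 1
\quad\Longrightarrow\quad
\operatorname{Var}_W[T] = 1 = \frac{1^2}{1}.
```

Under these assumed forms, the observation itself attains the bound with equality for every location family, whatever the shape of $f$. In the classical (Fisher) Cramér–Rao setting, by contrast, $T(X) = X$ attains the bound within a location family only when $f$ is Gaussian.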


Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving

arXiv.org Artificial Intelligence

Serving large language models (LLMs) is expensive, especially for providers hosting many models, making cost reduction essential. The unique workload patterns of serving multiple LLMs (i.e., multi-LLM serving) create new opportunities and challenges for this task. The long-tail popularity of models and their long idle periods present opportunities to improve utilization through GPU sharing. However, existing GPU sharing systems lack the ability to adjust their resource allocation and sharing policies at runtime, making them ineffective at meeting latency service-level objectives (SLOs) under rapidly fluctuating workloads. This paper presents Prism, a multi-LLM serving system that unleashes the full potential of GPU sharing to achieve both cost efficiency and SLO attainment. At its core, Prism tackles a key limitation of existing systems: the lack of cross-model memory coordination, which is essential for flexibly sharing GPU memory across models under dynamic workloads. Prism achieves this with two key designs. First, it supports on-demand memory allocation by dynamically mapping physical to virtual memory pages, allowing flexible memory redistribution among models that space- and time-share a GPU. Second, it improves memory efficiency through a two-level scheduling policy that dynamically adjusts sharing strategies based on models' runtime demands. Evaluations on real-world traces show that Prism achieves more than 2× cost savings and 3.3× SLO attainment compared to state-of-the-art systems.
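On-demand mapping of physical to virtual memory pages can be illustrated with a pure-Python toy. This is only a sketch of the idea: the `SharedPagePool` class, the page granularity, and the release policy are hypothetical, and a real system would manage GPU memory through driver-level virtual memory facilities rather than a Python dictionary.

```python
class SharedPagePool:
    """Toy model of one GPU's memory as fixed-size physical pages shared by several models.

    Each model addresses its own virtual pages; a physical page is mapped only when a
    virtual page is first touched, and all of a model's pages return to the shared pool
    when that model is released (for example, after it goes idle).
    """

    def __init__(self, num_physical_pages):
        self.free = list(range(num_physical_pages))  # physical page ids not mapped to anything
        self.page_table = {}                         # (model, virtual_page) -> physical_page

    def map_page(self, model, vpage):
        """Back (model, vpage) with a physical page, allocating one on first use."""
        key = (model, vpage)
        if key not in self.page_table:
            if not self.free:
                raise MemoryError("no free physical pages on this GPU")
            self.page_table[key] = self.free.pop()
        return self.page_table[key]

    def release_model(self, model):
        """Unmap every page held by `model` so other models can reuse the memory."""
        for key in [k for k in self.page_table if k[0] == model]:
            self.free.append(self.page_table.pop(key))

    def resident_pages(self, model):
        """Number of physical pages currently mapped for `model`."""
        return sum(1 for m, _ in self.page_table if m == model)

pool = SharedPagePool(num_physical_pages=4)
for v in range(3):
    pool.map_page("model-a", v)   # a busy model grows to 3 pages
pool.map_page("model-b", 0)       # the GPU is now full
pool.release_model("model-a")     # model-a goes idle and gives its pages back
pool.map_page("model-b", 1)       # model-b can now grow into the freed memory
print(pool.resident_pages("model-a"), pool.resident_pages("model-b"), len(pool.free))
```

Because each model reserves only virtual pages up front, the physical memory freed by an idle model can be handed to a busy one without restarting either.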


SLOs-Serve: Optimized Serving of Multi-SLO LLMs

arXiv.org Artificial Intelligence

This paper introduces SLOs-Serve, a system designed for serving multi-stage large language model (LLM) requests with application- and stage-specific service level objectives (SLOs). The key idea behind SLOs-Serve is to customize the allocation of tokens to meet these SLO requirements. SLOs-Serve uses a multi-SLO dynamic programming-based algorithm to continuously optimize token allocations under SLO constraints by exploring the full design space of chunked prefill and (optional) speculative decoding. Leveraging this resource planning algorithm, SLOs-Serve effectively supports multi-SLOs and multi-replica serving with dynamic request routing while being resilient to bursty arrivals. Our evaluation across 6 LLM application scenarios (including summarization, coding, chatbot, tool calling, and reasoning) demonstrates that SLOs-Serve improves per-GPU serving capacity by 2.2x on average compared to prior state-of-the-art systems.
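A dynamic program over token allocations can be illustrated, in much simplified form, as a 0/1 knapsack over one step's token budget. This is a hypothetical sketch, not the SLOs-Serve algorithm, which also explores chunked prefill and speculative decoding across steps and replicas; the request tuples, priority weights, and budget values are assumptions.

```python
def plan_prefills(requests, token_budget, decode_tokens):
    """0/1-knapsack dynamic program over a per-step token budget.

    Tokens for ongoing decodes are reserved first; the remaining capacity is split among
    pending prefill requests. Each request is (rid, tokens_needed, weight), where
    tokens_needed is what it must receive this step to meet its SLO and weight is a
    hypothetical priority. Returns (total_weight, chosen_rids).
    """
    capacity = token_budget - decode_tokens
    if capacity < 0:
        return 0, []
    # best[c] = (total_weight, chosen_rids) achievable with at most c tokens.
    # Sharing one empty list here is safe because lists are never mutated in place.
    best = [(0, [])] * (capacity + 1)
    for rid, tokens, weight in requests:
        # Iterate capacities downward so each request is used at most once.
        for c in range(capacity, tokens - 1, -1):
            prev_weight, prev_rids = best[c - tokens]
            if prev_weight + weight > best[c][0]:
                best[c] = (prev_weight + weight, prev_rids + [rid])
    return best[capacity]

pending = [("r1", 1200, 3), ("r2", 900, 2), ("r3", 800, 2), ("r4", 300, 2)]
print(plan_prefills(pending, token_budget=2048, decode_tokens=48))
```

Here the single heaviest request ("r1") is not part of the best plan: three smaller prefills fit the same budget and satisfy more total priority, which is the kind of trade-off a DP explores and a greedy rule can miss.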


Efficiently Serving LLM Reasoning Programs with Certaindex

arXiv.org Artificial Intelligence

The rapid evolution of large language models (LLMs) has unlocked their capabilities in advanced reasoning tasks like mathematical problem-solving, code generation, and legal analysis. Central to this progress are inference-time reasoning algorithms, which refine outputs by exploring multiple solution paths, at the cost of increasing compute demands and response latencies. Existing serving systems fail to adapt to the scaling behaviors of these algorithms or the varying difficulty of queries, leading to inefficient resource use and unmet latency targets. We present Dynasor, a system that optimizes inference-time compute for LLM reasoning queries. Unlike traditional engines, Dynasor tracks and schedules requests within reasoning queries and uses Certaindex, a proxy that measures statistical reasoning progress based on model certainty, to guide compute allocation dynamically. Dynasor co-adapts scheduling with reasoning progress: it allocates more compute to hard queries, reduces compute for simpler ones, and terminates unpromising queries early, balancing accuracy, latency, and cost. On diverse datasets and algorithms, Dynasor reduces compute by up to 50% in batch processing and sustains 3.3x higher query rates or 4.7x tighter latency SLOs in online serving.
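One way to picture a certainty-guided allocation loop is sketched below. The certainty measure here (agreement among the final answers of sampled reasoning paths, via normalized entropy) is an assumption chosen for illustration and is not claimed to be Certaindex's exact definition; the 0.7 threshold and the action names are likewise hypothetical.

```python
import math
from collections import Counter

def answer_certainty(answers):
    """Certainty proxy in [0, 1] from the final answers of sampled reasoning paths.

    Computed as 1 minus the entropy of the empirical answer distribution, normalised by
    its maximum value log(n): 1.0 when all paths agree, 0.0 when all disagree.
    With fewer than two samples, agreement cannot be measured, so 0.0 is returned.
    """
    n = len(answers)
    if n < 2:
        return 0.0
    counts = Counter(answers)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(n)

def next_action(answers, step, max_steps, threshold=0.7):
    """Stop confident queries early, give up on ones that exhaust their budget, else continue."""
    if answer_certainty(answers) >= threshold:
        return "stop"        # confident enough; free the compute for other queries
    if step >= max_steps:
        return "terminate"   # budget exhausted without converging
    return "continue"        # still uncertain; keep allocating compute

print(next_action(["42", "42", "42", "42"], step=1, max_steps=8))
print(next_action(["1", "2", "3", "4"], step=2, max_steps=8))
print(next_action(["1", "2", "3", "4"], step=8, max_steps=8))
```

The three calls show the three outcomes: an easy query whose samples agree is stopped early, a hard query keeps receiving compute, and a query that never converges is cut off at its budget.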


Granularity at Scale: Estimating Neighborhood Socioeconomic Indicators from High-Resolution Orthographic Imagery and Hybrid Learning

arXiv.org Artificial Intelligence

Many areas of the world are without basic information on the socioeconomic well-being of the residing population due to limitations in existing data collection methods. Overhead images obtained remotely, such as from satellite or aircraft, can help serve as windows into the state of life on the ground and help "fill in the gaps" where community information is sparse, with estimates at smaller geographic scales requiring higher resolution sensors. Concurrent with improved sensor resolutions, recent advancements in machine learning and computer vision have made it possible to quickly extract features from and detect patterns in image data, in the process correlating these features with other information. In this work, we explore how well two approaches, a supervised convolutional neural network and semi-supervised clustering based on bag-of-visual-words, estimate population density, median household income, and educational attainment of individual neighborhoods from publicly available high-resolution imagery of cities throughout the United States. Results and analyses indicate that features extracted from the imagery can accurately estimate the density (R² up to 0.81) of neighborhoods, with the supervised approach able to explain about half the variation in a population's income and education. In addition to the presented approaches serving as a basis for further geographic generalization, the novel semi-supervised approach provides a foundation for future work seeking to estimate fine-scale information from aerial imagery without the need for labeled data.
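The bag-of-visual-words representation behind the semi-supervised approach can be sketched as two steps: learn a codebook of "visual words" by clustering local image descriptors, then describe each image by the normalized histogram of its descriptors' nearest codewords. This minimal version assumes descriptors are already extracted as numeric vectors and uses plain k-means; the paper's descriptor type, codebook size, and the step linking clusters to indicator values are not specified here. `r_squared` computes the coefficient of determination, the metric in which the density results above are reported.

```python
def nearest(vec, centers):
    """Index of the closest center under squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centers[i])))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, initialised with the first k points for determinism."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old center if a cluster becomes empty
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def bovw_histogram(descriptors, codebook):
    """Bag-of-visual-words: normalised count of descriptors assigned to each codeword."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

descriptors = [(0, 0), (10, 10), (0, 1), (10, 11)]   # toy 2-D descriptors pooled from images
codebook = kmeans(descriptors, k=2)
print(codebook)
print(bovw_histogram([(0, 0.2), (0, 0.9), (9, 10)], codebook))
```

The resulting per-image histograms can then be clustered themselves, which is what allows an approach of this kind to group neighborhoods without labels.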


AI Designs Decisions

#artificialintelligence

Havelock Ellis said it is not the attainment of the goal that matters, it is the things met with by the way. He was speaking of philosophy. In business, AI is all about goal attainment. The things met along the way are decisions. Decisions are a focus of Signal AI's recent survey of 1,000 C-suite executives, which attempts to estimate the impact of AI on the U.S. economy.


Priority to unemployed immigrants? A causal machine learning evaluation of training in Belgium

arXiv.org Machine Learning

We investigate heterogeneous employment effects of Flemish training programmes. Based on administrative individual data, we analyse programme effects at various aggregation levels using Modified Causal Forests (MCF), a causal machine learning estimator for multiple programmes. While all programmes have positive effects after the lock-in period, we find substantial heterogeneity across programmes and types of unemployed. Simulations show that assigning unemployed to programmes that maximise individual gains as identified in our estimation can considerably improve effectiveness. Simplified rules, such as one giving priority to unemployed with low employability, mostly recent migrants, lead to about half of the gains obtained by more sophisticated rules.
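The policy simulation can be illustrated by comparing assignment rules on estimated individual effects. This sketch takes the effect estimates as given (in the study they come from Modified Causal Forests, which are not implemented here); the programme names, the simplified rule, and all numbers are hypothetical and do not reproduce the paper's results.

```python
def assign_best(person):
    """Assign the programme with the largest estimated individual effect."""
    effects = person["effects"]
    return max(effects, key=effects.get)

def assign_simple(person):
    """Hypothetical simplified rule: prioritise low-employability individuals for one programme."""
    return "vocational" if person["low_employability"] else "job_search"

def assign_observed(person):
    """Baseline: the programme each person actually attended."""
    return person["observed"]

def mean_effect(people, rule):
    """Average estimated effect when every person is assigned by `rule`."""
    return sum(p["effects"][rule(p)] for p in people) / len(people)

def share_of_gain(people, rule, baseline=assign_observed, best=assign_best):
    """Fraction of the best rule's improvement over the baseline that `rule` achieves."""
    base = mean_effect(people, baseline)
    return (mean_effect(people, rule) - base) / (mean_effect(people, best) - base)

# Hypothetical estimated effects on employment probability, per person and programme.
people = [
    {"low_employability": True, "observed": "job_search",
     "effects": {"vocational": 0.10, "job_search": 0.02, "language": 0.12}},
    {"low_employability": False, "observed": "vocational",
     "effects": {"vocational": 0.03, "job_search": 0.05, "language": 0.01}},
    {"low_employability": True, "observed": "language",
     "effects": {"vocational": 0.08, "job_search": 0.01, "language": 0.04}},
]
print(round(share_of_gain(people, assign_simple), 3))
```

Expressing a simple rule's performance as a share of the best rule's gain over the observed allocation mirrors the kind of comparison the abstract reports, although the toy share here says nothing about the Flemish data.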


Automation and Artificial Intelligence: How machines are affecting people and places

#artificialintelligence

At first, technologists issued dystopian alarms about the power of automation and artificial intelligence (AI) to destroy jobs. Then came a correction, with a wave of reassurances. Now, the discourse appears to be arriving at a more complicated understanding, suggesting that automation will bring neither apocalypse nor utopia, but instead both benefits and stress alike. Such is the ambiguous and sometimes disembodied nature of the "future of work" discussion. Hence the analysis presented here.