0f83556a305d789b1d71815e8ea4f4b0-Supplemental.pdf

Neural Information Processing Systems

A.1 List of Neural Topic Modeling Works Used in Our Meta-Analysis: In Table 6, we report the forty publications used in our meta-analysis (Section 3), which are sourced from a survey of neural topic models (Zhao et al., 2021b). A.2 Preprocessing Details: Our steps are delineated in our implementation, but we list our choices here for easy reference. Corpus statistics are in Table 7. We use the default en-core-web-sm spaCy model (Honnibal et al., 2020), version 3.0.5. Document processing: We do not process documents with fewer than 25 whitespace-separated tokens.
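The document-length rule above can be illustrated with a minimal sketch. It assumes plain Python strings as documents and uses `str.split()` for whitespace tokenization; the function name `filter_short_documents` and the sample data are hypothetical and are not taken from the authors' implementation.

```python
def filter_short_documents(documents, min_tokens=25):
    """Keep only documents with at least `min_tokens` whitespace-separated tokens.

    Assumption: "whitespace-separated tokens" means the pieces returned by
    str.split() with no arguments, which splits on any run of whitespace.
    """
    return [doc for doc in documents if len(doc.split()) >= min_tokens]


# Hypothetical sample corpus: one short document and two long enough ones.
docs = [
    "too short to keep",
    " ".join(["token"] * 25),  # exactly 25 tokens: kept
    " ".join(["token"] * 40),  # 40 tokens: kept
]

kept = filter_short_documents(docs)
print(len(kept))
```

The threshold is inclusive of 25 because the text excludes only documents with fewer than 25 tokens.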


Reviewer 1: Unclear about the evaluation for outer iterations; Does the number of aggregated tasks affect

Neural Information Processing Systems

Yes, the total complexity is proportional to the number of aggregated tasks. Add experiments to compare ANIL and MAML and w.r.t. the size B of samples: Why sample size in inner-loop is not taken into analysis, as Fallah et al. [4] does: This setting has also been considered in Rajeswaran et al. [24], Ji et al. [13]. Reviewer 2: Dependence on κ. iMAML depends on κ in contrast to poly(κ) of this work: Add an experiment to verify the tightness: Great point! We will definitely add such an experiment in the revision. We will clarify it in the revision.


Automated Risk-of-Bias Assessment of Randomized Controlled Trials: A First Look at a GEPA-trained Programmatic Prompting Framework

arXiv.org Artificial Intelligence

Assessing risk of bias (RoB) in randomized controlled trials is essential for trustworthy evidence synthesis, but the process is resource-intensive and prone to variability across reviewers. Large language models (LLMs) offer a route to automation, but existing methods rely on manually engineered prompts that are difficult to reproduce, generalize, or evaluate. This study introduces a programmable RoB assessment pipeline that replaces ad-hoc prompt design with structured, code-based optimization using DSPy and its GEPA module. GEPA refines LLM reasoning through Pareto-guided search and produces inspectable execution traces, enabling transparent replication of every step in the optimization process. We evaluated the method on 100 RCTs from published meta-analyses across seven RoB domains. GEPA-generated prompts were applied to both open-weight models (Mistral Small 3.1 with GPT-oss-20b) and commercial models (GPT-5 Nano and GPT-5 Mini). In domains with clearer methodological reporting, such as Random Sequence Generation, GEPA-generated prompts performed best, with similar results for Allocation Concealment and Blinding of Participants, while the commercial model performed slightly better overall. We also compared GEPA with three manually designed prompts using Claude 3.5 Sonnet. GEPA achieved the highest overall accuracy and improved performance by 30%-40% in Random Sequence Generation and Selective Reporting, and showed generally comparable, competitively aligned performance in the other domains relative to manual prompts. These findings suggest that GEPA can produce consistent and reproducible prompts for RoB assessment, supporting the structured and principled use of LLMs in evidence synthesis.


Scaling Public Health Text Annotation: Zero-Shot Learning vs. Crowdsourcing for Improved Efficiency and Labeling Accuracy

arXiv.org Artificial Intelligence

Public health researchers are increasingly interested in using social media data to study health-related behaviors, but manually labeling this data can be labor-intensive and costly. This study explores whether zero-shot labeling using large language models (LLMs) can match or surpass conventional crowd-sourced annotation for Twitter posts related to sleep disorders, physical activity, and sedentary behavior. Multiple annotation pipelines were designed to compare labels produced by domain experts, crowd workers, and LLM-driven approaches under varied prompt-engineering strategies. Our findings indicate that LLMs can rival human performance in straightforward classification tasks and significantly reduce labeling time, yet their accuracy diminishes for tasks requiring more nuanced domain knowledge. These results clarify the trade-offs between automated scalability and human expertise, demonstrating conditions under which LLM-based labeling can be efficiently integrated into public health research without undermining label quality.


The ethical landscape of robot-assisted surgery. A systematic review

arXiv.org Artificial Intelligence

Background: Robot-assisted surgery has been widely adopted in recent years. However, compared to other health technologies operating in close proximity to patients in a vulnerable state, ethical issues of robot-assisted surgery have received less attention. Against the background of increasing automation that is expected to raise new ethical issues, this systematic review aims to map the state of the ethical debate in this field. Methods: A protocol was registered in the international prospective register of systematic reviews (PROSPERO CRD42023397951). Medline via PubMed, EMBASE, CINAHL, Philosopher's Index, IEEE Xplore, Web of Science (Core Collection), Scopus and Google Scholar were searched in January 2023. Screening, extraction, and analysis were conducted independently by two authors. A qualitative narrative synthesis was performed. Results: Out of 1,723 records, 66 records were included in the final dataset. Seven major strands of the ethical debate emerged during analysis. These include questions of harms and benefits, responsibility and control, professional-patient relationship, ethical issues in surgical training and learning, justice, translational questions, and economic considerations. Discussion: The identified themes testify to a broad range of different and differing ethical issues requiring careful deliberation and integration into the surgical ethos. Looking forward, we argue that a different perspective in addressing robotic surgical devices might be helpful to consider upcoming challenges of automation.


Classifying the reported ability in clinical mobility descriptions

arXiv.org Artificial Intelligence

Assessing how individuals perform different activities is key information for modeling health states of individuals and populations. Descriptions of activity performance in clinical free text are complex, including syntactic negation and similarities to textual entailment tasks. We explore a variety of methods for the novel task of classifying four types of assertions about activity performance: Able, Unable, Unclear, and None (no information). We find that ensembling an SVM trained with lexical features and a CNN achieves 77.9% macro F1 score on our task, and yields nearly 80% recall on the rare Unclear and Unable samples. Finally, we highlight several challenges in classifying performance assertions, including capturing information about sources of assistance, incorporating syntactic structure and negation scope, and handling new modalities at test time. Our findings establish a strong baseline for this novel task, and identify intriguing areas for further research.


If the Impact of Artificial Intelligence on Work is Unclear, What Can Schools Do?

#artificialintelligence

Artificial intelligence is already reshaping the labor market. Its impact will likely become even more disruptive. But experts have historically been bad at predicting which jobs and tasks will be lost to automation, and public officials have historically been slow to respond to technological advances with smart, effective regulations. That's the nutshell of a RAND Corporation report on "The Risks of Artificial Intelligence to Security and the Future of Work," released earlier this week. What can K-12 educators and policymakers take away from the work?


Unclear If Siri Speaker Will Have Display Screen Like Amazon's Echo Show

International Business Times

Amazon revealed the Echo Show Tuesday, and with coverage of the Alexa gadget came rumors about Apple's upcoming Siri speaker. Apple employees have been testing the Siri speaker in their homes for several months, sources familiar with the matter told Bloomberg. So far, it's unknown whether Apple's upcoming Siri speaker will come with a built-in display, like Amazon's Echo Show. Marketing chief Phil Schiller said last week in an interview he thinks voice assistant devices are beneficial, but that doesn't mean you'd never want a screen. KGI Securities analyst Ming-Chi Kuo previously said there's a more than 50 percent chance Apple could announce its Siri speaker at the Worldwide Developers Conference this June.