
Collaborating Authors

 Rosenfeld, Ariel


The Einstein Test: Towards a Practical Test of a Machine's Ability to Exhibit Superintelligence

arXiv.org Artificial Intelligence

Creative and disruptive insights (CDIs), such as the development of the theory of relativity, have punctuated human history, marking pivotal shifts in our intellectual trajectory. Recent advancements in artificial intelligence (AI) have sparked debates over whether state-of-the-art models possess the capacity to generate CDIs. We argue that the ability to create CDIs should be regarded as a significant feature of machine superintelligence (SI). To this end, we propose a practical test to evaluate whether an approach to AI targeting SI can yield novel insights of this kind: the Einstein test. Given the data available prior to the emergence of a known CDI, can an AI independently reproduce that insight (or one that is formally equivalent)? By achieving such a milestone, a machine can be considered to at least match humanity's past top intellectual achievements, and therefore to have the potential to surpass them.


Whose LLM is it Anyway? Linguistic Comparison and LLM Attribution for GPT-3.5, GPT-4 and Bard

arXiv.org Artificial Intelligence

Large Language Models (LLMs), such as GPT-3.5 [25], GPT-4 [1] and Bard [21], have revolutionized and popularized natural language processing and AI, demonstrating human-like and super-human performance in a wide range of text-based tasks [37]. While the layman may find the responses of LLMs hard to distinguish from human-generated ones [26, 3], a plethora of recent literature has shown that it is possible to successfully discern human-generated text from LLM-generated text using various computational techniques [30, 5, 15]. Among the developed techniques, the linguistic approach, which focuses on the structure, patterns, and nuances inherent in human language, stands out as a promising option that offers both high statistical performance [14] and theoretically-grounded explanatory power [23], as opposed to alternative "black-box" machine-learning techniques [2, 29]. Indeed, recent literature has shown that human- and LLM-generated texts are, generally speaking, linguistically different across a wide variety of tasks and datasets, including news reporting [23], hotel reviewing [9], essay writing [14] and scientific communication [6], to name a few. Common to these and similar studies is the observation that LLM-generated texts tend to be extensive and comprehensive, highly organized, logically structured and formally stated, and to present higher objectivity and a lower prevalence of bias and harmful content compared to human-generated texts [34]. Extensive research into human-generated texts has consistently demonstrated the inherent diversity in human writing styles, resulting in distinct linguistic patterns, structures, and nuances [27, 22, 28].
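The linguistic approach described above can be illustrated with a minimal sketch: extract a few stylometric features from a text and attribute it to whichever source (human or a given LLM) has the closest mean feature profile. The features and profiles below are illustrative stand-ins, not the feature set used in the paper.

```python
import re
from statistics import mean

def stylometric_features(text):
    """Toy linguistic profile: average sentence length (in words),
    type-token ratio, and commas per token. Illustrative only."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    return (
        mean(len(re.findall(r"[A-Za-z']+", s)) for s in sentences),
        len(set(tokens)) / len(tokens),
        text.count(",") / len(tokens),
    )

def nearest_profile(features, profiles):
    """Attribute a text to the source whose mean profile is closest
    (squared Euclidean distance over the feature tuple)."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(profiles, key=lambda name: dist(features, profiles[name]))
```

In practice, attribution models of this kind are trained on many labeled samples per source; the nearest-centroid rule here simply makes the "linguistic fingerprint" idea concrete.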


Detecting LLM-Assisted Writing in Scientific Communication: Are We There Yet?

arXiv.org Artificial Intelligence

Large Language Models (LLMs), exemplified by ChatGPT, have significantly reshaped text generation, particularly in the realm of writing assistance. While ethical considerations underscore the importance of transparently acknowledging LLM use, especially in scientific communication, genuine acknowledgment remains infrequent. A potential avenue to encourage accurate acknowledgment of LLM-assisted writing involves employing automated detectors. Our evaluation of four cutting-edge LLM-generated text detectors reveals their suboptimal performance compared to a simple ad-hoc detector designed to identify abrupt writing style changes around the time of LLM proliferation. We contend that the development of specialized detectors exclusively dedicated to LLM-assisted writing detection is necessary. Such detectors could play a crucial role in fostering more authentic recognition of LLM involvement in scientific communication, addressing the current challenges in acknowledgment practices.
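The ad-hoc detector mentioned above rests on a simple change-point idea: compare an author's average style score before and after a cutoff date. A minimal sketch, assuming a hypothetical per-year style score in [0, 1] (the paper's actual scoring is not reproduced here):

```python
def abrupt_style_change(yearly_scores, cutoff_year, threshold=0.15):
    """Flag an author whose mean style score shifts sharply after
    `cutoff_year` (e.g., the public release of ChatGPT).
    `yearly_scores` maps year -> hypothetical style score in [0, 1]."""
    before = [v for y, v in yearly_scores.items() if y < cutoff_year]
    after = [v for y, v in yearly_scores.items() if y >= cutoff_year]
    if not before or not after:
        return False  # cannot compare without data on both sides
    return abs(sum(after) / len(after) - sum(before) / len(before)) > threshold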


Mathematical Modeling of BCG-based Bladder Cancer Treatment Using Socio-Demographics

arXiv.org Artificial Intelligence

Cancer is one of the most widespread diseases around the world, with millions of new patients each year. Bladder cancer (BC) is one of the most prevalent types of cancer, affecting all individuals alike with no obvious prototypical patient. The current standard treatment for BC follows a routine weekly Bacillus Calmette-Guerin (BCG) immunotherapy protocol which is applied to all patients alike. The clinical outcomes associated with BCG treatment vary significantly among patients due to the biological and clinical complexity of the interaction between the immune system, treatments, and cancer cells. In this study, we take advantage of the patient's socio-demographics to offer a personalized mathematical model that describes the clinical dynamics associated with BCG-based treatment. To this end, we adopt a well-established BCG treatment model and integrate a machine learning component to temporally adjust and reconfigure key parameters within the model, thus promoting its personalization. Using real clinical data, we show that our personalized model compares favorably with the original one in predicting the number of cancer cells at the end of the treatment, with a 14.8% improvement on average.
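The structure of the approach, an ODE treatment model whose parameters are adjusted per patient by a learned component, can be sketched in miniature. The equation, parameter values, and adjustment rule below are toy stand-ins, not the paper's actual model:

```python
def simulate_tumor(c0, r, k, bcg_level, steps=100, dt=0.1):
    """Euler integration of a toy tumor/BCG interaction:
    dC/dt = r*C - k*B*C, where C is the cancer-cell count,
    r the growth rate, k the BCG-mediated kill rate, B the BCG dose."""
    c = c0
    for _ in range(steps):
        c += dt * (r * c - k * bcg_level * c)
        c = max(c, 0.0)
    return c

def personalize_kill_rate(base_k, age, is_smoker):
    """Stand-in for the machine learning component: nudge a model
    parameter using socio-demographic inputs (illustrative rule only)."""
    k = base_k * (1.0 - 0.002 * max(age - 50, 0))
    return k * (0.9 if is_smoker else 1.0)
```

In the paper the adjustment is learned from data and applied temporally during treatment; the fixed rule above only shows where such a component plugs into the dynamical model.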


Towards Outcome-Driven Patient Subgroups: A Machine Learning Analysis Across Six Depression Treatment Studies

arXiv.org Artificial Intelligence

Major depressive disorder (MDD) is a heterogeneous condition; multiple underlying neurobiological substrates could be associated with treatment response variability. Understanding the sources of this variability and predicting outcomes has been elusive. Machine learning has shown promise in predicting treatment response in MDD, but one limitation has been the lack of clinical interpretability of machine learning models. We analyzed data from six clinical trials of pharmacological treatment for depression (total n = 5438) using the Differential Prototypes Neural Network (DPNN), a neural network model that derives patient prototypes which can be used to identify treatment-relevant patient clusters while learning to generate probabilities for differential treatment response. A model classifying remission and outputting individual remission probabilities for five first-line monotherapies and three combination treatments was trained using clinical and demographic data. Model validity and clinical utility were measured based on area under the curve (AUC) and expected improvement in sample remission rate with model-guided treatment, respectively. Post-hoc analyses yielded clusters (subgroups) based on patient prototypes learned during training. Prototypes were evaluated for interpretability by assessing differences in feature distributions and treatment-specific outcomes. A 3-prototype model achieved an AUC of 0.66 and an expected absolute improvement in population remission rate compared to the sample remission rate. We identified three treatment-relevant patient clusters which were clinically interpretable. It is possible to produce novel treatment-relevant patient profiles using machine learning models; doing so may improve precision medicine for depression. Note: This model is not currently the subject of any active clinical trials and is not intended for clinical use.
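The clustering step described above assigns each patient to a learned prototype. A minimal sketch of that assignment, with hypothetical two-dimensional feature vectors standing in for the DPNN's learned representation:

```python
def assign_cluster(patient, prototypes):
    """Assign a patient feature vector to its nearest learned prototype
    (squared Euclidean distance); a toy stand-in for the DPNN's
    prototype-based clustering step."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(prototypes)), key=lambda i: dist(patient, prototypes[i]))
```

In the DPNN the prototypes are learned jointly with the remission-probability head; here they are just fixed points used to show how interpretable subgroups fall out of a prototype model.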


What Should We Optimize in Participatory Budgeting? An Experimental Study

arXiv.org Artificial Intelligence

Participatory Budgeting (PB) is a process in which voters decide how to allocate a common budget; most commonly, ordinary people -- in particular, residents of some municipality -- decide on a fraction of the municipal budget. From a social choice perspective, existing research on PB focuses almost exclusively on designing computationally-efficient aggregation methods that satisfy certain axiomatic properties deemed "desirable" by the research community. Our work complements this line of research through a user study (N = 215) involving several experiments aimed at identifying what potential voters (i.e., non-experts) deem fair or desirable in simple PB settings. Our results show that some modern PB aggregation techniques greatly differ from users' expectations, while other, more standard approaches, provide more aligned results. We also identify a few possible discrepancies between what non-experts consider "desirable" and how they perceive the notion of "fairness" in the PB context. Taken jointly, our results can be used to help the research community identify appropriate PB aggregation methods to use in practice.
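For concreteness, one standard PB aggregation baseline (greedy approval) can be sketched as follows; it is a common rule of the kind such studies compare, not necessarily one of the specific rules evaluated in the paper:

```python
def greedy_pb(projects, approvals, budget):
    """Greedy approval-based PB: fund projects in decreasing order of
    approval count (ties broken by lower cost), skipping any project
    that exceeds the remaining budget.

    projects  : dict mapping project name -> cost
    approvals : list of sets, one set of approved projects per voter
    """
    counts = {p: sum(p in ballot for ballot in approvals) for p in projects}
    funded, remaining = [], budget
    for p in sorted(projects, key=lambda p: (-counts[p], projects[p])):
        if projects[p] <= remaining:
            funded.append(p)
            remaining -= projects[p]
    return funded
```

Axiomatically motivated rules (e.g., proportionality-oriented methods) can select very different bundles from this greedy baseline, which is exactly the kind of divergence the user study probes against lay intuitions.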


Providing Explanations for Recommendations in Reciprocal Environments

arXiv.org Artificial Intelligence

Automated platforms which support users in finding a mutually beneficial match, such as online dating and job recruitment sites, are becoming increasingly popular. These platforms often include recommender systems that assist users in finding a suitable match. While recommender systems which provide explanations for their recommendations have shown many benefits, explanation methods have yet to be adapted and tested in recommending suitable matches. In this paper, we introduce and extensively evaluate the use of "reciprocal explanations" -- explanations which provide reasoning as to why both parties are expected to benefit from the match. Through an extensive empirical evaluation, in both simulated and real-world dating platforms with 287 human participants, we find that when the acceptance of a recommendation involves a significant cost (e.g., monetary or emotional), reciprocal explanations outperform standard explanation methods which consider the recommendation receiver alone. However, contrary to what one may expect, when the cost of accepting a recommendation is negligible, reciprocal explanations are shown to be less effective than the traditional explanation methods.
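The notion of a reciprocal explanation can be made concrete with a toy sketch: an explanation qualifies as reciprocal only if it can cite something the receiver likes about the candidate and something the candidate likes about the receiver. Attribute sets are illustrative; the paper's explanation-generation method is not reproduced here.

```python
def reciprocal_explanation(receiver_likes, candidate_likes,
                           receiver_traits, candidate_traits):
    """Toy reciprocal explanation: one candidate trait the receiver
    likes AND one receiver trait the candidate likes, so the reasoning
    covers both sides of the match. Returns None if either side lacks
    a supporting attribute (no two-sided rationale exists)."""
    for_receiver = sorted(set(receiver_likes) & set(candidate_traits))
    for_candidate = sorted(set(candidate_likes) & set(receiver_traits))
    if not for_receiver or not for_candidate:
        return None
    return (for_receiver[0], for_candidate[0])
```

A standard (one-sided) explanation would use only the first intersection; the study's finding is that adding the second side helps when accepting a recommendation is costly, and can hurt when it is not.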


Leveraging human knowledge in tabular reinforcement learning: A study of human subjects

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort and expertise on the human designer's part. To date, human factors are generally not considered in the development and evaluation of possible RL approaches. In this article, we set out to investigate how different methods for injecting human knowledge are applied, in practice, by human designers of varying levels of knowledge and skill. We perform the first empirical evaluation of several methods, including a newly proposed method named SASS which is based on the notion of similarities in the agent's state-action space. Through this human study, consisting of 51 human participants, we shed new light on the human factors that play a key role in RL. We find that the classical reward shaping technique seems to be the most natural method for most designers, both expert and non-expert, to speed up RL. However, we further find that our proposed method SASS can be effectively and efficiently combined with reward shaping, providing a beneficial alternative to using a single speedup method alone, with minimal overhead in human designer effort.
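The classical reward shaping technique mentioned above is typically implemented as potential-based shaping, which augments the environment reward without changing the optimal policy. A minimal sketch, where the potential function `phi` is a hypothetical stand-in for the designer's domain knowledge:

```python
def shaped_reward(reward, phi, s, s_next, gamma=0.99):
    """Potential-based reward shaping: add gamma*phi(s') - phi(s) to the
    environment reward. Because the shaping term telescopes along any
    trajectory, the optimal policy is preserved; `phi` encodes the
    designer's prior knowledge about which states are promising."""
    return reward + gamma * phi(s_next) - phi(s)
```

For example, in a navigation task a designer might set `phi(s) = -distance_to_goal(s)`, so steps toward the goal receive a positive shaping bonus even before any environment reward arrives.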


Predicting Human Decision-Making: From Prediction to Action

Morgan & Claypool Publishers

In this book, we explore the task of automatically predicting human decision-making and its use in designing intelligent human-aware automated computer systems of varying natures: from purely conflicting interaction settings (e.g., security and games) to fully cooperative interaction settings (e.g., autonomous driving and personal robotic assistants). ISBN 9781681732749, 150 pages.


Advice Provision for Energy Saving in Automobile Climate-Control System

AI Magazine

Reducing the energy consumption of climate control systems is important in order to reduce the human environmental footprint. Our approach takes into account both the energy consumption of the climate control system and the expected comfort level of the driver. We therefore build two models: one assesses the energy consumption of the climate control system as a function of the system's settings, and the other models the human comfort level as a function of those settings. Using these models, the agent advises the driver on how to set the climate control system.
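One simple way to combine the two models into advice is to pick the lowest-energy setting whose predicted comfort clears a threshold. This decision rule, the threshold, and the model signatures are illustrative assumptions, not the paper's actual advice-provision mechanism:

```python
def best_setting(settings, energy_model, comfort_model, min_comfort=0.7):
    """Recommend a climate-control setting: among candidates whose
    predicted comfort meets `min_comfort`, choose the one with the
    lowest predicted energy use; if none qualifies, fall back to the
    most comfortable setting."""
    comfortable = [s for s in settings if comfort_model(s) >= min_comfort]
    if not comfortable:
        return max(settings, key=comfort_model)
    return min(comfortable, key=energy_model)
```

Alternative formulations (e.g., maximizing a weighted sum of comfort and negative energy) fit the same two-model structure; the key point is that advice trades the two predictions off explicitly.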