Collaborating Authors

 microsoft research


Signal Alums Reveal 'Encrypted Spaces,' a System for Making Private Collaboration Apps

WIRED

The new open-source project could serve as the basis for a future of apps with features as complex as Slack, Discord, or Google Docs--but with added protection against surveillance. End-to-end encryption, in which data is encoded so that only users on either "end" of a conversation can decrypt their communications--and not the server that relays that information or any other interloper--has become the standard for modern privacy on the internet. But its very name suggests a kind of simple pipe with two openings. The metaphor, and often the encryption technology that has enabled that model, doesn't fit neatly onto the world of Slack, Discord, Google Docs, and the other multiuser, complex, collaborative software where people now live and work. So one group of cryptographers has built what they describe as the foundation for a new generation of end-to-end encrypted apps, with a new metaphor: Instead of a mere pipe, they want to create "spaces" where users can hold group conversations, host information on a server, collectively make changes to it, invite in new collaborators or kick them out, all while maintaining the same strong encryption protections that prevent the server or network eavesdroppers from accessing their data.


Medical High-risk

Neural Information Processing Systems

T5: For any policy, state s, and treatment a, if (s, a) 1+ Q D(s, a) and 2[0,1] exists that PD(s, a)+ FD(s, a), then (s, a) 1. T6: For any policy, state s, and treatment a, if (s, a) Q R(s, a) and 2[0,1] exists PR(s, a)+ FR(s, a), then (s, a).


Exponential Shift: Humans Adapt to AI Economies

arXiv.org Artificial Intelligence

This paper explores how artificial intelligence (AI) and robotics are transforming the global labor market. Human workers, limited to a 33% duty cycle due to rest and holidays, cost $14 to $55 per hour. In contrast, digital labor operates nearly 24/7 at just $0.10 to $0.50 per hour. We examine sectors like healthcare, education, manufacturing, and retail, finding that 40-70% of tasks could be automated. Yet, human skills like emotional intelligence and adaptability remain essential. Humans process 5,000-20,000 tokens (units of information) per hour, while AI far exceeds this, though its energy use, 3.5 to 7 times higher than that of humans, could offset 20-40% of cost savings. Using real-world examples, such as AI in journalism and law, we illustrate these dynamics and propose six strategies, such as a 4-day workweek and retraining, to ensure a fair transition to an AI-driven economy.
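The hourly figures quoted above can be combined into a rough back-of-the-envelope comparison. The sketch below is only illustrative: the function names are hypothetical, and treating the 20-40% energy offset as a simple multiplicative reduction of the gross hourly saving is an assumption, not a formula taken from the paper.

```python
# Figures quoted in the abstract (USD per hour, and share of savings).
HUMAN_COST = (14.0, 55.0)      # low and high human labor cost
DIGITAL_COST = (0.10, 0.50)    # low and high digital labor cost
ENERGY_OFFSET = (0.20, 0.40)   # share of cost savings offset by energy use


def net_saving_per_hour(human_cost, digital_cost, energy_offset):
    """Net hourly saving, ASSUMING the offset scales the gross saving."""
    gross = human_cost - digital_cost
    return gross * (1.0 - energy_offset)


def cost_ratio(human_cost, digital_cost):
    """How many times more expensive an hour of human labor is."""
    return human_cost / digital_cost


# Most favourable pairing for digital labor: expensive human, cheap AI,
# smallest energy offset.
best_case = net_saving_per_hour(HUMAN_COST[1], DIGITAL_COST[0], ENERGY_OFFSET[0])

# Least favourable pairing: cheap human, expensive AI, largest offset.
worst_case = net_saving_per_hour(HUMAN_COST[0], DIGITAL_COST[1], ENERGY_OFFSET[1])

narrowest_ratio = cost_ratio(HUMAN_COST[0], DIGITAL_COST[1])
widest_ratio = cost_ratio(HUMAN_COST[1], DIGITAL_COST[0])
```

Under these assumptions the quoted ranges still leave a positive net saving per hour at both extremes, which is consistent with the abstract's claim that energy use offsets only part of the cost advantage.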


AI Red-Teaming is a Sociotechnical System. Now What?

arXiv.org Artificial Intelligence

Whether tapped directly on the web, or embedded in software suites, search engines, and social media platforms, LLMs are everywhere. When a technology jumps this quickly from theoretical plaything to consumer service, many other elements are also settling in around it, without much forethought: interfaces, policies, business models, labor arrangements, infrastructural assurances, complementary technologies, public claims, advertising campaigns, regulations. Researchers studying the workings and implications of these technologies, across computer science, engineering, the social sciences, humanities, and law, must gear up just as fast to study not just the core technology, but the sociotechnical system taking shape around it[19]. Many of these decisions, arrangements, and infrastructures may turn out to be as consequential for users and the broader public as the core technology itself. But the boisterous promises and debates that surround a new technology can obscure these other essential elements that make technologies always more than the sum of their engineered parts. In this essay, we hope to call upon computer scientists and social scientists alike to pay closer, critical attention to the phenomenon of "red-teaming."


GEMS: Generative Expert Metric System through Iterative Prompt Priming

arXiv.org Artificial Intelligence

Across domains, metrics and measurements are fundamental to identifying challenges, informing decisions, and resolving conflicts. Despite the abundance of data available in this information age, not only can it be challenging for a single expert to work across multi-disciplinary data, but non-experts can also find it unintuitive to create effective measures or transform theories into context-specific metrics that are chosen appropriately. This technical report addresses this challenge by examining software communities within large software corporations, where different measures are used as proxies to locate counterparts within the organization to transfer tacit knowledge. We propose a prompt-engineering framework inspired by neural activities, demonstrating that generative models can extract and summarize theories and perform basic reasoning, thereby transforming concepts into context-aware metrics to support software communities given software repository data. While this research zoomed in on software communities, we believe the framework's applicability extends across various fields, showcasing expert-theory-inspired metrics that aid in triaging complex challenges.


AutoVerus: Automated Proof Generation for Rust Code

arXiv.org Artificial Intelligence

Generative AI has shown its value for many software engineering tasks. Still in its infancy, large language model (LLM)-based proof generation lags behind LLM-based code generation. In this paper, we present AutoVerus. AutoVerus uses LLMs to automatically generate correctness proofs for Rust code. AutoVerus is designed to match the unique features of Verus, a verification tool that can prove the correctness of Rust code using proofs and specifications also written in Rust. AutoVerus consists of a network of LLM agents that are crafted and orchestrated to mimic human experts' three phases of proof construction: preliminary proof generation, proof refinement guided by generic tips, and proof debugging guided by verification errors. To thoroughly evaluate AutoVerus and help foster future research in this direction, we have built a benchmark suite of 150 non-trivial proof tasks, based on existing code-generation benchmarks and verification benchmarks. Our evaluation shows that AutoVerus can automatically generate correct proofs for more than 90% of them, with more than half of them tackled in less than 30 seconds or 3 LLM calls.
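The three phases named above (preliminary generation, refinement, error-guided debugging) suggest a simple control loop. The following is a minimal sketch of such a loop, not AutoVerus's actual implementation: the function `prove`, its parameters, and the toy stand-ins for the LLM agents and the verifier are all hypothetical, and the keyword-based `toy_verify` merely imitates a verifier that returns a list of errors.

```python
from typing import Callable, List, Optional


def prove(
    code: str,
    generate: Callable[[str], str],
    refine: Callable[[str], str],
    debug: Callable[[str, List[str]], str],
    verify: Callable[[str], List[str]],
    max_debug_rounds: int = 3,
) -> Optional[str]:
    """Return a proof the verifier accepts, or None if none was found."""
    # Phase 1: preliminary proof generation.
    proof = generate(code)
    if not verify(proof):
        return proof
    # Phase 2: refinement guided by generic tips.
    proof = refine(proof)
    # Phase 3: debugging guided by the verifier's error messages.
    for _ in range(max_debug_rounds):
        errors = verify(proof)
        if not errors:
            return proof
        proof = debug(proof, errors)
    return proof if not verify(proof) else None


# Toy stand-ins (hypothetical) so the sketch runs without an LLM or Verus.
def toy_verify(proof: str) -> List[str]:
    # An empty list means "verified".
    return [f"missing {kw}" for kw in ("invariant", "decreases") if kw not in proof]


def toy_generate(code: str) -> str:
    return code + "\n// proof skeleton"


def toy_refine(proof: str) -> str:
    return proof + "\n// invariant added"


def toy_debug(proof: str, errors: List[str]) -> str:
    # Address only the first reported error in each round.
    return proof + "\n// " + errors[0].replace("missing ", "") + " added"


result = prove("fn sum(n: u64) -> u64", toy_generate, toy_refine, toy_debug, toy_verify)
```

Keeping the verifier as the only judge of success means a phase can be swapped or skipped without changing the loop's exit condition.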


The Human Factor in AI Red Teaming: Perspectives from Social and Collaborative Computing

arXiv.org Artificial Intelligence

Rapid progress in general-purpose AI has sparked significant interest in "red teaming," a practice of adversarial testing originating in military and cybersecurity applications. AI red teaming raises many questions about the human factor, such as how red teamers are selected, biases and blind spots in how tests are conducted, and harmful content's psychological effects on red teamers. A growing body of HCI and CSCW literature examines related practices, including data labeling, content moderation, and algorithmic auditing. However, few, if any, have investigated red teaming itself. This workshop seeks to consider the conceptual and empirical challenges associated with this practice, often rendered opaque by non-disclosure agreements. Future studies may explore topics ranging from fairness to mental health and other areas of potential harm. We aim to facilitate a community of researchers and practitioners who can begin to meet these challenges with creativity, innovation, and thoughtful reflection.


Biomedical knowledge graph-enhanced prompt generation for large language models

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have been driving progress in AI at an unprecedented rate, yet still face challenges in knowledge-intensive domains like biomedicine. Solutions such as pre-training and domain-specific fine-tuning add substantial computational overhead, and the latter requires domain expertise. External knowledge infusion is task-specific and requires model training. Here, we introduce a task-agnostic Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework by leveraging the massive biomedical KG SPOKE with LLMs such as Llama-2-13b, GPT-3.5-Turbo and GPT-4, to generate meaningful biomedical text rooted in established knowledge. KG-RAG consistently enhanced the performance of LLMs across various prompt types, including one-hop and two-hop prompts, drug repurposing queries, biomedical true/false questions, and multiple-choice questions (MCQ). Notably, KG-RAG provides a remarkable 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework's capacity to empower open-source models with fewer parameters for domain-specific questions. Furthermore, KG-RAG enhanced the performance of proprietary GPT models, such as GPT-3.5, which exhibited improvement over GPT-4 in context utilization on MCQ data. Our approach was also able to address drug repurposing questions, returning meaningful repurposing suggestions. In summary, the proposed framework combines explicit and implicit knowledge of KG and LLM, respectively, in an optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions in a unified framework.
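The general idea of knowledge-graph retrieval feeding a prompt can be illustrated with a toy example. This is a hedged sketch only: the three-triple graph, the placeholder entity names (`DrugA`, `DiseaseX`, `GeneY`, `DiseaseZ`), and the functions `retrieve` and `build_prompt` are hypothetical, substring matching stands in for real entity recognition, and nothing here reflects SPOKE's interface or the paper's actual pipeline.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy graph with placeholder entities, not real biomedical facts.
TOY_GRAPH: List[Triple] = [
    ("DrugA", "treats", "DiseaseX"),
    ("DrugA", "targets", "GeneY"),
    ("GeneY", "associated_with", "DiseaseZ"),
]


def retrieve(question: str, graph: List[Triple], hops: int = 1) -> List[Triple]:
    """Collect triples within `hops` edges of entities named in the question."""
    entities = {e for s, _, o in graph for e in (s, o)}
    # Naive entity matching: case-insensitive substring search.
    frontier = {e for e in entities if e.lower() in question.lower()}
    seen = set(frontier)
    found: List[Triple] = []
    for _ in range(hops):
        next_frontier = set()
        for triple in graph:
            s, _, o = triple
            if (s in frontier or o in frontier) and triple not in found:
                found.append(triple)
                next_frontier.update({s, o} - seen)
        seen |= next_frontier
        frontier = next_frontier
    return found


def build_prompt(question: str, graph: List[Triple], hops: int = 1) -> str:
    """Prepend the retrieved triples to the question as plain-text context."""
    facts = retrieve(question, graph, hops)
    context = "\n".join(f"- {s} {p} {o}" for s, p, o in facts)
    return f"Context:\n{context}\n\nQuestion: {question}"


question = "What does DrugA treat?"
one_hop = retrieve(question, TOY_GRAPH, hops=1)
two_hop = retrieve(question, TOY_GRAPH, hops=2)
prompt = build_prompt(question, TOY_GRAPH, hops=2)
```

With `hops=2` the retrieval also reaches the `GeneY` to `DiseaseZ` edge, which is the kind of indirect link a two-hop prompt would need.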


AdaMedGraph: Adaboosting Graph Neural Networks for Personalized Medicine

arXiv.org Artificial Intelligence

Precision medicine tailored to individual patients has gained significant attention in recent times. Machine learning techniques are now employed to process personalized data from various sources, including images, genetics, and assessments. These techniques have demonstrated good outcomes in many clinical prediction tasks. Notably, the approach of constructing graphs by linking similar patients and then applying graph neural networks (GNNs) stands out, because related information from analogous patients is aggregated and considered for prediction. However, selecting the appropriate edge feature to define patient similarity and construct the graph is challenging, given that each patient is depicted by high-dimensional features from diverse sources. Previous studies rely on human expertise to select the edge feature, which is neither scalable nor efficient in pinpointing crucial edge features for complex diseases. In this paper, we propose a novel algorithm named AdaMedGraph, which can automatically select important features to construct multiple patient similarity graphs, and train GNNs based on these graphs as weak learners in adaptive boosting. AdaMedGraph is evaluated on two real-world medical scenarios and shows superior performance.
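The adaptive-boosting idea of picking one feature per weak learner can be shown with a deliberately simplified stand-in. In the sketch below, each weak learner is a one-feature threshold stump rather than a GNN trained on a patient-similarity graph; that substitution, the function names, and the six-sample toy dataset are all assumptions made for illustration and are not the paper's algorithm.

```python
import math


def stump_predict(x, feature, threshold, sign):
    """Predict +sign if x[feature] exceeds the threshold, else -sign."""
    return sign if x[feature] > threshold else -sign


def best_stump(X, y, w):
    """Return (weighted error, feature, threshold, sign) with the lowest error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum(
                    wi for xi, yi, wi in zip(X, y, w)
                    if stump_predict(xi, f, t, sign) != yi
                )
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best


def adaboost(X, y, rounds=3):
    """Discrete AdaBoost over labels in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, t, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, sign))
        # Up-weight misclassified samples, then renormalise.
        w = [
            wi * math.exp(-alpha * yi * stump_predict(xi, f, t, sign))
            for xi, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(a * stump_predict(x, f, t, s) for a, f, t, s in ensemble)
    return 1 if score >= 0 else -1


# Toy data: feature 0 is noise, feature 1 separates the two classes.
X = [(5, 0.1), (1, 0.2), (4, 0.3), (2, 0.8), (5, 0.9), (1, 0.7)]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y)
```

The feature index stored in each ensemble entry plays the role of the automatically selected edge feature: boosting chooses it from the data instead of relying on a human expert.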


Safurai-Csharp: Harnessing Synthetic Data to improve language-specific Code LLM

arXiv.org Artificial Intelligence

This paper introduces Safurai-Csharp, an open-source model designed to specialize in the generation, completion, and debugging of C# code. Safurai-Csharp is built upon the novel CodeLlama 34B model and leverages the EvolInstruct technique, creating a refined and expanded dataset for its fine-tuning process. The results of its performance, a notable score of 56.33% on the Manual MultiPL-E benchmark (Zero-Shot, Pass@1), signal its high capacity to streamline developers' workflows and aid code learning. It shows promise in setting new stakes in the landscape of open-source C# LLMs and hopes to inspire more inclusive and wide-ranging development in the field of language-specific LLMs.
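The abstract does not say how the Pass@1 score was computed. One widely used definition is the unbiased pass@k estimator, which for n generated samples of which c pass the tests gives 1 - C(n-c, k) / C(n, k), averaged over problems. The sketch below assumes that definition; the function names are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples, c of them correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


# With k = 1 the estimator reduces to the fraction of correct samples, c / n.
single = pass_at_k(n=10, c=3, k=1)

# With one sample per problem, pass@1 is the share of problems solved.
benchmark = mean_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 0)], k=1)
```

In a single-sample setting the score is therefore just the percentage of benchmark problems whose one generated solution passes its tests.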