Law
'Progressive except for Palestine': how a tech charity imploded over a statement on Gaza
Miliaku Nwabueze, a senior program manager at Code for Science & Society, had been concerned for some time about the role of technology in state violence. Then, on 7 October of last year, Hamas entered Israel, killing and kidnapping about 1,400 people. Less than a week later, as Israel ordered 1.1 million Palestinians out of northern Gaza in the onset of its deadly retaliation, Nwabueze decided to write a message to her colleagues on the US-based non-profit organization's Slack channel. "Hey y'all … I have been watching multiple genocides around the world," she began, naming Palestine as well as Sudan, the Congo and Artsakh. "All of these have heavy linkages to the tech industry." The 30-year-old went on to assert that CS&S – whose stated mission is to "advance the power of data to improve the social and economic lives of all people" – should say, at the minimum, "we support demands for a ceasefire" in Gaza.
'It feels like a startup energy': Google's UK boss on the advent of AI
Google's central London office cost as much as a tech unicorn and the company's UK boss, Debbie Weinstein, says it pulses with a similar spirit. "It feels like a startup energy," she says. However, we are meeting on a morning when Google has been threatened with a reckoning reserved for members of the corporate establishment, not tech ingenues: a breakup. Hours earlier, the US Department of Justice had asked a federal judge to order the sale of Google's Chrome browser, along with a host of other actions including making its search index – a database of all the webpages it has crawled – available to competitors. It follows a ruling by the same judge in August that the $2tn company has built an illegal monopoly in the search market.
Adaptive Two-Phase Finetuning LLMs for Japanese Legal Text Retrieval
Trung, Quang Hoang, Phuc, Nguyen Van Hoang, Hoang, Le Trung, Hieu, Quang Huu, Duy, Vo Nguyen Le
Text Retrieval (TR) involves finding and retrieving text-based content relevant to a user's query from a large repository, with applications in real-world scenarios such as legal document retrieval. While most existing studies focus on English, limited work addresses Japanese contexts. In this paper, we introduce a new dataset specifically designed for Japanese legal contexts and propose a novel two-phase pipeline tailored to this domain. In the first phase, the model learns a broad understanding of global contexts, enhancing its generalization and adaptability to diverse queries. In the second phase, the model is fine-tuned to address complex queries specific to legal scenarios. Extensive experiments are conducted to demonstrate the superior performance of our method, which outperforms existing baselines. Furthermore, our pipeline proves effective in English contexts, surpassing comparable baselines on the MS MARCO dataset. We have made our code publicly available on GitHub, and the model checkpoints are accessible via HuggingFace.
Words and Action: Modeling Linguistic Leadership in #BlackLivesMatter Communities
Roytburg, Dani, Olorunisola, Deborah, Soni, Sandeep, Klein, Lauren
In this project we describe a method of modeling semantic leadership across a set of communities associated with the #BlackLivesMatter movement, which has been informed by qualitative research on the structure of social media and Black Twitter in particular. We describe our bespoke approaches to time-binning, community clustering, and connecting communities over time, as well as our adaptation of state-of-the-art approaches to semantic change detection and semantic leadership induction. We find substantial evidence of the leadership role of BLM activists and progressives, as well as Black celebrities. We also find evidence of the sustained engagement of the conservative community with this discourse, suggesting an alternative explanation for how we arrived at the present moment, in which "anti-woke" and "anti-CRT" bills are being enacted nation-wide.
GerPS-Compare: Comparing NER methods for legal norm analysis
Bachinger, Sarah T., Unger, Christoph, Erd, Robin, Feddoul, Leila, Lachenmaier, Clara, Zarrieß, Sina, König-Ries, Birgitta
We apply NER to a particular sub-genre of legal texts in German: the genre of legal norms regulating administrative processes in public service administration. The analysis of such texts involves identifying stretches of text that instantiate one of ten classes identified by public service administration professionals. We investigate and compare three methods for performing Named Entity Recognition (NER) to detect these classes: a rule-based system, deep discriminative models, and a deep generative model. Our results show that deep discriminative models outperform both the rule-based system and the deep generative model; the latter two perform roughly equally well, each outperforming the other in different classes. The main cause of this somewhat surprising result is arguably that the classes used in the analysis are semantically and syntactically heterogeneous, in contrast to the classes used in more standard NER tasks. Deep discriminative models appear to be better equipped for dealing with this heterogeneity than both generic LLMs and human linguists designing rule-based NER systems.
Removing Spurious Correlation from Neural Network Interpretations
Fotouhi, Milad, Bahadori, Mohammad Taha, Feyisetan, Oluwaseyi, Arabshahi, Payman, Heckerman, David
Existing algorithms for identifying the neurons responsible for undesired and harmful behaviors do not consider the effects of confounders such as the topic of the conversation. In this work, we show that confounders can create spurious correlations and propose a new causal mediation approach that controls for the impact of the topic. In experiments with two large language models, we study the localization hypothesis and show that, after adjusting for the effect of conversation topic, toxicity becomes less localized.
Patent-CR: A Dataset for Patent Claim Revision
Jiang, Lekang, Scherz, Pascal A, Goetz, Stephan
This paper presents Patent-CR, the first dataset created for the patent claim revision task in English. It includes both initial patent applications rejected by patent examiners and the final granted versions. Unlike normal text revision tasks that predominantly focus on enhancing sentence quality, such as grammar correction and coherence improvement, patent claim revision aims at ensuring the claims meet stringent legal criteria. These criteria go beyond novelty and inventiveness, and include clarity of scope, technical accuracy, language precision, and legal robustness. We assess various large language models (LLMs) through professional human evaluation, including general LLMs with different sizes and architectures, text revision models, and domain-specific models. Our results indicate that LLMs often produce ineffective edits that deviate from the target revisions. In addition, domain-specific models and the method of fine-tuning show promising results. Notably, GPT-4 outperforms other tested LLMs, but further revisions are still necessary to reach the examination standard. Furthermore, we demonstrate the inconsistency between automated and human evaluation results, suggesting that GPT-4-based automated evaluation has the highest correlation with human judgment. This dataset, along with our preliminary empirical research, offers invaluable insights for further exploration in patent claim revision.
Four Guiding Principles for Modeling Causal Domain Knowledge: A Case Study on Brainstorming Approaches for Urban Blight Analysis
Razouk, Houssam, Leitner, Michael, Kern, Roman
Urban blight is a problem of high interest for planning and policy making. Researchers frequently propose theories about the relationships between urban blight indicators, focusing on relationships reflecting causality. In this paper, we improve on the integration of domain knowledge in the analysis of urban blight by introducing four rules for effective modeling of causal domain knowledge. By investigating cognitive maps developed for urban blight analysis, this study reveals significant deviations from causal modeling guidelines. These findings provide valuable insights that will inform future work on urban blight, ultimately enhancing our understanding of the complex interactions underlying urban blight.
Impromptu Cybercrime Euphemism Detection
Li, Xiang, Zhou, Yucheng, Zhao, Laiping, Li, Jing, Liu, Fangming
Detecting euphemisms is essential for content security on various social media platforms, but existing methods designed for detecting euphemisms are ineffective on impromptu euphemisms. In this work, we make a first attempt at exploring impromptu euphemism detection and introduce the Impromptu Cybercrime Euphemisms Detection (ICED) dataset. Moreover, we propose a detection framework tailored to this problem, which employs context augmentation modeling and multi-round iterative training. Our detection framework mainly consists of a coarse-grained and a fine-grained classification model. The coarse-grained classification model removes most of the harmless content in the corpus to be detected. The fine-grained model, the impromptu euphemism detector, integrates context augmentation and multi-round iterative training to better predict the actual meaning of a masked token. In addition, we leverage ChatGPT to evaluate the model's capability. Experimental results demonstrate that our approach achieves a remarkable 76-fold improvement compared to the previous state-of-the-art euphemism detector.
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
Kirk, Hannah Rose, Whitefield, Alexander, Röttger, Paul, Bean, Andrew, Margatina, Katerina, Ciro, Juan, Mosquera, Rafael, Bartolo, Max, Williams, Adina, He, He, Vidgen, Bertie, Hale, Scott A.
Human feedback is central to the alignment of Large Language Models (LLMs). However, open questions remain about methods (how), domains (where), people (who) and objectives (to what end) of feedback processes. To navigate these questions, we introduce PRISM, a dataset that maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 LLMs. With PRISM, we contribute (i) wider geographic and demographic participation in feedback; (ii) census-representative samples for two countries (UK, US); and (iii) individualised ratings that link to detailed participant profiles, permitting personalisation and attribution of sample artefacts. We target subjective and multicultural perspectives on value-laden and controversial issues, where we expect interpersonal and cross-cultural disagreement. We use PRISM in three case studies to demonstrate the need for careful consideration of which humans provide what alignment data.