commons
- Europe > United Kingdom > England > Greater London > London (0.05)
- South America > Brazil > São Paulo (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (7 more...)
- Leisure & Entertainment > Games (0.48)
- Food & Agriculture > Fishing (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
The Art of the Impersonal Essay, by Zadie Smith
In my experience, every kind of writing requires some kind of self-soothing Jedi mind trick, and, when it comes to essay composition, the rectangle is mine. What had seemed an impossible task transformed into a practical matter of six little arrows. The first essay anybody writes is for school. But the only examples I remember are the ones I wrote at the end, in my A-level exams. One compared Hitler to Stalin. I was proudest of the essay that considered whether the poet John Milton--pace William Blake--was "of the devil's party without knowing it." I did well on those standardized tests, but even passing was far from a foregone conclusion. I'd screwed up my mocks, the year before, smoking too much weed and studying rarely. Since then, I'd cleaned up my act--a bit--but was still overwhelmed by the task before me. My rested on a few essays written in the school hall under a three-hour time constraint?
- North America > United States > New York (0.04)
- North America > United States > California (0.04)
- Education (1.00)
- Health & Medicine > Therapeutic Area (0.34)
Peers vote to defy government over copyright threat from AI
Peers voted by 221 to 116 on Wednesday to insist on an amendment to force AI companies to be transparent about what material they use to train their models. He added: "We will not let the government forget their promise to support our creative industries. We will not back down and we will not quietly go away. This is just the beginning." Resistance to the changes in the Lords has been led by Beeban Kidron, a cross-bench peer and film director, whose amendments have been repeatedly backed by the upper chamber.
- Government (1.00)
- Law > Intellectual Property & Technology Law (0.52)
- Law > Statutes (0.33)
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
Langlais, Pierre-Carl, Hinostroza, Carlos Rosas, Nee, Mattia, Arnett, Catherine, Chizhov, Pavel, Jones, Eliot Krzystof, Girard, Irène, Mach, David, Stasenko, Anastasia, Yamshchikov, Ivan P.
Large Language Models (LLMs) are pre-trained on large amounts of data from different sources and domains. These data most often contain trillions of tokens with large portions of copyrighted or proprietary content, which hinders the usage of such models under AI legislation. This raises the need for truly open pre-training data that is compliant with the data security regulations. In this paper, we introduce Common Corpus, the largest open dataset for language model pre-training. The data assembled in Common Corpus are either uncopyrighted or under permissible licenses and amount to about two trillion tokens. The dataset contains a wide variety of languages, ranging from the main European languages to low-resource ones rarely present in pre-training datasets; in addition, it includes a large portion of code data. The diversity of data sources in terms of covered domains and time periods opens up the paths for both research and entrepreneurial needs in diverse areas of knowledge. In this technical report, we present the detailed provenance of data assembling and the details of dataset filtering and curation. Common Corpus is already used by industry leaders such as Anthropic and by multiple LLM training projects, and we believe that it will become a critical infrastructure for open science research in LLMs.
- Oceania > New Zealand (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > Dominican Republic (0.04)
- (13 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > North America Government > United States Government (0.68)
Self-organisation of common good usage and an application to Internet services
Pires, Diogo L., Mancuso, Vincenzo, Castagno, Paolo, Marsan, Marco Ajmone
Natural and human-made common goods present key challenges due to their susceptibility to degradation, overuse, or congestion. We explore the self-organisation of their usage when individuals have access to several available commons but limited information on them. We propose an extension of the Win-Stay, Lose-Shift (WSLS) strategy for such systems, under which individuals use a resource iteratively until they are unsuccessful and then shift randomly. This simple strategy leads to a distribution of the use of commons with an improvement against random shifting. Selective individuals who retain information on their usage and accordingly adapt their tolerance to failure in each common good improve the average experienced quality for an entire population. Hybrid systems of selective and non-selective individuals can lead to an equilibrium with equalised experienced quality akin to the ideal free distribution. We show that these results can be applied to the server selection problem faced by mobile users accessing Internet services and we perform realistic simulations to test their validity. Furthermore, these findings can be used to understand other real systems such as animal dispersal on grazing and foraging land, and to propose solutions to operators of systems of public transport or other technological commons.
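The Win-Stay, Lose-Shift rule described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' model: the congestion rule (a user succeeds with probability capacity/load, capped at 1), the function name `simulate_wsls`, and all parameter values are assumptions made here for the example.

```python
import random

def simulate_wsls(n_users, capacities, rounds, seed=0):
    """Win-Stay, Lose-Shift over several commons (illustrative congestion model).

    Requires at least two commons so that a losing user has somewhere to shift.
    Returns (overall success rate, final number of users on each common).
    """
    rng = random.Random(seed)
    k = len(capacities)
    # Each user starts on a uniformly random common.
    choice = [rng.randrange(k) for _ in range(n_users)]
    successes = 0
    for _ in range(rounds):
        # Count how many users sit on each common at the start of the round.
        load = [0] * k
        for c in choice:
            load[c] += 1
        for u in range(n_users):
            c = choice[u]
            # Assumed success rule: probability capacity/load, capped at 1.
            p_success = min(1.0, capacities[c] / load[c])
            if rng.random() < p_success:
                successes += 1  # win: stay on the same common
            else:
                # lose: shift to a different common chosen uniformly at random
                choice[u] = rng.choice([j for j in range(k) if j != c])
    final_load = [choice.count(j) for j in range(k)]
    return successes / (n_users * rounds), final_load

rate, load = simulate_wsls(n_users=100, capacities=[10, 30, 60], rounds=200)
print(rate, load)
```

Under these assumptions, users on an uncongested common never leave it, so occupancy tends to drift toward the capacities. The selective variant in the abstract, where individuals adapt their tolerance to failure per common, is not modelled here.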
- Europe > Italy > Piedmont > Turin Province > Turin (0.14)
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (3 more...)
- Transportation (0.67)
- Telecommunications (0.67)
- Information Technology > Networks (0.34)
U-Net in Medical Image Segmentation: A Review of Its Applications Across Modalities
Neha, Fnu, Bhati, Deepshikha, Shukla, Deepak Kumar, Dalvi, Sonavi Makarand, Mantzou, Nikolaos, Shubbar, Safa
Medical imaging is essential in healthcare to provide key insights into patient anatomy and pathology, aiding in diagnosis and treatment. Non-invasive techniques such as X-ray, Magnetic Resonance Imaging (MRI), Computed Tomography (CT), and Ultrasound (US), capture detailed images of organs, tissues, and abnormalities. Effective analysis of these images requires precise segmentation to delineate regions of interest (ROI), such as organs or lesions. Traditional segmentation methods, relying on manual feature-extraction, are labor-intensive and vary across experts. Recent advancements in Artificial Intelligence (AI) and Deep Learning (DL), particularly convolutional models such as U-Net and its variants (U-Net++ and U-Net 3+), have transformed medical image segmentation (MIS) by automating the process and enhancing accuracy. These models enable efficient, precise pixel-wise classification across various imaging modalities, overcoming the limitations of manual segmentation. This review explores various medical imaging techniques, examines the U-Net architectures and their adaptations, and discusses their application across different modalities. It also identifies common challenges in MIS and proposes potential solutions.
- Europe > Greece > Central Macedonia > Thessaloniki (0.04)
- North America > United States > Ohio > Portage County > Kent (0.04)
- North America > United States > New Jersey > Essex County > Newark (0.04)
- (3 more...)
- Overview (1.00)
- Research Report > Promising Solution (0.48)
Public Domain 12M: A Highly Aesthetic Image-Text Dataset with Novel Governance Mechanisms
Meyer, Jordan, Padgett, Nick, Miller, Cullen, Exline, Laura
We present Public Domain 12M (PD12M), a dataset of 12.4 million high-quality public domain and CC0-licensed images with synthetic captions, designed for training text-to-image models. PD12M is the largest public domain image-text dataset to date, with sufficient size to train foundation models while minimizing copyright concerns. Through the Source.Plus platform, we also introduce novel, community-driven dataset governance mechanisms that reduce harm and support reproducibility over time.
- Europe > Poland (0.04)
- Asia > Middle East > Jordan (0.04)
- Law (1.00)
- Information Technology (0.69)
A multi-agent reinforcement learning model of common-pool resource appropriation
Humanity faces numerous problems of common-pool resource appropriation. This class of multi-agent social dilemma includes the problems of ensuring sustainable use of fresh water, common fisheries, grazing pastures, and irrigation systems. Abstract models of common-pool resource appropriation based on non-cooperative game theory predict that self-interested agents will generally fail to find socially positive equilibria--a phenomenon called the tragedy of the commons. However, in reality, human societies are sometimes able to discover and implement stable cooperative solutions. Decades of behavioral game theory research have sought to uncover aspects of human behavior that make this possible.
- Europe > United Kingdom > England > Greater London > London (0.05)
- South America > Brazil > São Paulo (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (7 more...)
- Leisure & Entertainment > Games (0.48)
- Food & Agriculture > Fishing (0.47)
- Information Technology > Game Theory (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
New Online Communities: Graph Deep Learning on Anonymous Voting Networks to Identify Sybils in Polycentric Governance
This research examines the polycentric governance of digital assets in blockchain-based Decentralized Autonomous Organizations (DAOs). It offers a theoretical framework and addresses a critical challenge facing decentralized governance by developing a method to identify sybils, or spurious identities. Sybils pose significant organizational sustainability threats to DAOs and other commons-based online communities, and threat models are identified. The experimental method uses graph deep learning techniques to identify sybil activity in a DAO governance dataset (snapshot.org). Specifically, a Graph Convolutional Neural Network (GCNN) learned voting behaviours and a fast k-means vector clustering algorithm (FAISS) used high-dimensional embeddings to identify similar nodes in a graph. The results reveal that deep learning can effectively identify sybils, reducing the voting graph by 2-5%. This research underscores the importance of sybil resistance in DAOs and offers a novel perspective on decentralized governance, informing future policy, regulation, and governance practices.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (6 more...)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance > Trading (1.00)
- Law Enforcement & Public Safety (0.93)
Creating a Discipline-specific Commons for Infectious Disease Epidemiology
Wagner, Michael M., Hogan, William, Levander, John, Darr, Adam, Diller, Matt, Sibilla, Max, Loiacono, Alexander T., Sperringer, Terence Jr., Brown, Shawn T.
Objective: To create a commons for infectious disease (ID) epidemiology in which epidemiologists, public health officers, data producers, and software developers can not only share data and software, but receive assistance in improving their interoperability. Materials and Methods: We represented 586 datasets, 54 software, and 24 data formats in OWL 2 and then used logical queries to infer potentially interoperable combinations of software and datasets, as well as statistics about the FAIRness of the collection. We represented the objects in DATS 2.2 and a software metadata schema of our own design. We used these representations as the basis for the Content, Search, FAIR-o-meter, and Workflow pages that constitute the MIDAS Digital Commons. Results: Interoperability was limited by lack of standardization of input and output formats of software. When formats existed, they were human-readable specifications (22/24; 92%); only 3 formats (13%) had machine-readable specifications. Nevertheless, logical search of a triple store based on named data formats was able to identify scores of potentially interoperable combinations of software and datasets. Discussion: We improved the findability and availability of a sample of software and datasets and developed metrics for assessing interoperability. The barriers to interoperability included poor documentation of software input/output formats and little attention to standardization of most types of data in this field. Conclusion: Centralizing and formalizing the representation of digital objects within a commons promotes FAIRness, enables its measurement over time and the identification of potentially interoperable combinations of data and software.
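The format-based matching described in the abstract above can be illustrated with a small stand-in. The authors used OWL 2 representations and logical queries over a triple store; the sketch below swaps that for plain Python dictionaries, and every dataset name, software name, and format label in it is hypothetical.

```python
from itertools import product

def interoperable_pairs(datasets, software):
    """Return sorted (dataset, software) name pairs that share a named data format.

    datasets: maps a dataset name to the name of its data format.
    software: maps a software name to the set of input formats it accepts.
    """
    pairs = []
    for (d_name, d_format), (s_name, s_inputs) in product(
        datasets.items(), software.items()
    ):
        # A pair is potentially interoperable if the software reads the format.
        if d_format in s_inputs:
            pairs.append((d_name, s_name))
    return sorted(pairs)

# Hypothetical catalogue entries, for illustration only.
datasets = {
    "case_counts_2014": "csv-linelist",
    "synthetic_population": "synthpop-v1",
}
software = {
    "sim_a": {"synthpop-v1"},
    "viz_b": {"csv-linelist", "geojson"},
}
print(interoperable_pairs(datasets, software))
```

Matching on format names alone mirrors the limitation the abstract reports: a pair is only found when both the software's inputs and the dataset's format have been named and documented.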
- North America > United States > Florida > Hillsborough County > University (0.04)
- South America (0.04)
- North America > United States > Pennsylvania (0.04)
- (3 more...)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
- Health & Medicine > Therapeutic Area > Immunology (1.00)
- Health & Medicine > Public Health (1.00)
- Health & Medicine > Epidemiology (1.00)