murtagh
Ultrametric Model of Mind, II: Application to Text Content Analysis
In a companion paper, Murtagh (2012), we discussed how Matte Blanco's work linked the unrepressed unconscious (in the human) to symmetric logic and thought processes. We showed how ultrametric topology provides a most useful representational and computational framework for this. Now we look at the extent to which we can find ultrametricity in text. We use coherent and meaningful collections of nearly 1000 texts to show how we can measure inherent ultrametricity. On the basis of our findings we hypothesize that inherent ultrametricty is a basis for further exploring unconscious thought processes.
- North America > United States > New York (0.04)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- North America > United States > Tennessee (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Data Science > Data Mining (0.82)
Ultrametric Model of Mind, I: Review
We mathematically model Ignacio Matte Blanco's principles of symmetric and asymmetric being through use of an ultrametric topology. We use for this the highly regarded 1975 book of this Chilean psychiatrist and pyschoanalyst (born 1908, died 1995). Such an ultrametric model corresponds to hierarchical clustering in the empirical data, e.g. text. We show how an ultrametric topology can be used as a mathematical model for the structure of the logic that reflects or expresses Matte Blanco's symmetric being, and hence of the reasoning and thought processes involved in conscious reasoning or in reasoning that is lacking, perhaps entirely, in consciousness or awareness of itself. In a companion paper we study how symmetric (in the sense of Matte Blanco's) reasoning can be demarcated in a context of symmetric and asymmetric reasoning provided by narrative text.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)
The Future of Search and Discovery in Big Data Analytics: Ultrametric Information Spaces
Murtagh, Fionn, Contreras, Pedro
Under the heading of "Addressing the big data challenge", the European 7th Framework Programme sees the issue thus (see INFSO, 2012): "Recent industry reports detail how data volumes are growing at a faster rate than our ability to interpret and exploit them for innovative ICT applications, for decision support, planning, monitoring, control and interaction. This includes unstructured data types such as video, audio, images and free text as well as structured data types such as database records, sensor readings and 3D. While each of these types requires some specific form of processing and analytics, many of the general principles for managing and storing them at extreme scales are common across all of them." Analytics tool capability is called for, to address these burgeoning issues in the data intensive industries, to support "effective policy making and implementation" of public bodies resulting in "significant annual savings from 1 Big Data applications", and also to exploit open, linked data - "foster the reuse of public sector information and strengthen other open data activities linked to commercial exploitation." The "big data" marketplace is stated to be potentially worth approximately USD 600 billion. To address the challenges of search and discovery in massive and complex data sets and data flows, it is our contention in this work that we must move to an appropriate topology - to an appropriate framework such that computation is greatly facilitated. Our work is all about empowering those who are involved in data analytics, through clustering and related algorithms, to face these new challenges. Scalability and interactivity are two of the performance issues that follow directly from clustering algorithms, for search, retrieval and discovery, that are of linear computational complexity or better (logarithmic, or constant).
- North America > United States > District of Columbia > Washington (0.04)
- Europe > Austria > Vienna (0.04)
- Asia > Singapore (0.04)
Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm
Murtagh, Fionn, Legendre, Pierre
In the literature and in software packages there is confusion in regard to what is termed the Ward hierarchical clustering method. This relates to any and possibly all of the following: (i) input dissimilarities, whether squared or not; (ii) output dendrogram heights and whether or not their square root is used; and (iii) there is a subtle but important difference that we have found in the loop structure of the stepwise dissimilarity-based agglomerative algorithm. Our main objective in this work is to warn users of hierarchical clustering about this, to raise awareness about these distinctions or differences, and to urge users to check what their favorite software package is doing. In R, the function hclust of stats with the method "ward"option produces results that correspond to a Ward method (Ward
- North America > Canada (0.04)
- Europe > Ireland (0.04)
Modern hierarchical, agglomerative clustering algorithms
This paper presents algorithms for hierarchical, agglomerative clustering which perform most efficiently in the general-purpose setup that is given in modern standard software. Requirements are: (1) the input data is given by pairwise dissimilarities between data points, but extensions to vector data are also discussed (2) the output is a "stepwise dendrogram", a data structure which is shared by all implementations in current standard software. We present algorithms (old and new) which perform clustering in this setting efficiently, both in an asymptotic worst-case analysis and from a practical point of view. The main contributions of this paper are: (1) We present a new algorithm which is suitable for any distance update scheme and performs significantly better than the existing algorithms. (2) We prove the correctness of two algorithms by Rohlf and Murtagh, which is necessary in each case for different reasons. (3) We give well-founded recommendations for the best current algorithms for the various agglomerative clustering schemes.
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > New York (0.04)
- (5 more...)
Fast, Linear Time Hierarchical Clustering using the Baire Metric
Contreras, Pedro, Murtagh, Fionn
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partititioning, in terms of quality of results. For the latter, we carry out an in depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwise regression for this.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > Illinois > Champaign County > Champaign (0.04)
- (4 more...)
Methods of Hierarchical Clustering
Murtagh, Fionn, Contreras, Pedro
Agglomerative hierarchical clustering has been the dominant approach to constructing embedded classification schemes. It is our aim to direct the reader's attention to practical algorithms and methods - both efficient (from the computational and storage points of view) and effective (from the application point of view). It is often helpful to distinguish between method, involving a compactness criterion and the target structure of a 2-way tree representing the partial order on subsets of the power set; as opposed to an implementation, which relates to the detail of the algorithm used. As with many other multivariate techniques, the objects to be classified have numerical measurements on a set of variables or attributes. Hence, the analysis is carried out on the rows of an array or matrix.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York (0.05)
- North America > United States > District of Columbia > Washington (0.05)
- (7 more...)
- Overview (0.93)
- Research Report (0.64)
Fast redshift clustering with the Baire (ultra) metric
Murtagh, Fionn, Contreras, Pedro
If X is endowed with a metric, then this metric can be mapped onto an ultrametric. In practice, endowing X with a metric can be relaxed to a dissimilarity. An often used mapping from metric to ultrametric is by means of an agglomerative hierarchical clustering algorithm. A succession of n 1 pairwise merge steps take place by making use of the closest pair of singletons and/or clusters at each step. Here n is the number of observations, i.e. the cardinality of set X. Closeness between singletons is furnished by whatever distance or dissimilarity is in use. For closeness between singleton or non-singleton clusters, we need to define an inter-cluster distance or dissimilarity. This can be defined with reference to the cluster compactness or other property that we wish to optimize at each step of the algorithm. Since agglomerative hierarchical clustering requires consideration of pairwise dissimilarities at each stage it can be shown that even in the case of the most efficient algorithms, e.g.
- Europe > United Kingdom (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- (5 more...)
Ultrametric and Generalized Ultrametric in Computational Logic and in Data Analysis
Following a review of metric, ultrametric and generalized ultrametric, we review their application in data analysis. We show how they allow us to explore both geometry and topology of information, starting with measured data. Some themes are then developed based on the use of metric, ultrametric and generalized ultrametric in logic. In particular we study approximation chains in an ultrametric or generalized ultrametric context. Our aim in this work is to extend the scope of data analysis by facilitating reasoning based on the data analysis; and to show how quantitative and qualitative data analysis can be incorporated into logic programming.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.04)
- (2 more...)
- Overview (0.66)
- Research Report (0.50)
Segmentation and Nodal Points in Narrative: Study of Multiple Variations of a Ballad
The Lady Maisry ballads afford us a framework within which to segment a storyline into its major components. Segments and as a consequence nodal points are discussed for nine different variants of the Lady Maisry story of a (young) woman being burnt to death by her family, on account of her becoming pregnant by a foreign personage. We motivate the importance of nodal points in textual and literary analysis. We show too how the openings of the nine variants can be analyzed comparatively, and also the conclusions of the ballads.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > New York (0.04)
- (2 more...)