Collaborating Authors

 Sanchez, David


Entropy and type-token ratio in gigaword corpora

arXiv.org Artificial Intelligence

Lexical diversity measures the vocabulary variation in texts. While its utility is evident for analyses in language change and applied linguistics, it is not yet clear how to operationalize this concept in a unique way. Here we investigate entropy and type-token ratio, two widely employed metrics of lexical diversity, in six massive linguistic datasets in English, Spanish, and Turkish, consisting of books, news articles, and tweets. These gigaword corpora correspond to languages with distinct morphological features and differ in registers and genres, thus constituting a diverse testbed for a quantitative approach to lexical diversity. Strikingly, we find a functional relation between entropy and type-token ratio that holds across the corpora under consideration. Further, in the limit of large vocabularies we find an analytical expression that sheds light on the origin of this relation and its connection with both Zipf's and Heaps' laws. Our results thus contribute to the theoretical understanding of text structure and offer practical implications for fields like natural language processing.
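As a concrete illustration of the two metrics compared in this abstract, the sketch below computes the Shannon entropy and the type-token ratio of a tokenized text using only the Python standard library. The whitespace tokenization and the example sentence are illustrative assumptions, not the paper's preprocessing pipeline:

```python
import math
from collections import Counter

def lexical_diversity(tokens):
    """Return (Shannon entropy in bits, type-token ratio) of a token list."""
    counts = Counter(tokens)
    n = len(tokens)
    # Entropy of the empirical word-frequency distribution
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    # Type-token ratio: number of distinct words over total words
    ttr = len(counts) / n
    return entropy, ttr

tokens = "the cat sat on the mat and the dog sat too".split()
h, ttr = lexical_diversity(tokens)
```

Both quantities depend on text length, which is why the paper's corpus-scale comparison and large-vocabulary limit matter: on short samples like this one, TTR is inflated relative to what the same vocabulary would yield in a gigaword corpus.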


Computational lexical analysis of Flamenco genres

arXiv.org Artificial Intelligence

Flamenco, recognized by UNESCO as part of the Intangible Cultural Heritage of Humanity, is a profound expression of cultural identity rooted in Andalusia, Spain. However, quantitative studies that help identify characteristic patterns in this long-lived music tradition are scarce. In this work, we present a computational analysis of Flamenco lyrics, employing natural language processing and machine learning to categorize over 2000 lyrics into their respective Flamenco genres, termed $\textit{palos}$. Using a Multinomial Naive Bayes classifier, we find that lexical variation across styles enables accurate identification of distinct $\textit{palos}$. More importantly, from an automated analysis of word usage, we obtain the semantic fields that characterize each style. Further, applying a metric that quantifies the inter-genre distance, we perform a network analysis that sheds light on the relationship between Flamenco styles. Remarkably, our results suggest historical connections and $\textit{palo}$ evolutions. Overall, our work illuminates the intricate relationships and cultural significance embedded within Flamenco lyrics, complementing previous qualitative discussions with quantitative analyses and sparking new discussions on the origin and development of traditional music genres.
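To make the classification setup concrete, here is a minimal Multinomial Naive Bayes text classifier with Laplace smoothing written from scratch. The four lyric fragments and the two palo labels are invented toy data, not the paper's 2000-lyric corpus, and the bag-of-words tokenization is an assumption:

```python
import math
from collections import Counter, defaultdict

def train_mnb(docs, labels, alpha=1.0):
    """Train a Multinomial Naive Bayes classifier with Laplace smoothing."""
    vocab = {w for d in docs for w in d.split()}
    word_counts = defaultdict(Counter)          # class -> word frequencies
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    model = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_prior = math.log(class_counts[c] / len(docs))
        log_lik = {w: math.log((word_counts[c][w] + alpha)
                               / (total + alpha * len(vocab)))
                   for w in vocab}
        model[c] = (log_prior, log_lik)
    return model

def predict_mnb(model, doc):
    """Return the class with the highest posterior log-probability."""
    scores = {c: lp + sum(ll[w] for w in doc.split() if w in ll)
              for c, (lp, ll) in model.items()}
    return max(scores, key=scores.get)

# Toy corpus: lyric fragments with hypothetical palo labels
lyrics = ["pena y llanto en la fragua",
          "fragua yunque y martillo",
          "alegria en la bahia de cadiz",
          "cadiz y su bahia de plata"]
palos = ["martinete", "martinete", "alegrias", "alegrias"]

model = train_mnb(lyrics, palos)
pred = predict_mnb(model, "el martillo y la fragua")
```

The same pipeline scales to the real task by swapping in the full lyric corpus; in practice a library implementation such as scikit-learn's `MultinomialNB` would be the idiomatic choice.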


American cultural regions mapped through the lexical analysis of social media

arXiv.org Artificial Intelligence

Cultural identity is an elusive notion because it depends on a wide range of different cultural factors--including politics, religion, ethnicity, economics, and art, among countless other examples--which will generally differ across individuals, with the cultural background of every individual ultimately being unique. Nevertheless, individuals from the same region can generally be expected to share some cultural traits, reflecting the shared cultural values and practices associated with the region [1]. Identifying the cultural regions of a nation--regions whose populations are characterized by relative cultural homogeneity compared to the populations of other regions within the nation--is very valuable information across a wide range of domains. For example, it is important for governments to understand geographical variation in the values of their population so as to better meet their educational, social, and welfare needs. Seven of the most prominent theories [3-9] are mapped in Figure 1, showing considerable disagreement. For example, in [5] the geographer Wilbur Zelinsky identified 5 major cultural regions--New England, the Midland, the South, the Middle West, and the West--based on a synthesis of regional patterns in a wide range of cultural factors, including ethnicity, religion, economics, and settlement history. Alternatively, in [6], drawing on a similar but more extensive range of cultural factors, the social scientist Raymond Gastil identified 13 major cultural regions, offering a more complex theory than Zelinsky, including by dividing Zelinsky's Midland, Middle West, and West regions. The two studies illustrate two basic limitations with these types of approaches that subjectively synthesize a range of data to infer cultural regions. First, it is unclear exactly how relevant cultural factors should be identified. Zelinsky considers


Ordinal analysis of lexical patterns

arXiv.org Artificial Intelligence

Words are fundamental linguistic units that connect thoughts and things through meaning. However, words do not appear independently in a text sequence. The existence of syntactic rules induces correlations among neighboring words. Using an ordinal pattern approach, we present an analysis of lexical statistical connections for 11 major languages. We find that the diverse ways in which languages express word relations give rise to unique structural distributions of patterns. Furthermore, fluctuations of these pattern distributions for a given language can allow us to determine both the historical period when the text was written and its author. Taken together, our results emphasize the relevance of ordinal time series analysis in linguistic typology, historical linguistics and stylometry.
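A minimal sketch of the ordinal pattern approach follows. It encodes a sentence as the series of its word lengths and counts the permutation that sorts each sliding window; the word-length encoding is one of several possible lexical mappings and is an assumption here, not necessarily the paper's exact construction:

```python
from collections import Counter

def ordinal_patterns(series, dim=3):
    """Count the ordinal (permutation) patterns of all length-`dim` windows."""
    patterns = Counter()
    for i in range(len(series) - dim + 1):
        window = series[i:i + dim]
        # Rank order of the window's values, ties broken by position
        pattern = tuple(sorted(range(dim), key=lambda j: (window[j], j)))
        patterns[pattern] += 1
    return patterns

# Example: word lengths of a sentence serve as the symbolic series
text = "words are fundamental linguistic units that connect thoughts and things"
lengths = [len(w) for w in text.split()]
dist = ordinal_patterns(lengths, dim=3)
```

Comparing such pattern histograms across texts is what enables the typological, historical, and stylometric discrimination described in the abstract: the histogram is cheap to compute, robust to monotone distortions of the series, and concentrates syntactic-correlation information into at most `dim!` bins.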


Enhanced Security and Privacy via Fragmented Federated Learning

arXiv.org Artificial Intelligence

In federated learning (FL), a set of participants share updates computed on their local data with an aggregator server that combines updates into a global model. However, reconciling accuracy with privacy and security is a challenge for FL. On the one hand, good updates sent by honest participants may reveal their private local information, whereas poisoned updates sent by malicious participants may compromise the model's availability and/or integrity. On the other hand, enhancing privacy via update distortion damages accuracy, whereas doing so via update aggregation damages security because it does not allow the server to filter out individual poisoned updates. To tackle the accuracy-privacy-security conflict, we propose {\em fragmented federated learning} (FFL), in which participants randomly exchange and mix fragments of their updates before sending them to the server. To achieve privacy, we design a lightweight protocol that allows participants to privately exchange and mix encrypted fragments of their updates so that the server can neither obtain individual updates nor link them to their originators. To achieve security, we design a reputation-based defense tailored for FFL that builds trust in participants and their mixed updates based on the quality of the fragments they exchange and the mixed updates they send. Since the exchanged fragments' parameters keep their original coordinates and attackers can be neutralized, the server can correctly reconstruct a global model from the received mixed updates without accuracy loss. Experiments on four real data sets show that FFL can prevent semi-honest servers from mounting privacy attacks, can effectively counter poisoning attacks, and can preserve the accuracy of the global model.
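The fragment-mixing idea can be sketched as follows. This toy version swaps coordinate-aligned fragments between randomly paired participants in the clear, omitting the paper's encryption protocol and reputation-based defense; it only illustrates why a coordinate-preserving exchange leaves the aggregated model unchanged:

```python
import random

def exchange_fragments(updates, frag_size, rng):
    """Swap a random coordinate-aligned fragment between participant pairs.

    `updates` maps each participant to its update vector (a list of floats).
    Fragments keep their original coordinates, so every coordinate-wise sum
    over all participants is invariant under the exchange.
    """
    mixed = {p: u[:] for p, u in updates.items()}
    ids = list(mixed)
    rng.shuffle(ids)
    for a, b in zip(ids[::2], ids[1::2]):       # random disjoint pairs
        start = rng.randrange(0, len(mixed[a]) - frag_size + 1)
        for i in range(start, start + frag_size):
            mixed[a][i], mixed[b][i] = mixed[b][i], mixed[a][i]
    return mixed

rng = random.Random(0)
updates = {"p1": [1.0, 2.0, 3.0, 4.0],
           "p2": [5.0, 6.0, 7.0, 8.0],
           "p3": [9.0, 10.0, 11.0, 12.0]}
mixed = exchange_fragments(updates, frag_size=2, rng=rng)
# Coordinate-wise mean of the mixed updates equals that of the originals
mean = [sum(u[i] for u in mixed.values()) / len(mixed) for i in range(4)]
```

Because each swap moves values within the same coordinate, averaging the mixed updates yields exactly the average of the original updates, which is the "no accuracy loss" property the abstract claims for the honest case.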


A Critical Review on the Use (and Misuse) of Differential Privacy in Machine Learning

arXiv.org Artificial Intelligence

As long ago as the 1970s, official statisticians [Dalenius(1977)] began to worry about potential disclosure of private information on people or companies linked to the publication of statistical outputs. This ushered in the statistical disclosure control (SDC) discipline [Hundepool et al.(2012)], whose goal is to provide methods for data anonymization. Also related to SDC is randomized response (RR, [Warner(1965)]), which was designed in the 1960s as a mechanism to eliminate evasive answer bias in surveys and turned out to be very useful for anonymization. The usual approach to anonymization in official statistics is utility-first: anonymization parameters are iteratively tried until a parameter choice is found that preserves sufficient analytical utility while reducing the risk of disclosing confidential information on specific respondents below a certain threshold. Both utility and privacy are evaluated ex post by respectively measuring the information loss and the probability of re-identification of the anonymized outputs.
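Randomized response can be illustrated with Warner's original design: each respondent answers the direct sensitive question with probability p and its negation otherwise, and the analyst inverts the known perturbation to recover an unbiased prevalence estimate. The population size, true prevalence, and p below are illustrative simulation parameters:

```python
import random

def warner_rr_survey(truths, p, rng):
    """Warner's randomized response: with probability p answer the direct
    question truthfully, otherwise answer its negation."""
    return [t if rng.random() < p else not t for t in truths]

def estimate_prevalence(answers, p):
    """Invert the perturbation: pi_hat = (lambda_hat - (1 - p)) / (2p - 1),
    where lambda_hat is the observed fraction of 'yes' answers (p != 1/2)."""
    lam = sum(answers) / len(answers)
    return (lam - (1 - p)) / (2 * p - 1)

rng = random.Random(42)
# Simulated population with true prevalence ~0.3 of the sensitive attribute
truths = [rng.random() < 0.3 for _ in range(100_000)]
answers = warner_rr_survey(truths, p=0.75, rng=rng)
est = estimate_prevalence(answers, p=0.75)   # close to 0.3
```

No individual answer reveals the respondent's true attribute with certainty, yet the aggregate estimate remains accurate, which is exactly why RR later proved useful for anonymization and, as the review goes on to discuss, underlies local differential privacy mechanisms.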