AITopics | menzerath

Humpback whale songs have patterns that resemble human language

New Scientist, Feb-6-2025, 19:00:56 GMT

Humpback whale songs have statistical patterns in their structure that are remarkably similar to those seen in human language. While this doesn't mean the songs convey complex meanings like our sentences do, it hints that whales may learn their songs in a similar way to how human infants start to understand language. Only male humpback whales (Megaptera novaeangliae) sing, and the behaviour is thought to be important for attracting mates. The songs are constantly evolving, with new elements appearing and spreading through the population until the old song is completely replaced with a new one. "We think it's a little bit like a standardised test, where everybody's got to do the same task but you can make changes and embellishments to show that you're better at the task than everybody else," says Jenny Allen at Griffith University in Gold Coast, Australia.

human language, humpback whale song, whale song, (11 more...)

New Scientist

Country:

Oceania > Australia (0.25)
Pacific Ocean (0.05)
Oceania > New Caledonia (0.05)
(2 more...)

Technology: Information Technology > Artificial Intelligence (0.98)

Linguistic Laws Meet Protein Sequences: A Comparative Analysis of Subword Tokenization Methods

Suyunu, Burak, Taylan, Enes, Özgür, Arzucan

arXiv.org Artificial Intelligence, Nov-26-2024

Tokenization is a crucial step in processing protein sequences for machine learning models, as proteins are complex sequences of amino acids that require meaningful segmentation to capture their functional and structural properties. However, existing subword tokenization methods, developed primarily for human language, may be inadequate for protein sequences, which have unique patterns and constraints. This study evaluates three prominent tokenization approaches, Byte-Pair Encoding (BPE), WordPiece, and SentencePiece, across varying vocabulary sizes (400-6400), analyzing their effectiveness in protein sequence representation, domain boundary preservation, and adherence to established linguistic laws. Our comprehensive analysis reveals distinct behavioral patterns among these tokenizers, with vocabulary size significantly influencing their performance. BPE demonstrates better contextual specialization and marginally better domain boundary preservation at smaller vocabularies, while SentencePiece achieves better encoding efficiency, leading to lower fertility scores. WordPiece offers a balanced compromise between these characteristics. However, all tokenizers show limitations in maintaining protein domain integrity, particularly as vocabulary size increases. Analysis of linguistic law adherence shows partial compliance with Zipf's and Brevity laws but notable deviations from Menzerath's law, suggesting that protein sequences may follow distinct organizational principles from natural languages. These findings highlight the limitations of applying traditional NLP tokenization methods to protein sequences and emphasize the need for developing specialized tokenization strategies that better account for the unique characteristics of proteins.
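
As a rough illustration of how adherence to Menzerath's law might be checked on tokenized proteins, the sketch below tabulates mean token length (in residues) against the number of tokens per sequence. It is not the authors' pipeline: the greedy longest-match tokenizer, `TOY_VOCAB`, and `TOY_SEQS` are hypothetical stand-ins for a trained BPE, WordPiece, or SentencePiece model and real protein data, and treating the protein as the construct and the token as the constituent is an assumption.

```python
from collections import defaultdict


def greedy_tokenize(sequence, vocab):
    """Split `sequence` by greedy longest match against a non-empty `vocab`.

    Toy stand-in for a trained subword tokenizer (assumption, not the
    paper's method). Single residues are always accepted as a fallback.
    """
    max_len = max(len(token) for token in vocab)
    tokens, i = [], 0
    while i < len(sequence):
        for size in range(min(max_len, len(sequence) - i), 0, -1):
            piece = sequence[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def menzerath_table(token_lists):
    """Map construct length (tokens per sequence) to the mean
    constituent length (residues per token) over all such sequences."""
    by_count = defaultdict(list)
    for tokens in token_lists:
        if tokens:
            mean_len = sum(len(t) for t in tokens) / len(tokens)
            by_count[len(tokens)].append(mean_len)
    return {n: sum(v) / len(v) for n, v in sorted(by_count.items())}


# Hypothetical vocabulary and sequences, for illustration only.
TOY_VOCAB = {"MK", "LV", "GA", "MKT", "AAG"}
TOY_SEQS = ["MKTLV", "MKLVGA", "AAGMK", "GAGA"]

token_lists = [greedy_tokenize(s, TOY_VOCAB) for s in TOY_SEQS]
table = menzerath_table(token_lists)
```

Under Menzerath's law, the values in `table` would tend to decrease as the key (tokens per sequence) grows; a flat or rising trend on real data would be the kind of deviation the abstract reports.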

protein sequence, sentencepiece, vocabulary size, (10 more...)

arXiv.org Artificial Intelligence

2411.17669

Country:

North America > United States (0.14)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Simple stochastic processes behind Menzerath's Law

Milička, Jiří

arXiv.org Artificial Intelligence, Aug-30-2024

This paper revisits Menzerath's Law, also known as the Menzerath-Altmann Law, which models a relationship between the length of a linguistic construct and the average length of its constituents. Recent findings indicate that simple stochastic processes can display Menzerathian behaviour, though existing models fail to accurately reflect real-world data. If we adopt the basic principle that a word can change its length in both syllables and phonemes, where the correlation between these variables is not perfect and these changes are of a multiplicative nature, we get a bivariate log-normal distribution. The present paper shows that from this very simple principle we obtain the classic Altmann model of the Menzerath-Altmann Law. If we model the joint distribution separately and independently from the marginal distributions, we can obtain an even more accurate model by using a Gaussian copula. The models are confronted with empirical data, and alternative approaches are discussed.
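
One way to see why a bivariate log-normal assumption leads to a power law: if ln(x) (word length in syllables) and ln(z) (word length in phonemes) are jointly normal with correlation rho, then the conditional mean of ln(z) is linear in ln(x), so mean syllable length z/x behaves like y = a * x^b with b = rho * sigma_z / sigma_x - 1. The sketch below checks this numerically. It is an illustration, not the paper's code: the parameter values are arbitrary assumptions, lengths are treated as continuous, and the fit is done in log space (which estimates a geometric rather than arithmetic mean, though the exponent b is the same).

```python
import math
import random


def simulate_word_lengths(n, mu_x, mu_z, sigma_x, sigma_z, rho, seed=0):
    """Draw n (syllables, phonemes) pairs from a bivariate log-normal.

    Lengths are continuous here, a simplifying assumption.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        log_x = mu_x + sigma_x * u
        # Mix u and v so that corr(log_x, log_z) equals rho.
        log_z = mu_z + sigma_z * (rho * u + math.sqrt(1 - rho ** 2) * v)
        pairs.append((math.exp(log_x), math.exp(log_z)))
    return pairs


def fit_power_law(pairs):
    """Least-squares fit of ln(z/x) = ln(a) + b * ln(x); returns (a, b)."""
    xs = [math.log(x) for x, _ in pairs]
    ys = [math.log(z / x) for x, z in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Arbitrary illustrative parameters (not taken from the paper).
SIGMA_X, SIGMA_Z, RHO = 0.4, 0.45, 0.8
pairs = simulate_word_lengths(20000, 1.0, 2.0, SIGMA_X, SIGMA_Z, RHO)
a, b = fit_power_law(pairs)
expected_b = RHO * SIGMA_Z / SIGMA_X - 1  # theoretical exponent
```

With these parameters the theoretical exponent is negative, so longer words have shorter syllables on average, which is the Menzerathian pattern. The exponent turns non-negative once rho * sigma_z reaches sigma_x.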

joint distribution, menzerath, stochastic process, (15 more...)

arXiv.org Artificial Intelligence

2409.00279

Country:

Europe > Netherlands > South Holland > Dordrecht (0.05)
Europe > Czechia > Prague (0.05)
Europe > United Kingdom (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.73)

Parallels of human language in the behavior of bottlenose dolphins

Ferrer-i-Cancho, R., Lusseau, D., McCowan, B.

arXiv.org Artificial Intelligence, Mar-25-2022

Here we review them with the help of quantitative linguistics and information theory.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.2478/lf-2022-0002

1605.01661

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Yolo County > Davis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(9 more...)

Genre:

Research Report (0.40)
Overview (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.70)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Statistical Parameters of the Novel "Perekhresni stezhky" ("The Cross-Paths") by Ivan Franko

Buk, Solomija, Rovenchak, Andrij

arXiv.org Artificial Intelligence, Dec-28-2005

The year 2006 marks the 150th anniversary of the birth of Ivan Franko (1856-1916), the prominent Ukrainian writer, poet, publicist, philosopher, sociologist, economist, translator-polyglot and public figure. His incomplete collected works were published in 50 volumes (Franko, 1976-86). The notion of national identity in Western Ukraine is connected with his name. Franko's works have intense plots and interesting topics.

franko, frequency, ivan franko, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1515/9783110894219.39

cs/0512102

Country:

Europe > Ukraine > Lviv Oblast > Lviv (0.06)
Europe > Ukraine > Kyiv Oblast > Kyiv (0.05)
Europe > Ukraine > Kharkiv Oblast > Kharkiv (0.05)
(3 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.69)
