frequency vector
- North America > United States (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Information Technology > Security & Privacy (0.93)
- Information Technology > Data Science > Data Mining (0.68)
- Information Technology > Communications (0.68)
- Information Technology > Artificial Intelligence > Machine Learning (0.46)
- North America > United States (0.14)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > Texas (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Security & Privacy (0.93)
- Information Technology > Data Science > Data Mining (0.68)
- Information Technology > Communications (0.68)
- Information Technology > Artificial Intelligence > Machine Learning (0.46)
Language Detection by Means of the Minkowski Norm: Identification Through Character Bigrams and Frequency Analysis
Pogăcean, Paul-Andrei, Avram, Sanda-Maria
The debate surrounding language identification has gained renewed attention in recent years, especially with the rapid evolution of AI-powered language models. However, non-AI-based approaches to language identification have been overshadowed. This research explores a mathematical implementation of an algorithm for language determination that leverages monogram and bigram frequency rankings derived from established linguistic research. The datasets used comprise texts varying in length, historical period, and genre, including short stories, fairy tales, and poems. Despite these variations, the method achieves over 80% accuracy on texts shorter than 150 characters and reaches 100% accuracy for longer texts. These results demonstrate that classical frequency-based approaches remain effective and scalable alternatives to AI-driven models for language detection.
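The abstract does not spell out the exact distance formula, so the following is only a minimal sketch of the general idea: compare a text's bigram frequency vector with per-language reference vectors under a Minkowski norm and pick the closest language. The function names, the `order` parameter, and the toy reference sentences are assumptions for illustration; the paper itself works with monogram and bigram frequency rankings taken from linguistic research.

```python
from collections import Counter


def bigram_frequencies(text):
    """Relative frequencies of adjacent letter pairs inside each word."""
    cleaned = "".join(c for c in text.lower() if c.isalpha() or c == " ")
    counts = Counter()
    for word in cleaned.split():
        for first, second in zip(word, word[1:]):
            counts[first + second] += 1
    total = sum(counts.values())
    return {bigram: n / total for bigram, n in counts.items()} if total else {}


def minkowski_distance(p, q, order=2.0):
    """Minkowski distance of the given order between two sparse vectors."""
    keys = set(p) | set(q)
    total = sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) ** order for k in keys)
    return total ** (1.0 / order)


def detect_language(text, profiles, order=2.0):
    """Return the profile name whose bigram vector is closest to the text."""
    sample = bigram_frequencies(text)
    return min(profiles, key=lambda lang: minkowski_distance(sample, profiles[lang], order))


# Hypothetical toy references; a real profile would come from a large corpus.
ENGLISH_REFERENCE = "the quick brown fox jumps over the lazy dog and then the other thing"
GERMAN_REFERENCE = "der schnelle braune fuchs springt über den faulen hund und dann noch einen"
PROFILES = {
    "english": bigram_frequencies(ENGLISH_REFERENCE),
    "german": bigram_frequencies(GERMAN_REFERENCE),
}
```

Setting `order=1.0` gives the Manhattan distance and `order=2.0` the Euclidean distance; which order works best for short texts is an empirical question this sketch does not answer.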
Steady-State Strategy Synthesis for Swarms of Autonomous Agents
Jonáš, Martin, Kučera, Antonín, Kůr, Vojtěch, Mačák, Jan
Steady-state synthesis aims to construct a policy for a given MDP $D$ such that the long-run average frequencies of visits to the vertices of $D$ satisfy given numerical constraints. This problem is solvable in polynomial time, and memoryless policies are sufficient for approximating an arbitrary frequency vector achievable by a general (infinite-memory) policy. We study the steady-state synthesis problem for multiagent systems, where multiple autonomous agents jointly strive to achieve a suitable frequency vector. We show that the problem for multiple agents is computationally hard (PSPACE-hard or NP-hard, depending on the variant), and memoryless strategy profiles are insufficient for approximating achievable frequency vectors. Furthermore, we prove that even evaluating the frequency vector achieved by a given memoryless profile is computationally hard. This reveals a severe barrier to constructing an efficient synthesis algorithm, even for memoryless profiles. Nevertheless, we design an efficient and scalable synthesis algorithm for a subclass of full memoryless profiles, and we evaluate this algorithm on a large class of randomly generated instances. The experimental results demonstrate a significant improvement over a naive algorithm based on strategy sharing.
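To make the notion of a frequency vector concrete, here is a small sketch for the single-agent case only: a memoryless policy turns the MDP into a Markov chain, and the long-run visit frequencies are approximated by averaging the state distribution over many steps. The data layout (`mdp[s][a]` as a dictionary of successor probabilities), the function names, and the example MDP are assumptions for illustration and are not taken from the paper; the multiagent setting studied there is not covered.

```python
def induced_chain(mdp, policy):
    """Transition matrix of the Markov chain induced by a memoryless policy.

    mdp[s][a] maps successor states to probabilities; policy[s] maps
    actions to probabilities. States are numbered 0..n-1.
    """
    n = len(mdp)
    matrix = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for action, p_action in policy[s].items():
            for t, p_next in mdp[s][action].items():
                matrix[s][t] += p_action * p_next
    return matrix


def frequency_vector(matrix, start=0, steps=10000):
    """Average state distribution over `steps` steps, started in `start`."""
    n = len(matrix)
    dist = [0.0] * n
    dist[start] = 1.0
    average = [0.0] * n
    for _ in range(steps):
        for s in range(n):
            average[s] += dist[s] / steps
        dist = [sum(dist[s] * matrix[s][t] for s in range(n)) for t in range(n)]
    return average


# Hypothetical two-state MDP: state 0 may stay or move, state 1 always returns.
EXAMPLE_MDP = [
    {"stay": {0: 1.0}, "go": {1: 1.0}},
    {"back": {0: 1.0}},
]
EXAMPLE_POLICY = [{"stay": 0.5, "go": 0.5}, {"back": 1.0}]
EXAMPLE_FREQUENCIES = frequency_vector(induced_chain(EXAMPLE_MDP, EXAMPLE_POLICY))
```

For this example the induced chain has stationary distribution (2/3, 1/3), so the averaged vector should be close to those values.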
Private Federated Frequency Estimation: Adapting to the Hardness of the Instance
Wu, Jingfeng, Zhu, Wennan, Kairouz, Peter, Braverman, Vladimir
In federated frequency estimation (FFE), multiple clients work together to estimate the frequencies of their collective data by communicating with a server that respects the privacy constraints of Secure Summation (SecSum), a cryptographic multi-party computation protocol that ensures that the server can only access the sum of client-held vectors. For single-round FFE, it is known that count sketching is nearly information-theoretically optimal for achieving the fundamental accuracy-communication trade-offs [Chen et al., 2022]. However, we show that under the more practical multi-round FFE setting, simple adaptations of count sketching are strictly sub-optimal, and we propose a novel hybrid sketching algorithm that is provably more accurate. We also address the following fundamental question: how should a practitioner set the sketch size in a way that adapts to the hardness of the underlying problem? We propose a two-phase approach that allows for the use of a smaller sketch size for simpler problems (e.g., near-sparse or light-tailed distributions). We conclude our work by showing how differential privacy can be added to our algorithm and verifying its superior performance through extensive experiments conducted on large-scale datasets.
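The single-round baseline mentioned above can be sketched as follows: each client encodes its items into a count sketch, the server sees only the element-wise sum of the sketches, and frequencies are decoded from that sum. This is a plain count sketch, not the hybrid algorithm the paper proposes. The hash construction, the function names, and the `secure_sum_stand_in` helper (an ordinary sum standing in for the SecSum protocol) are assumptions for illustration.

```python
import hashlib
from statistics import median


def _hash(item, row, salt):
    """Deterministic 64-bit hash of (salt, row, item)."""
    digest = hashlib.sha256(f"{salt}:{row}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def bucket(item, row, width):
    return _hash(item, row, "bucket") % width


def sign(item, row):
    return 1 if _hash(item, row, "sign") % 2 == 0 else -1


def encode(items, depth, width):
    """Count sketch of one client's items: a depth x width integer table."""
    sketch = [[0] * width for _ in range(depth)]
    for item in items:
        for row in range(depth):
            sketch[row][bucket(item, row, width)] += sign(item, row)
    return sketch


def secure_sum_stand_in(sketches):
    """Element-wise sum of client sketches (plain stand-in for SecSum)."""
    depth, width = len(sketches[0]), len(sketches[0][0])
    return [[sum(s[r][c] for s in sketches) for c in range(width)] for r in range(depth)]


def estimate(sketch, item):
    """Median over rows of the signed counter the item hashes to."""
    width = len(sketch[0])
    return median(
        sketch[row][bucket(item, row, width)] * sign(item, row)
        for row in range(len(sketch))
    )
```

Because encoding is linear, the sum of the client sketches equals the sketch of the pooled data, which is why a sum-only protocol suffices for decoding. The sketch size (`depth` times `width`) is the communication cost that the paper's two-phase approach aims to adapt to the instance.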
COLOGNE: Coordinated Local Graph Neighborhood Sampling
Representation learning for graphs enables the application of standard machine learning algorithms and data analysis tools to graph data. Replacing discrete unordered objects such as graph nodes by real-valued vectors is at the heart of many approaches to learning from graph data. Such vector representations, or embeddings, capture the discrete relationships in the original data by representing nodes as vectors in a high-dimensional space. In most applications, graphs model the relationships between real-life objects, and nodes often contain valuable meta-information about the original objects. While being a powerful machine learning tool, embeddings are not able to preserve such node attributes. We address this shortcoming and consider the problem of learning discrete node embeddings such that the coordinates of the node vector representations are graph nodes. This opens the door to designing interpretable machine learning algorithms for graphs, as all attributes originally present in the nodes are preserved. We present a framework for coordinated local graph neighborhood sampling (COLOGNE) such that each node is represented by a fixed number of graph nodes, together with their attributes. Individual samples are coordinated and they preserve the similarity between node neighborhoods. We consider different notions of similarity for which we design scalable algorithms. We show theoretical results for all proposed algorithms. Experiments on benchmark graphs evaluate the quality of the designed embeddings and demonstrate how the proposed embeddings can be used in training interpretable machine learning algorithms for graph data.
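One way to picture coordinated sampling is min-wise sampling with shared random ranks: for each embedding coordinate, every node receives one random rank, and each node's coordinate is the lowest-ranked node in its local neighborhood. Because the ranks are shared, nodes with overlapping neighborhoods tend to pick the same sample. This is an illustrative assumption about how such coordination can work, not the paper's stated algorithm; the function names, the `hops` parameter, and the example graph are hypothetical.

```python
import random


def neighborhood(adj, node, hops):
    """All nodes within `hops` edges of `node`, including the node itself."""
    seen = {node}
    frontier = {node}
    for _ in range(hops):
        frontier = {v for u in frontier for v in adj[u]} - seen
        seen |= frontier
    return seen


def coordinated_embeddings(adj, dim, hops=1, seed=0):
    """Discrete embeddings whose coordinates are sampled graph nodes."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    hoods = {u: neighborhood(adj, u, hops) for u in nodes}
    embeddings = {u: [] for u in nodes}
    for _ in range(dim):
        # One rank per node, shared by every sampler: this is the coordination.
        rank = {u: rng.random() for u in nodes}
        for u in nodes:
            embeddings[u].append(min(hoods[u], key=rank.__getitem__))
    return embeddings


# Hypothetical graph: a triangle {0, 1, 2} and a separate edge {3, 4}.
EXAMPLE_GRAPH = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}
EXAMPLE_EMBEDDINGS = coordinated_embeddings(EXAMPLE_GRAPH, dim=4)
```

Since every coordinate is itself a graph node, attributes of the sampled node can be looked up directly, which is the interpretability property the abstract emphasizes.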
- Leisure & Entertainment (0.46)
- Health & Medicine (0.46)