AITopics | Afghanistan


Afghanistan

SentinelKilnDB: A Large-Scale Dataset and Benchmark for OBB Brick Kiln Detection in South Asia Using Satellite Imagery

Neural Information Processing Systems | Jun-13-2026, 11:42:59 GMT

Air pollution was responsible for 2.6 million deaths across South Asia in 2021 alone, with brick manufacturing contributing significantly to this burden. In particular, in the Indo-Gangetic Plain, a densely populated and highly polluted region spanning northern India, Pakistan, Bangladesh, and parts of Afghanistan, brick kilns contribute 8-14% of ambient air pollution. Traditional monitoring approaches, such as field surveys and manual annotation using tools like Google Earth Pro, are time- and labor-intensive. Prior ML-based efforts for automated detection have relied on costly high-resolution commercial imagery and non-public datasets, limiting reproducibility and scalability. In this work, we introduce SENTINELKILNDB, a publicly available, hand-validated benchmark of 62,671 brick kilns spanning three kiln types - Fixed Chimney Bull's Trench Kiln (FCBK), Circular FCBK (CFCBK), and Zigzag kilns - annotated with oriented bounding boxes (OBBs) across 2.8 million km² using free and globally accessible Sentinel-2 imagery. We benchmark state-of-the-art oriented object detection models and evaluate generalization across in-region, out-of-region, and super-resolution settings. SENTINELKILNDB enables rigorous evaluation of geospatial generalization and robustness for low-resolution object detection, and provides a new testbed for ML models addressing real-world environmental and remote sensing challenges at a continental scale. Datasets and code are available in SentinelKilnDB Dataset and SentinelKilnDB Benchmark, under the Creative Commons Attribution-NonCommercial 4.0 International License.

artificial intelligence, machine learning, proceedings, (7 more...)


Country:

Asia > Pakistan (0.27)
Asia > India (0.27)
Asia > Bangladesh (0.27)
Asia > Afghanistan (0.27)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.39)

Technology:

Information Technology > Artificial Intelligence > Vision (0.82)
Information Technology > Artificial Intelligence > Machine Learning (0.76)


Trump DOJ jumps into Musk xAI court battle as diversity fight heats up

FOX News | Apr-24-2026, 18:48:04 GMT

The DOJ joined Elon Musk's xAI in suing Colorado, alleging a state AI regulation law violates the First and Fourteenth amendments by forcing developers to adopt DEI ideology.

artificial intelligence, discrimination, social media, (9 more...)


Country:

North America > United States > Colorado (0.32)
Asia > Middle East > Iran (0.15)
North America > United States > Texas (0.05)
(11 more...)

Industry:

Media > News (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (0.99)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.49)


Algorithmic Contiguity from Low-Degree Heuristic II: Predicting Detection-Recovery Gaps

Li, Zhangsong

arXiv.org Machine Learning | Apr-21-2026

The low-degree polynomial framework has emerged as a powerful tool for providing evidence of statistical-computational gaps in high-dimensional inference. For detection problems, the standard approach bounds the low-degree advantage through an explicit orthonormal basis. However, this method does not extend naturally to estimation tasks, and thus fails to capture the detection-recovery gap phenomenon that arises in many high-dimensional problems. Although several important advances have been made to overcome this limitation [SW22, SW25, CGGV25+], the existing approaches often rely on delicate, model-specific combinatorial arguments. In this work, we develop a general approach for obtaining conditional computational lower bounds for recovery problems from mild bounds on low-degree testing advantage. Our method combines the notion of algorithmic contiguity in [Li25] with a cross-validation reduction in [DHSS25] that converts successful recovery into a hypothesis test with lopsided success probabilities. In contrast to prior unconditional lower bounds, our argument is conceptually simple, flexible, and largely model-independent. We apply this framework to several canonical inference problems, including planted submatrix, planted dense subgraph, stochastic block model, multi-frequency angular synchronization, orthogonal group synchronization, and multi-layer stochastic block model. In the first three settings, our method recovers existing low-degree lower bounds for recovery in [SW22, SW25] via a substantially simpler argument. In the latter three, it gives new evidence for conjectured computational thresholds including the persistence of detection-recovery gaps. Together, these results suggest that mild control of low-degree advantage is often sufficient to explain computational barriers for recovery in high-dimensional statistical models.

algorithm, artificial intelligence, machine learning, (18 more...)


2604.1741

Country:

North America > United States (0.28)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


An Optimal Sauer Lemma Over $k$-ary Alphabets

Hanneke, Steve, Meng, Qinglin, Moran, Shay, Shaeiri, Amirreza

arXiv.org Machine Learning | Apr-15-2026

The Sauer-Shelah-Perles Lemma is a cornerstone of combinatorics and learning theory, bounding the size of a binary hypothesis class in terms of its Vapnik-Chervonenkis (VC) dimension. For classes of functions over a $k$-ary alphabet, namely the multiclass setting, the Natarajan dimension has long served as an analogue of VC dimension, yet the corresponding Sauer-type bounds are suboptimal for alphabet sizes $k>2$. In this work, we establish a sharp Sauer inequality for multiclass and list prediction. Our bound is expressed in terms of the Daniely–Shalev-Shwartz (DS) dimension, and more generally in terms of its extension, the list-DS dimension, the combinatorial parameters that characterize multiclass and list PAC learnability. Our bound is tight for every alphabet size $k$, list size $\ell$, and dimension value, replacing the exponential dependence on $\ell$ in the Natarajan-based bound by the optimal polynomial dependence, and improving the dependence on $k$ as well. Our proof uses the polynomial method. In contrast to the classical VC case, where several direct combinatorial proofs are known, we are not aware of any purely combinatorial proof in the DS setting. This motivates several directions for future research, which are discussed in the paper. As consequences, we obtain improved sample complexity upper bounds for list PAC learning and for uniform convergence of list predictors, sharpening the recent results of Charikar et al. (STOC 2023), Hanneke et al. (COLT 2024), and Brukhim et al. (NeurIPS 2024).

artificial intelligence, dimension, machine learning, (16 more...)


2604.12952

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.24)
North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(6 more...)

Genre:

Instructional Material (0.46)
Research Report (0.40)

Industry: Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)


Differentially Private Language Generation and Identification in the Limit

Mehrotra, Anay, Velegkas, Grigoris, Yu, Xifan, Zhou, Felix

arXiv.org Machine Learning | Apr-10-2026

We initiate the study of language generation in the limit, a model recently introduced by Kleinberg and Mullainathan [KM24], under the constraint of differential privacy. We consider the continual release model, where a generator must eventually output a stream of valid strings while protecting the privacy of the entire input sequence. Our first main result is that for countable collections of languages, privacy comes at no qualitative cost: we provide an $\varepsilon$-differentially-private algorithm that generates in the limit from any countable collection. This stands in contrast to many learning settings where privacy renders learnability impossible. However, privacy does impose a quantitative cost: there are finite collections of size $k$ for which uniform private generation requires $\Omega(k/\varepsilon)$ samples, whereas just one sample suffices non-privately. We then turn to the harder problem of language identification in the limit. Here, we show that privacy creates fundamental barriers. We prove that no $\varepsilon$-DP algorithm can identify a collection containing two languages with an infinite intersection and a finite set difference, a condition far stronger than the classical non-private characterization of identification. Next, we turn to the stochastic setting where the sample strings are sampled i.i.d. from a distribution (instead of being generated by an adversary). Here, we show that private identification is possible if and only if the collection is identifiable in the adversarial model. Together, our results establish new dimensions along which generation and identification differ and, for identification, a separation between adversarial and stochastic settings induced by privacy constraints.

artificial intelligence, machine learning, natural language, (18 more...)


2604.08504

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(7 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.66)


Enhancing Online Support Group Formation Using Topic Modeling Techniques

Barman, Pronob Kumar, Reynolds, Tera L., Foulds, James

arXiv.org Machine Learning | Mar-31-2026

Online health communities (OHCs) are vital for fostering peer support and improving health outcomes. Support groups within these platforms can provide more personalized and cohesive peer support, yet traditional support group formation methods face challenges related to scalability, static categorization, and insufficient personalization. To overcome these limitations, we propose two novel machine learning models for automated support group formation: the Group specific Dirichlet Multinomial Regression (gDMR) and the Group specific Structured Topic Model (gSTM). These models integrate user-generated textual content, demographic profiles, and interaction data represented through node embeddings derived from user networks to systematically automate personalized, semantically coherent support group formation. We evaluate the models on a large-scale dataset from MedHelp, comprising over 2 million user posts. Both models substantially outperform baseline methods including LDA, DMR, and STM in predictive accuracy (held-out log-likelihood), semantic coherence (UMass metric), and internal group consistency. The gDMR model yields group covariates that facilitate practical implementation by leveraging relational patterns from network structures and demographic data. In contrast, gSTM emphasizes sparsity constraints to generate more distinct and thematically specific groups. Qualitative analysis further validates the alignment between model-generated groups and manually coded themes, showing the practical relevance of the models in informing groups that address diverse health concerns such as chronic illness management, diagnostic uncertainty, and mental health. By reducing reliance on manual curation, these frameworks provide scalable solutions that enhance peer interactions within OHCs, with implications for patient engagement, community resilience, and health outcomes.

machine learning, manuscriptsubmittedtoacm, natural language, (17 more...)


2603.24765

Country:

Europe > Lithuania (0.05)
Oceania > Kiribati (0.04)
Oceania > Australia (0.04)
(47 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.94)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)


Distributed Gradient Clustering: Convergence and the Effect of Initialization

Armacki, Aleksandar, Sharma, Himkant, Bajović, Dragana, Jakovetić, Dušan, Chakraborty, Mrityunjoy, Kar, Soummya

arXiv.org Machine Learning | Mar-31-2026

We study the effects of center initialization on the performance of a family of distributed gradient-based clustering algorithms introduced in [1], which work over connected networks of users. In the considered scenario, each user holds a local dataset and communicates only with its immediate neighbours, with the aim of finding a global clustering of the joint data. We perform extensive numerical experiments evaluating the effects of center initialization on the performance of our family of methods, demonstrating that our methods are more resilient to the effects of initialization than centralized gradient clustering [2]. Next, inspired by the $K$-means++ initialization [3], we propose a novel distributed center initialization scheme, which is shown to improve the performance of our methods compared to the baseline random initialization.

artificial intelligence, initialization, machine learning, (19 more...)


doi: 10.1109/IEEECONF60004.2024.10942834

2603.20507

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Europe > Serbia > Vojvodina > South Bačka District > Novi Sad (0.05)
Asia > India > West Bengal > Kharagpur (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)


Avoiding Imposters and Delinquents: Adversarial Crowdsourcing and Peer Prediction

Jacob Steinhardt, Gregory Valiant, Moses Charikar

Neural Information Processing Systems | Mar-23-2026, 00:52:08 GMT

We consider a crowdsourcing model in which $n$ workers are asked to rate the quality of $n$ items previously generated by other workers. An unknown set of $\alpha n$ workers generate reliable ratings, while the remaining workers may behave arbitrarily and possibly adversarially. The manager of the experiment can also manually evaluate the quality of a small number of items, and wishes to curate together almost all of the high-quality items with at most an $\varepsilon$ fraction of low-quality items.

artificial intelligence, machine learning, rater, (17 more...)


Country: Asia > Afghanistan > Parwan Province > Charikar (0.40)

Genre: Instructional Material > Online (0.46)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.72)


Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods

Andrej Risteski, Yuanzhi Li

Neural Information Processing Systems | Mar-23-2026, 00:21:27 GMT

Neural Information Processing Systems http://nips.cc/

entropy, relaxation, variational method, (15 more...)


Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.44)


Scalable Learning of Multivariate Distributions via Coresets

Ding, Zeyu, Ickstadt, Katja, Klein, Nadja, Munteanu, Alexander, Omlor, Simon

arXiv.org Machine Learning | Mar-23-2026

Efficient and scalable non-parametric or semi-parametric regression analysis and density estimation are of crucial importance to the fields of statistics and machine learning. However, available methods are limited in their ability to handle large-scale data. We address this issue by developing a novel coreset construction for multivariate conditional transformation models (MCTMs) to enhance their scalability and training efficiency. To the best of our knowledge, these are the first coresets for semi-parametric distributional models. Our approach yields substantial data reduction via importance sampling. It ensures with high probability that the log-likelihood remains within multiplicative error bounds of $(1\pm\varepsilon)$ and thereby maintains statistical model accuracy. Compared to conventional full-parametric models, where coresets have been incorporated before, our semi-parametric approach exhibits enhanced adaptability, particularly in scenarios where complex distributions and non-linear relationships are present, but not fully understood. To address numerical problems associated with normalizing logarithmic terms, we follow a geometric approximation based on the convex hull of input data. This ensures feasible, stable, and accurate inference in scenarios involving large amounts of data. Numerical experiments demonstrate substantially improved computational efficiency when handling large and complex datasets, thus laying the foundation for a broad range of applications within the statistics and machine learning communities.

artificial intelligence, machine learning, munteanu, (18 more...)


2603.19792

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > North Carolina > Durham County > Durham (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
