Latent Variable Modeling in Multi-Agent Reinforcement Learning via Expectation-Maximization for UAV-Based Wildlife Protection
Taghavi, Mazyar, Farnoosh, Rahman
–arXiv.org Artificial Intelligence
I N T R O D U C T I O N T h e I r a n i a n l e o p a r d ( P a n t h e r a p a rd u s t u l l i a n a), a subspecies of the P ersian leopard, is critically endangered due to illegal poaching, habitat fragmentation, and h u m a n - w i l d l i f e c o n f l i c t. C o n s e r v a t i o n e f f o r t s a r e i n c r e a s i n g l y t u r n i n g t o t e c h n o l o g y f o r i n n o v a t i v e m o n i t o r i n g a n d i n t e r v e n t i o n m e t h o d s . Metric 10 Agents T raining Time (hrs) Memor y Usage (GB) CPU Utilization (%) GPU Utilization (%) T raining Time Increase (%) Memor y Usage Increase (%) 5.2 4.5 65 45 - - 20 Agents 50 Agents 6.3 5.1 75 55 20 15 8.0 6.8 85 70 53 51 T able 4. P ercentage of High-Risk zones Covered by Each Method (Mean std) F igure 3. P oacher Detection R ate Across Episodes. Higher Entropy Indicates More Diverse Exploration T able 5. KL Divergence between Inferred q(z) and Ground T ruth T ask Distribution T h e E M - b a s e d p o l i c y e x h i b i t s a n i n i t i a l l y h i g h e n t r o p y, e n c o u r a g i n g d i v e r s e a c t i o n s a m p l i n g, a n d g r a d u a l l y an n e a l s as th e po l i c y be c o m e s co n f i d e n t . Metric Cooperative Coverage Number of Agents Involved Coverage Efficiency (%) P oa ch er D et ec ti on R at e (%) Collision Incidents 6 85.3 - 0 P oacher Detection Coordination Conflict A voidance 8 - 92.1 0 10 - - 0 It enables conser vationists and security forces to allocate limited resources more effectiv e l y a n d a c t i n r e a l t i m e b a s e d o n a c t i o n a b l e i n t e l l i g e n c e d e r i v e d f r o m a u t o n o m o u s a g e n t s .
arXiv.org Artificial Intelligence
Oct-13-2025