 mssp


How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization

arXiv.org Machine Learning

Recent frontier large language models predominantly rely on Mixture-of-Experts (MoE) architectures. Despite empirical progress, there is still no principled understanding of how hyperparameters should scale with network width $N$, expert width $N_e$, number of experts $M$, sparsity $K$, and depth $L$ to ensure both stability and optimal performance at scale. We take a principled step toward resolving this gap by analyzing three different scaling regimes: (I) co-scaling $N\asymp N_e$, (II) co-scaling $N\asymp M\asymp K$, and (III) full proportional scaling of $N, N_e, M$, and $K$. For each regime, we develop a novel Dynamical Mean Field Theory (DMFT) description of the limiting training dynamics of MoEs that provides a formal foundation for our analysis. Within this framework, we derive the unique parameterization for SGD and Adam satisfying all maximal-update ($\mu$) desiderata. We then show that the resulting $\mu$P prescription does not reliably induce monotonic improvement with scale or robust learning-rate transfer. We trace these pathologies to scale-dependent observables in the aggregation dynamics, which motivates a refined set of desiderata that we term maximal scale stability. Guided by this principle, we derive a Maximally Scale-Stable Parameterization (MSSP) for both SGD and Adam in all three scaling regimes, and characterize the corresponding limiting dynamics, which are qualitatively distinct from the $\mu$P limit, through a separate DMFT analysis. Experiments verify that MSSP robustly recovers learning rate transfer and monotonic improvement with scale across regimes. Combined with existing depth-scaling theory, these results provide a complete scaling prescription for MoE architectures as a function of width, depth, expert width, and number of experts.
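To make the symbols in this abstract concrete, here is a minimal sketch of a single top-$K$ MoE layer in plain Python. It is an illustration only, not the paper's model: the softmax gating over the selected experts, the ReLU nonlinearity, the Gaussian initialization and the names `moe_forward` and `make_layer` are all assumptions, and none of the $\mu$P or MSSP scaling multipliers are applied.

```python
import math
import random


def make_layer(N, N_e, M, seed=0):
    """Build a hypothetical MoE layer: a router (M x N) and M two-layer experts.

    The 1/sqrt(fan_in) Gaussian init is an assumption for illustration,
    not the parameterization derived in the paper.
    """
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
                for _ in range(rows)]

    router = mat(M, N)
    experts = [(mat(N_e, N), mat(N, N_e)) for _ in range(M)]
    return router, experts


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def moe_forward(x, router, experts, K):
    """Route x (length N) to the K highest-scoring experts and mix their outputs."""
    logits = matvec(router, x)
    top = sorted(range(len(logits)), key=lambda m: logits[m], reverse=True)[:K]
    # Softmax restricted to the selected experts (assumed gating rule).
    peak = max(logits[m] for m in top)
    weights = [math.exp(logits[m] - peak) for m in top]
    total = sum(weights)
    gates = [w / total for w in weights]

    out = [0.0] * len(x)
    for gate, m in zip(gates, top):
        W_in, W_out = experts[m]
        hidden = [max(0.0, h) for h in matvec(W_in, x)]  # width N_e
        y = matvec(W_out, hidden)                        # back to width N
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top, gates


N, N_e, M, K = 4, 3, 5, 2
router, experts = make_layer(N, N_e, M)
out, top, gates = moe_forward([1.0, -0.5, 0.25, 2.0], router, experts, K)
```

In this sketch $N$ is the length of `x`, $N_e$ is the hidden width inside each expert, $M$ is the number of experts and $K$ is how many of them are active per input. The three regimes in the abstract correspond to growing different subsets of these four numbers together.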


How deep learning can deliver improved cybersecurity [Q&A]

#artificialintelligence

Traditional cybersecurity isn't necessarily bad at detecting attacks; the trouble is that it often does so only after they have occurred. A better approach is to spot potential attacks and block them before they can do any damage. One possible way of doing this is via 'deep learning', which allows technology to tell the difference between good and bad. We spoke with Brooks Wallace, cybersecurity sales leader at Deep Instinct, to find out more about this innovative solution. BW: If you look at cybersecurity, there's always been this holy grail of prevention.


Deep Instinct reaches out to MSSPs

#artificialintelligence

Deep Instinct, which uses deep learning to identify threats before they … there's artificial intelligence and machine learning, and some of that is deep …


Verifiable Planning in Expected Reward Multichain MDPs

arXiv.org Machine Learning

The planning domain has experienced increased interest in the formal synthesis of decision-making policies. This formal synthesis typically entails finding a policy which satisfies formal specifications in the form of some well-defined logic, such as Linear Temporal Logic (LTL) or Computation Tree Logic (CTL), among others. While such logics are very powerful and expressive in their capacity to capture desirable agent behavior, their value is limited when deriving decision-making policies which satisfy certain types of asymptotic behavior. In particular, we are interested in specifying constraints on the steady-state behavior of an agent, which captures the proportion of time an agent spends in each state as it interacts for an indefinite period of time with its environment. This is sometimes called the average or expected behavior of the agent. In this paper, we explore the steady-state planning problem of deriving a decision-making policy for an agent such that constraints on its steady-state behavior are satisfied. A linear programming solution for the general case of multichain Markov Decision Processes (MDPs) is proposed and we prove that optimal solutions to the proposed programs yield stationary policies with rigorous guarantees of behavior.
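The notion of steady-state behavior in this abstract can be illustrated with a small sketch. The code below does not implement the paper's linear program. It only shows what a steady-state constraint means, by fixing a stationary policy on a toy MDP, forming the induced Markov chain and estimating its stationary distribution by power iteration. The two-state MDP, the policy, the 0.6 threshold and the function names are hypothetical, and the example assumes the induced chain is ergodic (a single aperiodic recurrent class), which is narrower than the multichain setting the paper addresses.

```python
def induced_chain(T, policy):
    """Combine transitions T[s][a][s2] with action probabilities policy[s][a].

    Returns the row-stochastic matrix P[s][s2] of the chain the policy induces.
    """
    n = len(T)
    return [
        [sum(policy[s][a] * T[s][a][s2] for a in range(len(T[s])))
         for s2 in range(n)]
        for s in range(n)
    ]


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Power iteration pi <- pi P, assuming the chain is ergodic."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical 2-state, 2-action MDP: T[state][action] is a row over next states.
T = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
# Deterministic stationary policy: action 1 in state 0, action 0 in state 1.
policy = [[0.0, 1.0], [1.0, 0.0]]

P = induced_chain(T, policy)
pi = stationary_distribution(P)

# Example steady-state constraint: spend at least 60% of the time in state 1.
satisfies_constraint = pi[1] >= 0.6
```

For this toy chain the exact stationary distribution is $(5/13, 8/13)$, so the policy spends roughly 61.5% of its time in state 1 and meets the illustrative constraint. A planner in the sense of the abstract would search over policies so that such constraints hold, rather than checking a single fixed policy.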


Automation And AI: The New Frontier In Cybersecurity

#artificialintelligence

Digital technology is changing the way we work. Employees are accessing their productivity applications from outside the physical workplace on an increasing number of mobile devices. Thus, the number of assets that internal IT organizations are expected to manage is rising, as is the amount of data that needs to be examined. IoT devices, such as sensors for HVAC systems and intelligent CCTVs for physical building security, are new sources of additional network traffic. The burden falls on IT organizations, which are being asked to accommodate these advances in technology while confronting a heightened security risk to their businesses. The scale and complexity of the digital assets that a company must protect from malicious attacks and data breaches have grown significantly.


5 Signs You Should Re-Evaluate Your Relationship with Your MSSP

#artificialintelligence

From Equifax to Yahoo, and Facebook to Marriott, large-scale data breaches impacting hundreds of millions of consumers have received their fair share of media attention in recent years. All this ink hasn't been spilled (or pixels displayed) in vain: there's growing awareness among business leaders of the security and privacy risks their organizations face, and increasing concern that their preparedness may be inadequate. In a recent PwC survey, for example, 72% of CEOs worldwide listed cybercriminal activity as a significant threat to their businesses, yet only 35% were comfortable with their organization's digital resilience and readiness to face such threats. Especially among small and mid-sized enterprises, the growth in awareness of the severity and urgency of cybersecurity risks is driving demand for managed security services. Organizations are increasingly turning to external vendors to help them build, maintain, and monitor their security operations programs and the technologies that comprise them.