 mttf


A Reinforcement Learning-Based Task Mapping Method to Improve the Reliability of Clustered Manycores

Hossein-Khani, Fatemeh, Akbari, Omid

arXiv.org Artificial Intelligence

The increasing scale of manycore systems poses significant challenges in managing reliability while meeting performance demands. At the same time, these systems become more susceptible to aging mechanisms such as negative-bias temperature instability (NBTI), hot carrier injection (HCI), and thermal cycling (TC), as well as the electromigration (EM) phenomenon. In this paper, we propose a reinforcement learning (RL)-based task mapping method that improves the reliability of manycore systems with respect to these aging mechanisms. The method consists of three steps: bin packing, task-to-bin mapping, and task-to-core mapping. In the first step, the density-based spatial clustering of applications with noise (DBSCAN) method is used to form clusters (bins) based on core temperatures. The Q-learning algorithm is then used for the latter two steps to map each arriving task to a core such that the thermal variation among all the bins is minimized. Unlike state-of-the-art approaches, the proposed method runs at runtime and requires no parameters to be calculated offline. The effectiveness of the proposed technique is evaluated on 16-, 32-, and 64-core systems using applications from the SPLASH2 and PARSEC benchmark suites. The results demonstrate an increase of up to 27% in the mean time to failure (MTTF) compared to state-of-the-art task mapping techniques.
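The abstract does not give the state encoding, reward, or hyperparameters, so the following is only a minimal sketch of the standard tabular Q-learning update that such a mapping step could build on. The reward (negative temperature spread across bins), the state labels, the temperatures, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions made for illustration, not the authors' formulation.

```python
from collections import defaultdict
import random


def thermal_spread(bin_temps):
    """Hypothetical cost: gap between the hottest and the coolest bin."""
    return max(bin_temps) - min(bin_temps)


def choose_action(q, state, actions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice among candidate bins (or cores)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q-value."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Illustrative step: a task was mapped to bin 1, after which the bins
# have these (made-up) temperatures in degrees Celsius.
q = defaultdict(float)
bin_temps_after = [61.0, 58.5, 63.0]
reward = -thermal_spread(bin_temps_after)  # smaller spread -> larger reward
new_value = q_update(q, "s0", 1, reward, "s1", [0, 1, 2])
```

Because the reward is the negated spread, mappings that leave the bins thermally unbalanced lower the corresponding Q-value, so a greedy choice drifts toward mappings that even out bin temperatures.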


Just Another Method to Compute MTTF from Continuous Time Markov Chain

Vasconcelos, Eduardo M.

arXiv.org Artificial Intelligence

The mean time to failure (MTTF) is a statistic used for system analysis in several fields. It represents the average time until the system enters one of its possible fault states, without considering system repairs. Although MTTF is normally used to analyze systems with fault states, it can also be used to analyze processes: because a process can be represented by a state machine model, MTTF can represent the mean time until the process finishes. This work presents a method to compute MTTF from continuous-time Markov chain (CTMC) models. No argument is given that this method performs better than other methods, but it is intuitive and simpler to implement. It also yields the absorption probabilities and the average holding time of each state without additional steps.
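For reference, the textbook way to obtain MTTF from a CTMC is to solve a linear system over the transient (non-failure) states: if Q_TT is the generator matrix restricted to those states, the vector m of expected times to absorption satisfies Q_TT m = -1. The sketch below implements that standard formulation, not the method proposed in the paper, whose details the abstract does not give. The function names and the example system are assumptions chosen for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: check the transient states")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mttf(generator, transient, start):
    """Expected time until absorption when starting in state `start`.

    generator: full CTMC generator matrix Q (each row sums to zero)
    transient: indices of the non-failure states
    """
    q_tt = [[generator[i][j] for j in transient] for i in transient]
    times = solve_linear(q_tt, [-1.0] * len(transient))
    return times[transient.index(start)]


# Hypothetical example: two redundant components, each failing at rate
# lam per hour, no repair. State 0: both up, 1: one up, 2: system failed.
lam = 0.001
Q = [
    [-2 * lam, 2 * lam, 0.0],
    [0.0, -lam, lam],
    [0.0, 0.0, 0.0],
]
hours = mttf(Q, transient=[0, 1], start=0)
```

For this example the analytic value is 1/(2λ) + 1/λ = 1500 hours, which the numerical result should match up to floating-point error. The average holding time of a transient state i is 1/(-Q[i][i]), here 500 hours in state 0 and 1000 hours in state 1.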