Pattern Recognition and Machine Learning (Bishop) - How is this log-evidence function maximized with respect to $\alpha$?
So it is not obvious that the additional $\alpha$ dependence of $E(\textbf{m}_N)$ that you point out, the one entering through $\textbf{m}_N$ itself, has a vanishing derivative, but it does. I too was puzzled to find no mention of it in the text, or in the solution posted for exercise 3.20, which asks the reader to derive the result and is therefore rather incomplete. A similar thing happens when maximizing the evidence with respect to $\beta$.
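For anyone who wants to convince themselves numerically, here is a minimal sketch. The reason the extra term drops out is that $\textbf{m}_N$ is the stationary point of $E(\textbf{m}) = \frac{\beta}{2}\|\textbf{t} - \Phi\textbf{m}\|^2 + \frac{\alpha}{2}\textbf{m}^T\textbf{m}$, so the chain-rule contribution through $\textbf{m}_N$ is multiplied by a zero gradient. The check below assumes a single basis function ($M = 1$), so everything is a scalar. The data, the value of $\beta$, and all function names are made up for illustration and are not from the book.

```python
# Scalar sketch (M = 1 basis function); data and hyperparameters are made up.
phi = [0.5, -1.2, 2.0, 0.3]   # design "matrix" with a single column
t = [0.4, -1.0, 2.1, 0.1]     # targets
beta = 2.0                    # noise precision, held fixed


def m_N(alpha):
    """Posterior mean: m_N = beta * Phi^T t / A, with A = alpha + beta * Phi^T Phi."""
    A = alpha + beta * sum(p * p for p in phi)
    return beta * sum(p * tn for p, tn in zip(phi, t)) / A


def E(m, alpha):
    """E(m) = beta/2 * ||t - Phi m||^2 + alpha/2 * m^2."""
    residual = sum((tn - p * m) ** 2 for p, tn in zip(phi, t))
    return 0.5 * beta * residual + 0.5 * alpha * m * m


def total_derivative(alpha, h=1e-5):
    """Central difference of alpha -> E(m_N(alpha), alpha), letting m_N vary."""
    return (E(m_N(alpha + h), alpha + h) - E(m_N(alpha - h), alpha - h)) / (2 * h)


def explicit_derivative(alpha):
    """Explicit alpha dependence only: m_N^2 / 2, with m_N held fixed."""
    return 0.5 * m_N(alpha) ** 2


alpha = 0.7
print(total_derivative(alpha), explicit_derivative(alpha))
```

The two printed numbers should agree up to finite-difference error, which is the statement that the dependence of $E(\textbf{m}_N)$ on $\alpha$ through $\textbf{m}_N$ contributes nothing to the derivative.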