maximum likelihood
A two-step sequential approach for hyperparameter selection in finite context models
Contente, José, Martins, Ana, Pinho, Armando J., Gouveia, Sónia
Finite-context models (FCMs) are widely used for compressing symbolic sequences such as DNA, where predictive performance depends critically on the context length k and smoothing parameter α. In practice, these hyperparameters are typically selected through exhaustive search, which is computationally expensive and scales poorly with model complexity. This paper proposes a statistically grounded two-step sequential approach for efficient hyperparameter selection in FCMs. The key idea is to decompose the joint optimization problem into two independent stages. First, the context length k is estimated using categorical serial dependence measures, including Cramér's ν, Cohen's \k{appa} and partial mutual information (pami). Second, the smoothing parameter α is estimated via maximum likelihood conditional on the selected context length k. Simulation experiments were conducted on synthetic symbolic sequences generated by FCMs across multiple (k, α) configurations, considering a four-letter alphabet and different sample sizes. Results show that the dependence measures are substantially more sensitive to variations in k than in α, supporting the sequential estimation strategy. As expected, the accuracy of the hyperparameter estimation improves with increasing sample size. Furthermore, the proposed method achieves compression performance comparable to exhaustive grid search in terms of average bitrate (bits per symbol), while substantially reducing computational cost. Overall, the results on simulated data show that the proposed sequential approach is a practical and computationally efficient alternative to exhaustive hyperparameter tuning in FCMs.
- Europe > Portugal > Aveiro > Aveiro (0.05)
- North America > United States > New York (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
Reward Augmented Maximum Likelihood for Neural Structured Prediction
A key problem in structured output prediction is enabling direct optimization of the task reward function that matters for test evaluation. This paper presents a simple and computationally efficient method that incorporates task reward into maximum likelihood training. We establish a connection between maximum likelihood and regularized expected reward, showing that they are approximately equivalent in the vicinity of the optimal solution. Then we show how maximum likelihood can be generalized by optimizing the conditional probability of auxiliary outputs that are sampled proportional to their exponentiated scaled rewards. We apply this framework to optimize edit distance in the output space, by sampling from edited targets. Experiments on speech recognition and machine translation for neural sequence to sequence models show notable improvements over maximum likelihood baseline by simply sampling from target output augmentations.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (5 more...)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (3 more...)
- North America > United States > California > San Diego County > San Diego (0.05)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.42)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.42)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.78)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.78)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.32)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
- (2 more...)
- Europe > Switzerland (0.04)
- North America > Canada (0.04)