Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Ye, Tian, Xu, Zicheng, Li, Yuanzhi, Allen-Zhu, Zeyuan

arXiv.org Artificial Intelligence

Language models have demonstrated remarkable performance in solving reasoning tasks; however, even the strongest models still occasionally make reasoning mistakes. Recently, there has been active research aimed at improving reasoning accuracy, particularly by using pretrained language models to "self-correct" their mistakes via multi-round prompting. In this paper, we follow this line of work but focus on understanding the usefulness of incorporating "error-correction" data directly into the pretraining stage. This data consists of erroneous solution steps immediately followed by their corrections. Using a synthetic math dataset, we show promising results: this type of pretrain data can help language models achieve higher reasoning accuracy directly (i.e., through simple auto-regression, without multi-round prompting) compared to pretraining on the same amount of error-free data. We also delve into many details, such as (1) how this approach differs from beam search, (2) how such data can be prepared, (3) whether masking is needed on the erroneous tokens, (4) the amount of error required, (5) whether such data can be deferred to the fine-tuning stage, and many others.
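The "error-correction" data format described above (an erroneous solution step immediately followed by its correction) could be constructed along these lines. This is a minimal sketch: the `[BACK]` retry token, the `inject_error` helper, and the toy solution steps are illustrative assumptions, not the paper's actual encoding.

```python
# Sketch: build an "error-correction" pretraining sequence by inserting a
# wrong step immediately followed by a retraction marker and the correct step.
# The "[BACK]" token and the injection scheme are hypothetical, for illustration.

def inject_error(steps, error_index, wrong_step, back_token="[BACK]"):
    """Return a step sequence where steps[error_index] is first stated
    wrongly, then retracted via back_token, then restated correctly."""
    out = []
    for i, step in enumerate(steps):
        if i == error_index:
            out.append(wrong_step)   # the erroneous step
            out.append(back_token)   # marker: the previous step was a mistake
        out.append(step)             # the correct step always follows
    return out

# Toy grade-school-style solution with one injected mistake at step 3.
solution = ["x = 3 + 4", "x = 7", "y = 2 * x", "y = 14"]
corrupted = inject_error(solution, 3, "y = 12")
# -> ["x = 3 + 4", "x = 7", "y = 2 * x", "y = 12", "[BACK]", "y = 14"]
```

Pretraining on sequences like `corrupted` (rather than only on `solution`) is the intervention the paper studies; whether the erroneous tokens should be masked from the loss is one of the questions it addresses.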


Insurtech Descartes Partners With Modeling Firm Reask to Expand Parametric Cover

#artificialintelligence

Descartes Underwriting, the Paris-based parametric insurtech, has formed a partnership with Reask, the tropical-cyclone modeling firm. The partnership aims to expand the availability and advancement of parametric cyclone insurance products by combining Descartes' ability to incorporate new technology into parametric insurance product design with wind data provided by Sydney-headquartered insurtech Reask. The partnership also seeks to address the insurance protection gap by expanding global cyclone parametric coverage, Descartes said, explaining that the consistent global coverage of Reask's tropical cyclone product, Metryc, enables the expansion of parametric insurance policies into regions and geographies where data limitations had previously impeded coverage. Furthermore, Reask's ability to augment scarce ground-level observations and deliver high-resolution wind hazard intensity metrics within days of an event greatly supports the deployment of Descartes' parametric products. As natural catastrophe and extreme weather risks evolve with climate change, the difficulty of obtaining accurate data, already considerable given the destructive nature of cyclone activity, is likely to grow.
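Parametric cover of this kind pays out based on a measured hazard index (here, wind intensity at the insured location) rather than on assessed losses. A minimal sketch of the payout logic follows; the attachment/exhaustion thresholds, limit, and linear structure are generic illustrative assumptions, not terms of any Descartes or Reask product.

```python
# Sketch of a generic parametric payout: zero below an attachment threshold,
# the full limit above an exhaustion threshold, linear in between.
# All numbers are hypothetical placeholders.

def parametric_payout(wind_speed_kmh, attachment=150.0, exhaustion=250.0,
                      limit=1_000_000.0):
    """Payout as a function of measured peak wind speed at the insured site."""
    if wind_speed_kmh <= attachment:
        return 0.0                      # index below trigger: no payout
    if wind_speed_kmh >= exhaustion:
        return limit                    # index at or above cap: full limit
    # Linear interpolation between attachment and exhaustion.
    return limit * (wind_speed_kmh - attachment) / (exhaustion - attachment)
```

Because the payout depends only on the measured index, high-resolution wind metrics delivered within days of an event (as the blurb describes) translate directly into fast claim settlement, with no loss adjustment step.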


How did our global TC forecast stack up to observations over the past 12 months? - Reask

#artificialintelligence

The Southern hemisphere tropical cyclone season is now officially over, concluding a first full year of forecasting for us at reask. In the past 12 months we have issued three forecasts for each of the six active basins using our automated Machine Learning (ML) approach. How good were these? Given the probabilistic nature of our forecasts, choosing a framework to judge performance is critical. Having spent considerable time and effort modeling complete risk distributions, we believe that simply comparing mean predictions with observed occurrences wastes a great deal of useful information. Instead, following a recent article from Nate Silver's FiveThirtyEight blog, here we look at how well calibrated the model appeared in its first year.
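A calibration check of the kind described asks whether events forecast with probability p actually occur about p of the time. The sketch below bins predicted probabilities and compares each bin's mean forecast against its observed frequency; it is a generic illustration of the idea, not Reask's actual evaluation code.

```python
# Sketch: calibration table for probabilistic forecasts.
# Forecasts are grouped into probability bins; a well-calibrated model has
# mean predicted probability close to observed frequency in every bin.

from collections import defaultdict

def calibration_table(probs, outcomes, n_bins=5):
    """Map bin index -> (mean predicted probability, observed frequency)."""
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into top bin
        bins[b].append((p, y))
    table = {}
    for b, pairs in sorted(bins.items()):
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        obs_freq = sum(y for _, y in pairs) / len(pairs)
        table[b] = (mean_pred, obs_freq)
    return table
```

With a full year of forecasts across six basins, each bin would hold only a handful of forecast/outcome pairs, which is why calibration over the whole predicted distribution is a more informative test than comparing means alone.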


Reask

#artificialintelligence

With advanced machine learning capabilities and ever-increasing computing power, the methodologies for better understanding catastrophe risk are improving. Today's risk transfer mechanisms require a dynamic view of hazard risk. Are current climate conditions likely to impact your risk in the short to medium term? Our seasonal views of risk are specifically designed for dynamic risk market players. Balancing complex physics and computational efficiency, our approach uses high resolution hazard data and machine learning.