Solving Quantitative Reasoning Problems with Language Models: Minerva
Language Models
A language model is a probability distribution over words and word sequences. Given the context in a text, the model predicts the most appropriate next word to fill a blank in a sentence or phrase. Language models are used in tasks such as speech recognition, sentiment analysis, machine translation, question answering, and part-of-speech tagging.
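To make this definition concrete, here is a minimal sketch of the idea using a toy bigram model estimated from word counts; the corpus is invented for illustration and has nothing to do with Minerva's training data.

```python
# A minimal sketch: a language model assigns probabilities to the next word
# given context. Here, a toy bigram model estimated from counts; the corpus
# is illustrative only.
from collections import Counter, defaultdict

corpus = "the model predicts the next word given the previous word".split()

# Count how often each word follows each context word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def next_word_distribution(context_word):
    """Return P(next word | context word) as a dict."""
    counts = bigram_counts[context_word]
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

print(next_word_distribution("the"))
# {'model': 0.333..., 'next': 0.333..., 'previous': 0.333...}
```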
Minerva: A Language Model
Minerva is a language model capable of solving mathematical and scientific questions using step-by-step reasoning. By collecting training data that is relevant to quantitative reasoning problems, training models at scale, and employing strong inference techniques, we achieve significant performance gains on a variety of difficult quantitative reasoning tasks. Minerva solves problems by generating solutions that include numerical calculations and symbolic manipulation, without relying on external tools such as a calculator. The model answers mathematical questions using a mix of natural language and mathematical notation. Minerva combines several techniques (a short prompting sketch follows the list):
Few-shot prompting
Chain of thought
Majority voting
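To make the first two techniques concrete, the sketch below builds a few-shot chain-of-thought prompt for a generic text-completion model; `generate`, the prompt format, and the worked example are hypothetical stand-ins, not Minerva's actual interface. Majority voting is sketched further below.

```python
# A minimal sketch of few-shot chain-of-thought prompting, assuming a generic
# text-completion interface. `generate` is a hypothetical stand-in for a call
# to a model like Minerva; the worked example is invented, not taken from the
# paper's prompt set.
FEW_SHOT_EXAMPLES = """\
Q: Natalia sold 4 pencils at $2 each. How much money did she make?
A: Each pencil sells for $2, and she sold 4 pencils.
   Total = 4 * 2 = 8. The final answer is 8.
"""

def build_prompt(question: str) -> str:
    """Prepend worked step-by-step solutions so the model imitates the format."""
    return f"{FEW_SHOT_EXAMPLES}\nQ: {question}\nA:"

prompt = build_prompt("A train travels 60 miles per hour for 3 hours. How far does it go?")
# completion = generate(prompt)  # hypothetical model call, sampled k times in practice
print(prompt)
```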
Together, these techniques give Minerva strong performance on a range of STEM reasoning tasks.

Solving a multi-step problem: A question from the MATH dataset and Minerva’s solution.
A Model for Multi-Step Quantitative Reasoning
Minerva builds on the Pathways Language Model (PaLM), with further training on a 118 GB dataset of scientific papers from the arXiv preprint server and web pages that contain mathematical expressions. Standard text-cleaning procedures tend to remove the symbols and formatting that carry the meaning of mathematical expressions. By preserving this notation during data processing, the model learns to converse using standard mathematical notation.
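As a minimal sketch of this idea (not Minerva's actual pipeline), the code below contrasts an aggressive cleaner that destroys LaTeX with one that preserves `$...$` spans:

```python
# A minimal sketch of math-preserving cleaning, under the assumption stated in
# the text that naive cleaning strips mathematical symbols and formatting.
# Illustrative only, not Minerva's actual data pipeline.
import re

def naive_clean(text: str) -> str:
    """Typical aggressive cleaning: keep only plain words and digits."""
    return re.sub(r"[^A-Za-z0-9 ]+", " ", text)

def math_preserving_clean(text: str) -> str:
    """Keep $...$ spans intact; lightly normalize whitespace elsewhere."""
    parts = re.split(r"(\$[^$]*\$)", text)  # capturing group keeps math spans
    cleaned = [p if p.startswith("$") else " ".join(p.split()) for p in parts]
    return " ".join(c for c in cleaned if c)

doc = "The energy is $E = mc^2$, derived below."
print(naive_clean(doc))            # the equation's structure is destroyed
print(math_preserving_clean(doc))  # "$E = mc^2$" survives intact
```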

A dataset for quantitative reasoning: Careful data processing preserves mathematical information, allowing the model to learn mathematics at a higher level.
Minerva is prompted with existing step-by-step solutions to example problems before being presented with a new question. The model does not consider just one solution; it samples many. The sampled solutions differ in their intermediate steps but often arrive at the same final answer. Minerva applies majority voting over the sampled solutions, taking the most common result as its conclusive final answer.
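Here is a minimal sketch of majority voting over sampled solutions; the sampled outputs, the "final answer is" format, and the helper names are assumptions for illustration, not Minerva's actual implementation.

```python
# A minimal sketch of majority voting. Assumes each sampled solution ends with
# a recognizable "The final answer is ..." line, matching the prompt format
# sketched earlier; this is illustrative, not Minerva's actual code.
import re
from collections import Counter

def extract_final_answer(solution: str):
    """Pull the token after 'final answer is', if present."""
    match = re.search(r"final answer is\s*([^\s.]+)", solution, re.IGNORECASE)
    return match.group(1) if match else None

def majority_vote(solutions):
    """Pick the most common final answer among the sampled solutions."""
    answers = [a for a in map(extract_final_answer, solutions) if a is not None]
    return Counter(answers).most_common(1)[0][0] if answers else None

# Invented sampled outputs; in practice these come from k model samples.
samples = [
    "... so 4 * 2 = 8. The final answer is 8.",
    "... adding the terms gives 8. The final answer is 8.",
    "... a slip yields 10. The final answer is 10.",
]
print(majority_vote(samples))  # "8"
```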
What Minerva Gets Wrong
Minerva still makes mistakes, and most of them are easy to interpret: about half are calculation errors, and the other half are reasoning errors, where the solution steps do not follow a logical chain of thought. It is also possible for the model to arrive at the correct final answer through faulty reasoning.

Calculation mistake: The model incorrectly cancels the square root on both sides of the equation.
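As a hypothetical illustration of this error class (not Minerva's actual output), canceling a square root termwise across a sum is invalid, whereas canceling roots that make up each entire side is fine:

```latex
% Hypothetical illustration of the error class, not Minerva's actual output.
% Invalid: canceling the root termwise across a sum.
\[ \sqrt{a^2 + b^2} \ne a + b \quad \text{in general, e.g. } \sqrt{3^2 + 4^2} = 5 \ne 3 + 4 . \]
% Valid: each side is a single root (and both radicands are nonnegative).
\[ \sqrt{x + 2} = \sqrt{2x - 3} \;\Longrightarrow\; x + 2 = 2x - 3 \;\Longrightarrow\; x = 5 . \]
```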
Conclusion
Although Minerva generates solutions using a mix of natural language and LaTeX mathematical expressions, it has no explicit underlying mathematical structure.
Models capable of quantitative reasoning have many potential applications, including serving as useful aids for researchers and enabling new learning opportunities for students. We present Minerva as a small step in this direction.