- training set
- A set of documents used to train a machine learning model.
- validation set
- A set of documents used to decide which of a set of models or model specifications is likely to perform best on data drawn from the same distribution.
- test set
- A set of documents used to estimate the likely performance of our model on new data drawn from the same distribution. It is often referred to as a “held-out” test set, because we hold it back from the whole development cycle, and only use it once, to make our final estimate of performance.
- development set
- A set of documents used to decide which of a set of prompts or prompt-model combinations is likely to perform best on data drawn from the same distribution.
- hyperparameters
- User-defined configuration options for machine learning models that control how the models are trained and how they behave. Hyperparameters are often optimised to find the combination that performs best on the validation set.
- prompt engineering
- The process of iteratively developing prompts in order to optimize performance for a given task and dataset.
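As a rough illustration of how the sets defined above relate to one another, the following minimal sketch partitions a collection of documents and picks the best candidate on the validation set. The function names `split_documents` and `select_best`, the 60/20/20 proportions, and the fixed seed are all assumptions made for this example, not something the definitions prescribe.

```python
import random


def split_documents(documents, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle documents and partition them into training, validation, and test sets.

    The fractions are illustrative assumptions; whatever is left after the
    training and validation slices becomes the held-out test set.
    """
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


def select_best(candidates, score, validation):
    """Return the candidate (e.g. a hyperparameter setting) that scores highest.

    `score` is a caller-supplied function taking (candidate, validation_set)
    and returning a number where higher is better.
    """
    return max(candidates, key=lambda candidate: score(candidate, validation))
```

The same `select_best` pattern would apply to choosing among prompts or prompt-model combinations, with a development set passed in place of the validation set. The test set is deliberately left out of the selection step, so that it can be used once at the end for the final performance estimate.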