Glossary - Evaluation for Automated Evidence Synthesis

training set: A set of documents used to train a machine learning model.
validation set: A set of documents used to decide which of a set of models or model specifications is likely to perform best on data drawn from the same distribution.
test set: A set of documents used to estimate the likely performance of our model on new data drawn from the same distribution. It is often referred to as a “held-out” test set, because we hold it back from the whole development cycle, and only use it once, to make our final estimate of performance.
development set: A set of documents used to decide which of a set of prompts or prompt-model combinations is likely to perform best on data drawn from the same distribution.
hyperparameters: User-defined configuration options for machine learning models that alter how they are trained or how they behave. Hyperparameters are often tuned to find the configuration that performs best on the validation set.
prompt engineering: The process of iteratively developing prompts in order to optimise performance for a given task and dataset.
ground truth: Data representing the true values of a variable, used to compare and validate predicted values output by a model.
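
The terms above fit together in a typical evaluation workflow, which can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a recommended pipeline: the documents, their "score" and "label" fields, the 60/20/20 split, the threshold "model" and the "margin" hyperparameter are all hypothetical and chosen only to keep the example short.

```python
import random
from statistics import mean

# Hypothetical corpus: each document has a numeric score and a ground-truth label.
documents = [{"id": i, "score": i, "label": 1 if i >= 50 else 0} for i in range(100)]

def split_documents(docs, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle once, then cut into training, validation and test sets."""
    docs = list(docs)
    random.Random(seed).shuffle(docs)
    n_train = int(len(docs) * train_frac)
    n_val = int(len(docs) * val_frac)
    return docs[:n_train], docs[n_train:n_train + n_val], docs[n_train + n_val:]

def train(train_docs, margin):
    """Toy training step: learn a score threshold from the training set.

    `margin` is a hyperparameter (assumed for illustration): it is set by
    the user, not learned from the data.
    """
    pos = [d["score"] for d in train_docs if d["label"] == 1]
    neg = [d["score"] for d in train_docs if d["label"] == 0]
    threshold = (mean(pos) + mean(neg)) / 2 + margin
    return lambda doc: 1 if doc["score"] >= threshold else 0

def accuracy(model, docs):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(model(d) == d["label"] for d in docs) / len(docs)

train_set, validation_set, test_set = split_documents(documents)

# Use the validation set to choose between candidate hyperparameter values.
candidate_margins = [-20, 0, 20]
best_margin = max(
    candidate_margins,
    key=lambda m: accuracy(train(train_set, m), validation_set),
)

# Use the held-out test set once, for the final performance estimate.
final_model = train(train_set, best_margin)
test_accuracy = accuracy(final_model, test_set)
print(f"chosen margin: {best_margin}, test accuracy: {test_accuracy:.2f}")
```

The point of the sketch is the separation of roles: the training set is only used inside train, the validation set is only used to pick the margin, and the test set is touched once at the end. For prompt-based systems, a development set would play the role of the validation set, with candidate prompts in place of candidate margins.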