data_engine package
Submodules
prepare_data module
data_engine.prepare_data.build_dataset(params)
Builds (or loads) a Dataset instance.
Parameters: - params – Parameters specifying Dataset options
Returns: Dataset object
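The "builds (or loads)" behaviour can be pictured as a build-or-load cache. The text here does not say how the library chooses between the two, so the following is only a generic sketch: `build_or_load`, `build_fn`, and the `store_path` and `rebuild` keys are hypothetical names introduced for illustration, not part of data_engine.

```python
import os
import pickle
import tempfile


def build_or_load(params, build_fn):
    """Load a pickled object if it exists and no rebuild is requested;
    otherwise build it with build_fn and save it.

    'store_path' and 'rebuild' are hypothetical keys used only in this sketch.
    """
    path = params["store_path"]
    if not params.get("rebuild", False) and os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    obj = build_fn(params)
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return obj


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_params = {"store_path": os.path.join(tmp_dir, "dataset.pkl"), "rebuild": False}
    # First call: nothing on disk yet, so the object is built and stored.
    first = build_or_load(demo_params, lambda p: {"name": "demo", "splits": ["train", "val"]})
    # Second call: the pickle exists, so it is loaded and build_fn is not called.
    second = build_or_load(demo_params, lambda p: {"name": "should not be built"})
```

Setting the hypothetical `rebuild` key to True would force the object to be built again even when a stored copy exists.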
Keeps only n captions per image and stores the rest in dictionaries for later evaluation.
Parameters: - ds – Dataset object
- repeat – Number of input samples per output
- n – Number of outputs to keep
- set_names – Set name
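The idea of keeping n captions per image and setting the rest aside can be illustrated on plain dictionaries. This sketch does not use the Dataset class: `split_captions` and the image-id-to-captions layout are assumptions made for illustration, not the library's implementation.

```python
def split_captions(captions_per_image, n=1):
    """Keep the first n captions of each image and set the rest aside.

    captions_per_image: dict mapping an image id to its list of captions
    (a hypothetical layout, not the Dataset class's internal one).
    Returns (kept, held_out), two dicts keyed by image id.
    """
    kept, held_out = {}, {}
    for image_id, image_captions in captions_per_image.items():
        kept[image_id] = image_captions[:n]
        held_out[image_id] = image_captions[n:]
    return kept, held_out


captions = {
    "img_0": ["a dog runs", "a dog is running", "dog on grass"],
    "img_1": ["two cats sleep", "cats on a sofa"],
}
kept, held_out = split_captions(captions, n=1)
print(kept)      # captions retained as outputs
print(held_out)  # captions stored for later evaluation
```

The held-out captions remain available as extra references when scoring generated text, which is the purpose the description gives for storing them.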
data_engine.prepare_data.update_dataset_from_file(ds, input_text_filename, params, splits=None, output_text_filename=None, remove_outputs=False, compute_state_below=False, recompute_references=False)
Updates the dataset instance from a text file according to the given params. Used for sampling.
Parameters: - ds – Dataset instance
- input_text_filename – File containing the source-language sentences
- params – Parameters for building the dataset
- splits – Splits to sample
- output_text_filename – File containing the target-language sentences
- remove_outputs – Remove outputs from the dataset (if True, output_text_filename is ignored)
- compute_state_below – Compute the state below the input (the shifted target text used for teacher forcing)
- recompute_references – Whether to rebuild the dataset's references
Returns: Dataset object with the processed data
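The compute_state_below option refers to a shifted copy of the target text. A minimal sketch of that shift on a plain token list might look as follows; `shift_right` and the `<null>` start symbol are assumptions chosen for illustration and are not taken from the library.

```python
def shift_right(target_tokens, start_symbol="<null>"):
    """Return the target sequence shifted one position to the right.

    At step t the decoder then sees the reference token from step t - 1,
    which is the input arrangement used for teacher forcing.
    The start symbol is a hypothetical placeholder.
    """
    if not target_tokens:
        return []
    return [start_symbol] + target_tokens[:-1]


target = ["das", "ist", "gut", "<eos>"]
state_below = shift_right(target)
print(state_below)  # ['<null>', 'das', 'ist', 'gut']
```

The shifted sequence has the same length as the target, so each position in it lines up with the token the model is expected to predict next.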