Data

Contains the base functions for loading and saving data from/to .cxi files

These functions are used when constructing a new dataset class to pull specific information from a .cxi file. They should handle all the needed conversions between standard formats (for example, transposing the basis arrays or shifting from object motion to probe motion).

cdtools.tools.data.get_entry_info(cxi_file)

Returns a dictionary with the basic metadata from the cxi file’s entry_1 group

String type metadata is read out as a string, and datetime metadata is converted to python datetime objects if the string is properly formatted.

Parameters:

cxi_file (h5py.File) – A file object to be read

Returns:

entry_info – A dictionary with basic metadata defined in the cxi file

Return type:

dict
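
As a rough illustration of the conversion described above, the sketch below keeps a metadata value as a string unless it parses as a datetime. The helper name `parse_metadata_value` is hypothetical, and the use of ISO 8601 via `datetime.fromisoformat` is an assumption; the formats that cdtools actually accepts are not specified here.

```python
from datetime import datetime


def parse_metadata_value(value):
    """Hypothetical helper: decode bytes, then try to read the text as a datetime."""
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    try:
        # Assumption: properly formatted timestamps follow ISO 8601
        return datetime.fromisoformat(value)
    except ValueError:
        # Anything else is left as a plain string
        return value


start = parse_metadata_value(b"2021-03-04T12:30:00")
title = parse_metadata_value("Siemens star scan")
```

Here `start` becomes a `datetime` object, while `title` stays a string because it does not parse as a timestamp.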

cdtools.tools.data.get_sample_info(cxi_file)

Returns a dictionary with the basic metadata from the cxi file’s entry_1/sample_1 group

Parameters:

cxi_file (h5py.File) – A file object to be read

Returns:

sample_info – A dictionary with basic metadata from the sample defined in the cxi file

Return type:

dict

cdtools.tools.data.get_wavelength(cxi_file)

Returns the wavelength of the source defined in the cxi file object, in m

Parameters:

cxi_file (h5py.File) – A file object to be read

Returns:

wavelength – The wavelength of the source defined in the cxi file

Return type:

np.float32

cdtools.tools.data.get_detector_geometry(cxi_file)

Returns a standardized description of the detector geometry defined in the cxi file object

It makes reasonable assumptions based on the cxi file format definition. The standardized description that it outputs includes the sample to detector distance, the corner location of the detector, and the basis vectors defining the detector. It can only handle detectors defined as rectangular grids of pixels.

The distance and corner_location values technically overdetermine the detector location, but for many experiments (particularly transmission experiments), the distance is needed and the exact corner location is not. If the corner location is not reported in the cxi file, no attempt will be made to calculate it.

Parameters:

cxi_file (h5py.File) – A file object to be read

Returns:

  • distance (np.float32) – The sample to detector distance, in m

  • basis_vectors (np.array) – The basis vectors for the detector

  • corner_location (np.array) – The real-space location of the (0,0) pixel in the detector
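
To make the remark about overdetermination concrete, the sketch below shows how a perpendicular distance could be recovered from a corner location and the basis vectors. The 3x2 layout of `basis_vectors`, the numerical values, and the helper `pixel_position` are assumptions chosen for illustration, not values or functions taken from cdtools.

```python
import numpy as np

# Assumed layout for illustration: a 3x2 array whose columns are the
# real-space steps (in m) between adjacent pixels along each detector axis
pixel_size = 50e-6
basis_vectors = np.array([[0.0, -pixel_size],
                          [-pixel_size, 0.0],
                          [0.0, 0.0]])

# Assumed corner location (in m) of the (0, 0) pixel for a 512x512 detector
# centered on the beam, 0.5 m downstream of the sample
corner_location = np.array([256 * pixel_size, 256 * pixel_size, 0.5])

# The unit normal to the detector plane follows from the two basis vectors
normal = np.cross(basis_vectors[:, 0], basis_vectors[:, 1])
normal = normal / np.linalg.norm(normal)

# The perpendicular sample-to-detector distance is the component of the
# corner location along that normal
distance_from_corner = abs(np.dot(corner_location, normal))


def pixel_position(i, j):
    """Real-space position of pixel (i, j) under the assumed layout."""
    return corner_location + i * basis_vectors[:, 0] + j * basis_vectors[:, 1]
```

With these numbers the central pixel (256, 256) sits on the beam axis, so the distance implied by the corner matches the 0.5 m that would be reported separately.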

cdtools.tools.data.get_mask(cxi_file)

Returns the detector mask defined in the cxi file object

This function converts from the format specified in the cxi file definition to a simple on/off mask, where a value of 1 defines a good pixel (on) and a value of 0 defines a bad pixel (off).

If any bit is set in the mask at all, it will be defined as a bad pixel, with the exception of pixels marked exactly as 0x00001000, which is defined to mean that the pixel has signal above the background. These pixels are treated as on pixels.

Parameters:

cxi_file (h5py.File) – A file object to be read

Returns:

mask – An array storing the mask from the cxi file

Return type:

np.array
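
The rule described above can be sketched in a few lines of numpy. This is a minimal illustration rather than the cdtools implementation; the helper name `to_on_off_mask` and the `uint8` output type are assumptions.

```python
import numpy as np

ABOVE_BACKGROUND = 0x00001000  # flag value quoted in the description above


def to_on_off_mask(raw_mask):
    """Hypothetical helper: 1 for usable pixels, 0 for everything else."""
    raw_mask = np.asarray(raw_mask)
    # A pixel is good if no bits are set, or if it is marked exactly as
    # "signal above background"
    good = (raw_mask == 0) | (raw_mask == ABOVE_BACKGROUND)
    return good.astype(np.uint8)


raw = np.array([[0, 1],
                [0x00001000, 0x00001001]], dtype=np.uint32)
mask = to_on_off_mask(raw)
```

Note that the last pixel is treated as bad: it carries the above-background bit together with another flag, so it is not marked exactly as 0x00001000.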

cdtools.tools.data.get_dark(cxi_file)

Returns an array with a dark image to use for initialization of a background model

This looks for a set of dark images at entry_1/instrument_1/detector_1/data_dark. If the darks exist, it will return the mean of the array along all axes but the last two. That is, if the dark image is a single image, it will return that image. If it is a stack of images, it will return the mean along the stack axis.

If the darks do not exist, it will return None.

Parameters:

cxi_file (h5py.File) – A file object to be read

Returns:

dark – An array storing the dark image

Return type:

np.array
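
The averaging rule described above, a mean over every axis except the last two, might look like the following sketch. The helper name `mean_dark` and the `float32` cast are assumptions made for illustration.

```python
import numpy as np


def mean_dark(darks):
    """Hypothetical helper: average over every axis except the last two."""
    darks = np.asarray(darks, dtype=np.float32)
    # Flatten all leading axes into a single stack axis, then average it.
    # A single 2D image becomes a stack of one and is returned unchanged.
    stack = darks.reshape(-1, *darks.shape[-2:])
    return stack.mean(axis=0)


single = np.ones((4, 5))
stack = np.stack([np.zeros((4, 5)), 2 * np.ones((4, 5))])

dark_from_single = mean_dark(single)
dark_from_stack = mean_dark(stack)
```

Both results have the shape of one detector frame, (4, 5), regardless of how many dark frames were supplied.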

cdtools.tools.data.get_data(cxi_file, cut_zeros=True)

Returns an array with the full stack of detector data defined in the cxi file object

This function checks all of the locations where the data may be stored, so that it can find the data even if the creator of the .cxi file did not link the data to every required location.

It will return the data array in whatever shape it’s defined in.

It will also read out the axes attribute of the data into a list of strings.

Parameters:
  • cxi_file (h5py.File) – A file object to be read

  • cut_zeros (bool) – Default True, whether to set all negative data to zero

Returns:

  • data (np.array) – An array storing the data defined in the cxi file

  • axes (list(str)) – A list of the axes defined in the axes attribute, if any
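
The two post-processing steps mentioned above, zeroing negative values and reading the axes attribute into a list, can be sketched as follows. Both helper names are hypothetical, and the colon-separated form of the axes attribute is an assumption.

```python
import numpy as np


def clip_negative(data):
    """Hypothetical helper mirroring cut_zeros=True: negatives become zero."""
    return np.maximum(data, 0)


def split_axes(axes_attr):
    """Hypothetical helper: turn an axes attribute into a list of names.

    Assumption: the attribute is a colon-separated string, possibly
    stored as bytes, e.g. b"translation:y:x".
    """
    if isinstance(axes_attr, bytes):
        axes_attr = axes_attr.decode("utf-8")
    return axes_attr.split(":")


raw = np.array([[-2.0, 0.0], [3.5, -0.1]])
data = clip_negative(raw)
axes = split_axes(b"translation:y:x")
```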

cdtools.tools.data.get_shot_to_shot_info(cxi_file, field_name)

Gets a specified dataset of shot-to-shot information from the cxi file

The data is assumed to be in the form of an array, with one dimension being the number of patterns being stored in the dataset. This is helpful for storing additional readback data on the shot-to-shot level that may be important but doesn’t have a clearly defined place to be stored in the .cxi file specification. Such data includes shot-to-shot probe intensity measurements, polarizer positions, etc.

It will look for this data in 3 places (in the following order):

  1. entry_1/data_1/<field_name>

  2. entry_1/sample_1/geometry_1/<field_name>

  3. entry_1/instrument_1/detector_1/<field_name>

This function is also used internally to read out the translations associated with a ptychography experiment

Parameters:
  • cxi_file (h5py.File) – A file object to be read

  • field_name (str) – The name of the field to be read from

Returns:

data – An array storing the requested shot-to-shot data defined in the cxi file

Return type:

np.array
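
The lookup order listed above amounts to a first-match search over three groups. The sketch below uses a plain dict as a stand-in for an open h5py.File, since both support `in` and indexing by path. The helper name `find_shot_to_shot` is hypothetical, and raising `KeyError` when nothing is found is a choice made for this sketch, not documented cdtools behavior.

```python
SEARCH_GROUPS = (
    "entry_1/data_1",
    "entry_1/sample_1/geometry_1",
    "entry_1/instrument_1/detector_1",
)


def find_shot_to_shot(cxi_like, field_name):
    """Hypothetical helper: return the first match in the documented order.

    cxi_like is any mapping that supports `in` and indexing by path; a
    plain dict stands in for an open h5py.File here.
    """
    for group in SEARCH_GROUPS:
        path = f"{group}/{field_name}"
        if path in cxi_like:
            return cxi_like[path]
    raise KeyError(f"{field_name} not found in any of the expected groups")


fake_file = {
    "entry_1/sample_1/geometry_1/intensity": [1.0, 0.9, 1.1],
    "entry_1/instrument_1/detector_1/intensity": [0.0, 0.0, 0.0],
}
intensity = find_shot_to_shot(fake_file, "intensity")
```

Because the geometry_1 group is checked before detector_1, the first list is returned even though both paths exist.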

cdtools.tools.data.get_ptycho_translations(cxi_file)

Gets an array of x,y,z translations, if such an array has been defined in the file

It negates the translations, because the CXI file format is designed to specify translations of the samples and the cdtools code specifies translations of the optics.

Parameters:

cxi_file (h5py.File) – A file object to be read

Returns:

translations – An array storing the translations defined in the cxi file

Return type:

np.array
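
The sign convention is easy to get backwards, so here is a minimal sketch of the round trip. The array layout (one x, y, z row per shot) and the values are assumptions for illustration.

```python
import numpy as np

# Probe positions as used inside cdtools, in m
# (assumed layout: one x, y, z row per shot)
probe_translations = np.array([[0.0, 0.0, 0.0],
                               [1e-6, 0.0, 0.0],
                               [1e-6, 2e-6, 0.0]])

# Writing: the file stores sample motion, which is the opposite direction
stored_in_file = -probe_translations

# Reading: negate again to recover the probe motion
loaded = -stored_in_file
```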

cdtools.tools.data.create_cxi(filename)

Creates a new cxi file with a single entry group

Parameters:

filename (str) – The path at which to create the file

cdtools.tools.data.add_entry_info(cxi_file, metadata)

Adds a dictionary of entry metadata to the entry_1 group of a cxi file object

Parameters:
  • cxi_file (h5py.File) – The file to add the info to

  • metadata (dict) – A dictionary containing all the metadata to be stored

cdtools.tools.data.add_sample_info(cxi_file, metadata)

Adds a dictionary of sample metadata to the entry_1/sample_1 group of a cxi file object

This function will create the sample_1 group if it doesn’t already exist

Parameters:
  • cxi_file (h5py.File) – The file to add the info to

  • metadata (dict) – A dictionary containing all the metadata to be stored

cdtools.tools.data.add_source(cxi_file, wavelength)

Adds the entry_1/source_1 group to a cxi file object

It stores the energy and wavelength attributes in the source_1 group, given a wavelength to define them from.

Parameters:
  • cxi_file (h5py.File) – The file to add the source to

  • wavelength (float) – The wavelength of light
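
Deriving the energy from the wavelength uses the relation E = hc/λ. The sketch below shows that arithmetic only; the helper name `photon_energy` is hypothetical, and expressing the energy in joules is an assumption based on the SI units used elsewhere in these functions.

```python
# Exact SI defining constants
PLANCK_CONSTANT = 6.62607015e-34  # J s
SPEED_OF_LIGHT = 299792458.0      # m / s
ELECTRON_VOLT = 1.602176634e-19   # J


def photon_energy(wavelength):
    """Photon energy in joules for a wavelength in meters."""
    return PLANCK_CONSTANT * SPEED_OF_LIGHT / wavelength


wavelength = 1.24e-9  # roughly a 1 keV soft x-ray photon
energy_joules = photon_energy(wavelength)
energy_ev = energy_joules / ELECTRON_VOLT
```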

cdtools.tools.data.add_detector(cxi_file, distance, basis, corner=None)

Adds the entry_1/instrument_1/detector_1 group to a cxi file object

It will define all the relevant parameters (distance, pixel size, detector basis, and, if provided, corner position) based on the provided information.

Parameters:
  • cxi_file (h5py.File) – The file to add the detector to

  • distance (float) – The sample to detector distance

  • basis (array) – The detector basis

  • corner (array) – Optional, the corner position of the detector

cdtools.tools.data.add_mask(cxi_file, mask)

Adds the specified mask to the cxi file

It places the mask into the mask dataset under entry_1/instrument_1/detector_1. The internal mask is defined simply as a 1 for an “on” pixel and a 0 for an “off” pixel, and the saved mask is exactly the opposite. This is simpler than the most general mask allowed by the cxi file format but it captures the distinction between pixels to be used and pixels not to be used.

Parameters:
  • cxi_file (h5py.File) – The file to add the mask to

  • mask (array) – The mask to save out to the file
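
The inversion between the internal and saved conventions can be sketched as below. The `uint8` dtype is an assumption for illustration; the dtype that cdtools writes is not stated here.

```python
import numpy as np

# Internal convention: 1 marks a usable pixel, 0 marks a pixel to ignore
internal_mask = np.array([[1, 1, 0],
                          [1, 0, 1]], dtype=np.uint8)

# Saved convention: 0 marks a usable pixel, nonzero marks a flagged pixel
saved_mask = 1 - internal_mask

# Reading the saved mask back with a "no bits set means good" rule
recovered = (saved_mask == 0).astype(np.uint8)
```

Applying the inversion on save and the matching rule on load returns the original on/off mask.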

cdtools.tools.data.add_dark(cxi_file, dark)

Adds the specified dark image to a cxi file

It places the dark image data into the data_dark dataset under entry_1/instrument_1/detector_1.

Parameters:
  • cxi_file (h5py.File) – The file to add the dark image to

  • dark (array) – The dark image(s) to save out to the file

cdtools.tools.data.add_data(cxi_file, data, axes=None, compression='gzip', chunks=True)

Adds the specified data to the cxi file

It will add the data unchanged to the file, placing it in two spots:

  1. The entry_1/instrument_1/detector_1/data path

  2. A softlink at entry_1/data_1/data

Parameters:
  • cxi_file (h5py.File) – The file to add the data to

  • data (array) – The data to be saved

  • axes (list(str)) – Optional, a list of axis names to be saved in the axes attribute
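
For readers unfamiliar with HDF5 soft links, the sketch below reproduces the two-location layout described above using h5py directly. It is not the cdtools implementation: the in-memory file, the array contents, and the colon-separated axes string are assumptions for illustration.

```python
import h5py
import numpy as np

data = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
detector_path = "entry_1/instrument_1/detector_1/data"

# An in-memory file keeps this sketch from touching the disk
with h5py.File("sketch.cxi", "w", driver="core", backing_store=False) as f:
    # Intermediate groups are created automatically by create_dataset
    dset = f.create_dataset(detector_path, data=data,
                            compression="gzip", chunks=True)
    # Assumption: axis names are stored as a colon-separated string
    dset.attrs["axes"] = "translation:y:x"

    # The soft link points at the dataset by its absolute path
    f.require_group("entry_1/data_1")
    f["entry_1/data_1/data"] = h5py.SoftLink("/" + detector_path)

    # Reading through the link returns the same array
    linked = f["entry_1/data_1/data"][()]
    link_info = f.get("entry_1/data_1/data", getlink=True)
```

A soft link stores only the target path, so the data is written once and both locations stay consistent.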

cdtools.tools.data.add_shot_to_shot_info(cxi_file, data, field_name)

Adds a specified dataset of shot-to-shot information to the cxi file

The data is assumed to be in the form of an array, with one dimension being the number of patterns being stored in the dataset. This is helpful for storing additional readback data on the shot-to-shot level that may be important but doesn’t have a clearly defined place to be stored in the .cxi file specification. Such data includes shot-to-shot probe intensity measurements, polarizer positions, etc.

This function is also used internally to store the translations associated with a ptychography experiment

It will store this data in 3 places:

  1. The entry_1/sample_1/geometry_1/<field_name> path

  2. A softlink at entry_1/data_1/<field_name>

  3. A softlink at entry_1/instrument_1/detector_1/<field_name>

The geometry and detector paths may not always be relevant, but this ensures that the data (for example, the translations) is available in any of the places where an eventual reader may look for it.

Parameters:
  • cxi_file (h5py.File) – The file to add the data to

  • data (array) – The data to be saved

  • field_name (str) – The field name to save the data under

cdtools.tools.data.add_ptycho_translations(cxi_file, translations)

Adds the specified translations to the cxi file

It will add the translations to the file, negating them to conform to the standard in cxi files that the translations refer to the object’s translation.

It will store them in 3 places:

  1. The entry_1/sample_1/geometry_1/translation path

  2. A softlink at entry_1/data_1/translation

  3. A softlink at entry_1/instrument_1/detector_1/translation

Parameters:
  • cxi_file (h5py.File) – The file to add the translations to

  • translations (array) – The translations to be saved

cdtools.tools.data.nested_dict_to_numpy(d)

Converts all array-like objects in a nested dict to numpy arrays

Parameters:

d (dict) – A mapping whose keys are all strings and whose values are only numpy arrays, pytorch tensors, scalars, or other mappings meeting the same conditions

Returns:

new_dict – A new dictionary with all array-like objects converted to numpy arrays

Return type:

dict
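
The recursion over a nested mapping can be sketched as follows. This is a hypothetical re-implementation for illustration, not the cdtools source: pytorch tensors are detected by duck typing so that the sketch does not need torch installed, and passing scalars through unchanged is an assumption.

```python
from collections.abc import Mapping

import numpy as np


def to_numpy_nested(d):
    """Hypothetical re-implementation sketch, not the cdtools source."""
    out = {}
    for key, value in d.items():
        if isinstance(value, Mapping):
            # Recurse into nested mappings
            out[key] = to_numpy_nested(value)
        elif hasattr(value, "detach"):
            # Duck-typed pytorch tensor: detach, move to CPU, convert
            out[key] = value.detach().cpu().numpy()
        else:
            # Assumption: numpy arrays and scalars pass through unchanged
            out[key] = value
    return out


results = {
    "probe": np.ones((2, 2)),
    "meta": {"iterations": 100, "loss": np.array([3.0, 1.0])},
}
converted = to_numpy_nested(results)
```

The function builds new dictionaries at every level, so the input mapping is left unmodified.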

cdtools.tools.data.nested_dict_to_torch(d, device=None)

Converts all array-like objects in a nested dict to pytorch tensors

This will also send all the tensors to a specific device, if specified. There is no option to send all tensors to a specific dtype, as tensors are often a mixture of integer, floating point, and complex types. In the future, this may support a “precision” option to send all tensors to a specified precision.

Parameters:
  • d (dict) – A mapping whose keys are all strings and whose values are only numpy arrays, pytorch tensors, scalars, or other mappings meeting the same conditions

  • device (torch.device) – A valid device argument for torch.Tensor.to

Returns:

new_dict – A new dictionary with all array-like objects converted to torch tensors

Return type:

dict

cdtools.tools.data.nested_dict_to_h5(h5_file, d)

Saves a nested dictionary to an h5 file object

Parameters:
  • h5_file (h5py.File) – A file object, or path to a file, to write the dictionary to

  • d (dict) – A mapping whose keys are all strings and whose values are only numpy arrays, pytorch tensors, scalars, or other mappings meeting the same conditions

cdtools.tools.data.h5_to_nested_dict(h5_file)

Loads a nested dictionary from an h5 file object

Parameters:

h5_file (h5py.File) – A file object, or path to a file, to load from

Returns:

d – A dictionary whose keys are all strings and whose values are numpy arrays, scalars, or python strings. Will raise an error if the data cannot be loaded into this format

Return type:

dict