Data
Contains the base functions for loading data from and saving data to .cxi files.
These functions are used when constructing a new dataset class to pull specific desired information from a .cxi file. These functions should handle all the needed conversions between standard formats (for example, transposes of the basis arrays, shifting from object to probe motion, etc).
- cdtools.tools.data.get_entry_info(cxi_file)
Returns a dictionary with the basic metadata from the cxi file’s entry_1 group
String-type metadata is read out as strings, and datetime metadata is converted to python datetime objects if the string is properly formatted.
- Parameters:
cxi_file (h5py.File) – A file object to be read
- Returns:
entry_info – A dictionary with basic metadata defined in the cxi file
- Return type:
dict
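The datetime handling described above can be sketched roughly as follows. This is an illustration only: the helper name `parse_metadata_value` is hypothetical, and the assumption that timestamps are ISO 8601 strings parsed with `datetime.fromisoformat` may not match the formats cdtools itself accepts.

```python
from datetime import datetime


def parse_metadata_value(value):
    """Return a datetime for ISO 8601 strings, otherwise the value itself.

    Hypothetical helper, not part of cdtools.
    """
    # h5py often returns stored strings as bytes, so decode them first
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    try:
        return datetime.fromisoformat(value)
    except (TypeError, ValueError):
        # Not a properly formatted timestamp: keep the value as it is
        return value


start = parse_metadata_value(b"2020-01-02T03:04:05")
title = parse_metadata_value("Test experiment")
```

Here `start` becomes a datetime object, while `title` stays a plain string because it cannot be parsed as a timestamp.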
- cdtools.tools.data.get_sample_info(cxi_file)
Returns a dictionary with the basic metadata from the cxi file’s entry_1/sample_1 group
- Parameters:
cxi_file (h5py.File) – A file object to be read
- Returns:
sample_info – A dictionary with basic metadata from the sample defined in the cxi file
- Return type:
dict
- cdtools.tools.data.get_wavelength(cxi_file)
Returns the wavelength of the source defined in the cxi file object, in m
- Parameters:
cxi_file (h5py.File) – A file object to be read
- Returns:
wavelength – The wavelength of the source defined in the cxi file
- Return type:
np.float32
- cdtools.tools.data.get_detector_geometry(cxi_file)
Returns a standardized description of the detector geometry defined in the cxi file object
It makes reasonable assumptions based on the cxi file format definition. The standardized description of the geometry that it outputs includes the sample to detector distance, the corner location of the detector, and the basis vectors defining the detector. It can only handle detectors defined as rectangular grids of pixels.
The distance and corner_location values technically overdetermine the detector location, but for many experiments (particularly transmission experiments), the distance is needed and the exact corner location is not. If the corner location is not reported in the cxi file, no attempt will be made to calculate it.
- Parameters:
cxi_file (h5py.File) – A file object to be read
- Returns:
distance (np.float32) – The sample to detector distance, in m
basis_vectors (np.array) – The basis vectors for the detector
corner_location (np.array) – The real-space location of the (0,0) pixel in the detector
- cdtools.tools.data.get_mask(cxi_file)
Returns the detector mask defined in the cxi file object
This function converts from the format specified in the cxi file definition to a simple on/off mask, where a value of 1 defines a good pixel (on) and a value of 0 defines a bad pixel (off).
Any pixel with at least one bit set in the cxi mask is treated as a bad pixel. The one exception is a pixel marked exactly as 0x00001000, which the cxi definition uses to flag signal above the background; these pixels are treated as on pixels.
- Parameters:
cxi_file (h5py.File) – A file object to be read
- Returns:
mask – An array storing the mask from the cxi file
- Return type:
np.array
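The conversion rule for the mask can be written in a few lines of numpy. This is a minimal sketch of the rule as stated above, not the cdtools source; the function name `to_on_off_mask` and the uint8 output dtype are assumptions made for illustration.

```python
import numpy as np

# Flag value named in the text: pixel has signal above the background
ABOVE_BACKGROUND = 0x00001000


def to_on_off_mask(raw_mask):
    """Convert a cxi-style bit mask to 1 (good pixel) / 0 (bad pixel)."""
    raw_mask = np.asarray(raw_mask)
    # Good pixels have no bits set, or exactly the above-background flag
    good = (raw_mask == 0) | (raw_mask == ABOVE_BACKGROUND)
    return good.astype(np.uint8)


raw = np.array([[0, 1], [0x00001000, 0x00001001]], dtype=np.uint32)
mask = to_on_off_mask(raw)
```

Note that the last pixel, which has the above-background flag plus one other bit, is still treated as bad because it is not marked exactly as 0x00001000.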
- cdtools.tools.data.get_dark(cxi_file)
Returns an array with a dark image to use for initialization of a background model
This looks for a set of dark images at entry_1/instrument_1/detector_1/data_dark. If the darks exist, it will return the mean of the array along all axes but the last two. That is, if the dark image is a single image, it will return that image. If it is a stack of images, it will return the mean along the stack axis.
If the darks do not exist, it will return None.
- Parameters:
cxi_file (h5py.File) – A file object to be read
- Returns:
dark – An array storing the dark image, or None if no dark images are found
- Return type:
np.array
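Averaging over all axes but the last two can be sketched as below. The helper name `mean_dark` is hypothetical, and the conversion to float64 is an assumption of this sketch rather than documented behavior.

```python
import numpy as np


def mean_dark(darks):
    """Average a dark image, or any stack of dark images, down to 2D."""
    darks = np.asarray(darks, dtype=np.float64)
    # Collapse every leading axis into a single stack axis, then average it
    flattened = darks.reshape(-1, *darks.shape[-2:])
    return flattened.mean(axis=0)


single_dark = np.ones((4, 5))
dark_stack = np.arange(3 * 4 * 5).reshape(3, 4, 5)

from_single = mean_dark(single_dark)  # a single image comes back unchanged
from_stack = mean_dark(dark_stack)    # a stack is averaged along its first axis
```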
- cdtools.tools.data.get_data(cxi_file, cut_zeros=True)
Returns an array with the full stack of detector data defined in the cxi file object
This function checks every location where the data may legitimately be stored, so that it can find the data even if the creator of the .cxi file did not link the data to all the required locations.
It will return the data array in whatever shape it’s defined in.
It will also read out the axes attribute of the data into a list of strings.
- Parameters:
cxi_file (h5py.File) – A file object to be read
cut_zeros (bool) – Default True, whether to set all negative data to zero
- Returns:
data (np.array) – An array storing the data defined in the cxi file
axes (list(str)) – A list of the axes defined in the axes attribute, if any
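The two post-processing steps (zeroing negative values and reading the axes attribute into a list) might look like the sketch below. The helper `postprocess_data` is hypothetical, and the colon-separated axes string (for example "translation:y:x") is an assumption based on the usual cxi convention, not something stated on this page.

```python
import numpy as np


def postprocess_data(data, axes_attr=None, cut_zeros=True):
    """Optionally clip negative values and split an axes attribute string."""
    data = np.asarray(data)
    if cut_zeros:
        # Set every negative value to zero, keeping the array's dtype
        data = np.maximum(data, 0)
    axes = None
    if axes_attr is not None:
        # Attributes read through h5py may arrive as bytes
        if isinstance(axes_attr, bytes):
            axes_attr = axes_attr.decode("utf-8")
        axes = axes_attr.split(":")
    return data, axes


raw = np.array([[-1.5, 2.0], [0.0, -0.25]], dtype=np.float32)
data, axes = postprocess_data(raw, b"translation:y:x")
```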
- cdtools.tools.data.get_shot_to_shot_info(cxi_file, field_name)
Gets a specified dataset of shot-to-shot information from the cxi file
The data is assumed to be in the form of an array, with one dimension being the number of patterns being stored in the dataset. This is helpful for storing additional readback data on the shot-to-shot level that may be important but doesn’t have a clearly defined place to be stored in the .cxi file specification. Such data includes shot-to-shot probe intensity measurements, polarizer positions, etc.
It will look for this data in 3 places (in the following order):
entry_1/data_1/<field_name>
entry_1/sample_1/geometry_1/<field_name>
entry_1/instrument_1/detector_1/<field_name>
This function is also used internally to read out the translations associated with a ptychography experiment
- Parameters:
cxi_file (h5py.File) – A file object to be read
field_name (str) – The name of the field to be read from
- Returns:
data – An array storing the requested shot-to-shot data from the cxi file
- Return type:
np.array
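The lookup order listed above can be illustrated with a small search loop. In this sketch a plain dict keyed by path stands in for the h5py.File object (both support `in` and indexing by path); the function name `find_shot_to_shot` and the choice to raise KeyError when nothing is found are assumptions, not documented cdtools behavior.

```python
# Groups to search, in the order given in the text
SEARCH_GROUPS = (
    "entry_1/data_1",
    "entry_1/sample_1/geometry_1",
    "entry_1/instrument_1/detector_1",
)


def find_shot_to_shot(cxi_file, field_name):
    """Return the first dataset named field_name found in SEARCH_GROUPS."""
    for group in SEARCH_GROUPS:
        path = f"{group}/{field_name}"
        if path in cxi_file:
            return cxi_file[path]
    raise KeyError(f"{field_name} not found in any of {SEARCH_GROUPS}")


# A dict standing in for a file that lacks the entry_1/data_1 link
fake_file = {
    "entry_1/sample_1/geometry_1/intensities": [1.0, 0.9, 1.1],
    "entry_1/instrument_1/detector_1/intensities": [0.0, 0.0, 0.0],
}
found = find_shot_to_shot(fake_file, "intensities")
```

Because the geometry_1 location is searched before detector_1, `found` holds the first of the two lists.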
- cdtools.tools.data.get_ptycho_translations(cxi_file)
Gets an array of x,y,z translations, if such an array has been defined in the file
It negates the translations, because the CXI file format is designed to specify translations of the samples and the cdtools code specifies translations of the optics.
- Parameters:
cxi_file (h5py.File) – A file object to be read
- Returns:
translations – An array storing the translations defined in the cxi file
- Return type:
np.array
- cdtools.tools.data.create_cxi(filename)
Creates a new cxi file with a single entry group
- Parameters:
filename (str) – The path at which to create the file
- cdtools.tools.data.add_entry_info(cxi_file, metadata)
Adds a dictionary of entry metadata to the entry_1 group of a cxi file object
- Parameters:
cxi_file (h5py.File) – The file to add the info to
metadata (dict) – A dictionary containing all the metadata to be stored
- cdtools.tools.data.add_sample_info(cxi_file, metadata)
Adds a dictionary of sample metadata to the entry_1/sample_1 group of a cxi file object
This function will create the sample_1 group if it doesn’t already exist
- Parameters:
cxi_file (h5py.File) – The file to add the info to
metadata (dict) – A dictionary containing all the metadata to be stored
- cdtools.tools.data.add_source(cxi_file, wavelength)
Adds the entry_1/source_1 group to a cxi file object
It stores the energy and wavelength attributes in the source_1 group, given a wavelength to define them from.
- Parameters:
cxi_file (h5py.File) – The file to add the source to
wavelength (float) – The wavelength of light
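Deriving the energy from the wavelength uses the relation E = hc/λ. The sketch below assumes the wavelength is given in meters and computes the energy in joules; which units cdtools actually writes to the file is not stated here, and the names used are hypothetical.

```python
# Exact SI values of the physical constants
PLANCK = 6.62607015e-34          # Planck constant, J s
LIGHT_SPEED = 299792458.0        # speed of light, m / s
ELECTRON_VOLT = 1.602176634e-19  # one electron volt, J


def photon_energy(wavelength):
    """Photon energy in joules for a wavelength in meters."""
    return PLANCK * LIGHT_SPEED / wavelength


wavelength = 1.0e-10  # 1 angstrom
energy_joules = photon_energy(wavelength)
energy_ev = energy_joules / ELECTRON_VOLT  # roughly 12.4 keV
```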
- cdtools.tools.data.add_detector(cxi_file, distance, basis, corner=None)
Adds the entry_1/instrument_1/detector_1 group to a cxi file object
It will define all the relevant parameters (distance, pixel size, detector basis, and, if provided, corner position) based on the provided information.
- Parameters:
cxi_file (h5py.File) – The file to add the detector to
distance (float) – The sample to detector distance
basis (array) – The detector basis
corner (array) – Optional, the corner position of the detector
- cdtools.tools.data.add_mask(cxi_file, mask)
Adds the specified mask to the cxi file
It places the mask into the mask dataset under entry_1/instrument_1/detector_1. The internal mask is defined simply as a 1 for an “on” pixel and a 0 for an “off” pixel, and the saved mask is exactly the opposite. This is simpler than the most general mask allowed by the cxi file format but it captures the distinction between pixels to be used and pixels not to be used.
- Parameters:
cxi_file (h5py.File) – The file to add the mask to
mask (array) – The mask to save out to the file
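The inversion between the internal mask and the saved mask can be sketched as follows. The helper name `to_saved_mask` and the uint32 dtype are assumptions made for illustration.

```python
import numpy as np


def to_saved_mask(mask):
    """Invert an internal mask (1 = on, 0 = off) for saving to a cxi file."""
    mask = np.asarray(mask)
    # Off pixels get a nonzero flag in the file, on pixels are saved as 0
    return (mask == 0).astype(np.uint32)


internal = np.array([[1, 0], [1, 1]])
saved = to_saved_mask(internal)
```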
- cdtools.tools.data.add_dark(cxi_file, dark)
Adds the specified dark image to a cxi file
It places the dark image data into the data_dark dataset under entry_1/instrument_1/detector_1.
- Parameters:
cxi_file (h5py.File) – The file to add the dark image to
dark (array) – The dark image(s) to save out to the file
- cdtools.tools.data.add_data(cxi_file, data, axes=None, compression='gzip', chunks=True)
Adds the specified data to the cxi file
It will add the data unchanged to the file, placing it in two spots:
The entry_1/instrument_1/detector_1/data path
A softlink at entry_1/data_1/data
- Parameters:
cxi_file (h5py.File) – The file to add the data to
data (array) – The data to be saved
axes (list(str)) – Optional, a list of axis names to be saved in the axes attribute
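The layout described above (one real dataset plus a softlink) can be reproduced directly with h5py. This is a sketch of the resulting file structure, not the cdtools implementation; the in-memory file, the file name, and the colon-joined axes string are assumptions made for illustration.

```python
import h5py
import numpy as np

data = np.arange(3 * 8 * 8, dtype=np.float32).reshape(3, 8, 8)

# An in-memory file, so the sketch leaves nothing on disk
with h5py.File("sketch.cxi", "w", driver="core", backing_store=False) as f:
    # The real dataset lives under the detector group
    detector = f.create_group("entry_1/instrument_1/detector_1")
    dset = detector.create_dataset(
        "data", data=data, compression="gzip", chunks=True
    )
    dset.attrs["axes"] = "translation:y:x"

    # entry_1/data_1/data is only a soft link pointing at the real dataset
    f.create_group("entry_1/data_1")
    f["entry_1/data_1/data"] = h5py.SoftLink(
        "/entry_1/instrument_1/detector_1/data"
    )

    linked = f["entry_1/data_1/data"][()]
    link = f.get("entry_1/data_1/data", getlink=True)
    link_is_soft = isinstance(link, h5py.SoftLink)
```

Reading through either path returns the same array, since only one copy of the data is stored.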
- cdtools.tools.data.add_shot_to_shot_info(cxi_file, data, field_name)
Adds a specified dataset of shot-to-shot information to the cxi file
The data is assumed to be in the form of an array, with one dimension being the number of patterns being stored in the dataset. This is helpful for storing additional readback data on the shot-to-shot level that may be important but doesn’t have a clearly defined place to be stored in the .cxi file specification. Such data includes shot-to-shot probe intensity measurements, polarizer positions, etc.
This function is also used internally to store the translations associated with a ptychography experiment
It will store this data in 3 places:
The entry_1/sample_1/geometry_1/<field_name> path
A softlink at entry_1/data_1/<field_name>
A softlink at entry_1/instrument_1/detector_1/<field_name>
The geometry and detector paths may not always be relevant, but this ensures that the data is available in every place where an eventual reader might look for it (for example, when reading the translations).
- Parameters:
cxi_file (h5py.File) – The file to add the data to
data (array) – The data to be saved
field_name (str) – The field name to save the data under
- cdtools.tools.data.add_ptycho_translations(cxi_file, translations)
Adds the specified translations to the cxi file
It will add the translations to the file, negating them to conform to the standard in cxi files that the translations refer to the object’s translation.
It will store them in 3 places:
The entry_1/sample_1/geometry_1/translation path
A softlink at entry_1/data_1/translation
A softlink at entry_1/instrument_1/detector_1/translation
- Parameters:
cxi_file (h5py.File) – The file to add the translations to
translations (array) – The translations to be saved
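The sign convention can be illustrated with a short round trip: translations are negated when saved and negated again when loaded by get_ptycho_translations, so the values seen by cdtools code are unchanged. The helper name `flip_convention` is hypothetical.

```python
import numpy as np


def flip_convention(translations):
    """Convert between probe-motion and sample-motion translations."""
    return -np.asarray(translations)


# Probe positions (x, y, z) as used inside cdtools, in meters
probe_translations = np.array([[0.0, 0.0, 0.0], [1e-6, -2e-6, 0.0]])

stored = flip_convention(probe_translations)  # sample motion, as in the file
recovered = flip_convention(stored)           # probe motion again on loading
```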
- cdtools.tools.data.nested_dict_to_numpy(d)
Converts all array-like objects in a nested dict to numpy arrays
- Parameters:
d (dict) – A mapping whose keys are all strings and whose values are only numpy arrays, pytorch tensors, scalars, or other mappings meeting the same conditions
- Returns:
new_dict – A new dictionary with all array-like objects converted to numpy arrays
- Return type:
dict
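A recursive conversion of this kind can be sketched as below. To keep the example free of a pytorch dependency, tensors are detected by duck typing (anything with a `detach` method); this detection rule and the function name `to_numpy_sketch` are assumptions, not the cdtools implementation.

```python
from collections.abc import Mapping

import numpy as np


def to_numpy_sketch(d):
    """Return a copy of a nested dict with tensor-like values as numpy arrays."""
    out = {}
    for key, value in d.items():
        if isinstance(value, Mapping):
            # Recurse into nested mappings
            out[key] = to_numpy_sketch(value)
        elif hasattr(value, "detach"):
            # Duck-typed pytorch tensor: move to the CPU and convert
            out[key] = value.detach().cpu().numpy()
        else:
            # numpy arrays and scalars are kept as they are
            out[key] = value
    return out


results = {"probe": np.ones((2, 2)), "meta": {"iterations": 10}}
converted = to_numpy_sketch(results)
```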
- cdtools.tools.data.nested_dict_to_torch(d, device=None)
Converts all array-like objects in a nested dict to pytorch tensors
This will also send all the tensors to a specific device, if specified. There is no option to send all tensors to a specific dtype, as tensors are often a mixture of integer, floating point, and complex types. In the future, this may support a “precision” option to send all tensors to a specified precision.
- Parameters:
d (dict) – A mapping whose keys are all strings and whose values are only numpy arrays, pytorch tensors, scalars, or other mappings meeting the same conditions
device (torch.device) – A valid device argument for torch.Tensor.to
- Returns:
new_dict – A new dictionary with all array-like objects converted to torch tensors
- Return type:
dict
- cdtools.tools.data.nested_dict_to_h5(h5_file, d)
Saves a nested dictionary to an h5 file object
- Parameters:
h5_file (h5py.File) – A file object, or path to a file, to write the dictionary to
d (dict) – A mapping whose keys are all strings and whose values are only numpy arrays, pytorch tensors, scalars, or other mappings meeting the same conditions
- cdtools.tools.data.h5_to_nested_dict(h5_file)
Loads a nested dictionary from an h5 file object
- Parameters:
h5_file (h5py.File) – A file object, or path to a file, to load from
- Returns:
d – A dictionary whose keys are all strings and whose values are numpy arrays, scalars, or python strings. Will raise an error if the data cannot be loaded into this format
- Return type:
dict
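The mapping between a nested dictionary and an h5 file can be sketched with two short recursive functions: nested dicts become groups, and everything else becomes a dataset. The names `save_nested` and `load_nested` are hypothetical, and this sketch handles only numpy arrays, scalars, and strings (not pytorch tensors or file paths).

```python
import h5py
import numpy as np


def save_nested(group, d):
    """Write a nested dict into an h5py group, one subgroup per nested dict."""
    for key, value in d.items():
        if isinstance(value, dict):
            save_nested(group.create_group(key), value)
        else:
            # h5py builds a dataset from arrays, scalars, and strings
            group[key] = value


def load_nested(group):
    """Read an h5py group back into a nested dict."""
    out = {}
    for key, item in group.items():
        if isinstance(item, h5py.Group):
            out[key] = load_nested(item)
        else:
            value = item[()]
            # Strings can come back as bytes, depending on the h5py version
            if isinstance(value, bytes):
                value = value.decode("utf-8")
            out[key] = value
    return out


original = {"a": np.arange(3), "meta": {"name": "probe", "scale": 2.0}}

# An in-memory file, so the sketch leaves nothing on disk
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    save_nested(f, original)
    loaded = load_nested(f)
```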