Protein
- class itergp.datasets.uci.Protein(dir='data/uci/protein', overwrite=False)
Bases: UCIDataset
Protein dataset (45,730 × 9).
This data set contains physicochemical properties of protein tertiary structure, taken from CASP 5-9. There are 45,730 decoys, with sizes varying from 0 to 21 angstroms.
Source: https://archive.ics.uci.edu/ml/datasets/Physicochemical+Properties+of+Protein+Tertiary+Structure
Attributes Summary
input_shape Input shape of the data.
output_shape Output shape of the data.
test Test data.
train Training data.
Methods Summary
from_disk(dir)
resample(rng_state) Resample the training and test set from the entire data set.
save([dir, overwrite]) Save dataset to disk.
Attributes Documentation
- URL = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00265/'
- input_shape
Input shape of the data.
- output_shape
Output shape of the data.
- test
Test data.
- train
Training data.
Methods Documentation
- resample(rng_state)
Resample the training and test set from the entire data set.
Randomly selects new data points for the training and test sets, keeping each set the same size as in the original split.
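The resampling behaviour described above can be illustrated with a minimal NumPy sketch. This is not itergp's implementation: the helper `resample_split`, its `num_train` argument, and the toy arrays are hypothetical names introduced only for illustration. The sketch assumes the resampling amounts to a random permutation of the full data set followed by a fixed-size split, seeded from a `numpy.random.SeedSequence`.

```python
import numpy as np


def resample_split(X, y, num_train, rng_state):
    """Draw a fresh random train/test split with fixed sizes.

    Hypothetical helper for illustration only; not part of itergp.
    """
    # A SeedSequence is a valid seed for NumPy's default generator.
    rng = np.random.default_rng(rng_state)
    # Shuffle all indices once, then cut at the training-set size.
    perm = rng.permutation(X.shape[0])
    train_idx, test_idx = perm[:num_train], perm[num_train:]
    return (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])


# Toy stand-in for the full data set: 10 points with 9 features each.
X = np.arange(90, dtype=float).reshape(10, 9)
y = np.arange(10, dtype=float)

(X_train, y_train), (X_test, y_test) = resample_split(
    X, y, num_train=8, rng_state=np.random.SeedSequence(42)
)
```

In this sketch, passing the same `SeedSequence` entropy again reproduces the same split, while a different seed generally yields a different partition with the same set sizes.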
- Parameters
rng_state (SeedSequence) – Random number generator state.
- Return type