Welcome to DeepClean!
DeepClean is a multimodal data pre-processing and cleaning package written in Python.
Features
- A single package for multimodal data pre-processing and cleaning.
- Fast Experimentation
- Modularity in the data (processing) pipeline
- Implements techniques like:
  - Normalization
  - Tokenization
  - Augmentation
  - Denoising
  - and much more.
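>>> # Image modality: build an Augmentation object from an image path
>>> from deepclean.image import Augmentation
>>> aug = Augmentation('path/to/image')
>>> # Text modality: build a Tokenization object from a text path
>>> from deepclean.text import Tokenization
>>> tk = Tokenization('path/to/text')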
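
To make two of the listed techniques and the idea of a modular pipeline concrete, here is a minimal standard-library sketch. It does not use DeepClean and does not describe its API: the names `normalize`, `tokenize`, `drop_short`, and `pipeline` are hypothetical helpers chosen for illustration, and min-max scaling is only one of several common normalization schemes.

```python
import re
from functools import reduce


def normalize(values):
    """Min-max scale a list of numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def drop_short(tokens, min_len=3):
    """Discard tokens shorter than min_len characters."""
    return [t for t in tokens if len(t) >= min_len]


def pipeline(*steps):
    """Compose processing steps left to right into one callable."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


# Hypothetical text pipeline: each step is an independent, swappable function.
clean_text = pipeline(tokenize, drop_short)

scaled = normalize([0, 5, 10])
tokens = clean_text("A big cat")
```

Because each step is a plain function taking and returning data, steps can be reordered, removed, or replaced without touching the others, which is the kind of modularity the feature list refers to.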