18k hand-selected images
FYI, https://huggingface.co/datasets/opendiffusionai/cc12m-a_woman currently has 18k hand-selected real-world images of "A woman" from the CC12M dataset.
Since they were picked by hand, these images have no watermarks, no stupid focus points, and none of the other things that confuse AI training.
(I also threw out all the black-and-white ones.)
They also include two sets of LLaVA captions: a short version and a long version.
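If you want to choose between the two caption sets in a training script, something like the following might work. This is only a rough sketch: the column names `caption_llava_short` and `caption_llava` are assumptions, not confirmed here, and so is the `train` split name, so check the dataset card before relying on them. The helper names `pick_caption` and `iter_captions` are made up for illustration.

```python
from typing import Any, Iterator, Mapping

# ASSUMED column names; verify against the dataset card before use.
SHORT_KEY = "caption_llava_short"
LONG_KEY = "caption_llava"


def pick_caption(record: Mapping[str, Any], prefer_short: bool = True) -> str:
    """Return the preferred caption, falling back to the other one if it is empty."""
    order = (SHORT_KEY, LONG_KEY) if prefer_short else (LONG_KEY, SHORT_KEY)
    for key in order:
        text = (record.get(key) or "").strip()
        if text:
            return text
    return ""  # neither caption present


def iter_captions(prefer_short: bool = True) -> Iterator[str]:
    """Stream the dataset rows and yield one caption per row.

    Needs `pip install datasets`; the import is lazy so pick_caption
    can be used on plain dicts without that package installed.
    """
    from datasets import load_dataset

    # The "train" split name is an assumption (it is the usual default).
    rows = load_dataset(
        "opendiffusionai/cc12m-a_woman", split="train", streaming=True
    )
    for row in rows:
        yield pick_caption(row, prefer_short)
```

Calling `next(iter_captions())` would pull the first row's short caption without downloading the whole set, assuming the guessed column names match.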
I intend to keep adding to it for a while.
Here are some sample images from it: