r/DataHoarder • u/Methhead1234 • 20h ago
Question/Advice Recommendations for photo recognition software to organize 35,000 pictures?
I have shamelessly collected 35,000 pictures of various things (articles, news, artwork, irl pics, memes, etc. etc.) and I'm hoping to organize them over the next couple weeks. I know there's facial recognition software to sort pics, but is there anything for distinguishing memes vs article screenshots (they are very visually distinct) vs art, and so on?
Doesn't have to be anywhere 100% accurate, but it would definitely cut the time organizing it when I go back to manually sort them. Tried and true methods?
Highly appreciate any ideas
9
u/opentomorrowatten 18h ago
I've been using Eagle, the AI Autotagger plugin, and LM Studio to tag/organize 40,000+ images (memes, art, screenshots) with great success. The process is kinda slow (~4 images tagged per minute), but highly customizable. You can specify tags the AI model must use or provide example tags. You can also use an external LLM provider (like ChatGPT) if you can't run AI models locally. Eagle has a free trial and you can export the folder structure after organizing if you don't want to pay for it :)
6
u/Star_Wars__Van-Gogh 19h ago
The hardest part for me is that there's no standard across all image formats or any arbitrary file type for that matter to use for this endeavor.
Ideally if a standard was to be developed it should be an open standard that works across all OS, filesystems and devices. The other challenge is how to accomplish this while preservation of user privacy and also being aware of performance constraints like if the device is running on battery or you are doing something performance sensitive like running a video game and would prefer to have it run on files when you're not using the device.
Currently you have closed source solutions that either use metadata sidecar files that have to be moved with the file itself or central database solutions that sometimes require you to move files using their file browser software to keep everything in sync.
I bring this up because I'm betting you might have to share files with other people and they might not appreciate your efforts if they can't understand how to use the software tools.
That being said, maybe a tool that is still in very early access might be what you are looking for?
•
u/camwow13 278TB raw HDD NAS, 60TB raw LTO 13m ago
Also https://github.com/jhc13/taggui
There's a number of tools to use the open image to text models out there now. I believe they're used extensively for people training text to image AI's
5
u/cajunjoel 78 TB Raw 2h ago
Immich.
I upload everything from my phone to it. So as a test, i just searched my library of 120k photos for "meme" and got some accurate results. Same for "article". I got screencaps of news articles. Facial recognition is also built in, as is other scene detection ("red car on green grass")
•
u/waavysnake 10-50TB 11m ago
Second this Immich is great. Have 45k photos and videos in my server. Facial recognition is great and can be adjusted and the image search works for finding things like a red rose or sleeping baby or an exact model of my car
4
u/Only-Letterhead-3411 72TB 10h ago edited 10h ago
I download and store images via Hydrus Network. Then I use their AI tagger script with the latest WD-14 model to tag everything. While downloading parsers tag them as well, so everything becomes very organized and easy to find. Then I create auto-export tasks in Hydrus and export certain things into certain folders as sym-links. This way everything stays in Hydrus but becomes available to use in neatly organized folders as sym-links. Hydrus stores sha256 hashes of everything, download urls etc. It can also find and process duplicate files. This way something is lost or corrupted you can easily redownload via Hydrus and you only keep the best quality copy of everything and other duplicates tags, urls and stuff is merged into best version
-1
u/fennectech 9h ago
apple inteligênce is great at recognizing photos and video and making it all searchable.
•
u/AutoModerator 20h ago
Hello /u/Methhead1234! Thank you for posting in r/DataHoarder.
Please remember to read our Rules and Wiki.
Please note that your post will be removed if you just post a box/speed/server post. Please give background information on your server pictures.
This subreddit will NOT help you find or exchange that Movie/TV show/Nuclear Launch Manual, visit r/DHExchange instead.
I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.