Thursday, June 23 • 1:00pm - 1:50pm
Fast perceptual image hashing with Perl

Talk slides, images & code used: https://github.com/dkechag/ImagePHash-Talk

This talk will take you through a novel Perl implementation of perceptual hashes for “similar image” searches. I'll start with the basics: if you don't know what a Discrete Cosine Transform is, you'll find out with a visual demo - no math required (although math is always encouraged!). After DCTs and p-hashing basics, I will go through the implementation’s new features that make it efficient and easy to integrate with our large MySQL-based image database.

At SpareRoom, the world’s largest roommate-finding service, we handle tens of millions of images, mostly of customer rooms/properties. When tasked with adding an internal “similar image” search, I first looked at existing solutions and found that some did not work as expected, while others were either slow or required changes to our infrastructure (mainly Perl, with the data - including the image index - on a large MySQL database).
In the end, I went with a solution initially based on Image::Hash, but with p-hashes reimplemented to fix collision issues and add speed (over 10x faster than pHash.org’s equivalent DCT-based hash). Then, I added features such as support for mirroring, as well as “index/reduced” hashes, in order to effectively use MySQL indices (and thus easily fit in our existing setup).
After an intro to the theory behind the DCT and perceptual hashes in general, I will discuss the capabilities of this implementation and the experience of developing and using it. The module will be released to CPAN before my talk, although the XS part responsible for the fast DCT calculation has been available for a while as Math::DCT.
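
As a rough illustration of the general DCT-based perceptual hashing idea mentioned in this abstract, here is a minimal sketch in Python. It is not the Perl module's code and makes several assumptions that the abstract does not state: the input is an already-resized square grayscale matrix (e.g. 32x32), the hash keeps the 8x8 lowest-frequency coefficients, and each bit is set by comparing a coefficient to the median of the non-DC coefficients. The function names `dct_1d`, `dct_2d`, `phash` and `hamming` are hypothetical.

```python
import math

def dct_1d(v):
    """Unnormalised DCT-II of a sequence of numbers (naive O(n^2) version)."""
    n = len(v)
    return [
        sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]

def dct_2d(matrix):
    """Separable 2D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed [vertical_freq][horizontal_freq].
    return [list(row) for row in zip(*cols)]

def phash(pixels, hash_side=8):
    """Return a (hash_side * hash_side)-bit integer hash of a grayscale matrix."""
    coeffs = dct_2d(pixels)
    # Keep only the low-frequency corner, which captures coarse image structure.
    low = [coeffs[y][x] for y in range(hash_side) for x in range(hash_side)]
    # Median of the low-frequency block, skipping the DC term (overall brightness).
    ranked = sorted(low[1:])
    median = ranked[len(ranked) // 2]
    bits = 0
    for c in low:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

With hashes of this kind, two images are usually treated as similar when the Hamming distance between their hashes is small. The threshold is application-specific and has to be tuned on real data.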

Dimitrios Kechagias

Principal Developer, SpareRoom
I started using Perl almost 20 years ago, at the Stony Brook Algorithms lab (now known as the Data Science lab), for NLP and computational finance applications as a CS grad student. I worked on large scale Perl systems frequently after that, mostly in Natural Language / Linguistic...

Thursday June 23, 2022 1:00pm - 1:50pm CDT
Perl Track • 12426 Greenspoint Dr, Houston, TX 77060