In the last few weeks I’ve been working on adding various forms of computer vision & machine learning processing to Damselfly, to provide automatic face detection, face recognition, object detection, and image classification. Let’s just say it hasn’t been a particularly smooth ride – and there is huge scope for the .Net platform to support these popular features and technologies in a much better way.
Face detection and image recognition are the killer features for a photo-management app – apps like Photoprism and Librephotos have them, and many people across forums such as Reddit and DPReview are clamouring for something they can self-host that’ll rival (or at least attempt to emulate) the amazing object-recognition and search that comes for free with Google Photos.
It’s All Good on Windows, Cross-Platform Not So Much
I started off looking at the various options available – and I have to admit, I assumed it would be as simple as just searching for ‘Facial recognition’ in Nuget, adding a library, and then just configuring the workflow and UX. Sadly, despite many people calling computer vision a ‘solved problem’, that’s very far from the reality. Sure, there are quite a few libraries which will make a decent fist of facial recognition if you’re on Windows and can use DLib or similar. If you’re building something that’s cross-platform, and targeting multiple architectures (x64 and ARM) on Windows, Mac and Linux, it starts to get much harder. Given that all my development is on the Mac, and I don’t have a Windows host in my house, there has to be a better way!
The first port of call was the well-regarded FaceRecognition.Net (which uses DLibDotNet) – which seems like an excellent solution. Unfortunately, I couldn’t get it to work on my M1 MacBook Pro – I’m not sure if this is due to problems with Rosetta, or DLib.net, or something else, but even the samples wouldn’t work. There’s a tantalising comment suggesting that it’ll support ARM in future, so I’ll keep my eye on it as it progresses.
The next library I found was Accord.Net. This looked wonderful, and appeared to be the holy grail – an ML framework with Haar classifiers (which are a widely-used method of face and object detection in images) all implemented in native .Net – no unmanaged dependencies to worry about. First attempts at using it were promising – I’d got a sample up and running, compiling under .Net Framework 4.8 with AnyCPU, in a few minutes. However, when I tried to compile using .Net Core, it all started to unravel. It seems that Accord.Net was abandoned a couple of years ago, and is now languishing, unloved, and left behind by .Net Core 3.1, .Net 5 and .Net 6 (although it had been compiled with .Net Standard 2.0).
Not one to be deterred, though, as this seemed such a good opportunity, I forked the salient parts of the codebase needed for Haar cascades, upgraded the csproj files to SDK-style projects, and fixed the various compiler errors until I had a successful build on .Net 5.0. The first time I tentatively ran my sample on Linux, and it recognised faces, my wife wondered why I was cheering… It wasn’t all a bed of roses though; the face recognition wasn’t great (and didn’t seem improvable, given the Haar cascade was hardcoded in the codebase). It also crashes a lot in the unsafe sections of code (used for high-performance maths) if you run it over images larger than about 320×320. Unfortunately, this makes it next-to-useless – because if you have a high-res 3500×4500 pixel image taken on an SLR with anything other than a portrait of somebody’s face, once you reduce it down to 320×320 you lose so much resolution that half of the faces will be missed!
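To put rough numbers on that resolution loss, here is a back-of-the-envelope sketch in Python. The 250-pixel face width and the 24-pixel minimum detection window are illustrative assumptions, not measurements from Damselfly or from the Accord.Net code.

```python
def scaled_face_width(image_w, image_h, face_w, max_side=320):
    """Return the face width in pixels after the image is shrunk to fit max_side."""
    # Never upscale: images already within the limit keep their size.
    scale = min(1.0, max_side / max(image_w, image_h))
    return face_w * scale


# Assumed base window size of a typical frontal-face Haar cascade (hypothetical here).
MIN_WINDOW = 24

# Hypothetical group shot: a 3500x4500 photo in which one face is 250 px wide.
face_px = scaled_face_width(3500, 4500, 250)
print(f"face is {face_px:.1f} px wide after downscaling")
print("detectable" if face_px >= MIN_WINDOW else "too small for the detector window")
```

Under those assumptions the face ends up narrower than the detector's window, which would explain why faces in anything other than a close-up portrait get missed.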
Spurred on with optimism at the thought of a native .Net managed framework for computer vision, my next port of call was Microsoft’s ML.Net framework. Unfortunately, there are no out-of-the-box samples for facial recognition (and, having minimal knowledge of actual machine-learning techniques, I had neither the time nor the expertise to figure out how to build one). ML.Net does, however, have a nice simple example of object-recognition that uses the YOLO5 trained model to detect around 70 different objects in pictures. It was pretty easy to implement – and when I get time I’ll look into adding the YOLO9000 pre-trained model which claims to recognise 9,000 different object classes. Neat!
It would be great if Microsoft could provide a sample and/or implementation of a facial detection model with ML.Net; I suspect it’s something that many people would use if it were available. Even Microsoft’s solution was a challenge from a cross-platform perspective; once I had the sample up and running, I had to spend a few days debugging and troubleshooting before I figured out that their sample doesn’t actually work on Linux due to incompatibilities with the unmanaged TensorFlow libraries (they’re fixing it now…). It never fails to surprise me how many of the .Net Core samples out there are Windows-focused – even ones by Microsoft…
The last library I found was EmguCV. It’s a rather well-implemented, cross-platform library that has all sorts of computer vision goodness, and (after a bit of tweaking) will deploy across all platforms. I initially discounted it, as the Mac runtime libraries are only available under a paid commercial licence (not feasible with Damselfly as it’s free OSS). However, the good folks there were kind enough to furnish me with a copy of the libraries so I could develop and debug the use of the library on my Mac. After a few deployment hiccups I got the library working, and it’s excellent – fast, reliable, deals with any size image, and it’s trivial to add multiple Haar classifiers – meaning I can scan a given photo several times using multiple trained classifiers, and then combine the results to give pretty accurate recognition. It’s now the primary library that I’m using in Damselfly’s face-detection processing.
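One way to combine the output of several classifiers is to treat rectangles that overlap heavily as the same face. The Python sketch below illustrates that general idea only; it is not Damselfly’s actual implementation and does not use the EmguCV API. Rectangles are plain (x, y, width, height) tuples, the 0.4 overlap threshold is an arbitrary assumption, and the sample detections are made up.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    overlap_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = overlap_w * overlap_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def combine_detections(results_per_classifier, threshold=0.4):
    """Merge rectangles from several classifiers, keeping one box per face."""
    merged = []
    for rects in results_per_classifier:
        for rect in rects:
            # Skip a rectangle if an earlier classifier already found that face.
            if not any(iou(rect, kept) >= threshold for kept in merged):
                merged.append(rect)
    return merged


# Made-up results from two hypothetical classifiers run over the same photo.
frontal = [(100, 100, 80, 80), (400, 120, 60, 60)]
profile = [(105, 98, 78, 82), (700, 300, 50, 50)]

faces = combine_detections([frontal, profile])
print(len(faces), "distinct faces:", faces)
```

Here the first rectangle from each list covers the same face, so only one of the pair survives, while the face that only the second classifier spotted is still kept.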
Azure Face Services
Of course, there’s one notable exception remaining – and that’s Azure Cognitive Services from Microsoft. The Azure Face service is a rather neat service that provides what appears to be Google-quality computer vision and image processing, via a web API. The results are really excellent, and more importantly it provides facial recognition (as opposed to just facial detection). This means that users of Damselfly can sign up for a free Azure Face account which will give them 30,000 free API calls a month (processing one photograph for face detection and recognition typically takes around 3 API calls), giving top-class facial recognition at zero cost, for around 8,000-10,000 images a month (transaction-limited to 20/min).
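The arithmetic behind those figures is simple enough to check in a few lines of Python. The numbers are the ones quoted above; the function names are just for illustration, and the real call count per image will vary.

```python
def monthly_image_budget(free_calls=30_000, calls_per_image=3):
    """How many photos fit in the free monthly allowance."""
    return free_calls // calls_per_image


def minimum_hours_for_quota(free_calls=30_000, calls_per_minute=20):
    """Fastest possible time to use the whole allowance at the rate limit."""
    return free_calls / calls_per_minute / 60


print(monthly_image_budget())                    # 3 calls per image
print(monthly_image_budget(calls_per_image=4))   # a slightly more expensive image
print(minimum_hours_for_quota())                 # hours of continuous processing
```

At 3 calls per image the allowance covers 10,000 photos, and even running flat out at the rate limit it would take just over a day to use it all up.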
I’ve added some optimisations into Damselfly so that it’s possible to use EmguCV and ML.Net to automatically find faces and people using local processing, and then when an image is found to contain a face or a person, only then will it be submitted to Azure for full facial recognition. This seems a good compromise for most people’s photo libraries. As a test, I spun up Damselfly with “use Azure for photos containing faces” enabled, and processed 11,000 smartphone pictures that I’ve taken over the last few years. It successfully scanned them all for faces and built a library of 68 people, all using fewer than 3,000 transactions.
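The shape of that optimisation can be sketched in Python as a cheap local check that gates the expensive remote call. This is only an outline of the idea, not Damselfly’s code: `detect_locally` and `recognise_remotely` are hypothetical stand-ins for the local detector and the Azure call, and the photo names are invented.

```python
def process_library(photos, detect_locally, recognise_remotely):
    """Run the local detector on every photo; call the remote API only on hits."""
    people = {}
    remote_calls = 0
    for photo in photos:
        if not detect_locally(photo):
            continue  # nothing found locally, so no transaction is spent
        remote_calls += 1
        for name in recognise_remotely(photo):
            people.setdefault(name, []).append(photo)
    return people, remote_calls


# Stub data standing in for real detectors (hypothetical).
known_faces = {"beach.jpg": ["Alice"], "party.jpg": ["Alice", "Bob"]}
photos = ["beach.jpg", "landscape.jpg", "party.jpg"]

people, calls = process_library(
    photos,
    detect_locally=lambda p: p in known_faces,
    recognise_remotely=lambda p: known_faces[p],
)
print(calls, "remote calls for", len(photos), "photos")
print(people)
```

Photos with no local hit never reach the remote service, which is why the transaction count can end up far lower than the number of photos scanned.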
More Please Microsoft
To summarise, it feels like there’s good and bad in face detection in .Net. If your app is running on Windows, you have lots of options; for true cross-platform apps, things aren’t too bad, but there’s definitely room for improvement. At one stage on this journey I looked at how simple it is to do full facial recognition from Python – and wondered about shelling out to a child Python process from Damselfly to handle the facial recognition – but that felt like a cop-out. I really hope that Microsoft can push forward with ML.Net and provide face-detection and recognition in an easily accessible way to the masses who use .Net; it’s a killer feature that can transform applications and the way people use them.