Hello GSoC and Hello Planet!

Hello Everyone!

This is my first post on Planet KDE – so I’ll start with a brief intro of myself:

I’m Aditya Bhatt (adityab on IRC), and I’m an 18-year-old, second-year (almost third-year) undergrad student, living in Ahmedabad, India. I’ve been selected to work on digiKam in GSoC 2010. I’ll be enabling digiKam to automatically tag people in photographs by facial recognition. This feature request has been around for a looong time (5 years?). Bugzilla reveals that this is the #1 voted bug in digiKam, and #26 overall for KDE. It’s time this gets done 🙂 For anyone who’s interested, my GSoC proposal is here.

I’ve been using Linux and FOSS in general since about 1999. I think I used a rudimentary form of KDE back then 🙂 I’ve been a KDE user for many years now, and was beginning to feel the urge to give back to the community. I love taking photos whenever I’m with my family, so digiKam was a natural choice to start with.

GSoC is a wonderful opportunity for students as it acts as a kick-start for many to start contributing to open-source. I couldn’t apply for GSoC last year because I wasn’t 18 back then, and my knowledge was also pretty limited. Over the year, I started playing with the awesome Qt API, and also ventured into the KDE API, so that I could start committing real stuff.

Being a second-year student, I’m self-taught in Image Processing and Pattern Recognition. I love those fields because of my liking for Math and AI.

About two months ago (I think), I started hacking on libface, a cross-platform library for face detection and recognition. libface was started by Alex Jironkin (my mentor). It is one of the very few (2-3?) open-source libs for facial recognition. Alex, Marcel, Gilles, and the rest of the digiKam team helped me improve my proposal.

As a result, I got selected in GSoC. Since my Indian state of Gujarat is a dry state, there was no beer and we had to make do with a nice Pepsi party 🙂 A really awesome thing worth noting is that 3 (yes, three) people from my (small) university got selected in GSoC this year, and all of them for KDE! The other two: Nikhil Marathe (nsm), who’s in my batch, has already been working with the KWin team to implement tiling, and he’ll be integrating UPnP support into KDE; Sai Dinesh (saidinesh5) is a year senior to me in the university, and will be working on mobile phone stuff for KDE – something that involves SyncML and Akonadi.

It was awesome to know that KDE got 50 slots this year, with 11 Indians amongst them! I got to know while celebrating on the channel #kde-in.

Integrating libface into digiKam would mean that digiKam would automatically know who’s in which photo and tag their faces, and that makes searching for stuff a lot easier, especially once Nepomuk interfaces well with digiKam. Imagine searching in Dolphin for the name of your best friend – the search results would show her photos. Or you can search for photos of your mom and dad together 🙂 A nice possibility is that Akonadi integration could enable linking your addressbook contacts with photographs – automatically-generated avatar pics for people in your addressbook. You might not need to use Facebook’s tagging mechanism if you could upload auto-tagged photos via digiKam (I don’t know for sure if the KIPI plugin for exporting to Facebook also exports tags, though).

After adding one more algorithm (Fisherfaces) to libface, I’ll be making a tagging widget for pictures. There’s already something in the Nepomuk playground, and I think I’ll be using that as a starting point. After that, I’ll link libface, digiKam, and the widget together. Some more work must also be done on the tagging style used by digiKam. As of now, digiKam can only tag entire photos with keywords, not regions. Storing region tags as metadata still needs to be worked out, without breaking backwards-compatibility.

Lots of hacking to do, yay!

The “miss”-ed call

Here’s a transcript of the conversation I had with Pavas on Facebook last night.

All of it is true, and I’m mortally afraid.

But it’s pretty hilarious too ( if you look at it from a different angle than mine ).

The talk has been slightly edited ( unchattified ), for the sake of posterity.

Me: Oye. There?

Pavas: Ya. Bol. [Yeah. Tell me.]

Me: I’m in some sort of a weird situation here. Someone gave me a miscall half an hour ago. Thinking it’d be one of my innumerable Gandhinagar-residing relatives, I called back.

Pavas: ?

Me: And no one picked up the phone. Then, about 10 minutes ago, some old lady called me from that number. And asked who I was. Without answering, I asked her who she was.

This repeated 5 times before I finally said who I was. (Namely, my name.)

Then she said that I had called her before.

I told her that I had received a miscall, blah blah. Then she hung up.

Me: Then she calls again, just 5 minutes before now.

Pavas: <laughs>

Me: And asks again who I am.

Pavas: <Now this guy is interested> And? hahaha!

Me: And I repeat everything I said in the call before. And then she said that someone had called her from my number, saying that his name was Akash, and he wanted to talk to Neha.

Then I asked her where she’s from.

She said, “her house”. <Determined not to reveal info. Nor am I>

Pavas: And please tell me that Neha is her hot 20 yr old daughter… <Look at this guy. Testosterone runs in his every statement>

Me: Then I rephrase the question, saying that I have many relatives in Gandhinagar. So I wanna know if she’s one of them. That would justify someone calling me.

But she says she’s from Baroda.


Then she says she’ll tell the head of the family to call me in a while to talk. I say, “I have no idea who you people are, I know no Akash or Neha.” I say that “It’s night, I’m gonna sleep, so don’t disturb me.”

Pavas: lol man… this 3rd yr chick across is staring at me coz i cant stop laughing…. hahah <see what I meant about the testosterone?>

Me: But she says NO, he’ll be calling anyway. So I’m waiting now.

Pavas: Hey I wanna speak to this guy…

Me: lol

Pavas: You in the RC?

Me: Story to be continued, I’m sure. And no, I’m in my room.


Pavas: Oi please please please conference the call next time it comes…

Or at least record it…

Me: Ok. I’ll record.

:fear: Will they send goondas to kill me?!!!

Pavas: Aane de goondo ko… [Let the goons come…] we’ll have fun

Me: :fear: <I think this fat guy is nuts>

And this conversation is going on my blog. Awesome story to compete with Denny’s stuff, yay!

Pavas: Ohh yeah..

Me: I’m waiting for them to call

Pavas: Tu kar na… [You do it…] <Methinks this fat guy has totally lost it>

Pavas: From parth’s cell, and now actually ask for Neha. <raving mad>

Pavas: Or actually get daddu to do it… It’ll be awesome! <I almost suffered an aneurysm here>

Pavas: Gimme the number… main miss call marta hoon [I’ll give them a missed call]

Me: :fear: NO!!!

Pavas: haha

Me: Pavas-gang vs Gujju-joint-family gang :fear:

Pavas: LOL now the whole RC is staring at me

Me: gr8, now you have sumthin to fear. I was studying till now, when the joint-family-beti-protection squad chose to attack me.

Pavas: Ab dekh [Now look], the head of family will be a 120 yr old great-granddad who’ll give you a 9 hr lecture on what this generation has become: “humaare zamaane mein log college padhne jaate the” [“in our day, people went to college to study”]…

Me: lol

Pavas: :daddu voice:

Me: :haha: editing and uploading to blog…

Pavas: Try Adobe Contributor… dunno if u have it for linux…but for windows its bloody awesome… gives an interface for blog uploads… also pushes in a plugin to word to push a doc directly onto your blog! <Note the quick change in topic, the sign of a deviant mind>

Me: Fuck adobe. I’m mortally afraid here!

Pavas: Oh, don’t you worry, you’re in Pavas gang – we always win at gang wars :D + we got our very own Mango Dolly with us who can glue the opposition with melting all over them in this heat… <NOTE: Mango Dolly is the nickname for a certain TA whom I refuse to name here>

Me: Suggest a title for this post.

Pavas: “Gang wars”. naa…

“The Neha Effect”

Me: Nope. Different.

Pavas: The ‘miss’-ed call. That’s perfect! <Yep>

I’m still waiting for that call…

libface Status

When I started working on libface, the face detection was not really good – we used one Haar cascade, and tuning certain parameters caused the detection to either ignore lots of genuine faces or give tons of false positives.

Therefore, the main focus of my work till now was to get detection to work better. So I made a class Haarcascades, which is essentially a “set” of cascades, to which you can add a cascade, remove one, set a cascade’s weight, and so on.

This tremendously simplifies things when you want to perform the actual detection with multiple cascades.
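
To give a feel for the idea, here is a rough sketch of such a "set of cascades". It is not the actual libface class: the names `CascadeEntry` and `CascadeSet` and all of the methods are made up for illustration, and each cascade is reduced to a name and a weight instead of real OpenCV cascade data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one loaded Haar cascade. A real implementation
// would hold the cascade data itself; here it is just a label and a weight.
struct CascadeEntry {
    std::string name;
    int weight;
};

// Hypothetical "set of cascades": add, remove, and re-weight cascades by name.
class CascadeSet {
public:
    void add(const std::string& name, int weight = 1) {
        entries_.push_back(CascadeEntry{name, weight});
    }

    // Returns true if a cascade with this name was found and removed.
    bool remove(const std::string& name) {
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            if (entries_[i].name == name) {
                entries_.erase(entries_.begin() + i);
                return true;
            }
        }
        return false;
    }

    // Returns true if a cascade with this name was found and updated.
    bool setWeight(const std::string& name, int weight) {
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            if (entries_[i].name == name) {
                entries_[i].weight = weight;
                return true;
            }
        }
        return false;
    }

    // Returns the weight of the named cascade, or 0 if it is not in the set.
    int weight(const std::string& name) const {
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            if (entries_[i].name == name) {
                return entries_[i].weight;
            }
        }
        return 0;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::vector<CascadeEntry> entries_;
};
```

With something like this, the detection code can simply loop over the set and run every cascade in turn, instead of juggling each cascade separately.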

Now, since I use multiple cascades for detection, most of the genuine faces are detected more than once. And since I’ve allowed detection to be very permissive ( tuning some parameters ), I actually INVITE false detections to occur, but also allow multiple detections of true faces. Thus, what I get is this :

An example of using multiple cascades

See? It sucks. Big time. When you want to pass all these detections to a program/library that will actually USE these detections, it will mean passing some faces over and over again, and also passing that random square near the bottom.

So what I tried was a simple clustering algorithm that reduces a cluster of many squares whose center points are very close (distance less than a tunable threshold) to one square. The number of squares that were “duplicates” of the reduced square is then called the “genuineness” of that face: if there were a lot of duplicate detections for a face, then there’s a high chance that it is genuine. This is evident from the above picture. Note that the unwanted square near the bottom has no duplicate near it. Therefore, in the final detection output, I do not allow faces with fewer than 2 duplicates, and the result is awesome:

The detection results after using the clustering algorithm

This is the code for the clustering:

// Returns a vector of the final, true faces, with no duplicates
vector<Face> FaceDetect::finalFaces(vector< vector<Face> > combo, int maxdist, int mindups)
{
    vector<Face> finalResult;
    vector<int> genuineness;

    // Make one long vector of all faces
    unsigned int i, j;
    for (i = 0; i < combo.size(); ++i)
        for (j = 0; j < combo[i].size(); ++j)
            finalResult.push_back(combo[i][j]);

    /* Now, starting from the left, take a face and compare with rest. If distance is less than a threshold, consider them to
       be "overlapping" face frames and delete the "duplicate" from the vector.
       Remember that only faces to the RIGHT of the reference face will be deleted. */

    int ctr = 0;
    for (i = 0; i < finalResult.size(); )
    {
        ++ctr;	// One more reference face examined
        int duplicates = 0;
        for (j = i + 1; j < finalResult.size(); )	// Compare with the faces to the right
        {
            if (distance(finalResult[i], finalResult[j]) < maxdist)
            {
                finalResult.erase(finalResult.begin() + j);	// The next face slides into index j
                ++duplicates;
            }
            else
                ++j;
        }
        genuineness.push_back(duplicates);

        if (duplicates < mindups)	// Too few duplicates, probably not genuine, kick it out
        {
            genuineness.erase(genuineness.begin() + i);
            finalResult.erase(finalResult.begin() + i);	// The next face slides into index i
        }
        else
            ++i;
        /* Note that the index of the reference element will be the same as the index of its number of duplicates
           in the genuineness vector, so win-win! */
    }
    printf("Faces parsed : %d, number of final faces : %d\n", ctr, (int)genuineness.size());
    return finalResult;
}
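
The clustering code leans on a `distance` helper that isn't shown here. A minimal sketch of what such a helper could look like, assuming each face is an axis-aligned box given by two corner points, follows. The `Box` struct and the `centerDistance` name are stand-ins I made up for this example, not libface's real `Face` class or its real helper.

```cpp
#include <cmath>

// Hypothetical stand-in for a detected face: top-left and bottom-right corners.
struct Box {
    int x1, y1;
    int x2, y2;
};

// Euclidean distance between the center points of two boxes.
double centerDistance(const Box& a, const Box& b) {
    const double ax = (a.x1 + a.x2) / 2.0;
    const double ay = (a.y1 + a.y2) / 2.0;
    const double bx = (b.x1 + b.x2) / 2.0;
    const double by = (b.y1 + b.y2) / 2.0;
    const double dx = ax - bx;
    const double dy = ay - by;
    return std::sqrt(dx * dx + dy * dy);
}
```

Two squares count as the same face when this value is below `maxdist`, so `maxdist` is effectively measured in pixels between centers.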

Now there is a single detection instance per face. So I guess, apart from possible future optimizations, the face detection problem is solved. Next I’ll have to concentrate on polishing libface’s Eigenfaces implementation for better and faster facial recognition. I’m also working on a Qt widget that can be used to “tag” (you know, draw squares on) regions in images. So if you know Qt and can help me with this, you’re welcome to help!

Thanks to my roommate Parth for his photos, which I use for testing libface.

After finishing the Eigenfaces polishing, I intend to get familiar with the digiKam codebase and plugins so that my GSoC work can be made easier. That is, IF I get selected in GSoC 2010!