follow ham looks at the accounts following you that you don't follow back, estimates how spammy each of them is, and then recommends which ones you should follow back and which ones you should block.
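As a rough illustration of that workflow, here is a minimal Python sketch. It assumes you already have the set of your followers' IDs, the set of IDs you follow, and some classify function that labels an account "spam" or "ham". The names recommend and classify and the two bucket names are hypothetical; this is not follow ham's actual code.

```python
from typing import Callable, Dict, Set

def recommend(
    follower_ids: Set[int],
    following_ids: Set[int],
    classify: Callable[[int], str],
) -> Dict[str, Set[int]]:
    """Split followers you don't follow back into follow-back and block sets.

    classify is a hypothetical stand-in for the spam/ham model and must
    return either "spam" or "ham" for a given user ID.
    """
    # Accounts that follow you but that you don't follow back.
    not_followed_back = follower_ids - following_ids
    recs: Dict[str, Set[int]] = {"follow_back": set(), "block": set()}
    for user_id in not_followed_back:
        if classify(user_id) == "spam":
            recs["block"].add(user_id)
        else:
            recs["follow_back"].add(user_id)
    return recs

# Toy example: users 1-4 follow you, you follow user 2, and user 3 is spam.
recs = recommend({1, 2, 3, 4}, {2}, lambda uid: "spam" if uid == 3 else "ham")
print(sorted(recs["follow_back"]), sorted(recs["block"]))  # [1, 4] [3]
```

Accounts you already follow are never scored, so the classifier only runs on the set difference.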
To decide whether an account is "spam" or "ham", I use classical machine learning methods. I built a classifier using decision tree induction, similar to Quinlan's C4.5 algorithm, on a set of features including a user's follower count, following count, account age, tweets per day and other factors. For training data, I collected spam reports sent to @spam over several weeks to identify a few thousand known spammers, and hand-picked another two thousand non-spam accounts from my own and others' following lists.
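For readers curious about the decision tree part: C4.5 grows a tree by repeatedly choosing the feature and threshold that best separate the classes, scored by gain ratio (information gain divided by the entropy of the split itself). The Python sketch below shows only a simplified version of that split-selection step for numeric features, using midpoints between consecutive sorted values as candidate thresholds. It omits tree growth, pruning and missing-value handling, and the feature names and toy data are made up for illustration; they are not follow ham's real features or training set.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values: Sequence[float], labels: Sequence[str], threshold: float) -> float:
    """Gain ratio of the binary split value <= threshold vs value > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty tells us nothing
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(labels) - remainder
    # Split info: entropy of the partition sizes, penalising lopsided splits.
    split_info = -sum((k / n) * math.log2(k / n) for k in (len(left), len(right)))
    return gain / split_info

def best_split(
    rows: List[Dict[str, float]], labels: List[str], features: List[str]
) -> Optional[Tuple[str, float, float]]:
    """Return (feature, threshold, gain_ratio) for the best candidate split."""
    best: Optional[Tuple[str, float, float]] = None
    for feature in features:
        values = [row[feature] for row in rows]
        distinct = sorted(set(values))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(distinct, distinct[1:]):
            threshold = (lo + hi) / 2
            score = gain_ratio(values, labels, threshold)
            if best is None or score > best[2]:
                best = (feature, threshold, score)
    return best

# Hypothetical toy data: two spam-like accounts and two ham-like accounts.
rows = [
    {"followers": 5, "following": 2000, "tweets_per_day": 50},
    {"followers": 10, "following": 1500, "tweets_per_day": 80},
    {"followers": 300, "following": 250, "tweets_per_day": 3},
    {"followers": 800, "following": 400, "tweets_per_day": 5},
]
labels = ["spam", "spam", "ham", "ham"]
print(best_split(rows, labels, ["followers", "following", "tweets_per_day"]))
# ('followers', 155.0, 1.0)
```

Several features separate this toy set perfectly; ties keep the first feature examined, which is why follower count wins here. A full C4.5-style learner would apply this step recursively to each side of the chosen split until the leaves are pure or too small, then prune.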
The result is a system that can identify low-quality accounts very quickly, helping you decide which of your followers to follow back without checking each account yourself. The more people use it, the more profiles it will see, which will help improve the algorithm in the future.
If you're interested in other work of mine, including more free services and code you might enjoy, visit my personal site.