In a dense engineering post, Twitter explains how it uses “crowdsourced” human evaluators to make sense of ephemeral hashtags and other search terms. And who benefits? Why, Twitter’s advertisers, of course.

Twitter has made an old idea new again, unveiling a system that lets actual human beings tell its machines how to make sense of trending hashtags and other topical searches.
But don’t get too excited about this apparent triumph of man over machine. First, the actual work done by these people seems likely to be menial and poorly compensated, even if it does accomplish something that Twitter’s mighty information systems appear unable to manage on their own.
Second, and more important, you shouldn’t expect to see Twitter’s service improve in any way you might actually notice — unless, that is, you happen to be a Twitter advertiser. That’s because the primary aim of the system appears to be improving Twitter’s ability to serve up relevant ads against briefly popular hashtags whose meaning would be completely opaque to computers, though readily grasped by real people.
On the other hand, this could fill in an important part of Twitter’s business model. While it’s difficult to tell from the outside, Twitter apparently believes that there’s big money to be made from serving up the right ads against sudden waves of public interest in various memes. Since you could argue that Twitter really isn’t much more than a steady progression of such waves gently lapping against the beach of human consciousness, it’s entirely possible the company is right.
Twitter revealed what it called its “real-time human computation” system in a dense and confusing blog post written by Twitter data scientist Edwin Chen and Alpa Jain, a senior software engineer in the company’s “Revenue @ Twitter” group. Chen and Jain start out reasonably enough, laying out the difficulty of interpreting the meaning of searches that suddenly spike in popularity, only to fade away just as quickly. Citing some notable examples from the recent presidential debates, they write:
1. The queries people perform have probably never before been seen, so it’s impossible to know without very specific context what they mean. How would you know that #bindersfullofwomen refers to politics, and not office accessories, or that people searching for “horses and bayonets” are interested in the Presidential debates?
2. Since these spikes in search queries are so short-lived, there’s only a small window of opportunity to learn what they mean.
Of course, this presents no problem for the actual human users of Twitter, who can generally follow the Zeitgeist quickly enough to figure out what’s going on — even if they have to Google the hashtag or search term to grasp its meaning. (I’ve had to do that myself on any number of occasions.)
But it does create an issue for automated interpretation systems, which rely heavily on context and historical usage to ascertain exactly what Twitter users are talking about. Neither is much help in deciphering a meme that pops up on Twitter and then fades away almost instantly, as the sketch below suggests. Of course, the only reason automated interpretation systems are involved at all here is that they’re what Twitter relies on to serve up “relevant” ads — promoted tweets, promoted trends and what have you — against these brief but often quite powerful search surges.
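To make the detection half of the problem concrete, here is a minimal sketch in Java of one common approach: compare a query’s count in the current short window against its long-run per-window average. Everything here (the class name, the threshold, the windowing scheme) is invented for illustration; Twitter’s post doesn’t publish this part of its pipeline.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy spike detector: flags a query whose frequency in the current
 * short window greatly exceeds its long-run average. All names and
 * thresholds are illustrative, not Twitter's.
 */
public class QuerySpikeDetector {
    private final Map<String, Long> historicalCounts = new HashMap<>();
    private final Map<String, Long> windowCounts = new HashMap<>();
    private long totalWindows = 1;
    private final double spikeRatio;

    public QuerySpikeDetector(double spikeRatio) {
        this.spikeRatio = spikeRatio;
    }

    /** Record one occurrence of a query in the current window. */
    public void observe(String query) {
        windowCounts.merge(query, 1L, Long::sum);
    }

    /** A query spikes if its current count dwarfs its per-window average. */
    public boolean isSpiking(String query) {
        long current = windowCounts.getOrDefault(query, 0L);
        long history = historicalCounts.getOrDefault(query, 0L);
        double perWindowAverage = Math.max(1.0, (double) history / totalWindows);
        return current >= spikeRatio * perWindowAverage;
    }

    /** Close the current window: fold its counts into history and reset. */
    public void rollWindow() {
        windowCounts.forEach((q, c) -> historicalCounts.merge(q, c, Long::sum));
        windowCounts.clear();
        totalWindows++;
    }
}
```

Notice that a never-before-seen query like #bindersfullofwomen has no history at all, so even modest volume trips the threshold — and that is exactly the moment when, by Chen and Jain’s own account, a human is worth more than a model.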
In other words, Twitter didn’t have a functionality problem here — it had a revenue problem. And that’s what Chen and Jain have stepped in to solve with their merry band of crowdsourced workers.
Of course, the data scientists can’t come right out and say that. Instead, they treat us to a discourse on how Twitter’s data systems work — one replete with topologies, bolts, spouts, tuple streams and Kafka queues. A representative sentence:
The Storm topology attaches a spout to this Kafka queue…
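For readers who don’t speak fluent Storm: a topology is a directed graph of “spouts” (stream sources) and “bolts” (processing steps) through which tuples flow. The sketch below shows the general shape of the wiring Chen and Jain describe — a Kafka spout feeding search queries into a processing bolt — using the Storm and storm-kafka Java APIs of that era. The bolt body, topic name, and addresses are all hypothetical; this illustrates the pattern, not Twitter’s implementation.

```java
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class SearchQueryTopology {

    /** Hypothetical bolt: forward spiking queries for human annotation. */
    public static class SpikeDetectorBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String query = tuple.getString(0);
            // In a real system, a spike test (like the earlier sketch)
            // would gate this emit; here every query passes through.
            collector.emit(new Values(query));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("query"));
        }
    }

    public static void main(String[] args) {
        // Spout: read raw search queries off a Kafka queue.
        // Topic name and ZooKeeper address are placeholders.
        SpoutConfig spoutConfig = new SpoutConfig(
                new ZkHosts("zookeeper.example.com:2181"),
                "search-queries", "/kafka-spout", "query-reader");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("queries", new KafkaSpout(spoutConfig), 2);
        builder.setBolt("spike-detector", new SpikeDetectorBolt(), 4)
               .shuffleGrouping("queries");

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("human-eval", new Config(), builder.createTopology());
    }
}
```

Storm handles the plumbing — parallelism, retries, the flow of tuples between stages — so the substantive work lives in the bolts. In Twitter’s case, the interesting downstream step is the one the jargon obscures: handing a spiking query off to an actual person to say what it means.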