The internet is full of opinions. If you’re like me, you can’t help but wade into the comment thread on a controversial blog post. It’s a guilty pleasure. Wouldn’t it be nice if a computer could detect the most inane opinions and protect you from seeing them at all?
I developed a simple heuristic to decide whether something is inane. If a block of text contains ‘I’, ‘me’, or ‘my’, we can say with acceptable confidence that the block is an opinion. Redacting those blocks purges the page of anecdotes, and therefore of anecdotal evidence, one of the most inane and useless arguments someone can make.
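Here is a minimal sketch of that check in TypeScript. It is not the extension's actual code: the function name `looksLikeOpinion` is made up for illustration, and matching whole words case-insensitively is my assumption about what "contains" should mean, so that "some" or "memory" don't trip the filter.

```typescript
// Sketch of the I/me/my heuristic. Matching whole words only, and
// ignoring case, are assumptions; the post only says "contains".
const FIRST_PERSON = /\b(?:I|me|my)\b/i;

// Hypothetical helper: true if the block of text would be treated as an opinion.
function looksLikeOpinion(blockText: string): boolean {
  return FIRST_PERSON.test(blockText);
}
```

With this version, "I can prove it" and "It worked on my machine" are flagged, while "Some memory is required" is left alone.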
And if you’re wondering what excellent conversation Mr. Gunneh and EdgeX are having, you can hover over the redacted bars to see through them:
Mission friggin’ accomplished.
The heuristic is a little overzealous. For example, this entire blog post, written as a first-person story, gets decimated. That’s why the extension comes with a button in the address bar that you can click to toggle it on and off. It’s also disabled by default on Facebook, where the whole point is to share I/me/my stories.
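The on/off decision could be sketched roughly like this. Everything here is hypothetical: the names `DISABLED_BY_DEFAULT` and `isRedactionEnabled`, the idea that the button's state is remembered per host, and the way subdomains are handled are all assumptions for illustration.

```typescript
// Hosts where redaction starts out switched off. The post names only
// Facebook; keeping this as a list is an assumption.
const DISABLED_BY_DEFAULT: string[] = ["facebook.com"];

// userToggles is a hypothetical record of button clicks, keyed by host:
// true means the user switched redaction on, false means off.
function isRedactionEnabled(
  hostname: string,
  userToggles: { [host: string]: boolean } = {}
): boolean {
  const host = hostname.toLowerCase();
  // An explicit click on the button always wins over the default.
  if (Object.prototype.hasOwnProperty.call(userToggles, host)) {
    return userToggles[host];
  }
  for (const blocked of DISABLED_BY_DEFAULT) {
    // Match the bare domain and any subdomain such as www.facebook.com.
    const suffix = "." + blocked;
    if (host === blocked || host.slice(-suffix.length) === suffix) {
      return false;
    }
  }
  return true;
}
```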
I’m always on the lookout for ways to improve the heuristic. I’ve made it a lot better recently by ignoring DOM elements that are unlikely to contain user-generated content, such as forms, scripts, and links.
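To give a rough idea of what skipping those elements looks like, here is a sketch that walks a simplified tree and collects only the text worth checking. `SimpleNode` is a made-up stand-in for a real DOM node so the example runs outside a browser, and the tag list (`form`, `script`, `a`) is just my reading of "forms, scripts, and links", not the extension's actual list.

```typescript
// Hypothetical, simplified stand-in for a DOM element.
interface SimpleNode {
  tag: string;              // element name, e.g. "div"
  text?: string;            // text directly inside this element
  children?: SimpleNode[];  // nested elements
}

// Assumed ignore list: forms, scripts, and links.
const IGNORED_TAGS: string[] = ["form", "script", "a"];

// Collect the text blocks that the opinion check should look at,
// skipping ignored elements together with everything inside them.
function collectCandidateText(node: SimpleNode, out: string[] = []): string[] {
  if (IGNORED_TAGS.indexOf(node.tag.toLowerCase()) !== -1) {
    return out;
  }
  if (node.text !== undefined && node.text.trim() !== "") {
    out.push(node.text.trim());
  }
  for (const child of node.children ?? []) {
    collectCandidateText(child, out);
  }
  return out;
}
```

Skipping the whole subtree, rather than just the element itself, means a label inside a form or text inside a link never reaches the I/me/my check.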
There’s going to be an upper limit to what I can do with DOM parsing, however. There’s no generic way to tell the difference between an informational statement like “My house is three blocks from the liquor store” and an inane opinion like “I can prove that Obama isn’t an American.” To fix that, I’d have to enter the infinitely deep rabbit hole of grammatical analysis and set up a backend for machine learning.
The overzealous nature of the heuristic hasn’t been without benefit. A few of the blog sites I frequent ended up getting large parts of their article content redacted. I wasn’t expecting that! The implications are profound. It turns out I’m reading lots of sites that I consider news sites but that are really opinion rags. What’s worse, the articles that get their content redacted are the same ones that have the most pageviews and the most comments.
I’ve stumbled onto a seedy motivator in the world of online journalism. I have a solution to that too, which you can read in my blog post about creating accountability for opinions on the internet.