/blog: information, data, and scholarship

Portrait of the artist as a phrenology illustration

The first assignment in my infoVis class was to make a visual introduction to ourselves.  I drew a self-portrait in profile, then added my categorized interests in the style of a 19th-century phrenology illustration (compare with actual period illustrations here and here).

Phrenology is interesting stuff.  Though phrenologists had nearly everything wrong, modern neuroimaging has demonstrated that one important part of their core idea was right: many psychological functions really are highly localized in the brain.  And they made a lot of really cool infographics.  Actually, maybe this is true of pseudoscience in general; palmistry and astrology also make silly data into some neat-looking infovis.  This same exercise would be fun with made-up star charts and palm diagrams.

$35 homemade whiteboard coffee table

Whiteboards are great infovis tools, but they’re expensive and need space.  Solution: the whiteboard coffee table.  It’s the very poor man’s Microsoft Surface (with no BSOD!).  Also, if your taste in home decor tends toward the spartan (as does mine), this makes a great dinner table; it’s durable and really easy to clean.  Most importantly, it’s cheap and you only need a drill and a few hours to make it.  Here’s how:


Materials:

  • Some 1×2 boards (you can get pre-sanded ones for about $2 apiece)
  • A panel of “tile board,” which you can get from Home Depot or whatever for about 10 bucks.
  • some 3″ drywall screws
  • some 1 1/2″ drywall screws
  • wood glue

Tools:

  • Drill with a screwdriver bit
  • handsaw (may need it, may not; see below)
  • tablesaw or circular saw to cut the tileboard (may need it, may not; see below)

[Diagram: coffee table construction plan]

Construction:

  1. Decide on the dimensions you want, and figure how many 1×2’s you need (see the diagram above for the general plan).  You may need to be flexible here, depending on the size of tile board panel you’re able to procure.
  2. Get the materials.  If you ask nice, a lot of times the store will cut the tile board for you, or they may have a 2′ x 4′  piece available.  You can probably get them to cut the 1 x 2’s for you, as well.
  3. Once you get the materials home, cut anything that still needs cuttin’.
  4. Fasten everything together with the appropriate-sized drywall screws (The diagram shows where they go).  I added glue, but you don’t really need it.  Once the frame is done, glue the top on. Done!

Use Zotero in a separate window


As I’ve written before, I love the free citation manager Zotero.   And the group and sharing features that just dropped as part of v2.0b7, while still a little buggy, are taking the awesomeness up another level.

But one thing about Zotero has always really annoyed me: the horizontally-split screen.  I never feel like I have enough vertical context for either my Zotero library or the web page I’m viewing.   Meanwhile, I’ve got a whole ‘nother monitor just sitting there empty. Some other folks have complained about this too, suggesting a sidebar view for Zotero.

Today, though, I realized that there’s a really obvious solution: just open up a new Firefox window (ctrl+n), put it on my other monitor, and display Zotero full-screen there.  Dual-monitor workflow bliss.

Obfuscate no more: why your email address should go au naturel

I was recently redesigning my homepage, and I wanted to include my email address.  I knew that only n00b looz3rz display their addy in plain sight for spambots to harvest, so I applied a little light obfuscation,  like they do on php.net and a million other sites: “myname at jasonpriem dot com.”

“Take that, spammer scum!” I thought as I finished, basking in my newfound invulnerability to the v1@gr@-hawking vermin.  After all, if lots of people use address munging, it must work, right?

Right?

Darn it, now I’ve got to start reading about it.  So I did.  And after a few hours of reading blogs and writing code, I am now an Expert With Advice (hey, this is the internet).  And the advice is this:

Stop trying to obfuscate your email address.  Stop now.

I’ve got two reasons (and for a few more, some other folks have blogged about this, too).  First, the more theoretical one:

Spam is a problem for you–obfuscation makes it a problem for your users.

After all, they’re the ones who are going to have to do all the de-munging.  Are they always going to notice that they have to remove “.invalid” from the end?  Do they all know that the English “at” means “@”?   Do they have time to edit text in their address lines?   Address munging is fundamentally inelegant, because it intentionally works against clarity.

People have been making this argument for a very long time. It’s particularly relevant nowadays, though, because of the growing promise of the semantic web.  We want data to be machine readable, because then we can do cool stuff with it.  FOAF and the hCard microformat are pretty pointless if they don’t have real email addresses to work with.  “Hide the data from the machines” is a good strategy for fighting Skynet, but not for the future of the web.  Ok, reason two:

Address munging just doesn’t work.

It can’t.  It’s putting glasses on Superman.  Although in theory a valid email can be pretty hard to identify, in practice, email addresses use a very limited vocabulary–and computers are good at identifying limited vocabularies.  Don’t forget, everyone has been using the same old [at] and “dot” tricks for decades–this is security through obscurity at its very worst.

But don’t take my word for it.  I took a couple hours and worked up a demo email obfuscation decoder that breaks the vast majority of text-based obfuscations; it’s also got an input field for you to test out your own munges (some other people have built similar demos, too).  It’s not perfect, but it correctly decodes most obfuscations–and remember that this is a novice programmer, working for an afternoon.  It’s that easy. Supporters of obfuscation argue that spammers will go after the low-hanging fruit; folks, text-based obfuscation is the low-hanging fruit.
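To give a sense of how little code this kind of decoding takes, here is a minimal sketch in Python.  It is not the demo’s actual code (which isn’t shown here); the patterns and the function name demunge are made up for illustration, and it only handles the “at”/“dot” family of munges plus a trailing “.invalid.”

```python
import re

def _word(word):
    # Matches the word on its own or wrapped in brackets, e.g. " at ", "[at]", " (dot) ",
    # along with any surrounding whitespace. Hypothetical pattern, for illustration only.
    return re.compile(r"\s*[\[({<]?\s*\b" + word + r"\b\s*[\])}>]?\s*", re.IGNORECASE)

AT = _word("at")
DOT = _word("dot")
INVALID = re.compile(r"\.invalid\b", re.IGNORECASE)
# Deliberately simple address pattern; real addresses can be more exotic than this.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def demunge(text):
    """Return the addresses found in text after undoing common text munges."""
    text = AT.sub("@", text)      # "name at host"  -> "name@host"
    text = DOT.sub(".", text)     # "host dot com"  -> "host.com"
    text = INVALID.sub("", text)  # drop a ".invalid" suffix
    return EMAIL.findall(text)

if __name__ == "__main__":
    print(demunge("myname at jasonpriem dot com"))
    print(demunge("Contact: jane (at) example DOT org today"))
```

The substitutions are blunt (every stand-alone “at” in the text becomes “@”), but that doesn’t matter to a harvester: anything that doesn’t end up looking like an address is simply thrown away by the final pattern match.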

Now, the Alert Reader has by this time noticed that I’ve limited my critique to text-based munging.  “What about more sophisticated methods,” the Alert Reader now asks?  “What about using an image, or CSS, or Javascript to hide addresses?”  Good questions, Alert Reader; you are very alert.  Alright, let’s take a quick look at these, too:

Images

There’s not really much I can say about this one, save this: making content completely opaque to visually-impaired users simply shouldn’t be an option. And of course, spammers still can OCR your images.

CSS

Obviously, something like foo@bar<span style="display:none">NULL</span>.com is silly; the spambot can filter out “display:none” spans pretty easily, or even just discard everything in a span.  <span class='a'>foo</span><span class='b'>bar</span>@<span class='c'>foo</span><span class='d'>bar</span>.com at least requires the bot to open your stylesheet to see which spans are hidden.  But remember, your server will happily dish out your easily-parsed CSS to anyone who asks for it; this is not a good place to hide secrets.
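As a rough illustration of the first case, here is a sketch in Python that drops anything inside an element with an inline “display:none” style, using only the standard library’s html.parser.  The class and function names are made up for this example, and it makes no attempt at the stylesheet-based variant (that would mean fetching and parsing the CSS as well).

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collects text, skipping anything nested inside an inline display:none element."""

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0  # > 0 while we are inside a hidden element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        # Once hidden, every nested tag deepens the count so end tags pair up.
        # Limitation of this sketch: an unclosed void tag such as <br> inside
        # a hidden element would throw the count off.
        if self.hidden_depth or "display:none" in style:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.parts.append(data)

def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)

if __name__ == "__main__":
    print(visible_text('foo@bar<span style="display:none">NULL</span>.com'))
```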

Javascript

There are too many js methods to cover in any detail here.  Some are better than others; a few try to degrade gracefully for users without Javascript support.  All of them, though, share the same weakness as CSS: everyone can read your Javascript.  And you certainly don’t need a browser to run it; there are lots of JS interpreters that are more than happy to run on a spammer’s server.

Sure, you can get pretty clever with this technique (I particularly like the idea of decoding not on the onload event, but on a click event), but you can’t change the fact that ultimately the bad guys can do everything with your code that a browser does–and eventually, they will.

Now, I’ll admit that images, CSS, and Javascript approaches are more effective than text-based ones.  All of them (when done properly) require the spammer to pay for more bandwidth and/or processor cycles.  But they all also inconvenience some or all of your users, and none of them are compatible with the semantic web.  They all give you a false sense of security, and they’re ugly, hackish solutions. True, some obfuscations have performed well empirically–but keep in mind that these (pretty informal) experiments are years old.  As more people have adopted these measures, be sure that more spammers are spending the time to counter them, as well.

Now, I can’t go so far as to condemn anyone who obfuscates an address; I get that spam is a pain, and filters aren’t perfect.  Sometimes an ugly, hackish solution is the only way.  But I’m suggesting that you think twice before you give in to the spammers and obfuscate, especially given the relative ineffectiveness of many commonly-used methods.  The Web reaches its full promise when information is made easier to find, not harder.

Prezi: presentation junk 2.0

It’s 2009.  I think everyone out there knows that Powerpoint is, at best, overused (at worst: Stalin).  Particularly gruesome is the animated slide-transition “feature,” which I think most agree has the same communication effectiveness and subtle charm as “<blink>” tags, mouse-cursor trails, and hilarious animated gifs of cats.

So how is it that presentation tool Prezi is suddenly the toast of the town?  The quick sell looks like this:

“Prezi allows anyone who can sketch an idea on a napkin to create and perform stunning non-linear presentations with relations, zooming into details, and adjusting to the time left without the need to skip slides.”

I love how the first phrase suggests that there’s this great mass of napkin-sketching geniuses out there who can’t get their ideas out (until now!).  I mean, I like mind maps, but turning one into an outline is pretty easy.   So the presentations are “non-linear.”  Does that mean the audience can interact with them, zooming in on sub-points of interest?  If it does, let me show you this thing called “hyperlinks.”   And is skipping slides really this tremendous problem?

When it comes down to it, the real selling point of Prezi is just the “stunning” presentation.  Now, perhaps I’m jaded, but “zoom-in/zoom-out” leaves me unstunned.  More importantly, though, this seems a textbook example of chartjunk: a “really great” visual effect that serves only to obscure or distract from real information.  I think (hope) it’ll have the lasting appeal of Powerpoint’s racecar-noise-with-flying-in-bullet-point.

Perhaps I’m missing something (feel free to correct me in the comments) or just being curmudgeonly, but I think Prezi is vastly overhyped.  Powerpoint is bad enough.  Also: I like how the Prezi logo, by mixing case, suggests that the product may in fact be called “Pretzl.”  Ok, now that’s definitely being curmudgeonly.

Quick book review: Dreaming in Code

I imagine Scott Rosenberg reckoned he’d picked a winner when he started Dreaming in Code, his 2007 book chronicling the development of the Chandler personal information manager. The project seemed to have everything going for it. It had all the fashionable features: GTD! Open Source! Peer-to-peer! Level the silos! It was headed by software legend Mitch Kapor. It had infinite funding. It had talented programmers with impeccable resumes—decades upon decades of successful experience creating good software.

Over the course of Dreaming, though, we see this elite team gradually self-destruct. We see vague specs. We see unrealistic deadlines. We see huge mid-stream course changes.  As Rosenberg writes, “By now, I know, any software developer reading this volume has likely thrown it across the room in despair, thinking, ‘Stop the madness! They’re making every mistake in the book!’”  Dreaming finally ends four years into Chandler’s development, with version 1.0 still a distant vision (it was finally released, mostly to yawns, last August).

Rosenberg, though, is savvy enough to turn the Chandler team’s failure into his own success.  Not only does he use the story to anchor an excellent (if basic) introduction to the practices and quirks of the industry as a whole, he weaves an engrossing and deeply human narrative.

Aristotle said tragedy should evoke fear and pity in the viewer, and Rosenberg deftly supplies us with both. On the one hand, Dreaming reads like watching a horror movie: “No! Why are you splitting up to explore the house!? Why do you keep changing the UI every 6 months!? Noooo!!!!” At the same time, Rosenberg does a pretty good job of making us really like many of the characters. Kapor, in particular, comes off as both an intelligent visionary and a genuinely good guy. Watching Chandler implode, I feel bad for him.

In interviews, Rosenberg shows again and again how the characters, all experienced programmers, understand the Classic Mistakes. Then he describes with agonizing clarity how they turn right around and proceed to make just those mistakes. I think it’s this quality that put me so in mind of classical tragedy, where the noble hero is undone by just these sorts of tragic flaws or mistakes.

Rosenberg resists the temptation to write another Lessons From Software Failure manual.  Instead he shows how smart, capable programmers working in an ideal environment can reenact the same fatal mistakes programmers were cataloging decades ago. Like Greek drama, Dreaming confronts the ineluctability of failure head-on.  Rosenberg’s ultimate thesis is nothing more or less than the classic words of Donald Knuth, with which he opens the book: Software is hard. Sophocles would be proud.

Other reviews I liked:

  • Amazon
  • Joel Spolsky: discusses the technical aspects more; doesn’t think Chandler was a very good idea to begin with.  Has some good points, here.
  • Adam Barr: discusses the individual parts of the book more.

FeedVis 2.0: custom visualization for your feeds


My FeedVis project–the interactive tagcloud for a group of feeds–has been out for a week now, and I’ve been thrilled at the positive response I’ve gotten so far.  One rather glaring problem with the program, though, was that you could only look at the top 50 edublogs.

Not anymore.  After a few late nights, I’ve got a beta system for uploading and analyzing your own sets of feeds.  You just upload your OPML file, wait a few minutes, and you’re set: FeedVis gives you a custom page that you can bookmark and return to anytime you like; it’ll continue to update every time you visit.  You can also browse visualizations of other people’s feeds.
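For anyone curious what an OPML upload boils down to, here is a minimal sketch in Python of pulling the feed addresses out of a subscription list.  This is not FeedVis’s own code; the function name feed_urls is invented for the example.  It relies only on the standard OPML convention that each subscribed feed is an outline element carrying an xmlUrl attribute.

```python
import xml.etree.ElementTree as ET

def feed_urls(opml_text):
    """Return the xmlUrl of every feed outline in an OPML document, in document order."""
    root = ET.fromstring(opml_text)
    # Folders are also <outline> elements, but they carry no xmlUrl, so they are skipped.
    return [node.attrib["xmlUrl"] for node in root.iter("outline") if "xmlUrl" in node.attrib]

if __name__ == "__main__":
    sample = """
    <opml version="1.0">
      <body>
        <outline text="Edublogs">
          <outline text="Blog A" type="rss" xmlUrl="http://example.com/a/feed"/>
          <outline text="Blog B" type="rss" xmlUrl="http://example.com/b/feed"/>
        </outline>
      </body>
    </opml>
    """
    print(feed_urls(sample))
```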

It’s pretty untested, and I’m sure use will uncover some bugs.  But it’s got potential; I’m excited to see what people think.

FeedVis: a deeper tagcloud for edublogs


Tagclouds have value, but, as I’ve written before, they’ve a number of shortfalls as well.  I’ve just finished my attempt to remedy some of these problems: FeedVis.  It’s an animated tagcloud that lets you compare word frequencies across different time periods and authors, then check out the posts that used the words.  The demo is using the feeds for Scott McLeod’s Technorati-compiled list of top 50 edublogs, since that’s what got me started about feeds and tagclouds in the first place (although the program will work with any set of feeds).  More details about how it works are on the demo page.
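The core comparison is simple to sketch.  The Python below is an illustration of the general idea only, not how FeedVis is actually built (the real thing is PHP and Javascript); the function names and the tokenizing rule are assumptions.  It counts words in two samples of posts and reports which words changed most between them.

```python
import re
from collections import Counter

def word_counts(posts):
    """Count lowercase words across a list of post texts (crude tokenizer, no stopword list)."""
    counts = Counter()
    for text in posts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def compare(earlier, later, top=5):
    """Return (word, change) pairs, biggest absolute change first, ties broken alphabetically."""
    before, after = word_counts(earlier), word_counts(later)
    # A Counter returns 0 for words it has not seen, so new and vanished words just work.
    changes = {word: after[word] - before[word] for word in set(before) | set(after)}
    return sorted(changes.items(), key=lambda item: (-abs(item[1]), item[0]))[:top]

if __name__ == "__main__":
    last_month = ["wiki wiki blog"]
    this_month = ["blog blog twitter"]
    print(compare(last_month, this_month))
```

Raw counts are used here for brevity; with samples of very different sizes, dividing by the total word count of each sample would make a fairer comparison.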

I think what I’m really most excited about is the way this uses animation to let you actually see the words changing from one sample to the next.    Motion is such an important part of the way we see the world, and it’s been underemployed in information visualization, I think (although this is changing; Hans Rosling’s TED talks have gotten a lot of buzz, for instance).

The project has been really fun, and a great learning experience; it’s gotten me really pumped about infoVis for learning about online interaction.  I think there is a lot of potential there for ed tech research.  I’m also pretty excited about programming; I started learning in February (with PHP), and then started Javascript a couple months ago.  It’s been a really mind-expanding experience, and I’m looking forward to my next project, probably once I get done with grad school apps.

PrezDebatr 2.0! Beta!

Google is transforming the way we watch a political debate.  This Google Blog post demonstrates how viewers of the VP debate earlier this month made Google searches like “clean coal” and “define:maverick” spike as candidates spoke.  Without question, these viewers are experiencing something much richer than what would have been possible fifteen years ago.

But why stop there?  Why not a service that analyzes this kind of real-time, viewer-supplied data, selects the most interesting bits, and then displays it?  It would function both as a real-time fact-checker and a window into the audience’s reactions.

Lots of people already live-blog these things; it would be easy to get several thousand people to submit their questions and search results to a server, using a standardized interface.  The software then just aggregates, organizes, and presents the results.  Volunteers who try to game the system would be shut out with Digg-style, community-driven user ratings.  If Google would make its real-time query data available, that’d be added, too, significantly broadening the sample’s relevance.
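The aggregation step could be as small as the following Python sketch.  Everything in it is hypothetical: the (timestamp, user rating, query) submission format, the rating threshold standing in for the community ratings, and the function name are all assumptions made for illustration, since no such service exists yet.

```python
from collections import Counter, defaultdict

def top_queries(submissions, min_rating=0, window=60, top=3):
    """Group queries into time windows and return the most common ones per window.

    submissions: iterable of (timestamp_seconds, user_rating, query) tuples (assumed format).
    Submissions from users rated below min_rating are ignored.
    """
    buckets = defaultdict(Counter)
    for timestamp, rating, query in submissions:
        if rating < min_rating:
            continue  # crude stand-in for community-driven filtering
        start = int(timestamp // window) * window  # start of the window this falls in
        buckets[start][query.strip().lower()] += 1
    return {start: counts.most_common(top) for start, counts in sorted(buckets.items())}

if __name__ == "__main__":
    sample = [
        (5, 3, "clean coal"),
        (30, 2, "Clean Coal"),
        (42, -1, "buy cheap pills"),
        (70, 1, "define:maverick"),
    ]
    print(top_queries(sample))
```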


Grad school: because your uncle at Lehman Bros. is not such a great connection now.

A nice bit of infoVis from the web comic Piled Higher and Deeper.  Kind of not the best news for someone who’s applying to doctoral programs this fall…um, can my app go in a special pile for people who’ve been planning this for years, regardless of what the economy would’ve done?