Blog by Sumana Harihareswara, Changeset founder

16 Nov 2013, 21:58

A Little Design Thinking Can Go A Long Way

Hi, reader. I wrote this in 2013 and it's now more than five years old. So it may be very out of date; the world, and I, have changed a lot since I wrote it! I'm keeping this up for historical archive purposes, but the me of today may 100% disagree with what I said then. I rarely edit posts after publishing them, but if I do, I usually leave a note in italics to mark the edit and the reason. If this post is particularly offensive or breaches someone's privacy, please contact me.

I was experimenting with stdin and argv because Leonard suggested I make Missing from Wikipedia more Unixy and more interoperable with other scripts and systems, present and future. Right now it requires you to pass the name of an existing plaintext file as a positional argument. Why shouldn't you be able to generate a giant string of newline-separated names and just pipe it into the script, as you would into sort, grep, and similar tools?
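
For illustration, here is one common Python pattern for a script that reads from a named file when one is given and from stdin otherwise. It is a sketch, not the actual Missing from Wikipedia code: the argument name `namefile` and the helpers `read_names`, `parse_args`, and `main` are hypothetical, while `nargs="?"` and `argparse.FileType` are standard-library features.

```python
from __future__ import annotations

import argparse
import sys
from typing import Iterator, TextIO


def read_names(stream: TextIO) -> Iterator[str]:
    """Yield one name per non-blank line, with surrounding whitespace stripped."""
    for line in stream:
        name = line.strip()
        if name:
            yield name


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Check a newline-separated list of names."
    )
    # nargs="?" makes the file optional; if it is omitted, argparse uses the
    # default (sys.stdin). FileType also maps the filename "-" to stdin.
    parser.add_argument(
        "namefile",
        nargs="?",
        type=argparse.FileType("r", encoding="utf-8"),
        default=sys.stdin,
        help="plaintext file with one name per line (default: stdin)",
    )
    return parser.parse_args(argv)


def main(argv: list[str] | None = None) -> int:
    args = parse_args(argv)
    try:
        for name in read_names(args.namefile):
            print(name)  # placeholder: the real tool would check each name
    finally:
        # Close files we opened, but leave the process's stdin alone.
        if args.namefile is not sys.stdin:
            args.namefile.close()
    return 0
```

Called from the usual `if __name__ == "__main__": sys.exit(main())` guard, this shape would let both `python missing.py names.txt` and `sort names.txt | python missing.py` feed the same loop (the script name here is hypothetical).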

I struggled with this whole stdin business, trying to make the tool work with both types of data input, and became disheartened. Then I stepped back to think about what I actually want to do. Aha: I am facing a design decision. I could make different choices that would suit different audiences.

For context: I took a rhetoric class in 1998 and learned the classic Rhetorical Triangle governing any communication. I then misremembered it for more than a decade till I looked it up just now. But I like my version better. So! Sumana's Rhetorical Triangle, as applicable to a piece of political software as it is to an essay, says that if you are trying to communicate with someone, it helps to consider:

  1. Audience
  2. Medium
  3. Message

My message: some topics have way less coverage on the Wikipedias than they deserve. I feel fine sticking with that. But who are my audiences, and thus which medium should I choose?

If I want terminal-savvy researchers and developers to use this tool, then it's fine as a standalone command-line script. I should stick a setup.py in there and put it up on PyPI, and switch to an all-stdin model of data input.
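
If the tool went the PyPI route, a minimal setup.py might look roughly like this. The distribution name, the module name, and the `main` function are hypothetical placeholders; `setup()` and the `console_scripts` entry point are standard setuptools features, and the entry point installs a small launcher command that imports the module and calls the named function.

```python
# setup.py: a minimal packaging sketch; every name below is hypothetical.
from setuptools import setup

setup(
    name="missing-from-wikipedia",  # hypothetical distribution name on PyPI
    version="0.1.0",
    description="List names that have no article on a given Wikipedia",
    py_modules=["missing_from_wikipedia"],  # hypothetical single-module layout
    entry_points={
        # Installs a `missing-from-wikipedia` command that calls
        # missing_from_wikipedia.main(), a hypothetical function.
        "console_scripts": [
            "missing-from-wikipedia = missing_from_wikipedia:main",
        ],
    },
)
```

After `pip install`, that command would be available in the environment, ready to sit at the end of a pipe.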

If I want activists and less programming-savvy researchers to use it -- people unlike me -- then the path gets foggier. I haven't tested this script on a Mac or on Windows; I could work to make sure it's friendly on those OSes and stay with the simple "gimme a textfile" data workflow. (Why make my users learn to use pipes and cat?)
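
Staying with the text-file workflow mostly means being forgiving about how files get saved on different systems. A sketch, assuming names arrive as UTF-8 text: Python's default universal-newlines mode already turns Windows `\r\n` line endings into `\n`, and the `utf-8-sig` codec drops the byte-order mark that some Windows editors prepend. The function name `load_names` is hypothetical.

```python
from typing import List


def load_names(path: str) -> List[str]:
    """Read one name per line from a UTF-8 text file, tolerating Windows quirks."""
    # "utf-8-sig" strips a leading byte-order mark if present and otherwise
    # behaves like plain UTF-8. newline=None (the default) enables universal
    # newlines, so "\r\n" and "\r" endings both arrive as "\n".
    with open(path, encoding="utf-8-sig", newline=None) as handle:
        return [line.strip() for line in handle if line.strip()]
```

A file saved on a Mac with plain `\n` endings and no byte-order mark goes through the same code unchanged.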

But the much friendlier step would be to turn it into a little web app on Tool Labs. My tool would read input from a bunch of form fields and/or let the user upload a CSV-type file, and could output a nice-looking HTML page with redlinks (to help you create the missing pages), with options to download the results as plain text or wiki markup. This would also make the tool a lot more discoverable by casual web surfers. And if I put it on Tool Labs, I could run queries directly against live replicas of the Wikimedia databases, which would be faster than hitting a web API.
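
The input and output sides of such a web app could be a handful of small functions. This sketch assumes the uploaded CSV carries one name in its first column, builds redlinks in the `index.php?title=...&action=edit&redlink=1` form that MediaWiki itself uses for links to nonexistent pages, and emits wiki markup as a bulleted list of links. All function names are hypothetical, and the `new` class only turns links red if the app's page supplies its own CSS for it.

```python
import csv
import html
import io
from typing import Iterable, List
from urllib.parse import quote


def names_from_csv(text: str) -> List[str]:
    """Take the first column of each non-empty CSV row as a name."""
    rows = csv.reader(io.StringIO(text, newline=""))
    return [row[0].strip() for row in rows if row and row[0].strip()]


def redlink_url(name: str, lang: str = "en") -> str:
    """Edit URL for a not-yet-existing article on the given language's Wikipedia."""
    # MediaWiki titles use underscores for spaces; quote() percent-encodes
    # characters such as "&" that would otherwise break the query string.
    title = quote(name.strip().replace(" ", "_"))
    return f"https://{lang}.wikipedia.org/w/index.php?title={title}&action=edit&redlink=1"


def to_html(names: Iterable[str], lang: str = "en") -> str:
    """HTML list of redlinks; class "new" is what MediaWiki uses for missing pages."""
    items = "\n".join(
        f'  <li><a class="new" href="{html.escape(redlink_url(n, lang))}">'
        f"{html.escape(n)}</a></li>"
        for n in names
    )
    return f"<ul>\n{items}\n</ul>"


def to_wikitext(names: Iterable[str]) -> str:
    """Bulleted wiki markup; on-wiki the links render red while pages are missing."""
    return "\n".join(f"* [[{n}]]" for n in names)


def to_plaintext(names: Iterable[str]) -> str:
    """One name per line, for the plain-text download."""
    return "".join(f"{n}\n" for n in names)
```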

I imagine some folks who like great UI and more seamless data transfer would prefer installable desktop or mobile applications with actual GUIs. But I have approximately no skills in that area and feel very little urgency about growing said skills, so I won't be going in that direction.

Once I framed my data flow problem more as a product management question and less as an implementation struggle, I found it much easier to decide. I can serve the audience that needs this tool -- activists and researchers -- while still offering value to people who are comfortable on the command line. It would be feasible to refactor the tool into:

  • a core module that takes a bunch of names, checks them against a Wikipedia, and spits out a "missing" list (you could run this as a standalone command-line script, getting data from stdin)
  • a set of web-specific functions that make it easier to get input and excrete output
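
Here is a rough sketch of how that split might look, with hypothetical names throughout. The core function `find_missing` takes any iterable of names (an open file, `sys.stdin`, or a list built from form fields) plus a `lookup` callable, so the same logic could sit behind either a command line or a web front end. The title normalization is simplified and assumes a wiki that capitalizes first letters, as English Wikipedia does. The SQLite lookup only imitates the `page_namespace` and `page_title` columns of MediaWiki's `page` table so the example runs offline; a real Tool Labs lookup would connect to the replica databases with a MySQL client, where `page_title` is stored as binary and would need decoding. Batches default to 50 titles, the MediaWiki API's usual limit for the `titles` parameter, in case a lookup calls the API instead.

```python
import sqlite3
import sys
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Set, TextIO

# A lookup receives a batch of normalized titles and returns those that exist.
Lookup = Callable[[List[str]], Set[str]]


def normalize(name: str) -> str:
    """Simplified MediaWiki-style title: collapse whitespace to underscores and
    capitalize the first letter (assumes a wiki with capitalized first letters)."""
    title = "_".join(name.split())
    return title[:1].upper() + title[1:]


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def find_missing(names: Iterable[str], lookup: Lookup, batch_size: int = 50) -> List[str]:
    """Return names whose pages do not exist, keeping each first spelling in input order."""
    wanted: Dict[str, str] = {}
    for raw in names:
        name = raw.strip()
        if name:
            wanted.setdefault(normalize(name), name)  # drop duplicate titles
    existing: Set[str] = set()
    for batch in batched(list(wanted), batch_size):
        existing |= lookup(batch)
    return [name for title, name in wanted.items() if title not in existing]


def sqlite_lookup(conn: sqlite3.Connection) -> Lookup:
    """Build a lookup over a table imitating MediaWiki's `page` table."""
    def lookup(titles: List[str]) -> Set[str]:
        placeholders = ",".join("?" for _ in titles)
        query = (
            "SELECT page_title FROM page "
            f"WHERE page_namespace = 0 AND page_title IN ({placeholders})"
        )  # namespace 0 is the main (article) namespace
        return {row[0] for row in conn.execute(query, titles)}
    return lookup


def run_cli(lookup: Lookup, stdin: Optional[TextIO] = None,
            stdout: Optional[TextIO] = None) -> None:
    """Command-line front end: names on stdin, missing names on stdout."""
    stdin = stdin if stdin is not None else sys.stdin
    stdout = stdout if stdout is not None else sys.stdout
    for name in find_missing(stdin, lookup):
        print(name, file=stdout)
```

Keeping the lookup as a parameter is the design choice that makes the two front ends cheap: the web layer and the command line differ only in where names come from and where results go.
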

And I've not yet implemented a web app that takes input from a user and spits out a relevant response, so I could do that and become a cleverer programmer, or borrow code that does most of what I want.
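
For a sense of scale, a standard-library WSGI application is about the smallest thing that takes form input and returns a response. This is a sketch only: `check_names` is a hypothetical placeholder for the core module's logic (here it just echoes the cleaned-up names), the form field name `names` is made up, and `wsgiref`'s server is meant for local development, not for whatever hosting Tool Labs provides.

```python
import html
from urllib.parse import parse_qs

# A bare-bones page: one textarea for newline-separated names.
PAGE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Missing from Wikipedia</title></head>
<body>
<form method="post">
  <textarea name="names" rows="10" cols="40"></textarea>
  <button type="submit">Check</button>
</form>
{results}
</body>
</html>
"""


def check_names(names):
    """Hypothetical placeholder for the core module; here it echoes its input."""
    return list(names)


def app(environ, start_response):
    """WSGI app: GET shows the form, POST lists the results under it."""
    results = ""
    if environ.get("REQUEST_METHOD") == "POST":
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        body = environ["wsgi.input"].read(length).decode("utf-8")
        # parse_qs decodes application/x-www-form-urlencoded form bodies.
        text = parse_qs(body).get("names", [""])[0]
        names = [line.strip() for line in text.splitlines() if line.strip()]
        items = "".join(f"<li>{html.escape(n)}</li>" for n in check_names(names))
        results = f"<ul>{items}</ul>"
    payload = PAGE.format(results=results).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


def serve(port: int = 8000) -> None:
    """Run a local development server (not a production deployment)."""
    from wsgiref.simple_server import make_server
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

Calling `serve()` and visiting http://localhost:8000/ would show the form; submitting it lists the names back, HTML-escaped.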

The simplification that makes me sigh in relief: I won't have to write and maintain two kinda-clashing methods of data input. (The tradeoff, arguably, is a bunch of feature creep.)