File Under “Really Not Getting It”

May 16, 2008 – 1:45 pm

The New York Times ran a piece today on the under-representation of women in the sciences—in the Fashion and Style section, of course, because hey, women don’t read the business pages, do they? *sigh*

I Will Buy You Lunch…

May 16, 2008 – 11:29 am

…or maybe even dinner, if you will build a usable table differencing tool. I have some SQLite database files under version control that I need to merge (updated on my laptop yesterday while flying, not remembering that I’d made some updates on my desktop machine as well). Ditto spreadsheets: I store all my grades in Excel now, but every once in a while, I have to merge changes made at home with changes made in the office. Both are multi-tabular data (“multi” because there can be many tables in a database, and many worksheets in a spreadsheet); after images, I’d guess tabular is the most common format that people want to put into version control that diff(1) can’t deal with. I can’t promise wealth or fame, but you’d sure be popular.
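
To make the request a little more concrete, here is a minimal sketch of the easy half of the problem: diffing one table between two SQLite files. It assumes each table has a single key column that identifies a row, and the function names (`table_rows`, `diff_table`) are made up for illustration. It does no merging, no schema comparison, and nothing for spreadsheets, which is where the lunch-worthy work lies.

```python
import sqlite3


def table_rows(conn, table, key):
    """Return {key value: row as dict} for every row in one table."""
    conn.row_factory = sqlite3.Row  # lets us address columns by name
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    return {row[key]: dict(row) for row in cursor}


def diff_table(old_conn, new_conn, table, key):
    """Compare one table in two databases, matching rows on `key`.

    Returns three dicts: rows only in the new database, rows only in
    the old database, and (old, new) pairs for rows whose key matches
    but whose contents differ.
    """
    old = table_rows(old_conn, table, key)
    new = table_rows(new_conn, table, key)
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed
```

Calling `diff_table(sqlite3.connect("laptop.db"), sqlite3.connect("desktop.db"), "grades", "student")` (hypothetical file, table, and column names) would report which rows each machine touched; deciding what to do when both changed the same row is the part that still needs a real tool.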

But I Was Gone Less than 48 Hours!

May 16, 2008 – 10:01 am

I left Toronto for Austin mid-day Wednesday, and got back at midnight last night. Lots happened in the interim, so here’s a linkandthoughtdump (which I bet actually is one word in German):

  • Gave a talk about Beautiful Code to the Austin Python Users’ Group Wednesday at Enthought’s swanky offices. (They’re the kind folks who provide web hosting for the Software Carpentry course.) About 27 people in attendance, and good discussion afterward; was grateful to Travis Vaught and Sergey Fomel for rides from the airport and to the hotel respectively.
  • Gave another talk titled “HPC Considered Harmful” at the Texas Advanced Computing Center’s Second Annual Scientific Software Days. I was a bit nervous about telling people at a supercomputing center that focusing on massive parallelism and peak performance is wrongheaded, but there were a lot of nodding heads.
  • I made lots of notes from two other talks that I want to follow up on at some point:
    • Robert van de Geijn’s FLAME system lets you draw matrix operations, then automatically generates the corresponding high-performance code. It’s a great example of a real high-level programming tool for scientists (and yet another special case of what a real extensible programming system would support).
    • Eric Jones (also from Enthought) talked about a tool they’re building that watches changes to variables in Python programs, and automatically generates interactive plots of their values. It sounds simpler and less impressive than it actually is; I’ve asked him to put together a screencast, and I think you’ll be wowed; I was. (Later: Steve Eddins from The MathWorks sent me a link about data linking in MATLAB, complete with a video tutorial.)
  • At roughly the same time, half a world away, Diomidis Spinellis presented a study comparing the code quality of Linux, Windows, OpenSolaris, and FreeBSD. Very cool work; wish I’d been at ICSE’08 to ask questions.
  • Meanwhile, Dmitri Vassiliev, who is continuing his work on SlashID this summer, has discovered that generated code is next-to-impossible to debug. Not to be a one-note symphony or anything, but I said in that same article about extensible programming systems that the real challenge is not extending notation, but creating extensible debugging tools so that those notations and high-level representations can be fixed when they break. Robert van de Geijn doesn’t think FLAME needs a debugger; respectfully, I disagree.
  • Science in the Open has a plea to scientists to make their raw data available, motivated by yet another irreproducible result.
  • Kosta Zabashta has posted early thoughts about integrating IRC into DrProject. (Gray on black? Kosta…your design skills rival mine…) I need to tell him that DrProject’s RPC module doesn’t handle tickets because Jeff Balogh is going to replace the entire ticketing system with an extensible one this summer, using his Dojo Form Editor as a front end…
  • Elisabeth Hendrickson has thoughts on automating tests for legacy web applications. Students, take note.
  • Thanks to Nick Jamil and others, we have instructions for installing DrProject on Windows. Yay!
  • Everything old is new again, including Ada and the Bletchley Park Colossus.

And then there’s this:

Thanks again to Sergey Fomel for inviting me down, and for introducing me to the reproducible research community—I’m looking forward to many more discussions.

Student Blogs

May 15, 2008 – 10:20 am

Some of our summer interns have started blogging about their projects:

I’ll add more as I get them.

SE-CSE Workshop

May 15, 2008 – 9:40 am

One of the downsides of being in Texas is that I couldn’t attend the First International Workshop on Software Engineering for Computational Science and Engineering, which was held at ICSE’08 in Leipzig this week.  Papers are here (I’ll be reading them on the flight home); they look interesting, but the biggest thing for me is the change in the workshop’s name—it used to be “Software Engineering for High-Performance Computing Applications” (2004, 2005, 2006, 2007), and I’m hoping the change of name reflects a genuine broadening of focus.

Another DrProject Design Question

May 15, 2008 – 8:42 am

We’re pleased to welcome Liz Blankenship, a Season of Usability intern, to the DrProject team. Liz, a grad student in HCI, is going to help us redesign DrProject’s admin interface. Along the way, I’m hoping she can give us some advice on a few other things as well, chief among them the notion of the “All” project.

The background is this: Trac (DrProject’s ancestor) only allows one project per installation. We weren’t going to install it forty times to manage a class of eighty students working in pairs, so one of the first things we did back in 2005 was extend it to support multiple projects per portal. We then faced two questions, which we decided were related:

  1. What project does a newcomer to the portal see by default (i.e., what’s “home”)?
  2. How do we reach everybody who has an account with a particular portal (e.g., to notify them of impending downtime)?

Our solution was to say that every portal has an undeletable project called “All”. Every user of that portal is automatically a member of that project, so mailing “all@wherever” will reach everyone, and that project’s wiki acts as the portal’s home page. As a bonus, this also provides a logical place for people to file tech support tickets: if you need your password reset, for example, you could file a ticket against “All”.

It was nice in theory, but it hasn’t worked out that well in practice:

  1. Lots of people find it confusing. This might be the project name (some people think that registering for “All” means asking to be put in all of the projects managed by a portal), but I don’t think that’s the whole story.
  2. There turned out to be lots of reasons not to automatically include everyone in the “All” project. For example, external clients for undergraduate projects really don’t care about administrative matters; they only want email forwarded from their specific project, so we had to allow people to opt out and/or turn off email forwarding.
  3. Almost nobody guesses “ticket All for tech support”. Instead, many users file tickets against “All” that should go against other, more specific, projects. Since we don’t have a way to move tickets from one project to another, this causes recurring irritation.

So what should we do? Options include:

  1. Change the name of “All” to something like “home”. This would be (nearly) trivial, but it seems unlikely that a simple renaming will actually solve our problems.
  2. Modify the URLs so that the project is a parameter, rather than part of the path; if a project isn’t specified, take the user to a default project. This would make the URLs harder to read aloud (a test I think any good web application should pass), but it would be more reliable than using URLs that might or might not contain a project name.
  3. Get rid of “All” and:
    1. require admins to mark one project as the landing pad, or
    2. give people a 404 if they don’t specify a project when trying to connect.
  4. Create a special kind of project for the landing pad with:
    1. a wiki that can only be edited by the admin,
    2. that automatically says something informative about all the public projects,
    3. and whose mailing list includes every user, but can only be sent to by the admin.
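
Options 2 and 3 differ mainly in how a request gets mapped to a project. The sketch below contrasts the two lookups using only the standard library. It is not DrProject’s routing code: the project names, the default, and the assumption that the project is the first path segment are all invented for illustration.

```python
from urllib.parse import urlsplit, parse_qs

DEFAULT_PROJECT = "home"                          # hypothetical landing pad
KNOWN_PROJECTS = {"home", "compilers", "thesis"}  # made-up project names


def project_from_path(url):
    """Path style: assume the project is the first path segment,
    e.g. /compilers/wiki. Returns None when no known project is named,
    which is where option 3.2 would answer with a 404."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if segments and segments[0] in KNOWN_PROJECTS:
        return segments[0]
    return None


def project_from_query(url):
    """Option 2: the project is a parameter, e.g. /wiki?project=compilers.
    A missing parameter falls back to the default project."""
    params = parse_qs(urlsplit(url).query)
    return params.get("project", [DEFAULT_PROJECT])[0]
```

The query-string version never has to guess whether the first segment is a project or a page, which is the reliability argument for option 2; the path version is the one you can read aloud.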

As you can guess, #4 is my current favorite. It could even be implemented without significant code changes, by defining a new role and giving every user who isn’t an admin that role in the default project. There’d still be the problem of its name, though.
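
A rough sketch of what that role might look like, assuming a simple role-to-permission table. None of these names (`SUBSCRIBER`, `LandingProject`, the permission strings) come from DrProject; they are placeholders to show that the three properties in #4 fall out of one extra role.

```python
from dataclasses import dataclass, field

ADMIN = "admin"
SUBSCRIBER = "subscriber"  # hypothetical role: read the wiki, receive mail

# Hypothetical permission table for the landing-pad project only.
PERMISSIONS = {
    ADMIN: {"wiki_view", "wiki_edit", "mail_receive", "mail_send"},
    SUBSCRIBER: {"wiki_view", "mail_receive"},
}


@dataclass
class LandingProject:
    name: str
    members: dict = field(default_factory=dict)  # user name -> role

    def add_user(self, user, is_admin=False):
        """Every user of the portal joins; non-admins get the new role."""
        self.members[user] = ADMIN if is_admin else SUBSCRIBER

    def can(self, user, action):
        """True if the user's role grants the named permission."""
        role = self.members.get(user)
        return role is not None and action in PERMISSIONS[role]

    def recipients(self):
        """Everyone who should receive mail sent to the project's list."""
        return sorted(u for u in self.members if self.can(u, "mail_receive"))
```

With this in place, only an admin passes `can(user, "wiki_edit")` or `can(user, "mail_send")`, while `recipients()` still returns every user. The auto-generated summary of public projects (item 2 of #4) is not covered here and would need real code.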

So, what do you think we should do? And why?

On My Way to Texas

May 14, 2008 – 8:53 am

I’m flying down to Austin this afternoon (I know, I know, bad carbon karma), where I’ll be talking to the Austin Python Users’ Group about Beautiful Code, and at the Texas Advanced Computing Center’s Scientific Software Days about “HPC Considered Harmful”. I’m looking forward to meeting everyone!

Later: talk went well; slides available.

What I’m Reading These Days

May 14, 2008 – 8:38 am

A couple of students have asked, so here’s my reading list:

  • “ACM Queue”, “Communications of the ACM”, “IEEE Software”, “IEEE Computer”: all are magazines, rather than peer-reviewed research journals; I flip through each one when I find it just to see if there’s anything of interest. Good for broad, high-level overview of what everyone else is thinking about; I guess I read about 0.5 articles per magazine, and spend no more than 2-3 minutes flipping through them on average.
  • “Empirical Software Engineering”, “IEEE Transactions on Software Engineering”, “Automated Software Engineering”, “ACM Transactions on Software Engineering and Methodology”, and a few others: the specialized peer-reviewed journals of record in my area. Very low hit rate these days (maybe one article in ten), partly because they cover the whole of software engineering, and partly because most of the things being discussed seem to have little to do with real-world software development as I’ve experienced it.
  • “Discover”, “American Scientist”, and “New Scientist”: these are for fun (yeah, I’m a science geek); I have a couple to take with me on the plane to Texas this afternoon. (I’m particularly fond of Brian Hayes’ column in “American Scientist”…)
  • “Computing in Science & Engineering”: figuring out how to make scientific programmers more productive is the main reason I’m in academia (see http://swc.scipy.org for my current best guess). I’m on the editorial board of this magazine, and I’d guess I read about 1/4 of the articles end to end.
  • “Doctor Dobb’s Journal”: has been talking to professional software developers since the late 1970s. Most of my book reviews appear here, and I find two or three articles in each issue worth reading from end to end. A lot of what I know about real-world technologies I pick up here.
  • “Software: Practice & Experience” and “The Journal of Systems & Software”: in-depth descriptions (and critiques) of real software systems (which is what I thought software engineering would mostly be about, back in my naive and idealistic youth). The first description of “Make” appeared in “SP&E” way back in 1979, and a recent issue of “JSS” described a dozen different systems for tracking the provenance of scientific data. High hit rate…
  • SIGCSE: is the Special Interest Group on Computer Science Education. They have an annual conference, and I go through the proceedings article by article every year (high hit rate). I’ve also started reading the proceedings from ITiCSE and CSEE&T, which are (respectively) a European equivalent to SIGCSE and a conference on software engineering education and training.
  • Adam Goucher’s blog: best tester I ever worked with, now thinking about what QA really ought to be about. I also enjoy the Google Testing Blog.
  • The Beautiful Code blog (mostly written by Michael Feathers, author of one of my favorite books): lots of good thoughts on software system design.
  • The Computer Science Canada blog: student-run, student-written, interesting viewpoint on the world (always looking for contributions, by the way).
  • The DemoCamp blog: DemoCamp is the equivalent of open mike night at the pub; tech people from small companies and startups in the Toronto area get up and give lightning demos and talks about what they’re doing. Since David Crow founded it two and a half years ago, it has spread to more than a dozen other cities.
  • Joel Spolsky, Jon Udell, and Bruce Schneier: the first is more often amusing than deep, the second reminds me of John McPhee’s essays, and the third is frankly scary.
  • XKCD: the ironic person’s Dilbert.

You can also check out my recommended reading list (slightly out of date — see my LibraryThing page for a more complete list, mostly sans review).

David Ascher Has Nice Things To Say…

May 14, 2008 – 8:03 am

…about the work that Mike Wu and Ronald Fung did for Thunderbird last term.  Feedback from other clients was equally positive; I’m hoping/looking forward to running the course again next year.

A Different Perspective

May 14, 2008 – 8:01 am

Interesting post from Jorge Aranda, reporting a talk from 1969 that still resonates.