Clever turn of phrase on software testing
I was reviewing some notes from last year and a colleague made an off-hand remark about software testing:
> “There is no such thing as no time, just no priority.”
Data munging and a spec that improves on CSV
I’ve been working with a large collection of files (several hundred GB worth) containing serialized data structures. The decision to use a marshaled structure was made years ago and for small collections, used only by the author themself, was probably easiest. What I’ve realized is:
- others need access to the data
- the marshaling is tied to the language and, to some degree, library version
- marshaling/unmarshaling is relatively slow
- the marshaled data is bloated and repeats metadata
- the hierarchical data structure is unnecessary, resulting in…
- most of the unmarshaled structure is thrown away
- damaged files are not easily salvaged
I considered XML, JSON and S-Expressions but the most portable, efficient representation I could come up with is one of the oldest and worst defined: CSV, comma-separated values. I say worst because, while everyone knows what it is, there is no “standard”, only reference implementations (which diverge; Microsoft Excel is one example), and no definition for including metadata except a convention of using the first line for field names. Still, it meets my requirements. Going to a CSV representation saves me 51%-54% on disk and I can use fast, C-based libraries.
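A minimal sketch of that kind of flattening, in Python. The nested record layout and the field names (`host`, `ts`, `cpu`, `debug`) are hypothetical stand-ins, not the real data; the point is only that the hierarchy collapses to one flat row per record, with the field names on the first line.

```python
import csv
import io

# Hypothetical nested records standing in for the unmarshaled structures.
# Field names are illustrative only, not taken from the real files.
records = [
    {"host": "alpha", "sample": {"ts": 1132000000, "cpu": 0.42}, "debug": {"pid": 101}},
    {"host": "beta", "sample": {"ts": 1132000060, "cpu": 0.17}, "debug": {"pid": 202}},
]

FIELDS = ["host", "ts", "cpu"]


def flatten(record):
    """Keep only the fields that are actually used, as one flat row."""
    return {
        "host": record["host"],
        "ts": record["sample"]["ts"],
        "cpu": record["sample"]["cpu"],
    }


def to_csv(recs):
    """Write a header line of field names followed by one row per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for rec in recs:
        writer.writerow(flatten(rec))
    return buf.getvalue()


csv_text = to_csv(records)
```

The unused `debug` branch is simply dropped at write time, which is where most of the space saving in a scheme like this would come from.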
The type of data in the file is important and the field names do not uniquely identify it, so I thought of including a comment line at the file start. This breaks normal CSV implementations that expect either data or field names at the first line. I chose to use an “eye-catcher” as the first field which encodes the unique type. This wastes space, adding slightly less than 10% to the file size for a field that never varies within a given file. That is still a significant savings over serialized structures but unsatisfying. What I’d like to do is store a comment or additional metadata once. Searching for a better solution, I happened across Creativyst Table Format which has the goals:
- More functional than CSV
- Less overhead than XML
- Simplicity
and, true, it does all that. It is a well-written specification. Best, it neatly supports what I want to do. I could bodge together a library to read and write a basic form of it (and I still may) but as far as I know no reference implementations exist for the languages I’m concerned with. I lose portability and it is unreasonable to impose on every random colleague the requirement that they use my code or write their own parser just to access this data. So it’s a far better idea but not suitable for my situation at this time.
Which is disappointing. It should be popularized but I’m not in a position to do it. My hope is that someone reads this and cobbles together an Open Source reference implementation. Having ready implementations in Perl and Java would ease adoption and make decisions like mine simple: use the best data exchange format available.
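For comparison, the “eye-catcher” workaround described earlier can be sketched roughly as follows, in Python. The tag value `CPUSAMPLE1`, the `type` column name and both function names are hypothetical, chosen only for illustration. Every row repeats the tag in its first field, so an ordinary CSV parser still reads the file, and a reader can reject a file of the wrong type.

```python
import csv
import io

EYE_CATCHER = "CPUSAMPLE1"  # hypothetical type tag, repeated on every data row


def write_tagged(rows, fieldnames, tag=EYE_CATCHER):
    """Write a header line, then each row prefixed with the type tag."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["type"] + list(fieldnames))
    for row in rows:
        writer.writerow([tag] + list(row))
    return buf.getvalue()


def read_tagged(text, expected_tag=EYE_CATCHER):
    """Return (fieldnames, rows), raising if any row carries the wrong tag."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for lineno, row in enumerate(reader, start=2):
        if not row or row[0] != expected_tag:
            raise ValueError(f"line {lineno}: unexpected type tag")
        rows.append(row[1:])
    return header[1:], rows
```

The cost is visible in the writer: the tag is emitted once per row rather than once per file, which is exactly the overhead a format with proper metadata support would avoid.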
Problem or Opportunity?
Came across this interesting observation:
> For many years I have been asking new clients to tell me who their best-performing people are. And then I ask: “What are they assigned to?” Almost without exception, the performers are assigned to problems… Almost invariably, the opportunities are left to fend for themselves.
>
> Peter F. Drucker, writing for [The Wall Street Journal](http://online.wsj.com/public/article/SB113208353287697881.html)
So you have to ask yourself, “What are you working on?”
Compensation day
We got our “numbers” today and several of my colleagues received promotions. A co-worker told me that a former boss of his used to remark, “Any day the firm hands you a check on a discretionary basis is a good one.”
He’s right, it is a good day.
On-call for turkey day
Luck of the calendar and I have on-call for Thanksgiving. It’s not a holiday in Europe or the Far East. I’ve been logged in and working since 9am. Whee.
I just hope it stays quiet.
The old in and out?
I just got back from a few days off, arrived at JFK, and now I turn around and
fly out for work, leaving from LGA. This is not how I like to plan these things but I have meetings tomorrow and the idea of a 6am flight didn’t sound so good. I’ll catch up on the 1000+ backlog of work email over the next couple of days.
New digs at work
I’m on my second day in the new cubicle in Brooklyn. So far, so good. My cube is about a third
smaller and the floor is larger but it shaves twenty minutes off my commute. I go into the Manhattan office for meetings one day per week and the engineering teams generally work together via email so it doesn’t feel much different.
Monday morning mess
Quite the mess to follow up on today. Two teams involved in a powerdown did not communicate,
or miscommunicated, procedures. That mistake inflicted an NY-wide outage on us. Not good.
Feedback preceded me home
I’m amazed that before I got to the office (I was sick yesterday) the feedback
began coming in.
Some of the attendees told their colleagues on the other side of the
globe and those folks asked for copies of the material and wanted to know
when I would present it to them.
On the other hand, we saw a couple of the people who attended, who are
experienced and should know better, go off half-cocked in email today.
I might have been informative, and even entertaining, but if it is not
translated into practice it was a waste of eight weeks of development
time and a week of teaching.
Back to New York
Finished up yesterday and flew back to New York. I still have a head cold.
That made for a painful descent and I probably annoyed the other
passengers as I coughed and snorted. Despite that, the last few days of
the training seem to have gone well. I’ll see how the formal feedback
turns out. I am a little irritated that the IT Training people were unaware I
was giving the course until I asked about feedback forms. Someone
dropped the ball. I have some material to add and lab questions to revise, but
I don’t foresee big changes in it when I run the course again later this summer
for another team.
Ross Lonstein