Navigation


RSS / Atom



gif (LZW) patent expires tomorrow

2003-06-19 , ,

The href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1=4,464,650.WKU.&OS=PN/4,464,650&RS=PN/4,464,650">patent
held by Unisys on the Lempel-Ziv-Welch (LZW) compression scheme
expires tomorrow June 20th, 2003. This is a good thing since the
compression is used in the gif image format and the Unix
compress among other things.

Unisys probably deserves its place as the first whipping boy for
software patents. Set aside the fact that an algorithm is not in and
of itself an implementation so it should not be patentable as a
machine or the matter that by granting such patents the Patent Office
paves the way for broad and speculative land rush tactics (think
Amazon, among others). Unisys irked a lot of people with what looked
like a bald faced money grab.

A little background: Lempel and Ziv worked at Sperry and while
there were issued a patent for the LZ78 scheme. Welch had worked for
Sperry and published his enhancements after leaving them but Sperry
was issued a patent for the work. Sperry merged with Burroughs to form
Unisys in 1986. You can pull all this from some well crafted googling
and the Unisys website.

What happened next is open to dispute. In 1987, Compuserve
introduced the gif format which made use of the LZW compression
scheme. Either Compuserve didn’t do its homework or Unisys turned a
blind eye but the result was that gif came into common use and Unisys
did not enforce its patent regarding gif until 1993. That’s when
people got angry- the world wide web was taking off and most images
were in gif format, a format they assumed was free to use, but here
was Unisys claiming that they were due for every encoder and decoder
written. An interesting footnote, IBM also holds patents on
compression and, it’s reputed, on earlier and later work that
invalidates or overlaps those owned by Unisys. IBM seemed to choose to
ignore the small fry. In any event, the timing couldn’t have been
worse and Unisys got a well-deserved black eye among technical
people.

The above linked patent, if you tried to read through it, describes
an implementation in hardware. The algorithm, first published in 1977,
can be pretty easily explained:

    Starting at the beginning of the data, look for unique strings. If
    the string is new, write it to a table. Replace the string with a
    single character code representing the index to the table for that
    sequence in the output. Continue through the data matching
    strings, adding news ones to the table and outputing codes.

You can express the above algorithm in fifteen or twenty lines of
pseudocode.

Getting even farther afield… it naturally isn’t quite that simple
in practice. For one thing, you want to only replace strings of more
than two bytes in size- otherwise there wouldn’t be any savings-
making the algorithm trivially more complicated. Another limitation is
that the table is going to be of limited size- there are only so many
unique strings you can recognize so a largely random file is going to
blow your table and ruin the resulting compression. One more is that
you need to reserve a particular bit pattern as a flag that the
byte(s) following it represent an entry in the table, so for some
sequences you will end up expanding single bytes into multiples. And
you’ll need to reserve a bit pattern to represent the end of the
file. It’s still reasonable to implement and if you do like a lot of
implementors, you use it with href="http://www.nist.gov/dads/HTML/huffmanEncoding.html">Huffman
encoding or Run
Length Encoding
you can squeeze the data even further.

Comment


Commenting is closed for this article.