So here I am in a cloudy and rainy
San Diego, visiting Storage Networking World. This is a show for big
data center types.

Typical opening question: “So how many data centers
do you have?” But there is frequently some interesting stuff presented
amidst the vendor-driven chaff that might have meaning for the SMB
market.

The winner yesterday, with a claimed 25x data compression factor, is
Diligent Technologies (are all the good names taken?). They claim their
technology reduces data volume by more than 10x what ordinary data
compression achieves, which would be a real breakthrough: common
compression algorithms are lucky to get 2x.

So if you have 100 GB to back up, their product, ProtecTIER (see the
name comment above), can turn it into 4 GB, something you could burn
onto a single DVD in a few minutes. All in all, a wonderful product for
SMBs, but they aren’t selling it to SMBs (good marketers must be scarce too).

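To make the arithmetic concrete, here is a quick check of the numbers above, using Diligent’s claimed 25x, the roughly 2x of ordinary compression, and the nominal 4.7 GB capacity of a single-layer DVD. The helper name is my own, purely for illustration. Note that 25x against 2x works out to 12.5 times better, which is where the “over 10x” claim comes from.

```python
# Figures from the post: 100 GB to back up, 25x claimed vs. ~2x typical.
# DVD_SINGLE_LAYER_GB is the nominal single-layer DVD capacity (4.7 GB).
DVD_SINGLE_LAYER_GB = 4.7


def reduced_size_gb(original_gb: float, ratio: float) -> float:
    """Size left after data reduction at `ratio`:1 (hypothetical helper)."""
    return original_gb / ratio


for label, ratio in [("ordinary compression", 2), ("Diligent's claim", 25)]:
    size = reduced_size_gb(100, ratio)
    fits = size <= DVD_SINGLE_LAYER_GB
    print(f"{label}: 100 GB -> {size:g} GB, fits on one DVD: {fits}")

# Relative advantage of 25x over 2x, the basis of the "over 10x" claim.
print(25 / 2)  # 12.5
```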
Having spent some time looking at compression algorithms in my
misspent youth, I was very skeptical of the 25x reduction claim. I was
gradually cornering Melissa, charming but less technical than I am,
when up walked Neville Yates, Diligent’s CTO, whose movie-star good
looks and English accent give no clue to his manly technical chops,
which are impressive.

Diligent achieves its exceptional compression ratio by comparing all
incoming data to the data that has already arrived. When it finds an
incoming stream of bytes similar to an existing series of bytes, it
compares the two and stores only the differences. The magic comes in a
couple of areas, as near as I can make out given Neville’s natural
reticence on the “how” of the technology.

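Neville didn’t say how the differences are encoded, so this sketch swaps in Python’s standard-library difflib to show the general idea of storing new data as a delta against data already received. Runs that match become “copy” references into the old bytes, and only the changed bytes are kept. The function names are hypothetical, the search for a similar earlier stream is skipped (the reference is simply handed in), and this is not Diligent’s algorithm.

```python
from difflib import SequenceMatcher


def make_delta(reference: bytes, incoming: bytes) -> list:
    """Encode `incoming` as copy/insert ops against `reference` (hypothetical).

    ("copy", start, end) reuses reference[start:end];
    ("insert", data) carries only the bytes that differ.
    """
    ops = []
    matcher = SequenceMatcher(None, reference, incoming, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", incoming[j1:j2]))
        # "delete" needs no op: those reference bytes are simply not copied.
    return ops


def apply_delta(reference: bytes, ops: list) -> bytes:
    """Rebuild the incoming data from the reference plus the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += reference[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def stored_bytes(ops: list) -> int:
    """Bytes of genuinely new data that must be kept."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")


# Usage: a report backed up again after a one-character edit.
old = b"Quarterly report: revenue up 4%, costs flat, headcount 120. " * 20
new = old.replace(b"headcount 120", b"headcount 125", 1)
ops = make_delta(old, new)
print(len(new), stored_bytes(ops))
```

Only a handful of bytes need storing for the second copy, while the copy ops cost a few integers each no matter how long the matching run is.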
First, one has to be smart about how long a series of bytes must be
before it is worth trying to compress, since if it’s too short there
won’t be much, if any, compression. Second, the system needs a very
fast and efficient method of knowing what it has already received, so it
can tell when it is receiving something similar. And it all has to be
optimized to run in-line, at full data rate, on a standard server box,
which runs the cool and reliable Linux OS.

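Both of those problems show up in any de-duplication design. Here is a minimal sketch under assumptions of my own, not Diligent’s: it uses content-defined chunking (a simple “gear” rolling hash picks chunk boundaries from the data itself, with a minimum chunk size so tiny matches are ignored) and a SHA-256 dictionary as the fast index of what has already arrived. Note that this finds identical chunks, whereas Diligent says it finds similar ones; the sizes, mask, and class name are hypothetical.

```python
import hashlib
import random

# Hypothetical tuning values, for illustration only.
MIN_CHUNK = 2 * 1024          # shorter matches save too little to bother indexing
MAX_CHUNK = 64 * 1024         # force a cut so no chunk grows without bound
BOUNDARY_MASK = 0x1FFF << 19  # top 13 bits of a 32-bit hash: ~1 in 8192 positions

# Fixed random table for a simple "gear" rolling hash, seeded so runs repeat.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunks(data: bytes):
    """Content-defined chunking: boundaries depend on the bytes, not offsets."""
    start, h = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF  # roughly the last 32 bytes matter
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0) or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


class ChunkStore:
    """Keeps one copy of each distinct chunk, found via a hash index."""

    def __init__(self):
        self.index = {}  # SHA-256 digest -> chunk bytes

    def write(self, data: bytes):
        """Store data; return (recipe of digests, count of new bytes kept)."""
        recipe, new_bytes = [], 0
        for chunk in chunks(data):
            digest = hashlib.sha256(chunk).digest()
            if digest not in self.index:
                self.index[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)
        return recipe, new_bytes

    def read(self, recipe) -> bytes:
        """Reassemble the original data from its recipe."""
        return b"".join(self.index[d] for d in recipe)


# Usage: back up the same 200 KB twice.
store = ChunkStore()
data_rng = random.Random(7)
backup = bytes(data_rng.getrandbits(8) for _ in range(200_000))
recipe, first_new = store.write(backup)     # first backup: everything is new
recipe2, second_new = store.write(backup)   # identical backup: 0 new bytes stored
print(first_new, second_new)
```

Cutting on content rather than at fixed offsets means that inserting a few bytes near the front of a file typically disturbs only nearby boundaries, so most later chunks still hit the index.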
The big plus of this technology, besides the compression ratio, is
its reliability. Since there is no assumption that two files are the
same just because their metadata is, the problem of not backing up
something you mistakenly thought was already backed up (a problem with
file-based de-duplication software) is eliminated. Further, since the
software operates on byte streams, it can be applied to anything: email,
databases, archives, mp3s, encrypted data or whatever weird data
format your favorite program uses.

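To illustrate the reliability point, here is a toy contrast using a hypothetical file record and policy: a check keyed on metadata (path, size, modification time) decides that a changed file was already backed up, while a check keyed on the bytes themselves catches the change. Real file-based tools vary in exactly what they compare.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileVersion:
    """Hypothetical snapshot of a file at backup time."""
    path: str
    size: int
    mtime: float
    content: bytes


def metadata_key(f: FileVersion):
    # What a metadata-based check might compare (hypothetical policy).
    return (f.path, f.size, f.mtime)


def content_key(f: FileVersion):
    # What a byte-level approach compares: the data itself.
    return hashlib.sha256(f.content).hexdigest()


def needs_backup(f: FileVersion, seen: set, key) -> bool:
    """True if this version has not been seen under the given key."""
    k = key(f)
    if k in seen:
        return False
    seen.add(k)
    return True


# Same path, size, and timestamp, but one byte differs (e.g. a tool that
# preserves timestamps when it rewrites a file).
monday = FileVersion("ledger.db", 5, 1_700_000_000.0, b"AAAAA")
tuesday = FileVersion("ledger.db", 5, 1_700_000_000.0, b"AAAAB")

by_meta, by_content = set(), set()
print([needs_backup(f, by_meta, metadata_key) for f in (monday, tuesday)])     # [True, False]
print([needs_backup(f, by_content, content_key) for f in (monday, tuesday)])  # [True, True]
```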
So naturally I am a bit disappointed that this wonderful technology
is targeted at large data centers, even though I understand Diligent’s
thinking. A viral-marketing, disruptive-technology approach would be to
release a consumer version that offers maybe just 10x compression but
proves to hundreds of thousands of people in a few months that the
technology really works. Then the data center guys (the smart ones,
anyway) will be calling Diligent.

By Robin
storagemojo.com
