
History 2002: New Credo for EMC with Centera

Content Addressed Storage or CAS

There are several ways of looking at Centera, EMC’s new disk subsystem: either simply as a competitor to Network Appliance’s NearStore for disk-to-disk (D2D) backup, or as a replacement for paper, microfilm, optical disks and magnetic tape for archiving data or documents that won’t change.


Most of the Centera project, both software and hardware, originated with a 55-person strong Belgian firm called FilePool, acquired by EMC in April 2001 in a cash transaction of around $50 million.

The small firm subsequently said goodbye to its 19th-century castle HQ, with expansive gardens and assorted barnyard animals, in favor of a more modern facility in Mechelen.

In terms of hardware, Centera is a NAS with an IP connection, consisting of a series of racks (EMC calls them “nodes”). Each one is an interchangeable JBOD (and not RAID) subsystem, comprising 4 low-cost Maxtor 160GB IDE drives and its own 850MHz Pentium III processor.

These hot-swappable racks are interlinked on a dedicated network with an Ethernet switch.

One cabinet contains a minimum of 16 racks, or a raw capacity of 9.6TB, but this translates in practice to half that amount in user capacity, since each record is mirrored on another disk on another rack.

The cabinet can contain a second array of 8 or 16 racks, and up to 16 cabinets may be connected to form a “cluster”, with a maximum of 7 clusters for a single “domain,” which brings us to a potential grand total of 1.05PB.
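The capacity figures above can be checked with a little arithmetic. The sketch below is ours, not EMC's: it assumes 0.6TB of raw capacity per rack (the quoted 9.6TB divided by 16 racks, slightly less than four nominal 160GB drives), a fully populated cabinet of 32 racks, and binary units (1PB = 1,024TB). Under those assumptions, the 1.05PB total corresponds to mirrored user capacity rather than raw capacity. The function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the Centera capacity figures quoted in the article.
# Assumptions (ours, not EMC's): 0.6TB raw per rack, 1PB = 1024TB.

RAW_TB_PER_RACK = 9.6 / 16   # 9.6TB raw for the minimum configuration of 16 racks
MIRROR_COPIES = 2            # each record is stored on two different racks


def user_capacity_tb(racks: int) -> float:
    """User capacity in TB for a given number of racks, after mirroring."""
    return racks * RAW_TB_PER_RACK / MIRROR_COPIES


def domain_user_capacity_pb(racks_per_cabinet: int = 32,
                            cabinets_per_cluster: int = 16,
                            clusters_per_domain: int = 7) -> float:
    """User capacity in PB for a full domain (defaults are the quoted maximums)."""
    racks = racks_per_cabinet * cabinets_per_cluster * clusters_per_domain
    return user_capacity_tb(racks) / 1024


if __name__ == "__main__":
    print(f"Minimum cabinet: {user_capacity_tb(16):.1f}TB user capacity")
    print(f"Full domain: {domain_user_capacity_pb():.2f}PB user capacity")
```

With these inputs the minimum cabinet works out to 4.8TB of user capacity, and a full domain of 3,584 racks to about 1.05PB.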

Jan Van Riel, one of the founders of FilePool and now GM of the EMC Belgian development group, has coined the term RAIN, or Redundant Array of Independent Nodes, to describe this new type of architecture: “Every object is mirrored somewhere else and the management data are spread out over different nodes.”

In performance terms, we’re nowhere near NAS or SAN technology, but clearly ahead of tape, where access times are usually measured in minutes, or optical disk, in tens of seconds. The announced access time here is under 1s.

More specifically, according to Jim Rothnie, “The time to get the first byte of the data object to the interface is under 300ms.”

He also reports bandwidth of 5MB/s for each connection, citing a typical configuration of 4 such connections, for 20MB/s in total.

Centera’s hardware runs $101,500 (it’s rare for EMC to give out prices!) for the minimum 4.8TB user capacity, or exactly the same price per gigabyte as NetApp’s NearStore (which also uses the same Maxtor drives).

Of course now, with EMC, hardware is never as important as software, and behind the software here is a concept that may not be new, but has a new name, CAS or Content Addressed Storage.

To honor the new product and its new concept, EMC has even set up a new business unit, headed by VP and GM Tom Heiser, who has been with the company since 1984.

NAS and SAN systems manage blocks and files, primarily transactional or database information that is updated regularly. CAS, however, addresses the storage of objects that, once they are electronically created, will no longer be modified (fixed content). The Enterprise Storage Group, meanwhile, calls this sort of data “reference information,” which includes scanned documents or books, engineering drawings, bank check images, X-rays, emails and their attachments, as well as multimedia files.

The market analyst firm projects a market of $28 billion in 2006. 50% of new data currently in electronic form is of this type, and 75% of the 57 billion gigabytes that will be created in 2004 will also be in this form.

Each document stored on Centera is given a unique “signature,” a logical (and not physical) address or key of 128 bits that is computed as a function of the object’s content, rather than its location, before the object is then recorded to two different nodes for safety reasons.

Thanks to this signature, the CentraStar software is able to verify whether a new incoming document is identical to one already recorded, in order to avoid data duplication. The two documents, however, must be completely identical. For example, the same image compressed with 2 different methods would be considered different. Stored objects are indexed in XML, a fairly common tendency these days.
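The principle can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the CentraStar implementation: we assume the 128-bit signature is an MD5 digest of the object's bytes (the article gives only the key length), nodes are modelled as in-memory dictionaries, and the class `ToyCasCluster` with its methods is hypothetical.

```python
from __future__ import annotations

import hashlib


class ToyCasCluster:
    """Toy content-addressed store: objects are keyed by a digest of their bytes."""

    def __init__(self, node_count: int = 4) -> None:
        if node_count < 2:
            raise ValueError("mirroring needs at least two nodes")
        # Each node is modelled as a dict mapping content address -> bytes.
        self.nodes: list[dict[str, bytes]] = [dict() for _ in range(node_count)]

    @staticmethod
    def address(data: bytes) -> str:
        # MD5 yields a 128-bit digest; choosing MD5 here is our assumption.
        return hashlib.md5(data).hexdigest()

    def _placement(self, key: str) -> tuple[int, int]:
        # Pick two distinct nodes deterministically from the key.
        first = int(key, 16) % len(self.nodes)
        second = (first + 1) % len(self.nodes)
        return first, second

    def store(self, data: bytes) -> tuple[str, bool]:
        """Store data; return (address, True if the object was newly written)."""
        key = self.address(data)
        first, second = self._placement(key)
        if key in self.nodes[first] or key in self.nodes[second]:
            return key, False           # identical object already recorded
        self.nodes[first][key] = data
        self.nodes[second][key] = data  # mirror copy on a second node
        return key, True

    def retrieve(self, key: str) -> bytes:
        """Return the object from whichever of its two nodes still holds it."""
        for index in self._placement(key):
            if key in self.nodes[index]:
                return self.nodes[index][key]
        raise KeyError(key)


if __name__ == "__main__":
    cluster = ToyCasCluster()
    key, is_new = cluster.store(b"scanned check image")
    _, is_new_again = cluster.store(b"scanned check image")
    print(key, is_new, is_new_again)
    # Simulate the loss of one node: the mirror copy still serves the object.
    failed, _ = cluster._placement(key)
    cluster.nodes[failed].clear()
    print(cluster.retrieve(key) == b"scanned check image")
```

Because the key depends only on the bytes, a second copy of the same document maps to the same address and is not written again, while a file that differs by a single byte (for instance, the same image compressed differently) gets a different address.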

For the moment, there is no compression system for Centera, nor is there any means to search content. If a disk or a rack should give up the ghost, however, its contents can be recovered from the copies mirrored elsewhere in the cluster.

This, incidentally, allows EMC to offer a service contract limited to 2 visits a year, solely for the purposes of replacing defective racks.

No archiving system on tape or other media is currently available, but there is the possibility of asynchronous copy over a LAN, MAN or WAN. CentraStar runs $103,200.

There seems to be some disagreement at EMC on one point, interestingly. According to Heiser, the hardware cannot be sold without the software. For Rothnie, however, that issue can be studied on a case-by-case basis. On the other hand, there is no prohibition on licensing the software for use on another hardware platform.

Without the software, however, the user is once again confronted with the challenge of getting past EMC’s proprietary APIs in order to access documents on Centera.

EMC is mainly counting on the channel to sell the product, and has signed with 40 or so partners involved in various aspects of fixed content data, as well as with backup companies such as Legato (although not Veritas or Computer Associates).

Legato, incidentally, would logically have no reason to develop an interface with its own backup software if the end user is forced to buy the expensive CentraStar in its full version. Instead, it will turn to NearStore, Quantum’s DX30 or even start-up Isilon Systems, which now offers magnetic disks.

The question also remains whether Centera, which is positioning itself in the hundred-thousand dollar price range, will not compete directly with EMC’s Symmetrix systems, in the million dollar range, some of which are already used to store or mirror, at least partly, fixed content documents.

Among EMC’s arguments, we find the same old line that has been advanced for the past 15 years or so by fans of document management on microfilm, optical disk and magnetic tape, vaunting the market for the “paperless office” that has never really taken off.

EMC’s real advantage with Centera, in our eyes, is less the price than the fact that it constitutes a truly online storage system, thanks to its use of HDDs.

At the same time, we have some difficulty imagining lawmakers, from whatever country, already reluctant to accept film or WORM disks as probative evidence, suddenly approving of recording to a modifiable magnetic medium for legal purposes, especially since on Centera it is possible to delete a record. One has only to recall the ongoing Enron affair, in which countless documents are alleged to have been fraudulently destroyed.

This article is an abstract of news published in issue 172, in May 2002, of the former paper version of Computer Data Storage Newsletter.
