Interview With Eli Efrat, CEO of Axxana

The technology of Axxana is unique but not easy to understand, which is why we did the interview below. The question is whether this Israeli start-up, headquartered in Auburndale, MA, with R&D in Tel Aviv, Israel, and an office in Cambridge, UK, currently with 30 employees and $29 million in funding raised, is really successful seven years after being founded and five years after becoming operational. CEO Efrat refused to give us any figures (revenue, number of customers, etc.) that could convince us. We only know the names of three customers that acquired its highly ruggedized and expensive boxes: Animal Health International (animal health products distributor), Avago Technologies (supplier of analog interface components for communications, industrial and consumer applications), and Casas GEO (housing developer in Mexico). Add four unnamed ones: a Portuguese financial institution, a brokerage firm on the US East Coast, a global bank located in Manhattan, and a data center hosting firm in Germany. That’s all. A member of the EMC Select program but without OEMs, the company says it has reseller partnerships with 13 firms, more than its known customers: GDT, AOS, The Pinnacle Group, GTRI, Nexus and FusionStorm in the USA, Synapse 360, Union Solutions and S3 in the UK, Synergy and Dedanext in Italy, Vanume in Mexico, and Malam-Team in Israel. (Editor)

[Note that we made some corrections to the transcript one and two days after publication.]

Eli Efrat, co-founder and CEO of Axxana since 2005, was formerly CEO of MessageVine, a provider of presence and instant messaging solutions to wireless carriers and ISPs. Before that, he co-founded Veon, in digital video and broadband Internet, which was acquired by Philips to become MP4Net Group; prior to that, he co-founded Algotec Systems, in picture archiving and communication systems (PACS), now a Kodak company. He was a member of the Israeli Defense Forces Haman Talpiot project.

StorageNewsletter: What’s the difference between synchronous and asynchronous replication?
Efrat: It is actually one of the dumbest technologies you could think of. Some of the replication technologies out there are so sophisticated, RecoverPoint or other vendors’ replication technologies. Our technology is just plain "dumb" in some sense. Basically, when you want to do data protection and replication, there are two methods. The first one is synchronous: the application server writes to the storage, and then the storage sends the data to another data center. The application waits for the write to be committed on the storage. However, that write is not committed until it is written on the secondary storage, in the secondary data center. The secondary storage sends the commit back to the primary storage, and the primary storage then commits to the application, which means that you have to wait for the transaction to be completed.
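The synchronous write path Efrat describes can be sketched as a toy model. The `Storage` class and `synchronous_write` function below are hypothetical stand-ins, not any vendor's API; the point is only the ordering: the application gets its commit after the secondary copy exists, never before.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Storage:
    """Hypothetical in-memory stand-in for a storage array."""
    name: str
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data


def synchronous_write(primary: Storage, secondary: Storage,
                      lba: int, data: bytes, link_up: bool = True) -> bool:
    """Return True (the commit to the application) only once both copies exist."""
    primary.write(lba, data)    # 1. application server writes to the primary storage
    if not link_up:             # 2. secondary unreachable: no commit is returned,
        return False            #    so the application treats the write as not done
    secondary.write(lba, data)  # 3. primary sends the write to the secondary storage
    return True                 # 4. secondary acks to primary, 5. primary commits to app
```

Calling `synchronous_write` with `link_up=False` returns `False`: in this model the application never sees a commit, which is why, as Efrat says next, a write that did not complete simply never happened from the application's point of view.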

And can you lose data?
No, because if the database write does not get the commit from the primary storage after the data has been replicated, then this write never happened. If the write didn’t complete, the transaction never happened on the application side.
If you have thousands of transactions coming within microseconds, with synchronous replication all writes have to occur in the order they were initiated by the applications. The first transaction immediately goes to the secondary data center and is committed there. The second transaction waits in line for the first one to complete. Transaction number 9 will wait for 8 transactions; number 500 will wait for 499. This means that you have to use very costly FC communication lines and stay within a short distance, because of the speed of light; otherwise the application will start slowing down and will eventually stall.
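Efrat’s queueing argument can be put into numbers. The sketch below is a simplified model under our own assumptions: writes are strictly serialized one round trip apart, only propagation delay counts (no switching or array service time), and light in fiber covers roughly 200,000 km/s. At 100 km this gives a 1 ms round trip, so transaction 500 waits about half a second. Real arrays overlap independent writes, so this overstates the queueing; it only illustrates why distance matters.

```python
FIBER_KM_PER_S = 200_000  # approx. speed of light in optical fiber (about 2/3 of c)


def round_trip_s(distance_km: float) -> float:
    """Propagation-only round trip between the two data centers, in seconds."""
    return 2 * distance_km / FIBER_KM_PER_S


def queue_wait_s(n: int, distance_km: float) -> float:
    """Time transaction n waits behind the n - 1 serialized writes ahead of it."""
    return (n - 1) * round_trip_s(distance_km)


for km in (10, 100, 1000):
    print(f"{km:>5} km: RTT {round_trip_s(km) * 1e3:.2f} ms, "
          f"txn 500 waits {queue_wait_s(500, km):.3f} s")
```

The wait grows linearly with both queue position and distance, which is the trade-off Efrat describes: keep the sites close, or accept stalled applications.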
Synchronous replication does not lose data because logically you don’t lose any transaction. But it means millions of dollars on communication lines and millions of dollars on building a close-by data center. Many organizations in the world use close-by data centers or service providers, including cloud service providers, which have to be close to their customers to offer synchronous replication. Think about the amount of money spent and the number of excess data centers out there due to DR distance considerations.
Because it’s too costly for most end users to deploy nearby data centers, and because the second replication method, asynchronous replication, can go the distance, most end users opt for asynchronous. With asynchronous replication the primary storage commits to the application immediately. There is no delay, but data accumulates at the primary data center before it is sent to the other side. With asynchronous you can lower the bandwidth of the communication lines and you don’t need fiber.
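For contrast, a minimal sketch of the asynchronous path, assuming a simple FIFO of pending writes (the `AsyncReplicator` class is hypothetical, not a product API). The commit returns at once; the queue of writes not yet shipped is exactly the data that would be lost if the primary site disappeared.

```python
from collections import deque
from typing import List


class AsyncReplicator:
    """Hypothetical model: commit locally at once, ship to the remote site later."""

    def __init__(self) -> None:
        self.primary = {}   # lba -> data at the production site
        self.remote = {}    # lba -> data at the DR site
        self.lag = deque()  # (lba, data) committed locally but not yet sent

    def write(self, lba: int, data: bytes) -> bool:
        self.primary[lba] = data
        self.lag.append((lba, data))
        return True         # immediate commit: no WAN round trip

    def drain(self, max_writes: int) -> None:
        """Ship up to max_writes queued writes, oldest first (the bandwidth limit)."""
        for _ in range(min(max_writes, len(self.lag))):
            lba, data = self.lag.popleft()
            self.remote[lba] = data

    def lost_on_disaster(self) -> List[int]:
        """LBAs of writes that vanish if the primary site is destroyed now."""
        return [lba for lba, _ in self.lag]
```

After five writes and `drain(3)`, `lost_on_disaster()` reports the last two writes: a lower-bandwidth link simply means a longer queue and more exposure.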
Even with asynchronous replication, customers don’t want to spend so much on IP lines. They want to replicate more, but the bandwidth of the communication lines is never enough. We save 33% on IP communication lines with asynchronous replication, and sometimes much more than 33%.

I heard that there are other ways to do that with three-node replication.
If most organizations can’t afford two synchronous data centers, they definitely can’t afford three (nearby synchronous + remote asynchronous). Even with three-data-center topologies, before Axxana: if it’s not a regional disaster, I can fail over to the synchronous close-by data center and I’m not going to lose data. I still pay for the lines and the data centers, but it’s going to be synchronous.
The problem with synchronous is not just the money. If something like Hurricane Sandy happens, and both New Jersey and New York are hit and you have data centers in both places, both data centers can lose power and be out of commission. You spent millions of dollars on infrastructure and you can’t use the data centers because it’s a regional disaster. In that case I can fail over to the remote one, in London or San Francisco, and it’s going to be asynchronous, so I am going to lose data. In some cases I will be able not to lose data, and in others I will lose data. With synchronous only, I will lose everything in a regional disaster. With asynchronous only, I know I will lose some of the data, but I’ll still have the data centers. It’s costly, but I’m a little bit more protected. But if you think about it, three-data-center topologies do not solve the problem. They just move it a little bit. The problem with DR is that there hasn’t been any innovation in many years…
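The failover reasoning above can be tabulated as a small decision function. This is a deliberate simplification under our own assumptions: a "local" disaster takes out only the primary site, a "regional" one takes out the primary and the nearby synchronous site, and the site and topology names are hypothetical labels.

```python
from typing import Optional, Tuple

# Which sites each disaster scope takes down (hypothetical simplification).
AFFECTED = {
    "local": {"primary"},
    "regional": {"primary", "nearby"},
}

# Replication mode from the primary to each secondary site, per topology.
TOPOLOGIES = {
    "sync_only": {"nearby": "sync"},
    "async_only": {"remote": "async"},
    "three_site": {"nearby": "sync", "remote": "async"},
}


def failover_outcome(topology: str, disaster: str) -> Tuple[Optional[str], bool]:
    """Return (site to fail over to, whether data is lost) after a disaster."""
    down = AFFECTED[disaster]
    survivors = {site: mode for site, mode in TOPOLOGIES[topology].items()
                 if site not in down}
    if not survivors:
        return None, True       # no surviving copy at all
    for site, mode in survivors.items():
        if mode == "sync":
            return site, False  # a synchronous copy survived: zero data loss
    return next(iter(survivors)), True  # only an async copy survives: the lag is lost
```

In this model `failover_outcome("three_site", "regional")` still returns `("remote", True)`: the third site turns total loss into partial loss, which is Efrat's point that three-site topologies move the problem rather than solve it.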

No, today we at least have the cloud. Many companies are doing replication using the cloud.
Cloud is amazing for us. But even with a private cloud, at the end of the day the systems are sitting on storage, and whether it’s virtual or not, somewhere on the nodes there is still physical hardware that needs to be protected as well. We are starting to talk to cloud providers about how they protect their own data centers. Some of them are starting to talk about DR as a Service, and we fit very well with cloud providers offering DR as a Service, with the black box at production sites as well as multi-tenant solutions in the cloud. Cloud is a fantastic solution and we’re definitely into it.

I suppose that your solution needs a lot of cache.
The secret sauce is that we don’t need a lot, because we sit on top of asynchronous replication. We take the very acute problems that async and sync pose and combine both technologies, getting rid of the inhibitors. The application thinks it’s being protected synchronously, but the storage knows it’s protecting it asynchronously, and the network topology and the infrastructure are asynchronous. So all we have to do is protect the information that would be destroyed during a disaster in an asynchronous environment. With the invention of SSDs, Axxana came into play. Previously, mechanical disks holding the asynchronous data gap would have been destroyed. We keep only that gap. When we talk to customers and show them the number of transactions and the value of their data, it sometimes translates into millions of dollars for them. The difference between async and sync could be hundreds of thousands of dollars, even though it’s not all the data. Now, in asynchronous replication, because you’re losing data, the data has to be completely consistent on the application side. Exchange with Exchange, CRM, databases, they all have to stay consistent. If you make a tiny mistake and provision the wrong volume in the wrong consistency group, you will carry that mistake for weeks and months, and until you reach a point where all the data across all applications is consistent, you’re going to lose much more than you planned for. Not losing any data influences your ability to fail over (RTO), not just the value of the data that was lost.
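To make the "keep only the gap" idea concrete, here is a toy model, not Axxana’s implementation. Each write is committed locally and also appended to a local store standing in for the disaster-proof box; the asynchronous link drains it, and each remote acknowledgement evicts the entry. The class and method names are hypothetical, and tying eviction to the remote acknowledgement is our assumption about how the gap is tracked.

```python
from collections import deque
from typing import Dict


class GapProtectedReplication:
    """Hypothetical model: async replication plus a local hardened gap store."""

    def __init__(self) -> None:
        self.primary: Dict[int, bytes] = {}
        self.remote: Dict[int, bytes] = {}
        self.gap = deque()  # (lba, data) written locally but not yet acked remotely

    def write(self, lba: int, data: bytes) -> bool:
        self.primary[lba] = data
        self.gap.append((lba, data))  # stands in for the copy kept in the box
        return True                   # app sees a commit with no WAN round trip

    def replicate(self, n: int) -> None:
        """The async link ships the oldest n pending writes; each ack evicts one."""
        for _ in range(min(n, len(self.gap))):
            lba, data = self.gap.popleft()
            self.remote[lba] = data

    def recover_after_disaster(self) -> Dict[int, bytes]:
        """Replay the surviving gap, oldest first, on top of the remote copy."""
        recovered = dict(self.remote)
        for lba, data in self.gap:
            recovered[lba] = data
        return recovered
```

Because only unacknowledged writes stay in the store, its size tracks the replication lag rather than the whole data set, and `recover_after_disaster` (playing the role the interview later assigns to the Recoverer) yields the same image as the lost primary, preserving write order so the copy stays consistent.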

If the line is broken you need to store all the data.
Yes, and this is why today we have one box with 73GB, while the lags we’re talking about are in the hundreds of megabytes. We also have one with 400GB, and we’re soon coming out with another box that is even larger. We have enough cache in the box to protect end users against replication link failure. We have larger disks. All SSDs.
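A rough way to read these capacity figures: during a replication link failure the gap grows at the full incoming write rate, so free capacity divided by write rate bounds how long the box can ride out the outage. The 50 MB/s write rate and the 500 MB starting lag below are hypothetical numbers chosen for illustration, and decimal units (1 GB = 1000 MB) are assumed.

```python
def outage_minutes(capacity_gb: float, write_rate_mb_s: float,
                   baseline_lag_mb: float = 0.0) -> float:
    """Minutes of link outage a box absorbs before filling (decimal units)."""
    free_mb = capacity_gb * 1000 - baseline_lag_mb  # space left after the normal lag
    return free_mb / write_rate_mb_s / 60


for capacity in (73, 400):
    minutes = outage_minutes(capacity, write_rate_mb_s=50, baseline_lag_mb=500)
    print(f"{capacity} GB box: about {minutes:.1f} min of outage at 50 MB/s")
```

Under these assumed numbers the 73GB box covers about 24 minutes and the 400GB box a little over two hours; a site with a different write rate would scale accordingly.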

What do you need in terms of hardware on both sides?
For hardware, we have our box, and we bring with us clustered 1U servers, two of them, because everything is redundant. These servers collect all the data from the replication system and send everything to that very ‘dumb box’ to offer zero data loss at any distance. On the secondary site, we only need a piece of software that we call a ‘Recoverer’, which comes into play once a disaster occurs. Not a lot of hardware. We make sure that the replication engine knows about us, so you have to do very little in terms of installation. You put it there, zone it, connect it to the SAN, and everything happens automatically.

So you have one box collecting the data and another one doing the transmission.
No, we have only one Phoenix Black Box, and that’s the one saving the lag information; then we have the collector to collect the data from the storage or the replication. The collector is there because it needs to get everything very quickly, introduce no delay to the replication process, and write that data to the black box, which is local and connected over fiber, with no delays and no latency. So we have only one Axxana box, and the other one is just software sitting on a 1U server.
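The redundant collector pair feeding a single box can be pictured as an active/standby cluster. This is a hypothetical sketch: the node names, the "first healthy node wins" rule, and the `on_replicated_write` hook are our assumptions for illustration, not a description of Axxana’s software.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BlackBox:
    """Stand-in for the local hardened store; receives (lba, data) records."""
    records: List[Tuple[int, bytes]] = field(default_factory=list)


@dataclass
class Collector:
    name: str
    healthy: bool = True


class CollectorCluster:
    """Hypothetical active/standby pair of collector nodes feeding one box."""

    def __init__(self, box: BlackBox) -> None:
        self.box = box
        self.nodes = [Collector("node-a"), Collector("node-b")]

    def active(self) -> Collector:
        for node in self.nodes:
            if node.healthy:
                return node
        raise RuntimeError("both collector nodes are down")

    def on_replicated_write(self, lba: int, data: bytes) -> str:
        """Copy one write into the box via the first healthy node; return its name."""
        node = self.active()
        self.box.records.append((lba, data))
        return node.name
```

If `node-a` is marked unhealthy, the next write goes through `node-b` and still lands in the same box, which is the purpose of deploying the servers in pairs.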

What kind of connection are your users generally using?
When you’re doing asynchronous, you’re using an IP network. The WAN link from one data center to the other runs over IP.

So none of them is using fiber.
There is an option whereby you can also connect to protect synchronously. There are several other synchronous scenarios where Axxana comes into play, so customers can use fiber or any other connection.

Why don’t you use de-dupe to reduce the amount of data transmitted?
We basically don’t transmit and we don’t do the replication. That is done by the replication engine, the replication software. We do have one scenario that is active-active, but our basic scenario is replication from the production site to the disaster recovery site. We only need a box in production; we don’t need a box at the DR site, because our box is resilient to disasters, so we have to be where the disaster is. If you want to do A to B and B to A, a scenario we support too, you put in two boxes, but it’s just the basic A-to-B case doubled. It’s just that the DR site is also an active site. But in the simple scenario we are only at the primary site with the black box, and there we don’t need to do de-dupe because it’s already been done. For example, EMC RecoverPoint has a de-dupe engine inside.
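Efrat’s placement rule (the box has to be where the disaster is) reduces to this: every site that is a replication source needs a box, and a pure replication target does not. A tiny sketch with hypothetical site names:

```python
from typing import Iterable, Set, Tuple


def sites_needing_box(links: Iterable[Tuple[str, str]]) -> Set[str]:
    """Every replication source needs a box; a site that is only a target does not."""
    return {source for source, _target in links}


print(sites_needing_box([("A", "B")]))              # basic production-to-DR case
print(sites_needing_box([("A", "B"), ("B", "A")]))  # active-active: two boxes
```

The active-active case simply doubles the basic one, as described above.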

Do your Phoenix products work only with EMC RecoverPoint?
Today yes.

I suppose that you are in a relationship with companies that could be interested in your products, like IBM or Hitachi?
I cannot talk about anything that is not public, but in essence you are right. In theory yes, in practice I cannot talk about anything that is not public.

Will it be a problem to adapt your products to IBM or HDS enterprise arrays?
Three words for you: not at all!

Did one of your customers ever encounter data loss?
No! Never.

And will it be possible in the future to have lower-cost version?
Yes. Again, I can’t tell you, but basically yes.

What’s your roadmap?
We want to reach a point where we are the UPS for data. Not the shipping company, but the uninterruptible power supply for data. Our roadmap goes through additional storage platforms, additional storage vendors, and additional implementations of the same technology in data protection, not only replication. Replication is very much inherited from storage. We will go beyond storage to the application level, implementing other methods for data protection.

With the possibility to replicate the applications and eventually the OS?
Exactly. After developing implementations for additional storage vendors, we move from the infrastructure and storage layer to the OS and application layers, and then you’re talking about coming up with more products for a much ‘lighter’ data protection arena. All in the EDR (Enterprise Data Recording) space. There’s a lot to do.

What is the average price for a Phoenix protection for the end user?
Unfortunately I cannot talk about the average price. I can only tell you that we are very happy with the list price of $214,000. The average is not something I can divulge, unfortunately.

Does the price only depend on size of the SSDs or does the bandwidth influence the price too?
No, you’re right, only the size of the SSDs. We’re trying to keep our model very simple, not to complicate it too much.

Where do you buy your SSDs?
From Stec, currently.

Last question: a lot of Israeli storage start-ups end up being acquired by big companies. Is your plan to grow for an eventual IPO, or to be acquired?
Absolutely to grow the company. Here’s the thing: I can’t tell you what will happen. What I can tell you are our plans, and how I’m looking at it. Is there potential to become a large vendor offering EDR and offering the kind of solution that we offer to everyone? To me, absolutely yes. Maybe we’ll be the first not to get acquired.