Goodbye DirectMemory, Welcome Apache DirectMemory

Apache DirectMemory logo

A quick post to announce that DirectMemory has been accepted for incubation at the Apache Software Foundation – and this is, of course, a good occasion to show off a new rebranded logo ;). One of the best things is that the team has grown to eight talented committers (ok, seven plus me):

Ioannis Canellos
Maurizio Cucchiara
Christian Grobmeier
Olivier Lamy
Raffaele P. Guidi
Simone Gianni
Simone Tripodi
Tommaso Teofili

And the project will be assisted in incubation by five experienced mentors:

Anthony Elder
Christian Grobmeier
Olivier Lamy
Sylvain Wallez
Tim Williams

Good times are gonna come: stay tuned at the Apache DirectMemory incubator site for more exciting news!

Ciao,
Raffaele

Posted in DirectMemory Cache | 4 Comments

DirectMemory Cache exposed

DirectMemory Cache logo

DirectMemory has, in just a few months, gone through three complete rewrites. Why? At first I wanted it powerful: I had a vision in which it became a new end-to-end solution for every cache-related need (from heap to off-heap to disk to database) but, in the end, it was a bit too much for a single person and – most of all – it was simply not needed, as there were already wonderful, proven solutions for all of those problems.

Except, of course, for the off-heap part that (uh-oh!) was the original inspiration. Of course, BigMemory from Terracotta is already here, but it's a paid solution and not everyone can afford it. Of course again there's memcached, which can be used as well and is pretty pervasive (just think about it being the cache layer in GAE), but it is written in C and has to be installed and managed separately. It is well known for its performance but, being an external daemon, it imposes some network overhead on your applications. Now, while the first two rewrites tried to address the (self-induced) complexity problem, the last one concentrated on off-heap storage and on being a simpler, embedded alternative to memcached for JVM programmers. The good thing is that, having achieved a low memory footprint even for large quantities of large objects and – honestly – quite good performance even compared to standard heap caches, I think it is now a viable alternative, in some cases, to Ehcache, JCS and (yeah, someone is still using it!) OSCache.

Now, what’s in the box?

DirectMemory, for the sake of simplicity, exposes a single static Cache facade for the lazy programmer in need of a way to temporarily store large quantities of (possibly large as well) objects. The Cache facade exposes the expected store, retrieve, free and update methods, using strings as keys, objects as payloads and an optional expiresIn value. The lazy programmer doesn't even need to worry about implementing the Serializable interface in his objects, because they are serialized by the wonderful Protostuff library which, simply, doesn't require it and is several times faster than standard Java serialization. DirectMemory starts a separate thread that collects expired entries and, should it be needed, evicts the least frequently used ones as well. You will also find nice dump and Monitor.dump methods that report usage statistics about buffers, hits and performance, and a clean method that simply resets the buffers and starts from scratch, with all the off-heap allocated memory free to use again. Keep in mind that memory is never released to the operating system: it can only be de-allocated and reused. The store, retrieve and free methods are complemented by storeByteArray, retrieveByteArray, etc., in case you want to skip serialization (because you want to do it on your own, or you need to store binary data loaded from disk or the network or whatever).
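To make the facade shape concrete, here is a toy stand-in that mimics the API described above (string keys, object payloads, optional expiresIn in milliseconds). The class and method signatures are my assumptions based on this description, not DirectMemory's actual code, and the map-backed storage is just for illustration – the real thing serializes to off-heap buffers:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of the described facade: store/retrieve/free with string keys,
// object payloads and an optional expiresIn. Storage here is a plain map;
// the real facade writes serialized bytes into off-heap buffers.
public class CacheFacadeSketch {
    private record Entry(Object payload, long expiresAt) {}

    private static final Map<String, Entry> map = new ConcurrentHashMap<>();

    public static void store(String key, Object payload) {
        store(key, payload, 0); // 0 = never expires
    }

    public static void store(String key, Object payload, long expiresIn) {
        long expiresAt = expiresIn > 0 ? System.currentTimeMillis() + expiresIn : Long.MAX_VALUE;
        map.put(key, new Entry(payload, expiresAt));
    }

    public static Object retrieve(String key) {
        Entry e = map.get(key);
        if (e == null || e.expiresAt() < System.currentTimeMillis()) return null;
        return e.payload();
    }

    public static void free(String key) {
        map.remove(key);
    }

    public static void main(String[] args) {
        store("greeting", "hello off-heap world", 60_000);
        System.out.println(retrieve("greeting"));
        free("greeting");
        System.out.println(retrieve("greeting")); // null after free
    }
}
```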

While the lazy programmer above might find this enough, I would also like to expose some of the internals of DirectMemory. The Cache facade uses Guava collections to keep key references and relies on the memory allocation functionality exposed by OffHeapMemoryBuffer, which is basically a simple malloc() implementation for the JVM: it uses direct ByteBuffers only for allocating and writing the memory itself, while keeping index data in a collection of Pointer objects. The strategy is simple and effective: at startup DirectMemory allocates one or more large (up to 2 GB) direct buffers and then puts a new Pointer referencing each of them (start=0, size=capacity, free=true) into a pointer list. Every time a new value is stored, the first suitable free pointer is "sliced" and the new Pointer is added to the list. When a value gets removed, its pointer is simply marked "free" again, ready to be reused or collected by DirectMemory's garbage collection methods. There is little room for contention because Pointers almost never get removed from the list, just added. The MemoryManager facade is a small layer that enables transparent interaction with more than one OffHeapMemoryBuffer object, working around the 2 GB limitation of direct memory buffers.
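The slicing strategy above can be sketched in a few lines. This is a deliberately simplified, single-buffer, non-thread-safe illustration of the idea – the class and field names are mine, not the actual DirectMemory sources:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of pointer slicing: one direct buffer, a list of
// Pointer records, and first-fit allocation over the free pointers.
public class OffHeapSketch {
    static class Pointer {
        int start, size;
        boolean free;
        Pointer(int start, int size, boolean free) {
            this.start = start; this.size = size; this.free = free;
        }
    }

    private final ByteBuffer buffer;
    final List<Pointer> pointers = new ArrayList<>();

    public OffHeapSketch(int capacity) {
        buffer = ByteBuffer.allocateDirect(capacity);
        pointers.add(new Pointer(0, capacity, true)); // one big free slot at startup
    }

    // First-fit: slice the needed bytes off the first free pointer large enough.
    public Pointer store(byte[] payload) {
        for (Pointer p : pointers) {
            if (p.free && p.size >= payload.length) {
                Pointer used = new Pointer(p.start, payload.length, false);
                p.start += payload.length; // shrink the free slot
                p.size -= payload.length;
                ByteBuffer slice = buffer.duplicate();
                slice.position(used.start);
                slice.put(payload);
                pointers.add(used);
                return used;
            }
        }
        return null; // no room: the real thing would evict or use another buffer
    }

    public byte[] retrieve(Pointer p) {
        byte[] out = new byte[p.size];
        ByteBuffer slice = buffer.duplicate();
        slice.position(p.start);
        slice.get(out);
        return out;
    }

    // Freeing never returns memory to the OS: the slot is just marked reusable.
    public void free(Pointer p) { p.free = true; }
}
```

Note how free() only flips a flag: that is exactly why, as said above, the off-heap memory is reused but never released.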

This approach makes using off-heap memory easy and fast and works around ByteBuffer limitations: as shown in K.D. Gregory's great article ByteBuffers and Non-Heap Memory, making frequent calls to ByteBuffer.allocateDirect can have an impact on heap usage and, should adjacent memory not be available, a single allocation can take several seconds.
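In other words: allocate a few large direct buffers once, up front, and slice them yourself, rather than calling allocateDirect per entry. A minimal sketch of the up-front allocation (the slab size and helper name are mine):

```java
import java.nio.ByteBuffer;

// Allocate off-heap memory once, up front, instead of per entry: each
// allocateDirect call can be slow when contiguous native memory has to
// be found, and the wrapper objects still land on the heap.
public class UpFrontAllocation {
    // A single direct buffer's capacity is an int, hence the ~2 GB
    // per-buffer limit mentioned above.
    static ByteBuffer newSlab(int megabytes) {
        return ByteBuffer.allocateDirect(megabytes * 1024 * 1024);
    }

    public static void main(String[] args) {
        ByteBuffer slab = newSlab(64); // one 64 MB slab, sliced later by the allocator
        System.out.println(slab.isDirect() + " " + slab.capacity());
    }
}
```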

DirectMemory is a young project, but not far from its first stable release, and it has the goal of becoming a useful tool in your programmer's toolbox. Trying DirectMemory is easy if you use Maven (and I think everyone should!). Your feedback is encouraged and appreciated.

Raffaele

Posted in DirectMemory Cache, Uncategorized | 8 Comments

DirectMemory Benchmark: Heap vs Off-heap vs OrientDB

The good news is that I finally managed to get a full refactoring done. The VERY good news is that it allowed me to get a first NoSQL storage implementation done quickly and easily, using the OrientDB document database as the underlying implementation. Now I finally have some numbers to show, and here they are, in the micro-benchmark page of the project wiki.

I quote from the wiki (I’m lazy!):

The benchmark shows that atomic operations (put, get) on heap storage are, of course, an order of magnitude faster than on the others, but only for a small number of entries – total execution time gets affected by garbage collection – doubling the number of entries multiplies total test execution time by a factor of 5.3. Starting from 100k entries off-heap storage outperforms heap.

Now a nice picture to make it clearer:

The wiki also states that:

Doubling the number of items roughly doubles both average execution time and total elapsed time for OrientDB, while off-heap storage average execution time is not affected by the increase in size and number of entries

And here are the graphs showing it clearly:

I’m happy to say that the OrientDB storage implementation performs pretty well (it probably needs some tuning, anyhow), considering it keeps entries on disk (so they can be retrieved later, even after a restart, which is a plus). Considering it is a one-night effort, and that I still have to try out the binary database implementation, it’s an excellent result.

Posted in DirectMemory Cache | 7 Comments

Introducing DirectMemory cache

DirectMemory Cache is an open source alternative to Terracotta BigMemory™ and has the final goal of letting Java applications use large (10, 20 GB and more) amounts of memory without slowing down garbage collection and thus affecting overall system performance.

Although I started writing it just to understand how the amazing BigMemory worked, and as a possible open source replacement for it, the project is now growing into a kind of cache abstraction layer over pluggable storage engines (including disk, network and NoSQL databases), implementing cache-specific semantics (LRU eviction and item expiry, with pluggable eviction strategies, etc.), and the off-heap store has become just one of the supported storage engines.

DirectMemory itself manages the first (in-heap) layer, which is of course the fastest and acts as a queued buffer for the other layers; eviction (also pluggable, of both expired and over-quota items, where quotas can be specified for every layer); performance monitoring (leveraging JavaSimon); etc. DirectMemory uses pluggable serializers (I have two as of today, one based on standard serialization and one based on protostuff-runtime, which is more efficient and doesn’t require objects to implement Serializable), and I have also written a (simple and experimental) disk storage engine.
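A pluggable serializer layer like the one described can be as small as a two-method contract. The interface and class names below are illustrative, not DirectMemory's actual API; a standard-serialization implementation is shown, and a Protostuff-based one would implement the same interface without requiring payloads to be Serializable:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Illustrative two-method serializer contract with a standard
// Java serialization implementation nested inside it.
public interface SerializerSketch {
    byte[] serialize(Object obj) throws IOException;
    Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException;

    class Standard implements SerializerSketch {
        public byte[] serialize(Object obj) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj); // requires obj to implement Serializable
            }
            return bos.toByteArray();
        }

        public Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                return ois.readObject();
            }
        }
    }
}
```

Swapping storage behavior then just means wiring a different implementation behind the same interface.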

The project is in its early stages but is fully usable and well covered by unit tests; you can check out the code and a simple web demo from GitHub, or just download the latest jar. DirectMemory is written in Java, built and managed by Maven, and leverages AOP (AspectJ) for performance monitoring and (in the near future) tracing and eviction.

Next steps will be:

  • Integration of a NoSQL backend (OrientDB would be just perfect, but I was also thinking about Voldemort)
  • Network distribution using JGroups or Hazelcast (of course OrientDB and Voldemort could also be a solution to this)
  • Extensive testing with huge quantities (4+ GB) of RAM – any volunteer for this? 🙂

Keep in touch here, on Twitter, or check out the project wiki for further information and releases – feedback is much appreciated.


Posted in DirectMemory Cache | 6 Comments