DirectMemory Benchmark: Heap vs Off-heap vs OrientDB

The good news is that I finally managed to get a full refactoring done. The VERY good news is that it allowed me to get a first NoSQL storage implementation done quickly and easily, using the OrientDB document database as the underlying implementation. Now I finally have some numbers to show, and here they are in the micro benchmark page of the project wiki.

I quote from the wiki (I’m lazy!):

The benchmark shows that atomic operations (put, get) on heap storage are, of course, an order of magnitude faster than on the others, but only for a small number of entries – total execution time gets affected by garbage collection – doubling the number of entries multiplies total test execution time by a factor of 5.3. Starting from 100k entries off-heap storage outperforms heap.

Now a nice picture to make it clearer:

The wiki also states that:

Doubling the number of items roughly doubles both average execution time and total elapsed for OrientDB while off-heap storage average execution time is not affected by the increase in size and number of entries

And here are the graphs showing it clearly:

I’m happy to say that the OrientDB storage implementation performs pretty well (although it probably needs some tuning), considering that it keeps entries on disk, so they can be retrieved later, even after a restart, which is a plus. Given that it was a one-night effort and that I still have to try out the binary database implementation, it’s an excellent result.
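To make the heap vs off-heap distinction a bit more concrete, here is a minimal sketch of what "off-heap storage" can look like in plain Java. It is not DirectMemory's actual code: the `OffHeapSketch` and `OffHeapStore` names are made up for illustration, and the only real API involved is the JDK's `ByteBuffer.allocateDirect`. The idea is that payload bytes live in a direct buffer outside the garbage-collected heap, while only a small index of offsets stays on the heap.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class OffHeapSketch {

    /** Hypothetical append-only store: values live in one direct buffer, only offsets live on heap. */
    static final class OffHeapStore {
        private final ByteBuffer buffer;                           // memory outside the Java heap
        private final Map<String, int[]> index = new HashMap<>();  // key -> {offset, length}

        OffHeapStore(int capacityBytes) {
            this.buffer = ByteBuffer.allocateDirect(capacityBytes);
        }

        /** Appends the value to the buffer; returns false when there is no room left. */
        boolean put(String key, byte[] value) {
            if (buffer.remaining() < value.length) {
                return false;
            }
            int offset = buffer.position();
            buffer.put(value);
            // Overwriting a key just repoints the index; the old bytes are not reclaimed in this sketch.
            index.put(key, new int[] {offset, value.length});
            return true;
        }

        /** Copies the value back onto the heap, or returns null if the key is unknown. */
        byte[] get(String key) {
            int[] slot = index.get(key);
            if (slot == null) {
                return null;
            }
            byte[] out = new byte[slot[1]];
            ByteBuffer view = buffer.duplicate(); // independent position, same underlying memory
            view.position(slot[0]);
            view.get(out);
            return out;
        }
    }

    public static void main(String[] args) {
        OffHeapStore store = new OffHeapStore(1024);
        store.put("greeting", "hello".getBytes());
        System.out.println(new String(store.get("greeting")));
    }
}
```

Because the garbage collector never has to scan the stored payloads, the cost of a put or get stays roughly flat as the number of entries grows, which is the effect the benchmark is meant to show. A real implementation also needs freeing, compaction and eviction, all of which this sketch leaves out.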


7 Responses to DirectMemory Benchmark: Heap vs Off-heap vs OrientDB

  1. Tomas Espeleta says:

    That’s so cool! I’m following your work @OrientDB site… great job!
    You know, I’m struggling to make fast read operations for bigger datasets: are you using db.load(ORID) method or OSQLSynchQuery?
    I’m trying the load method, which is supposed to be the fastest way to retrieve nodes… but I’ve seen that after 540’000 graph elements (I’m using ODatabaseGraphTx, but in the end there is an underlying doc db) the performance of READ operations is drastically reduced (from 120ms to over 2000ms for every 1500 element reads)
    Could you try your solution for, let’s say, 1million elements?
    I’m going to post my stats and my testcase in OrientDB group.

    • I’m using db.load(ORID) – OSQLSynchQuery was an unneeded performance drag (it slowed retrieve operations a lot, by as much as a factor of 30 once the database exceeds 10k entries) – and I only use key-value storage (remember, DirectMemory is a cache) of byte arrays. I’m really willing to test it extensively with large quantities of objects – the only thing I need is a more powerful (64bit OS and 10+GB ram) test machine.
      I may have found a volunteer (thanks, @bayoda, I hope the offer is still valid) or I will have to look at EC2 (they provide AMIs with 68GB ram), but I would like to consolidate things a bit and proceed gradually before investing my time in learning EC2 and doing linux system administration (although, of course, it would be fun 😉 )

      • Tomas Espeleta says:

        Yeah… more or less the same results for queries here: even using indexes and some performance tuning…
        For the tests: do you really need so much RAM? Isn’t it configurable? (like… how many mb in heap before switching to off-heap, how many off-heap before switching to file store)
        I thought that was the logic, because once you said you were using LRU eviction…
        My proposal was, 1million elements on orient… not in-memory 🙂

      • Oh, sure, I didn’t get it at first :P, and you are perfectly right. Well, this little benchmark was focused on Storage implementations and on proving the case for off-heap, but I already have some tests like the one you suggest, using the FileStorage implementation (which was really too slow and, in the end, wasn’t really fit for the job). I’ll upgrade them to OrientDB (with both this and the upcoming BinaryDatabase implementation) and give it another spin. Thanks for the suggestion, in the meantime

  2. Tomas Espeleta says:

    I posted on the Orient group some stats for my use case (fastest possible “get_if_exist_else_add” with orientdb). load(ORID) is indeed the better solution. To manage it with millions of vertexes, I used jdbm2 to store a Map…
    I thought that you could try jdbm2 directly, as an alternative to OrientDB: your case should be simple enough to use jdbm2 directly 🙂 If I find some time I’ll try to help you out.

    • Well, I already explored the jdbm2 option but chose to go with orientdb first – mostly because of its scalability but also because my objects are already serialized (with protostuff-runtime) and jdbm would serialize them again; I’m not sure how much it would cost in terms of performance.
      But if you feel in a helpful mood extend the Storage class (mimicking http://bit.ly/fSBbNA) and give it a spin! 🙂

  3. Pingback: DirectMemory Cache exposed | Raffaele P. Guidi's Blog
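The tiering idea raised in the comments (keep a bounded number of entries in the fast tier, evict the least recently used ones, and push them to a slower store) can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `TieredCacheSketch` is a hypothetical name, the limit is counted in entries rather than megabytes, and the "slow tier" is a plain `HashMap` standing in for off-heap, OrientDB or file storage. It does not reflect how DirectMemory is actually configured.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class TieredCacheSketch {

    // Stand-in for a slower tier (off-heap, OrientDB, file store); an assumption for this sketch.
    private final Map<String, byte[]> overflow = new HashMap<>();

    // Fast tier: access-ordered LinkedHashMap, so the eldest entry is the least recently used.
    private final LinkedHashMap<String, byte[]> hot;

    TieredCacheSketch(final int maxHotEntries) {
        this.hot = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                if (size() > maxHotEntries) {
                    // Demote instead of dropping: the entry moves to the slower tier.
                    overflow.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    void put(String key, byte[] value) {
        overflow.remove(key); // make sure a key lives in at most one tier
        hot.put(key, value);
    }

    byte[] get(String key) {
        byte[] value = hot.get(key);
        if (value != null) {
            return value;
        }
        value = overflow.remove(key);
        if (value != null) {
            hot.put(key, value); // promote on access; may demote another entry
        }
        return value;
    }

    int hotSize() {
        return hot.size();
    }

    int overflowSize() {
        return overflow.size();
    }

    public static void main(String[] args) {
        TieredCacheSketch cache = new TieredCacheSketch(2);
        cache.put("a", new byte[] {1});
        cache.put("b", new byte[] {2});
        cache.put("c", new byte[] {3}); // "a" is the least recently used and gets demoted
        System.out.println("hot=" + cache.hotSize() + " overflow=" + cache.overflowSize());
    }
}
```

With this shape, testing one million elements does not require one million elements' worth of RAM: only the hot tier is bounded by memory, and everything else is limited by whatever the slower store can hold.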
