Thursday, November 01, 2012

OrientDB: huge improvement in performance (+9,000%) in many use cases. Thanks, Raspberry Pi!



Hi all,
today I have a good story to tell you. A couple of days ago Fabrizio Fortino sent me an email with some metrics and screenshots from profiling an in-production instance of OrientDB. A lot of time was being spent on opening and closing the database.

That was issue 1145 (http://code.google.com/p/orient/issues/detail?id=1145), but I had assigned it a low priority because it was an improvement, not a real bug...

Well, today I'm hacking with a Raspberry Pi, a cheap piece of hardware, and OrientDB to see if it could be used in production for some limited use cases. On this kind of hardware everything is much, much slower! "Yeah, it's normal: I have $35 hardware, Java is not so optimized yet on this ARM platform, etc." These were my first thoughts about the initial results.

But after some profiling I arrived at the same conclusion as Fabrizio, so I decided to spend 2 hours of my life investigating in depth.

Well, I've just committed a small patch (r7134) that avoids opening the database every time it is re-used from the pool. In fact this is quite a costly operation, especially if you do many small atomic operations, where more of the cost is in the open/close than in the operation itself!
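To make the idea concrete, here is a minimal sketch of why reusing an already-open handle from a pool saves work. It is not the OrientDB code or API: ToyDatabase, ToyPool and the "opens" counter are hypothetical names invented for this illustration, and the costly open step is only counted, not simulated.

```python
class ToyDatabase:
    """Hypothetical stand-in for a database handle (not the OrientDB API)."""

    def __init__(self, stats):
        self.stats = stats
        self.is_open = False

    def open(self):
        # In a real database this is the costly step; here we only count it.
        self.stats["opens"] += 1
        self.is_open = True

    def close(self):
        self.is_open = False


class ToyPool:
    """Hypothetical pool that hands back handles that are still open."""

    def __init__(self):
        self.stats = {"opens": 0}
        self._idle = []

    def acquire(self):
        if self._idle:
            return self._idle.pop()  # reuse: open() is skipped entirely
        db = ToyDatabase(self.stats)
        db.open()
        return db

    def release(self, db):
        self._idle.append(db)  # keep the handle open for the next caller


def opens_without_reuse(operations):
    """Open and close a fresh handle for every small operation."""
    stats = {"opens": 0}
    for _ in range(operations):
        db = ToyDatabase(stats)
        db.open()
        db.close()
    return stats["opens"]


def opens_with_pool(operations):
    """Acquire and release a pooled handle for every small operation."""
    pool = ToyPool()
    for _ in range(operations):
        db = pool.acquire()
        pool.release(db)
    return pool.stats["opens"]


print(opens_without_reuse(1000))  # 1000 opens
print(opens_with_pool(1000))      # 1 open
```

The trade-off in a design like this is that a handle kept open can hold stale cached state, so something has to refresh it when the underlying metadata changes.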

This fix greatly improves these scenarios:
  • Usage via HTTP/REST, because a new connection is acquired from the pool at every operation
  • Java web applications that use the database pool on the server side

The improvement will be minor in this case:
  • You wrote a Java app that creates a new database instance every time. If this is your case, I strongly encourage you to use the database pool, which is now much faster

All these are PROS, what about CONS?
  1. If metadata changes (schema, security, functions), you need to invoke reload() to get the changes

This is a simple load of a tiny document against a database on my PC, before and after the patch:

$ ab -n1000 -A admin:admin -k -c10 http://localhost:2480/document/demo/71:1
...
Requests per second:    52.56 [#/sec] (mean)

$ ab -n1000 -A admin:admin -k -c10 http://localhost:2480/document/demo/71:1
...
Requests per second:    4694.57 [#/sec] (mean)

This is about 90x faster, roughly 9,000%: a huge improvement!
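As a quick check of the arithmetic, the two "Requests per second" figures reported by ab above can be divided directly. This is only a sketch of the calculation; the variable names are mine.

```python
# Requests per second reported by the two ab runs above.
before = 52.56
after = 4694.57

speedup = after / before                # how many times faster
percent_increase = (speedup - 1) * 100  # relative gain over the first run

print(f"speedup: {speedup:.1f}x")            # speedup: 89.3x
print(f"increase: {percent_increase:.0f}%")  # increase: 8832%
```

So the rounded "90x" and "9,000%" figures are slightly generous, but in the right ballpark.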

Now the funny thing is that OrientDB on the Raspberry Pi, with the new patch, runs at a speed quite close to what the PC I use every day for work managed before this patch!

Saturday, May 05, 2012

GraphDB market share

Last week a market analysis agency contacted me with some questions about OrientDB, saying that, according to its research, OrientDB holds second position in the worldwide GraphDB market, right after Neo4J. Awesome!

But who are the main players of the GraphDB market?
Since each vendor claims, more or less, to be the market leader, what is the real user base? It seems quite hard to gather real data about users and customers directly from vendors.

So I thought that one of the best ways is to look at the public groups and forums, because users will sooner or later subscribe to them: they are the first-hand source of information, help and tricks. They can't lie! This document contains some metrics extracted from public sources. Click on the source to see with your own eyes the source I used.

From this data Neo4J is, without any doubt, the GraphDB market leader, followed by OrientDB, which is growing rapidly, and, a long way behind, InfiniteGraph and DEX. Judging by its web site, InfiniteGraph seems to have some real customers, but they all seem related to the previous product ObjectivityDB (an ODBMS born more than a decade ago).

Below the metrics:


Products                              Neo4J        OrientDB     InfiniteGraph    DEX
Updated on                            05/05/2012   05/05/2012   05/05/2012       05/05/2012
Source created on                     April 2011   April 2010   September 2011   May 2011
Members                               926          620          75               ?
Threads since the beginning           1,240        1,449        36               33
Posts since the beginning             6,752        7,918        233              87
Posts in the last month (April 2012)  1,107        439          19               0
Posts 2 months ago (March 2012)       1,310        519          13               7 (but 100% announcements)

Monday, February 20, 2012

Why I hate Maven

Yes, I admit that Maven has improved life for Java developers, given the tons of dependencies each project brings.

So why do I hate it so much? Well, because of the thousands (really thousands!) of network calls to the remote server to:

  • check versions
  • check md5
  • download pom.xml files
  • download jars

But why was Maven built the way we know it? All the logic is on the client side. This means that each Maven user pays an absurd latency cost for each network call! The solution? Git shows the way.

Why not build a tree of the requested JARs, send it to the Maven server, and download the resulting zipped archive containing all the stuff to install in one shot?

In this way daily updates would take milliseconds, or just a few seconds, depending on the size of the updates and the network bandwidth, and no longer on the network latency.
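A rough back-of-the-envelope model shows the difference between paying the latency once per call and paying it once per build. All the numbers below are made up for illustration (2,000 calls, 50 ms round trip, 20 MB of artifacts, 1.25 MB/s of bandwidth), and the model assumes the calls are strictly sequential, which a real client may not be.

```python
def transfer_seconds(requests, rtt_s, total_bytes, bytes_per_s):
    """Time spent waiting on round trips plus time spent moving the bytes."""
    latency_cost = requests * rtt_s
    bandwidth_cost = total_bytes / bytes_per_s
    return latency_cost + bandwidth_cost


# Hypothetical figures, chosen only to illustrate the shape of the problem.
RTT_S = 0.05              # 50 ms round trip to the remote server
TOTAL_BYTES = 20_000_000  # 20 MB of poms, checksums and jars
BYTES_PER_S = 1_250_000   # roughly a 10 Mbit/s link

one_call_per_file = transfer_seconds(2000, RTT_S, TOTAL_BYTES, BYTES_PER_S)
single_archive = transfer_seconds(1, RTT_S, TOTAL_BYTES, BYTES_PER_S)

print(f"one call per file: {one_call_per_file:.2f} s")  # 116.00 s
print(f"single archive:    {single_archive:.2f} s")     # 16.05 s
```

With these assumed figures the bytes transferred are identical in both cases; the whole difference comes from the 2,000 round trips.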