Monday 4 July 2016

MongoDB Jolie Connector Part 1

One of the buzz words of the IT community it is definitely "Big Data", yet like it often happens the actual definition is somewhat vague, is "Big Data" a question of volume of data or also a question of performances.The answer can be found ,in my opinion, in the definition provided by Gartner: "Big Data:is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation"
Does it means that to do "Big Data" we need to have a new DB technology: well  the short answer is yes we do, traditional DB technologies poorly fit  with some of the "Big Data" requirements. This new challenge has created a proliferation of new DB technologies and vendors with different approaches and solution. A good review on the several option has been provided by the following site Database technology comparison-matrix
MongoDB it is one of the possible technology choices, it is my opinion that with its document approach makes data modeling a much more linear approach easing also the not always comunication between IT and not IT members of a company.
The next question  is: "Can microservices play a role in a Big Data scenario?". It is my opinion  that microservices  can play an important role in developing data handling in a "Big Data" landscape. Expanding on the micro services orchestration concept, it can imagined a scenario where a poll of microservices  can be used as data orchestration where new "aggregated data" are created by using individual function exposed by the microservices. This will certainly cover two of characteristic of Gartner definition:
  1. high-variety information:  The orchestration of original data into new aggregated data will avoid the creation of new  database bounded table or collection 
  2.  cost-effective: microservices are effective as much we are able to decompose correctly the problem on hand
On the purpose of  bringing Jolie within the Big Data discourse a MongoDB connector has been developed: the choice of MongoDB as first Big Data technology approached  by the Jolie's team has not been accidental but driven by the tree like data representation used by MongoDB that extremity similar if not identical to the way Jolie represents complex data in its  complex type definition.
The connector has been developed externally the normal Jolie Language development but it is developed by the Jolie team members and it is also responds to a specific demands of a growing Jolie programming community.
The developing team has adopted a top down approach when developing the connector starting from an analysis of a set of requirement passing through the interface definition to arrive to some pilot project ( which  result will be presented further on ).
The identified requirement were  the following
  1.  Preserve the not SQL nature of MongoDB
  2. Give to the connector user a simple and instinctive interface to the MongoDB world
  3. Preserve some of the native aggregation and correlation characteristic of MongoDB
From this three requirements the following considerations have been thought 
  1. The connector will have to have all the CRUD operation 
  2. The connector will use the native query representational (JSON) 
  3. The connector will provide access to MongoDB aggregation capability
  4. The naming convention of  service operation types replicate the terms used by MongoDB,
and resulted in the following interface:


interface MongoDBInterface {
  RequestResponse:
  connect (ConnectRequest)(ConnectResponse) throws MongoException ,
  query   (QueryRequest)(QueryResponse)   throws MongoException JsonParseException ,
  insert  (InsertRequest)(InsertResponse)   throws MongoException JsonParseException ,
  update  (UpdateRequest)(UpdateResponse)   throws MongoException JsonParseException ,
  delete  (DeleteRequest)(DeleteResponse)   throws MongoException JsonParseException ,
  aggregate (AggregateRequest)(AggregateResponse)   throws MongoException JsonParseException
}



Like mentioned before the connector aims to propose a simple to use interface:
Starting from insert operation

q.collection = "CustomerSales";
    with (q.document){
          .name    = "Lars";
          .surname = "Larsesen";
          .code = "LALA01";
          .age = 28;
           with (.purchase){
             .ammount = 30.12;
             .date.("@type")="Date";
             .date= currentTime;
             .location.street= "Mongo road";
             .location.number= 2
          }
     };
As can be seen the document can be inserted int the "CustomerSales" collection by simply defining the document to be insert as Jolie structured variable.

The query operation or any other operation that require a subset filtering ( update , delete ,aggregate )

q.filter = "{'purchase.date':{$lt:'$date'}}";
q.filter.date =long("1463572271651");
q.filter.date.("@type")="Date";

notice the use of  '$nameOfVariable' to dynamically inject the value in search filter
Using the sane representation it can be inject the value for an update operation

q.collection = "CustomerSales";
q.filter = "{surname: '$surname'}";
q.filter.surname = "Larsesen";
q.documentUpdate = "{$set:{age:'$age'}}";
q.documentUpdate.age= 22; 

and so it goes for the aggregation:

q.collection = "CustomerSales";
q.filter = "{$group:{ _id : '$surname', total:{$sum : 1}}}";

Use and limitation of the connector

The connector can be downloaded here ( Jolie Custom services Installation ), the current release has to be consider a Beta release, therefore prone to present  imperfections and bugs
Some of them are already known and are on the way to be corrected:
  1. Logoff operation missing
  2. Handling  MongoDB security
  3. DateTime storing error[UTC DateTime
the user must consider the following native data type mapping

Mongo Type Jolie Type Detail
Double double
int32 int
int64 int
String sting
DateTime long with child node ("@type")="Date"
ObjectId long