JsonBatch - Journey from zero to full-fledged Batch Engine - part 1

Recently, I published JsonBatch - an Engine to run batch requests with JSON-based REST APIs. Although it is a small library, I found it quite satisfying to go through the whole process, from designing to developing it. So today, I want to share my journey with you.

I. The idea

It all started with a requirement for our company's product. To help one of our users migrate their own system to ours, we needed to provide them an Adapter that would bridge the gap between their APIs and our APIs, and maintain it until they finish the migration.

It's a common requirement, but we wanted to reuse the solution so we wouldn't have to repeat the same work for other users. At first, we thought about giving the user a way to configure which of our endpoints each of their API calls should route to, and how to map our response back to their response. Quite straightforward, but reality is not so simple. Because the two systems are designed differently, a single call to their API will mostly have to fan out into multiple requests to ours, and then aggregate all the responses.

That is when an idea came to me. What we need is a Batch Engine that can be configured easily & dynamically.

II. The first prototype

So a Batch Engine should take a list of requests, execute them sequentially, and collect all the responses.
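Conceptually, the core loop is tiny. Here is a minimal sketch of that idea; note that `Request`, `Response`, `send`, and `BatchEngine` are hypothetical placeholder names for illustration, not JsonBatch's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request/response holders, not JsonBatch's real classes
record Request(String httpMethod, String url, String body) {}
record Response(int status, String body) {}

class BatchEngine {
    // Stand-in executor; a real engine would perform an HTTP call here
    Response send(Request request) {
        return new Response(200, "{}");
    }

    List<Response> execute(List<Request> requests) {
        List<Response> responses = new ArrayList<>();
        for (Request request : requests) {
            // Sequential execution: a later request can depend on
            // the responses collected from earlier ones
            responses.add(send(request));
        }
        return responses;
    }
}
```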

The execution logic is simple, but here comes the first challenge: How to build the request body?

It's easy if the client can supply all the request bodies up front, but what if a request body requires data from the response of an earlier request? The Engine needs a way to know how & where to extract that data. Since all of the requests & responses are JSON, the first answer that came to me for locating a field was JsonPath. Fortunately, Java already has a good library, Jayway's JsonPath, that supports all my needs.
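With Jayway's JsonPath, extracting a field is a one-liner. A small example on a sample document shaped like the batch context (the `JsonPath.read` call is the library's real API; the sample JSON is made up for illustration):

```java
import com.jayway.jsonpath.JsonPath;
import java.util.List;

public class JsonPathDemo {
    public static void main(String[] args) {
        String json = "{\"responses\":[{\"body\":{\"field_a\":2}},{\"body\":{\"field_a\":3}}]}";
        // Locate a single field by its path
        Integer one = JsonPath.read(json, "$.responses[0].body.field_a");
        // A wildcard collects every matching field into a list
        List<Integer> all = JsonPath.read(json, "$.responses[*].body.field_a");
        System.out.println(one); // 2
        System.out.println(all); // contains 2 and 3
    }
}
```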

So now we are done with the How question; next is the Where question. The solution I came up with is building one grand JSON that contains all the requests & responses. Something like this:

{
  "original": {
    "http_method": "...",
    "url": "...",
    "headers": {
      "header_1": [ "..." ],
      "header_2": [ "..." ],
      ...
    },
    "body": { ... }
  },
  "requests": [
    {
      "http_method": "...",
      "url": "...",
      "headers": {
        "header_1": [ "..." ],
        "header_2": [ "..." ],
        ...
      },
      "body": { ... }
    },
    ...
  ],
  "responses": [
    {
      "status": ...,
      "headers": {
        "header_1": [ "..." ],
        "header_2": [ "..." ],
        ...
      },
      "body": { ... }
    },
    ...
  ]
}
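In Java, this grand document can simply be assembled out of plain maps and lists and queried with the same JsonPath expressions. A sketch, assuming the structure above (the field values here are made up for illustration):

```java
import com.jayway.jsonpath.JsonPath;
import java.util.List;
import java.util.Map;

public class BatchContextDemo {
    public static void main(String[] args) {
        // Aggregated document: the original request plus all batch
        // requests and responses executed so far
        Map<String, Object> context = Map.of(
            "original", Map.of("http_method", "GET", "url", "/orders/1"),
            "requests", List.of(Map.of("http_method", "GET", "url", "/users/42")),
            "responses", List.of(Map.of("status", 200, "body", Map.of("field_a", 2)))
        );
        // A later request body can now reference any earlier response
        Integer fieldA = JsonPath.read(context, "$.responses[0].body.field_a");
        System.out.println(fieldA); // 2
    }
}
```

Because every request and response lives in one document, a single JsonPath expression is enough to answer the Where question.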

With that, only one problem is left: we need a schema format to instruct the Engine how to build the actual JSON body. A quick search turned up JSON Schema. But after some research, I think JSON Schema is better suited to validating than to generating, and its structure is quite different from the actual JSON. So I was left with designing my own custom schema.

After some experiments, I came up with a schema that is simple but good enough. The schema looks just like the actual JSON, except that string fields follow a specific format. For example:

{
  "field_1": 1,
  "field_2": "int $.responses[0].body.field_a",
  "field_3": "int[] $.responses[*].body.field_a"
}

With this schema, the Engine will build a JSON with:

  • field_1 = 1
  • field_2 is an integer, with its value extracted from the JsonPath $.responses[0].body.field_a
  • field_3 is an integer array, with its values extracted from the JsonPath $.responses[*].body.field_a

The actual JSON will look like this:

{
  "field_1": 1,
  "field_2": 2,
  "field_3": [2, 3, 4]
}

You can see that the schema and the actual JSON share almost the same structure.
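A naive interpreter for this format only needs to split each string field into a type prefix and a JsonPath, extract, and convert. The sketch below shows the idea under those assumptions; it is not JsonBatch's actual parser and only handles the "int" and "int[]" prefixes for brevity:

```java
import com.jayway.jsonpath.JsonPath;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaBuilder {
    // Build the actual JSON body (as a Map) from a schema Map
    // and the aggregated context document
    static Map<String, Object> build(Map<String, Object> schema, Object context) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : schema.entrySet()) {
            result.put(entry.getKey(), resolve(entry.getValue(), context));
        }
        return result;
    }

    static Object resolve(Object value, Object context) {
        if (!(value instanceof String s)) {
            return value; // non-string fields are copied as literal values
        }
        // Split "type jsonPath" into its two parts
        String[] parts = s.split(" ", 2);
        return switch (parts[0]) {
            case "int" -> ((Number) JsonPath.read(context, parts[1])).intValue();
            case "int[]" -> JsonPath.<List<Number>>read(context, parts[1])
                    .stream().map(Number::intValue).toList();
            default -> s; // no recognized prefix: treat as a plain string
        };
    }
}
```

Given the example schema above and a context where the two paths yield 2 and [2, 3, 4], `build` would produce exactly the actual JSON shown earlier.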

Now, with all the pieces in place, it took me about a week to code this first prototype. In the next part, I will share how the prototype continued to evolve.