How LogJoint parses JSON files

Log as string

LogJoint treats a JSON log file as one big string. This is a logical representation; physically, of course, LogJoint does not load the whole file into a single string in memory. A string here means a sequence of Unicode characters. To convert a raw log file to Unicode characters, LogJoint uses the encoding specified in your format's settings. The JSON file does not have to be pretty-printed to look nice in LogJoint.
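The idea of a logical string over a file that is never fully loaded can be illustrated with a small Python sketch. It is not LogJoint's implementation; the function name iter_decoded_chunks and the chunk size are hypothetical, and the sketch only shows how bytes can be decoded piece by piece with a configured encoding.

```python
import codecs
import io


def iter_decoded_chunks(stream, encoding, chunk_size=4096):
    """Yield Unicode text decoded from a binary stream, one chunk at a time.

    Hypothetical helper: the incremental decoder keeps incomplete byte
    sequences between reads, so a multi-byte character split across two
    chunks is still decoded correctly.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        raw = stream.read(chunk_size)
        if not raw:
            # Flush whatever the decoder still holds at end of input.
            tail = decoder.decode(b"", final=True)
            if tail:
                yield tail
            break
        text = decoder.decode(raw)
        if text:
            yield text


# A tiny chunk size forces multi-byte characters to straddle chunk borders.
sample_text = 'Привет {"msg":"hi"}'
sample_stream = io.BytesIO(sample_text.encode("utf-8"))
decoded = "".join(iter_decoded_chunks(sample_stream, "utf-8", chunk_size=3))
```

The concatenated chunks equal the original text, although no single read ever held the whole input.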

Suppose we have this log file:

{"timestamp":"2018-05-22 20:25:35.968","severity":"INFO","thread":"123","msg":"Hi there"}
{"timestamp":"2018-05-22 20:25:42.005","severity":"ERROR","thread":"123","msg":"Error occurred!","exception":"WebException"}

The log contains two messages, each represented by a JSON object. The second message has severity ERROR and carries exception information in an additional property, exception.

Header regular expression

LogJoint uses a user-provided regular expression to split the input JSON string into individual log messages. This regex is called the header regular expression. It is supposed to match the beginnings of messages. It might look unnatural to use regexps against JSON text. The reason for this approach is efficiency: with the regex in hand, LogJoint can read an arbitrary part of a potentially huge input file and start splitting that part right away. In our example the header regular expression may look like this:
^              # new messages should start from new line
{              # JSON object start
\s*            # skip spaces if any before first attribute
"timestamp":"  # expected first mandatory property

Note that LogJoint ignores unescaped white space in patterns and treats everything after # as a comment. Programmers can read about the IgnorePatternWhitespace, ExplicitCapture, and Multiline flags that are used here in the MSDN article RegexOptions Enumeration.

LogJoint applies the header regular expression many times to find all the messages in the input string. In our example the header regex will match two times:

Thick black lines show message boundaries. After applying the header regex, LogJoint knows where the messages begin and where they end: a message ends where the next message begins.
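The splitting rule can be sketched in Python. This is an illustration only, not LogJoint's code: Python's re.VERBOSE and re.MULTILINE flags are used here as rough analogues of the .NET IgnorePatternWhitespace and Multiline options, the opening brace is escaped for safety, and the names HEADER_RE and split_messages are hypothetical.

```python
import re

# Python analogue of the header regex from this section.
HEADER_RE = re.compile(
    r"""
    ^               # new messages should start from new line
    \{              # JSON object start
    \s*             # skip spaces if any before first attribute
    "timestamp":"   # expected first mandatory property
    """,
    re.VERBOSE | re.MULTILINE,
)


def split_messages(text):
    """Return messages; each runs from one header match to the next one."""
    starts = [m.start() for m in HEADER_RE.finditer(text)]
    # The last message ends where the text ends.
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


LOG = (
    '{"timestamp":"2018-05-22 20:25:35.968","severity":"INFO",'
    '"thread":"123","msg":"Hi there"}\n'
    '{"timestamp":"2018-05-22 20:25:42.005","severity":"ERROR",'
    '"thread":"123","msg":"Error occurred!","exception":"WebException"}\n'
)

messages = split_messages(LOG)

# Starting in the middle of the first message: the partial text before the
# first header match is skipped, and splitting resumes at the next header.
tail = split_messages(LOG[40:])
```

The second call shows why a header regex suits large files: splitting can begin at any offset, because the first header match found after that offset is a valid message start.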

Normalization with JUST transformation

In the next step, LogJoint applies a user-provided normalization JSON transformation to each message separated out in the previous step. The transformation syntax is described on the JUST.net home page. The output of this JUST transformation must be one JSON element with the following schema:

{"d":"datetime: yyyy-MM-ddTHH:mm:ss.fffffff","t":"thread id string","s":"severity: i, w, e","m":"Log message"}
			

Only the d property is mandatory. The severity property s can contain any string, but only its first character is compared (case-insensitively) with the expected values i, w, and e.
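The first-character rule for s can be sketched in Python as follows. The function name classify_severity and the labels "info", "warning", and "error" are assumptions made for illustration, and returning None for an unrecognized value is also an assumption: this page does not say what LogJoint does in that case.

```python
def classify_severity(s):
    """Classify a severity string by its first character, ignoring case.

    Hypothetical helper; returns None when the value is empty or does not
    start with i, w, or e (the real fallback behavior is not documented here).
    """
    labels = {"i": "info", "w": "warning", "e": "error"}
    if not s:
        return None
    return labels.get(s[0].lower())


samples = ["INFO", "info", "Warn", "ERROR", "e", "debug", ""]
classified = [classify_severity(s) for s in samples]
```

Under this rule the severity value from the source log can usually be passed through unchanged, because strings such as INFO and ERROR already begin with the expected letters.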

LogJoint knows how to interpret and display the transformation output. Basically, your JUST transformation tells LogJoint which parts of your message hold the timestamp, thread, severity, and message text.

For the sample log above, the transformation might look like this:
{
  "d": "#customfunction(logjoint.model,LogJoint.Json.Functions.TO_DATETIME,#valueof($.timestamp),yyyy-MM-dd HH:mm:ss.fff)",
  "m": "#concat(#valueof($.msg),#ifcondition(#existsandnotempty($.exception),true,#concat(\nException: , #valueof($.exception)),))",
  "s": "#valueof($.severity)"
}

Within JUST code you can use the predefined JUST functions as well as the #customfunctions provided by LogJoint. The code above shows an example of calling the custom function TO_DATETIME. See the functions reference for the list of available functions.
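To check what a transformation like the one above is expected to produce, it can help to write down the intended result in ordinary code. The Python sketch below approximates the mapping for the sample log; it is not JUST and not LogJoint's code, the function name normalize is hypothetical, and the exact whitespace handling of #concat arguments in JUST is not reproduced.

```python
import json
from datetime import datetime


def normalize(message_text):
    """Approximate the normalized object for one message of the sample log."""
    src = json.loads(message_text)
    # Rough analogue of TO_DATETIME: parse the source timestamp format.
    ts = datetime.strptime(src["timestamp"], "%Y-%m-%d %H:%M:%S.%f")
    out = {
        # %f prints six fractional digits; one zero is appended to reach
        # the seven digits of yyyy-MM-ddTHH:mm:ss.fffffff.
        "d": ts.strftime("%Y-%m-%dT%H:%M:%S.%f") + "0",
        "m": src.get("msg", ""),
        "s": src.get("severity", ""),
    }
    # Append exception details only when the property exists and is not empty.
    if src.get("exception"):
        out["m"] += "\nException: " + src["exception"]
    return out


first = normalize(
    '{"timestamp":"2018-05-22 20:25:35.968","severity":"INFO",'
    '"thread":"123","msg":"Hi there"}'
)
second = normalize(
    '{"timestamp":"2018-05-22 20:25:42.005","severity":"ERROR",'
    '"thread":"123","msg":"Error occurred!","exception":"WebException"}'
)
```

Like the sample transformation, the sketch does not emit the optional t property; only d is mandatory.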