.. _yajl_fort:
====================
The yajl_fort module
====================
The ``yajl_fort`` module defines an object-oriented Fortran interface to
the YAJL C library, which is an event-driven parser for JSON data streams.
`JSON `_ is an open standard data interchange format.
It is lightweight, flexible, easy for humans to read and write, and language
independent.
.. note::
Unlike most other JSON libraries, YAJL does not provide or impose an
in-memory data representation, but instead uses callbacks to accommodate
any in-memory representation. The same is true of ``yajl_fort``, being
only an interface to YAJL. If you want an in-memory representation (and
you most likely do), you may do so using ``yajl_fort``, but you provide
the code that defines and populates the in-memory representation using
the callbacks according to your specific requirements.
Synopsis
========
.. code-block:: fortran
use yajl_fort
Derived types
``fyajl_callbacks`` (abstract), ``fyajl_parser``, ``fyajl_status``
Functions
``fyajl_get_error``, ``fyajl_status_to_string``
Parameters
* callback return:
``FYAJL_CONTINUE_PARSING``,
``FYAJL_TERMINATE_PARSING``
* kind:
``FYAJL_INTEGER_KIND``,
``FYAJL_REAL_KIND``
* parser return:
``FYAJL_STATUS_OK``,
``FYAJL_STATUS_ERROR``,
``FYAJL_STATUS_CLIENT_CANCELED``
* option:
``FYAJL_ALLOW_COMMENTS``,
``FYAJL_ALLOW_MULTIPLE_DOCUMENTS``,
``FYAJL_ALLOW_PARTIAL_DOCUMENT``,
``FYAJL_ALLOW_TRAILING_GARBAGE``,
``FYAJL_DONT_VALIDATE_STRINGS``
Prerequisites
-------------
The ``yajl_fort`` module uses YAJL version 2.0 or later. The source code for
this library can be downloaded from https://github.com/lloyd/yajl/releases.
The library is also available as a standard binary package in all major Linux
distributions. See http://lloyd.github.io/yajl/ for additional information.
Parser callback functions
=========================
JSON overview
-------------
The JSON data language is quite simple. It is built on two basic data
structures. An *array* is an ordered list of comma-separated *values*
enclosed in brackets (``[`` and ``]``). An *object* is an unordered list of
comma-separated *name* ``:`` *value* pairs enclosed in braces (``{`` and ``}``).
A *name* is a string enclosed in double quotes, and a *value* is one of
the following: a string in double quotes, a number (integer or real), a
boolean literal (``true`` or ``false``), the literal ``null``, or an *object*
or *array*. Note how the data structures can be nested. Whitespace is
insignificant except in strings. At the outermost level, what is considered
valid JSON text varies between the several standard documents, and it comes
down to a matter of agreement between the producer and consumer of the data.
Originally it was required to be an *object* or *array*, but more recently
any JSON *value* is considered valid. The YAJL library follows the latter.
See this `blog post `_
for a discussion of the issue, and http://www.json.org for a detailed
description of the JSON syntax.
The callbacks derived type
--------------------------
The C language YAJL parser operates by calling application-defined callback
functions in response to the various events encountered while parsing the
input stream. The callback functions communicate with each other through
a common, application-defined, context data struct, and a void pointer to
that data struct is passed to each of the callbacks. In this Fortran
interface, this application-defined code/data is implemented by the abstract
derived type ``fyajl_callbacks``:
.. code-block:: fortran
type, abstract :: fyajl_callbacks
contains
procedure(cb_no_args), deferred :: start_map
procedure(cb_no_args), deferred :: end_map
procedure(cb_string), deferred :: map_key
procedure(cb_no_args), deferred :: null_value
procedure(cb_logical), deferred :: logical_value
procedure(cb_integer), deferred :: integer_value
procedure(cb_double), deferred :: double_value
procedure(cb_string), deferred :: string_value
procedure(cb_no_args), deferred :: start_array
procedure(cb_no_args), deferred :: end_array
end type fyajl_callbacks
Application code extends this type, adding the desired context data
components and providing concrete implementations of the callback functions.
The required interfaces for the deferred type bound callback functions are:
.. code-block:: fortran
integer function cb_no_args(this)
class(fyajl_callbacks) :: this
integer function cb_integer(this, value)
class(fyajl_callbacks) :: this
integer(FYAJL_INTEGER_KIND), intent(in) :: value
integer function cb_double(this, value)
class(fyajl_callbacks) :: this
real(FYAJL_REAL_KIND), intent(in) :: value
integer function cb_logical(this, value)
class(fyajl_callbacks) :: this
logical, intent(in) :: value
integer function cb_string(this, value)
class(fyajl_callbacks) :: this
character(*,kind=c_char), intent(in) :: value
The return value of each function must be either of the module parameters
``FYAJL_CONTINUE_PARSING`` or ``FYAJL_TERMINATE_PARSING``. The latter return
value will cause the parser to terminate with an error. The module kind
parameters for integer and real values, ``FYAJL_INTEGER_KIND`` and
``FYAJL_REAL_KIND``, correspond to C's ``long long`` and ``double``, and are
dictated by the YAJL library. The callbacks are invoked as follows:
:``start_map``:
called when a ``{`` is parsed, marking the start of an *object*
:``end_map``:
called when a ``}`` is parsed, marking the end of an *object*.
:``start_array``:
called when a ``[`` is parsed, marking the start of an *array*.
:``end_array``:
called when a ``]`` is parsed, marking the end of an *array*.
:``map_key``:
called when the *name* of a *name* ``:`` *value* pair is parsed,
and the parsed name string is passed to the function.
:``integer_value``:
called when an integer *value* is parsed, and the value is passed
to the function.
:``double_value``:
called when a real *value* is parsed, and the value is passed to
the function.
:``string_value``:
called when a string *value* is parsed, and the value is passed
to the function.
:``logical_value``:
called when the *value* token ``true`` or ``false`` is parsed, and
the corresponding Fortran logical value is passed to the function.
:``null_value``:
called when the *value* token ``null`` is parsed.
Parsing
=======
The derived type ``fyajl_parser`` and its type bound procedures implement the
JSON parser. First, as described in the previous section, an
application-specific extension of the abstract type ``fyajl_callbacks`` must
be defined and an instance (here ``callbacks``) of that extension initialized:
.. code-block:: fortran
type, extends(fyajl_callbacks) :: my_callbacks
! context data defined here
contains
! define the deferred type bound procedures
end type
type(my_callbacks), target :: callbacks
! initialize the context data of callbacks as needed
The parser is then initialized by passing the ``callbacks`` object to its
``init`` subroutine:
.. code-block:: fortran
type(fyajl_parser) :: parser
call parser%init(callbacks)
Note that proper finalization of the parser object occurs automatically
when the object is deallocated or otherwise ceases to exist. Finalization
of the callback object is the responsibility of the application.
Parsing is carried out incrementally via repeated calls to the ``parse``
method:
.. code-block:: fortran
call parser%parse(buffer, stat)
character(kind=c_char), intent(in) :: buffer(:)
type(fyajl_status), intent(out) :: stat
Successive chunks of the JSON text are passed in the ``buffer`` array, and
the parsing status is returned in ``stat``; see `Error handling`_.
After all the JSON text has been fed to the parser, the ``parse_complete``
method must be called to parse any internally buffered JSON text that
might remain:
.. code-block:: fortran
call parser%parse_complete(stat)
type(fyajl_status), intent(out) :: stat
This is required because the parser is stream based and it needs an explicit
end-of-input signal to force it to parse content at the end of the stream that
sometimes exists. The parsing status is returned in ``stat``; see
`Error handling`_.
The function call ``parser%bytes_consumed()`` returns the number of
characters consumed from ``buffer`` in the last call to ``parse``.
Error handling
--------------
The ``parse`` and ``parse_complete`` methods return a ``type(fyajl_status)``
status value, which equals one of the following module parameters:
``FYAJL_STATUS_OK``
No error.
``FYAJL_STATUS_ERROR``
A parsing error was encountered; use ``fyajl_get_error`` to get
information about it.
``FYAJL_STATUS_CLIENT_CANCELLED``
One of the callback procedures returned ``FYAJL_TERMINATE_PARSING``.
The comparison operators ``==`` and ``/=`` are defined for
``type(fyajl_status)`` values.
Several additional functions (not type bound) are provided for error handling.
.. code-block:: fortran
fyajl_get_error(parser, verbose, buffer)
logical, intent(in) :: verbose
character(kind=c_char), intent(in) :: buffer(:)
Returns a character string describing the the error encountered by the parser.
If ``verbose`` is true, the message will include the portion of the input
stream where the error occurred together with an arrow pointing to the specific
character. The ``buffer`` array should contain the chunk of JSON text
passed in the last call to ``parse``.
.. code-block:: fortran
fyajl_status_to_string(code)
type(fyajl_status), intent(in) :: code
Returns a character string describing the specified status value ``code``.
Parsing options
---------------
The parser supports several options provided by the YAJL library. They are
set and unset using the ``set_option`` and ``unset_option`` methods after
the parser has been initialized:
.. code-block:: fortran
call parser%set_option(option)
call parser%unset_option(option)
where ``option`` is one of the following module parameters.
The default for all is unset.
``FYAJL_ALLOW_COMMENTS``
JSON does not allow for comments. Setting this option causes the
parser to ignore javascript style comments in the input stream. This
includes single-line comments that begin with ``//`` and continue
to the end of the line. This is a very useful extention to the JSON
standard, but one that is not supported by many JSON parsers.
``FYAJL_DONT_VALIDATE_STRINGS``
By default, the parser verifies that all strings are valid UTF-8. This
option disables this check, resulting in slightly faster parsing.
``FYAJL_ALLOW_TRAILING_GARBAGE``
By default, ``parse_complete`` verifies that the entire input text has
been consumed and will return an error if it finds otherwise. Setting this
option will disable this check. This can be useful when parsing an input
stream that contains more than one JSON document. In such scenarios, the
``bytes_consumed`` method is useful for identifying the trailing
portion of the input text for subsequent handling.
``FYAJL_ALLOW_MULTIPLE_DOCUMENTS``
An instance of a parser normally expects that the input stream consists
of a single JSON document. Setting this option changes that behavior and
allows an instance to parse an input stream containing multiple documents
that are separated by whitespace.
``FYAJL_ALLOW_PARTIAL_DOCUMENT``
By default, ``parse_complete`` verifies that the top level *object*
is complete; that is, the closing ``}`` has been parsed. If it finds
otherwise it returns an error. Setting this option disables this check.
Examples
========
In addition to the simple example presented below, here are some links to
genuine uses of ``yajl_fort``:
* The :ref:`json module ` included in YAJL-Fort defines
structures for in-memory representation of arbitrary JSON data, and
procedures for populating the structures with JSON data read from a file
or string using ``yajl_fort``.
* The ``parameter_list_type`` module from the `Petaca library
`_ defines a hierarchical data structure
that is very similar to JSON, but that is much better suited to Fortran use.
A subset of JSON maps naturally to this data structure, and the
`parameter_list_json
`_
module provides procedures built on ``yajl_fort`` for populating this
structure with JSON data read from a file or string. This illustrates a
major advantage of the customized callback approach, in that the callbacks
implement the grammar of this JSON subset so that syntax errors are
detected promptly during parsing.
A JSON white space stripper
---------------------------
This simple program reads JSON text from a file, strips all insignificant
white space from it, including newlines, and writes the result to standard
output. Somewhat contrived, but it serves to illustrate how to use
``yajl_fort`` in a complete program. No in-memory representation of the JSON
data is needed in this case; it is streamed to the output as it is being
parsed. The only slightly complicated aspect, requiring some context data,
is keeping track of when the ``,`` separator needs to be written.
The source for this example is in `test/strip.f90
`_
The module ``strip_cb_type`` defines the callback structure. The callback
functions merely echo their respective token to the output. However the
``*_value`` and ``map_key`` functions must first write a ``,`` if the value
follows a value in an array list, or if the key follows a key:value pair in
an object list. The hierarchical structure of JSON means that at any moment
of the parsing there may be multiple array or object lists in the process of
being parsed. To keep track for each list of whether a comma is needed or
not, we use a stack. Here we just use a fixed length logical array ``comma``
and an integer index ``top`` that points to the top of the stack. These are
the common context data shared by the callbacks. The subroutines ``push``,
``pop``, and ``write_comma`` take care of managing the stack.
.. code-block:: fortran
module strip_cb_type
use,intrinsic :: iso_fortran_env, only: output_unit
use yajl_fort
implicit none
private
type, extends(fyajl_callbacks), public :: strip_cb
integer :: top = 1
logical :: comma(99) = .false.
contains
procedure :: start_map
procedure :: end_map
procedure :: map_key
procedure :: null_value
procedure :: logical_value
procedure :: integer_value
procedure :: double_value
procedure :: string_value
procedure :: start_array
procedure :: end_array
end type
contains
subroutine push(this)
class(strip_cb), intent(inout) :: this
this%top = this%top + 1
this%comma(this%top) = .false. ! start of new list
end subroutine
subroutine pop(this)
class(strip_cb), intent(inout) :: this
this%top = this%top - 1
end subroutine
subroutine write_comma(this, next)
class(strip_cb), intent(inout) :: this
logical, intent(in) :: next
if (this%comma(this%top)) write(output_unit,'(",")',advance='no')
this%comma(this%top) = next
end subroutine
integer function null_value(this) result(stat)
class(strip_cb) :: this
call write_comma(this, next=.true.)
write(output_unit,'("null")',advance='no')
stat = FYAJL_CONTINUE_PARSING
end function
integer function logical_value(this, value) result(stat)
class(strip_cb) :: this
logical, intent(in) :: value
call write_comma(this, next=.true.)
if (value) then
write(output_unit,'("true")',advance='no')
else
write(output_unit,'("false")',advance='no')
end if
stat = FYAJL_CONTINUE_PARSING
end function
integer function integer_value(this, value) result(stat)
class(strip_cb) :: this
integer(fyajl_integer_kind), intent(in) :: value
call write_comma(this, next=.true.)
write(output_unit,'(i0)',advance='no') value
stat = FYAJL_CONTINUE_PARSING
end function
integer function double_value(this, value) result(stat)
class(strip_cb) :: this
real(fyajl_real_kind), intent(in) :: value
call write_comma(this, next=.true.)
write(output_unit,'(g0)',advance='no') value
stat = FYAJL_CONTINUE_PARSING
end function
integer function string_value(this, value) result(stat)
class(strip_cb) :: this
character(*), intent(in) :: value
call write_comma(this, next=.true.)
write(output_unit,'(3a)',advance='no') '"', value, '"'
stat = FYAJL_CONTINUE_PARSING
end function
integer function map_key(this, value) result(stat)
class(strip_cb) :: this
character(*), intent(in) :: value
call write_comma(this, next=.false.) ! no comma for next value
write(output_unit,'(3a)',advance='no') '"', value, '":'
stat = FYAJL_CONTINUE_PARSING
end function
integer function start_map(this) result(stat)
class(strip_cb) :: this
call write_comma(this, next=.true.)
write(output_unit,'("{")',advance='no')
call push(this) ! starting new list
stat = FYAJL_CONTINUE_PARSING
end function
integer function end_map(this) result(stat)
class(strip_cb) :: this
write(output_unit,'("}")',advance='no')
call pop(this) ! finished this list
stat = FYAJL_CONTINUE_PARSING
end function
integer function start_array(this) result(stat)
class(strip_cb) :: this
call write_comma(this, next=.true.)
write(output_unit,'("[")',advance='no')
call push(this) ! starting new list
stat = FYAJL_CONTINUE_PARSING
end function
integer function end_array(this) result(stat)
class(strip_cb) :: this
write(output_unit,'("]")',advance='no')
call pop(this) ! finished this list
stat = FYAJL_CONTINUE_PARSING
end function
end module
The main program opens the file specified on the command line for unformatted
stream input, and then reads and parses buffer-sized chunks until the whole
file has been read. This is a pattern most any use of ``yajl_fort`` will
follow.
.. code-block :: fortran
program strip_json
use,intrinsic :: iso_fortran_env
use yajl_fort
use strip_cb_type
implicit none
integer :: ios, lun, last_pos, curr_pos, buflen
character(64) :: arg
character(:), allocatable :: file
character :: buffer(64) ! intentionally small buffer for testing
type(strip_cb), target :: callbacks
type(fyajl_parser), target :: parser
type(fyajl_status) :: stat
!! Get the file name from the command line
if (command_argument_count() == 1) then
call get_command_argument(1, arg)
file = trim(arg)
else
call get_command(arg)
write(error_unit,'(a)') 'usage: ' // trim(arg) // ' file'
stop
end if
!! Open the file for stream input
open(newunit=lun,file=file,action='read',access='stream')
inquire(lun,pos=last_pos)
!! Initialize the parser with our callback functions
call parser%init(callbacks)
call parser%set_option(FYAJL_ALLOW_COMMENTS)
do
!! Read the next chunk of the input file
read(lun,iostat=ios) buffer
if (ios /= 0 .and. ios /= iostat_end) then
write(error_unit,'(a,i0)') 'read error: iostat=', ios
exit
end if
!! Feed the chunk to the parser and check for errors.
inquire(lun,pos=curr_pos)
buflen = curr_pos - last_pos
last_pos = curr_pos
if (buflen > 0) then
call parser%parse(buffer(:buflen), stat)
if (stat /= FYAJL_STATUS_OK) then
write(error_unit,'(a)') &
fyajl_get_error(parser, .true., buffer(:buflen))
exit
end if
end if
!! If there are no more chunks to read, tell the parser.
if (ios == iostat_end) then
call parser%complete_parse(stat)
if (stat /= FYAJL_STATUS_OK) then
write(error_unit,'(a)') &
fyajl_get_error(parser, .false., buffer(:buflen))
end if
exit
end if
end do
close(lun)
end program