JSON Data (SQL Server)

[Origin]: https://docs.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server

JSON is a popular textual data format used for exchanging data in modern web and mobile applications. JSON is also used for storing unstructured data in log files or NoSQL databases like Microsoft Azure DocumentDB. Many REST web services return results formatted as JSON text or accept data formatted as JSON. For example, most Azure services such as Azure Search, Azure Storage, and Azure DocumentDb have REST endpoints that return or consume JSON. JSON is also the main format for exchanging data between web pages and web servers using AJAX calls.

Here’s an example of JSON text:

[{
    "name": "John",
    "skills": ["SQL", "C#", "Azure"]
}, {
    "name": "Jane",
    "surname": "Doe"
}]

SQL Server provides built-in functions and operators that let you do the following things with JSON text.

  • Parse JSON text and read or modify values.
  • Transform arrays of JSON objects into table format.
  • Run any Transact-SQL query on the converted JSON objects.
  • Format the results of Transact-SQL queries in JSON format.

    Overview of built-in JSON support

Key JSON capabilities of SQL Server

Here’s more info about the key capabilities that SQL Server provides with its built-in JSON support.

Extract values from JSON text and use them in queries

If you have JSON text that’s stored in database tables, you can use built-in functions to read or modify values in the JSON text.

  • Use the JSON_VALUE function to extract a scalar value from a JSON string.
  • Use JSON_QUERY to extract an object or an array from a JSON string.
  • Use the ISJSON function to test whether a string contains valid JSON.
  • Use the JSON_MODIFY function to change a value in a JSON string.

Example

In the following example, the query uses both relational and JSON data (stored in the jsonCol column) from a table:

SELECT Name,Surname,
 JSON_VALUE(jsonCol,'$.info.address.PostCode') AS PostCode,
 JSON_VALUE(jsonCol,'$.info.address."Address Line 1"')+' '
  +JSON_VALUE(jsonCol,'$.info.address."Address Line 2"') AS Address,
 JSON_QUERY(jsonCol,'$.info.skills') AS Skills
FROM PeopleCollection
WHERE ISJSON(jsonCol)>0
 AND JSON_VALUE(jsonCol,'$.info.address.Town')='Belgrade'
 AND Status='Active'
ORDER BY JSON_VALUE(jsonCol,'$.info.address.PostCode')

Applications and tools see no difference between the values taken from scalar table columns and the values taken from JSON columns. You can use values from JSON text in any part of a Transact-SQL query (including WHERE, ORDER BY, or GROUP BY clauses, window aggregates, and so on). JSON functions use JavaScript-like syntax for referencing values inside JSON text.

For more info, see Validate, Query, and Change JSON Data with Built-in Functions (SQL Server), JSON_VALUE (Transact-SQL), and JSON_QUERY (Transact-SQL).

Change JSON values

If you have to modify parts of JSON text, you can use the JSON_MODIFY function to update the value of a property in a JSON string and return the updated JSON string. The following example updates the value of a property in a variable that contains JSON.

DECLARE @jsonInfo NVARCHAR(MAX)

SET @jsonInfo=JSON_MODIFY(@jsonInfo,'$.info.address[0].town','London') 

Convert JSON collections to a rowset

You don’t need a custom query language to query JSON in SQL Server. To query JSON data, you can use standard T-SQL. If you have to create a query or report on JSON data, you can easily convert JSON data to rows and columns by calling the OPENJSON rowset function. For more info, see Convert JSON Data to Rows and Columns with OPENJSON (SQL Server).

The following example calls OPENJSON and transforms the array of objects stored in the @json variable to a rowset that can be queried with a standard SQL SELECT statement:

DECLARE @json NVARCHAR(MAX)
SET @json =  
N'[  
       { "id" : 2,"info": { "name": "John", "surname": "Smith" }, "age": 25 },  
       { "id" : 5,"info": { "name": "Jane", "surname": "Smith" }, "dob": "2005-11-04T12:00:00" }  
 ]'  

SELECT *  
FROM OPENJSON(@json)  
  WITH (id int 'strict $.id',  
        firstName nvarchar(50) '$.info.name', lastName nvarchar(50) '$.info.surname',  
        age int, dateOfBirth datetime2 '$.dob')  

Results

id firstName lastName age dateOfBirth
2 John Smith 25
5 Jane Smith 2005-11-04T12:00:00

OPENJSON transforms the array of JSON objects into a table in which each object is represented as one row, and key/value pairs are returned as cells. The output observes the following rules.

  • OPENJSON converts JSON values to the types specified in the WITH clause.
  • OPENJSON can handle both flat key/value pairs and nested, hierarchically organized objects.
  • You don’t have to return all the fields contained in the JSON text.
  • OPENJSON returns NULL values if JSON values don’t exist.
  • You can optionally specify a path after the type specification to reference a nested property or to reference a property by a different name.
  • The optional strict prefix in the path specifies that values for the specified properties must exist in the JSON text.

For more info, see Convert JSON Data to Rows and Columns with OPENJSON (SQL Server) and OPENJSON (Transact-SQL).

Convert SQL Server data to JSON or export JSON

Format SQL Server data or the results of SQL queries as JSON by adding the FOR JSON clause to a SELECT statement. Use FOR JSON to delegate the formatting of JSON output from your client applications to SQL Server. For more info, see Format Query Results as JSON with FOR JSON (SQL Server).

The following example uses PATH mode with the FOR JSON clause.

SELECT id, firstName AS "info.name", lastName AS "info.surname", age, dateOfBirth as dob  
FROM People  
FOR JSON PATH  

The FOR JSON clause formats SQL results as JSON text that can be provided to any app that understands JSON. The PATH option uses dot-separated aliases in the SELECT clause to nest objects in the query results.

Results

[{
    "id": 2,
    "info": {
        "name": "John",
        "surname": "Smith"
    },
    "age": 25
}, {
    "id": 5,
    "info": {
        "name": "Jane",
        "surname": "Smith"
    },
    "dob": "2005-11-04T12:00:00"
}] 

For more info, see Format Query Results as JSON with FOR JSON (SQL Server) and FOR Clause (Transact-SQL).

Combine relational and JSON data

SQL Server provides a hybrid model for storing and processing both relational and JSON data using standard Transact-SQL language. You can organize collections of your JSON documents in tables, establish relationships between them, combine strongly-typed scalar columns stored in tables with flexible key/value pairs stored in JSON columns, and query both scalar and JSON values in one or more tables using full Transact-SQL.

JSON text is typically stored in varchar or nvarchar columns and is indexed as plain text. Any SQL Server feature or component that supports text supports JSON, so there are almost no constraints on interaction between JSON and other SQL Server features. You can store JSON in In-memory or Temporal tables, you can apply Row-Level Security predicates on JSON text, and so on.

If you have pure JSON workloads where you want to use a query language that’s customized for the processing of JSON documents, consider Microsoft Azure DocumentDB.

Here are some use cases that show how you can use the built-in JSON support in SQL Server.

Return data from a SQL Server table formatted as JSON

If you have a web service that takes data from the database layer and returns it in JSON format, or JavaScript frameworks or libraries that accept data formatted as JSON, you can format JSON output directly in a SQL query. Instead of writing code or including a library to convert tabular query results and then serialize objects to JSON format, you can use FOR JSON to delegate the JSON formatting to SQL Server.

For example, you might want to generate JSON output that’s compliant with the OData specification. The web service expects a request and response in the following format.

  • Request: /Northwind/Northwind.svc/Products(1)?$select=ProductID,ProductName
  • Response: {"@odata.context":"http://services.odata.org/V4/Northwind/Northwind.svc/$metadata#Products(ProductID,ProductName)/$entity","ProductID":1,"ProductName":"Chai"}

    This OData URL represents a request for the ProductID and ProductName columns for the product with id 1. You can use FOR JSON to format the output as expected in SQL Server.

SELECT 'http://services.odata.org/V4/Northwind/Northwind.svc/$metadata#Products(ProductID,ProductName)/$entity'
 AS '@odata.context',   
 ProductID, Name as ProductName   
FROM Production.Product  
WHERE ProductID = 1  
FOR JSON AUTO  

The output of this query is JSON text that’s fully compliant with OData spec. Formatting and escaping are handled by SQL Server. SQL Server can also format query results in any format such as OData JSON or GeoJSON – for more info, see Returning spatial data in GeoJSON format.

Analyze JSON data with SQL queries

If you have to filter or aggregate JSON data for reporting purposes, you can use OPENJSON to transform JSON to relational format. Then use standard Transact-SQL and built-in functions to prepare the reports.

SELECT Tab.Id, SalesOrderJsonData.Customer, SalesOrderJsonData.Date  
FROM   SalesOrderRecord AS Tab  
          CROSS APPLY  
     OPENJSON (Tab.json, N'$.Orders.OrdersArray')  
           WITH (  
              Number   varchar(200) N'$.Order.Number',   
              Date     datetime     N'$.Order.Date',  
              Customer varchar(200) N'$.AccountNumber',   
              Quantity int          N'$.Item.Quantity'  
           )  
  AS SalesOrderJsonData  
WHERE JSON_VALUE(Tab.json, '$.Status') = N'Closed'  
ORDER BY JSON_VALUE(Tab.json, '$.Group'), Tab.DateModified  

Both standard table columns and values from JSON text can be used in the same query. You can add indexes on the JSON_VALUE(Tab.json, '$.Status') expression to improve performance of query. For more info, see Index JSON data.

Import JSON data into SQL Server tables

If you have to load JSON data from an external service into SQL Server, you can use OPENJSON to import the data into SQL Server instead of parsing the data in the application layer.

DECLARE @jsonVariable NVARCHAR(MAX)

SET @jsonVariable = N'[  
        {  
          "Order": {  
            "Number":"SO43659",  
            "Date":"2011-05-31T00:00:00"  
          },  
          "AccountNumber":"AW29825",  
          "Item": {  
            "Price":2024.9940,  
            "Quantity":1  
          }  
        },  
        {  
          "Order": {  
            "Number":"SO43661",  
            "Date":"2011-06-01T00:00:00"  
          },  
          "AccountNumber":"AW73565",  
          "Item": {  
            "Price":2024.9940,  
            "Quantity":3  
          }  
       }  
  ]'

INSERT INTO SalesReport  
SELECT SalesOrderJsonData.*  
FROM OPENJSON (@jsonVariable, N'$.Orders.OrdersArray')  
           WITH (  
              Number   varchar(200) N'$.Order.Number',   
              Date     datetime     N'$.Order.Date',  
              Customer varchar(200) N'$.AccountNumber',   
              Quantity int          N'$.Item.Quantity'  
           )  
  AS SalesOrderJsonData;

The content of the JSON variable can be provided by an external REST service, sent as a parameter from a client-side JavaScript framework, or loaded from external files. You can easily insert, update or merge results from JSON text into a SQL Server table. For more info about this scenario, see the following blog posts.

Load JSON files into SQL Server

Information stored in files can be formatted as standard JSON or Line-Delimited JSON. SQL Server can import the contents of JSON files, parse it using the OPENJSON or JSON_VALUE functions, and load it into tables.

  • If your JSON documents are stored in local files, on shared network drives, or in Azure File Storage locations that can be accessed by SQL Server, you can use bulk import to load your JSON data into SQL Server. For more info about this scenario, see Importing JSON files into SQL Server using OPENROWSET (BULK).
  • If your line-delimited JSON files are stored in Azure Blob Storage or the Hadoop file system, you can use Polybase to load JSON text, parse it in Transact-SQL code, and load it into tables.

Test drive built-in JSON support

Test drive built-in JSON support with the AdventureWorks sample database. To get the AdventureWorks sample database, download at least the database file and the samples and scripts file from here. After you restore the sample database to an instance of SQL Server 2016, unzip the samples file and open the “JSON Sample Queries procedures views and indexes.sql” file from the JSON folder. Run the scripts in this file to reformat some existing data as JSON data, run sample queries and reports over the JSON data, index the JSON data, and import and export JSON.

Here’s what you can do with the scripts included in the file.

  1. Denormalize the existing schema to create columns of JSON data.
    1. Store information from SalesReasons, SalesOrderDetails, SalesPerson, Customer, and other tables that contain information related to sales order into JSON columns in the SalesOrder_json table.
    2. Store information from EmailAddresses/PersonPhone tables into the Person_json table as arrays of JSON objects.
  2. Create procedures and views that query JSON data.
  3. Index JSON data – create indexes on JSON properties and full-text indexes.
  4. Import and export JSON – create and run procedures that export the content of the Person and the SalesOrder tables as JSON results, and import and update the Person and the SalesOrder tables using JSON input.
  5. Run query examples – run some queries that call the stored procedures and views created in steps 2 and 4.
  6. Clean up scripts – don’t run this part if you want to keep the stored procedures and views created in steps 2 and 4.

Learn more about built-in JSON support

Topics in this section

Format Query Results as JSON with FOR JSON (SQL Server)
Use the FOR JSON clause to delegate the formatting of JSON output from your client applications to SQL Server.

Convert JSON Data to Rows and Columns with OPENJSON (SQL Server)
Use OPENJSON to import JSON data into SQL Server, or to convert JSON data into relational format for an app or service that can’t currently consume JSON directly, such as SQL Server Integration Services.

Validate, Query, and Change JSON Data with Built-in Functions (SQL Server)
Use these built-in functions to validate JSON text and to extract a scalar value, an object, or an array.

JSON Path Expressions (SQL Server)
Use a path expression to specify the JSON text that you want to use.

Index JSON data
Use computed columns to create collation-aware indexes over properties in JSON documents.

Solve common issues with JSON in SQL Server
Find answers to some common questions about the built-in JSON support in SQL Server.

Microsoft blog posts

Reference topics

Consuming JSON Strings in SQL Server

[Origin]: https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/

Consuming JSON Strings in SQL Server

It has always seemed strange to Phil that SQL Server has such complete support for XML, yet is completely devoid of any support for JSON. In the end, he was forced, by a website project, into doing something about it. The result is this article, an iconoclastic romp around the representation of hierarchical structures, and some code to get you started.

Updated 2nd May 2013
Updated 8th  May 2014

“The best thing about XML is what it shares with JSON, being human readable. That turns out to be important, not because people should be reading it, because we shouldn’t, but because it avoids interoperability problems caused by fussy binary encoding issues.

 

Beyond that, there is not much to like. It is not very good as a data format. And it is not very good as a document format. If it were a good document format, then wikis would use it.”

Doug Crockford March 2010

This article describes a TSQL JSON parser and its evil twin, a JSON outputter, and provides the source. It is also designed to illustrate a number of string manipulation techniques in TSQL. With it you can do things like this to extract the data from a JSON document:

And get:

1176-JSON1.jpg

…or you can do the round trip:

To get:

Background

TSQL isn’t really designed for doing complex string parsing, particularly where strings represent nested data structures such as XML, JSON, YAML, or XHTML.

You can do it but it is not a pretty sight; but why would you ever want to do it anyway? Surely, if anything was meant for the ‘application layer’ in C# or VB.net, then this is it. ‘Oh yes’, will chime in the application thought police, ‘this is far better done in the application or with a CLR.’ Not necessarily.

Sometimes, you just need to do something inappropriate in TSQL.

There are a whole lot of reasons why this might happen to you. It could be that your DBA doesn’t allow a CLR, for example, or you lack the necessary skills with procedural code. Sometimes, there isn’t any application, or you want to run code unobtrusively across databases or servers.

I needed to interpret or ‘shred’ JSON data. JSON is one of the most popular lightweight markup languages, and is probably the best choice for transfer of object data from a web page. It is, in fact, executable JavaScript that is very quick to code in the browser in order to dump the contents of a JavaScript object, and is lightning-fast to populate the browser object from the database since you are passing it executable code (you need to parse it first for security reasons – passing executable code around is potentially very risky). AJAX can use JSON rather than XML so you have an opportunity to have a much simpler route for data between database and browser, with less opportunity for error.

The conventional way of dealing with data like this is to let a separate business layer parse a JSON ‘document’ into some tree structure and then update the database by making a series of calls to it. This is fine, but can get more complicated if you need to ensure that the updates to the database are wrapped into one transaction so that if anything goes wrong, then the whole operation can be rolled back. This is why a CLR or TSQL approach has advantages.

“Sometimes, you just
need to do something
inappropriate in TSQL…”

I wrote the parser as a prototype because it was the quickest way to determine what was involved in the process, so I could then re-write something as a CLR in a .NET language.  It takes a JSON string and produces a result in the form of an adjacency list representation of that hierarchy. In the end, the code did what I wanted with adequate performance (It reads a json file of  540 name\value pairs and creates the SQL  hierarchy table  in 4 seconds) so I didn’t bother with the added complexity of maintaining a CLR routine. In order to test more thoroughly what I’d done, I wrote a JSON generator that used the same Adjacency list, so you can now import and export data via JSON!

These markup languages such as JSON and XML all represent object data as hierarchies. Although it looks very different to the entity-relational model, it isn’t. It is rather more a different perspective on the same model. The first trick is to represent it as a Adjacency list hierarchy in a table, and then use the contents of this table to update the database. This Adjacency list is really the Database equivalent of any of the nested data structures that are used for the interchange of serialized information with the application, and can be used to create XML, OSX Property lists, Python nested structures or YAML as easily as JSON.

Adjacency list tables have the same structure whatever the data in them. This means that you can define a single Table-Valued  Type and pass data structures around between stored procedures. However, they are best held at arms-length from the data, since they are not relational tables, but something more like the dreaded EAV (Entity-Attribute-Value) tables. Converting the data from its Hierarchical table form will be different for each application, but is easy with a CTE. You can, alternatively, convert the hierarchical table into XML and interrogate that with XQuery.

JSON format.

JSON is designed to be as lightweight as possible and so it has only two structures. The first, delimited by curly brackets, is a collection of name/value pairs, separated by commas. The name is followed by a colon. This structure is generally implemented in the application-level as an object, record, struct, dictionary, hash table, keyed list, or associative array. The other structure is an ordered list of values, separated by commas. This is usually manifested as an array, vector, list, or sequence.

“Using recursion in TSQL is
like Sumo Wrestlers doing Ballet.
It is possible but not pretty.”

The first snag for TSQL is that the curly or square brackets are not ‘escaped’ within a string, so that there is no way of shredding a JSON ‘document’ simply. It is difficult to  differentiate a bracket used as the delimiter of an array or structure, and one that is within a string. Also, interpreting a string into a SQL String isn’t entirely straightforward since hex codes can be embedded anywhere to represent complex Unicode characters, and all the old C-style escaped characters are used. The second complication is that, unlike YAML, the datatypes of values can’t be explicitly declared. You have to sniff them out from applying the rules from the JSON Specification.

Obviously, structures can be embedded in structures, so recursion is a natural way of making life easy. Using recursion in TSQL is like Sumo Wrestlers doing Ballet. It is possible but not pretty.

The implementation

Although the code for the JSON Parser/Shredder will run in SQL Server 2005, and even in SQL Server 2000 (with some modifications required), I couldn’t resist using a TVP (Table Valued Parameter) to pass a hierarchical table to the function, ToJSON, that produces a JSON ‘document’. Writing a SQL Server 2005 version should not be too hard.

First the function replaces all strings with tokens of the form @Stringxx, where xx is the foreign key of the table variable where the strings are held. This takes them, and their potentially difficult embedded brackets, out of the way. Names are  always strings in JSON as well as  string values.

Then, the routine iteratively finds the next structure that has no structure contained within it, (and is, by definition the leaf structure), and parses it, replacing it with an object token of the form ‘@Objectxxx‘, or ‘@arrayxxx‘, where xxx is the object id assigned to it. The values, or name/value pairs are retrieved from the string table and stored in the hierarchy table. Gradually, the JSON document is eaten until there is just a single root object left.

The JSON outputter is a great deal simpler, since one can be surer of the input, but essentially it does the reverse process, working from the root to the leaves. The only complication is working out the indent of the formatted output string.

In the implementation, you’ll see a fairly heavy use of PATINDEX. This uses a poor man’s RegEx, a starving man’s RegEx. However, it is all we have, and can be pressed into service by chopping the string it is searching (if only it had an optional third parameter like CHARINDEX that specified the index of the start position of the search!). The STUFF function is also a godsend for this sort of string-manipulation work.

So once we have a hierarchy, we can pass it to a stored procedure. As the output is an adjacency list, it should be easy to access the data. You might find it handy to create a table type if you are using SQL Server 2008. Here is what I use. (Note that if you drop a Table Valued Parameter type, you will have to drop any dependent functions or procedures first, and re-create them afterwards).

ToJSON. A function that creates JSON Documents

Firstly, we need a simple utility function:

And now, the function that takes a JSON Hierarchy table and converts it to a JSON string.

ToXML. A function that creates XML

The function that converts a hierarchy  table to XML gives us a JSON to XML converter. It is surprisingly similar to the previous function

This provides you the means of converting a JSON string into XML

This gives the result…

Wrap-up

The so-called ‘impedence-mismatch’ between applications and databases is, I reckon, an illusion. The object-oriented nested data-structures that we receive from applications are, if the developer has understood the data correctly,  merely a perspective from a particular entity of the relationships it is involved with. Whereas it is easy to shred XML documents to get the data from it to update the database, it has been trickier with other formats such as JSON. By using techniques like this, it should be possible to liberate the application, or website, programmer from having to do the mapping from the object model to the relational, and spraying the database with ad-hoc TSQL  that uses the base tables or updateable views.  If the database can be provided with the JSON, or the Table-Valued parameter, then there is a better chance of  maintaining full transactional integrity for the more complex updates.

The database developer already has the tools to do the work with XML, but why not the simpler, and more practical JSON? I hope these two routines get you started with experimenting with this.

Interesting JSON-related articles and sites

Since writing this article, Phil has also developed a CSV parser and output and an XML parser (Producing JSON Documents from SQL Server queries via TSQL)

How can I send the “&” (ampersand) character via AJAX?

[Origin]: http://stackoverflow.com/questions/11294107/how-can-i-send-the-ampersand-character-via-ajax

I want to send a few variables and a string with POST method from JavaScript. I get the string from the database, and then send it to a PHP page. I am using XMLHttpRequest object. The problem is that the string contains the character “&” few times, and $_POST array in PHP sees it like multiple keys. I tried replacing the “&” with “\&” with replace() function, but doesn’t seem to do anything. Can anyone help?

The javascript code and the string looks like this:

var wysiwyg = dijit.byId("wysiwyg").get("value");
var wysiwyg_clean = wysiwyg.replace('&','\&');

var poststr = "act=save";
poststr+="&titlu="+frm.value.titlu;
poststr+="&sectiune="+frm.value.sectiune;
poststr+="&wysiwyg="+wysiwyg_clean;
poststr+="&id_text="+frm.value.id_text;     
xmlhttp.open("POST","lista_ajax.php",true);
xmlhttp.setRequestHeader("Content-type","application/x-www-form-urlencoded");
xmlhttp.send(poststr);

String is:

<span class="style2">&amp;quot;Busola&amp;quot;</span>
shareedit

You can use encodeURIComponent().

It will escape all the characters that cannot occur verbatim in URLs:

var wysiwyg_clean = encodeURIComponent(wysiwyg);

In this example, the ampersand character & will be replaced by the escape sequence %26, which is valid in URLs.

shareedit

Understanding JSON Schema

http://spacetelescope.github.io/understanding-json-schema/index.html

JSON Schema is a powerful tool for validating the structure of JSON data. However, learning to use it by reading its specification is like learning to drive a car by looking at its blueprints. You don’t need to know how an internal combustion engine fits together if all you want to do is pick up the groceries. This book, therefore, aims to be the friendly driving instructor for JSON Schema. It’s for those that want to write it and understand it, but maybe aren’t interested in building their own car—er, writing their own JSON Schema validator—just yet.

octopus

Note

This book describes JSON Schema draft 4. Earlier versions of JSON Schema are not completely compatible with the format described here.

Where to begin?

  • This book uses some novel conventions for showing schema examples and relating JSON Schema to your programming language of choice.
  • If you’re not sure what a schema is, check out What is a schema?.
  • The basics chapter should be enough to get you started with understanding the core JSON Schema Reference.
  • When you start developing large schemas with many nested and repeated sections, check out Structuring a complex schema.
  • json-schema.org has a number of resources, including the official specification and tools for working with JSON Schema from various programming languages.
  • jsonschema.net is an online application run your own JSON schemas against example documents. If you want to try things out without installing any software, it’s a very handy resource.

Contents:

There is also a print version of this document.

Back to top

© Copyright 2013-2015, Michael Droettboom, Space Telescope Science Institute.
Last updated on Sep 11, 2015.
Created using Sphinx 1.3.1.

JSON Validation with JSON Schema

http://davidwalsh.name/json-validation

It didn’t take long for JSON to become the hottest thing since Pam Anderson slowly bounced her way down the BayWatch beaches. And why shouldn’t it be? JSON is easy to understand visually, easy to parse on both the client and server sides, and is supported in just about every language except aborigine. There is however one problem I see with the way JSON is used by developers today: lack of validation. Most developers assume the JSON provide is not only error-free also in the proper format. Bad assumption. Let me show you how Kris Zyp’s JSON Schema can help you validate JSON on both the client and server sides.

Advertisement

What is JSON Schema?

JSON Schema is a standard (currently in draft) which provides a coherent schema by which to validate a JSON “item” against. Properties within the schema are defined and with another object containing their expected type. For example:

"myObj" : {
	"type" : "array",
	"properties" : {
		"id": { "type": "number" },
		"username": { "type" : "string" }
	}
}

Besides providing the required type, other properties can be defined, including:

  • items: This should be a schema or an array of schemas. When this is an object/schema and the instance value is an array, all the items in the array must conform to this schema.
  • optional: Notes if the property should be considered optional
  • requires: This indicates that if this property is present in the containing instance object, the property given by requires attribute must also be present in the containing instance object.
  • maxItems: Defines the maximum number of items in the collection

Numerous other properties are available, all of which may be found at:http://tools.ietf.org/html/draft-zyp-json-schema-03

Defining a Simple JSON Schema

Let’s say that our application requires data in the following format:

{
	users: [
		{ id: 1, username: "davidwalsh", numPosts: 404, realName: "David Walsh" },
		{ id: 2, username: "russianprince", numPosts: 12, realName: "Andrei Arshavin" }
	]
}

Right away we can see:

  • The object has a users property
  • The users property is an array
  • The users array contains objects
  • Each object has an id (number), username (string), numPosts (number), and realName (string)

With this structure in mind, we can create a simple schema to validate our expected format:

{
	"type" : "object",
	"properties" : {
		"users" : {
			"type" : "array", // remember that arrays are objects
			"items" : { // "items" represents the items within the "users" array
				"type" : "object",
				"properties" : {
					"id": { "type": "number" },
					"username": { "type" : "string" },
					"numPosts": { "type" : "number" },
					"realName": { "type" : "string", optional: true }
				}
			}
		}
	}
}

dojox.json.schema and JSON Schema – Client Side

A JSON Schema validation routine is available with dojox.json.schema. The validatemethod accepts two arguments: your JSON to validate and the schema. Let’s load the schema we created above, along with the sample JSON we created, and validate it:

// Require the json scheme module
dojo.require("dojox.json.schema");

// When resources are ready
dojo.ready(function() {

	// Load the schema
	dojo.xhrGet({
		url: 'schema.json',
		handleAs: 'json',
		load: function(schema) {

			// Now load the JSON
			dojo.xhrGet({
				url: 'users.json',
				handleAs: 'json',
				load: function(users) {

					// Now validate it!
					var result = dojox.json.schema.validate(users,schema);

					// Show the result
					console.log(result);
				}
			});
		}
	});	
});

A true valid property signals that the JSON is valid. If the result fails validation, validwill be false and the errors property will contain an array of error messages detailing why the given property did not pass validation. Here’s a sample return result with errors:

{
	errors: [
		{
			message: "is missing and not optional",
			property: "users"
		}
	]
	valid: false
}

How you handle invalid data is up to you; moving forward with invalid data could present a security risk for both your organization and the user.

CommonJS-Utils and JSON Schema – Server Side

Kris also provides a server side JSON Schema validation routine within his CommonJS Utilsproject on GitHub. I’ve installed this project using NPM for NodeJS:

npm install commonjs-utils

Within this package is a json-schema resource. The following snippet requires that resources, reads in the schema and data JSON files, and validates the data JSON against the schema:

// Require Sys and FileSystem
var sys = require('sys'), fs = require('fs');

// Require package
var validate = require('commonjs-utils/json-schema').validate;

// Load a schema by which to validate
fs.readFile('schema.json',function(err,data) {
	if(err) throw err;
	var schema = data;
	// Load data file
	fs.readFile('./users.json',function(err,data) {
		if(err) throw err;
		// Parse as JSON
		var posts = JSON.parse(data);
		// Validate
		var validation = validate(posts, schema);
		// Echo to command line
		sys.puts('The result of the validation:  ',validation.valid);
	});
});

To run this via the command line:

node server-validate.js

The server side uses the exact same schema and data as the client side, so your web application can be covered on both fronts.

Closing Thoughts on JSON Schema

JSON Schema is still a draft but I think Kris has done an outstanding job in creating the draft and coding server and client side validators. JSON validation is often overlooked and the data is wrongly assumed as correct. The resources for data validation are available — it’s up to you to use them!

An Introduction to JSON Schema

http://crypt.codemancers.com/posts/2014-02-11-An-introduction-to-json-schema/

JSON, or JavaScript Object Notation has become the most widely used serialization and transport mechanism for information across various web-services. From it’s initial conception, the format garnered swift and wide appreciation for being really simple and non-verbose.

Lets say you want to consume the following JSON object via an API:

{
  id: 3232,
  name: "Kashyap",
  email: "kashyap@example.com"
  contact: {
    id: 123,
    address1: "Shire",
    zipcode: LSQ424
  }
}

Now, let’s assume that you want to ensure that before consuming this data, email andcontact.zipcode must be present in the JSON. If that data is not present, you shouldn’t be using it. The typical way is to check for presence of those fields but this whack-a-mole quickly gets tiresome.

Similarly, lets say you are an API provider and you want to let your API users know the basic structure to which data is going to conform to, so that your API users can automatically test validity of data.

If you ever had to deal with above two problems, you should be using JSON schemas.

What’s a Schema?

A schema is defined in Wikipedia as a way to define the structure, content, and to some extent, the semantics of XML documents; which probably is the simplest way one could explain it. For every element — or node — in a document, a rule is given to which it needs to conform. Having constraints defined at this level will make it unnecessary to handle the edge cases in the application logic. This is a pretty powerful tool. This was missing from the original JSON specification but efforts were made to design one later on.

Why do we need a Schema?

If you’re familiar with HTML, the doctype declaration on the first line is a schema declaration. (Specific to HTML 4 and below.)

HTML 4 Transitional DOCTYPE declaration:

&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"&gt;

This line declares that the rest of the document conforms to the directives specified at the url http://www.w3.org/TR/html4/loose.dtd. That means, if you declare the document as strict, then the usage of any new elements like <sp></sp> will cause the page to display nothing. In other words, if you make a typo or forget to close a tag somewhere, then the page will not get rendered and your users will end up with a blank page.

At first glance, this looks like a pain — and it is, actually. That’s part of the reason why this was abandoned altogether in the newer version of HTML. However, HTML is not really a good use case for a schema. Having a well-defined schema upfront helps in validating user input at the language/protocol level than at the application’s implementation level. Let’s see how defining a schema makes it easy to handle user input errors.

JSON Schema

The JSON Schema specification is divided into three parts:

  1. JSON Schema Core: The JSON Schema Core specification is where the terminology for a schema is defined. Technically, this is simply the JSON spec with the only addition being definition of a new media type of application/schema+json. Oh! a more important contribution of this document is the$schema keyword which is used to identify the version of the schema and the location of a resource that defines a schema. This is analogous to the DOCTYPE declaration in the HTML 4.01 and other older HTML versions.The versions of the schema separate changes in the keywords and the general structure of a schema document. The resource of a schema is usually a webpage which provides a JSON object that defines a specification. Confused? Go open up the url http://www.w3.org/TR/html4/loose.dtdwhich I’m linking to here in a browser and go through the contents. This is the specification of HTML 4.01 Loose API. Tags like ENTITY, ELEMENT, ATTLIST are used to define the accepted elements, entities and attributes for a valid HTML document.Similarly, the JSON Schema Core resource URL (downloads the schema document) defines a superset of constraints.
  2. JSON Schema Validation: The JSON Schema Validation specification is the document that defines the valid ways to define validation constraints. This document also defines a set of keywords that can be used to specify validations for a JSON API. For example, keywords like multipleOf, maxLength,minLength etc. are defined in this specification. In the examples that follow, we will be using some of these keywords.
  3. JSON Hyper-Schema: This is another extension of the JSON Schema spec, where-in, the hyperlink and hypermedia-related keywords are defined. For example, consider the case of a globally available avatar (or, Gravatar). Every Gravatar is composed of three different components:
    1. A Picture ID,
    2. A Link to the picture,
    3. Details of the User (name and email ID).

    When we query the API provided by Gravatar, we get a reponse typically having this data encoded as JSON. This JSON response will not download the entire image but will have a link to the image. Let’s look at a JSON representation of a fake profile I’ve setup on Gravatar:

    {
      "entry":[{
        "id":"61443191",
        "hash":"756b5a91c931f6177e2ca3f3687298db",
        "requestHash":"756b5a91c931f6177e2ca3f3687298db",
        "profileUrl":"http:\/\/gravatar.com\/jsonguerilla",
        "preferredUsername":"jsonguerilla",
        "thumbnailUrl":"http:\/\/1.gravatar.com\/avatar\/756b5a91c931f6177e2ca3f3687298db",
        "photos":[{
          "value":"http:\/\/1.gravatar.com\/avatar\/756b5a91c931f6177e2ca3f3687298db",
          "type":"thumbnail"
        }],
        "name":{
          "givenName":"JSON",
          "familyName":"Schema",
          "formatted":"JSON Schema Blogpost"
        },
        "displayName":"jsonguerilla",
        "urls":[]
      }]
    }
    

    In this JSON response, the images are represented by hyperlinks but they are encoded as strings. Although this example is for a JSON object returned from a server, this is how traditional APIs handle input as well. This is due to the fact that JSON natively does not provide a way to handle hyperlinks; they are only Strings.

    JSON hyperschema attempts to specify a way to have a more semantic way of representing hyperlinks and images. It does this by defining keywords (as JSON properties) such as links, rel,href. Note that this specification does not try to re-define these words in general (as they are defined in HTTP protocol already) but it tries to normalize the way those keywords are used in JSON.

Drafts

The schema is still under development and the progress can be tracked by comparing the versions known as “drafts”. Currently, the schema is in the 4th version. The validation keywords can be dropped or added between versions. This article — and many more over the interwebs — refer to the 4th version of the draft.

Usage

Let’s build a basic JSON API that accepts the following data with some constraints:

  1. A post ID. This is a number and is a required parameter.
  2. Some free-form text with an attribute of body. This is a required parameter.
  3. A list of tags with an attribute of ‘tags’. Our paranoid API cannot accept more than 6 tags though. This is a required parameter.
  4. An optional list of hyperlinks with an attribute of ‘references’

Let’s face it, almost every app you might’ve ever written must’ve had some or the other constraints. We end up repeating the same verification logic everytime. Let’s see how we can simplify that.

We will be using Sinatra for building the API. This is the basic structure of our app.rb:

require 'sinatra'
require 'sinatra/json'
require 'json-schema'

post '/' do
end

The Gemfile:

gem 'sinatra'
gem 'sinatra-contrib'
gem 'json-schema'

We will be using the JSON-Schema gem for the app. Let’s look at the schema that we will define in aschema.json file:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "required": [ "id", "body", "tags" ],
  "properties": {
    "id": {
      "type": "integer"
    },

    "body": {
      "type": "string"
    },

    "tags": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "maxItems": 6
    },

    "references": {
      "type": "array",
      "items": {
        "type": "string",
        "format": "uri"
      }
    }
  }
}
  1. The properties attribute holds the main chunk of the schema definition. This is the attribute under which each of the individual API attribute is explained in the form of a schema of it’s own.
  2. The required attribute takes in a list of strings that mention which of the API parameters are required. If any of these parameters is missing from the JSON input to our app, an error will be logged and the input won’t get validated.
  3. The type keyword specifies the schema type for that particular block. So, at the first level, we say it’s an object (analogous to a Ruby Hash). For the body, tags and references, the types arestring, array and array respectively.
  4. In case an API parameter can accept an array, the items inside that array can be explained by a schema definition of their own. This is done by using an items attribute and defining how each of the item in the array should be validated.
  5. The format attribute is a built-in format for validation in the JSON Schema specification. This alleviates the pain of adding regex for validating common items like uri, ip4, ip6, email, date-time and hostname. That’s right, no more copy-pasting URI validation regexes from StackOverflow.
  6. The $schema attribute is a non-mandatory attribute that specifies the type of the schema being used. For our example, we will be using the draft#4 of the JSON Schema spec.

To use this schema in our app, we will create a helper method that uses validates the input with the schema we just defined. The json-schema gem provides three methods for validation — avalidate method that returns either true or false, a validate! that raises an exception when validation of an attribute fails and a fully_validate method that builds up an array of errors similar to what Rails’ ActiveRecord#save method provides.

We will be using the JSON::Validator.fully_validate method in our app and return a nicely formatted JSON response to the user if the validation fails.

helpers do
  def validate(json_string_or_hash)
    JSON::Validator.fully_validate('schema.json', json_string_or_hash)
  end
end

Now, we can use this helper inside routes to check the validity of the input JSON like so:

post '/' do
  input = JSON.load(request.body.read)
  errors = validate(input)

  if errors.empty?
    json({ message: "The blog post has been saved!" })
  else
    status 400
    json({ errors: a })
  end
end

If the input is valid, the errors object will be empty. Otherwise, it will hold a list of errors. This object will be returned as a JSON response with the appropriate HTTP status code. For example, if we run this app and send in a request with a missing id parameter, the response will be something similar to the following:

[
  "The property '#/' did not contain a required property of 'id' in
  schema schema.json#"
]

Let’s say if we send in a request with id having a string parameter. The errors object will hold the following:

[
  "The property '#/' did not contain a required property of 'id' in
  schema schema.json#"
]

Last example. Let’s try sending a references parameter with a malformed URI. We will send the following request:

{
  "id": 1,
  "body": "Hello, Universe",
  "tags": ["start", "first"],
  "references": [ "data:image/svg+xml;base64 C==" ]
}

(This input is in the file not_working_wrong_uri.txt)

curl \
  -d @not_working_wrong_uri.txt
  -H 'Content-Type: application/json' \
  http://localhost:4567

The output of this would be:

[
  "The property '#/references/0' must be a valid URI in schema
  schema.json#"
]

Thus, with a really simple validation library and a standard that library implementers in different languages use, we can achieve input validation with a really simple setup. One really great advantage of following a schema standard is that we can be sure about the basic implementation no matter what the language which might implment the schema. For example, we can use the sameschema.json description with a JavaScript library for validating the user input — for example, in the front-end of the API we’ve just built.

Summary

The full app, some sample input files are present in this repo. The json-schema gem is not yet official and might have some unfinished components — For example, the format validations ofhostname and email for a string type have not been implemented yet — and the JSON Schema specification itself is under constant revisions. But that doesn’t mean it’s not ready for usage. Few of our developers use the gem in one of our projects and are pretty happy with it. Try out the gem and go through the specfication to gain an idea of why this would be beneficial yourself.

More Reading

  1. Understanding JSON Schema
  2. JSON Schema Documentation
  3. This excellent article by David Walsh
  4. JSON Schema Example: This example uses more keywords that weren’t discussed in this post. For example, title and description.

Can I use comments inside a JSON file?

http://stackoverflow.com/questions/244777/can-i-use-comments-inside-a-json-file

I don’t believe you can have an actual comment. The JSON should all be data, and if you include a comment, then it will be data too.

You could have a designated data element called “_comment” (or something) that would be ignored by apps that use the json data.

You would probably be better having the comment in the processes that generate/receive the json, as they are supposed to know what the json data will be in advance, or at least the structure of it.

But if you decided to…

{
   "_comment": "comment text goes here...",
   "glossary": {
      "title": "example glossary",
      "GlossDiv": {
         "title": "S",
         "GlossList": {
            "GlossEntry": {
               "ID": "SGML",
               "SortAs": "SGML",
               "GlossTerm": "Standard Generalized Markup Language",
               "Acronym": "SGML",
               "Abbrev": "ISO 8879:1986",
               "GlossDef": {
                  "para": "A meta-markup language, used to create markup languages such as DocBook.",
                  "GlossSeeAlso": ["GML", "XML"]
               },
               "GlossSee": "markup"
            }
         }
      }
   }
}
shareedit