Introducing ASON, and announcing libason 0.0.1

categories: tech, libason

I never much liked SQL.

I don't think it's a terribly controversial opinion. Apart from being pretty much the only option in its space, SQL doesn't have much to recommend it. It has confusing pretensions to natural language, it expects us to nest its expressions despite being incredibly verbose, it can't seem to shake a gross affection for capital letters, every single implementation is piled high with custom features, missing features, and strange voodoo, and it's got almost an entire imperative language built into it, despite seeing most of its use from inside other imperative languages. I like relational calculus well enough (it's the sort of crunchy, smarty-pants, pretensions-to-high-mathematics kind of comp sci I've always been into), but its usual "business implementation" has never gotten along with me.

At the same time, I've always liked table-less databases. Designing and maintaining a schema is always more work than you expect, so a "just dump your dictionaries here" approach has instant appeal for the lazy and impatient, not to mention being simpler to reconcile with most modern imperative programming languages. Migrate-on-read also seems more engineer-friendly than trying to maintain migrations. I even prefer the atomicity model of some of the newer NoSQL databases like CouchDB to transactions, though I've yet to find an ORM that exposes it well.

It was my dislike of SQL and my preference for table-less storage that led me to look for an alternative to SQLite. I wanted a light, file-backed data store that could store table-less data (preferably JSON) and index it. I found tokyocabinet early on, but although its author is vague on the subject, it appears to be obsolete. kyotocabinet is the author's replacement, written in C++ and packaged for Fedora by yours truly, but it drops the table-less storage in favor of simpler key-value stores, and the code quality leaves a lot to be desired. I will probably drop the package soon if nobody else is interested.

Then I started thinking about rolling my own. It was a problem that touched a lot of areas I happen to find fun (data structures, parsing, getting information through the VM to the disk without using O_DIRECT like an idiot), and while it was almost certainly a yak shave, I kept running into this same yak, and I'd never once liked its haircut.

So, since formalism is how I personally drum up that froth of false confidence that precedes the birth of all great open source software, I set about formalizing a data model. I wanted to give JSON, which already has some popularity as a data format for table-less databases, a formal semantics, and also to extend it so that it could express queries. I wanted to be able to describe not just a JSON value, but a set of such values, and from that set a subset, or a revision of that set with an item updated or removed.

And in this way was born Algebraic Serialized Object Notation, and a lengthy, crunchy whitepaper explaining how it works. Intimidating, perhaps, but at some point in the process I must have hit on some luck, because while specifying ASON is complicated, using it is simple.

Here is an ASON object:

{ "foo": 6, "bar": 7, "baz": "Cookies 'n' Cream" }

Just like JSON. Here is a collection of three ASON objects:

{ "type": "Student", "name": "Joe Thompson", "GPA": 3.5 } |
{ "type": "Student", "name": "Jeff Foxworthy", "GPA": 3.7 } |
{ "type": "Student", "name": "Smythe Littlesmythe", "GPA": 3.5 }

Here is a query to find all students with a GPA of 3.5:

{ "type": "Student", "GPA": 3.5, * }

Here is how to apply that query to the records above:

(   { "type": "Student", "name": "Joe Thompson", "GPA": 3.5 } |
    { "type": "Student", "name": "Jeff Foxworthy", "GPA": 3.7 } |
    { "type": "Student", "name": "Smythe Littlesmythe", "GPA": 3.5 }
) & { "type": "Student", "GPA": 3.5, * }

And here are the results:

{ "type": "Student", "name": "Joe Thompson", "GPA": 3.5 } |
{ "type": "Student", "name": "Smythe Littlesmythe", "GPA": 3.5 }
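Read as sets, that result falls out in two steps. This assumes my reading of the semantics: "|" denotes union, "&" denotes intersection, a single value denotes the one-element set containing it, and $P$ denotes the set of all objects with `"type": "Student"` and `"GPA": 3.5` plus any other fields. Writing $J$, $F$, and $S$ for the three student records, intersection distributes over union, and each singleton either lies inside $P$ or misses it entirely:

```latex
\begin{aligned}
(\{J\} \cup \{F\} \cup \{S\}) \cap P
  &= (\{J\} \cap P) \cup (\{F\} \cap P) \cup (\{S\} \cap P) \\
  &= \{J\} \cup \varnothing \cup \{S\} \\
  &= \{J, S\}
\end{aligned}
```

The middle step is the only place the data matters: $J$ and $S$ carry a GPA of 3.5 and so lie in $P$, while $F$ carries 3.7 and does not.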

ASON's extensions should be apparent: we've allowed JSON to express patterns as well as single values. While my intent was to specify a query language for a table-less database, an obvious second application is using ASON much like a regex, validating JSON values the way you would validate strings.

For the database case, I use the disjunction operator, "|", to join the entries in a table. For example, the set of student values above could literally be read as "Joe Thompson or Jeff Foxworthy or Smythe Littlesmythe" if we interpreted it as a pattern rather than as a data set. The "&" operator computes "pattern intersections," which effectively returns our matches (regular expressions can be intersected as well, since regular languages are closed under intersection, though few practical syntaxes implement this).

This is all well and good, but a cute formalization is little more than cute. Without an implementation to play with, there's not much use in it.

And thus the point of this post: I would like to announce the first early alpha release of libason, version 0.0.1. Download here. This is a more-or-less complete implementation of ASON as it stands today, in the form of a very, very slow in-memory database (what do you want? it's an early alpha). It comes with a handy REPL program called asonq, which lets you play around with the syntax, and a full C API for parsing and manipulating ASON values (Python bindings to come soon... I hope). I intend for this to grow an on-disk database module, thus fulfilling my needs, and possibly to have a proper DB server built over it, much like tokyotyrant sat atop tokyocabinet. There's also more syntax to specify, mostly around updating values.

For now, I have something roughly complete, and interesting. The API documentation ships as the manpage ason_values(3), along with a few related pages that it should point you to. Hopefully I can persuade a few curious souls to play around with it. Drop your bugs at GitHub.
