BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type.
BSON can be compared to binary interchange formats, like Protocol Buffers. BSON is more "schema-less" than Protocol Buffers, which can give it an advantage in flexibility but also a slight disadvantage in space efficiency (BSON has overhead for field names within the serialized data).
BSON was designed to have the following three characteristics:
Lightweight: Keeping spatial overhead to a minimum is important for any data representation format, especially when used over the network.
Traversable: BSON is designed to be traversed easily. This is a vital property in its role as the primary data representation for MongoDB.
Efficient: Encoding data to BSON and decoding from BSON can be performed very quickly in most languages due to the use of C data types.
BSON is a binary format in which zero or more key/value pairs are stored as a single entity. We call this entity a document.
The following grammar specifies version 1.0 of the BSON standard. We've written the grammar using a pseudo-BNF syntax. Valid BSON data is represented by the document non-terminal.
The following basic types are used as terminals in the rest of the grammar. Each type must be serialized in little-endian format.
|byte||1 byte (8 bits)|
|int32||4 bytes (32-bit signed integer)|
|int64||8 bytes (64-bit signed integer)|
|double||8 bytes (64-bit IEEE 754 floating point)|
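As a rough illustration of the four terminal types, the sketch below packs one value of each with Python's standard `struct` module. The names `BYTE`, `INT32`, `INT64` and `DOUBLE` are chosen here for illustration and are not part of the specification; the `<` prefix selects little-endian byte order with fixed standard sizes.

```python
import struct

# "<" = little-endian, standard sizes (no platform-dependent padding).
BYTE = struct.Struct("<B")    # 1 byte, treated here as an unsigned raw byte
INT32 = struct.Struct("<i")   # 4 bytes, signed
INT64 = struct.Struct("<q")   # 8 bytes, signed
DOUBLE = struct.Struct("<d")  # 8 bytes, IEEE 754 double precision

int32_bytes = INT32.pack(1)
int64_bytes = INT64.pack(-2)
double_bytes = DOUBLE.pack(1.0)

print(int32_bytes.hex())   # 01000000
print(int64_bytes.hex())   # feffffffffffffff
print(double_bytes.hex())  # 000000000000f03f
```

The least significant byte comes first in every case, which is why the int32 value 1 starts with `01`.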
The following specifies the rest of the BSON grammar. Note that quoted strings represent terminals, and should be interpreted with C semantics (e.g. "\x01" represents the byte 0000 0001). Also note that we use the * operator as shorthand for repetition (e.g. ("\x01"*2) is "\x01\x01"). When used as a unary operator, * means that the repetition can occur 0 or more times.
|document||::=||int32 e_list "\x00"||BSON Document|
|e_list||::=||element e_list||Sequence of elements|
||||""||Empty sequence|
|element||::=||"\x01" e_name double||Floating point|
||||"\x02" e_name string||UTF-8 string|
||||"\x03" e_name document||Embedded document|
||||"\x04" e_name document||Array|
||||"\x05" e_name binary||Binary data|
||||"\x06" e_name||Undefined — Deprecated|
||||"\x07" e_name (byte*12)||ObjectId|
||||"\x08" e_name "\x00"||Boolean "false"|
||||"\x08" e_name "\x01"||Boolean "true"|
||||"\x09" e_name int64||UTC datetime|
||||"\x0A" e_name||Null value|
||||"\x0B" e_name cstring cstring||Regular expression|
||||"\x0C" e_name string (byte*12)||DBPointer — Deprecated|
||||"\x0D" e_name string||JavaScript code|
||||"\x0E" e_name string||Symbol — Deprecated|
||||"\x0F" e_name code_w_s||JavaScript code w/ scope|
||||"\x10" e_name int32||32-bit Integer|
||||"\x11" e_name int64||Timestamp|
||||"\x12" e_name int64||64-bit integer|
||||"\xFF" e_name||Min key|
||||"\x7F" e_name||Max key|
|e_name||::=||cstring||Key name|
|cstring||::=||(byte*) "\x00"||CString|
|string||::=||int32 (byte*) "\x00"||String|
|binary||::=||int32 subtype (byte*)||Binary|
|subtype||::=||"\x00"||Binary / Generic|
|code_w_s||::=||int32 string document||Code w/ scope|
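To make the grammar more concrete, here is a minimal encoder sketch for a subset of the element types (double, string, embedded document, array, boolean, null, int32, int64). It is not a reference implementation. The function names are invented for this example, and it relies on three details from the published BSON specification that the table does not spell out: an e_name is a NUL-terminated cstring, the int32 in a string counts the string bytes plus the trailing "\x00", and an array is encoded as a document whose keys are the decimal indexes "0", "1", and so on.

```python
import struct

def _cstring(text: str) -> bytes:
    # Assumed definition: UTF-8 bytes followed by a single NUL terminator.
    raw = text.encode("utf-8")
    if b"\x00" in raw:
        raise ValueError("a cstring may not contain a NUL byte")
    return raw + b"\x00"

def _string(text: str) -> bytes:
    # string ::= int32 (byte*) "\x00"; the int32 counts the bytes and the NUL.
    raw = text.encode("utf-8") + b"\x00"
    return struct.pack("<i", len(raw)) + raw

def encode_element(name: str, value) -> bytes:
    key = _cstring(name)
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, float):
        return b"\x01" + key + struct.pack("<d", value)
    if isinstance(value, str):
        return b"\x02" + key + _string(value)
    if isinstance(value, dict):
        return b"\x03" + key + encode_document(value)
    if isinstance(value, list):
        # Assumption: arrays are documents keyed by "0", "1", ...
        as_document = {str(i): item for i, item in enumerate(value)}
        return b"\x04" + key + encode_document(as_document)
    if value is None:
        return b"\x0A" + key
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + key + struct.pack("<i", value)
        return b"\x12" + key + struct.pack("<q", value)
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")

def encode_document(doc: dict) -> bytes:
    # document ::= int32 e_list "\x00"; the int32 is the total size in bytes,
    # including the 4 length bytes and the trailing NUL.
    e_list = b"".join(encode_element(k, v) for k, v in doc.items())
    return struct.pack("<i", 4 + len(e_list) + 1) + e_list + b"\x00"

encoded = encode_document({"hello": "world"})
print(len(encoded))   # 22
print(encoded.hex())
```

For `{"hello": "world"}` the result is 22 bytes: a 4-byte length, one string element of 17 bytes, and the closing "\x00".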
Implementations of the BSON specification exist for many different languages / environments. Some implementations are currently embedded within MongoDB drivers, since MongoDB was the first large project to make use of BSON. Over time those libraries will be made more stand-alone, but they should be usable independently of MongoDB in their current state.
J2ME (work in progress)
Lua (pure; work in progress)
Python — with optional C extension
MongoDB, the document-oriented database, uses BSON as both the network and on-disk representation of documents.
If you know of other BSON implementations or projects using BSON, please add them.
BSON is designed to be efficient in space, but in many cases it is not much more efficient than JSON, and in some cases it uses even more space. The reason for this is another of the BSON design goals: traversability. BSON adds some "extra" information to documents, such as length prefixes, which makes them easy and fast to traverse.
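The value of the length prefixes can be sketched as follows: a reader can step from one element to the next without decoding any values, because each value is either fixed-size or starts with an int32 that gives its size. The helper names `value_size` and `iter_elements` and the hand-built `sample` bytes are illustrative assumptions, and only a subset of element types is handled. The meaning of each int32 (whole document for documents and arrays, payload plus NUL for strings, payload only for binary) follows the published BSON specification.

```python
import struct

# Sizes of fixed-width values, keyed by element type byte.
FIXED_SIZES = {0x01: 8, 0x07: 12, 0x08: 1, 0x09: 8,
               0x0A: 0, 0x10: 4, 0x11: 8, 0x12: 8}

def value_size(data: bytes, type_byte: int, pos: int) -> int:
    """Return the size in bytes of the value starting at pos."""
    if type_byte in FIXED_SIZES:
        return FIXED_SIZES[type_byte]
    (n,) = struct.unpack_from("<i", data, pos)
    if type_byte == 0x02:          # string: prefix + n bytes (n includes the NUL)
        return 4 + n
    if type_byte in (0x03, 0x04):  # document / array: n is the whole document
        return n
    if type_byte == 0x05:          # binary: prefix + subtype byte + n bytes
        return 4 + 1 + n
    raise ValueError(f"type 0x{type_byte:02x} not handled in this sketch")

def iter_elements(data: bytes, start: int = 0):
    """Yield (type_byte, name, value_offset, value_size) for each element."""
    (total,) = struct.unpack_from("<i", data, start)
    pos = start + 4
    end = start + total - 1        # offset of the document's trailing NUL
    while pos < end:
        type_byte = data[pos]
        name_end = data.index(b"\x00", pos + 1)
        name = data[pos + 1:name_end].decode("utf-8")
        value_pos = name_end + 1
        size = value_size(data, type_byte, value_pos)
        yield type_byte, name, value_pos, size
        pos = value_pos + size     # jump over the value without decoding it

# Hand-encoded {"a": {"x": "y"}, "b": 1}
sample = bytes.fromhex(
    "1d000000" "036100" "0e000000" "027800" "02000000" "7900" "00"
    "106200" "01000000" "00"
)

for type_byte, name, offset, size in iter_elements(sample):
    print(hex(type_byte), name, offset, size)
```

The embedded document under "a" is skipped in one step using its 14-byte length, so reaching "b" does not require looking at the contents of "a" at all.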
BSON is also designed to be fast to encode and decode. For example, integers are stored as 32 (or 64) bit integers, so they don't need to be parsed to and from text. This uses more space than JSON for small integers, but is much faster to parse.
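The trade-off can be checked on a one-field document. This is a small sketch with hand-assembled BSON bytes, not a benchmark; the variable names are illustrative, and the field name is assumed to be stored as a NUL-terminated cstring.

```python
import json
import struct

def bson_int32_doc(name: bytes, value: int) -> bytes:
    # document ::= int32 e_list "\x00", with a single "\x10" (int32) element.
    element = b"\x10" + name + b"\x00" + struct.pack("<i", value)
    return struct.pack("<i", 4 + len(element) + 1) + element + b"\x00"

def compact_json(doc: dict) -> bytes:
    return json.dumps(doc, separators=(",", ":")).encode("utf-8")

small_json = compact_json({"a": 1})
small_bson = bson_int32_doc(b"a", 1)
large_json = compact_json({"a": 1234567890})
large_bson = bson_int32_doc(b"a", 1234567890)

print(len(small_json), len(small_bson))  # 7 12
print(len(large_json), len(large_bson))  # 16 12

# Reading the integer back is a fixed-offset unpack, with no text parsing:
# 4 bytes of length + 1 type byte + 2 bytes for the name "a\x00" = offset 7.
(value,) = struct.unpack_from("<i", large_bson, 7)
print(value)  # 1234567890
```

The BSON size stays at 12 bytes regardless of the integer's magnitude, while the JSON text grows with the number of digits.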
The best place to ask questions about BSON is on the BSON mailing list.
The best way to contribute to this site is to fork the project and send us a pull request.