BSON

}
01010100
11101011
10101110
01010101
{

BSON [bee · sahn], short for Bin­ary JSON, is a bin­ary-en­coded seri­al­iz­a­tion of JSON-like doc­u­ments. Like JSON, BSON sup­ports the em­bed­ding of doc­u­ments and ar­rays with­in oth­er doc­u­ments and ar­rays. BSON also con­tains ex­ten­sions that al­low rep­res­ent­a­tion of data types that are not part of the JSON spec. For ex­ample, BSON has a Date type and a BinData type.

BSON can be com­pared to bin­ary inter­change for­mats, like Proto­col Buf­fers. BSON is more "schema-less" than Proto­col Buf­fers, which can give it an ad­vant­age in flex­ib­il­ity but also a slight dis­ad­vant­age in space ef­fi­ciency (BSON has over­head for field names with­in the seri­al­ized data).

BSON was de­signed to have the fol­low­ing three char­ac­ter­ist­ics:

  1. Lightweight

    Keep­ing spa­tial over­head to a min­im­um is im­port­ant for any data rep­res­ent­a­tion format, es­pe­cially when used over the net­work.

  2. Traversable

    BSON is de­signed to be tra­versed eas­ily. This is a vi­tal prop­erty in its role as the primary data rep­res­ent­a­tion for Mon­goDB.

  3. Efficient

    En­cod­ing data to BSON and de­cod­ing from BSON can be per­formed very quickly in most lan­guages due to the use of C data types.

Specification

Version 1.0

BSON is a binary format in which zero or more key/value pairs are stored as a single entity. We call this entity a document.

The following grammar specifies version 1.0 of the BSON standard. We've written the grammar using a pseudo-BNF syntax. Valid BSON data is represented by the document non-terminal.

Basic Types

The following basic types are used as terminals in the rest of the grammar. Each type must be serialized in little-endian format.

byte 1 byte (8-bits)
int32 4 bytes (32-bit signed integer)
int64 8 bytes (64-bit signed integer)
double 8 bytes (64-bit IEEE 754 floating point)

Non-terminals

The following specifies the rest of the BSON grammar. Note that quoted strings represent terminals, and should be interpreted with C semantics (e.g. "\x01" represents the byte 0000 0001). Also note that we use the * operator as shorthand for repetition (e.g. ("\x01"*2) is "\x01\x01"). When used as a unary operator, * means that the repetition can occur 0 or more times.

document ::= int32 e_list "\x00" BSON Document
e_list ::= element e_list Sequence of elements
| ""
element ::= "\x01" e_name double Floating point
| "\x02" e_name string UTF-8 string
| "\x03" e_name document Embedded document
| "\x04" e_name document Array
| "\x05" e_name binary Binary data
| "\x06" e_name Undefined — Deprecated
| "\x07" e_name (byte*12) ObjectId
| "\x08" e_name "\x00" Boolean "false"
| "\x08" e_name "\x01" Boolean "true"
| "\x09" e_name int64 UTC datetime
| "\x0A" e_name Null value
| "\x0B" e_name cstring cstring Regular expression
| "\x0C" e_name string (byte*12) DBPointer — Deprecated
| "\x0D" e_name string JavaScript code
| "\x0E" e_name string Symbol
| "\x0F" e_name code_w_s JavaScript code w/ scope
| "\x10" e_name int32 32-bit Integer
| "\x11" e_name int64 Timestamp
| "\x12" e_name int64 64-bit integer
| "\xFF" e_name Min key
| "\x7F" e_name Max key
e_name ::= cstring Key name
string ::= int32 (byte*) "\x00" String
cstring ::= (byte*) "\x00" CString
binary ::= int32 subtype (byte*) Binary
subtype ::= "\x00" Binary / Generic
| "\x01" Function
| "\x02" Binary (Old)
| "\x03" UUID
| "\x05" MD5
| "\x80" User defined
code_w_s ::= int32 string document Code w/ scope

Examples

The following are some example documents (in JavaScript / Python style syntax) and their corresponding BSON representations. Try mousing over them for some useful correlation.

{"hello": "world"} "\x16\x00\x00\x00\x02hello\x00
 \x06\x00\x00\x00world\x00\x00"
{"BSON": ["awesome", 5.05, 1986]} "1\x00\x00\x00\x04BSON\x00&\x00
 \x00\x00
\x020\x00\x08\x00\x00
 \x00awesome\x00
\x011\x00333333
 \x14@
\x102\x00\xc2\x07\x00\x00
 \x00\x00"

Implementations

Implementations of the BSON specification exist for many different languages / environments. Some implementations are currently embedded within MongoDB drivers, since MongoDB was the first large project to make use of BSON. Over time those libraries will be made more stand-alone, but they should be usable independently of MongoDB in their current state.

Projects Using BSON

  • MongoDB, the document-oriented database, uses BSON as both the network and on-disk representation of documents.

If you know of other BSON implementations or projects using BSON, please add them.

FAQ

What is the point of BSON when it is no smaller than JSON in many cases?

BSON is designed to be efficient in space, but in many cases is not much more efficient than JSON. In some cases BSON uses even more space than JSON. The reason for this is another of the BSON design goals: traversability. BSON adds some "extra" information to documents, like length prefixes, that make it easy and fast to traverse.

BSON is also designed to be fast to encode and decode. For example, integers are stored as 32 (or 64) bit integers, so they don't need to be parsed to and from text. This uses more space than JSON for small integers, but is much faster to parse.

Where can I get more help/infomation?

The best place to ask questions about BSON is on the BSON mailing list.

How can I contribute or make fixes to this site?

The best way to contribute to this site is to fork the project and send us a pull request.

Creative Commons — CC0