diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..600d2d3 --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +.vscode \ No newline at end of file diff --git a/README.md b/README.md index 09e7b32..807ada8 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,29 @@ # SBHPF-Standard -The standard publications for the SBHPF binary format \ No newline at end of file +This repository holds the standard publications for the SBHPF binary format. + +As of right now, the latest version of SBHPF is **Version 1**. + +## What is SBHPF? + +**SBHPF** is an abbreviation of **Simple Binary Hierarchical Property Format**. + +It is essentially just a standard definition of a binary format that stores nodes in a tree-like format. + +A node can optionally contain properties or child nodes. + +The specific implementation of this depends on the version of the format. Please read the document for more information. + +Each standard definition outlines the format in detail, and also has some recommendations for implementing parsers. + +## Standards + +**SBHPF version 1:** First version of the format. Quite simple but at the same time pretty solid. See the [v1 standard](/SBHPF-v1.md). + +## Implementations + +I will try to create quality implementations of SBHPF in various languages when I have time. + +For now, there is only a basic Go implementation. + +- **GoSBHPF:** Simple go library implementing SBHPF. ([source](https://git.zervo.org/FLUX/GoSBHPF)) \ No newline at end of file diff --git a/SBHPF-v1.md b/SBHPF-v1.md new file mode 100644 index 0000000..d2df8d6 --- /dev/null +++ b/SBHPF-v1.md @@ -0,0 +1,242 @@ +# Simple Binary Hierarchical Property Format (Version 1) + +## **1. Overview** + +This document specifies the structure of a compact and efficient **binary tree-like property format** designed for **hierarchical key-value storage**. The format prioritizes fast serialization/deserialization and minimal storage overhead. + +**SBHPF** stands for **Simple Binary Hierarchical Property Format**. + +The term "file" or "property file" in this document doesn't strictly imply a file in the typical sense of an entry in a filesystem. "file", in this context, simply refers to a byte array. This byte array can in turn be stored in a traditional file, or in any other format. We chose the term "file" to make the document easier to read. + +## **2. File Structure** + +A binary property file consists of: +A **fixed-length header** (2 bytes) that defines the format version and feature set. +A **hierarchical node tree** that contains nodes, and key-value entries known as "properties". + +## **3. File Header** +The file begins with a **2-byte fixed-length header**: + +| Offset | Size | Field Name | Description | +| --- | --- | --- | --- | +| 0 | 1 |**Version**| Format version. Always `0x01` for this version. Future versions increment this. | +| 1 | 1 | **Feature Flags** | Reserved for future use. Always `0x00` in version 1. | + +After the header, the **first node** starts immediately at byte `2`. + +## **4. Node Structure** + +Each node represents a hierarchical entity and contains **a fixed-length header, properties, and child nodes**. Note that only the node header is required for a node to be valid. Properties and child nodes are optional. + +A node has a structure with the following components: +- **Node Header** - Data describing the node. Has fixed size of 9 bytes. +- **Node Name** - (optional) Only present on named nodes. A node is considered named if the *Name Length* node header is bigger than 0. That header also defines the size of this component. +- **Node Properties** - The beginning of node properties. How this works is described in detail later, but this is only present if the *Propert Count* node header is bigger than 0. That header, in combination with information on each property, defines the size of this component. +- **Child Nodes** - The beginning of child nodes. This is also described more later, but this is only present if the *Child Count* node header is bigger than 0. The size of this component is defined by that header, in combination with the *Node Size* headers of the child nodes. + + +### **4.1 Node Header** + + Each node begins with a **fixed 9-byte header**: +| Offset | Size | Field Name | Description | +| --- | --- | --- | --- | +| 0 | 4 | **Node Size** | Total size of this node (including header, entries, and children). | +| 4 | 2 | **Property Count** | Number of key-value entries (properties) within this node. | +| 6 | 2 | **Child Count** | Number of direct child nodes. | +| 8 | 1 | **Name Length** | Length of the node name field (in bytes). A value of `0` means unnamed. | + +These fields are unsigned integers. + +#### **4.1.1 Node Size** + +`Node Size` specifies the total size of the node in bytes. + +The size is defined as: the amount of bytes from the first byte of the node header, to the last byte of the last child of the node. + +If the node does not have any children, this definition changes to: the last byte of the last property. + +And if there are no properties, the definition will simply be: the last byte of the node header (if the node is named, it will be last byte of node header + `Name Length`). + +Since `Node Size` has a fixed size of `4` bytes, the theoretical max size of a node is `4,294,967,296` bytes, or `4.295` gigabytes. +This of course, includes root nodes. + +This constraint might be addressed in future revisions, but storing that amount of data in a format primarily intended for configuration files does not make sense anyway. + +#### **4.1.2 Property Count** + +`Property Count` specifies the amount of properties defined on the node. +This, together with other parts of the node header, allows defining "fixed" sizes of variable fields in the binary structure. +By using this approach, we eliminate the need of recursively terminating values which would contribute to inefficient parsing. + +Since `Property Count` has a fixed size of `2` bytes, the max value of this field is `65535`. +This effectively means that each node can have a total of `65535` properties defined on it. + +#### **4.1.3 Child Count** + +`Child Count` specifies the amount of child nodes belonging to the node. +There are no explicitly defined pointers or IDs in this binary format, because these node headers are all we need to dynamically determine the location of data. + +Child nodes are defined immediately after the last property on the node, +or the end of the node name if no properties are defined, +or the end of the node header if no name is defined. + +Since `Child Count` has a fixed size of `2` bytes, the max value of this field is `65535`. +This effectively means that each node can have up to `65535` children defined on it. + +#### **4.1.4 Name Length** + +If `Name Length > 0`, the node name is stored immediately **after the node header**, as a UTF-8 encoded string. + +Since `Name Length` has a fixed size of `1` byte, the max value of this field is `255`. +This effectively means that the name of each node, if defined, can have a maximum length of `255`. + +**If unnamed**, this section is omitted, and the next section (properties) starts immediately after the header. + +### **4.3 Properties** + +*A.k.a key-value entries.* + +After the node header (and optional node name), the node contains **properties**. Each property follows this format: + +| Offset | Size | Field Name | Description | +| --- | --- | --- | --- | +| 0 | 1 | **Key Length** | Length of the key string (in bytes). | +| 2 | 1 | **Value Type** | Specifies the type of the value (see Section 5). | +| 3 | X | **Key** | UTF-8 string of `Key Length` bytes. | +| X+3 | Y | **Value** | The value itself, determined by `Value Type`. | + +The number of properties on a node is given by `Property Count` in the **Node Header**. + +The **total size of each property is variable**, depending on `Key Length` and `Value Type`. + +This can also be described as "the first two bytes are fixed, and are used to determine the length of the rest of the property". + +#### **4.3.1 Key Length** + +Defines the length of the `Key` that follows immediately after `Value Type` at offset `2`. + +Since `Key Length` has a fixed size of `1` byte, the max value of this field is `255`. This effectively means that each + +### **4.4 Child Nodes** + +After all properties, the node contains **child nodes**. + +The number of children is given by `Child Count` in the **Node Header**. + +Each child node follows **immediately** after the previous node’s last property. + +Each child node is stored in the same format as a regular node (header → name → properties → children). + +This **immediate tree structure** allows rapid traversal. + +## **5. Value Types** + +Each property has a **1-byte type specifier** before the value. Supported types: + +| Type ID | Type Name | Size | Description | +| `0x01` | `int8` | 1 byte | 8-bit signed integer | +| `0x02` | `uint8` | 1 byte | 8-bit unsigned integer | +| `0x03` | `int16` | 2 bytes | 16-bit signed integer | +| `0x04` | `uint16` | 2 bytes | 16-bit unsigned integer | +| `0x05` | `int32` | 4 bytes | 32-bit signed integer | +| `0x06` | `uint32` | 4 bytes | 32-bit unsigned integer | +| `0x07` | `int64` | 8 bytes | 64-bit signed integer | +| `0x08` | `uint64` | 8 bytes | 64-bit unsigned integer | +| `0x09` | `float32` | 4 bytes | 32-bit IEEE 754 single precision float | +| `0x0A` | `float64` | 8 bytes | 64-bit IEEE 754 single precision float | +| `0x0B` | `bool` | 1 byte | `0x00` = false, `0x01` = true | +| `0x0C` | `string` | Variable | UTF-8 encoded string (preceded by a 2-byte length prefix) | + +The structure and size of a property is determined by the type. + +Some types (strings) are more complex because of their variable size, and dedicate `2` bytes to specify the size of the string. +This is called the length prefix. Because this prefix is fixed, the maximum size of the value of a string property is `65535` bytes. + +## **6. Example Structure (Hex Representation)** + +Here’s an example of a configuration file storing a **named root node** with a **nameless child node** and some properties: + +``` +01 00 ; Header: Version 1, Feature Flags 0 +10 00 00 00 ; Node Size (57 bytes) +02 00 ; Property Count = 2 +01 00 ; Child Count = 1 +06 ; Node Name Length = 6 ("config") +63 6F 6E 66 69 67 ; Node Name "config" (UTF-8 encoded) +05 0B 73 65 74 75 70 01 ; Property 1: Key "setup" (bool = true) +04 0C 70 61 74 68 04 00 2F 75 73 72 ; Property 2: Key "path" (string = "/usr") +10 00 00 00 ; Node Size (20 bytes) +01 00 ; Property Count = 1 +00 00 ; Child Count = 0 +00 ; Node Name Length = 0 (nameless node) +05 02 6C 65 76 65 6C 03 00 00 00 ; Property 1: Key "level" (uint32 = 3) +``` + +This example demonstrates: + +- **Named root node with name "config"** +- **Two properties** (`setup = true`, `path = "/usr"`) +- **One nameless child node with one property** (`level = 3`) + +### 6.1 Understanding the structure + +The rest of this document contains enough information to understand the data structure shown above. +However, it might be a bit confusing if you are unfamiliar with concepts such as hexadecimal and binary data storage. +While you do not need to understand any of this to use libraries implementing this standard, it is still good practice to "know your tools". +And it might even be interesting, who knows. + +To help explain the structure above, let's look at the most complex entry: the "path" string. + +Here is the "path" string property from the example: +`04 0C 70 61 74 68 04 00 2F 75 73 72` + +Let's break it apart to make it easier to understand: +`04` `0C` `70 61 74 68` `04 00` `2F 75 73 72` + +Now let's explain each part of the property: + +- `04` - **Key Length:** In this case, a key length of 4. Because the key "path" is 4 bytes. *All* properties have a key length defined. +- `0C` - **Value Type:** Controls the type of the property. Some types also have unique formatting. In this case, we specify a type of 12 (hex 0C) which corresponds to a string value. +- `70 61 74 68` - **Key:** This is the UTF-8 key. This key has a size of 4 bytes, as specified by the *Key Length*. It has a UTF-8 value of "path". Everything in the property after the key is unique to the value type. +- `04 00` - **String Length Prefix:** This is unique to the string value type. It is fixed to a size of 2 bytes and specifies the size of the actual string value. In this case 4 bytes. +- `2F 75 73 72` - **String value:** The actual string. In this case "/usr". This value has a size of 4 bytes, as specified by the *String Length Prefix*. + +This might seem very complex just to define a string, but from a technical perspective it is actually really simple and pretty efficient. +Implementing a similar structure as the example above in JSON would take around 100 bytes of storage, while this format only takes up 57 bytes. +Parsing this format is also a lot more efficient compared to JSON. + +There are many improvements that can be made to this format in terms of parsing efficiency, by for example padding values to multiples of 2. +But the version defined here (version 1) serves as a good foundation that can be built upon in the future. + +## **8. Future Extensions (Versions 2+)** + +This format is designed for future expansion. Examples of improvements that could be made in future versions are: +- **Extended Feature Flags:** byte 2 of the file header can enable new features. +- **Variable Node Size Fields:** to allow nodes >4GB, if needed in later versions. +- **Compression Flags:** optional zlib or LZ4 support for even smaller storage. +- **Optimized Field Sizes:** byte-aligning parts of the structure with padding to allow for logic-optimized parser implementations. + +And probably much more... + +But! This serves as a pretty solid foundation to build upon. And is a fun experiment. +The presence of the version flag at the start of a file also enables backwards compatibility. + +## **9. Conclusion** +This **SBHPF Version 1** specification ensures: + +- **Fast, direct access** to nodes with fixed headers and easily calculated offsets. +- **Minimal parsing complexity** thanks to the absence of unnecessary variable-size fields. +- **Hierarchical, structured key-value storage** with efficient traversal. +- **Future-proofing** via a version + feature flags system. + +This probably sounds like "cheesy tech talk". +In reality this is just a functional and pretty efficient binary format that was created because "why not". + +## **10. Document revisions** + +Additional revisions might be made to this version standard definition at any time. +These revisions should never be breaking changes, as that should be defined in a new version of the format. +These revisions are specifically intended for correcting grammar mistakes and improving readability and clearity. + +- **2025-02-14** - First publication. +