First revision
parent
748b07519a
commit
9aee64322f
3 changed files with 270 additions and 1 deletions
1
.gitignore
vendored
Normal file
|
@ -0,0 +1 @@
|
|||
.vscode
|
28
README.md
|
@ -1,3 +1,29 @@
|
|||
# SBHPF-Standard
|
||||
|
||||
This repository holds the standard publications for the SBHPF binary format.
|
||||
|
||||
As of right now, the latest version of SBHPF is **Version 1**.
|
||||
|
||||
## What is SBHPF?
|
||||
|
||||
**SBHPF** is an abbreviation of **Simple Binary Hierarchical Property Format**.
|
||||
|
||||
It is essentially just a standard definition of a binary format that stores nodes in a tree-like format.
|
||||
|
||||
A node can optionally contain properties or child nodes.
|
||||
|
||||
The specific implementation of this depends on the version of the format. Please read the document for more information.
|
||||
|
||||
Each standard definition outlines the format in detail, and also has some recommendations for implementing parsers.
|
||||
|
||||
## Standards
|
||||
|
||||
**SBHPF version 1:** First version of the format. Quite simple but at the same time pretty solid. See the [v1 standard](/SBHPF-v1.md).
|
||||
|
||||
## Implementations
|
||||
|
||||
I will try to create quality implementations of SBHPF in various languages when I have time.
|
||||
|
||||
For now, there is only a basic Go implementation.
|
||||
|
||||
- **GoSBHPF:** Simple Go library implementing SBHPF. ([source](https://git.zervo.org/FLUX/GoSBHPF))
|
242
SBHPF-v1.md
Normal file
|
@ -0,0 +1,242 @@
|
|||
# Simple Binary Hierarchical Property Format (Version 1)
|
||||
|
||||
## **1. Overview**
|
||||
|
||||
This document specifies the structure of a compact and efficient **binary tree-like property format** designed for **hierarchical key-value storage**. The format prioritizes fast serialization/deserialization and minimal storage overhead.
|
||||
|
||||
**SBHPF** stands for **Simple Binary Hierarchical Property Format**.
|
||||
|
||||
The term "file" or "property file" in this document doesn't strictly imply a file in the typical sense of an entry in a filesystem. "file", in this context, simply refers to a byte array. This byte array can in turn be stored in a traditional file, or in any other format. We chose the term "file" to make the document easier to read.
|
||||
|
||||
## **2. File Structure**
|
||||
|
||||
A binary property file consists of:
|
||||
- A **fixed-length header** (2 bytes) that defines the format version and feature set.
- A **hierarchical node tree** that contains nodes, and key-value entries known as "properties".
|
||||
|
||||
## **3. File Header**
|
||||
The file begins with a **2-byte fixed-length header**:
|
||||
|
||||
| Offset | Size | Field Name | Description |
| --- | --- | --- | --- |
| 0 | 1 | **Version** | Format version. Always `0x01` for this version. Future versions increment this. |
| 1 | 1 | **Feature Flags** | Reserved for future use. Always `0x00` in version 1. |
|
||||
|
||||
After the header, the **first node** starts immediately at byte `2`.
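
As a minimal sketch, a parser could check the two header bytes like this before touching the node tree. The function name `read_file_header` is hypothetical, and rejecting non-zero feature flags is one possible policy, not something the standard mandates:

```python
import struct

SBHPF_VERSION_1 = 0x01   # value of the Version byte defined by this standard
ROOT_NODE_OFFSET = 2     # the first node starts right after the 2-byte header


def read_file_header(data: bytes) -> tuple[int, int]:
    """Return (version, feature_flags) read from the first two bytes."""
    if len(data) < ROOT_NODE_OFFSET:
        raise ValueError("input is shorter than the 2-byte file header")
    # Two single unsigned bytes, so byte order does not matter here.
    version, flags = struct.unpack_from("BB", data, 0)
    if version != SBHPF_VERSION_1:
        raise ValueError(f"unsupported format version: {version:#04x}")
    if flags != 0x00:
        # Assumption: a strict version 1 parser refuses unknown flags.
        raise ValueError(f"unexpected feature flags: {flags:#04x}")
    return version, flags
```

A caller would then continue parsing at `ROOT_NODE_OFFSET`.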
|
||||
|
||||
## **4. Node Structure**
|
||||
|
||||
Each node represents a hierarchical entity and contains **a fixed-length header, properties, and child nodes**. Note that only the node header is required for a node to be valid. Properties and child nodes are optional.
|
||||
|
||||
A node has a structure with the following components:
|
||||
- **Node Header** - Data describing the node. Has a fixed size of 9 bytes.
- **Node Name** - (optional) Only present on named nodes. A node is considered named if the *Name Length* field of the node header is greater than 0. That field also defines the size of this component.
- **Node Properties** - The node's properties. These are described in detail later, but they are only present if the *Property Count* field of the node header is greater than 0. That field, in combination with information in each property, defines the size of this component.
- **Child Nodes** - The node's child nodes. These are also described later, but they are only present if the *Child Count* field of the node header is greater than 0. The size of this component is defined by that field, in combination with the *Node Size* fields of the child nodes.
|
||||
|
||||
|
||||
### **4.1 Node Header**
|
||||
|
||||
Each node begins with a **fixed 9-byte header**:
|
||||
| Offset | Size | Field Name | Description |
| --- | --- | --- | --- |
| 0 | 4 | **Node Size** | Total size of this node (including header, entries, and children). |
| 4 | 2 | **Property Count** | Number of key-value entries (properties) within this node. |
| 6 | 2 | **Child Count** | Number of direct child nodes. |
| 8 | 1 | **Name Length** | Length of the node name field (in bytes). A value of `0` means unnamed. |
|
||||
|
||||
These fields are unsigned integers.
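
The 9-byte header maps directly onto a fixed `struct` layout. The sketch below assumes little-endian byte order, which is inferred from the hex example in Section 6 rather than stated explicitly in this document. The names `NodeHeader` and `read_node_header` are hypothetical:

```python
import struct
from typing import NamedTuple

# "<" = little-endian without padding (an assumption, see above).
# I = 4-byte unsigned, H = 2-byte unsigned, B = 1-byte unsigned.
NODE_HEADER = struct.Struct("<IHHB")


class NodeHeader(NamedTuple):
    node_size: int
    property_count: int
    child_count: int
    name_length: int


def read_node_header(data: bytes, offset: int) -> NodeHeader:
    """Decode the fixed node header that starts at `offset`."""
    return NodeHeader(*NODE_HEADER.unpack_from(data, offset))
```

`NODE_HEADER.size` evaluates to 9, matching the fixed header size given above.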
|
||||
|
||||
#### **4.1.1 Node Size**
|
||||
|
||||
`Node Size` specifies the total size of the node in bytes.
|
||||
|
||||
The size is defined as: the amount of bytes from the first byte of the node header, to the last byte of the last child of the node.
|
||||
|
||||
If the node does not have any children, this definition changes to: the last byte of the last property.
|
||||
|
||||
And if there are no properties, the definition will simply be: the last byte of the node header (if the node is named, it will be last byte of node header + `Name Length`).
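
The three cases above collapse into one sum, because missing components simply contribute zero bytes. A small sketch of that arithmetic (the function name `node_size` is hypothetical):

```python
NODE_HEADER_SIZE = 9  # fixed node header size from Section 4.1


def node_size(name_length: int, property_sizes: list[int], child_sizes: list[int]) -> int:
    """Total node size: header + name + all properties + all children."""
    return NODE_HEADER_SIZE + name_length + sum(property_sizes) + sum(child_sizes)


# A bare, unnamed node is just its header.
bare = node_size(0, [], [])
# A node named "config" (6 bytes) with no properties or children.
named = node_size(6, [], [])
# An unnamed node holding one 11-byte property.
leaf = node_size(0, [11], [])
```

Note that `child_sizes` holds the full `Node Size` of each child, so grandchildren are already included.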
|
||||
|
||||
Since `Node Size` has a fixed size of `4` bytes, the theoretical max size of a node is `4,294,967,295` bytes, or roughly `4.29` gigabytes.
This, of course, includes root nodes.
|
||||
|
||||
This constraint might be addressed in future revisions, but storing that amount of data in a format primarily intended for configuration files does not make sense anyway.
|
||||
|
||||
#### **4.1.2 Property Count**
|
||||
|
||||
`Property Count` specifies the amount of properties defined on the node.
|
||||
This, together with other parts of the node header, allows defining "fixed" sizes of variable fields in the binary structure.
|
||||
By using this approach, we eliminate the need for terminator values, which would make parsing less efficient.
|
||||
|
||||
Since `Property Count` has a fixed size of `2` bytes, the max value of this field is `65535`.
|
||||
This effectively means that each node can have a total of `65535` properties defined on it.
|
||||
|
||||
#### **4.1.3 Child Count**
|
||||
|
||||
`Child Count` specifies the amount of child nodes belonging to the node.
|
||||
There are no explicitly defined pointers or IDs in this binary format, because these node headers are all we need to dynamically determine the location of data.
|
||||
|
||||
Child nodes are defined immediately after the last property on the node,
|
||||
or the end of the node name if no properties are defined,
|
||||
or the end of the node header if no name is defined.
|
||||
|
||||
Since `Child Count` has a fixed size of `2` bytes, the max value of this field is `65535`.
|
||||
This effectively means that each node can have up to `65535` children defined on it.
|
||||
|
||||
#### **4.1.4 Name Length**
|
||||
|
||||
If `Name Length > 0`, the node name is stored immediately **after the node header**, as a UTF-8 encoded string.
|
||||
|
||||
Since `Name Length` has a fixed size of `1` byte, the max value of this field is `255`.
|
||||
This effectively means that the name of each node, if defined, can have a maximum length of `255`.
|
||||
|
||||
**If unnamed**, this section is omitted, and the next section (properties) starts immediately after the header.
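
Because `Name Length` counts bytes rather than characters, a name containing non-ASCII characters reaches the limit sooner than its character count suggests. A minimal sketch of a check an encoder might perform (`encode_node_name` is a hypothetical helper, not part of any existing SBHPF library):

```python
MAX_NAME_BYTES = 255  # largest value that fits in the 1-byte Name Length field


def encode_node_name(name: str) -> bytes:
    """Encode a node name as UTF-8 and enforce the 255-byte limit."""
    raw = name.encode("utf-8")
    if len(raw) > MAX_NAME_BYTES:
        raise ValueError(
            f"node name is {len(raw)} bytes, the limit is {MAX_NAME_BYTES}"
        )
    return raw
```

The length of the returned bytes is what goes into `Name Length`. An empty string yields zero bytes, which corresponds to an unnamed node.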
|
||||
|
||||
### **4.3 Properties**
|
||||
|
||||
*A.k.a key-value entries.*
|
||||
|
||||
After the node header (and optional node name), the node contains **properties**. Each property follows this format:
|
||||
|
||||
| Offset | Size | Field Name | Description |
| --- | --- | --- | --- |
| 0 | 1 | **Key Length** | Length of the key string (in bytes). |
| 1 | 1 | **Value Type** | Specifies the type of the value (see Section 5). |
| 2 | X | **Key** | UTF-8 string of `Key Length` bytes. |
| X+2 | Y | **Value** | The value itself, determined by `Value Type`. |
|
||||
|
||||
The number of properties on a node is given by `Property Count` in the **Node Header**.
|
||||
|
||||
The **total size of each property is variable**, depending on `Key Length` and `Value Type`.
|
||||
|
||||
This can also be described as "the first two bytes are fixed, and are used to determine the length of the rest of the property".
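
To illustrate the "first two bytes are fixed" idea, here is a small sketch that reads those two bytes and the key, and reports where the value begins. The function name `read_property_prefix` is hypothetical:

```python
def read_property_prefix(data: bytes, offset: int) -> tuple[str, int, int]:
    """Return (key, value_type, value_offset) for the property at `offset`."""
    key_length = data[offset]        # fixed byte 0: Key Length
    value_type = data[offset + 1]    # fixed byte 1: Value Type
    key_start = offset + 2
    key_end = key_start + key_length
    if key_end > len(data):
        raise ValueError("property key runs past the end of the data")
    key = data[key_start:key_end].decode("utf-8")
    # The value starts right after the key; its size depends on value_type.
    return key, value_type, key_end


prop = bytes.fromhex("05 0B 73 65 74 75 70 01")
key, value_type, value_offset = read_property_prefix(prop, 0)
```

For these bytes, `key` is `"setup"`, `value_type` is `0x0B` and the value starts at offset 7.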
|
||||
|
||||
#### **4.3.1 Key Length**
|
||||
|
||||
Defines the length, in bytes, of the `Key`, which starts at offset `2`, immediately after `Value Type`.

Since `Key Length` has a fixed size of `1` byte, the max value of this field is `255`. This effectively means that each key can have a maximum length of `255` bytes.
|
||||
|
||||
### **4.4 Child Nodes**
|
||||
|
||||
After all properties, the node contains **child nodes**.
|
||||
|
||||
The number of children is given by `Child Count` in the **Node Header**.
|
||||
|
||||
The first child node follows **immediately** after the parent node's last property, and each subsequent child follows immediately after the last byte of the previous child.
|
||||
|
||||
Each child node is stored in the same format as a regular node (header → name → properties → children).
|
||||
|
||||
This **immediate tree structure** allows rapid traversal.
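
One consequence of this layout is that siblings can be skipped without decoding them: reading a child's `Node Size` is enough to find the next one. A rough sketch, assuming little-endian integers (inferred from the example in Section 6) and a caller that already knows where the first child starts. `child_offsets` is a hypothetical name:

```python
import struct

NODE_HEADER_SIZE = 9


def child_offsets(data: bytes, first_child_offset: int, child_count: int) -> list[int]:
    """Return the start offset of each child by hopping over Node Size."""
    offsets = []
    pos = first_child_offset
    for _ in range(child_count):
        if pos + 4 > len(data):
            raise ValueError(f"no room for a Node Size field at offset {pos}")
        (node_size,) = struct.unpack_from("<I", data, pos)
        if node_size < NODE_HEADER_SIZE or pos + node_size > len(data):
            raise ValueError(f"invalid Node Size at offset {pos}")
        offsets.append(pos)
        pos += node_size  # jump straight to the next sibling
    return offsets


# Two siblings: an empty unnamed node (9 bytes) and an empty node named "a" (10 bytes).
first = bytes.fromhex("09 00 00 00 00 00 00 00 00")
second = bytes.fromhex("0A 00 00 00 00 00 00 00 01 61")
offsets = child_offsets(first + second, 0, 2)
```

Here `offsets` is `[0, 9]`, found without looking at anything but the two size fields.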
|
||||
|
||||
## **5. Value Types**
|
||||
|
||||
Each property has a **1-byte type specifier** before the value. Supported types:
|
||||
|
||||
| Type ID | Type Name | Size | Description |
| --- | --- | --- | --- |
| `0x01` | `int8` | 1 byte | 8-bit signed integer |
| `0x02` | `uint8` | 1 byte | 8-bit unsigned integer |
| `0x03` | `int16` | 2 bytes | 16-bit signed integer |
| `0x04` | `uint16` | 2 bytes | 16-bit unsigned integer |
| `0x05` | `int32` | 4 bytes | 32-bit signed integer |
| `0x06` | `uint32` | 4 bytes | 32-bit unsigned integer |
| `0x07` | `int64` | 8 bytes | 64-bit signed integer |
| `0x08` | `uint64` | 8 bytes | 64-bit unsigned integer |
| `0x09` | `float32` | 4 bytes | 32-bit IEEE 754 single precision float |
| `0x0A` | `float64` | 8 bytes | 64-bit IEEE 754 double precision float |
| `0x0B` | `bool` | 1 byte | `0x00` = false, `0x01` = true |
| `0x0C` | `string` | Variable | UTF-8 encoded string (preceded by a 2-byte length prefix) |
|
||||
|
||||
The structure and size of a property is determined by the type.
|
||||
|
||||
Some types (strings) are more complex because of their variable size, and dedicate `2` bytes to specify the size of the string.
|
||||
This is called the length prefix. Because this prefix is fixed, the maximum size of the value of a string property is `65535` bytes.
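
The type table translates naturally into a lookup from type ID to a fixed-size layout, with strings handled as the one special case. This is only a sketch: little-endian byte order is an assumption inferred from the example in Section 6, and `read_value` is a hypothetical name:

```python
import struct

# Type ID -> struct format for the fixed-size types in the table above.
FIXED_FORMATS = {
    0x01: "<b", 0x02: "<B",   # int8, uint8
    0x03: "<h", 0x04: "<H",   # int16, uint16
    0x05: "<i", 0x06: "<I",   # int32, uint32
    0x07: "<q", 0x08: "<Q",   # int64, uint64
    0x09: "<f", 0x0A: "<d",   # float32, float64
    0x0B: "<?",               # bool (struct treats any non-zero byte as True)
}
STRING_TYPE = 0x0C


def read_value(data: bytes, offset: int, value_type: int):
    """Decode one value at `offset`; return (value, offset_after_value)."""
    if value_type == STRING_TYPE:
        (length,) = struct.unpack_from("<H", data, offset)  # 2-byte length prefix
        start = offset + 2
        end = start + length
        if end > len(data):
            raise ValueError("string value runs past the end of the data")
        return data[start:end].decode("utf-8"), end
    fmt = FIXED_FORMATS.get(value_type)
    if fmt is None:
        raise ValueError(f"unknown value type: {value_type:#04x}")
    (value,) = struct.unpack_from(fmt, data, offset)
    return value, offset + struct.calcsize(fmt)
```

Returning the offset after the value lets a caller read properties back to back without tracking sizes separately.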
|
||||
|
||||
## **6. Example Structure (Hex Representation)**
|
||||
|
||||
Here’s an example of a configuration file storing a **named root node** with a **nameless child node** and some properties:
|
||||
|
||||
```
01 00                               ; Header: Version 1, Feature Flags 0
37 00 00 00                         ; Node Size (55 bytes)
02 00                               ; Property Count = 2
01 00                               ; Child Count = 1
06                                  ; Node Name Length = 6 ("config")
63 6F 6E 66 69 67                   ; Node Name "config" (UTF-8 encoded)
05 0B 73 65 74 75 70 01             ; Property 1: Key "setup" (bool = true)
04 0C 70 61 74 68 04 00 2F 75 73 72 ; Property 2: Key "path" (string = "/usr")
14 00 00 00                         ; Node Size (20 bytes)
01 00                               ; Property Count = 1
00 00                               ; Child Count = 0
00                                  ; Node Name Length = 0 (nameless node)
05 06 6C 65 76 65 6C 03 00 00 00    ; Property 1: Key "level" (uint32 = 3)
```
|
||||
|
||||
This example demonstrates:
|
||||
|
||||
- **Named root node with name "config"**
|
||||
- **Two properties** (`setup = true`, `path = "/usr"`)
|
||||
- **One nameless child node with one property** (`level = 3`)
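
To tie the sections together, here is a compact sketch of a recursive parser run against this example. It is not the GoSBHPF implementation. It assumes little-endian integers, uses `Node Size` values computed from the definition in Section 4.1.1 (55 bytes for the root, 20 for the child), and types `level` as `0x06` (`uint32`). All function names are hypothetical:

```python
import struct

NODE_HEADER = struct.Struct("<IHHB")  # Node Size, Property Count, Child Count, Name Length
FIXED_FORMATS = {0x01: "<b", 0x02: "<B", 0x03: "<h", 0x04: "<H", 0x05: "<i",
                 0x06: "<I", 0x07: "<q", 0x08: "<Q", 0x09: "<f", 0x0A: "<d", 0x0B: "<?"}
STRING_TYPE = 0x0C


def parse_node(data: bytes, offset: int):
    """Parse the node at `offset`; return (node_dict, offset_after_node)."""
    node_size, prop_count, child_count, name_len = NODE_HEADER.unpack_from(data, offset)
    pos = offset + NODE_HEADER.size
    name = data[pos:pos + name_len].decode("utf-8")
    pos += name_len
    properties = {}
    for _ in range(prop_count):
        key_len, value_type = data[pos], data[pos + 1]
        key = data[pos + 2:pos + 2 + key_len].decode("utf-8")
        pos += 2 + key_len
        if value_type == STRING_TYPE:
            (length,) = struct.unpack_from("<H", data, pos)
            value = data[pos + 2:pos + 2 + length].decode("utf-8")
            pos += 2 + length
        else:
            fmt = FIXED_FORMATS[value_type]
            (value,) = struct.unpack_from(fmt, data, pos)
            pos += struct.calcsize(fmt)
        properties[key] = value
    children = []
    for _ in range(child_count):
        child, pos = parse_node(data, pos)  # children are stored back to back
        children.append(child)
    if pos != offset + node_size:
        raise ValueError("Node Size does not match the parsed node length")
    return {"name": name, "properties": properties, "children": children}, pos


def parse_file(data: bytes) -> dict:
    """Check the version byte, then parse the root node at offset 2."""
    if data[0] != 0x01:
        raise ValueError("not an SBHPF version 1 file")
    root, _ = parse_node(data, 2)
    return root


EXAMPLE = bytes.fromhex(
    "01 00 "
    "37 00 00 00 02 00 01 00 06 63 6F 6E 66 69 67 "
    "05 0B 73 65 74 75 70 01 "
    "04 0C 70 61 74 68 04 00 2F 75 73 72 "
    "14 00 00 00 01 00 00 00 00 "
    "05 06 6C 65 76 65 6C 03 00 00 00"
)
tree = parse_file(EXAMPLE)
```

The final size check in `parse_node` is optional, but it catches files whose `Node Size` disagrees with the actual content.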
|
||||
|
||||
### 6.1 Understanding the structure
|
||||
|
||||
The rest of this document contains enough information to understand the data structure shown above.
|
||||
However, it might be a bit confusing if you are unfamiliar with concepts such as hexadecimal and binary data storage.
|
||||
While you do not need to understand any of this to use libraries implementing this standard, it is still good practice to "know your tools".
|
||||
And it might even be interesting, who knows.
|
||||
|
||||
To help explain the structure above, let's look at the most complex entry: the "path" string.
|
||||
|
||||
Here is the "path" string property from the example:
|
||||
`04 0C 70 61 74 68 04 00 2F 75 73 72`
|
||||
|
||||
Let's break it apart to make it easier to understand:
|
||||
`04` `0C` `70 61 74 68` `04 00` `2F 75 73 72`
|
||||
|
||||
Now let's explain each part of the property:
|
||||
|
||||
- `04` - **Key Length:** In this case, a key length of 4. Because the key "path" is 4 bytes. *All* properties have a key length defined.
|
||||
- `0C` - **Value Type:** Controls the type of the property. Some types also have unique formatting. In this case, we specify a type of 12 (hex 0C) which corresponds to a string value.
|
||||
- `70 61 74 68` - **Key:** This is the UTF-8 key. This key has a size of 4 bytes, as specified by the *Key Length*. It has a UTF-8 value of "path". Everything in the property after the key is unique to the value type.
|
||||
- `04 00` - **String Length Prefix:** This is unique to the string value type. It is fixed to a size of 2 bytes and specifies the size of the actual string value. In this case 4 bytes.
|
||||
- `2F 75 73 72` - **String value:** The actual string. In this case "/usr". This value has a size of 4 bytes, as specified by the *String Length Prefix*.
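
The same breakdown can be followed step by step in code. This sketch just slices the twelve bytes in the order described above, assuming the length prefix is little-endian (which is how `04 00` reads as 4):

```python
raw = bytes.fromhex("04 0C 70 61 74 68 04 00 2F 75 73 72")

key_length = raw[0]                                  # 04 -> 4
value_type = raw[1]                                  # 0C -> string
key = raw[2:2 + key_length].decode("utf-8")          # 70 61 74 68

prefix_at = 2 + key_length                           # the value starts after the key
string_length = int.from_bytes(raw[prefix_at:prefix_at + 2], "little")  # 04 00 -> 4
value_at = prefix_at + 2
value = raw[value_at:value_at + string_length].decode("utf-8")          # 2F 75 73 72
```

After running this, `key` holds `"path"` and `value` holds `"/usr"`, and the slices together cover all 12 bytes of the property.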
|
||||
|
||||
This might seem very complex just to define a string, but from a technical perspective it is actually really simple and pretty efficient.
|
||||
Implementing a similar structure as the example above in JSON would take around 100 bytes of storage, while this format only takes up 57 bytes.
|
||||
Parsing this format is also a lot more efficient compared to JSON.
|
||||
|
||||
There are many improvements that can be made to this format in terms of parsing efficiency, by for example padding values to multiples of 2.
|
||||
But the version defined here (version 1) serves as a good foundation that can be built upon in the future.
|
||||
|
||||
## **8. Future Extensions (Versions 2+)**
|
||||
|
||||
This format is designed for future expansion. Examples of improvements that could be made in future versions are:
|
||||
- **Extended Feature Flags:** the second byte of the file header (offset `1`) can enable new features.
|
||||
- **Variable Node Size Fields:** to allow nodes >4GB, if needed in later versions.
|
||||
- **Compression Flags:** optional zlib or LZ4 support for even smaller storage.
|
||||
- **Optimized Field Sizes:** byte-aligning parts of the structure with padding to allow for logic-optimized parser implementations.
|
||||
|
||||
And probably much more...
|
||||
|
||||
But! This serves as a pretty solid foundation to build upon. And is a fun experiment.
|
||||
The presence of the version field at the start of a file also enables backwards compatibility.
|
||||
|
||||
## **9. Conclusion**
|
||||
This **SBHPF Version 1** specification ensures:
|
||||
|
||||
- **Fast, direct access** to nodes with fixed headers and easily calculated offsets.
|
||||
- **Minimal parsing complexity** thanks to the absence of unnecessary variable-size fields.
|
||||
- **Hierarchical, structured key-value storage** with efficient traversal.
|
||||
- **Future-proofing** via a version + feature flags system.
|
||||
|
||||
This probably sounds like "cheesy tech talk".
|
||||
In reality this is just a functional and pretty efficient binary format that was created because "why not".
|
||||
|
||||
## **10. Document revisions**
|
||||
|
||||
Additional revisions might be made to this version standard definition at any time.
|
||||
These revisions should never be breaking changes, as that should be defined in a new version of the format.
|
||||
These revisions are specifically intended for correcting grammar mistakes and improving readability and clarity.
|
||||
|
||||
- **2025-02-14** - First publication.
|
||||
|