First revision

This commit is contained in:
zervo 2025-02-14 20:12:29 +01:00
parent 748b07519a
commit 9aee64322f
3 changed files with 270 additions and 1 deletions

1
.gitignore vendored Normal file
View file

@ -0,0 +1 @@
.vscode

View file

@ -1,3 +1,29 @@
# SBHPF-Standard
The standard publications for the SBHPF binary format
This repository holds the standard publications for the SBHPF binary format.
As of right now, the latest version of SBHPF is **Version 1**.
## What is SBHPF?
**SBHPF** is an abbreviation of **Simple Binary Hierarchical Property Format**.
It is essentially just a standard definition of a binary format that stores nodes in a tree-like format.
A node can optionally contain properties or child nodes.
The specific implementation of this depends on the version of the format. Please read the document for more information.
Each standard definition outlines the format in detail, and also has some recommendations for implementing parsers.
## Standards
**SBHPF version 1:** First version of the format. Quite simple but at the same time pretty solid. See the [v1 standard](/SBHPF-v1.md).
## Implementations
I will try to create quality implementations of SBHPF in various languages when I have time.
For now, there is only a basic Go implementation.
- **GoSBHPF:** Simple go library implementing SBHPF. ([source](https://git.zervo.org/FLUX/GoSBHPF))

242
SBHPF-v1.md Normal file
View file

@ -0,0 +1,242 @@
# Simple Binary Hierarchical Property Format (Version 1)
## **1. Overview**
This document specifies the structure of a compact and efficient **binary tree-like property format** designed for **hierarchical key-value storage**. The format prioritizes fast serialization/deserialization and minimal storage overhead.
**SBHPF** stands for **Simple Binary Hierarchical Property Format**.
The term "file" or "property file" in this document doesn't strictly imply a file in the typical sense of an entry in a filesystem. "file", in this context, simply refers to a byte array. This byte array can in turn be stored in a traditional file, or in any other format. We chose the term "file" to make the document easier to read.
## **2. File Structure**
A binary property file consists of:
A **fixed-length header** (2 bytes) that defines the format version and feature set.
A **hierarchical node tree** that contains nodes, and key-value entries known as "properties".
## **3. File Header**
The file begins with a **2-byte fixed-length header**:
| Offset | Size | Field Name | Description |
| --- | --- | --- | --- |
| 0 | 1 |**Version**| Format version. Always `0x01` for this version. Future versions increment this. |
| 1 | 1 | **Feature Flags** | Reserved for future use. Always `0x00` in version 1. |
After the header, the **first node** starts immediately at byte `2`.
## **4. Node Structure**
Each node represents a hierarchical entity and contains **a fixed-length header, properties, and child nodes**. Note that only the node header is required for a node to be valid. Properties and child nodes are optional.
A node has a structure with the following components:
- **Node Header** - Data describing the node. Has fixed size of 9 bytes.
- **Node Name** - (optional) Only present on named nodes. A node is considered named if the *Name Length* node header is bigger than 0. That header also defines the size of this component.
- **Node Properties** - The beginning of node properties. How this works is described in detail later, but this is only present if the *Propert Count* node header is bigger than 0. That header, in combination with information on each property, defines the size of this component.
- **Child Nodes** - The beginning of child nodes. This is also described more later, but this is only present if the *Child Count* node header is bigger than 0. The size of this component is defined by that header, in combination with the *Node Size* headers of the child nodes.
### **4.1 Node Header**
Each node begins with a **fixed 9-byte header**:
| Offset | Size | Field Name | Description |
| --- | --- | --- | --- |
| 0 | 4 | **Node Size** | Total size of this node (including header, entries, and children). |
| 4 | 2 | **Property Count** | Number of key-value entries (properties) within this node. |
| 6 | 2 | **Child Count** | Number of direct child nodes. |
| 8 | 1 | **Name Length** | Length of the node name field (in bytes). A value of `0` means unnamed. |
These fields are unsigned integers.
#### **4.1.1 Node Size**
`Node Size` specifies the total size of the node in bytes.
The size is defined as: the amount of bytes from the first byte of the node header, to the last byte of the last child of the node.
If the node does not have any children, this definition changes to: the last byte of the last property.
And if there are no properties, the definition will simply be: the last byte of the node header (if the node is named, it will be last byte of node header + `Name Length`).
Since `Node Size` has a fixed size of `4` bytes, the theoretical max size of a node is `4,294,967,296` bytes, or `4.295` gigabytes.
This of course, includes root nodes.
This constraint might be addressed in future revisions, but storing that amount of data in a format primarily intended for configuration files does not make sense anyway.
#### **4.1.2 Property Count**
`Property Count` specifies the amount of properties defined on the node.
This, together with other parts of the node header, allows defining "fixed" sizes of variable fields in the binary structure.
By using this approach, we eliminate the need of recursively terminating values which would contribute to inefficient parsing.
Since `Property Count` has a fixed size of `2` bytes, the max value of this field is `65535`.
This effectively means that each node can have a total of `65535` properties defined on it.
#### **4.1.3 Child Count**
`Child Count` specifies the amount of child nodes belonging to the node.
There are no explicitly defined pointers or IDs in this binary format, because these node headers are all we need to dynamically determine the location of data.
Child nodes are defined immediately after the last property on the node,
or the end of the node name if no properties are defined,
or the end of the node header if no name is defined.
Since `Child Count` has a fixed size of `2` bytes, the max value of this field is `65535`.
This effectively means that each node can have up to `65535` children defined on it.
#### **4.1.4 Name Length**
If `Name Length > 0`, the node name is stored immediately **after the node header**, as a UTF-8 encoded string.
Since `Name Length` has a fixed size of `1` byte, the max value of this field is `255`.
This effectively means that the name of each node, if defined, can have a maximum length of `255`.
**If unnamed**, this section is omitted, and the next section (properties) starts immediately after the header.
### **4.3 Properties**
*A.k.a key-value entries.*
After the node header (and optional node name), the node contains **properties**. Each property follows this format:
| Offset | Size | Field Name | Description |
| --- | --- | --- | --- |
| 0 | 1 | **Key Length** | Length of the key string (in bytes). |
| 2 | 1 | **Value Type** | Specifies the type of the value (see Section 5). |
| 3 | X | **Key** | UTF-8 string of `Key Length` bytes. |
| X+3 | Y | **Value** | The value itself, determined by `Value Type`. |
The number of properties on a node is given by `Property Count` in the **Node Header**.
The **total size of each property is variable**, depending on `Key Length` and `Value Type`.
This can also be described as "the first two bytes are fixed, and are used to determine the length of the rest of the property".
#### **4.3.1 Key Length**
Defines the length of the `Key` that follows immediately after `Value Type` at offset `2`.
Since `Key Length` has a fixed size of `1` byte, the max value of this field is `255`. This effectively means that each
### **4.4 Child Nodes**
After all properties, the node contains **child nodes**.
The number of children is given by `Child Count` in the **Node Header**.
Each child node follows **immediately** after the previous nodes last property.
Each child node is stored in the same format as a regular node (header → name → properties → children).
This **immediate tree structure** allows rapid traversal.
## **5. Value Types**
Each property has a **1-byte type specifier** before the value. Supported types:
| Type ID | Type Name | Size | Description |
| `0x01` | `int8` | 1 byte | 8-bit signed integer |
| `0x02` | `uint8` | 1 byte | 8-bit unsigned integer |
| `0x03` | `int16` | 2 bytes | 16-bit signed integer |
| `0x04` | `uint16` | 2 bytes | 16-bit unsigned integer |
| `0x05` | `int32` | 4 bytes | 32-bit signed integer |
| `0x06` | `uint32` | 4 bytes | 32-bit unsigned integer |
| `0x07` | `int64` | 8 bytes | 64-bit signed integer |
| `0x08` | `uint64` | 8 bytes | 64-bit unsigned integer |
| `0x09` | `float32` | 4 bytes | 32-bit IEEE 754 single precision float |
| `0x0A` | `float64` | 8 bytes | 64-bit IEEE 754 single precision float |
| `0x0B` | `bool` | 1 byte | `0x00` = false, `0x01` = true |
| `0x0C` | `string` | Variable | UTF-8 encoded string (preceded by a 2-byte length prefix) |
The structure and size of a property is determined by the type.
Some types (strings) are more complex because of their variable size, and dedicate `2` bytes to specify the size of the string.
This is called the length prefix. Because this prefix is fixed, the maximum size of the value of a string property is `65535` bytes.
## **6. Example Structure (Hex Representation)**
Heres an example of a configuration file storing a **named root node** with a **nameless child node** and some properties:
```
01 00 ; Header: Version 1, Feature Flags 0
10 00 00 00 ; Node Size (57 bytes)
02 00 ; Property Count = 2
01 00 ; Child Count = 1
06 ; Node Name Length = 6 ("config")
63 6F 6E 66 69 67 ; Node Name "config" (UTF-8 encoded)
05 0B 73 65 74 75 70 01 ; Property 1: Key "setup" (bool = true)
04 0C 70 61 74 68 04 00 2F 75 73 72 ; Property 2: Key "path" (string = "/usr")
10 00 00 00 ; Node Size (20 bytes)
01 00 ; Property Count = 1
00 00 ; Child Count = 0
00 ; Node Name Length = 0 (nameless node)
05 02 6C 65 76 65 6C 03 00 00 00 ; Property 1: Key "level" (uint32 = 3)
```
This example demonstrates:
- **Named root node with name "config"**
- **Two properties** (`setup = true`, `path = "/usr"`)
- **One nameless child node with one property** (`level = 3`)
### 6.1 Understanding the structure
The rest of this document contains enough information to understand the data structure shown above.
However, it might be a bit confusing if you are unfamiliar with concepts such as hexadecimal and binary data storage.
While you do not need to understand any of this to use libraries implementing this standard, it is still good practice to "know your tools".
And it might even be interesting, who knows.
To help explain the structure above, let's look at the most complex entry: the "path" string.
Here is the "path" string property from the example:
`04 0C 70 61 74 68 04 00 2F 75 73 72`
Let's break it apart to make it easier to understand:
`04` `0C` `70 61 74 68` `04 00` `2F 75 73 72`
Now let's explain each part of the property:
- `04` - **Key Length:** In this case, a key length of 4. Because the key "path" is 4 bytes. *All* properties have a key length defined.
- `0C` - **Value Type:** Controls the type of the property. Some types also have unique formatting. In this case, we specify a type of 12 (hex 0C) which corresponds to a string value.
- `70 61 74 68` - **Key:** This is the UTF-8 key. This key has a size of 4 bytes, as specified by the *Key Length*. It has a UTF-8 value of "path". Everything in the property after the key is unique to the value type.
- `04 00` - **String Length Prefix:** This is unique to the string value type. It is fixed to a size of 2 bytes and specifies the size of the actual string value. In this case 4 bytes.
- `2F 75 73 72` - **String value:** The actual string. In this case "/usr". This value has a size of 4 bytes, as specified by the *String Length Prefix*.
This might seem very complex just to define a string, but from a technical perspective it is actually really simple and pretty efficient.
Implementing a similar structure as the example above in JSON would take around 100 bytes of storage, while this format only takes up 57 bytes.
Parsing this format is also a lot more efficient compared to JSON.
There are many improvements that can be made to this format in terms of parsing efficiency, by for example padding values to multiples of 2.
But the version defined here (version 1) serves as a good foundation that can be built upon in the future.
## **8. Future Extensions (Versions 2+)**
This format is designed for future expansion. Examples of improvements that could be made in future versions are:
- **Extended Feature Flags:** byte 2 of the file header can enable new features.
- **Variable Node Size Fields:** to allow nodes >4GB, if needed in later versions.
- **Compression Flags:** optional zlib or LZ4 support for even smaller storage.
- **Optimized Field Sizes:** byte-aligning parts of the structure with padding to allow for logic-optimized parser implementations.
And probably much more...
But! This serves as a pretty solid foundation to build upon. And is a fun experiment.
The presence of the version flag at the start of a file also enables backwards compatibility.
## **9. Conclusion**
This **SBHPF Version 1** specification ensures:
- **Fast, direct access** to nodes with fixed headers and easily calculated offsets.
- **Minimal parsing complexity** thanks to the absence of unnecessary variable-size fields.
- **Hierarchical, structured key-value storage** with efficient traversal.
- **Future-proofing** via a version + feature flags system.
This probably sounds like "cheesy tech talk".
In reality this is just a functional and pretty efficient binary format that was created because "why not".
## **10. Document revisions**
Additional revisions might be made to this version standard definition at any time.
These revisions should never be breaking changes, as that should be defined in a new version of the format.
These revisions are specifically intended for correcting grammar mistakes and improving readability and clearity.
- **2025-02-14** - First publication.