First revision

2025-02-14 20:12:29 +01:00 · 2025-02-14 20:12:29 +01:00 · 9aee64322f
commit 9aee64322f
parent 748b07519a
3 changed files with 270 additions and 1 deletions
--- a/.gitignore
+++ b/.gitignore
@ -0,0 +1 @@
+.vscode
--- a/README.md
+++ b/README.md
@ -1,3 +1,29 @@
 # SBHPF-Standard

-The standard publications for the SBHPF binary format
+This repository holds the standard publications for the SBHPF binary format.
+
+As of right now, the latest version of SBHPF is **Version 1**.
+
+## What is SBHPF?
+
+**SBHPF** is an abbreviation of **Simple Binary Hierarchical Property Format**.
+
+It is essentially just a standard definition of a binary format that stores nodes in a tree-like format.
+
+A node can optionally contain properties or child nodes.
+
+The specific implementation of this depends on the version of the format. Please read the document for more information.
+
+Each standard definition outlines the format in detail, and also has some recommendations for implementing parsers.
+
+## Standards
+
+**SBHPF version 1:** First version of the format. Quite simple but at the same time pretty solid. See the [v1 standard](/SBHPF-v1.md).
+
+## Implementations
+
+I will try to create quality implementations of SBHPF in various languages when I have time.
+
+For now, there is only a basic Go implementation.
+
+- **GoSBHPF:** Simple go library implementing SBHPF. ([source](https://git.zervo.org/FLUX/GoSBHPF))
--- a/SBHPF-v1.md
+++ b/SBHPF-v1.md
@ -0,0 +1,242 @@
+# Simple Binary Hierarchical Property Format (Version 1)
+
+## **1. Overview**
+  
+This document specifies the structure of a compact and efficient **binary tree-like property format** designed for **hierarchical key-value storage**. The format prioritizes fast serialization/deserialization and minimal storage overhead.
+
+**SBHPF** stands for **Simple Binary Hierarchical Property Format**.
+
+The term "file" or "property file" in this document doesn't strictly imply a file in the typical sense of an entry in a filesystem. "file", in this context, simply refers to a byte array. This byte array can in turn be stored in a traditional file, or in any other format. We chose the term "file" to make the document easier to read.
+
+## **2. File Structure**
+  
+A binary property file consists of:
+A **fixed-length header** (2 bytes) that defines the format version and feature set.
+A **hierarchical node tree** that contains nodes, and key-value entries known as "properties".
+
+## **3. File Header**
+The file begins with a **2-byte fixed-length header**:
+
+| Offset | Size | Field Name | Description | 
+| --- | --- | --- | --- | 
+| 0 | 1 |**Version**| Format version. Always `0x01` for this version. Future versions increment this. |
+| 1 | 1 | **Feature Flags** | Reserved for future use. Always `0x00` in version 1. |
+
+After the header, the **first node** starts immediately at byte `2`.
+
+## **4. Node Structure**
+  
+Each node represents a hierarchical entity and contains **a fixed-length header, properties, and child nodes**. Note that only the node header is required for a node to be valid. Properties and child nodes are optional.
+
+A node has a structure with the following components:
+- **Node Header** - Data describing the node. Has fixed size of 9 bytes.
+- **Node Name** - (optional) Only present on named nodes. A node is considered named if the *Name Length* node header is bigger than 0. That header also defines the size of this component.
+- **Node Properties** - The beginning of node properties. How this works is described in detail later, but this is only present if the *Propert Count* node header is bigger than 0. That header, in combination with information on each property, defines the size of this component.
+- **Child Nodes** - The beginning of child nodes. This is also described more later, but this is only present if the *Child Count* node header is bigger than 0. The size of this component is defined by that header, in combination with the *Node Size* headers of the child nodes.
+
+
+### **4.1 Node Header**
+  
+  Each node begins with a **fixed 9-byte header**:
+| Offset | Size | Field Name | Description | 
+| --- | --- | --- | --- |
+| 0 | 4 | **Node Size** | Total size of this node (including header, entries, and children). | 
+| 4 | 2 | **Property Count** | Number of key-value entries (properties) within this node. | 
+| 6 | 2 | **Child Count** | Number of direct child nodes. | 
+| 8 | 1 | **Name Length** | Length of the node name field (in bytes). A value of `0` means unnamed. |
+
+These fields are unsigned integers.
+
+#### **4.1.1 Node Size**
+  
+`Node Size` specifies the total size of the node in bytes.
+
+The size is defined as: the amount of bytes from the first byte of the node header, to the last byte of the last child of the node.
+
+If the node does not have any children, this definition changes to: the last byte of the last property.
+
+And if there are no properties, the definition will simply be: the last byte of the node header (if the node is named, it will be last byte of node header + `Name Length`).
+  
+Since `Node Size` has a fixed size of `4` bytes, the theoretical max size of a node is `4,294,967,296` bytes, or `4.295` gigabytes.
+This of course, includes root nodes.
+
+This constraint might be addressed in future revisions, but storing that amount of data in a format primarily intended for configuration files does not make sense anyway.
+
+#### **4.1.2 Property Count**
+  
+`Property Count` specifies the amount of properties defined on the node.
+This, together with other parts of the node header, allows defining "fixed" sizes of variable fields in the binary structure.
+By using this approach, we eliminate the need of recursively terminating values which would contribute to inefficient parsing.
+
+Since `Property Count` has a fixed size of `2` bytes, the max value of this field is `65535`.
+This effectively means that each node can have a total of `65535` properties defined on it.
+
+#### **4.1.3 Child Count**
+  
+`Child Count` specifies the amount of child nodes belonging to the node.
+There are no explicitly defined pointers or IDs in this binary format, because these node headers are all we need to dynamically determine the location of data.
+  
+Child nodes are defined immediately after the last property on the node,
+or the end of the node name if no properties are defined,
+or the end of the node header if no name is defined.
+  
+Since `Child Count` has a fixed size of `2` bytes, the max value of this field is `65535`.
+This effectively means that each node can have up to `65535` children defined on it.
+
+#### **4.1.4 Name Length**
+  
+If `Name Length > 0`, the node name is stored immediately **after the node header**, as a UTF-8 encoded string.
+
+Since `Name Length` has a fixed size of `1` byte, the max value of this field is `255`.
+This effectively means that the name of each node, if defined, can have a maximum length of `255`.
+  
+**If unnamed**, this section is omitted, and the next section (properties) starts immediately after the header.
+
+### **4.3 Properties**
+  
+*A.k.a key-value entries.*
+  
+After the node header (and optional node name), the node contains **properties**. Each property follows this format:
+  
+| Offset | Size | Field Name | Description | 
+| --- | --- | --- | --- |
+| 0 | 1 | **Key Length** | Length of the key string (in bytes). |
+| 2 | 1 | **Value Type** | Specifies the type of the value (see Section 5). | 
+| 3 | X | **Key** | UTF-8 string of `Key Length` bytes. | 
+| X+3 | Y | **Value** | The value itself, determined by `Value Type`. |
+
+The number of properties on a node is given by `Property Count` in the **Node Header**.
+
+The **total size of each property is variable**, depending on `Key Length` and `Value Type`.
+
+This can also be described as "the first two bytes are fixed, and are used to determine the length of the rest of the property".
+
+#### **4.3.1 Key Length**
+  
+Defines the length of the `Key` that follows immediately after `Value Type` at offset `2`.
+
+Since `Key Length` has a fixed size of `1` byte, the max value of this field is `255`. This effectively means that each
+
+### **4.4 Child Nodes**
+
+After all properties, the node contains **child nodes**.
+
+The number of children is given by `Child Count` in the **Node Header**.
+
+Each child node follows **immediately** after the previous node’s last property.
+
+Each child node is stored in the same format as a regular node (header → name → properties → children).
+
+This **immediate tree structure** allows rapid traversal.
+
+## **5. Value Types**
+  
+Each property has a **1-byte type specifier** before the value. Supported types:
+
+| Type ID | Type Name | Size | Description | 
+| `0x01` | `int8` | 1 byte | 8-bit signed integer |
+| `0x02` | `uint8` | 1 byte | 8-bit unsigned integer |
+| `0x03` | `int16` | 2 bytes | 16-bit signed integer |
+| `0x04` | `uint16` | 2 bytes | 16-bit unsigned integer |
+| `0x05` | `int32` | 4 bytes | 32-bit signed integer | 
+| `0x06` | `uint32` | 4 bytes | 32-bit unsigned integer |
+| `0x07` | `int64` | 8 bytes | 64-bit signed integer | 
+| `0x08` | `uint64` | 8 bytes | 64-bit unsigned integer | 
+| `0x09` | `float32` | 4 bytes | 32-bit IEEE 754 single precision float | 
+| `0x0A` | `float64` | 8 bytes | 64-bit IEEE 754 single precision float |
+| `0x0B` | `bool` | 1 byte | `0x00` = false, `0x01` = true | 
+| `0x0C` | `string` | Variable | UTF-8 encoded string (preceded by a 2-byte length prefix) |
+
+The structure and size of a property is determined by the type.
+
+Some types (strings) are more complex because of their variable size, and dedicate `2` bytes to specify the size of the string.
+This is called the length prefix. Because this prefix is fixed, the maximum size of the value of a string property is `65535` bytes.
+
+## **6. Example Structure (Hex Representation)**
+  
+Here’s an example of a configuration file storing a **named root node** with a **nameless child node** and some properties:
+
+```
+01 00                  	; Header: Version 1, Feature Flags 0
+10 00 00 00            	; Node Size (57 bytes)
+02 00                  	; Property Count = 2
+01 00                  	; Child Count = 1
+06                     	; Node Name Length = 6 ("config")
+63 6F 6E 66 69 67                   ; Node Name "config" (UTF-8 encoded)
+05 0B 73 65 74 75 70 01             ; Property 1: Key "setup" (bool = true)
+04 0C 70 61 74 68 04 00 2F 75 73 72 ; Property 2: Key "path" (string = "/usr")
+10 00 00 00				; Node Size (20 bytes)
+01 00					; Property Count = 1
+00 00					; Child Count = 0
+00						; Node Name Length = 0 (nameless node)
+05 02 6C 65 76 65 6C 03 00 00 00	; Property 1: Key "level" (uint32 = 3)
+```
+
+This example demonstrates:
+
+- **Named root node with name "config"**
+- **Two properties** (`setup = true`, `path = "/usr"`)
+- **One nameless child node with one property** (`level = 3`)
+
+### 6.1 Understanding the structure
+
+The rest of this document contains enough information to understand the data structure shown above.
+However, it might be a bit confusing if you are unfamiliar with concepts such as hexadecimal and binary data storage.
+While you do not need to understand any of this to use libraries implementing this standard, it is still good practice to "know your tools".
+And it might even be interesting, who knows.
+
+To help explain the structure above, let's look at the most complex entry: the "path" string.
+
+Here is the "path" string property from the example:
+`04 0C 70 61 74 68 04 00 2F 75 73 72`
+
+Let's break it apart to make it easier to understand:
+`04` `0C` `70 61 74 68` `04 00` `2F 75 73 72`
+
+Now let's explain each part of the property:
+
+- `04` - **Key Length:** In this case, a key length of 4. Because the key "path" is 4 bytes. *All* properties have a key length defined.
+- `0C` - **Value Type:** Controls the type of the property. Some types also have unique formatting. In this case, we specify a type of 12 (hex 0C) which corresponds to a string value.
+- `70 61 74 68` - **Key:** This is the UTF-8 key. This key has a size of 4 bytes, as specified by the *Key Length*. It has a UTF-8 value of "path". Everything in the property after the key is unique to the value type.
+- `04 00` - **String Length Prefix:** This is unique to the string value type. It is fixed to a size of 2 bytes and specifies the size of the actual string value. In this case 4 bytes.
+- `2F 75 73 72` - **String value:** The actual string. In this case "/usr". This value has a size of 4 bytes, as specified by the *String Length Prefix*.
+
+This might seem very complex just to define a string, but from a technical perspective it is actually really simple and pretty efficient.
+Implementing a similar structure as the example above in JSON would take around 100 bytes of storage, while this format only takes up 57 bytes.
+Parsing this format is also a lot more efficient compared to JSON.
+
+There are many improvements that can be made to this format in terms of parsing efficiency, by for example padding values to multiples of 2.
+But the version defined here (version 1) serves as a good foundation that can be built upon in the future.
+
+## **8. Future Extensions (Versions 2+)**
+
+This format is designed for future expansion. Examples of improvements that could be made in future versions are:
+- **Extended Feature Flags:** byte 2 of the file header can enable new features.
+- **Variable Node Size Fields:** to allow nodes >4GB, if needed in later versions.
+- **Compression Flags:** optional zlib or LZ4 support for even smaller storage.
+- **Optimized Field Sizes:** byte-aligning parts of the structure with padding to allow for logic-optimized parser implementations.
+
+And probably much more...
+
+But! This serves as a pretty solid foundation to build upon. And is a fun experiment.
+The presence of the version flag at the start of a file also enables backwards compatibility.
+
+## **9. Conclusion**
+This **SBHPF Version 1** specification ensures:
+
+- **Fast, direct access** to nodes with fixed headers and easily calculated offsets.
+- **Minimal parsing complexity** thanks to the absence of unnecessary variable-size fields.
+- **Hierarchical, structured key-value storage** with efficient traversal.
+- **Future-proofing** via a version + feature flags system.
+
+This probably sounds like "cheesy tech talk".
+In reality this is just a functional and pretty efficient binary format that was created because "why not".
+
+## **10. Document revisions**
+
+Additional revisions might be made to this version standard definition at any time.
+These revisions should never be breaking changes, as that should be defined in a new version of the format.
+These revisions are specifically intended for correcting grammar mistakes and improving readability and clearity.
+
+- **2025-02-14** - First publication.
+