UniRec
2.9.3
|
UniRec is a data format for storage and transfer of simple unstructured records, i.e. sets of key-value pairs with a fixed set of keys. A record in UniRec format is similar to a C structure but it can be defined at run-time. It thus brings possibility to dynamically create structures in a statically typed language.
The main advantage of UniRec is extremely fast access to fields of a record. No parsing is needed, the fields are accessed directly from the record, almost as in a plain C struct.
In comparison with access to a struct member, just one additional memory access is needed in order to find position of the field in the record. This access is to a small table which easily fits into a CPU cache.
To create an UniRec record, a user first needs to specify a set of fields and their types - a template. Then a memory for the record is allocated and field values can be set using simple macros.
NOTE: The following text describes UniRec as used in the C or C++ language.
The example states that the types of the fields must be defined before a template can be created. If names and types of the fields are known at compile-time, they can be defined at the beginning of a *.c file as in the following example:
If the set of fields and their types is not known in advance, they may also be defined at run-time. However, access to such fields is then a little more complicated due to limitations of statically types languages (if a compiler doesn't know the type of a field, it can't create a set of instructions to read from or write into it).
An UniRec field may have one of the following types:
name | size | description |
---|---|---|
int8 | 1 | 8bit singed integer |
int16 | 2 | 16bit singed integer |
int32 | 4 | 32bit singed integer |
int64 | 8 | 64bit singed integer |
uint8 | 1 | 8bit unsigned integer |
uint16 | 2 | 16bit unsigned integer |
uint32 | 4 | 32bit unsigned integer |
uint64 | 8 | 64bit unsigned integer |
char | 1 | A single ASCII character |
float | 4 | Single precision floating point number (IEEE 754) |
double | 8 | Double precision floating point number (IEEE 754) |
ipaddr | 16 | Special type for IPv4/IPv6 addresses, see below for details |
macaddr | 6 | Special type for MAC address, see below for details |
time | 8 | Special type for precise timestamps, see below for details |
string | - | Variable-length array of (mostly) printable characters |
bytes | - | Variable-length array of bytes (not expected to be printable characters) |
Types "string" and "bytes" are the same from a machine point of view (both are of type char[] in C), the only difference is their semantics. When printing as text, "string" is usually printed directly as ASCII or UTF-8 string, "bytes" is rather interpreted as binary data and printed in hex.
A terminating null character ('\0') SHOULD NOT be included at the end of "string" values since this is specific for the C language and data in UniRec should be independent of a programming language.
Structure to store both IPv4 and IPv6 addresses and associated functions.
Structure to store MAC address and associated functions.
Structure to store timestamps and associated types, macros and function.
Name of field may be any string matching the regular expression [A-Za-z][A-Za-z0-9_]* with the following limitations:
It is RECOMMENDED that all field names are uppercase.
An UniRec record consists of field values put one after another in a specific order. There is no header. Information about the template and the size of the record must be provided by other means.
The layout of a record is given only by its template (specifying a set of fields and their types) and the following rules.
A record is divided into three sections:
Fixed-length fields in the first section are sorted by their size from largest to smallest. Fields with the same size are sorted alphabetically by their name.
The second section contains two 16bit numbers for each variable-length field - offset of the beginning of the field's data and length of the data (in bytes). The offset is counted from the beginning of the record.
The meta-information fields are sorted alphabetically by the field names.
The third section contains data of variable-length fields in an arbitrary order. The data of variable-length fields SHOULD be placed immediately one after another. There SHOULD be NO "empty spaces" between them and data of the fields SHOULD NOT overlap.
The first two sections are called the "fixed-length part" of a record, since their total size is always the same and all data are present on fixed offsets (for a given template). The last section is called "variable-length part" because its total length as well as position of individual fields may be different in each record.
The following picture shows layout of a record containing information about a HTTP connection. The template of this record contains the following fields: ipaddr SRC_IP, ipaddr DST_IP, uint16 SRC_PORT, uint16 DST_PORT, uint8 PROTOCOL, uint8 TCP_FLAGS, uint32 PACKETS, uint32 BYTES, uint16 HTTP_RSP_CODE, string HTTP_URL, string HTTP_USER_AGENT
All values, except IP and MAC addresses, are in little endian. IP and MAC addresses are treated rather as sequences of bytes than numbers, so they are left in network order, i.e. big-endian (however, they are encapsulated in a special data type and shouldn't be accessed directly so the internal format should be needed to know).
Maximal length of the record is limited to 65534 (2^16 - 2) bytes.
Templates are usually defined by a string enumerating all the fields in the template, using comma (',') as a separator of field names. Order of field names in such string is not important (since physical order of fields is given by the rules above).
Types, enums and structures defined in unirec.h.
An enum value for each of the UniRec types.
Unsigned integer type for holding field IDs. IMPLEMENTATION NOTE: ur_field_id_t = uint16_t
A structure defining an UniRec template. It contains information about which fields are present in a record with that template and how to access them. For user this is a black box, it is not needed to access the structure's members directly.
Iterator type used by ur_iter_fields function
Constants used for iteration over fields in a template, see ur_iter_fields function for details.
Maximal size of an UniRec record.
### Definition of statically-known UniRec fields.
This macro allows to define fields used in the program and their types at compile-time. This allows to access such fields in UniRec records more easily and efficiently.
This macro should be used in the beginning of each translation unit (i.e. a *.c file) if the fields used (or at least some of them) are known at compile-time.
Parameters:
There MAY be a comma after the last field name. Also, there MAY be a semicolon after the closing parenthesis at the end of the macro.
#### Example:
This macro generates code allowing to use the defined fields in ur_get, ur_set and other macros which need symbolic constants to access the fields.
For each field specified by this macro, a CPP macro is defined with F_
prefix in the name and a value of a unique numeric ID. Also, a constant F_name_T is defined with a value of the field's type (as defined in ur_field_type enum). Other internal variables and structures are defined.
If there are more than one translation unit accessing UniRec fields, the same set of fields MUST be defined using UR_FIELDS in each of them.
### Cleanup of all internal structures.
This function has to be called after all UniRec functions and macros invocations if there were some fields defined at run-time. Otherwise this function does not have any effect, because nothing has been allocated. The function is called typically during a cleanup phase before the program's end.
No UniRec function or macro can be called after a call to ur_finalize.
### Run-time definition of a field
This function allows to define a field at run-time.
Parameters: "name" - name of the new field, see description of UR_FIELDS for rules on field names. "type" - type of the new field.
If a field with the same name already exists in the internal table of defined fields and "type" is the same as the one in the table, the function just returns the ID of the field. If types does not match, a UR_E_TYPE_MISMATCH error code is returned.
If no field with "name" is present in the table of fields, a new entry is created with a new unique ID and the given name and type of the field. The new ID is returned.
Returns:
All error codes returned by this function are negative integers, ID is always non-negative.
If this function is used in a program, the function ur_finalize() has to be called after all UniRec functions and macros invocations.
NOTE: It is not necessary to define fields which were defined by UR_FIELDS. It is recommended to define all fields statically by UR_FIELDS if possible. This function is present only for cases when field names and/or types are not known until run-time.
NOTE: Fields defined by this function can be accessed using their numeric IDs only. Symbolic CPP macros are not defined, of course.
This function allows to define sef of fields at run-time.
Define new UniRec fields at run-time. It adds new fields into existing structures. If the field already exists and type is equal nothing will happen. If the type is not equal an error will be returned.
Parameters: "fc_data_fmt" - String containing types and names of fields delimited by comma. Example ifc_data_fmt: "uint32 FOO,uint8 BAR,float FOO2"
Returns:
### Undefine field
Allows to revert a previous definition of a field by ur_define_field.
Frees the ID of the given field for future re-use. The ID becomes invalid after a call to this function so the field with the given name can not be accessed any more. Note that the same ID may be assigned to another field later.
This function is not necessary in most cases. Its only purpose is to allow a re-use of field IDs since their total count is limited to 2^16-1.
After this function is used, all the templates using the undefined field have to freed and created again.
### Create UniRec template
Creates a structure describing an UniRec template with the given set of fields and returns a pointer to it.
The template should be freed by ur_free_template after is not needed any more.
Parameters:
Order of field names is not important, i.e. any two strings with the same set of field names but with different order are equivalent.
All fields MUST be previously defined, either statically by UR_FIELDS or by calls to ur_define_field.
If an error occurs and "errstr" is not NULL, it is set to a string with corresponding error message.
Returns:
### Create UniRec template for usage with Libtrap
Creates UniRec template and set this template to specified input interface (ifc).
This template will be set as a minimum set of fields to be able to receive messages. If the input interface receives superset of fields, the template will be expanded.
Creates UniRec template and set this template to specified output interface (ifc).
Set of fields of this template will be set to an output interface.
Creates UniRec template and set this template to specified input (ifc_in) and output (ifc_out) interface.
This template will be set as a minimum set of fields to be able to receive messages. If the input interface receives superset of fields, the template will be expanded and new set of fields will be set to output interface.
### Free UniRec template
Free memory allocated for a template.
Following functions are used to retrieve certain field value from UniRec record.
Parameters:
This function returns value of an appropriate type of a specific field (int, uint, ...). Because of this, the field must be a symbolic constant (i.e. "F_name") not a numerical ID. It can be used just for fixed size fields (not for string and bytes).
This function returns pointer to a value of an appropriate type. Because of this, the field must be a symbolic constant (i.e. "F_name") not a numerical ID. It can be used for both fixed-length and variable-length fields.
This function returns void pointer to a value. Field can be symbolic constant or numerical ID. It can be used for both fixed-length and variable-length fields. (This function is used for fields defined at run-time)
Function copies data of a variable-length field from UniRec record and append '\0' character. The function allocates new memory space for the string, it must be freed using free()! Field can be symbolic constant or numerical ID.
Following functions are used to set a value to specified field in a record.
Parameters:
This function assumes value of an appropriate type of a specific field (int, uint, ...). Because of this, the field must be a symbolic constant (i.e. "F_name") not a numerical ID. It can be used just for fixed size fields (not for string and bytes).
To set dynamically defined field, use ur_get_ptr_by_id() and write to that pointer.
This function is used to set variable-length fields. Field can be symbolic constant or numerical ID. For better performance use function ur_clear_varlen, before setting all variable fields in record.
Parameters:
This function will clear all variable-length fields. It can be used for better performance of setting content to variable-length fields. Use this function before setting of all the variable-length fields.
Set string to the UniRec record. Value is a C-style string, length is determined automatically by strlen() ('\0' is not included in the record)
### Size of a fixed-length, static field
Returns size of a field. Field has to be statically defined. For variable-length fields it returns -1. To get size of variable-length field use function ur_get_var_len().
### Size of variable-length field
Returns length of a variable-length field. Field can be symbolic constant or numerical ID.
### Size of a fixed-length part of a record
Returns size of a fixed-length part of a record.
### Size of a variable-length part of a record
Returns size of a variable-length part of a record.
### Size of a record
Returns total size of whole UniRec record.
### Check template's fields
Returns non-zero if field is present, zero otherwise.
### Check type of a field (variable-length or fixed-length)
Returns non-zero if field is dynamic, zero otherwise.
### ID of a field (dynamic or static)
Function returns id of a field by name of the field, or UR_E_INVALID_NAME if the name is not known.
### Create UniRec record
Allocates memory for a record with given template. It allocates N+M bytes, where N is the size of fixed-length part of the record (inferred from template), and M is the size of variable-length, which must be provided by caller.
Parameters:
### Free UniRec record
Free memory allocated for UniRec record. You can call system free() on the record as well.
### Clone record
Function creates new UniRec record and fills it with the data given by parameter. It returns Pointer to a new UniRec record.
Parameters:
### Copy fields
Copies all fields present in both templates from src to dst.
The function compares src_tmplt and dst_tmplt and for each field present in both templates it sets the value of field in dst to a corresponding value in src.
Parameters:
### Iterate over fields of a template
This function can be used to iterate over all fields of a given template. It returns ID of the next field present in the template after a given ID. If ID is set to UR_ITER_BEGIN, it returns the first fields. If no more fields are present, UR_ITER_END is returned.The order of fields is given by the order in which they are defined.
The order of fields is given by the order in which they are defined.
Parameters:
Returns ID of the next field or UR_ITER_END if no more fields are present.
Example usage:
This function can be used to iterate over all fields of a given template. It returns n-th ID of a record specified by index. If the return value is UR_ITER_END. The index is higher than count of fields in the template.
The order of fields is given by the order in the record Parameters:
Example usage: