To filter UniRec data, there is an universal NEMEA module called
(it can be found at https://github.com/CESNET/Nemea-Modules/tree/master/unirecfilter).
unirecfilter module can work with filter expression given by
parameter from command line, or with multiple expressions (and multiple output
IFC) given in a text file passed by
Filter can contain any UniRec field and data type (except UniRec arrays), and
supports all traditional comparison operators with optimized
General filter format is:
FIELD OPERATOR VALUE
UniRec messages that match the given filter are forwarded via output IFC, others are dropped.
unirecfilter allows to specify output UniRec template
-O, so the output can contain only a subset of fields.
It is also possible to specify fields that do not need to be present in the input UniRec template; such fields can be set to default value in the output template (e.g.,
-O "ipaddr SRC_IP, ipaddr DST_IP, uint16 SRC_PORT, uint16 DST_PORT, string MESSAGE=\"this_is_message\"").
unirecfilter can be stopped after receiving N messages (
Filter is a logical expression composed of terms joined together by
logical operators, possibly with the use of brackets (
)) and negation (
NOT). Term is a
field cmp_operator value, e.g.
FOO == 1 or
SRC_IP == 10.0.0.1.
It is possible to use the following keywords instead of actual UniRec field name from input template. The keywords are a shortcut (and simplification) for compound expressions:
host- stands for
port- stands for
The semantics can be explain using the following example of filter:
host == 192.168.0.0/24
which returns true if and only if:
SRC_IP == 192.168.0.0/24 or DST_IP == 192.168.0.0/24
Note: it means any IP address in the range 192.168.0.0-192.168.0.255 matches.
Available comparison operators are:
=>greater than or equal
=<lesser than or equal
~=matches regular expression
INIn-Array function, see the “In Array” section below
Available logical operators are:
Almost all data types from unirec are supported:
int88bit signed integer
int1616bit signed integer
int3232bit signed integer
int3232bit signed integer
int6464bit signed integer
uint88bit unsigned integer
uint1616bit unsigned integer
uint3232bit unsigned integer
uint6464bit unsigned integer
chara single ASCII character
floatsingle precision floating point number (IEEE 754)
doubledouble precision floating point number (IEEE 754)
ipaddrspecial type for IPv4/IPv6 addresses, see unirec README (note - IPv6 address in a filter loaded from a file has to be surrounded by double quotes because of syntax issues)
stringvariable-length array of (mostly) printable characters, surrounded by double quotes
bytesvariable-length array of bytes (not expected to be printable characters), surrounded by double quotes
timeUniRec timestamp, such as
TIME_LAST. The time column can be compared with date&time specified in format:
.sssrepresents miliseconds and is an optional part (Ex.
2018-01-10T21:17:00). Note that the timestamp should be in UTC timezone.
UniRec fields of
ipaddr type (such as
DST_IP) can be
compared with host IP addresses or with a subnet in the notation
BITS represents a number of leading ‘1’ bits in the
-F "SRC_IP == 10.0.0.0/8"
IP need not to be necessarily a network address (it can be any
IP address) because it is masked during the initiation of the filter
== returns true if and only if an
SRC_IP belong to the
It is possible to abbreviate (and optimize) filter when a field is to be matched with a set of values. For example, it is possible to rewrite this:
DST_PORT == 1 or DST_PORT == 234 or DST_PORT == 123 or DST_PORT == 80 or DST_PORT == 443
into more readable this:
DST_PORT in [1, 234, 123, 80, 443]
Internally, the array (which is specified in brackets
]) is parsed, sorted,
and the filter matching is done using binary search, i.e., it is faster
according to a measurement (mainly for longer arrays).
The following UniRec types are currently supported by this In Array feature:
double, and “subnets” (i.e.,
ipaddr with prefix
length such as
Filter specified on command line with
-F flag is a single expression
which is evaluated for the output interface. For example:
-F "SRC_PORT == 23"
Filter specified in a file provides more flexibility and allows to use more than one output interface.
Format of the file is
TEMPLATE_1:FILTER_1; ...; TEMPLATE_N:FILTER_N;
where each semicolon separated item corresponds to one output interface.
Line breaks along the separators as well as within a filter are allowed.
One-line comments starting with
# can be used.
The semicolon at the end is necessary.
Template can be empty, meaning to use the same template as on input.
ipaddr SRC_IP,ipaddr DST_IP: SRC_IP == 184.108.40.206/16;
# ifc0: HTTP(S) traffic :PROTOCOL == 6 && (SRC_PORT in [80, 443] || DST_PORT in [80, 443]); # ifc1: DNS traffic :SRC_PORT == 53 || DST_PORT == 53;
To reload the filter while unirecfilter is running, send signal SIGUSR1 (10) to the process.
You can use syntax FIELD=value in the template to specify default value used if field is not present on the input (f.e. uint32 BAR=1). Example below.
./unirecfilter -i u:test_ifc_in,u:test_ifc_out -O "ipaddr SRC_IP, ipaddr DST_IP, uint16 SRC_PORT, uint16 DST_PORT, string MESSAGE=\"this_is_message\""
./unirecfilter -i IFC_SPEC [-O TMPLT] [-F FLTR]
./unirecfilter -i IFC_SPEC [-f FILE]
Here are some examples of running unirecfilter.
DST_PORT fields no matter what is the input template.”
./unirecfilter -i u:test_ifc_in,u:test_ifc_out -O "ipaddr SRC_IP, ipaddr DST_IP, uint16 SRC_PORT, uint16 DST_PORT"
“Forward flow the first 100 records with `SRC_PORT higher than 20.”
./unirecfilter -i u:test_ifc_in,u:test_ifc_out -c 100 -F "SRC_PORT > 20"
“Forward flow records that have
DST_IP from the given IP prefixes, and
DST_PORT are lower than 200 except 80, 53, 123.”
./unirecfilter -i u:test_ifc_in,u:test_ifc_out -F "host IN [220.127.116.11/24, 18.104.22.168/16, 22.214.171.124/16] and port < 200 and not port in [80, 53, 123]"
“Specify output template and forward flow the first 100 records with `SRC_PORT equal to 443 or 53. (Note: filter can be rewritten with IN operator)”
./unirecfilter -i u:test_ifc_in,u:test_ifc_out -c 100 -O "ipaddr SRC_IP, ipaddr DST_IP, uint16 DST_PORT, uint16 SRC_PORT" -F "SRC_PORT == 443 || SRC_PORT == 53"
Below is an example of config file “filter.txt” to be used with the
-f parameter, everything after
# is considered as comment:
ipaddr DST_IP,ipaddr SRC_IP:SRC_PORT == 443; #port matching number 443 uint16 SRC_PORT:SRC_PORT >= 23 || DST_PORT >= 23; #usage of or string BAR=not_present:BAR ~= "not_present"; #regex usage
You can then use filter file as shown below:
unirecfilter -i u:test_ifc_in,u:test_ifc_out:timeout=HALF_WAIT,u:test_ifc_in,u:test_ifc_out2:timeout=HALF_WAIT -f "filter.txt"
Especially notice interface option
:timeout=HALF_WAIT. It will
enable the module to run even for interfaces that are not listening.
Meaning the module will not hang while waiting for someone to connect.
It will drop the message and move on to sending next one. These
dropped messages will count toward -c option (when this option is