Filtering of UniRec messages
Introduction
To filter UniRec data, there is an universal NEMEA module called unirecfilter
(it can be found at https://github.com/CESNET/Nemea-Modules/tree/master/unirecfilter).
The unirecfilter
module can work with filter expression given by -F
parameter from command line, or with multiple expressions (and multiple output
IFC) given in a text file passed by -f
.
Filter can contain any UniRec field and data type (except UniRec arrays), and
supports all traditional comparison operators with optimized IN
operator.
General filter format is: FIELD OPERATOR VALUE
UniRec messages that match the given filter are forwarded via output IFC, others are dropped.
unirecfilter
allows to specify output UniRec template -O
, so the output can contain only a subset of fields.
It is also possible to specify fields that do not need to be present in the input UniRec template; such fields can be set to default value in the output template (e.g., -O "ipaddr SRC_IP, ipaddr DST_IP, uint16 SRC_PORT, uint16 DST_PORT, string MESSAGE=\"this_is_message\""
).
unirecfilter
can be stopped after receiving N messages (-c N
).
Filter
Filter is a logical expression composed of terms joined together by
logical operators, possibly with the use of brackets ((
, )
) and negation (NOT
). Term is a
triplet field cmp_operator value
, e.g. FOO == 1
or SRC_IP == 10.0.0.1
.
Keywords
It is possible to use the following keywords instead of actual UniRec field name from input template. The keywords are a shortcut (and simplification) for compound expressions:
host
- stands forSRC_IP
orDST_IP
port
- stands forSRC_PORT
orDST_PORT
The semantics can be explain using the following example of filter:
host == 192.168.0.0/24
which returns true if and only if:
SRC_IP == 192.168.0.0/24 or DST_IP == 192.168.0.0/24
Note: it means any IP address in the range 192.168.0.0-192.168.0.255 matches.
Operators
Available comparison operators are:
>
greater than<
lesser than>=
,=>
greater than or equal<=
,=<
lesser than or equal=
,==
equal/matches subnet!=
,<>
not equal=~
,~=
matches regular expressionin
,IN
In-Array function, see the “In Array” section below
Available logical operators are:
||
,OR
- or&&
,AND
- and!
,NOT
- not
Data types
Almost all data types from unirec are supported:
int8
8bit signed integerint16
16bit signed integerint32
32bit signed integerint32
32bit signed integerint64
64bit signed integeruint8
8bit unsigned integeruint16
16bit unsigned integeruint32
32bit unsigned integeruint64
64bit unsigned integerchar
a single ASCII characterfloat
single precision floating point number (IEEE 754)double
double precision floating point number (IEEE 754)ipaddr
special type for IPv4/IPv6 addresses, see unirec README (note - IPv6 address in a filter loaded from a file has to be surrounded by double quotes because of syntax issues)string
variable-length array of (mostly) printable characters, surrounded by double quotesbytes
variable-length array of bytes (not expected to be printable characters), surrounded by double quotestime
UniRec timestamp, such asTIME_FIRST
,TIME_LAST
. The time column can be compared with date&time specified in format:YYYY-mm-ddTHH:MM:SS.sss
, where.sss
represents miliseconds and is an optional part (Ex.2018-01-10T21:17:00
). Note that the timestamp should be in UTC timezone.
Subnets
UniRec fields of ipaddr
type (such as SRC_IP
, DST_IP
) can be
compared with host IP addresses or with a subnet in the notation
IP/BITS
, where BITS
represents a number of leading ‘1’ bits in the
netmask.
Example: -F "SRC_IP == 10.0.0.0/8"
Note: IP
need not to be necessarily a network address (it can be any
IP address) because it is masked during the initiation of the filter
(e.g., 10.1.1.1/8
becomes 10.0.0.0
).
Operator ==
returns true if and only if an SRC_IP
belong to the
given subnet.
In Array
It is possible to abbreviate (and optimize) filter when a field is to be matched with a set of values. For example, it is possible to rewrite this:
DST_PORT == 1 or DST_PORT == 234 or DST_PORT == 123 or DST_PORT == 80 or DST_PORT == 443
into more readable this:
DST_PORT in [1, 234, 123, 80, 443]
Internally, the array (which is specified in brackets [
and ]
) is parsed, sorted,
and the filter matching is done using binary search, i.e., it is faster
according to a measurement (mainly for longer arrays).
The following UniRec types are currently supported by this In Array feature:
int8
, int16
, int32
, int64
, uint8
, uint16
, uint32
, uint64
,
ipaddr
, time
, float
, double
, and “subnets” (i.e., ipaddr
with prefix
length such as 192.168.0.0/24
)
Format
Command line
Filter specified on command line with -F
flag is a single expression
which is evaluated for the output interface. For example: -F "SRC_PORT == 23"
File
Filter specified in a file provides more flexibility and allows to use more than one output interface.
Format of the file is TEMPLATE_1:FILTER_1; ...; TEMPLATE_N:FILTER_N;
where each semicolon separated item corresponds to one output interface.
Line breaks along the separators as well as within a filter are allowed.
One-line comments starting with #
can be used.
The semicolon at the end is necessary.
Template can be empty, meaning to use the same template as on input.
Examples:
ipaddr SRC_IP,ipaddr DST_IP: SRC_IP == 1.2.0.0/16;
# ifc0: HTTP(S) traffic
:PROTOCOL == 6 && (SRC_PORT in [80, 443] || DST_PORT in [80, 443]);
# ifc1: DNS traffic
:SRC_PORT == 53 || DST_PORT == 53;
To reload the filter while unirecfilter is running, send signal SIGUSR1 (10) to the process.
Default values
You can use syntax FIELD=value in the template to specify default value used if field is not present on the input (f.e. uint32 BAR=1). Example below.
./unirecfilter -i u:test_ifc_in,u:test_ifc_out -O "ipaddr SRC_IP, ipaddr DST_IP, uint16 SRC_PORT, uint16 DST_PORT, string MESSAGE=\"this_is_message\""
Examples
./unirecfilter -i IFC_SPEC [-O TMPLT] [-F FLTR]
./unirecfilter -i IFC_SPEC [-f FILE]
Here are some examples of running unirecfilter.
“Forward only SRC_IP
, DST_IP
, SRC_PORT
, DST_PORT
fields no matter what is the input template.”
./unirecfilter -i u:test_ifc_in,u:test_ifc_out -O "ipaddr SRC_IP, ipaddr DST_IP, uint16 SRC_PORT, uint16 DST_PORT"
“Forward flow the first 100 records with `SRC_PORT higher than 20.”
./unirecfilter -i u:test_ifc_in,u:test_ifc_out -c 100 -F "SRC_PORT > 20"
“Forward flow records that have SRC_IP
or DST_IP
from the given IP prefixes, and SRC_PORT
or DST_PORT
are lower than 200 except 80, 53, 123.”
./unirecfilter -i u:test_ifc_in,u:test_ifc_out -F "host IN [195.113.232.0/24, 147.32.0.0/16, 82.208.0.0/16] and port < 200 and not port in [80, 53, 123]"
“Specify output template and forward flow the first 100 records with `SRC_PORT equal to 443 or 53. (Note: filter can be rewritten with IN operator)”
./unirecfilter -i u:test_ifc_in,u:test_ifc_out -c 100 -O "ipaddr SRC_IP, ipaddr DST_IP, uint16 DST_PORT, uint16 SRC_PORT" -F "SRC_PORT == 443 || SRC_PORT == 53"
Below is an example of config file “filter.txt” to be used with the -f
parameter, everything after #
is considered as comment:
ipaddr DST_IP,ipaddr SRC_IP:SRC_PORT == 443; #port matching number 443
uint16 SRC_PORT:SRC_PORT >= 23 || DST_PORT >= 23; #usage of or
string BAR=not_present:BAR ~= "not_present"; #regex usage
You can then use filter file as shown below:
unirecfilter -i u:test_ifc_in,u:test_ifc_out:timeout=HALF_WAIT,u:test_ifc_in,u:test_ifc_out2:timeout=HALF_WAIT -f "filter.txt"
Especially notice interface option :timeout=HALF_WAIT
. It will
enable the module to run even for interfaces that are not listening.
Meaning the module will not hang while waiting for someone to connect.
It will drop the message and move on to sending next one. These
dropped messages will count toward -c option (when this option is
enabled).