Wednesday, July 31, 2019

An update on Community ID

By Christian Kreibich, Senior Engineer at Corelight

Nearly a year has passed since the introduction of the Community ID flow hashing standard, so I’d like to recap the goals of the project, share an update on what has happened since, and lay out the next steps.

The Community ID aims to simplify the correlation of flow-level logs produced by multiple network monitoring applications. Without the ID, one needs to locate the required parts of the flow tuple (typically the IP address and port of each endpoint, plus the transport protocol) in each log’s rendering, combine them, and match them up. This “join” is tedious in the best case, and in corner cases (specific ICMP message types, for example) can become fairly tricky. The ID standardizes the rendering of flow tuples into hash-like strings, reducing the correlation to a simple string comparison.
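To make the idea concrete, here is a minimal Python sketch of how such an ID can be computed for TCP and UDP flows. It assumes version 1 of the ID works as I describe it here: a 16-bit seed, the two IP addresses, the protocol number, a padding byte, and the two ports are fed into SHA-1 in network byte order, and the digest is base64-encoded behind a "1:" version prefix. The function name community_id_v1 and the PROTO_NUMBERS table are illustrative only and not part of any released implementation. ICMP and other special cases are left out, so treat the main specification document as the authority.

```python
import base64
import hashlib
import ipaddress
import struct

# IP protocol numbers for the two transports this sketch handles.
PROTO_NUMBERS = {"tcp": 6, "udp": 17}


def community_id_v1(proto, src_ip, dst_ip, src_port, dst_port, seed=0):
    """Return a version-1-style ID string for a TCP or UDP flow (sketch only)."""
    src = ipaddress.ip_address(src_ip).packed
    dst = ipaddress.ip_address(dst_ip).packed

    # Put the "smaller" endpoint first so that both directions of a flow
    # produce the same string.
    if (src, src_port) > (dst, dst_port):
        src, dst = dst, src
        src_port, dst_port = dst_port, src_port

    data = (
        struct.pack("!H", seed)                          # 16-bit seed
        + src
        + dst
        + struct.pack("!BB", PROTO_NUMBERS[proto], 0)    # protocol, padding byte
        + struct.pack("!HH", src_port, dst_port)
    )
    digest = hashlib.sha1(data).digest()
    return "1:" + base64.b64encode(digest).decode("ascii")


# Both directions of the same flow yield the same ID, so correlating two
# logs reduces to comparing strings.
forward = community_id_v1("tcp", "10.0.0.1", "10.0.0.2", 1234, 80)
reverse = community_id_v1("tcp", "10.0.0.2", "10.0.0.1", 80, 1234)
print(forward == reverse)  # prints True
```

The endpoint ordering step is what removes the direction dependence: a monitor that logs the flow from the client's perspective and one that logs it from the server's perspective arrive at the same string.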

The project originated out of efforts to simplify the correlation of logs produced by two of the major modern open-source network monitors: Suricata and Zeek. The former added support in version 4.1, while Zeek users can install a package that adds support from Zeek version 2.5 onward. At last year’s SuriCon in Vancouver we presented the project in more detail. Feedback was very positive and led to a series of early adopters, including Moloch, Elastic Beats and Common Schema, HELK, and most recently MISP and VAST. Other projects, such as D4 and Sysmon, have declared their intention to add support. A major thanks to all developers involved! They not only took on the burden of implementing the standard, they did so from non-reusable implementations and a largely informational “specification” document.

We’ve recently updated the ID’s main document to become more normative, including a pseudocode implementation. At the moment, the ID is perhaps easiest to explore via our recently released communityid Python module: it installs via pip and significantly reduces the barrier to entry, particularly in data-processing / SIEM environments. It ships with a command-line tool that reports the ID for a given flow tuple, as follows:

     $ community-id tcp 1234 80


Going forward, our goals are threefold:

Gather feedback and experience reports. The ID provides version support, and the community has raised several interesting ideas for future revisions. The first version is, quite literally, the simplest approach we could think of. We’re particularly curious to hear about operational use of the ID, its proneness to hash collisions, practical concerns, or creative applications. If you have any feedback, please open tickets!

Provide as many off-the-shelf implementations of the ID as possible. The communityid Python module is a first step. Several of the existing implementations look like they will be relatively straightforward to make reusable. A C library would be a great way to unify and simplify existing implementations, and enable others. If you are interested in working on these, please get in touch!

Add support to more network monitoring applications. Most immediately, we’re looking to support Wireshark, with others to follow. Whether you’re considering an implementation, are actively working on one, or have a tool that you would like to see support the ID, shout!

Please feel free to explore the ID. We look forward to your feedback.
