Thursday, August 8, 2019

Zeek 3.0.0 RC1 released

(Note: We will update this blog posting for the final release.  Please provide feedback on anything that would be helpful to add.)



We just published a release candidate for Zeek 3.0.0—our first major release since Bro 2.0 came out in 2012. This version is quite special as it undertakes The Big Zeekification™: It is executing on the technical side of the name change that we announced last year by now renaming the tool itself, including binaries, scripts, and even some events. “Bro” is now “Zeek.” 

This name change brings some disruption for existing users, which is unavoidable for a long-term codebase where the original name had more than 20 years to proliferate into pretty much every corner. Nevertheless, we have been trying hard to maintain backwards compatibility from Zeek 3.0.0 to Bro 2.6 as much as possible to facilitate smooth upgrades. Wherever we reasonably could, we put aliases and redirects in place so that old names remain working in parallel to the new ones. When using the old names, you will in many cases see explicit deprecation warnings that point you to the places that need updating. These transition mechanisms will remain in place for the Zeek 3.0.x series. We’ll remove them with the next feature release 3.1.0 and likewise with the next long-term stable release 4.0.0, in accordance with our new release schedule.

Below is a more detailed summary of the main changes coming with the renaming. In addition, Zeek 3.0.0 comes with a number of new features as well, including:

  • New analyzers for NTP and MQTT, and extended analyzers for DNS (SPF/DNSSEC), RDP, SMB, and TLS. 
  • Support for decapsulating VXLAN tunnels.
  • Support for logging in UTF-8.
  • Several extensions of the scripting language:
    • Closures for anonymous functions
    • Iteration over key/value pairs of a table through for ( key, value in t ) ...
    • Python-style vector slicing (v[2:4])
    • A new data structure, paraglob, for efficiently matching strings against large lists of globs.
  • See the NEWS file for more detailed release notes, and CHANGES for the complete list of changes.

Upgrading to Zeek 3


The following summarizes the main naming-related changes that you will encounter after installing Zeek 3.0.0. Unless otherwise noted, the Bro 2.6 names and paths will continue to work with this release, but often trigger deprecation warnings.

  • The names of all executables that had “bro” in their name have changed: bro -> zeek, bro-config -> zeek-config, broctl -> zeekctl, bro-cut -> zeek-cut. Zeek 3.0.0 installs wrappers under the old names that will let them continue to work.
  • The default install prefix is now /usr/local/zeek instead of /usr/local/bro. If your existing installation used the previous default and you are using the new default when upgrading, we'll symlink /usr/local/zeek to /usr/local/bro. Certain subdirectories get similar treatment: share/bro, include/bro, and lib/bro.
  • Along with BroControl becoming ZeekControl, installation directories and files with broctl in their name have changed to use zeekctl instead. However, these changes remain backwards compatible with previous Bro installations by continuing to pull from existing locations where customizations might have been made. For example, if you have a broctl.cfg file from a previous installation, installing Zeek over it will retain that file and even symlink the new zeekctl.cfg to it.
  • The new extension for Zeek scripts is .zeek. This leads to two major changes:
    • All scripts ending in .bro have been renamed to .zeek. In particular, $prefix/share/bro/site/local.bro has been renamed to local.zeek. However, if you have an existing local.bro file from a previous Bro installation—possibly with customizations made to it—Zeek will install a symlink local.zeek file that points to that pre-existing local.bro. In that case, you may want to just copy local.bro into the new local.zeek location to avoid confusion, but things should generally also work properly without intervention.
    • The search logic for the @load script directive now prefers files ending in .zeek, but will still fall back to loading a .bro file if it exists. E.g. @load foo will first check for a foo.zeek file to load and then otherwise foo.bro. Note that @load foo.bro (with the explicit .bro file suffix) prefers the opposite order: it first checks for foo.bro and then falls back to a foo.zeek, if that exists.
  • Changes affecting scripts:
    • The events bro_init, bro_done, and bro_script_loaded are now deprecated; use zeek_init, zeek_done, and zeek_script_loaded instead. Any existing event handlers for the deprecated versions will automatically alias to the new events such that existing code will not break, but their usage will emit deprecation warnings.
    • The functions bro_is_terminating and bro_version are deprecated and replaced by functions named zeek_is_terminating and zeek_version. The old names likewise continue to work with deprecation warnings.
  • The namespace used by all the builtin plugins that ship with Zeek has changed to Zeek::.
  • Any Broker topic names used in scripts shipped with Zeek that previously were prefixed with bro/ are now prefixed with zeek/ instead. In the case where external applications were using a bro/ topic to send data into a Bro process, a Zeek process still subscribes to those topics in addition to the equivalently named zeek/ topic. In the case where external applications were using a bro/ topic to subscribe to remote messages or query data stores, there's no backwards compatibility and external applications must be changed to use the new zeek/ topic. The NEWS file has a list of the most common topic names that one may need to change.
  • The Broxygen component, which is used to generate our Doxygen-like scripting API documentation, has been renamed to Zeekygen. This likely has no breaking or visible changes for most users, except in the case one used it to generate their own documentation via the --broxygen flag, which is now named --zeekygen. Besides that, various documentation in scripts has also been updated to replace Sphinx cross-referencing roles and directives like :bro:see: with :zeek:see:.
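
The @load search order described in the list above can be modeled in a few lines. The following Python sketch only illustrates the rule as stated here and is not Zeek's implementation: the function name resolve_load and the set of existing file names are assumptions made for the example, the handling of an explicit .zeek suffix is a guess, and directories and search paths are ignored.

```python
def resolve_load(name, existing):
    """Return the file an '@load name' would pick, or None.

    Models only the suffix preference described above; 'existing' is a
    set of file names standing in for the script search path.
    """
    if name.endswith(".bro"):
        # Explicit .bro suffix: try the name as given, then its .zeek twin.
        candidates = [name, name[:-len(".bro")] + ".zeek"]
    elif name.endswith(".zeek"):
        # Assumption: an explicit .zeek suffix is looked up as-is.
        candidates = [name]
    else:
        # No suffix: prefer .zeek, fall back to .bro.
        candidates = [name + ".zeek", name + ".bro"]
    for candidate in candidates:
        if candidate in existing:
            return candidate
    return None


print(resolve_load("foo", {"foo.zeek", "foo.bro"}))   # foo.zeek
print(resolve_load("foo", {"foo.bro"}))               # foo.bro
print(resolve_load("foo.bro", {"foo.zeek"}))          # foo.zeek
```

In practice this means a site that still ships only .bro scripts keeps loading them, while a .zeek file placed next to a .bro file of the same base name takes precedence for suffix-less @load directives.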


Upgrading to the Zeek Package Manager


The external package manager switched its name as well, from bro-pkg to zkg. On PyPI, both the old bro-pkg and new zkg packages share the same code base, so you may continue using bro-pkg if you want, but it’s easy enough to switch for the sake of consistency: run pip uninstall bro-pkg && pip install zkg. Either way, a wrapper script is provided that forwards from bro-pkg to zkg.


Renaming External Packages  


It's up to a package’s maintainer whether they want to rename a package that’s been using “bro” in its name—there’s nothing about such a package name that will be incompatible with Zeek 3.0.0. If you do want to rename your package, we recommend the following process, assuming it’s hosted on GitHub:
  1. Rename your GitHub repository from bro-foo to zeek-foo. GitHub will automatically provide a redirect from the old URL to the new URL, so people who had installed a package using the old URL will still be fine going forward.
  2. Add an alias to the package’s metadata: aliases = zeek-foo bro-foo. This tells zkg that old and new names are referring to the same package, and it will create corresponding symlinks so that explicit @load bro-foo directives will continue to work. See the documentation for more on aliases.
  3. Optionally, update the depends metadata field. The special dependencies zeek and zkg are replacing bro and bro-pkg, respectively, and zkg treats them as aliases. Note, however, that existing bro-pkg installations won’t recognize the new names yet, so you might want to leave them in there to support users who have not yet upgraded. See the documentation for more.
  4. Re-register the renamed package, zeek-foo, with the central package source. Follow the normal directions to update your index file: remove the old URL for bro-foo and add the URL for zeek-foo.


Common Issues When Upgrading 


  • If you were running Bro as the bro user and intend to use a zeek user now, don't forget to remove/update any potential cron jobs you may have.
  • If you're installing Zeek on an old Bro host, remember to first shut down the old cluster using broctl.
  • Symptoms of overlapping Bro/Zeek installations:
    • Plugins may have failing symbol problems depending on if you run Zeek or Bro.
    • zkg packages may fail to install with an error that btest can't find init-bare.bro.  This may be caused by certain packages using an old version of the get-bro-env script or bro_dist metadata substitution in combination with having the bro-pkg/zkg configuration set to use a mismatched Bro/Zeek sourcetree. 
  • Not remembering to update zkg configuration (i.e. updating the paths in ~/.zkg/config or ~/.bro-pkg/config in case you’re now using a different source/installation path for Zeek 3.0.0)
  • Not updating PATH environment variable (to either remove an old /usr/local/bro path or to add the new /usr/local/zeek path)
  • Plugins will generally need to be recompiled for Zeek 3.0.0 (as is usually the case with new versions). Plugins that require --bro-dist have been seen to have build issues. The best solution is to switch the plugin to the new skeleton code. However, we will try to address any specific issues if you file a ticket with instructions on how to reproduce.
  • If you run the BHR scripts, you may need to change those to run as the zeek user as well as the permissions on the queue directory.
  • Not remembering to update both the location where an external process (e.g. a cron job) writes Intel files and the location where the Intel configuration (e.g. Intel::read_files) expects to read them, in case you choose to use the new default installation path. For example, if Intel files were previously written to /usr/local/bro and you now want to use /usr/local/zeek, remember to update both the Zeek configuration and whatever external process writes the Intel files.

Feedback 


We realize that we may have missed some places where the name change can impact existing setups. We need your help to close those gaps: if you’re running into any issues upgrading from Bro 2.6 to Zeek 3.0.0, please let us know. If it’s something that we can/should fix, please file a ticket on GitHub. If you have advice for others on how to adapt their setups, scripts, or packages, please leave a comment on this blog posting or email the Zeek mailing list. We’ll be updating this blog posting once the final 3.0.0 release comes out.


Contributors


Thanks to Mike Dopheide, Jon Siwek, and Justin Azoff for contributing to this blog posting.

Wednesday, July 31, 2019

An update on Community ID

By Christian Kreibich, Senior Engineer at Corelight


Nearly a year has passed since the introduction of the Community ID flow hashing standard, so I’d like to recap the goals of the project, share an update on what has happened since, and lay out the next steps.

The Community ID aims to simplify the correlation of flow-level logs produced by multiple network monitoring applications. Without the ID, one needs to locate the required parts of the flow tuple (typically the IP address and port of each endpoint, plus the transport protocol) in each log’s rendering, combine them, and match them up. This “join” is tedious in the best case, and in corner cases (specific ICMP message types, for example) can become fairly tricky. The ID standardizes the rendering of flow tuples into hash-like strings, reducing the correlation to a simple string comparison.

The project originated out of efforts to simplify the correlation of logs produced by two of the major modern open-source network monitors: Suricata and Zeek. The former added support in version 4.1, while Zeek users can install a package that adds support from Zeek version 2.5 onward. At last year’s SuriCon in Vancouver we presented the project in more detail. Feedback was very positive and led to a series of early adopters, including Moloch, Elastic Beats and Common Schema, HELK, and most recently MISP and VAST. Other projects have declared their intention to add support (such as D4 and Sysmon). A major thanks to all developers involved! They not only took on the burden of implementing the standard, they did so from non-reusable implementations and a largely informational “specification” document.

We’ve recently updated the ID’s main document to become more normative, including a pseudo code implementation. At the moment, the ID is perhaps easiest to explore via our recently released communityid Python module: it installs via pip and significantly reduces the barrier to entry, particularly in data-processing / SIEM environments. It ships with a command-line tool that reports the ID for a given flow tuple, as follows:

     $ community-id tcp 10.0.0.1 192.168.0.1 1234 80

     1:K4ienR4L7rjxkkNvuZGIZwbbphY=
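
For readers who want to see roughly what goes into such an ID, here is a simplified Python sketch of a version-1 computation for TCP or UDP flows. It is not the reference implementation. The function name community_id_v1 is made up for this example, and the field layout (a 16-bit seed, the ordered addresses, the protocol number, one padding byte, the ordered ports, hashed with SHA-1 and Base64-encoded) is an assumption based on the published specification. The specification and the communityid module remain authoritative, and ICMP handling is left out entirely.

```python
import base64
import hashlib
import ipaddress
import struct


def community_id_v1(proto, src_ip, dst_ip, src_port, dst_port, seed=0):
    """Sketch of a version-1 Community ID for TCP/UDP flows."""
    src = ipaddress.ip_address(src_ip).packed
    dst = ipaddress.ip_address(dst_ip).packed
    # Order the endpoints so both directions of a flow hash identically.
    if (src, src_port) > (dst, dst_port):
        src, dst = dst, src
        src_port, dst_port = dst_port, src_port
    data = (
        struct.pack("!H", seed)          # 16-bit seed, network byte order
        + src + dst                      # ordered IP addresses
        + struct.pack("!BB", proto, 0)   # protocol number plus one pad byte
        + struct.pack("!HH", src_port, dst_port)
    )
    digest = hashlib.sha1(data).digest()
    return "1:" + base64.b64encode(digest).decode("ascii")


TCP = 6  # IANA protocol number
forward = community_id_v1(TCP, "10.0.0.1", "192.168.0.1", 1234, 80)
reverse = community_id_v1(TCP, "192.168.0.1", "10.0.0.1", 80, 1234)
print(forward == reverse)  # True: direction does not matter
```

Ordering the endpoints before hashing is what lets two monitors that see the same flow from opposite directions arrive at the same string.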

Going forward, our goals are threefold:

Gather feedback and experience reports. The ID provides version support, and the community has raised several interesting ideas for future revisions. The first version is, quite literally, the simplest approach we could think of. We’re particularly curious to hear about operational use of the ID, its proneness to hash collisions, practical concerns, or creative applications. If you have any feedback, please open tickets!

Provide as many off-the-shelf implementations of the ID as possible. We recently released the communityid Python module that installs via pip and significantly reduces the barrier to entry, particularly in data-processing / SIEM environments. Several of the existing implementations look like they will be relatively straightforward to make reusable. A C library would obviously be a great way to unify and simplify existing implementations, and enable others. If you are interested in working on these, please get in touch!

Add support to more network monitoring applications. Most immediately, we’re looking to support Wireshark, with others to follow. Whether you’re considering an implementation, are actively working on one, or have a tool that you would like to see support the ID, shout!

Please feel free to explore the ID at https://github.com/corelight/community-id-spec. We look forward to your feedback.

Tuesday, July 30, 2019

Open Source Zeek Leadership Team Meeting Minutes - 26 July 2019



The open source Zeek project Leadership Team (LT) is made up of contributors from multiple organizations throughout the community. The LT acts as both a technical steering committee and governance body. You can find out more about the LT on the team page of the website.
Below are the notes from the LT meeting held on 26 July 2019.

Zeek.org Leadership Team Members (Bold indicates attendance)
  • Keith Lehigh (Chair), Indiana University
  • Johanna Amann, International Computer Science Institute/Corelight/Lawrence Berkeley National Laboratory
  • Seth Hall, Corelight
  • Vern Paxson, Corelight & University of California at Berkeley
  • Michal Purzynski, Mozilla Foundation
  • Aashish Sharma, Lawrence Berkeley Lab
  • Adam Slagell, ESnet
  • Robin Sommer, Corelight


  • Amber Graner*, Corelight, Director of Community for the Open Source Zeek Community
  • Nicole Fischer*, Creative, Graphics Design
*not a member

Agenda

  • Zeek Logo Discussion
  • ZeekWeek Tagline and SWAG Discussion
  • Other Topics

Minutes


  • Zeek Logo Discussion
    • Narrowed down the choices, gave feedback about tweaking 3 of the designs.
    • Nicole to take feedback and will present at the next LT Meeting on 9 August

  • ZeekWeek Tagline and SWAG Discussion
    • “Zeek and ye shall find” will be the ZeekWeek Tagline
    • Stickers, T-shirts, Mugs

  • Other topics
    • Consider changing deadline times to end in a timezone other than PT
    • Zeek Events Naming - ZeekHours, ZeekDays, ZeekWeek

Helpful Links and information:

Getting Involved: If you would like to be part of the Open Source Zeek Community and contribute to the success of the project, please sign up for our mailing lists, join our IRC Channel, come to our events, and follow the blog and/or Twitter feed. If you’re writing scripts or plugins for Zeek, we would love to hear from you! If you can’t figure out what your next step should be, just reach out. Together we can find a place for you to actively contribute and be a part of this growing community.


About Zeek (formerly Bro): Zeek is a powerful network analysis framework that is much different from the typical IDS you may know. https://www.zeek.org/

Thursday, July 25, 2019

Announcing The Zeek Package Contest - Calling All Zeek Users



Zeek Package Contest


  • Are you a Zeek user?
  • Do you enjoy writing Zeek scripts?
  • Do you like being recognized for your awesome work?
  • Do you want to make the world’s networks safer?
  • Do you like winning prizes and claiming bragging rights?
  • Do you want the opportunity to present your work at Zeek events?

If you answered, “yes” to any of the above questions, then the Zeek Package Contest sponsored by Corelight, Inc. may be just the competition for you!

This contest is intended to inspire Zeek users to demonstrate their creativity and ingenuity while winning the admiration of their peers, and giving back to the community.


What is the Zeek Package Contest?


The challenge is straightforward: Create an innovative and useful open source Zeek package that extends Zeek’s threat hunting and detection capabilities.

  • 1st place wins one free trip (hotel and airfare) to ZeekWeek 2019, $5000 cash and Zeek swag (T-shirts, stickers, etc)
  • 2nd place wins $2500 cash and Zeek swag (T-shirts, stickers, etc)
  • 3rd place wins $1000 USD cash and Zeek swag (T-shirts, stickers, etc)
  • 4th and 5th place each win a $100 gift card and Zeek swag (T-shirts, stickers, etc)

The winners may also get the opportunity to present their work at future Zeek events and/or have their contributions featured on the Zeek blog.

Submissions need to be made available through the central Zeek package repository. We will evaluate them in terms of their overall functionality & quality, utility for incident responders, customizability, test coverage, and clarity of documentation. The jury will consist of Zeek core developers and other long-time Zeek community members. More details below.


Jury Members


  • Aashish Sharma (Community)
  • Jeff Atkinson (Community)
  • Johanna Amann (Corelight)
  • Justin Azoff (Corelight)
  • Nick Turley (Community)
  • Robin Sommer (Corelight)
  • Seth Hall (Corelight)
  • Vlad Grigorescu (Community)


Important Dates


  • Submission opens: August 1, 2019
  • Submission deadline: September 1, 2019
  • Notification: September 25, 2019
  • Announcement of results: ZeekWeek 2019 (October 8-11, 2019)


Contest Results


Contest results will be posted here when the results have been announced.



Rules of Engagement 


  1. The goal is to create an innovative and useful Zeek package that's compatible with the Zeek Package Manager. The focus is on Zeek scripts, not binary plugins. A package may include a plugin to support its scripts through new built-in functions (“*.bif files”). However, the contest will not consider packages with other binary functionality, such as protocol or file analyzers, log writers, input readers, etc.
  2. To submit a package to the contest, it must first be made available through the central Zeek package repository. You can then nominate it for consideration by filling out the webform. Please include with your nomination: a link to the package’s git repository, a list of authors, a short summary describing the motivation for the work, and documentation of the package’s usage. We will acknowledge receipt, and we will evaluate the version of the package as the package manager installs it at that time.
  3. All submissions must be received no later than September 1, 2019, 11:59PM PDT. The winners will be notified on September 25, 2019.
  4. Packages already included in the Zeek package repository prior to the start of this contest, 1 August 2019, will not be eligible for this contest.
  5. Submitted packages must work with the Zeek 2.6 release. They must build and install on recent, standard Linux systems. Please specify any specific OS requirements of your package, if necessary.
  6. Submitted packages must be open source. We prefer BSD licensed submissions, but will accept any OSI-approved license. By submitting an entry, you declare that you own the copyright to the source code and all related materials, and are authorized to submit it.
  7. Submissions may leverage other packages included in the Zeek package repository as dependencies as long as the package manager can resolve them during installation. They may also link against external libraries as long as their installation is clearly documented and easy to follow.
  8. The top 5 winners of the contest will get the prizes mentioned above. We reserve the right to award fewer than 5 awards if we do not receive a sufficient number of high-quality submissions.
  9. A committee of Zeek core developers and other long-time Zeek community members, chosen by Corelight, will decide the winners based on the following criteria: overall functionality & quality, utility for incident responders, customizability, test coverage, and clarity of documentation.
  10. In order to collect the cash prizes, winners will need to provide a legal picture identification and bank account information within 30 days of notification. The bank transfer will be made within two weeks after the winner is authenticated.
  11. Group entries are allowed; the prize will be paid to a person designated by the group.
  12. You may submit more than one package for the contest, but we limit awards to one per person/group.
  13. Names/aliases of the winners will be listed on the "Zeek Package Contest" web page.
  14. Zeek team members, members of the selection committee, and Corelight employees are not eligible to participate.


The Legal Stuff


In no event will Corelight be liable to you or any party entering this contest for lost profits or any form of indirect, special, incidental, or consequential damages of any character from any causes of action of any kind with respect to this contest, whether based on breach of contract, tort (including negligence), or otherwise, and whether or not you have been advised of the possibility of such damage.


More Information


If you have any questions, please contact us at contest@zeek.org.
Find out more about Zeek at: https://www.zeek.org/
Current packages list can be found at: https://packages.zeek.org/ and https://github.com/zeek/packages

The Zeek Package Contest is inspired and modeled after the Hex-Rays Plugin and Volatility contests.

Wednesday, July 24, 2019

Complacency is not an option - Freddy Dezeure to keynote ZeekWeek 2019

The Zeek Leadership Team is pleased to announce that Freddy Dezeure will keynote ZeekWeek 2019 which will take place in Seattle, Wash., Oct. 8-11, 2019.

Dezeure’s keynote, “Threats are changing, so are we as defenders”, will present insights into the current attack trends used by adversaries, their motives and techniques, and the challenges these create for enterprises. Dezeure will highlight changes to, and increased dependency on, our infrastructure. He will also provide an overview of the innovative new methods the threat hunting community is creating, as well as share practical guidance and pointers to best practices and tools.

“The changing threat landscape requires us to continuously adapt our defenses to mitigate the risk to our organizations and the society as a whole to an acceptable level,” said Dezeure of his keynote topic. “Complacency is not an option.”

The keynote will take place Wednesday, Oct. 9 at 9:30 a.m.

Registration is open! Make sure you register soon: prices will increase on Aug. 1, 2019.

About Freddy Dezeure: Freddy Dezeure graduated from the KUL in Belgium, with a master of science in engineering in 1982. He was CIO of a private company from 1982 until 1987. He joined the European Commission in 1987 where he held a variety of management positions in administrative, financial and operational areas, in particular in information technology. He set up the EU Computer Emergency and Response Team (CERT-EU) for EU institutions, agencies and bodies in 2011 and made it into one of the most mature and respected CERTs in Europe. Until May 2017 he held the position of the Head of CERT-EU. Presently, he is an independent management consultant providing strategic advice in cybersecurity and cyber-risk management and serving as a board member and/or advisory board member for several technology companies.

About ZeekWeek: ZeekWeek (formerly BroCon) is the most important community event for users, developers, incident responders, threat hunters and architects who rely on the open-source Zeek network security monitor as a critical element in their security stack. Attending ZeekWeek is your opportunity to learn from the open-source Zeek founders, experts and enthusiasts (of all levels).

Friday, July 19, 2019

Zeke on Zeek: Working With Open-Source Zeek: Adding a Key-value For-Loop

By Zach Medley

Getting started working on Zeek can be daunting because of the sheer size of the repository. While designed reasonably, Zeek is big and a lot of reasonable design can still be a lot to handle. This blog post walks through how I added Zeek’s key-value for loop in the hope that it might make it easier for future Zeek developers to get started.

Zeek, formerly Bro, is an open-source network security monitoring tool that transforms raw traffic into rich logs, extracted files, and custom insights via a Turing-complete Zeek programming language. It’s all open source, and developed on GitHub with its community.

Defining the Problem


Before the addition of a key-value for loop in Zeek, you could iterate over the items in a container with a standard range-based for loop:



However, looping over tables where there are both keys and values requires a separate lookup:



This is less than ideal for both ergonomic and efficiency reasons. At its core, when Zeek walks through a table's entries it retrieves the corresponding value as well, which makes the second lookup unnecessary, as Zeek user Jon pointed out.




As for the syntax, Zeek’s tables can be indexed by tuples. The existing for loop supported iteration over tables with tuples by wrapping the keys in brackets and unpacking the tuple.



Christian suggested that we extend this tuple unpacking for use with key-value for loops.
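
Python dictionaries offer a rough analogy for the difference between the two loop styles, including tuple unpacking in the loop header. The sketch below is only that, an analogy: the services table and its contents are made up, and the Zeek form for ( key, value in t ) is the one the new loop provides.

```python
# A table keyed by (address, port) tuples, standing in for a Zeek table
# with two index values. The data is made up for the example.
services = {("10.0.0.1", 80): "http", ("10.0.0.2", 53): "dns"}

# Key-only loop: every value needs a second lookup.
via_lookup = []
for key in services:
    via_lookup.append((key, services[key]))

# Key-value loop: the value arrives together with the key.
via_items = []
for key, value in services.items():
    via_items.append((key, value))

# Key-value loop with the tuple key unpacked in the loop header.
unpacked = []
for (addr, port), value in services.items():
    unpacked.append((addr, port, value))

print(via_lookup == via_items)  # True
print(unpacked[0])              # ('10.0.0.1', 80, 'http')
```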


Writing Tests


The testing framework Zeek uses is called btest and tests written using it are commonly called “btests.” Zeek's btests live in the testing/btest/ directory. Once you get the hang of them, they are pretty straightforward, but at first glance they can be a little confusing.

A btest usually consists of a test and a baseline. Btest works by running your test and comparing its output to a known baseline. A difference between the output and the baseline results in a failed test.

In addition to cloning Zeek, you’ll need to install btest separately. We suggest installing the development version, which gives you access to a more up-to-date btest that the master version of Zeek may depend on. After cloning Zeek, move to the directory you cloned it into and run:

     pip install -e aux/btest/

With btest installed, we can begin to write our tests. Zeek already has tests that cover for-loops in testing/btest/language/for.bro, so modifying that file is fine, but I chose to add a separate test file called key-value-for.bro. I wrote a couple tests for key-value for-loops and added one for iterating over tables with more than one index value because there wasn’t a test for that yet. My tests for the key-value look like this:




Note: It's important that your test has the # @TEST-EXEC … line at the top. Without it, btest won't know what command to use to run the test. In this case, our btest runs Zeek on the content that follows, and a subsequent diff compares the output to our baseline of expected output.

With the test written, you’ll now have to add a baseline so that btest knows what the desired output should be. The best way to create a baseline is fairly nebulous, as there are many ways that work well. Ultimately, once you find a way you like, and as long as you’re left with a working test in the end, it’s likely fine.

The easiest way to create a simple btest is to replace the test script with some ad-hoc script that produces the same output. For the above we might replace it with some print statements that produce the desired output. Then you can go ahead and run the test with the -U parameter, which will prompt you to make a baseline. Once that’s done, don't forget to go back and change the script back to the one you want to test.

For more complicated tests, though, this ad-hoc method can get troublesome. Here, Christian suggests running the real test, letting it fail, then copying the “out” file it creates over to the baseline directory.

More or less in line with Christian’s suggestion, I created my btests by moving to the /btest/Baseline/ directory. Here I created a new folder with the name <the btest folder your test is in>.<the name of your test file>. For example, my tests were named key-value-for.bro and in the btest/language folder, so I added a folder to the btest/Baseline folder called language.key-value-for. Inside of your new folder add a file called out, and write whatever the expected output of your test is. My out file looks like this:



Now we can run our test and see if it fails. To run the test, first build and install Zeek by running

     ./configure

     make

     make install


Then, change back to the testing/btest directory and run:

     btest -d language/key-value-for.bro
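
To make the baseline mechanics from this section concrete, here is a small Python model of the folder naming convention and the comparison step. It is a sketch, not how btest is implemented: the helper names baseline_dir and matches_baseline are invented for this example, the output string is made up, and real btest does more (running the @TEST-EXEC commands, for instance).

```python
import os
import tempfile


def baseline_dir(test_path):
    """Map 'language/key-value-for.bro' to 'language.key-value-for'."""
    folder, filename = os.path.split(test_path)
    stem, _ext = os.path.splitext(filename)
    return folder.replace("/", ".") + "." + stem


def matches_baseline(output, baseline_root, test_path):
    """Compare a test's output with the 'out' file in its baseline folder."""
    path = os.path.join(baseline_root, baseline_dir(test_path), "out")
    with open(path) as f:
        return f.read() == output


# Usage with a throwaway Baseline directory and made-up output.
root = tempfile.mkdtemp()
test = "language/key-value-for.bro"
os.makedirs(os.path.join(root, baseline_dir(test)))
with open(os.path.join(root, baseline_dir(test), "out"), "w") as f:
    f.write("a, 1\n")

print(baseline_dir(test))                       # language.key-value-for
print(matches_baseline("a, 1\n", root, test))   # True
print(matches_baseline("a, 2\n", root, test))   # False
```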

Writing Code


Adding new language functionality in Zeek can be done in a couple of simple steps:

  1. Modify parse.y so that the new syntax is recognized and handled properly.
  2. Write the underlying C++ code to make it all work.

We’ll start by writing the code to parse the new for-loop.

Parsing


Zeek uses lex and yacc to generate its parser. The part that we’re concerned with can be found in src/parse.y. Specifically, we’re interested in the part that parses the for statement, underneath for_head:



I’ll walk through this code to give an overview of how it works, and then show the new parsing rules for a key-value for-loop.

TOK_FOR '(' TOK_ID TOK_IN expr ')'

Indicates the type of syntax that the following code deals with. Each of the tokens is represented below as a positional number, with TOK_FOR corresponding to the number 1 and ')' corresponding to the number 6.

set_location(@1, @6);

When a Zeek script is parsed, objects can be associated with a location. For more information on the utility of this, see Bison’s page here. For a little more on how a location is represented, see src/Obj.h.

ID* loop_var = lookup_ID($3, current_module.c_str());

In this case, $3 refers to TOK_ID. Here we get loop_var’s previous definition if it already exists in the current module.





This is the meat of the parse phase. Here, if loop_var already has a definition, we make sure that it is not a global variable. Otherwise, we initialize it.

$$ = new ForStmt(loop_vars, $5);

Finally, we build a new for-statement from the loop variables and $5, which refers to the expression we’re iterating over.

My implementation follows the basic for-loop’s parsing procedure very closely and calls an alternate version of the constructor that I’ll discuss next.




Core Functionality


In order to preserve as much of the original for-loop’s functionality as possible, I opted to write an alternate constructor for the for-loop that included a variable for values to be stored in as the loop moves through the table. The constructor first calls the regular for-loop constructor on the loop variables and expression, and then runs some additional code to verify the type of the value variable.

The most interesting part of the for-loop is the actual looping. This is done in the DoExec part of the for-loop in src/Stmt.cc.




We’re only interested in the part of the for-loop that deals with looping over tables, because tables are the only data type supported by key-value for-loops. This code is mostly self-explanatory, with the exception of the use of Ref() and Unref().

Zeek uses reference counting under the hood to clean up objects when they’re no longer in use. If you’re familiar with modern C++, this is the same idea behind shared_ptr. Each object keeps track of how many references it has; if that number drops to zero, Zeek cleans it up. Whenever we set an element in a frame, we need to call Ref() on it. This increases the element’s reference count, indicating that the frame still needs the value until Unref() is called on it at some later point.
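To make the counting concrete, here is a toy Python model of the idea. It is a sketch under stated assumptions, not Zeek's implementation: `RefCounted`, `ref`, `unref`, and `set_element` are hypothetical names, the frame is modeled as a plain dict, and the `deleted` flag stands in for actually freeing the object.

```python
class RefCounted:
    """Toy object with an explicit reference count, starting at one."""

    def __init__(self, payload):
        self.payload = payload
        self.ref_count = 1      # the creator holds the first reference
        self.deleted = False


def ref(obj):
    """Register one more holder of obj."""
    obj.ref_count += 1


def unref(obj):
    """Drop one holder; clean up when nobody holds obj any more."""
    obj.ref_count -= 1
    if obj.ref_count == 0:
        obj.deleted = True      # stands in for freeing the object


def set_element(frame, slot, obj):
    """Store obj in a frame slot; the frame becomes an extra holder."""
    ref(obj)
    old = frame.get(slot)
    frame[slot] = obj
    if old is not None:
        unref(old)              # the frame no longer needs the old value


value = RefCounted("table entry")  # count is 1
frame = {}
set_element(frame, 0, value)       # count is 2
unref(value)                       # creator is done: count is 1
unref(frame.pop(0))                # frame is done: count is 0, cleaned up
```

Forgetting the `ref` call in `set_element` would let the count reach zero while the frame still points at the object, which is the kind of bug that shows up later as a crash.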

Reference counting in Zeek can be difficult to get the hang of and can lead to hard-to-track-down bugs. Take care when using a value after passing it elsewhere; if you get a segfault, a reference-counting mistake is often the cause. Debuggers like gdb and tools like valgrind can help track down what got deleted.

Conclusion


The addition of key-value for-loops to Zeek makes iterating over a table simpler and more performant:



When possible, key-value for loops should be preferred to regular loops over tables.

If you’re interested in contributing to Zeek there is no bar to entry. For C and C++ people, the Zeek core is a great place to get your feet wet developing a scripting language. You can also get involved just writing Zeek. Much of Zeek is written in Zeek. Even if you don’t program much, I wrote the README so I’m sure it's got a couple spelling and grammar errors.
No matter how you do it, working on Zeek can be an incredibly rewarding experience. It's fun, challenging, educational, and keeps the world’s networks safe.


Helpful Links and information:

Getting Involved: If you would like to be part of the Open Source Zeek Community and contribute to the success of the project please sign up for our mailing lists, join our IRC Channel, come to our events, follow the blog and/or Twitter feed. If you’re writing scripts or plugins for Zeek we would love to hear from you! Can’t figure out what your next step should be, just reach out. Together we can find a place for you to actively contribute and be a part of this growing community.

About Zeek (formerly Bro): Zeek is a powerful network analysis framework that is much different from the typical IDS you may know. https://www.zeek.org/

Thursday, July 18, 2019

People of Zeek Interview Series - Introducing Fatema Bannat Wala

As we gear up for ZeekWeek 2019, I wanted to introduce you to Fatema Bannat Wala, an active Zeek community member whom I had the chance to meet earlier this year at the 2019 Open Source Zeek European Workshop, held at CERN in Geneva, Switzerland. Fatema is a frequent speaker at Zeek events, including BroCon (now ZeekWeek) and multiple Zeek workshops. She recently presented a talk at CERN about the weird.log file, and she was a panelist in a discussion about how Zeek is being used to secure university networks. Her excitement about sharing what she learns about Zeek is so contagious that I asked if she’d do a blog post series about weird.log, the log file produced by the weird.bro script, which helps users detect unexpected network-level activity. To kick off that series, I spoke with her about her work with Zeek and the “Weirds” log, and I am pleased to introduce her to the Zeek Community in today’s Q&A blog post.

Amber Graner (AG): Fatema, thank you so much for taking the time to answer my questions and let the community know who you are and what it is about Zeek and the weird.log files that interest you. Can you take a moment to tell the community a little about yourself and what a typical day is like for you?

Fatema Bannat Wala (FW): Firstly, thanks Amber for giving me this opportunity and platform to share my ideas and knowledge with the community. I truly appreciate your efforts! I currently work as a security engineer at the University of Delaware’s Security Operations team, where I got introduced to then Bro, now Zeek. It has been a terrific journey so far working with the team and learning new stuff every day at my job. That’s what keeps me motivated and going throughout the day. My daily activities vary as per the need of the hour, working with the Security Information and Event Management (SIEM) team primarily, firewalls, monitoring and enhancing the intrusion detection in the network traffic using Zeek Network Security Monitor (NSM), and other security tools.

AG: What drew you to Zeek and how did you get involved with the project?

FW: “Zeek is different” was my first impression when I started playing around with it. Unlike other tools, which look for patterns in known traffic and record them, Zeek records and scrapes everything interesting and useful that passes by for you to look at later. There is a lot of information flowing around on the network, unknown to the user, and Zeek keeps records of everything it sees, which you can later use to draw interesting statistical analyses of the overall network traffic and what transpires on your network. It gives binocular vision to analysts who are otherwise blind to their network traffic. There are so many use cases of Zeek that I can’t enumerate them all, but whenever I think of an issue to solve, I think: can I use Zeek to solve it?

That thought process drew me into asking a lot of questions of the awesome Zeek community and eventually getting involved with solving some interesting use cases and sharing it with others in the form of Zeek scripts and contributions to the source code repository.

AG: What was it about the Weird logging that made you want to write documentation, give talks on it and share your knowledge?

FW: What’s weird is that, as analysts, we are more interested in knowing what unconventional activity is going on in our network than in normal protocol traffic, and honestly the name “Weird” for one of the log files that Zeek generates drew me to look into it. I asked myself, ‘what is weird in the eyes of Zeek?’ As I read through the documentation about the weird.log file, I became even more interested in looking at it because it described ‘unusual or exceptional activity that can indicate malformed connections, traffic that doesn’t conform to a particular protocol, malfunctioning or misconfigured hardware, or even an attacker attempting to avoid/confuse a sensor.’ This is what you should look for in the network traffic as security analysts, isn’t it?

As I researched through various weird types in our environment I was amazed and excited by the network enhancements we made just based on weird activity logged by Zeek. Those enhancements motivated me to share my research and knowledge with the community, so that if similar conditions are occurring at different networks they don’t have to reinvent the wheel, or start from scratch to find the solution.

AG: In addition to the Weird logs, what’s the most interesting thing you’ve learned about Zeek so far?

FW: Other than Weird logs, the most interesting thing about Zeek is its ability to record enough information to fingerprint the devices or unconstrained endpoints in a university network, information that is practically impossible, or at least hard, to collect via any host- or client-based agent. Every semester brings a flood of new devices and endpoints that we don’t manage centrally, and to keep an eye on them (what OS is running, what kind of software fingerprint it has) we use Zeek as our passive scanner. I have bragged in detail about this use case of Zeek at UD and in my 2017 BroCon talk, UEPtSS - Unconstrained End-Point Security System, if people are interested in learning more about it. Apart from the users’ perspective, Zeek’s scripting and logging frameworks are among the strongest features available for customizing Zeek to achieve any use case that we come up with.

AG: Can you tell the community about the “Weird” blog series we’ll be starting soon and what they can expect to learn from the series?

FW: When I started to research Weird logs, there was very limited information available online. By digging through some of the source code and looking around on the internet in various mailing lists and personal blog posts, I found the answers I was looking for. I am hoping that sharing the analysis I have done, the information I found, and some basics about the Weird logs will make all of it easily available and accessible to the community in one central location.

The series will cover some basics of Weird, where to find them and what to do with some of the noisy ones after finding them. I plan to keep the community up to date regarding the new information I find with my continued research with Weird. If people have any questions or issues about or with the Weird logs they can ask and I will be more than happy to answer those, either via this blog or on the Zeek mailing lists.

AG: For those who want to get involved in the Zeek community, what advice would you give them and where would you tell them to start?

FW: Ask questions. No matter how silly you think they are, ask them anyway. This way you will get expert advice and opinions on the things you are struggling with in your Zeek playground. After that you are just going to get better with Zeek. If you have any ideas for Zeek or any use cases that you would love to have Zeek solve, then don’t hesitate to share them on the Zeek mailing list, the IRC channel, or the Zeek dev mailing list. These are the places that are regularly watched by the awesome core team, who are the biggest factor in my success with Zeek. When I started, I joined the Zeek mailing list and tried to participate in as many Zeek events, like BroCon (now known as ZeekWeek) and Zeek workshops, as possible. This participation gave me the opportunity to meet the developers face to face, which is a chance you definitely do not want to miss.

AG: Is there anything that you’d like to share about yourself or Zeek that I haven’t asked you about?

FW: Zeek is an outcome of tremendous efforts and time dedicated by some of the brilliant minds of the industry. Making a small effort towards contributing to this amazing open source free project is very satisfying and rewarding. Thanks for giving me the opportunity to be able to be a part of this community and to contribute back. Community involvement and contributing back to the project are key factors for any open source community project that keeps it growing and flourishing. As a part of the community, I would like to say, ‘stay involved and stay connected,’ the rewards are beyond imagination!


Thursday, June 20, 2019

Open Source Zeek - Strategic Community Goals

“Coming together is a beginning, staying together is progress, and working together is success.” 
~ Henry Ford

To all members of the Zeek community: today I’m excited to share the strategic goals I’ll be pursuing over the next year. As a reminder, I joined Corelight as Director of Community for the Zeek project a few months ago. I developed the following list after learning about the community and evaluating where it is, talking to many of you, and gathering feedback from the Zeek Leadership Team and the Corelight Founders.

Please understand, this is only a beginning. I’ll be working on other goals in the future, and would like to get your input on what you most need. But based on my prior experience supporting community efforts in the Ubuntu and Open Compute Projects, it’s often helpful to get started with infrastructure, awareness, engagement, and governance. As we work on these items, I am sure other actionable goals will move onto my plate.

If you or your organization would like to help with any of these goals, or if you have questions, comments, or feedback of any kind, please feel free to reach out and let me know.

I look forward to collaborating with you all. Here’s to stronger communities, safer networks and many successes as we work together!

Community Goals


1. Increase Zeek Awareness - We need to drive greater awareness of Zeek in the cybersecurity / threat hunting / detection ecosystems, while also targeting adjacent open source technologies. To this end, we will:
  • Deliver a monthly newsletter (Including Zeek news/tutorials, other security news, notable CVEs, etc.)
  • Produce an editorial calendar for 2019, to include:
    • Monthly content cadence (tutorials and articles)
    • Information about new releases (including notes/demos)
    • Document editorial process  (for soliciting external contributions)
    • Rewards and incentives (for contributors)

2. Increase Engagement with the Zeek Community - We need more online and in-person engagement opportunities for the Zeek community, because there are many ways to contribute and get involved. To accomplish this, we will seek to have the following:
  • A predictable cadence of in-person meetups, training opportunities, and other events to meet and engage with the community.
  • Engagements and partnerships with adjacent technology communities.
  • Updated / reorganized documentation, tutorials as well as support channels.
  • A calendar of events for the community.
  • Definition for each major type of contribution (what tasks, what skills, what is success and how to reward and retain contributors).

3. Update Zeek Infrastructure - Last year the project was renamed Zeek (formerly Bro). Once a new logo is finalized, we need to rebrand, update, and reorganize the website - with the aim of creating a clean, easy to navigate and intuitive home where Zeek users and developers of all skill levels can go to gain knowledge and know-how. This will help us:
  • Increase brand credibility (making the website convey the same high quality of Zeek project code)
  • Gain community contributors and users (participation is a cornerstone to all successful communities)
  • Encourage contributions and project innovation

4. Design Governance Structure - While we already have a Zeek Leadership Team (LT) and core committers, we don’t have a system that defines how people can move into either of those roles. This work will be broken down into two phases.
  • Phase 1
    • Shed more light onto the decision making process and publish notes after each LT and Zeek community meeting
    • Solicit input from the Zeek community.
  • Phase 2
    • Define processes for how to become part of the leadership and decision making bodies.

Again, thank you so much for being part of the Zeek Community!! 



Zeke on Zeek: Paraglob

Paraglob is a data structure for quick string matching against a large set of patterns. It was originally designed by Robin Sommer, but an early, experimental implementation was slowed significantly by an internal set data structure that ran in linear time for most of its operations. Because several of these linear-time operations were called together, building a paraglob took O(N^2) time and other operations took O(N log N) time, where N is the number of patterns in the data structure. In this Zeke on Zeek post I’ll walk through moving paraglob to C++ and using different data structures to reduce its compile time to linear time and other operations to O(log N) time.

But first, a cool looking graph summarizing some benchmarks I ran and a look ahead at the performance characteristics of a paraglob. “Queries” refers to how many strings are being matched and “patterns” refers to the number of patterns those queries are being matched against. I chose to have about 20% of the patterns match in this case. The small spikes aren’t consistent across runs, and are likely just my computer doing something else in the background. Notice how small the time increase is from running 1,000 to 20,000 queries. At the upper right paraglob is compiling a set of 10,000 patterns and running 20,000 queries on them in under 2 seconds.



The Algorithm


At its core paraglob is actually built around a relatively straightforward algorithm. For any pattern, there exists a set of words that an input must contain in order to have any hope of matching against it. For example, consider the pattern “do*”. Anything that matches against “do*” must at the very least contain the substring “do”. This can be easily extended to more complicated patterns by just breaking up the pattern on special glob syntax. For example, “dog*fish*cat” contains the substrings [“dog”, “fish”, “cat”]. We call these substrings “meta words”.
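As a small illustration of the splitting step, the Python sketch below extracts meta words from a glob pattern. It is an assumption-laden sketch rather than paraglob's actual C++ code: the function name `meta_words` and the choice to treat `*`, `?`, and bracket expressions as the only special syntax are mine.

```python
import re

# Glob syntax that can never be part of a literal substring:
# '*', '?', and bracket expressions such as [abc] or [!abc].
_GLOB_SYNTAX = re.compile(r"\*|\?|\[[^\]]*\]")


def meta_words(pattern):
    """Return the literal substrings an input must contain to match pattern."""
    return [word for word in _GLOB_SYNTAX.split(pattern) if word]


print(meta_words("do*"))           # ['do']
print(meta_words("dog*fish*cat"))  # ['dog', 'fish', 'cat']
```

A pattern made only of special syntax, such as `*`, yields no meta words at all, so a real implementation needs some way to handle that case separately.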

We can then reframe our problem as finding any of the meta words inside an input string and checking the patterns associated with those meta words against it. The Aho-Corasick string-searching algorithm coupled with a map from meta words to patterns solves our problem. We can summarize how paraglob works as follows:

CONSTRUCTION:
    for every input pattern:
         extract the meta words
         map them to their respective patterns
         store the meta words in the Aho-Corasick data structure

QUERYING:
    for every input string:
         get all the meta words it contains with the Aho-Corasick structure
         get candidate patterns with the map
         check those patterns against the string
         return the matches
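The two phases above can be sketched in a few lines of Python. This is a simplified illustration, not paraglob itself: a plain substring scan stands in for the Aho-Corasick structure, Python's `fnmatch` does the final glob check, and the class name `TinyParaglob`, its `get` method, and the `always_check` list for patterns without meta words are assumptions made for the example.

```python
import re
from fnmatch import fnmatchcase

_GLOB_SYNTAX = re.compile(r"\*|\?|\[[^\]]*\]")


class TinyParaglob:
    """Illustrative only: a substring scan replaces Aho-Corasick."""

    def __init__(self, patterns):
        self.word_to_patterns = {}   # meta word -> patterns that need it
        self.always_check = []       # patterns with no meta word, e.g. "*"
        for pattern in patterns:
            words = [w for w in _GLOB_SYNTAX.split(pattern) if w]
            if not words:
                self.always_check.append(pattern)
            for word in words:
                self.word_to_patterns.setdefault(word, []).append(pattern)

    def get(self, text):
        # Find the meta words contained in the input.
        found = [w for w in self.word_to_patterns if w in text]
        # Collect candidate patterns; duplicates are tolerated here.
        candidates = list(self.always_check)
        for word in found:
            candidates.extend(self.word_to_patterns[word])
        # Run the real glob match, deduplicating only the final result.
        return sorted({p for p in candidates if fnmatchcase(text, p)})


p = TinyParaglob(["*", "d?g", "*og", "d?", "d[!wl]g"])
print(p.get("dog"))  # ['*', '*og', 'd?g', 'd[!wl]g']
```

The substring scan costs time proportional to the number of meta words per query, which is exactly the cost Aho-Corasick avoids by finding all contained words in a single pass over the input.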

Implementation


With the algorithm designed, paraglob’s actual implementation is fairly straightforward, but with a couple of important nuances. The first is that, for a given set of patterns, one meta word may well be associated with multiple patterns. Consider for example a small pattern set [*mischiev[!o]us*, *mischevous*, *.us*, *.gov*], which might flag mischievous typosquatting and government-related URLs. Already this pattern set has one meta word (us) mapping to two quite different patterns.

As a result, paraglob can’t use a standard map structure which only allows for a single value for every key. The obvious solution to this is to use some sort of multimap, but in practice this proved to be unacceptably slow. Using a multimap slowed down paraglob by as much as a factor of 10 as opposed to an implementation with a standard map structure that ignores the above issue.

In order to achieve the performance offered by the latter and still handle the association of multiple patterns with one meta word, paraglob uses a custom “ParaglobNode” class that stores a list of patterns and is associated with a meta word in a map. ParaglobNodes also contain functionality to quickly merge the patterns they hold that match a string into an input vector. This greatly increases the speed at which paraglob finds patterns for an input string.

The second important nuance lies in how paraglob handles duplicate patterns. Using the same example pattern set as above, a query for “mischievious-url.uk” contains the meta words us and mischiev. Mapping those to their respective patterns, we get [*mischiev[!o]us*, *.us*] from us and [*mischiev[!o]us*] from mischiev. Initially it seems like we should keep these in a set to avoid checking the same pattern twice. As it turns out, though, maintaining a set internally is much more expensive than using vectors and simply checking the occasional duplicate pattern. As a result, a paraglob doesn’t remove duplicates until the last step, when the vector of matching patterns is at its smallest.

Inside Zeek


Paraglob is integrated with Zeek and provides a simple API inside its scripting language. In Zeek, paraglob is implemented as an OpaqueType, and its syntax closely follows other similar constructs inside Zeek. A paraglob can only be instantiated once from a vector of patterns and then only supports get operations, which return a vector of all patterns matching an input string. The syntax is as follows:

local v = vector("*", "d?g", "*og", "d?", "d[!wl]g");
local p = paraglob_init(v);
print paraglob_get(p, "dog");

Out:

[*, *og, d?g, d[!wl]g]

Paraglob also supports serialization, copy, and unserialization operations inside Zeek. This means that a paraglob can be sent to separate processes using Broker. Keep in mind though that copying a paraglob requires that it be recompiled and for very large paraglobs this can be an expensive operation.

While the absence of an add operation might seem strange, it stems from constraints that emerge in paraglob’s implementation. Adding a pattern to a paraglob that is already compiled requires that the paraglob be re-compiled because the Aho-Corasick tree has to be rebuilt. As a result, adding a pattern to a compiled paraglob takes the same amount of time as building a new paraglob from a vector of patterns.

While it seems reasonable for paraglob to support both add and compile operations to get around this, I thought this was more likely to confuse than to provide much real benefit. People using paraglob without knowing about its performance characteristics might attempt to add to the paraglob in a loop or forget to compile it, resulting in unexpectedly slow performance or errors.

With that said though, I certainly see an argument for extending the paraglob API to include add and compile operations. For use cases where there is an updating pattern set it would remove the need to keep track of a vector of patterns and a paraglob because the paraglob would maintain the vector of patterns itself. Under the hood paraglob already supports add and compile operations so adding those to Zeek would be as simple as extending ParaglobVal slightly and adding two functions to bro.bif.


Next Steps


A paraglob’s state is defined completely by the patterns inside of it. Paraglobs hold no internal state between calls, nor do they make any updates to their internal Aho-Corasick structure unless a new pattern is added. Presently, their serialization function takes advantage of this and only serializes the vector of patterns contained inside a paraglob. For unserializing, a new paraglob is built from that serialized vector of patterns, and its Aho-Corasick structure is recompiled. This recompilation is expensive though, and can take as long as 10 seconds for very long pattern sets.
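The serialization strategy described here can be modeled roughly as follows. This Python sketch is illustrative only: `PatternSet`, its methods, the `compile_count` counter, and the use of JSON as the wire format are all assumptions, and the `compile` method merely marks where a real paraglob would rebuild its Aho-Corasick structure.

```python
import json
from fnmatch import fnmatchcase


class PatternSet:
    """Hypothetical stand-in for a paraglob; compile() is the costly part."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        self.compile_count = 0
        self.compile()

    def compile(self):
        # A real paraglob would rebuild its Aho-Corasick structure here.
        self.compile_count += 1

    def get(self, text):
        return [p for p in self.patterns if fnmatchcase(text, p)]

    def serialize(self):
        # Only the patterns are written; no compiled state is saved.
        return json.dumps(self.patterns)

    @classmethod
    def unserialize(cls, data):
        # Rebuilding from the patterns triggers a fresh compile.
        return cls(json.loads(data))


original = PatternSet(["*og", "d?"])
copy = PatternSet.unserialize(original.serialize())
print(copy.get("dog"))  # ['*og']
```

Because the patterns fully determine behavior, the copy answers queries identically to the original, but every receiver pays the compile cost again.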

Ideally, a paraglob could be serialized in such a way that the recompilation step is not needed. There is some serialization code in the Boost C++ Libraries that might be useful for this, but given how complicated the Aho-Corasick trie becomes when it contains a fair number of patterns, serializing it would likely take significant effort. Working out a clean way to serialize a paraglob properly, though, could make it considerably more useful for distributing frequently changing pattern sets.


Finally…


A huge thank you to Kamiar Kanani for his excellent Multifast Aho-Corasick implementation, which he allowed us to use under the BSD license for the Zeek project. Without such a well done string searching algorithm underpinning everything this would have been a much more difficult data structure to implement.


Contributed by: Zeke Medley


Tuesday, June 11, 2019

Open Source Zeek Leadership Team Meeting Minutes - 31 May 2019



The open source Zeek project Leadership Team (LT) is made up of contributors from multiple organizations throughout the community. The LT acts as both a technical steering committee and governance body. You can find out more about the LT on the team page of the website.

Below are the notes from the LT meeting held on 31 May 2019.


Zeek.org Leadership Team Members (Bold indicates attendance)

  • Keith Lehigh (Chair), Indiana University
  • Johanna Amann, International Computer Science Institute/Corelight/Lawrence Berkeley National Laboratory
  • Seth Hall, Corelight
  • Vern Paxson, Corelight & University of California at Berkeley
  • Michal Purzynski, Mozilla Foundation
  • Aashish Sharma, Lawrence Berkeley Lab
  • Adam Slagell, ESnet
  • Robin Sommer, Corelight

  • Amber Graner*, Corelight, Director of Community for the Open Source Zeek Community
         *not a member

Agenda

  • Trademark Discussion  (Amber)
  • Keynotes  (Keith)
  • Zeek Package Contest (Amber)
  • Analytics Discussion Scheduling (Keith)

Minutes

  • Trademark Discussion - The LT discussed the current Name and Logo Usage Statement (https://www.zeek.org/documentation/marks.html). The discussion produced the following action items to look into:
    • Create a Reciprocal Logo Usage Agreement
    • Update the Marks Usage Documentation
    • Create a standard Cease and Desist letter
  • Keynotes - LT Members will continue reaching out to potential keynote speakers for ZeekWeek 2019.
  • Zeek Package Contest - Amber brought up the Zeek Package Contest that Corelight would like to host leading up to ZeekWeek 2019. Amber to take LT feedback to the Corelight team and present the details of the program at the next LT meeting.
  • Analytics Discussion Scheduling - Keith to schedule an additional LT meeting to discuss analytics tools for the website.
