SDN products are evolving fast. Release cycles can be short, and each cycle brings more and more features. This is clearly a change from what network administrators were used to with hardware solutions.
In this context, the operational team in charge of the SDN functionality of the platform must be confident when deploying new releases. The team must therefore be able to test that a new build is iso-functional with the previous one, and to detect possible regressions.
Of course, SDN vendors already run end-to-end tests on their releases, but do they test your use cases? And what if you are building the SDN software yourself? You cannot be sure that your build passes the vendor tests. A good idea would be to integrate the vendor functional tests into your CI platform, but that is not always possible: the tests may not be distributed, or may not even be runnable outside the vendor's infrastructure.
At Cloudwatt we build our own version of OpenContrail, meaning the OpenContrail upstream branch with backports and sometimes non-upstreamed patches that are specific to our platform. As for OpenContrail functional tests, there is a repository available at https://github.com/Juniper/contrail-test-ci. We tried to run these tests in our CI, but it quickly became a nightmare: the tests are open, but clearly not suitable for running on a generic CI platform. In the end we decided to write our own functional tests.
The tests we want to run can be summarized in three steps:
As for the global objectives of the tests, we want to be SDN agnostic. We also want to avoid any heavy customization of the VM images, such as setting up agents and having to control them. Ideally we should be able to integrate a complex customer stack and test it with minor modifications. Finally, the orchestration must be as simple as possible.
Instead of reinventing the wheel, and to keep the tests as KISS as possible, we are using two powerful tools: Terraform and Skydive.
In our tests we are not checking the internals of the SDN solution, OpenContrail in our case. We'd like to keep the tests backend agnostic; in the end, if a test passes we can assume that the backend is behaving correctly. Because of that we don't need a complex setup to run the tests, so they can be run simply from a laptop. Basically you need Terraform installed and Skydive deployed on the platform; both tools are easy to deploy or install.
Terraform is quite well known. It provides a DSL to describe the infrastructure you wish to deploy on a cloud provider. In our case we are using the OpenStack provider, but Terraform can handle other providers as well (AWS, Azure…). The tool is comparable to the Heat component in the OpenStack world; the advantage of Terraform over Heat is that you can apply incremental updates to your infrastructure.
Skydive, on the other hand, is quite new and not widely used yet. The project aims to provide a tool to debug and troubleshoot network infrastructures, and especially SDN platforms. It provides a representation of the network topology (interfaces and the links between them) and on-demand traffic capture via REST APIs.
In our tests we use the on-demand capture feature to validate the traffic in the infrastructure.
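To give an idea of what on-demand capture looks like, here is a minimal sketch of starting a capture through the Skydive REST API. The `/api/capture` endpoint and the Gremlin query follow Skydive's API, but the analyzer address and the interface name are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical sketch: start an on-demand Skydive capture over REST.
# The analyzer address and interface name are assumptions.
ANALYZER=${ANALYZER:-http://localhost:8082}

# Build the JSON payload selecting, via a Gremlin query, the interfaces
# to capture on (here: every interface with the given name).
payload() {
  printf '{"GremlinQuery": "G.V().Has('\''Name'\'', '\''%s'\'')"}' "$1"
}

# Ask the analyzer to start capturing; commented out so the sketch does
# not require a live analyzer:
# curl -s -X POST "$ANALYZER/api/capture" \
#   -H 'Content-Type: application/json' -d "$(payload eth0)"
payload eth0
```

From then on, flows crossing the selected interfaces can be queried back from the analyzer with further Gremlin queries.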
So, what would a test look like with this solution? As an example, let's have a look at a simple security group test.
The goal of the test is to validate that two VMs can talk to each other because the security group allows it, and that after removing a rule from the SG the traffic is dropped.
First we need to describe the infrastructure to set up with Terraform.
We boot two VMs (sg_vm1, sg_vm2). They are spawned in the same VN (sg_net) and both use the same security group (sg_secgroup), which allows ICMP and SSH traffic.
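A sketch of such a description with the Terraform OpenStack provider could look like the following; resource layout, the CIDR, and the image/flavor variables are assumptions, not the exact files used:

```hcl
# Hypothetical sketch; CIDR, image and flavor are assumptions.
resource "openstack_networking_network_v2" "sg_net" {
  name = "sg_net"
}

resource "openstack_networking_subnet_v2" "sg_subnet" {
  name       = "sg_subnet"
  network_id = "${openstack_networking_network_v2.sg_net.id}"
  cidr       = "10.0.0.0/24"
}

# Security group allowing ICMP and SSH, as described above.
resource "openstack_compute_secgroup_v2" "sg_secgroup" {
  name        = "sg_secgroup"
  description = "allow ICMP and SSH"

  rule {
    from_port   = -1
    to_port     = -1
    ip_protocol = "icmp"
    cidr        = "0.0.0.0/0"
  }

  rule {
    from_port   = 22
    to_port     = 22
    ip_protocol = "tcp"
    cidr        = "0.0.0.0/0"
  }
}

resource "openstack_compute_instance_v2" "sg_vm2" {
  name            = "sg_vm2"
  image_name      = "${var.image}"
  flavor_name     = "${var.flavor}"
  security_groups = ["${openstack_compute_secgroup_v2.sg_secgroup.name}"]

  network {
    uuid = "${openstack_networking_network_v2.sg_net.id}"
  }
}

resource "openstack_compute_instance_v2" "sg_vm1" {
  name            = "sg_vm1"
  image_name      = "${var.image}"
  flavor_name     = "${var.flavor}"
  security_groups = ["${openstack_compute_secgroup_v2.sg_secgroup.name}"]

  network {
    uuid = "${openstack_networking_network_v2.sg_net.id}"
  }

  # cloud-init script delivered through the Nova metadata service:
  # ping sg_vm2 as soon as the VM is booted.
  user_data = <<EOF
#!/bin/sh
ping ${openstack_compute_instance_v2.sg_vm2.network.0.fixed_ip_v4}
EOF
}
```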
Using the Nova cloud-init API, a script will run on sg_vm1 that pings sg_vm2 as soon as the VM has booted.
Next we write a small shell script to run a sequence of tasks. If one task fails, the whole test must fail.
The tasks to run in order would be:
This is the full script with comments:
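In spirit, the script runs the sequence below. This is a minimal sketch rather than the exact script: the REST endpoints and Gremlin steps follow Skydive's API, but the analyzer address, interface name, and timings are assumptions, and it assumes the Terraform description from the previous section is already written:

```shell
#!/bin/sh
# Sketch of the end-to-end test sequence; endpoints, interface name and
# the rule-removal step are assumptions, not the author's exact script.
set -eu

ANALYZER=${ANALYZER:-http://localhost:8082}

# Pure helper: a traffic check passes only when the flow count is > 0.
has_flows() { [ "${1:-0}" -gt 0 ]; }

# Count ICMPv4 flows known to the analyzer (crude count on flow UUIDs).
icmp_flows() {
  curl -s -X POST "$ANALYZER/api/topology" \
    -H 'Content-Type: application/json' \
    -d '{"GremlinQuery": "G.Flows().Has('\''LayersPath'\'', '\''Ethernet/IPv4/ICMPv4'\'')"}' \
    | grep -c '"UUID"' || true
}

run_test() {
  # 1. Deploy the infrastructure described in the Terraform files.
  terraform apply

  # 2. Start a capture on sg_vm1's interface (interface name hypothetical).
  curl -s -X POST "$ANALYZER/api/capture" \
    -H 'Content-Type: application/json' \
    -d '{"GremlinQuery": "G.V().Has('\''Name'\'', '\''tapsg_vm1'\'')"}'

  # 3. Let the cloud-init ping generate some traffic.
  sleep 15

  # 4. The SG allows ICMP, so flows must be present.
  has_flows "$(icmp_flows)" || { echo "FAIL: no ICMP traffic"; exit 1; }

  # 5. Remove the ICMP rule (e.g. terraform apply with the rule deleted),
  #    wait, then verify that no new ICMP flows appear. Left as a comment
  #    because the removal depends on how the Terraform files are split.
  echo "PASS"
}

# Only talk to a real platform when explicitly requested.
if [ "${RUN_E2E:-0}" = "1" ]; then run_test; fi
```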
Bash probably isn't the best option for this, but it shows that with only a few lines we can get an end-to-end test. There is also no need for synchronization, and no need to contact the VMs directly, which keeps things simple: the VM gets its configuration and the commands to run from the Nova metadata service, and then only requests to Skydive are needed to ensure the traffic behaves as it should.
Result of the script:
Relying on powerful tools makes our lives easier, and our tests with them. Instead of developing a complete test framework in-house to do the same, we rely on tools that have good community support.
The glue between these tools is so simple that you could rewrite the last test with some other test framework in a day.
Finally, investing time in these tools is worthwhile because they are useful not only for tests but in a lot of other use cases, such as debugging production environments when a bug has slipped through the test CI!