Release v1.10.0
2023 Jul 18
NSM Release v1.10.0
NSM v1.10.0 has been tested on:
Changes since last release
Simplify healing - determinate reselect state for whole chain
Status: RESOLVED.
AWS: AF_XDP support
Root issue: https://github.com/networkservicemesh/cmd-forwarder-vpp/issues/859
AF_XDP socket doesn’t work on AWS cluster.
Logs:
Apr 3 13:24:25.406 [INFO] [cmd:vpp] libbpf: Kernel error message: veth: Peer MTU is too large to set XDP
Apr 3 13:24:25.406 [INFO] [cmd:vpp] vpp[10508]: af_xdp: af_xdp_load_program: bpf_set_link_xdp_fd(eth0) failed: Numerical result out of range
Apr 3 13:24:26.563 [ERRO] [cmd:/bin/forwarder] [duration:18.015838ms] [hostIfName:eth0] [vppapi:AfXdpCreate] VPPApiError: System call error #6 (-16)
panic: error: VPPApiError: System call error #6 (-16)
AKS: AF_XDP support
Root issue: https://github.com/networkservicemesh/cmd-forwarder-vpp/issues/859
AF_XDP socket doesn’t work on AKS cluster.
Ping works only without the hostNetwork: true flag.
GKE: AF_XDP support
Root issue: https://github.com/networkservicemesh/cmd-forwarder-vpp/issues/859
AF_XDP socket doesn’t work on GKE cluster.
Logs:
Apr 3 05:38:16.954 [INFO] [cmd:vpp] libbpf: Kernel error message: virtio_net: XDP expects header/data in single page, any_header_sg required
Apr 3 05:38:16.954 [INFO] [cmd:vpp] vpp[10244]: af_xdp: af_xdp_load_program: bpf_set_link_xdp_fd(eth0) failed: Invalid argument
Apr 3 05:38:18.228 [ERRO] [cmd:/bin/forwarder] [duration:12.809608ms] [hostIfName:eth0] [vppapi:AfXdpCreate] VPPApiError: System call error #6 (-16)
panic: error: VPPApiError: System call error #6 (-16)
Run all integration tests in parallel
Each integration test is launched in its own namespace, so tests launched simultaneously don’t interfere with each other. Running them in parallel reduces testing time on CI and also improves the coverage and quality of the project.
Results:
Suite | Old | New | Difference |
---|---|---|---|
Basic Suite | 659.59s | 192.82s | 360.78 % |
Feature Suite | 972.89s | 283.35s | 343.35 % |
Heal Suite | 1646.46s | 1667.61s | 0 % |
Ipsec Suite | 199.73s | 38.52s | 518.51 % |
Monolith Suite | 242.04s | 258.16s | 0 % |
Memory Suite | 146.37s | 37.42s | 391.15 % |
Observability Suite | 97.84s | 89.21s | 0 % |
Rvlan Suite | 1017.88s | 1018.52s | 0 % |
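The per-namespace isolation described above can be sketched roughly as follows. This is a minimal illustration, not the NSM test runner: the names `namespaceFor` and `runParallel`, the `ns-` prefix, and the test names are all hypothetical. In a real Go test suite, `t.Parallel()` would play the role of the goroutines here.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// namespaceFor derives a unique, lowercase namespace name for a test.
// The "ns-" prefix and naming scheme are assumptions for illustration.
func namespaceFor(testName string, id int) string {
	name := strings.ToLower(strings.ReplaceAll(testName, "_", "-"))
	return fmt.Sprintf("ns-%s-%d", name, id)
}

// runParallel starts every test in its own goroutine with its own
// namespace, waits for all of them, and returns errors keyed by test name.
func runParallel(tests map[string]func(namespace string) error) map[string]error {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]error, len(tests))
	)
	id := 0
	for name, run := range tests {
		id++
		wg.Add(1)
		go func(name, ns string, run func(string) error) {
			defer wg.Done()
			err := run(ns) // each test only touches resources in its namespace
			mu.Lock()
			results[name] = err
			mu.Unlock()
		}(name, namespaceFor(name, id), run)
	}
	wg.Wait()
	return results
}

func main() {
	// Hypothetical test bodies; real tests would deploy into the namespace.
	tests := map[string]func(string) error{
		"Example_A": func(ns string) error { fmt.Println("running in", ns); return nil },
		"Example_B": func(ns string) error { fmt.Println("running in", ns); return nil },
	}
	for name, err := range runParallel(tests) {
		fmt.Println(name, "error:", err)
	}
}
```

Because no state is shared between namespaces, the only synchronization needed in this sketch is around the result map.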
Missing arm64 Linux images
Add the ability to compile and run NSM on ARM-based machines.
Update go to v1.20.x
We need to update Go to v1.20.x to solve issues with dependencies such as opentelemetry and to get the benefits of the latest Go in NSM applications.
Generate code for running commands manually
Currently gotestmd generates only automatic tests.
A test has requirements, a main body, and a cleanup.
Users are expected to run tests using go test.
There is no support for running custom commands without integrating them into the testing system.
Generated commands could be re-used in performance testing to keep performance testing up-to-date.
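As a rough sketch of the idea, a generated example could expose its requirements, main body, and cleanup as plain shell commands instead of only as a Go test. The `Example` type, its fields, and the commands below are hypothetical and are not the gotestmd data model.

```go
package main

import (
	"fmt"
	"strings"
)

// Example is a hypothetical model of one generated example:
// the examples it requires, its main commands, and its cleanup commands.
type Example struct {
	Name     string
	Requires []*Example
	Run      []string
	Cleanup  []string
}

// setup returns the commands of all requirements followed by the main body.
func (e *Example) setup() []string {
	var cmds []string
	for _, r := range e.Requires {
		cmds = append(cmds, r.setup()...)
	}
	return append(cmds, e.Run...)
}

// teardown returns this example's cleanup followed by the cleanup of its
// requirements in reverse order, so dependencies are removed last.
func (e *Example) teardown() []string {
	cmds := append([]string{}, e.Cleanup...)
	for i := len(e.Requires) - 1; i >= 0; i-- {
		cmds = append(cmds, e.Requires[i].teardown()...)
	}
	return cmds
}

// Script renders the example as a shell script that can be run manually.
func (e *Example) Script() string {
	lines := []string{"#!/bin/sh", "set -e"}
	lines = append(lines, e.setup()...)
	lines = append(lines, e.teardown()...)
	return strings.Join(lines, "\n") + "\n"
}

func main() {
	// Paths are placeholders for illustration only.
	basic := &Example{
		Name:    "basic",
		Run:     []string{"kubectl apply -k ./basic"},
		Cleanup: []string{"kubectl delete -k ./basic"},
	}
	usecase := &Example{
		Name:     "usecase",
		Requires: []*Example{basic},
		Run:      []string{"kubectl apply -k ./usecase"},
		Cleanup:  []string{"kubectl delete -k ./usecase"},
	}
	fmt.Print(usecase.Script())
}
```

A script produced this way could be fed to a performance-testing harness, which would then pick up changes to the examples automatically.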
NSC - Add support for K8S PSS restricted/baseline profiles (for hostPath volumes)
Kubernetes 1.25 removes PSP (Pod Security Policy), which had been deprecated since 1.21, and relies on PSS (Pod Security Standards) enforced by the PSA (Pod Security Admission) controller, which is enabled by default. PSS defines three profiles: Privileged, Baseline, and Restricted. NSC requires hostPath volumes for unix sockets, but hostPath volumes are not permitted in the Baseline and Restricted profiles.
NSC needs a solution that works in the Baseline and Restricted profiles.
A couple of options:
- NSM CSI driver: Add an NSM CSI driver plugin to mount the hostPath volume.
- Use network sockets instead of unix sockets and eliminate the need for hostPath volume.
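To illustrate the second option, the sketch below shows how a listener could be opened on either a unix socket or a network socket depending on a configured target. The `unix://` and `tcp://` prefixes, the function names, and the addresses are assumptions for this example and do not reflect how NSM components are actually configured.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseTarget splits a listen target into the network and address
// expected by net.Listen. The URL-like prefixes are an assumed convention.
func parseTarget(target string) (network, address string, err error) {
	switch {
	case strings.HasPrefix(target, "unix://"):
		return "unix", strings.TrimPrefix(target, "unix://"), nil
	case strings.HasPrefix(target, "tcp://"):
		return "tcp", strings.TrimPrefix(target, "tcp://"), nil
	default:
		return "", "", fmt.Errorf("unsupported target %q", target)
	}
}

// listen opens a listener for the target. A unix target needs a shared
// filesystem path (hence the hostPath volume); a tcp target does not.
func listen(target string) (net.Listener, error) {
	network, address, err := parseTarget(target)
	if err != nil {
		return nil, err
	}
	return net.Listen(network, address)
}

func main() {
	// Port 0 asks the OS for any free port; the address is illustrative.
	l, err := listen("tcp://127.0.0.1:0")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer l.Close()
	fmt.Println("listening on", l.Addr().Network(), l.Addr())
}
```

With a network socket there is no file to share between pods, so no hostPath volume is needed. The trade-off is that file permissions no longer restrict access to the endpoint, so the connection has to be secured by other means.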
Use counters instead of histogram for datapath metrics
Counters are more appropriate than histograms for NSM data-path metrics.
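The reasoning can be shown with a small sketch: interface statistics such as received bytes are cumulative totals that only grow, which is what a counter models, whereas a histogram records the distribution of individual observations. The `Counter` type and the metric name below are hypothetical and use only the standard library; they are not the metrics API used by NSM.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Counter is a minimal monotonic counter that is safe for concurrent use.
type Counter struct {
	name  string
	value atomic.Uint64
}

// Add increases the counter by delta; a counter never decreases.
func (c *Counter) Add(delta uint64) {
	c.value.Add(delta)
}

// Value returns the current cumulative total.
func (c *Counter) Value() uint64 {
	return c.value.Load()
}

func main() {
	// Hypothetical metric name for bytes received on an interface.
	rxBytes := &Counter{name: "rx_bytes"}

	// Each report adds the bytes seen since the previous report.
	rxBytes.Add(1500)
	rxBytes.Add(9000)

	fmt.Println(rxBytes.name, rxBytes.Value())
}
```

A monitoring backend can derive rates from successive readings of such a total, which is usually what is wanted for traffic metrics.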
Automatically sync-up NSM Site
NSM site is not up to date.
TODO: Consider Hugo modules, which would allow us to ‘import’ docs from repos into site.
Replace govpp in NSM VPP apps with nsm/govpp
In NSM it’s very important to use the latest patches and to be able to quickly fix issues in VPP. For this reason we need to start using govpp from the NSM organization, which gives us better control over the VPP components in the project.
Add a new example where one NSC connects to two NSEs with different services
Status: RESOLVED.
System stability fixes and improvements
vL3 DNS is slow when using the DNS search path
Status: RESOLVED.
TestVl3_dns is not stable
Status: RESOLVED.
Make sure that NSM candidate selection is working as fast as possible
Status: RESOLVED.
Healing after failed refresh
Status: RESOLVED.
kubectl delete is slow in tests
Status: RESOLVED.
Policy Base Routing failure; Possible table ID collision.
Status: RESOLVED.
DNS resolution doesn’t work
Status: RESOLVED.
Interdomain tests are unstable
Status: RESOLVED.
cmd-admission-webhook-k8s generates client IDs properly only for Kind: Pod
cmd-admission-webhook-k8s generated client IDs properly only if client kind is Pod.
Kubernetes installation on Equinix Metal
Status: RESOLVED.
expire chain element with multiple registries
Status: RESOLVED.
How configuration change should be handled during healing?
Status: RESOLVED.
Community page stopped showing red arrows when PR from one repo to another failed (or not finished)
Add govpp arm64 ci for releases
Status: RESOLVED.
Missing log printouts in forwarder-vpp at DEBUG log level
Status: RESOLVED.
Docker push ghcr workflow works twice
These runs have the same tag - f316920
https://github.com/networkservicemesh/cmd-forwarder-vpp/actions/runs/5332639101/jobs/9662080619
https://github.com/networkservicemesh/cmd-forwarder-vpp/actions/runs/5332638380/jobs/9662079089
It looks like we don’t need the workflow_run: section:
https://github.com/networkservicemesh/cmd-template/blob/main/.github/workflows/docker-push-ghcr.yml
kernel NSC with multiple NetworkServices
Status: RESOLVED.
Release project board