FARM: Comprehensive Data Center Network Monitoring and Management
Jérôme Graf · Pavel Chuprikov · Patrick Eugster · Patrick Jahnke
Modern data centers face growing workloads, putting accrued pressure on network monitoring solutions necessary for ensuring correct and efficient operation. Advances in network programmability have meanwhile led to yet more monitoring data being straightforwardly collected from switches, exacerbating bottlenecks in corresponding collection-centric approaches. This limits scalability and responsiveness, especially when several monitoring tasks are deployed side-by-side, as is common for network management.
We present a novel and comprehensive selection-centric solution for network monitoring and management (M&M) called FARM that significantly simplifies the development and deployment of network M&M tasks while being effective and scalable. FARM’s main novelty lies in its comprehensive design. Instead of focusing solely on individual parts of network monitoring, FARM takes a global perspective on the problem and aligns all of its components correspondingly: a strongly decentralized software architecture, a specifically designed programming model, and an integrated performance optimization framework. In short, FARM performs monitoring (re)actions locally on switches to the extent possible, using centralized components only if and when needed, and globally optimizes placement, considering placement constraints intrinsically expressed through its programming model as well as commonalities among tasks.
Deployed in a production data center, FARM shows significant gains in responsiveness (up to 3427× faster over recent generic approaches and 4× faster over highly specialized solutions), savings in network bandwidth (10000×), and computational effort. Placement optimization shows excellent scalability up to 10200 seeds across 1040 switches.
Submitted
An Algebraic Language for Specifying Quantum Networks
Anita Buckley · Pavel Chuprikov · Rodrigo Otoni · Robert Soulé · Robert Rand · Patrick Eugster
Quantum networks connect quantum capable nodes in order to achieve capabilities that are impossible only using classical information. Their fundamental unit of communication is the Bell pair, which consists of two entangled quantum bits. Unfortunately, Bell pairs are fragile and difficult to transmit directly, necessitating a network of repeaters, along with software and hardware that can ensure the desired results. To this end, we developed BellKAT, a novel specification language for quantum networks based upon Kleene algebra. To cater to the specific needs of quantum networks, we designed an algebraic structure, called BellSKA, which we use as the basis of BellKAT’s denotational semantics. BellKAT’s constructs describe entanglement distribution rules that allow for modular specification. We give BellKAT a sound and complete equational theory, allowing us to verify network protocols. We provide a prototype tool to showcase the expressiveness of BellKAT and how to verify networks in practice.
Train Once Apply Anywhere: Effective Scheduling for Network Function Chains Running on FUMES
Marcel Blöcher · Nils Nedderhut · Pavel Churpikov · Ramin Khalili · Patrick Eugster · Lin Wang
The emergence of network function virtualization has enabled network function chaining as a flexible approach for building complex network services. However, the high degree of flexibility envisioned for orchestrating network function chains introduces several challenges to support dynamism in workloads and the environment necessary for their realization. Existing works mostly consider supporting dynamism by re-adjusting provisioning of network function instances, incurring reaction times that are prohibitively high in practice. Existing solutions to dynamic packet scheduling rely on centralized schedulers and a priori knowledge of traffic characteristics, and cannot handle changes in the environment like link failures. We fill this gap by presenting FUMES, a reinforcement learning based distributed agent design for the runtime scheduling problem of assigning packets undergoing treatment by network function chains to network function instances. Our system design consists of multiple distributed agents that cooperatively work on the scheduling problem. A key design choice enables agents, once trained, to be applicable for unknown chains and traffic patterns including branching, and different environments including link failures. The paper presents the system design and shows its suitability for realistic deployments. We empirically compare FUMES with state-of-the-art runtime scheduling solutions showing improved scheduling decisions at lower server capacity.
Towards an algebraic specification of quantum networks
Anita Buckley · Pavel Chuprikov · Rodrigo Otoni · Robert Rand · Robert Soulé · Patrick Eugster
The main attributes of quantum networks are the utilization of quantum phenomena, security guarantees, and availability of their main quantum resource—entanglement. The fundamental differences between classical and quantum information will require joint efforts in physics, engineering and computer science to make quantum networks functional and scalable. A common language must be established between the hardware and software community. We envision a foundational model for quantum network programming languages. Such a model should contain the essential constructs for programming quantum networks, allow for specification and verification of end-to-end entanglement distribution, and provide guidelines for composing network protocols.
Generalized policy-based non-interference for efficient confidentiality-preservation
Shamiek Mangipudi · Pavel Chuprikov · Patrick Eugster · Malte Viering · Savvas Savvides
As more organizations are leveraging third-party cloud and edge data centers to process data efficiently, the issue of preserving data privacy becomes increasingly important. In response, numerous security mechanisms have been introduced and promoted in recent years including software-based ones such as homomorphic encryption, as well as hardware-based ones such as Intel SGX and AMD SEV. However these mechanisms vary in their security properties, performance characteristics, availability, and application modalities, making it hard for programmers to judiciously choose and correctly employ the right one for a given data query.
This paper presents a mechanism-independent approach to distributed privacy-preserving data analytics. Our approach hinges on a core programming language which abstracts the intricacies of individual security mechanisms. Data is labeled using custom privacy levels arranged along a lattice in order to capture its exact privacy constraints. High-level mappings between available mechanisms and these labels are captured through a novel expressive form of security policy. Privacy is guaranteed through a type system based on a novel formulation of non-interference, generalized to support our security policy definition. Queries written in a largely security-agnostic subset of our language are transformed to the full language to automatically use mechanisms in an efficient, possibly combined manner, while provably preserving privacy in data queries end-to-end. We prototype our approach as an extension to the popular Apache Spark analytics engine, demonstrating the significant versatility and performance benefits of our approach over single hardwired mechanisms — including in existing systems — without compromising on privacy.
Live in the express lane
Patrick Jahnke · Vincent Riesop · Pierre-Louis Roman · Pavel Chuprikov · Patrick Eugster
We introduce Express-Lane (X-Lane), a novel system for mitigating interference in data center infrastructure to improve the liveness of coordination services. X-Lane follows a novel design from the ground up to achieve interactions with ultra-low latency in the single-digit microsecond range and jitter in the nanosecond range, while the remaining interaction is treated as usual. To show X-Lane’s applicability and genericity we implemented and evaluated two services atop it on commodity hardware in a production environment of SAP SE: a failure detector (X-FD) with detection time under 10 μs and a Raft implementation (X-Raft) with latencies under 20 μs. We further show the smooth integrability of X-Lane services by replacing the replication protocol of Redis with X-Raft, making it strongly consistent while improving latency 18x and write throughput 1.5x.
Competitive buffer management for packets with latency constraints
Alex Davydow · Pavel Chuprikov · Sergey Nikolenko · Kirill Kogan
Modern datacenters are increasingly required to deal with latency-sensitive applications. Incorporation of multiple traffic characteristics (e.g., packet values and required processing requirements) significantly increases the complexity of buffer management policies. In this context two major questions arise: how to represent the latency in desired objectives and how to provide guarantees for buffer management policies that would hold across a wide variety of traffic patterns. In this work, we consider a single queue buffering architecture, where every incoming packet is prepended with intrinsic value, required processing, and slack (offset from the arrival time during which this packet should be transmitted); the buffer size is implicitly bounded by slack values. Our goal is to maximize a total transmitted value (weighted throughput). In these settings, we study worst-case performance guarantees of the proposed online algorithms by means of competitive analysis whose effectiveness is compared versus an optimal clairvoyant offline algorithm. We show non-constant general lower bounds that hold for arbitrary slack values and for slacks that are additively separated from processing requirements; for the case of a multiplicative separation, we present a novel buffer management policy (stack with priority queue) and show that it is at most 3-competitive. Our theoretical results are supported by a comprehensive evaluation study on CAIDA traces.
Formalization and taxonomy of compute-aggregate problems for cloud computing applications
Pavel Chuprikov · Alex Davydow · Kirill Kogan · Sergey Nikolenko · Alexander Sirotkin
Efficient representation of data aggregations is a fundamental problem in modern big data applications, where network topologies and deployed routing and transport mechanisms play a fundamental role in optimizing desired objectives such as cost, latency, and others. In traditional networking, applications use TCP and UDP transports as a primary interface for implemented applications that hide the underlying network topology from end systems. On the flip side, to exploit network infrastructure in a better way, applications restore characteristics of the underlying network. In this work, we demonstrate that both specified extreme cases can be inefficient to optimize given objectives. We study the design principles of routing and transport infrastructure and identify extra information that can be used to improve implementations of compute-aggregate tasks. We build a taxonomy of compute-aggregate services unifying aggregation design principles, propose algorithms for each class, analyze them theoretically, and support our results with an extensive experimental study.
PREDICAT: efficient packet classification via prefix disjointness
Pavel Chuprikov · Vitalii Demianiuk · Sergey Gorinsky
While secure efficient operation of computer networks requires cost-effective line-rate packet classification, network programmability strengthens this need. A promising approach is to transform a packet classifier to a semantically equivalent representation that supports more effective classification. This paper explores transformation of ternary classifiers to equivalent prefix representations so that classification can benefit from efficient Longest Prefix Match solutions. We propose the property of prefix disjointness and design PREDICAT, a method that leverages this new property in combination with a variety of existing techniques to convert an arbitrary ternary classifier to an equivalent prefix representation. The paper analyzes prefix disjointness and evaluates PREDICAT against state-of-the-art transformation alternatives on a packet classification benchmark in regard to the number of lookups. The evaluation shows that PREDICAT outperforms a ternary-to-binary method by an order of magnitude, improves on another ternary-to-prefix solution by a factor from 2 to 5, and preforms similarly to a ternary-to-ternary approach that requires costly power-hungry Ternary Content-Addressable Memories to efficiently handle the resulting ternary representation.
SRPT-based congestion control for flows with unknown sizes
Kirill Kogan · Alex Davydow · Sergey Nikolenko · Pavel Chuprikov · Vitalii Demianiuk
Modern datacenter transports are required to support latency constraints, usually represented by some forms of flow completion time (FCT). Most implemented congestion control mechanisms that minimize FCT are based on shortest remaining processing time (SRPT) priorities. However, SRPT-based scheduling requires prior knowledge of flow sizes, making this discipline problematic in general. Non-SRPT-based alternatives such as LAS and PIAS are able to cope with this level of uncertainty but suffer from their own limitations: LAS can lead to significant starvation of concurrent elephant flows, while PIAS requires a centralized entity for correct settings. In this work, we generalize SRPT-based scheduling to allow flows with known and unknown sizes to sojourn at the same time. We not only show analytic properties of this generalization but rigorously prove important properties of non-SRPT alternatives with competitive analysis. Based on the proposed SRPT generalization, we introduce a new aSCC congestion control. Our main goal is not to propose yet another congestion control but to identify preferable and pathological traffic patterns with unknown flow sizes for various scheduling disciplines. Our observations are validated by an extensive evaluation study.
How to network delay-sensitive applications
Pavel Chuprikov · Kirill Kogan
In this paper we study design principles of congestion control supporting low-latency applications. In this regard, we show the attractiveness of a bounded-delay model satisfying latency constraints instead of optimizing them. Under this model, we propose two transformations automatically generating policies with finite buffers from policies with infinite buffers while keeping performance guarantees. These transformations allow us to reconsider design principles of congestion control, in particular, avoiding delivery of unnecessary traffic. We study the impact of different policy properties on required buffer sizing and at the extreme case define potential ways to build reliable transports without retransmissions. In addition, we build a taxonomy of management policies for various types of extra knowledge and propose another transformation kind constructing policies that optimize weighted goodput from policies optimizing throughput while keeping performance guarantees. Our analytic results are supported by extensive evaluations demonstrating attractiveness of the proposed design principles.
Towards declarative self-adapting buffer management
Pavel Chuprikov · Sergey Nikolenko · Kirill Kogan
Buffering architectures and policies for their efficient management are one of the core ingredients of network architecture. However, despite strong incentives to experiment with and deploy new policies, opportunities for changing or automatically choosing anything beyond a few parameters in a predefined set of behaviors still remain very limited. We introduce a novel buffer management framework based on machine learning approaches which automatically adapts to traffic conditions changing over time and requires only limited knowledge from network operators about the dynamics and optimality of desired behaviors. We validate and compare various design options with a comprehensive evaluation study.
New alternatives to optimize policy classifiers
Vitalii Demianiuk · Sergey Nikolenko · Pavel Chuprikov · Kirill Kogan
ACM/IEEE Trans. on Netw.
2020
Growing expressiveness of services increases the size of a manageable state at the network data plane. A service policy is an ordered set of classification patterns (classes) with actions; the same class can appear in multiple policies. Previous studies mostly concentrated on efficient representations of a single policy instance. In this work, we study space efficiency of multiple policies, cutting down a classifier size by sharing instances of classes between policies that contain them. In this paper we identify conditions for such sharing, propose efficient algorithms and analyze them analytically. The proposed representations can be deployed transparently on existing packet processing engines. Our results are supported by extensive evaluations.
New alternatives to optimize policy classifiers
Vitalii Demianiuk · Sergey Nikolenko · Pavel Chuprikov · Kirill Kogan
Growing expressiveness of services increases the size of a manageable state at the network data plane. A service policy is an ordered set of classification patterns (classes) with actions; the same class can appear in multiple policies. Previous studies mostly concentrated on efficient representations of a single policy instance. In this work, we study space efficiency of multiple policies, cutting down a classifier size by sharing instances of classes between policies that contain them. In this paper we identify conditions for such sharing, propose efficient algorithms and analyze them analytically. The proposed representations can be deployed transparently on existing packet processing engines. Our results are supported by extensive evaluations.
Formalizing compute-aggregate problems in cloud computing
Pavel Chuprikov · Alex Davydow · Kirill Kogan · Sergey Nikolenko · Alexander Sirotkin
Efficient representation of data aggregations is a fundamental problem in modern big data applications, where network topologies and deployed routing and transport mechanisms play a fundamental role to optimize desired objectives: cost, latency, and others. We study the design principles of routing and transport infrastructure and identify extra information that can be used to improve implementations of compute-aggregate tasks. We build a taxonomy of compute-aggregate services unifying aggregation design principles, propose algorithms for each class and analyze them.
Personal insights on three research directions in networked systems
Kirill Kogan · Sergey Nikolenko · Vitalii Demianiuk · Pavel Churpikov · Alex Davydow
In this work, we draw from our research and industry experience in the design of networked systems and define research directions that can simplify management and better exploit network infrastructure. We introduce problems both on data and control planes related to the design of a single network element and network-wide behaviors, concentrating on three directions: processing a single packet with packet classifiers, processing streams of packets in network switches, and network-wide control plane optimization. In particular, we consider efficient representations of packet classifiers, expressive implementations of buffer management policies, the composition of heterogeneous control planes, network virtualization, and extension of the network stack to support interactive applications. For the considered research directions outlined here, we formulate problems that, we believe, are important in the design of network systems. The purpose of this work is to attract system researchers to specific problems introduced in this paper.
How to implement complex policies on existing network infrastructure
Pavel Churpikov · Kirill Kogan · Sergey Nikolenko
Transport networks satisfy requests to forward data in a given topology. At the level of a network element, forwarding decisions are defined by flows. To implement desired data properties during forwarding, a network operator imposes economic models by applying policies to flows, ideally without dealing with underlying resource constraints. Policy splitting over multiple network elements under resource constraints is a hard optimization problem [6, 7]. We discuss limitations of the proposed methods and existing Boolean minimization techniques. The major contribution of this work is an optimal solution with linear time complexity at the price of a single bit forwarded in every packet. The results are supported by a comprehensive evaluation study that compares previous and currently proposed methods.
Priority queueing for packets with two characteristics
Pavel Churpikov · Sergey Nikolenko · Kirill Kogan
ACM/IEEE Trans. on Netw.
2018
Modern network elements are increasingly required to deal with heterogeneous traffic. Recent works consider processing policies for buffers that hold packets with different processing requirements (number of processing cycles needed before a packet can be transmitted out) but uniform value, aiming to maximize the throughput, i.e., the number of transmitted packets. Other developments deal with packets of varying value but uniform processing requirement (each packet requires one processing cycle); the objective here is to maximize the total transmitted value. In this paper, we consider a more general problem, combining packets with both nonuniform processing and nonuniform values in the same queue. We study the properties of various processing orders in this setting. We show that in the general case, natural processing policies have poor performance guarantees, with linear lower bounds on their competitive ratio. Moreover, we show several adversarial lower bounds for every priority queue and even for every online policy. On the positive side, in the special case when only two different values are allowed, 1 and V , we present a policy that achieves competitive ratio (1+(W+2/V)), where W is the maximal number of required processing cycles. We also consider copying costs during admission.
General ternary bit strings on commodity longest-prefix-match infrastructures
Pavel Churpikov · Kirill Kogan · Sergey Nikolenko
Ternary Content-Addressable Memory (TCAM) is a powerful tool to represent network services with line-rate lookup time. There are various software-based approaches to represent multi-field packet classifiers. Unfortunately, all of them either require exponential memory or apply additional constraints on field representations (e.g, prefixes or exact values) to have line-rate lookup time. In this work, we propose alternatives to tcam and introduce a novel approach to represent packet classifiers based on ternary bit strings (without constraining field representation) on commodity longest-prefix-match (LPM) infrastructures. These representations are built on a novel property, prefix reorderability, that defines how to transform an ordered set of ternary bit strings to prefixes with lpm priorities in linear memory. Our results are supported by evaluations on large-scale packet classifiers with real parameters from ClassBench; moreover, we have developed a prototype in P4 to support these types of transformations.
Planning in compute-aggregate problems as optimization problems on graphs
Pavel Chuprikov · Alex Davydow · Kirill Kogan · Sergey Nikolenko · Alexander Sirotkin
Efficient representation of data aggregations is a fun-damental problem in modern big data applications. We presenta formalization of compute-aggregate planning parameterized bythe aggregation function.
Throughput optimization with latency constraints
Alex Davydow · Pavel Chuprikov · Sergey Nikolenko · Kirill Kogan
Modern datacenters are increasingly required to deal with latency-sensitive applications. A major question here is how to represent latency in desired objectives. Incorporation of multiple traffic characteristics (e.g., packet values and required processing requirements) significantly increases the complexity of buffer management policies. In this work, we consider weighted throughput optimization (total transmitted value) in the setting where every incoming packet is branded with intrinsic value, required processing, and slack (an offset from the arrival time when a packet should be transmitted), and the buffer is unbounded but effectively bounded by slacks. The main result is a 3-competitive algorithm as the slack-to-work ratio increases. Our results supported by a comprehensive evaluation study on CAIDA network traces.
Verified operational transformation for trees
Sergey Sinchuk · Pavel Chuprikov · Konstantin Solomatov
Operational transformation (OT) is an approach to concurrency control in groupware editors first proposed by C. Ellis and S. Gibbs in 1989. Google Wave and Google Docs are examples of better known OT-based systems and there are many other experimental ones described in the literature. In their recent articles A. Imine et al. have shown that many OT implementations contain mistakes and do not possess claimed consistency properties.
The present work describes an experimental library which is based on SSReflect/Coq and contains several operational transformation algorithms and proofs of their correctness.
On demand elastic capacity planning for service auto-scaling
Pavel Churpikov · Sergey Nikolenko · Kirill Kogan
Cloud computing allows on demand elastic service scaling. The capability of a service to predict resource requirements for the next operational period defines how well it will exploit the elasticity of cloud computing in order to reduce operational costs. In this work, we consider a capacity planning process for service scale-out as an online pricing model. In particular, we study the impact of buffering service requests on revenues in various settings with allocation and maintenance costs. In addition, we analyze the incurred latency implied by buffering service requests. We believe that our insights will allow to significantly simplify predictions and mitigate the unknowns of future demands on resources.
Priority queueing with multiple packet characteristics
Pavel Churpikov · Sergey Nikolenko · Kirill Kogan
Modern network elements are increasingly required to deal with heterogeneous traffic. Recent works consider processing policies for buffers that hold packets with different processing requirement (number of processing cycles needed before a packet can be transmitted out) but uniform value, aiming to maximize the throughput, i.e., the number of transmitted packets. Other developments deal with packets of varying value but uniform processing requirement (each packet requires one processing cycle); the objective here is to maximize the total transmitted value. In this work, we consider a more general problem, combining packets with both nonuniform processing and nonuniform values in the same queue. We study the properties of various processing orders in this setting. We show that in the general case natural processing policies have poor performance guarantees, with linear lower bounds on their competitive ratio. Moreover, we show an adversarial lower bound that holds for every online policy. On the positive side, in the special case when only two different values are allowed, 1 and V, we present a policy that achieves competitive ratio (1+W+2/V), where W is the maximal number of required processing cycles. We also consider copying costs during admission.