Networks with Extended QoS


		Networks with Extended Quality of Service using Service Vectors
	Sponsor: National Science Foundation

	Findings:
		1. Decoupling End-to-End Provisioning The main feature of the new proposed QoS provisioning paradigm is that it decouples the end-to-end QoS provisioning from the service provisioning at each router, thus enhancing the end-to-end QoS granularity in the Diffserv network while maintaining the simplicity of the service implementation at each router. Based on our findings and observations it can be seen that the definitions and implementations of each service can remain the same at each core router as in the conventional Diffserv networks, and therefore the deployment of the proposed architecture can be incremental. Moreover, by providing finer end-to-end QoS, the network can provide better services to the user, and enhance its resource utilization by reducing the request dropping probability. Our numerical studies demonstrated that the proposed QoS provisioning architecture can have better service differentiation capability than the Intserv over Diffserv schemes through customization of the QoS offered to each user, reduce the request dropping probability, and provide a compatible and friendly networking environment for current TCP flows with end-to-end congestion control mechanisms. 2. QoS routing We have introduced and investigated a new problem referred to as the least hop(s) multiple additively constrained path (LHMACP) selection, which is NP-complete. We then proposed the k-shortest paths Extended Bellman-Ford (k-EB) algorithm to compute All Hops k-shortest Paths (AHKP) between a source and a destination. We have then developed an effective heuristic algorithm, based on k-EB, in finding a least hop path subject to multiple additive constraints with very low computational complexity; it achieves near 100% success ratio in finding a feasible path while minimizing its average hop count. By demonstrating that link state update schemes without considering the QoS requirements of connections cannot provide satisfactory solutions to real networks, we have introduced a novel link information update scheme, Routing-Oriented update SchEme (ROSE). ROSE has been demonstrated to greatly outperform the state of the art in terms of both protocol overhead and the accuracy of link state information. 3. Security We have proposed a network security framework, referred to as Security-enhanced Quality of Service (SQoS) with two major objectives. One objective is to offer the users elastic choices on the treatment of messages with appropriate security mechanisms with respect to their own QoS and budget requirements. Another objective is to facilitate interaction between security and QoS mechanisms in the most efficient manner by providing information to each other and performing tasks requested by each other. 4. Next-Generation Routers: Load-Balanced Combined Input-Crosspoint Buffered Switch We presented the effect of long round-trip times RTTs (RTT is the transmission delay from inputs to the core of the switch plus the transmission of flow control information from the core to the inputs in a packet switch), where the crosspoint buffer size k is such that k<RTT. We observed that switches based on buffered crossbars with dedicated crosspoint-buffer access have their maximum throughput as the ratio of k/RTT, when input ports handle a single flow with a data rate equal to the port capacity. A solution to keep the crosspoint buffer small while supporting long RTTs and high data rates is needed. We investigated a CICB switch that uses round-robin arbitration and match-based flow control under long round-trip times and high data-rate flows. We analyzed the throughput degradation as a function of the round-trip time and the crosspoint buffer size in buffered crossbar switches with dedicated connections and crosspoints. By considering the Birkhoff-Von Neumann switch, we proposed using an extra switching stage, as a load balancer, with a buffered crossbar to relax the crosspoint buffer size such that flows with high data rates can be handled when k<RTT. We show that this architecture supports high data-rate flows and RTT=kN, which results in a crosspoint-buffer of size 1/N of that in a switch without load-balancing stage. With the buffered crossbar switch with flexible access to crosspoint buffers, we created the general model where an input can decide to which crosspoint buffer a packet can be forwarded and compared to the load-balanced buffered crossbar switch. The latter switch was called as single-access switch as the load-balanced stage provides the subset of possible crosspoint buffers that can receive a packet from a given input. The results show that by increasing the number of possible crosspoint buffers for an input does not have an impact on the switch performance. In some cases, the results show that this increase of access possibilities also increases the level of contention, and therefore, the performance of the switch decreases. We plan for the following year to study the support of differentiated services and delay bounds under policed traffic under this switch. In the shared-memory switch, we found that this architecture can extend the round-trip time to twice the average crosspoint buffer size when the number of inputs is two to provide support in port-speed flows. With flows of smaller data rates, the switch provide high throughput for those with uniform distributions and slightly lower throughput than that of a switch with dedicated buffers for those with non-uniform distributions. The highlights of this switch architecture are that when compared to a shared-memory switch with speedup, our switch (with no speedup) matches the performance and therefore, makes speedup unnecessary. This reduces the cost for an actual implementation of shared-memory switches. When differentiated services are considered in the shared-memory switch, we observed that the results above hold. Therefore, the support for differentiated services does not require special features and additions to the proposed switches. Our next objective is to study traffic with quality of service parameters in this switch model. 5. Next Generation Routers: Analysis of Flow Control Mechanism as a Control System A lot of research work on AQM mechanisms has been done recently. Most of these research works focus on supporting congestion management for the transmission control protocol (TCP) flows. The objective of AQM mechanisms is to provide early congestion notification to the sources so they can reduce their sending rate to avoid packet loss produced by buffer overflow. We can consider a flow control mechanism where the parameter in observation is the queue occupancy. In this case, control theory can be applied to analyze and design queue management schemes. We analyzed an AQM-based flow-control mechanism for CICB switches. We investigated the stability margin by analyzing the relationship between the crosspoint buffer size and the round-trip time. We showed that when the ratio of the crosspoint-buffer size and the round-trip time is larger than or equal to 1/3 in our experiments, the flow control system is stable. This may provide a guideline on how to choose the crosspoint buffer size under non-negligible round trip times and under traffic with non-uniform distributions. Also, we improved the system's transient response time by almost 40% when adding an input shaper to the closed-loop system. 6. Wireless Networks We have obtained significant insights in energy performance of cooperative routing for ad hoc and ad hoc augmented hybrid networks. Our results indicate the cost of energy savings is increased delay, whose bounds and feasibility regions we have documented in detail. 7. Expanding the Explicit Endpoint Admission Control with Service Vector (EEAC-SV) scheme Before an end-to-end connection is setup, the traffic initiator sends a probing packet through the potential path, each router along the path marks the probing packet with its own QoS information. Our research has further defined the probing packet marking mechanism in such way that the QoS information marked in the probing packet yields the best chance for the traffic initiator to find the optimal route. 8. QoS Routing and Link State Update Our research discovered the accuracy of QoS Routing can be significantly improved with an appropriate link state update scheme. The history of users QoS requirements and network behavior is proven to be very useful knowledge in our design of QoS Routing and Link Sate Update for it allows us to analyze the QoS routing process with probabilistic approach. Our simulations have shown that a properly designed QoS Routing and Link State Update scheme can outperform the existing counterparts without increasing network overhead. 9. Including Security Metrics in the QoS Set Architecture and design: We have proposed an architecture that attempts to achieve two major objectives: to allow information sharing and cooperation between the security system and the QoS system, and to offer users a broad variety of security mechanisms enabled on their traffic. The network performance analysis is done by optimizing the utility functions so that the result maximizes the user's benefits while maintaining its budget and satisfying its QoS requirements. The system utility function is presented in order to optimize the overall network performance. Information assurance (IA): We have deliberated the threats that have been recently raised for the SQoS network. Several solutions have been proposed to countermeasure these threats. Our solutions provide information assurance in various aspects: authentication, authorization, access control, integrity, and confidentiality. 10. Pricing We are in progress to propose the pricing mechanism for the SQoS network. The pricing mechanism can be used as an access control mechanism that alleviate the congestion condition and enables the better network performance. 11. Crosslayer Design for QoS Support The analytic bound results in greedy forwarding can be used generically in ad hoc network design and performance evaluation. Additional constraints, such as power control, can be incorporated to study the nodal energy consumption performance. The initial results on power consumption modeling have given us new insights on the effects of node deployment, buffering, nodal sleeping behavior, relaying criteria, etc. on energy efficiency and quality-of-service performances. In fact, our work on receiver-side relay election criteria has been initiated based on the insights obtained from power consumption modeling. We believe that our results will lead to efficient ways to peer-to-peer wireless networks design. 12. Coprocessor Design The following results are obtained from Activity 8. D1: This design was written in the Verilog Hardware description Language and was synthesized using the Synplicity Pro 8.5 tool. It was mapped to the Xilinx Vertex II part: xc2v6000ff1517-6. The estimated frequency from synthesis was obtained as 158.8 MHz, and that from Place and Route was 140.805 MHz. The total number of clock cycles from the point the RTP checker module receives the information at the input to the point where the packet loss is calculated is seven. Hence, if a transmission rate of 10Gbps is assumed and the Ethernet frame size is 64 bytes, one packet will be transmitted in 51.2 ns. If each packet is processed in seven clock cycles, it will take 7.14 ns. Thus, the processing time is well within the limits of the transmission time. D2: This design was written in the Verilog Hardware description Language and was synthesized using the Synplicity Pro 8.5 tool. It was mapped to the Xilinx Vertex II part: xc2v6000ff1517-6. It was shown that the problem is tractable in the FPGA domain. D3: This design was written in the Verilog Hardware description Language and was synthesized using the Synplicity Pro 8.5 tool. It was mapped to the Xilinx Vertex II part: xc2v6000ff1517-6. The estimated frequency from synthesis was obtained as 162.3 MHz, and that from Place and Route was 132.573 MHz. There is no pre-processing of the packets; hence they will be available as such at the input to the FPGA. Assuming that the information is coming in at a rate of 10Gbps and each packet has a minimum of 64 bytes, the time for the transmission of a single packet will be about 52ns. If the FPGA is running at 130 MHZ, and has an 8-pin input bus for reading the packet, it will be reading one octet at each clock cycle. Hence, it will take about 24 clock cycles to find out if the packet belongs to the TCP/IP suite, and maybe another 30-40 cycles to get the required information from the TCP header. In the meantime, assuming that packets are coming in at 10 Gbps and have 64 bytes each, approximately 9 packets will have to be buffered by the time they can be forwarded to the FPGA. The input buffers in the design take care of storing the incoming packets. Our experimental results justify the implementation of the MPI primitives in hardware to support parallel programming in reconfigurable computing. Under continuous traffic, results for a Xilinx XC2V6000 FPGA show that the average transmission time per 32-bit word is about 1.35 clock cycles. 13. Network State Measurements Experiments were implemented to evaluate three hashing schemes for storing TCP state tuples. The results show that clustering in the hash table is maximal in linear hashing and least in double hashing. The principle of locality of reference is maximized in linear hashing, and minimum is double hashing. Quadratic hashing lies in between the two. Therefore, the number of iterations to find an empty location increases dramatically in linear hashing as the hash table gets filled up. The iterations are most evenly distributed in case of double hashing. 14. Network-based Grid Computing Our experiments and theoretical analysis demonstrated that our proposed scalability analysis framework is very robust and gives more accurate solutions than previous approaches for network-based computing environments. Efficiently using grid computing systems faces many challenges. One of them is the efficient exchange of data between distant components exacerbated by the diversity of existing protocols for communicating participants. Some protocols support communications efficiently but require pre-distributing some application-specific information to the participants. This increases the complexity of application development and deployment, and the process becomes cumbersome for large grids having substantial component diversity. Some other protocols can deal well with diversity but suffer from extremely high overheads. We proposed a communication protocol based on XML messages that combine the advantages of both kinds of protocols. It explores the self-contain property of XML to satisfy the requirement of easily integrating diverse participants while it uses compressed formats to encode numeric data in an effort to reduce overheads for information transmitted via the Internet. 15. Multiple Additively Constraints for QoS Networks We have provided a theoretical framework for selecting the cost function for multiple additively constrained QoS routing. We have found that by deploying our proposed coverage ratio as the metric for evaluating the cost function, it is advisable to use a linear cost function for the approach in which the cost of a path equals the sum of the costs of all individual links comprising the path. It should be specially noted that non-linear cost functions may be a better choice for the scenario where a path comprises only one link and, as a result, each path has only one hop. 16. Link-State Update for QoS Routing The proposed link state update policy can achieve much better performance than the three update policies, i.e., the proposed link state update policy achieves lower blocking probabilities with lower update rates than others, implying that this scheme for link state update is more practical than the equal and exponential class based link state update policies in terms of the update rate and false blocking probability of connections. 17. Improving Accuracy of Service Vectors with Packet Marking. FPGA-base realizations were designed. Experiments were implemented to evaluate three hashing schemes for storing TCP state tuples. The results show that clustering in the hash table is maximal in linear hashing and least in double hashing. The principle of locality of reference is maximized in linear hashing, and minimum is double hashing. Quadratic hashing lies in between the two. Therefore, the number of iterations to find an empty location increases dramatically in linear hashing as the hash table gets filled up. The iterations are most evenly distributed in case of double hashing. 18. Service Vectors for QoS in Wireless Networks A Service vector model for wireless ad hoc networks is developed. The proposed methodology integrates the service vector scheme at the network layer and a power minimizing delay bounded multi user scheduling approach at the data link layer. It has been demonstrated that this integrated approach facilitates considerable power savings, while at the same time achieves enhanced QoS granularity in wireless ad hoc networks. 19. Shared Memory Switch for Service Vector Networks We found that a shared-memory crosspoint-buffered switch outperforms a switch with dedicated crosspoint buffers (not shared) that has larger amounts of memory. In many of the cases observed, the shared memory switch supported the switching of a larger number of traffic classes by providing higher throughput. The switch required no speedup, which is desirable for the lag memory elements undergo when compared to switching speeds of logical-gate elements. 20. Switching and Replication of Multicast traffic The shared-memory crosspoint buffered switch shows a performance equivalent to that a combined input-crosspoint buffered switch for large switch sizes. This observation allows users to efficiently utilize memory and to reduce the total amount of memory in the buffered crossbar. This advantage is added to the property of extending the round-trip time of shared-memory switches. Furthermore, we have found that these properties are independent of using speedup in the memory elements. 21. Scaling up Scheduling Schemes for High Capacity Switches We tested our module-to-module matching scheme for input-queued Clos-network switch under different traffic patterns using computer simulation. We observed that our scheme provides high performance while using a single iteration between input and central modules. As module-to-module matching considers aggregated request, the performance of the switch is not impacted and the implementation or large size schedulers is made feasible and the resulting schedulers that are needed in this switch have the same maximum size as the largest switch module. 22. Active Probing for Measurement of Link State in QoS Networks We observed through experimentation in our testbed that existing tools for measurement of link state is rather unreliable as different methodologies may deliver different results. Our proposed scheme showed similar results to a professional measurement equipment we have available. The proposed scheme uses active probing or passive measurement. We observed that available bandwidth measurement requires large volume of packets with faster probing rate. Similarly, to diminish the monitoring load, passive measurement samples the incoming traffic. The periodic pattern samples the packets with deterministic intervals so as to simplify the process, but this causes the potential synchronization of the periodic behavior. Some characteristics of the traffic are thus lost. The random sampling such as Poisson or geometric distribution makes the length of intervals independent from each other, and yields unbiased samples.

	Back