BWS: Balanced Work Stealing for Time-Sharing Multicores

Xiaoning Ding
Intel Science and Technology Center for Cloud Computing


Running multithreaded programs in multicore systems has become a common practice for many application domains. Work stealing is an effective approach for managing and scheduling the concurrent tasks in such applications. It has been widely adopted in both commercial and open-source software, such as Cilk++, Intel TBB, Microsoft TPL, and Java Fork/Join Framework. In the talk, I will first show that existing work-stealing schedulers are not effective when multiple applications time-share a single multicore, because their management of steal-attempting threads often causes unbalanced system effects that hurt both workload throughput and fairness. Then, I will present BWS (Balanced Work Stealing), a work-stealing scheduler for time-sharing multicore systems that leverages new, lightweight operating system support. BWS improves system throughput and fairness via two means: 1) BWS monitors and balances the costs (resources consumed by steal-attempting threads) and benefits (available tasks get promptly offloaded from busy threads) in a scalable and low-cost way; 2) BWS allows a steal-attempting thread to yield its core directly to a peer thread with an unfinished task, so as to retain the core for that application and put it to better use. BWS shows great potential to increases system throughput (by up to 47%) and decreases unfairness (by up to 340%). We are making efforts to merge BWS into production work-stealing libraries and systems. I will present my future research at the end of the talk.