Comparing Column DBMSs with Row and Array DBMSs to Process Recursive Queries on Large Graphs

Carlos Ordonez
Houston University


Abstract

Analyzing graphs is a major problem in big data analytics. On the other hand, SQL recursive queries are a fundamental query mechanism to analyze graphs in a DBMS, whose optimization is significantly harder than traditional SPJ queries. Column DBMSs are a new class of faster DBMS, with significantly different storage and query processing mechanisms compared to row DBMSs, still the dominating DBMS technology. In this talk, we discuss the optimization of recursive queries on a column DBMS focusing on two fundamental and complementary graph problems: transitive closure and adjacency matrix multiplication. From a query processing perspective we consider the three fundamental relational operators: join, projection and selection, where projection subsumes SQL group-by aggregation. We compare recursive query processing on column, row and array DBMSs to analyze large graphs with different density and structure. We compare DBMS raw speed to evaluate recursive queries, as well as the relative impact of query optimizations within each DBMS. Our experiments show column DBMSs in general are much faster than row and array DBMSs to analyze graphs, regardless of their size, shape and connectivity. Well-known query optimizations used in row DBMSs still work well in a column DBMS. Finally, we show arrays DBMSs have an edge on dense graphs. In conclusion, column DBMSs represent a promising technology for graph analytics, competing with the ``Hadoop/noSQL" world for which there are many more systems.