Abstract | As a result of decades of research and industrial development, mod-ern query optimizers are complex software artifacts. However, the
quality of the query plan chosen by an optimizer is largely deter-
mined by the quality of the underlying statistical summaries. Small
selectivity estimation errors, propagated exponentially, can lead to
severely sub-optimal plans. Modern optimizers typically maintain
one-dimensional statistical summaries and make the attribute value
independence and join uniformity assumptions for efficiently esti-
mating selectivities. Therefore, selectivity estimation errors in to-
day’s optimizers are frequently caused by missed correlations be-
tween attributes. We present a selectivity estimation approach that
does not make the independence assumptions. By carefully using
concepts from the field of graphical models, we are able to fac-
tor the joint probability distribution of all the attributes in the data-
base into small, usually two-dimensional distributions. We describe
several optimizations that can make selectivity estimation highly
efficient, and we present a complete implementation inside Post-
greSQL’s query optimizer. Experimental results indicate an order
of magnitude better selectivity estimates, while keeping optimiza-
tion time in the range of tens of milliseconds.
|