Knowledge Base/Akiban Projects/Akiban Server

How does table-grouping compare to Oracle cluster tables?

Ari Weil
posted this on December 20, 2011 11:03

Question

Oracle cluster tables sound similar to Akiban table-grouping.  Are they the same - what's the difference? 

 

Answer

At a high level, Oracle cluster tables are designed to minimize disk IO and are optimized for only a few tables per cluster.  Conversely, Akiban table-groups are specifically tuned for efficiency across the latency (memory) stack, and are optimized for any number of grouped tables.  There are two major drawbacks to Oracle's solution relative to Akiban:

  1. You cannot cluster tables where the master key is not propagated to all children.  This means that in the simplest Customer / Order / Item scenario, if the Customer table's primary key is cid, then cid must exist in the Item table in order to create an Oracle table cluster.
  2. The entire Oracle cluster is (all blocks are) scanned - even if only one row is required.  For this reason, Oracle recommends that the tables involved either be small, have relatively static data, or always support a use case whereby all child data is read. 

Akiban is more than skin deep: contrasting nesting performance

Oracle cluster tables are optimized for queries that are one-level deep. In fact, for a 3-table join (when a query requests tables that are 2+ levels deep in the cluster hierarchy) the root table's clustering columns have to be propagated and explicitly maintained, at which point the scanning effect becomes prohibitively resource-intensive (and therefore typically slow). 

If an Oracle table cluster is composed of many tables, and especially if the relationships are branching and not hierarchical, cluster access requires full scans and therefore performs poorly. As a result, the solution is seldom used because of the poor performance in accessing, inserting or updating specific child rows (again because this requires the entire cluster to be scanned). 

Caveat implementor: comparing use cases

Oracle's documented guidelines to help implementors determine whether/when to cluster tables:

Use Case Oracle Cluster Tables Akiban Table-groupsTM
Frequently joined tables Recommended Recommended
Occasionally joined tables Not recommended due to high maintenance overhead Recommended
Dynamic (frequently updated) data Not recommended due to high maintenance overhead Recommended...and encouraged.
If full table scans are common Not recommended b/c full table scans on cluster tables are more expensive than on traditional tables Recommended b/c group indexes avoid table scans, and hKeys provide more efficient scanning when a scan is required.  Akiban also has Vertical indexing optimizations that facilitate high-performance and high-efficiency scanning of column data.
In master-detail scenarios Recommended in read-mostly scenarios, and if selecting all details for a master record (b/c achieved via a full scan which is inefficient for partial detail reads, and performance-killing for inserts, updates and deletes). Oracle recommends that detail tables be created in their own clusters to avoid the aforementioned scanning. Recommended b/c detail records are accessed via a highly-efficient hKey lookup (as opposed to Oracle's need to scan entire data blocks) for reads and writes. Akiban encourages you to add as many tables to a table-group as is appropriate for your use cases.
If tables are large, or many tables are frequently joined Not recommended b/c if the same cluster key value exceeds one or two data blocks, Oracle must read all blocks containing rows with the required value(s), and that is a significant IO penalty relative to standard relational table reads. Recommended...and encouraged.  Table-grouping becomes more effective the more data is added to a table-group b/c the effect of pre-computing the joins between tables, and avoiding costly scans (in favor of hKey lookups) increases with larger data sets.
If the number of rows per cluster key varies significantly Not recommended b/c this leads to a waste of space for low cardinality key values and collisions for the high cardinality key values (collisions degrade performance). Recommended because table-groups store interleaved data compressed for optimal retrieval and manipulation.  
 

Comments

User photo
Dong Jiang

How about comparing table grouping to materialized views?It seems a more suitable feature to compare.

If I consider the join of related tables as a view, then how is table-grouping different than materializing the view? compare with fast-refresh and complete-refresh? compare with pre-built option? compare with indexing on underlying tables?

July 30, 2012 13:40