In addition to the pdfs, for each dependent group of uncertain attributes present in the dependency sets,
we store its history. While dependency information (or
joint pdfs) express intra-tuple dependencies at the schema
level, history captures the inter-tuple dependencies at the
instance level. The history of a given set of uncertain attributes denotes the attribute sets from which it is derived,
and is used while performing operations on the tuples to
ensure that the results are consistent with PWS [18].
History captures dependencies among
attribute sets as a result of prior database operations. A similar concept is used in many tuple uncertainty
models to track correlations between tuples. [1] uses lineage
and [14] uses factor tables to capture such dependencies. As
we are interested in capturing historical dependencies between
attributes of tuples, our concept of dependencies is different
from this related work, which captures these dependencies on
a per tuple basis. We maintain the history of uncertain attributes by storing
the top-level ancestors of each dependency set in a tuple. The
function maps each pdf of a dependency set, to a set of pdfs
that are its ancestors, i.e. the base pdfs which are
inserted into the database by the user. We assume that the base
tuples are independent. All the derived attributes point back
to the base pdfs from which they are derived.
As an example, we show a base table and two derived tables and
in Figure 2. has two uncertain attributes and that are jointly distributed.
is derived by projecting on satisfying . is simply a projection of attribute on .
is a resulting table after joining and . As we can see from ,
the pair has a probability rather than
and the pair is not even
in the result. If we had looked at and only without checking their histories (both of them come from
and their attributes are correlated in ), we could have made mistakes about the resulting tuples after
joining the two tables. Since attribute and are historically jointly distributed, we need to
refer to the original distribution in when performing the join operation on and . The pair
has a probability according to the joint distribution in and the pair is not
a valid pair (cannot appear in any possible worlds) because it is not in the original distribution.
Figure 2:
Example illustrating the history
|
Note that the deletion of a base tuple will cause dependency
sets of its derived tuples to lose their ancestor information.
Thus, while deleting a tuple from the base table, we first
check if any other tuple in the database is referencing any
dependency set within the tuple. If there is a reference, we
delete the tuple but keep the dependency set and its pdf
as a phantom node until its reference count falls to zero.
Rohit Jain
2011-08-02