Referentially valid table listing from SQL Server

A regular task when working with test and development databases is loading test data into tables where referential constraints are present. Whilst you can often drop or suspend the constraints whilst populating tables it’s often just easier to work out a referentially valid order so that parent tables get populated before their children. This usually only applies to relatively small volumes of data because larger volumes are generally loaded via utilities which don’t go through the SQL processing engine and so ignore the referential constraints. In this case the constraints usually need to be rechecked to reactivate them before the tables can be used. The process also usually avoids logging the rows added so a backup is normally required as well.

For small volumes of data, populating tables with regular insert statements is generally easier but it does require tables to be populated in a referentially valid order.  This is something I do just about every day, copying fairly small amounts of test data around for the development team.  Years ago I wrote for DB2 a Perl script that extracts from the catalog a list of the tables in a given schema in a referentially valid order. What I didn’t have was an equivalent for SQL Server and a recent job I had needed such a list. The data model only consisted of about 120 tables but that was enough for me to write the following  function.

alter function riorder(@schema nvarchar(128))
returns @riorderTabs table
( RILevel     smallint not null
, tablename nvarchar(128) not null
)
as
begin
-- This function returns a table of RILevel and tablename for the specified schema
-- RILevel starts at 0 for tables with no RI Parents, and increases by 1 for each
-- 'generation' of tables.
-- It is possible to have "Referential loops", i.e. tabA is a parent of tabB, tabB
-- is a parent of tabC, tabC is a parent of tabA. This is unresolvable
DECLARE @schemaTabs table (tbname nvarchar(128));
DECLARE @level smallint;
DECLARE @tabsAtCurrentLevel table (tbname nvarchar(128));

-- start by getting al list of all the tables in the schema
insert into @schemaTabs
select TABLE_NAME
from information_schema.tables
where TABLE_SCHEMA = @schema
and table_type = 'BASE TABLE';

-- @level tracks the current RI level, we start at 0 for tables with no RI parent
set @level = 0;

-- We iterate through the levels until either
-- there are no tables left in @schemaTabs
-- or we find no tables at the current level (an
-- unresolvable referential loop)

declare @candidateTab as nvarchar(128);
declare candidateTabs_cursor cursor forward_only read_only for
select tbname from @schemaTabs;

declare @parentTablesFound as smallint;
declare @tablesAtCurrentLevel smallint;

-- set this to a sensible value so we drop into the main loop
select @tablesAtCurrentLevel = count(*) from @schemaTabs;
while @tablesAtCurrentLevel > 0
begin
   set @tablesAtCurrentLevel = 0;

   -- now we iterate for each table @schemaTabs
   -- and find which have no RI parent table in the @schemaTabs
   -- list of tables
   open candidateTabs_cursor;
   fetch next from candidateTabs_cursor
   into @candidateTab;
   while @@FETCH_STATUS = 0
   begin
      -- We are only interested in whether @candidateTab has parent tables
      -- in @schemaTabs, not what they are
       select @parentTablesFound = count(*)
       from INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS constr
         -- get the parent table
         inner join INFORMATION_SCHEMA.CONSTRAINT_TABLE_USAGE prnt
         on constr.UNIQUE_CONSTRAINT_CATALOG = prnt.CONSTRAINT_CATALOG
             and constr.UNIQUE_CONSTRAINT_SCHEMA = prnt.CONSTRAINT_SCHEMA
             and constr.UNIQUE_CONSTRAINT_NAME = prnt.CONSTRAINT_NAME
               -- get the child table
               inner join INFORMATION_SCHEMA.CONSTRAINT_TABLE_USAGE chld
               on constr.CONSTRAINT_CATALOG = chld.CONSTRAINT_CATALOG
               and constr.CONSTRAINT_SCHEMA = chld.CONSTRAINT_SCHEMA
               and  constr.CONSTRAINT_NAME = chld.CONSTRAINT_NAME
                     inner join @schemaTabs scTabs
                         on scTabs.tbname = prnt.TABLE_NAME
           where chld.TABLE_NAME = @candidateTab
           and chld.CONSTRAINT_SCHEMA = @schema;

           if @parentTablesFound = 0
           begin
              -- no parent tables in @schemaTabs so we note this table name.
                  -- We don't remove it from @schemaTabs yet because if we found
                  -- RI children of this table later in this loop we would think
                  -- these were at the current RI level
                  insert into @tabsAtCurrentLevel
                  values (@candidateTab);
                  set @tablesAtCurrentLevel = @tablesAtCurrentLevel + 1;
           end
           -- fetch next table
           fetch next from candidateTabs_cursor
       into @candidateTab;

   end
   close candidateTabs_cursor;
   -- Update the output list of tables for the current level
   insert into @riorderTabs select @level, tbname from @tabsAtCurrentLevel;
   -- Now we can remove the tables we just found
   -- from @schemaTabs
   delete from @schemaTabs where tbname in (select tbname from @tabsAtCurrentLevel);
   -- and clear down @tabsAtCurrentLevel
   delete from @tabsAtCurrentLevel;
   set @level = @level + 1;
end

return
end
go

Using the function is simply a matter of passing it the schema of the tables you are interested in:

select * from riorder('<schema>') order by RILevel, tablename
go

Because the first column is the ‘level’ you can get a reversed order (which is the valid order for deleting data) by using the DESCENDING  word in an ORDER BY clause.

select * from riorder('<schema>') order by RILevel, tablename order by RILevel descending
go

This script can fail to provide a complete answer if your data model contains a “referential loop”, i.e. table_A is a parent of table_B, table_B is a parent of table_C and table_C is a parent of table_A. This can happen perfectly legitimately but it’s generally a source of pain for your hardworking DBA trying to set up test data. If the above script finds this situation it recognises the loop and stops trying to go any further.

Advertisements

No comments yet

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: