SQL Server 2008: Change Data Capture (CDC)

Another one of the new features within SQL Server 2008 is the Change Data Capture (CDC).  CDC is designed to capture insert, update and delete activity on a SQL table and place the information into a separate relational table.  It uses an asynchronous capture mechanism that reads the transaction logs and populates the CDC table with the row’s data which change.  The CDC table mirrors the column structure of the tracked table, together with metadata regarding the change.  In later releases, there is also going to be a audit feature allowing you to audit who accesses what information and when, based more on select statements.

One of the posts on the forums details how to setup CDC, it’s also in books online. To enable CDC, the following two statements are required. Note, SQLServerAgent must be running for this to work.

use testDatabase
–Activate CDC
EXEC sys.sp_cdc_enable_db_change_data_capture
–IsDatabaseEnabled?
SELECT is_cdc_enabled FROM sys.databases WHERE name = ‘testDatabase’
–Enable CDC on table
EXEC sys.sp_cdc_enable_table_change_data_capture @source_schema = ‘dbo’, @source_name = ‘Table_1′, @role_name = ‘cdc_test’
–IsTableEnabled?
SELECT is_tracked_by_cdc FROM sys.tables WHERE name = ‘table_1′

The sp_cdc_enable_table… statement enables the tracking of the changes for a table, parameters are required for setting the source schema, the source name which is the table you want to track, and the role name, if the role does not exist it will create it for you.

After executing these statements you will have a number of new tables within your database.  The tables are:

  • cdc.captured_columns – Information about the captured columns being tracked.
  • cdc.change_tables – Tables being tracked.
  • cdc.ddl_history – Changes to the tracked table, details the command issued on the table.
  • cdc.index_columns – Guessing this is tracked indexed columns.
  • cdc.lsn_time_mapping – When a transaction starts and ends. Relates to rows in the tracked tables.

There is also a table for each table which is tracked, in my case I have just the one – cdc.dbo_Table_1_CT.  If we execute a change to the table, such as inserting a new record.

INSERT INTO Table_1 VALUES (1, ‘First Test’)

A new row is added to the table, with a record also being inserted into lsn_time_mapping.

__$start_lsn           __$end_lsn             __$seqval              __$operation __$update_mask id          TestMessage
0x0000001E0000006B001A NULL                   0x0000001E0000006B0018 2            0x03       1           First Test

If we update the table again using

UPDATE Table_1 SET TestMessage = ‘Updated First Test’ WHERE id = 1

Two records are inserted into the tracked table. One with the TestMessage ‘First Test’ (the original message) and the other with ‘Updated First Test’ both relating to a single transaction in the lsn_time_mapping table.  If we do

DELETE FROM Table_1 WHERE id = 1

We have a single row inserted into dbo_Table_1_CT saying which row was deleted.

Within the dbo_Table_1_CT there is a column __$operation.  This gives you the ID relating to the operation executed on the row.  2 = Insert. 3 = Before Update. 4 = Post Update. 1 = Delete.

If we change the design of the table while it is being tracked, a row is inserted into cdc.ddl_history with the command executed on the table.

source_object_id object_id   required_column_update ddl_command                    
—————- ———– ———————- ——————————–
1077578877       1093578934  0                      ALTER TABLE dbo.table_1 ADD Status nvarchar(50) NULL

However, if we execute another insert command on the table, the new column does not appear in the result set/dbo_table_1_ct.  It would be nice to keep the two insync and not have to worry about updating the table if it is being tracked. To get the tables back insync, I had to disable it (capture_instance parameter relates to the columns being captured I think) and re-enable the tracking. However, in doing so I lost all of the previous information – Boo.  The commands to do this are:

EXEC sys.sp_cdc_disable_table_change_data_capture @source_schema = ‘dbo’, @source_name = ‘Table_1′, @capture_instance = ‘all’
EXEC sys.sp_cdc_enable_table_change_data_capture @source_schema = ‘dbo’, @source_name = ‘Table_1′, @role_name = ‘cdc_test’

In case you are worried about this using up too much space, there is a sys.sp_cdc_cleanup_change_table stored procedure which can be used to remove entries up to a point in time.  You could then link this to a Job to archive the information, or simply report and remove.

I think that pretty much covers it. Another really useful feature, I have saw one or two attempts at auditing database activities like this and they are a bit hit and miss, so this will solve a lot of heartache. Looking forward to seeing what happens with the future auditing features.

One concern:  If a table is tagged as encrypted, is the cdc table also encrypted?  Will have to research this at a later date.

SQL BOL: ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_1devconc/html/7d514b05-7c4b-4ef2-8f16-b720df3bf44a.htm

0 thoughts on “SQL Server 2008: Change Data Capture (CDC)”

  1. Thatz a good one! But I am some issues when i am trying to execute the stored produre to enable CDC.
    It gives me error saying such a procedure does not exists. Can you please suggest me solution for this?

    Cheers
    Shankar

  2. Help please :)

    When I log into my machine using my domain account and create a database, I could not execute the stored procedures mentioned. However, if my local user account created the database, I do not encounter any problem.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>