Background


Recently, I have been organizing my blog data and needed to perform an operation to merge article categories. On my blog, articles and categories have a many-to-many relationship. That is, one article can belong to multiple categories, and one category can contain multiple articles. This is a very typical many-to-many relationship, and I use a many-to-many table with a composite primary key to associate this data.

What I need to do is move articles from the "DotNetBeginner" category to the "CSharpAndDotNet" category. However, since some of the articles originally in the "DotNetBeginner" category also belong to "CSharpAndDotNet", directly updating the association table would result in duplicate composite keys, which would cause issues.

To put it more intuitively, write an SQL statement to query the data from the original category (DotNetBeginner) and the target category (CSharpAndDotNet):

DECLARE @SourceCatId     AS UNIQUEIDENTIFIER,
        @TargetCatId     AS UNIQUEIDENTIFIER
 
SET @SourceCatId = '678A4011-40E0-4F75-BC23-1FFC25B72D4A'
SET @TargetCatId = 'D58043FF-B3CB-43DA-9067-522D76D21BE3'

SELECT p.Title,
       c.Name
FROM   PostCategory         AS pc
       INNER JOIN Post      AS p
            ON  p.Id = pc.PostId
       INNER JOIN Category  AS c
            ON  c.Id = pc.CategoryId
WHERE  pc.CategoryId IN (@SourceCatId, @TargetCatId)

The result is shown in the following figure:

The article "C#字符和ASCII码互转" will encounter issues if directly updated to the CSharpAndDotNet category because there is already an existing record in the PostCategory table. However, an article like "上海轨道交通……" can be directly updated without any issues.

Solution


The approach to solve this problem is divided into two steps:

  1. Delete records that will break the composite primary key, such as those in "C#字符和ASCII码互转". 
  2. Update the related table, changing the ID of the old category to the new category.

First, we need to identify which records meet the criteria for deletion. Find the IDs of these articles and use a GROUP BY HAVING clause to query them.

SELECT pc.PostId
FROM   PostCategory AS pc
WHERE  pc.CategoryId IN (@SourceCatId, @TargetCatId)
GROUP BY
       pc.PostId
HAVING COUNT(*) >= 2

The results are 10 items:

These 10 articles simultaneously belong to the DotNetBeginner, CSharpAndDotNet, and other categories.

To display intuitively and facilitate subsequent deletion operations, define a table variable called temp, and then verify which articles they are.

DECLARE @Temp TABLE (PostId UNIQUEIDENTIFIER)

INSERT INTO @Temp
  (
    PostId
  )(
       SELECT pc.PostId
       FROM   PostCategory AS pc
       WHERE  pc.CategoryId IN (@SourceCatId, @TargetCatId)
       GROUP BY
              pc.PostId
       HAVING COUNT(*) >= 2
   )
------------------------------------------------------------------------------
SELECT --pc.PostId
       p.Title,
       --pc.CategoryId,
       c.Name
FROM   PostCategory         AS pc
       INNER JOIN Post      AS p
            ON  p.Id = pc.PostId
       INNER JOIN Category  AS c
            ON  c.Id = pc.CategoryId
WHERE  pc.PostId IN (SELECT t.PostId
                     FROM   @Temp t)

The result was found to be correct:

Then, you can delete all records where the PostId in the @Temp table matches from the PostCategory association table, and the CategoryId corresponds to DotNetBeginner. After that, use an update statement to complete the merging of article categories.

-- Step 1. Delete records that will fuck up the primary key
DELETE 
FROM   PostCategory
WHERE  CategoryId = @SourceCatId
       AND PostId IN (SELECT t.PostId
                      FROM   @Temp t)

-- Step 2. Update old key to new key
UPDATE PostCategory
SET    CategoryId     = @TargetCatId
WHERE  CategoryId     = @SourceCatId

Finally, let's verify that the data has been successfully merged.