A system needed to be transferred in a hurry. The only way to extract data from that system was via csv files. The data was dirty meaning fields were not consistent across different export files (extra space and unprintable characters in id fields, additional information in text fields, date incongruities, etc.) .
The software used here is:
- Excel
- Visual Studio 2017 SSDT
The first step was to save the csv files as excel macro enabled worksheets so I could do some basic formatting within the excel files. Then I enabled the developer tab. I used the default macro to write some code to do the basic formatting. I chose to do this formatting within the excel file rather than in SSIS because it just seemed simpler. I explored ways to do this via SSIS but the benefit cost analysis deemed just getting it done more viable. As any programmer knows the smallest tasks can turn into some deep rabbit holes.
Here is an example of one macro for one excel file.
- It sets the source range so it doesn’t search thousands of empty rows/columns.
- Deletes empty rows and header rows
- it merges rows with duplicate ids (there might be three rows for one id with separate text in say the C cell. This would combine them into the first row separated by a pipe character and delete the rows with duplicate ids).
Example Excel Macro
Sub testing()
Application.EnableEvents = False
On Error Resume Next
'Activate PowerPoint Library (version 2.9)
ThisWorkbook.VBProject.References.AddFromGuid _
GUID:="{3F4DACA7-160D-11D2-A8E9-00104B365C9F}", Major:=2, Minor:=9
'Reset Error Handler
On Error GoTo 0
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.EnableEvents = False
Dim SourceRange As Range
Dim EntireRow As Range
Dim iCntr As Long
Dim previousrow As Long
Dim finalstring As String
For Each Sh In ThisWorkbook.Worksheets
Set SourceRange = Sh.Range("A1:M" + CStr(Sh.UsedRange.Rows.Count))
For iCntr = SourceRange.Rows.Count To 1 Step -1
Set EntireRow = SourceRange.Cells(iCntr, 1).EntireRow
If WorksheetFunction.CountA(EntireRow) = 0 Then
EntireRow.Delete
If iCntr Mod 5000 = 0 Then Debug.Print Sh.Name + " | " + CStr(iCntr)
End If
Next
For iCntr = SourceRange.Rows.Count To 1 Step -1
If Mid(SourceRange.Cells(iCntr, 1).Value, 1, 9) = "something:" Then
Set EntireRow = SourceRange.Cells(iCntr, 1).EntireRow
EntireRow.Delete
End If
If Mid(SourceRange.Cells(iCntr, 1).Value, 1, 8) = "somethingelse" Then
Set EntireRow = SourceRange.Cells(iCntr, 1).EntireRow
EntireRow.Delete
End If
Next
For iCntr = SourceRange.Rows.Count To 1 Step -1
previousrow = iCntr - 1
If iCntr > 1 Then
If SourceRange.Cells(iCntr, 1).Value = "name1" Or SourceRange.Cells(iCntr, 1).Value = "cnelson" Or SourceRange.Cells(iCntr, 1).Value = "name2" Or SourceRange.Cells(iCntr, 1).Value = "name3" Or SourceRange.Cells(iCntr, 1).Value = "name4" Or SourceRange.Cells(iCntr, 1).Value = "name5" Or SourceRange.Cells(iCntr, 1).Value = "name6" Or SourceRange.Cells(iCntr, 1).Value = "name7" Or SourceRange.Cells(iCntr, 1).Value = "name8" Then
If IsEmpty(SourceRange.Range("B" + CStr(iCntr)).Value) Then
finalstring = ""
Else
finalstring = CStr(SourceRange.Range("B" + CStr(iCntr)).Value)
End If
If IsEmpty(SourceRange.Range("Q" + CStr(iCntr)).Value) Then
previousstring = ""
Else
previousstring = CStr(SourceRange.Range("Q" + CStr(iCntr)).Value)
End If
SourceRange.Range("Q" + CStr(previousrow)).Value = previousstring + " | " + finalstring
finalstring = ""
previousstring = ""
If IsEmpty(SourceRange.Range("C" + CStr(iCntr)).Value) Then
finalstring = ""
Else
finalstring = CStr(SourceRange.Range("C" + CStr(iCntr)).Value)
End If
If IsEmpty(SourceRange.Range("R" + CStr(iCntr)).Value) Then
previousstring = ""
Else
previousstring = CStr(SourceRange.Range("R" + CStr(iCntr)).Value)
End If
SourceRange.Range("R" + CStr(previousrow)).Value = previousstring + " | " + finalstring
finalstring = ""
previousstring = ""
Set EntireRow = SourceRange.Cells(iCntr, 1).EntireRow
EntireRow.Delete
iCntr = iCntr - 1
End If
End If
Next
Next Sh
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
Application.EnableEvents = True
End Sub
SSIS Control Flow
The control flow dictates eye from the sky steps in the migration process. Since I have several different excel workbooks importing into several different tables, you can see there are lots of sql tasks to truncate those tables before new insertions. Also there are separate tasks for a bit more housecleaning of the data and a few steps that can be run stand alone.
I included all the excel files and the error files in the miscellaneous directory to centralize operations. You can see under the connection managers the connections for each file.

SSIS Data Flow
This is the busiest data flow as it is the all in one import. Each task is made up of
- a source call with an accompanying error file
- a data conversion section since excel data types do not translate directly to sql data types
- a database destination with accompanying error file
There are other data flows some of which are a single transaction so I can just do a single import and some of which are more cleaning routines.

TSQL Final Cleanup and Import
After all the data has been moved into the separate tables, I created a final tsql project to do some final formats to the data, standardize it, and put it into a temporary table that mimics the production table.
This import was a bit of a unique scenario as I was forced to move the data into a legacy application that used an access database for updates. I am in the process of moving that application to the web but have only gotten the front-end up for search, not the back-end side for updates. But that can be in a separate post!
So here is a sample of the final SQL. This could have been put into the SSIS package as well but since it was a run once and done type scenario I just ran it from SQL Server Management Studio.
--------------------------------------------------------
;WITH CTE7 (ALTID,LOCNUMBER,LOCSTREET) AS (
SELECT ALTID, CASE WHEN ISNUMERIC(SUBSTRING(ISNULL([Address],''),0,CHARINDEX(' ',ISNULL([Address],'')))) = 1 THEN SUBSTRING(ISNULL([Address],''),0,CHARINDEX(' ',ISNULL([Address],''))) ELSE '' END, CASE WHEN ISNUMERIC(SUBSTRING(ISNULL([Address],''),0,CHARINDEX(' ',ISNULL([Address],'')))) = 1 THEN SUBSTRING(ISNULL([Address],''),CHARINDEX(' ',ISNULL([Address],''))+1,LEN(ISNULL([Address],''))) ELSE ISNULL([Address],'') END
FROM [dbo].[VNewThings]
)
UPDATE [Tbl test]
SET [Location Number] = LOCNUMBER,
[Location Street] = LOCSTREET
FROM CTE7
WHERE CTE7.ALTID = [Tbl test].[Thing Number]
--------------------------------------------------------
--GET ONLY DISTINCT THINGS PER RECORD
--------------------------------------------------------
INSERT INTO DBO.TEMPTHINGS
SELECT ISNULL(ALTID,''),THINGS,'','','','THING2',CHARINDEX('|',SUBSTRING(ISNULL(THINGS,''),3,LEN(THINGS)-3)),0,0
FROM [dbo].[VNewCitations] VNC2
WHERE SUBSTRING(LTRIM(RTRIM(REPLACE(ISNULL(THINGS,''),'|',''))),3,LEN(THINGS)-3) <> ''
AND ISNULL(ALTID,'') <> ''
UPDATE DBO.TEMPTHINGS
SET SECONDSEPARATOR = CHARINDEX('|',SUBSTRING(ISNULL(DBO.TEMPTHINGS.originalThing,''),DBO.TEMPTHINGS.FIRSTSEPARATOR+3,LEN(DBO.TEMPTHINGS.originalThing)))
FROM DBO.TEMPTHINGS TV2
INNER JOIN DBO.TEMPTHINGS ON TV2.ID = DBO.TEMPTHINGS.ID
UPDATE DBO.TEMPTHINGS
SET Thing1 = LTRIM(RTRIM(REPLACE(SUBSTRING(DBO.TEMPTHINGS.ORIGINALThing,3,COALESCE(NULLIF(DBO.TEMPTHINGS.FIRSTSEPARATOR,0),LEN(DBO.TEMPTHINGS.ORIGINALThing))),'|','')))
FROM DBO.TEMPTHINGS TV2
INNER JOIN DBO.TEMPTHINGS ON TV2.ID = DBO.TEMPTHINGS.ID
UPDATE DBO.TEMPTHINGS
SET Thing2 = CASE WHEN DBO.TEMPTHINGS.FIRSTSEPARATOR <> 0 AND DBO.TEMPTHINGS.SECONDSEPARATOR <> 0 THEN LTRIM(RTRIM(REPLACE(SUBSTRING(DBO.TEMPTHINGS.ORIGINALThing,DBO.TEMPTHINGS.FIRSTSEPARATOR+1,DBO.TEMPTHINGS.SECONDSEPARATOR),'|','')))
WHEN DBO.TEMPTHINGS.FIRSTSEPARATOR <> 0 AND DBO.TEMPTHINGS.SECONDSEPARATOR = 0 THEN LTRIM(RTRIM(REPLACE(SUBSTRING(DBO.TEMPTHINGS.ORIGINALThing,DBO.TEMPTHINGS.FIRSTSEPARATOR+1,LEN(DBO.TEMPTHINGS.originalThing)),'|',''))) END
FROM DBO.TEMPTHINGS TV2
INNER JOIN DBO.TEMPTHINGS ON TV2.ID = DBO.TEMPTHINGS.ID
UPDATE DBO.TEMPTHINGS
SET Thing3 = CASE WHEN DBO.TEMPTHINGS.SECONDSEPARATOR <> 0 THEN LTRIM(RTRIM(REPLACE(SUBSTRING(DBO.TEMPTHINGS.ORIGINALThing,DBO.TEMPTHINGS.FIRSTSEPARATOR + DBO.TEMPTHINGS.SECONDSEPARATOR,LEN(DBO.TEMPTHINGS.originalThing)),'|',''))) END
FROM DBO.TEMPTHINGS TV2
INNER JOIN DBO.TEMPTHINGS ON TV2.ID = DBO.TEMPTHINGS.ID
UPDATE [Tbl test]
SET [O1] = Thing1
FROM TEMPTHINGS
WHERE TEMPTHINGS.ALTID = [Tbl test].[Thing Number]
UPDATE [Tbl test]
SET [O2] = Thing2
FROM TEMPTHINGS
WHERE TEMPTHINGS.ALTID = [Tbl test].[Thing Number]
UPDATE [Tbl test]
SET [O3] = Thing3
FROM TEMPTHINGS
WHERE TEMPTHINGS.ALTID = [Tbl test].[Thing Number]
----------------------------------------------------------
UPDATE [Tbl test]
SET [void]= 'void'
FROM [dbo].[VNewVoids] WHERE LTRIM(RTRIM([VNewVS].ALTID)) = LTRIM(RTRIM([Tbl test].[Number]))
----------------------------------------------------------
UPDATE [Tbl test]
SET [Warning]= 'warning'
FROM [dbo].[VNewWarnings] WHERE LTRIM(RTRIM([VNewWs].ALTID)) = LTRIM(RTRIM([Tbl test].[Number]))
----------------------------------------------------------
UPDATE [Tbl test]
SET [Write Off] = 'Write Off'
FROM [dbo].[VNewWriteOffs] WHERE LTRIM(RTRIM([VNewWFs].ALTID)) = LTRIM(RTRIM([Tbl test].[Number]))
![]()
Leave a Reply