How to upload CSV file to SQL Server with SSIS

0

Hello dear, I hope you can help me.

I am developing a workflow in SSIS to load Excel 2010 files into a table in SQL Server 2008-R2, many of the fields in Excel have text of more than 255 characters and are sometimes truncated in that value, the solution that It is recommended as I researched is to do 01 additional step and move from Excel to CSV and then do the load. The 02 drawbacks that arise are the following: 1. After searching the net I found and adapted a script that helps me to convert, but in some cases it generates line breaks spoiling the row, this even though I see that it makes the exception with "\ n" and there the workflow stops (apparently that's the problem). 2. I have struggled a lot to try to identify the encoding of the generated CSV file since sometimes it does not represent "ñ" or accents or separates the commas in a single field (this within the generated connection string) and the that fits more is "437 (OEM - United States)".

  • Someone knows some other Script that I can use to convert to CSV.
  • How to identify the correct coding of the CSV file so that it represents the characters as they are.

Enclosed Script that I am using for conversion, it is in C # and I use VB if they have it there in that language I would understand it.

/*
       Microsoft SQL Server Integration Services Script Task
       Write scripts using Microsoft Visual C# 2008.
       The ScriptMain is the entry point class of the script.
    */
    using System;
    using System.Data;
    using Microsoft.SqlServer.Dts.Runtime;
    using System.Windows.Forms;
    using Excel = Microsoft.Office.Interop.Excel;
    using Microsoft.Office.Interop.Excel;
    namespace ST_6dc747ff29cf41c6ac11b7c0bca33d19.csproj
    {
        [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
        public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
        {
            #region VSTA generated code
            enum ScriptResults
            {
                Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
                Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
            };
            #endregion
            /*
      The execution engine calls this method when the task executes.
      To access the object model, use the Dts property. Connections, variables, events,
      and logging features are available as members of the Dts property as shown in the following examples.
      To reference a variable, call Dts.Variables["MyCaseSensitiveVariableName"].Value;
      To post a log entry, call Dts.Log("This is my log text", 999, null);
      To fire an event, call Dts.Events.FireInformation(99, "test", "hit the help message", "", 0, true);
      To use the connections collection use something like the following:
      ConnectionManager cm = Dts.Connections.Add("OLEDB");
      cm.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorks;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;";
      Before returning from this method, set the value of Dts.TaskResult to indicate success or failure.
      
      To open Help, press F1.
     */
            private static Workbook mWorkBook;
            private static Sheets mWorkSheets;
            private static Worksheet mWSheet1;
            private static Excel.Application oXL;
            private static string ErrorMessage = string.Empty;
            public void Main()
            {
                try
                {
                    string sourceExcelPathAndName = @"ruta\aaa.xlsx";
                    string targetCSVPathAndName = @"ruta\aaa.csv";
                    string excelSheetName = @"Sheet0";
                    string columnDelimeter = @"|#|";
                    int headerRowsToSkip = 2;
                    if (ConvertExcelToCSV(sourceExcelPathAndName, targetCSVPathAndName, excelSheetName, columnDelimeter, headerRowsToSkip) == true)
                    {
                        Dts.TaskResult = (int)ScriptResults.Success;
                    }
                    else
                    {
                        Dts.TaskResult = (int)ScriptResults.Failure;
                    }
                }
                catch (Exception ex)
                {
                    Dts.TaskResult = (int)ScriptResults.Failure;
                }
            }
            public static bool ConvertExcelToCSV(string sourceExcelPathAndName, string targetCSVPathAndName, string excelSheetName, string columnDelimeter, int headerRowsToSkip)
            {
                try
                {
                    oXL = new Excel.Application();
                    oXL.Visible = false;
                    oXL.DisplayAlerts = false;
                    Excel.Workbooks workbooks = oXL.Workbooks;
                    mWorkBook = workbooks.Open(sourceExcelPathAndName, 0, false, 5, "", "", false, XlPlatform.xlWindows, "", true, false, 0, true, false, false);
                    //Get all the sheets in the workbook
                    mWorkSheets = mWorkBook.Worksheets;
                    //Get the specified sheet
                    mWSheet1 = (Worksheet)mWorkSheets.get_Item(excelSheetName);
                    Excel.Range range = mWSheet1.UsedRange;
                    //deleting the specified number of rows from the top
                    Excel.Range rngCurrentRow;
                    for (int i = 0; i < headerRowsToSkip; i++)
                    {
                        rngCurrentRow = range.get_Range("A1", Type.Missing).EntireRow;
                        rngCurrentRow.Delete(XlDeleteShiftDirection.xlShiftUp);
                    }
                    //replacing ENTER with a space
                    range.Replace("\n", " ", Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
                    //replacing COMMA with the column delimeter
                    range.Replace(",", columnDelimeter, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing);
                    mWorkBook.SaveAs(targetCSVPathAndName, XlFileFormat.xlCSVMSDOS,
                    Type.Missing, Type.Missing, Type.Missing, Type.Missing, Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlExclusive,
                    Type.Missing, Type.Missing, Type.Missing,
                    Type.Missing, false);
                    return true;
                }
                catch (Exception ex)
                {
                    ErrorMessage = ex.ToString();
                    return false;
                }
                finally
                {
                    if (mWSheet1 != null) mWSheet1 = null;
                    if (mWorkBook != null) mWorkBook.Close(Type.Missing, Type.Missing, Type.Missing);
                    if (mWorkBook != null) mWorkBook = null;
                    if (oXL != null) oXL.Quit();
                    System.Runtime.InteropServices.Marshal.ReleaseComObject(oXL);
                    if (oXL != null) oXL = null;
                    GC.WaitForPendingFinalizers();
                    GC.Collect();
                    GC.WaitForPendingFinalizers();
                    GC.Collect();
                }
            }
        }
    }
    
asked by Carlos_JPQ 07.11.2018 в 00:32
source

1 answer

1

I will respond more or less to the second question you ask:

  

How to identify the correct coding of the CSV file so that it represents the characters as they are.

But I will do it by making a change in your script, and explaining part of the problem.

I think your mistake is that, in the script that you publish, you are asking Excel to save the CSV in DOS format (I guess it uses ASCII not extended, but I do not really know it and I do not have a way to test it now).

The Excel.XlFileFormat enumeration has the following flavors of csv:

  • xlCSV
  • xlCSVMac
  • xlCSVMSDOS
  • xlCSVWindows

Of these, I would choose xlCSVWindows or xlCSV (you can try both and compare the results).

On the other hand, the Workbook.SaveAsXMLData method has a parameter called TextCodepage , which I think is currently ignored, but the documentation says:

  

Ignored for all languages in Microsoft Excel.

     

Note

     

When Excel saves to workbook to one of the CSV or text formats, which are specified by using the FileFormat parameter, it uses the code page that corresponds to the language for the local system in use on the current computer. This system is available in the Control Panel, by clicking Region and Language, clicking on the Location tab, under Current location.

My free translation of the note of this documentation would be something like:

  

Note

     

When Excel saves a book in one of the CSV or text formats, which is specified using the FileFormat parameter, it uses the page code that corresponds to the language of the regional settings of the system in use on the computer. This system configuration is found in the control panel ...

From this, I could say that the file will be in ANSI format with the same page code of the PC language.

    
answered by 07.11.2018 в 03:01