Dot Net Perls

Break Lines in CSV File - C#

by Sam Allen

Problem

You want to divide a large CSV file into smaller files. Data is often uploaded to a database as a CSV file in which each record occupies one line. You need code that splits such a file quickly and reliably into segments of about 2 MB, breaking only at line boundaries.

Solution: C#

Our solution method takes one file containing any number of lines of text and writes output files of up to 2 MB each that together contain all the data. The output files are named with incrementing numbers.

Example: generate file names

We need a method that returns the name of each output file. It is static because it does not refer to any state in the class. It uses ToString("00") to format the number with at least two digits, adding a leading 0 when required.

static string FileName(string baseFileName, int fileNum)
{
    //
    // This will be one of the file names in the output files.
    //
    return baseFileName + fileNum.ToString("00") + ".txt";
}
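To see what the "00" format produces, here is a small sketch that calls the same logic with a few numbers. The wrapper class FileNameDemo and the sample prefix are only for illustration and are not part of the original listing.

```csharp
using System;

public static class FileNameDemo
{
    // Same logic as the FileName method above.
    public static string FileName(string baseFileName, int fileNum)
    {
        return baseFileName + fileNum.ToString("00") + ".txt";
    }

    public static void Main()
    {
        Console.WriteLine(FileName("ALL_OUT_", 0));   // ALL_OUT_00.txt
        Console.WriteLine(FileName("ALL_OUT_", 7));   // ALL_OUT_07.txt
        Console.WriteLine(FileName("ALL_OUT_", 123)); // ALL_OUT_123.txt
    }
}
```

Note that "00" sets a minimum width, not a maximum: numbers of 100 and above are written with three digits, so a plain alphabetical sort of the file names stops matching the numeric order at that point.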

Example: segment CSV files

This method uses StreamReader, generates file names, and keeps track of the file sizes written. Here's the main code.

// Requires: using System.Collections.Generic; using System.IO;
public static void WriteSegments(string inFile, string outPrefix)
{
    List<string> lines = new List<string>();

    //
    // Read in the specified file. The using statement disposes
    // the reader even if an exception is thrown.
    //
    using (StreamReader reader = new StreamReader(inFile))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
    }

    int runningTotal = 0;
    string baseFileName = outPrefix + "_";
    int fileNum = 0;
    StreamWriter writer = new StreamWriter(FileName(baseFileName, fileNum));

    //
    // Iterate through each line in the file lines. Keep track of
    // the current length of the data. After we hit a certain length,
    // close the current file and create another file for the next
    // segment. The limit _1Mb is a field of the containing class;
    // its declaration is not shown in this listing.
    //
    try
    {
        for (int i = 0; i < lines.Count; i++)
        {
            string thisLine = lines[i];
            int length = thisLine.Length;

            if (runningTotal + length >= _1Mb)
            {
                fileNum++;
                runningTotal = 0;
                writer.Dispose();
                writer = new StreamWriter(FileName(baseFileName, fileNum));
            }
            writer.WriteLine(thisLine);
            runningTotal += length;
        }
    }
    finally
    {
        // Close the last file even if a write fails.
        writer.Dispose();
    }
}
  1. It reads files into Lists.
    The method collects all the lines into a single List, as described in my article about ReadLine and arrays. [C# - ReadLine for Reading File Into List - dotnetperls.com] This means the whole input file is held in memory at once.
  2. It keeps a count.
    We measure each line before writing it and keep a running count of the characters written to the current file. The count uses the string Length, so it approximates bytes only for single-byte text and leaves out line terminators. When adding the next line would meet or exceed the maximum file size, we create a new file.
  3. It writes files.
    We dispose of the current StreamWriter and create a new one for each output file. Files are named programmatically by the FileName method.
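The method above holds every line in memory and measures lines with Length, which counts characters, not bytes. The following is a sketch of an alternative, not the article's method: it streams the input with File.ReadLines and counts UTF-8 bytes including the line terminator. The names SegmentSketch, WriteSegmentsStreaming and the maxBytes parameter are hypothetical, and writing the output as UTF-8 without a byte order mark is an assumption.

```csharp
using System;
using System.IO;
using System.Text;

public static class SegmentSketch
{
    // Same naming scheme as the FileName method in the article.
    static string FileName(string baseFileName, int fileNum)
    {
        return baseFileName + fileNum.ToString("00") + ".txt";
    }

    // Hypothetical variant: streams the input and counts bytes, not characters.
    // Returns the number of files written.
    public static int WriteSegmentsStreaming(string inFile, string outPrefix, long maxBytes)
    {
        // Assumption: output is UTF-8 without a byte order mark.
        var encoding = new UTF8Encoding(false);
        int newlineBytes = encoding.GetByteCount(Environment.NewLine);
        string baseFileName = outPrefix + "_";
        int fileNum = 0;
        long runningTotal = 0;
        StreamWriter writer = new StreamWriter(FileName(baseFileName, fileNum), false, encoding);
        try
        {
            // File.ReadLines yields one line at a time instead of loading the file.
            foreach (string line in File.ReadLines(inFile))
            {
                long size = encoding.GetByteCount(line) + newlineBytes;

                // Roll over only when the current file already holds data,
                // so an oversized line never leaves an empty file behind.
                if (runningTotal > 0 && runningTotal + size > maxBytes)
                {
                    writer.Dispose();
                    fileNum++;
                    runningTotal = 0;
                    writer = new StreamWriter(FileName(baseFileName, fileNum), false, encoding);
                }
                writer.WriteLine(line);
                runningTotal += size;
            }
        }
        finally
        {
            writer.Dispose();
        }
        return fileNum + 1;
    }

    public static void Main()
    {
        // Small demonstration in a temporary directory with a tiny limit.
        string dir = Path.Combine(Path.GetTempPath(), "segment-demo");
        Directory.CreateDirectory(dir);
        string input = Path.Combine(dir, "all-names.txt");
        File.WriteAllLines(input, new[] { "a,1", "b,2", "c,3", "d,4" });

        int count = WriteSegmentsStreaming(input, Path.Combine(dir, "ALL_OUT"), 10);
        Console.WriteLine("Files written: " + count);
    }
}
```

In this sketch a single line larger than maxBytes is still written whole to its own file, because a record is never split across files. Whether that is acceptable depends on the limit your database upload enforces.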

Example: call the method

The following code takes the named file and creates a series of smaller files from it. Call it on a comma-separated values file.

//
// Could take "all-names.txt", and write
// ALL_OUT_00.txt, ALL_OUT_01.txt, ALL_OUT_02.txt
//
Segment.WriteSegments("all-names.txt", "ALL_OUT");
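After a run it can help to confirm which files were produced and how large each one is. The sketch below is an illustrative helper, not part of the original code: it lists files whose names follow the pattern that FileName builds (the prefix, an underscore, a number, and .txt) and adds up their sizes. The class name SegmentCheck and the method ReportSegments are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;

public static class SegmentCheck
{
    // Prints each segment's name and size, and returns the total size in bytes.
    public static long ReportSegments(string directory, string outPrefix)
    {
        long total = 0;
        var files = Directory.GetFiles(directory, outPrefix + "_*.txt")
                             .OrderBy(f => f, StringComparer.Ordinal);
        foreach (string file in files)
        {
            long size = new FileInfo(file).Length;
            Console.WriteLine("{0}: {1} bytes", Path.GetFileName(file), size);
            total += size;
        }
        return total;
    }

    public static void Main()
    {
        // Assumes the segments were written to the current directory.
        long total = ReportSegments(Directory.GetCurrentDirectory(), "ALL_OUT");
        Console.WriteLine("Total: {0} bytes", total);
    }
}
```

The total may differ slightly from the size of the input file, for example when the input uses different line endings or starts with a byte order mark. The listing is sorted by name, which matches the numeric order only while there are fewer than 100 segments.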

Summary

This code is a life-saver when your database goes down and you need a quick way to upload new information to it. It splits one large CSV file into numbered segments, always breaking at line boundaries, so each segment can be uploaded on its own. [Segment C# - dotnetperls.com]

© 2008 Sam Allen. All rights reserved.