You need to control Microsoft Excel in your program written in the C# language. Get the Microsoft.Office.Interop.Excel assembly working properly. It must deal with Excel XLS spreadsheets and worksheets. Your application must process a batch of spreadsheets. Here are some tips about Excel Interop performance, and some basics about using the Interop.Excel assembly.
=== Excel interop benchmark in C# === Some ways of using Excel are extremely slow. This can make your program unusable. It is best to minimize round-trips to EXCEL.EXE. Cells[]: 30.0 seconds get_Range(), Cells[]: 15.0 seconds UsedRange, get_Value(): 1.5 seconds [fastest]
Here we look at the namespace you must include to use Excel in your C# program. You have to use Visual Studio's GUI to add an assembly to your program to use Excel interop. Use the Add Reference command for this. Once you have completed this, you will need a class.
=== Namespace containing Excel interop methods === using Microsoft.Office.Interop.Excel;
First step to take. Make a new C# class file in Visual Studio and you can call it something like ExcelInterop. We use that name in the examples, but you are free to use something more familiar or standard. Add the following line to your class file. This will make the Excel namespace handy.
If you look at various Excel interop objects in IntelliSense, you will see hundreds of undocumented functions. You will only need to use a few to get most of the objective accomplished, unless you have very complex requirements.
=== Class that stores Application (C#) ===
/// <summary>
/// This class contains the Excel Inteop code we need, and
/// it can be shared in many places to avoid duplication.
/// </summary>
class ExcelReaderInterop
{
/// <summary>
/// Store the Application object we can use in the member functions.
/// </summary>
Application _excelApp;
/// <summary>
/// Initialize a new Excel reader. Must be integrated
/// with an Excel interface object.
/// </summary>
public ExcelReaderInterop()
{
_excelApp = new Application();
}
}Interop class. We need a place to store our Excel information, so let's use our special class. Put an Application object called _excelApp as a member. In the constructor, make the _excelApp a new Application object.
Here we see that to open Excel files, you must call Workbooks.Open. It's harder than it should be. Put a public function on the class, and it can use the _excelApp we already have. The following code shows how you can use the Workbooks.Open method, which you must pass many arguments to.
=== Method that opens Excel workbooks (C#) ===
/// <summary>
/// Open the file path received in Excel. Then, open the workbook
/// within the file. Send the workbook to the next function, the internal scan
/// function. Will throw an exception if a file cannot be found or opened.
/// </summary>
public void ExcelOpenSpreadsheets(string thisFileName)
{
try
{
//
// This mess of code opens an Excel workbook. I don't know what all
// those arguments do, but they can be changed to influence behavior.
//
Workbook workBook = _excelApp.Workbooks.Open(thisFileName,
Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing, Type.Missing, Type.Missing,
Type.Missing, Type.Missing);
//
// Pass the workbook to a separate function. This new function
// will iterate through the worksheets in the workbook.
//
ExcelScanIntenal(workBook);
//
// Clean up.
//
workBook.Close(false, thisFileName, null);
Marshal.ReleaseComObject(workBook);
}
catch
{
//
// Deal with exceptions.
//
}
}Workbooks.Open is messy. I apologize for the mess that is Workbooks.Open. This is probably the result of legacy code at Microsoft. There are many Type.Missing arguments. Don't panic over those Type.Missing arguments. You don't need to know what they do, at least not now.
These are the steps. Open a workbook with Workbooks.Open. Send the workbook we open to another function called ExcelScanInternal. Close the workbook and release all the memory. You will need to deal with exceptions in the catch block. I don't have the information about all the detailed exceptions that can be thrown. Just catch them all at once.
To get data from workbooks, you will need to iterate over the sheets. An Excel workbook has one or more sheets. In Excel, you can switch between sheets by clicking on the tabs on the bottom. In Excel Interop, sheets are indexed starting at 1. This is similar to Visual Basic but not the C# language.
=== Method that gets sheets (C#) ===
/// <summary>
/// Scan the selected Excel workbook and store the information in the cells
/// for this workbook in an object[,] array. Then, call another method
/// to process the data.
/// </summary>
private void ExcelScanIntenal(Workbook workBookIn)
{
//
// Get sheet Count and store the number of sheets.
//
int numSheets = workBookIn.Sheets.Count;
//
// Iterate through the sheets. They are indexed starting at 1.
//
for (int sheetNum = 1; sheetNum < numSheets + 1; sheetNum++)
{
Worksheet sheet = (Worksheet)workBookIn.Sheets[sheetNum];
//
// Take the used range of the sheet. Finally, get an object array of all
// of the cells in the sheet (their values). You can do things with those
// values. See notes about compatibility.
//
Range excelRange = sheet.UsedRange;
object[,] valueArray = (object[,])excelRange.get_Value(
XlRangeValueDataType.xlRangeValueDefault);
//
// Do something with the data in the array with a custom method.
//
ProcessObjects(valueArray);
}
}Using object array. The object[,] array in step 3 is directly usable in the C# language. Once you get the objects, you don't need to do any more Interop to get them. This provides a huge performance boost. There are benefits to reducing calls to Excel Interop. Next I show some benchmarks I took comparing the various techniques of Excel interop.
If you do your Excel processing wrong, you could be waiting a long time to process even 10 spreadsheets at once. Using the Cells[] indexer on a range, or getting the range with the Range property or get_Range() is very slow. See the figures at the top of this article. This document is based on .NET 3.5. The approach shown here is the fastest one.
Here we note a compatibility issue with the Excel program on the Apple Mac OS X platform. Excel stores OA dates in different formats on Macs and PCs. The code above shows one solution, which is to use get_Value() instead of Value2. The function get_Value() returns objects that do not vary based on the original platform.
See FromOADate and Excel Dates.
In this tutorial, we saw how to get started with Excel interop and avoid severe performance problems with the assembly. Don't call the complicated Open method with Type.Missing in more places than necessary. Make an interface and abstract away that complexity.