Control Microsoft Excel in our program. To this end, we must get the Microsoft.Office.Interop.Excel assembly working properly. It must deal with Excel XLS spreadsheets and worksheets. Our app must be fast and not require minutes to process a batch of spreadsheets, and also compatible with Mac versions of Excel.
Here are some tips about Excel Interop performance, and also some basics about using the Interop.Excel assembly in your C# project. You have to use Visual Studio's GUI to add an assembly to your program to use Excel interop. You must use the Add Reference command for this.
New class. First we will make a new C# class file in Visual Studio. Call it something like ExcelInterop. I use that name in my examples, but you are free to use something more familiar or standard in your work. Add the following line to your class file. This will put the Excel namespace right at our fingertips.
using Microsoft.Office.Interop.Excel;
If you look at various Excel interop objects in IntelliSense, you will see hundreds of undocumented functions. (I don't have any idea what many of them do.) You will only need to use a few to get most of the objective accomplished, unless you have more complex requirements than I did, though.
Interop class. We need a place to store our Excel information, so let's use a special class called ExcelInterop. Make that class and we will put an Application object called _excelApp as a member. In the constructor, make the _excelApp a new Application object.
/// <summary>
/// This class contains the Excel Inteop code we need, and
/// it can be shared in many places to avoid duplication.
/// </summary>
class ExcelReaderInterop
{
/// <summary>
/// Store the Application object we can use in the member functions.
/// </summary>
Application _excelApp;
/// <summary>
/// Initialize a new Excel reader. Must be integrated
/// with an Excel interface object.
/// </summary>
public ExcelReaderInterop()
{
_excelApp = new Application();
}
}
On the ExcelInterop class, we need a way for other classes to make it read things and do Excel interop. So, let's put a public function on the class. This public function can use the _excelApp we allocated already.
/// <summary> /// Open the file in the name we receive with Excel. Then, open the workbook /// within the file. Send the workbook to the next function, the internal scan /// function. Will throw an exception if a file cannot be found or opened. /// </summary> public void ExcelOpenSpreadsheets(string thisFileName) { try { // This mess of code opens an Excel workbook. I don't know what all // those arguments do, but they can be changed to influence behavior. Workbook workBook = _excelApp.Workbooks.Open(thisFileName, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing); // Pass the workbook to a separate function. This new function // will iterate through the worksheets in the workbook. ExcelScanIntenal(workBook); // Clean up. workBook.Close(false, thisFileName, null); Marshal.ReleaseComObject(workBook); } catch { // Deal with exceptions. } }
An Excel workbook has one or more sheets. In Excel, you can switch between sheets by clicking on the tabs on the bottom. In Excel Interop, sheets are indexed starting at 1. We first get the count of sheets, and for each sheet we use UsedRange to get arrays of the cells.
/// <summary>
/// Scan the selected Excel workbook and store the information in the cells
/// for this workbook in an object[,] array. Then, call another method
/// to process the data.
/// </summary>
private void ExcelScanIntenal(Workbook workBookIn)
{
// Get sheet count int and store the number of sheets.
int numSheets = workBookIn.Sheets.Count;
// Iterate through the sheets. They are indexed starting at 1.
for (int sheetNum = 1; sheetNum < numSheets + 1; sheetNum++)
{
Worksheet sheet = (Worksheet)workBookIn.Sheets[sheetNum];
// Take the used range of the sheet. Finally, get an object array of all
// of the cells in the sheet (their values). You can do things with those
// values. See notes about compatibility.
Range excelRange = sheet.UsedRange;
object[,] valueArray = (object[,])excelRange.get_Value(
XlRangeValueDataType.xlRangeValueDefault);
// Do something with the data in the array with a custom method.
ProcessObjects(valueArray);
}
}
Basically, if you do your Excel processing wrong, you could be waiting a long time to process even 10 spreadsheets at once. One of the most important things to remember is that using the Cells[] indexer on a range, or getting the Range with the Range property or get_Range() is very slow. These benchmarks are for processing large spreadsheets.
One issue is that Microsoft Excel stores dates in different formats on Macs and PCs. A Mac .XLS is not the same as a PC .XLS, and you need to accomodate that. The code above incorporates one solution, which is to use get_Value() instead of Value2. The function get_Value() returns objects that do not vary based on the platform used to create the spreadsheet.
Excel Interop in C# is very powerful and these are only some beginning steps to take. Use code like this to get started and avoid severe performance problems from using the assembly wrong. Don't call the complicated Open method with Type.Missing in more places than necessary. Making an interface and abstracting away that complexity would be worthwhile.