Saturday, April 8, 2017

Generate Office Open XML files (DOCX, XLSX, PPTX…) using XSLT Transformations

 

Note: Download OOXML, XSL Transformation Sample here.

 

Office Open XML format (OOXML) has been introduced by Microsoft, for representing Office file formats like Excel, Word and Powerpoint, which later standardized by ECMA, ISO and IEC. These formats has become industry wide (universal)  standards, and portable in other office suites like Open Office or LibreOffice.

XSLT is one of the fastest tool to generate office documents, where you can generate an Office Document (Say a Word, Excel) from an input XML file and a target XSLT Transformation as detailed here. This was easy for formats like Speadsheet 2003 (MSO), which is a proprietary EXCEL format from Microsoft, where an entire workbook has been represented by a single XML file and XSLT typically outputs a single XML. Since it has been a proprietary format, other Office applications will not support them and not portable. It only works with Microsoft Office Products.

Now OOXML is a new specification, where a EXCEL workbook or Word Document will be splitted among many parts (which can be binary as well), which are combined in zip format. You can simply rename a XLSX/DOCX file to ZIP and can open them in any archive viewer to find its contents. You can find an indepth dive to the concept here and here.

As OOXML is a zip format, and contains many parts inside them. You cannot apply XSLT as is to this zip file, to generate the office documents and XSLT is not worth the effort, until Eric White, has introduced a solution based on Flat OPC Xml.

He converted the zip format, to a single Flat OPC XML format file, which can then be handled using XSL Transformation constructs. Again OPC (Open Packaging Conventions) has been introduced by Microsoft (Refer System.IO.Packaging namespace for more) as a Container Specification, which can embed multiple sub objects under it and later standardized by ISO to make it an industry wide standard to work with a variety of platforms/applications.

The actual OOXML to FlatOPC format conversion has been written by Eric White, and the implementations are free to download from here.

 

The flow has been depicted below:

xslt

 

The Solution Steps has been summarised below with a sample Transformation:

 

1. Create your Office Document Template  in an Office Application:

Save it in OOXML Format (DOX or XLSX).

 

See BookTemplate.xlsx, in the attached zip file.

image

 

2. Convert this Template to a single Flat OPC  XML file:

We can use C# application in this page to do this conversion.

 

See BookTemplate.xlsx.xml, in the attached zip file.

image

 

3. Define your XML Input file:

 

See BookStore.xml, in the attached zip file.

image

 

4. Build the XSL Transformation from this OPC XML Template file:

Now Modify the OPC XML file, to replace repetitive tags with XSL Constructs to fetch values from the input XML. Do embed required XSL Constructs to make it as an XSL Transformation file and change the extension to .XSL.

 

See BookStoreToFlatOPCxml.xsl, in the attached zip file. The below screenshots shows, the updated portions of the OPC XML file with required XSL Transformations. As simple as this.

image

image

 

image

 

5. Apply XSL Transformation to the XML input file to generate the final Flat OPC XML file:

Here we are applying the Transformation directly in Visual Studio IDE, for demonstration purpose. The good things is you can debug/step in/step through on the fly and fix the bugs pretty easily.

image

To know more about XSLT Debugging in Visual Studio, See here.

 

6. Convert the XSL output, Flat OPC XML file back to OOXML format:

Now We’ve the final output file in Flat OPC XML format. This has to be converted to OOXML format, so that Office Applications can process it.

 

We can use C# application in this page to do this conversion.

 

7. Open the Office Open XML file in your favourite Office Application:

 image

 

8. Automate the Process using C#/Java Programs:

We can automate the above steps through programs, like pull the Flat OPC XSL file from a sharepoint list, and apply it to a XML Input being generated from an Oracle Database. The actual Xsl Tranformation code can be done with a few lines of code as listed here. Also ensure that, do not use C# functions in XSLT Files. This example was meant for demo. In actual scenarios, put all such C# functions inside a .NET Assembly and add it as an extension to the XslTransform process.