Office Open XML format (OOXML) has been introduced by Microsoft, for representing Office file formats like Excel, Word and Powerpoint, which later standardized by ECMA, ISO and IEC. These formats has become industry wide (universal) standards, and portable in other office suites like Open Office or LibreOffice.
XSLT is one of the fastest tool to generate office documents, where you can generate an Office Document (Say a Word, Excel) from an input XML file and a target XSLT Transformation as detailed here. This was easy for formats like Speadsheet 2003 (MSO), which is a proprietary EXCEL format from Microsoft, where an entire workbook has been represented by a single XML file and XSLT typically outputs a single XML. Since it has been a proprietary format, other Office applications will not support them and not portable. It only works with Microsoft Office Products.
Now OOXML is a new specification, where a EXCEL workbook or Word Document will be splitted among many parts (which can be binary as well), which are combined in zip format. You can simply rename a XLSX/DOCX file to ZIP and can open them in any archive viewer to find its contents. You can find an indepth dive to the concept here and here.
As OOXML is a zip format, and contains many parts inside them. You cannot apply XSLT as is to this zip file, to generate the office documents and XSLT is not worth the effort, until Eric White, has introduced a solution based on Flat OPC Xml.
He converted the zip format, to a single Flat OPC XML format file, which can then be handled using XSL Transformation constructs. Again OPC (Open Packaging Conventions) has been introduced by Microsoft (Refer System.IO.Packaging namespace for more) as a Container Specification, which can embed multiple sub objects under it and later standardized by ISO to make it an industry wide standard to work with a variety of platforms/applications.
The actual OOXML to FlatOPC format conversion has been written by Eric White, and the implementations are free to download from here.
The flow has been depicted below:
The Solution Steps has been summarised below with a sample Transformation:
1. Create your Office Document Template in an Office Application:
2. Convert this Template to a single Flat OPC XML file:
We can use C# application in this page to do this conversion.
3. Define your XML Input file:
4. Build the XSL Transformation from this OPC XML Template file:
Now Modify the OPC XML file, to replace repetitive tags with XSL Constructs to fetch values from the input XML. Do embed required XSL Constructs to make it as an XSL Transformation file and change the extension to .XSL.
5. Apply XSL Transformation to the XML input file to generate the final Flat OPC XML file:
Here we are applying the Transformation directly in Visual Studio IDE, for demonstration purpose. The good things is you can debug/step in/step through on the fly and fix the bugs pretty easily.
6. Convert the XSL output, Flat OPC XML file back to OOXML format:
Now We’ve the final output file in Flat OPC XML format. This has to be converted to OOXML format, so that Office Applications can process it.
We can use C# application in this page to do this conversion.
No comments:
Post a Comment