pandas to excel (with xlwt styles!)

Why use Python, pandas, xlwt?

While using R is mandatory when doing just about anything statistical, Python code is usually faster, more readable and is not limited to the analysis stage. R is a statistical language, and Python is a complete programming language. The Python pandas library provides very much the same functionality found in R data.frames, which means one could do their data wrangling in Python for a fraction of the time spent in R, and with better code.

Python also offers the xlwt library, which enables one to write just about anything to XLS files, including setting borders, colors, number format and functions. Taken together, having pandas and xlwt would mean one could automate complex reporting through Python scripting, right?

Missing functionality: Excel cell styles

If you’re a heavy pandas user and have to provide reports in Excel format, you would know that pandas’ DataFrame.to_excel method lacks support for xlwt styles. This is a major limitation. But is also easily solved. Whereas I would gladly contribute to pandas if I could reach github from work, the second best solution, and much more time efficient, is to simply subclass DataFrame and Series to include that support.

How can that be so simple? Well, pandas does support xlwt styles. It is there in the code, but hardcoded to one standard style. Pandas uses xlwt to write to XLS files, and xlwt requires a style argument. So the trick is just to create a xlwt.Style object and pass it as an extra argument to the subclassed pandas object.

Missing functionality: Column MultiIndex in to_excel

It is a matter of taste that I would like my column MultiIndexes to work as expected (one row per level) and not as the standard implementation (appended strings).

Hidden functionality: xlwt style string as dict

I didn’t see it described anywhere in the pandas documentation, but its source has a very neat parser to create xlwt styles from dicts. It doesn’t include number formats, but most everything else can be stored this way. It goes without saying that it makes code much more readable. The code below


conv = pd.io.parsers.CellStyleConverter()
hstyle_dict = {"font": {"bold": True},
               "border": {"top": "thin",
                          "right": "thin",
                          "bottom": "thin",
                          "left": "thin"},
                "pattern": {"pattern": "solid",
                            "fore_colour": 26},
                "align": {"horiz": "center"}}
hstyle = conv.to_xls(hstyle_dict)
dstyle_dict = {"border":{"top": "hair",
                        "right": "hair",
                        "bottom": "hair",
                        "left": "hair"}}
dstyle = conv.to_xls(dstyle_dict)
dstyle.num_format_str = '$#,##0.00'

And an object from the new XLtable class


t = XLtable(your_pandas_DataFrame_object)
t.place_table(ws=ws_1, row=5, col=5, dstyle=dstyle, rstyle=hstyle, cstyle=hstyle)

Would give you

result in Excel

result in Excel

Methods for Series and placing data, column and row indexes separately are also available. What are you waiting? Grab the code at my Gist repository!

About these ads

About dmvianna

Daniel is a keen reader of history and all things political. In his free time, Daniel has also dedicated his mind to study the brain in a laboratory setting. Part of his knowledge of history comes from listening to podcasts while he counts brain cells in a microscope. True story.
This entry was posted in Data Analysis, Python and tagged , , , , , , , , , . Bookmark the permalink.

6 Responses to pandas to excel (with xlwt styles!)

  1. Matthew says:

    I receive errors Unexpected data type when trying to write to .xls. Know of any work around? This looks very powerful. Thanks!

  2. John Machin says:

    (1) The style_from_dict gadget looks like punctuation overload. Have you tried xlwt’s easyxf? (2) I’m pig-ignorant about numpy. Please explain how numpy.int64 makes xlwt “gasp”, but numpy.float64 can be recognised … the relevant xlwt code is `isinstance(label, (float, int, long, Decimal))`

  3. dmvianna says:

    (1) Yes, I know easyxf. I just find that the pandas dict thingy keeps my code neat when I have to handle a lot of styles at the same time (as when I have to create tables with thick borders around it). (2) xlwt doesn’t recognise a type unless it is in its own list of known types. It happens that it recognises numpy.int32, for example, but not numpy.int64. Why the maintainers don’t want to include it in their code is a mystery to me.

    • John Machin says:

      “The maintainers” (i.e. me) have never been asked to include it, not even by you. Nobody has ever asked for the unincluded numpy.int32 and numpy.float64 to be included either; that would be because those types work reasonably with the built-in `isinstance` but numpy.int64 doesn’t.. See (updated) https://github.com/python-excel/xlwt/issues/15 for more information.

      • dmvianna says:

        Looks like I misread completely your first GitHub post! Again, thanks for showing what the issue really is. I will report it to NumPy as soon as I can. And thanks for making xlwt available, without which a lot of my work wouldn’t be written in Python. :D

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s