Excel is a very important tool to interactively work with data in corporate environments.
Excel is a very important tool to interactively work with data in corporate environments. Thanks to its statistical capabilities and reproducibility R can be a powerful companion to leverage and automate Excel workflows.
Excel still is the most important productivity application used in corporate environments to work with data. Spreadsheets can be valuable tools to setup interactive calculations and communicate results. Excel is available in nearly every corporate environment and does not require any extra IT-effort to install as compared to other tools.
However, using Excel for involved calculations and data science projects can be tricky. Most importantly, it mixes a spreadsheet user interface with data and calculations which makes it very hard to reproduce results. Creating automated reports based on Excel can therefore be very tricky. Excel can only handle 1 million rows in one spreadsheet and can become very slow for large data sets including many calculations. Last but not least, compared to other programming languages like R, it does not provide a large collection of statistical models and cannot be easily used for advanced analytics.
Based on the arguments in the previous section we compare Excel vs. R based on the following criteria:
The table below gives an indication how Excel and R perform based on the above criteria:
| Feature | Excel | R |
|---|---|---|
| Availability | ✅ ✅ | ✅ |
| Interactivity | ✅ ✅ | ✅ |
| Reproducibility | ❌ ❌ | ✅ |
| Large Datasets | ❌ | ✅ |
| Automation | ❌ | ✅ ✅ |
| Advanced Analytics | ❌ | ✅ ✅ |
Due to some drawbacks of MS Excel and its proprietary nature many data scientists have objections using it. R also had a rather awkward relationship with MS Excel throughout its history. Due to its open-source background the “best practice” approach interacting with Excel files has long been to export the contents to comma-separated files (CSV). However, this approach requires manual interventions and is far from practical.
Since most data science projects are interdisciplinary among different departments and stakeholders we recommend to use the best tool for the job—even if it requires to mix different tools. Currently, Excel can be integrated with R through the following workflows and packages:
Using one of the above R package integration we can have the best of both worlds and seamlessly integrate advanced statistical techniques into Excel sheets and workflows.
Coming up Readxl package…
For attribution, please cite this work as
Santos (2022, June 16). Franklin Santos: R for Excel Users (Part 1). Retrieved from https://franklinsantos.com/posts/2022-06-16-excel/
BibTeX citation
@misc{santos2022r,
author = {Santos, Franklin},
title = {Franklin Santos: R for Excel Users (Part 1)},
url = {https://franklinsantos.com/posts/2022-06-16-excel/},
year = {2022}
}