Louis’s Learnings – Supply Chain Design and Data Manipulation

Which tool should you use?

A lot of the work involved in Supply Chain Design (SCD), also called Network Design and Network Optimization, requires the manipulation of large quantities of data. In fact, I would say data manipulation is one of the most important skills to have when conducting a Supply Chain Design study.

There are many tools that can be used for data manipulation including Excel, Access, Alteryx, SQL, Python, and Hadoop/Spark.

Spreadsheets

Excel and other spreadsheet applications are by far the most widely used of these. I don’t have any studies to support my affirmation, but a quick poll around your office will assuredly show that any analyst uses spreadsheets on a daily basis as part of their functions. There is a good reason for this ubiquity: the software is readily available for free or almost free, it is easy to use and extremely versatile. 

Yet, with these advantages come certain disadvantages. Among them are the limitations of the amount of data that can be processed (about a million records per table), it’s limited and slow database functionalities (e.g., joining tables) and its susceptibility to errors. This article discusses some of Excel’s problems in the Finance industry that would be equally applicable to SCD. 

“In short, spreadsheets are good for a preliminary review of data and calculations, but for any serious data manipulation, one should definitely look beyond this tool.”

In short, spreadsheets are good for a preliminary review of data and calculations, but for any serious data manipulation, one should definitely look beyond this tool.

Microsoft Access

Microsoft’s Access is a tool that is commonly used for SCD. Like Excel, it is typically easily available from the IT department. Some ease of use and versatility compared to Excel is traded for power and database functionality. The most significant caveats with Access are its database size limitation (2 GB), performance with larger databases, and graphical user-interface (GUI). 

Most SCD models I have worked on in the past several years quickly surpass the database size restriction or approach it enough to make the software too slow to be effectively used. While the GUI is a boon to neophyte users, it quickly becomes a hindrance when building/debugging/maintaining tables that are created from multiple other tables and queries. 

To improve your efficiency, I would move away from this tool.

Alteryx

Alteryx is a tool that seems to be growing exponentially in popularity. In the context of data manipulation for SCD, this software provides a similar benefit to Access, i.e., a GUI for SQL, but without the database size and performance limitation. You also get many other benefits in terms of automating workflows. If writing SQL code from a blank page seems like a daunting task for you, Alteryx is a great alternative.

SQL

SQL coupled with an Integrated Development Environment (IDE) is my preferred toolset for data manipulation. There are multiple variations including SQL Server, MySQL, NoSQL, etc. for server software and SQL Server Management Studio (SSMS), MySQL Workbench, NoSQLBooster, etc. for an IDE. I’ve been using SQL Server Express and SSMS. They are freely available from the Microsoft website, well documented and extensively supported by the greater community. Any issues I’ve had, I’ve been able to quickly resolve simply by googling it (kudos to the community!).

There are many advantages to using a computer language such as SQL for data manipulation. 

The first two are that, as you write your code, you are simultaneously documenting your work and making it reproducible. These are invaluable and often overlooked benefits. Yes, if you are using Excel, it is so much easier to simply go to a cell and correct a misspelled value. Doing the same in SQL would require three lines of codes and figuring the best location to include the code in a new or existing script. However, if all of a sudden you find that the same data point has been misspelled in multiple locations, correcting the situation in Excel now becomes a much more tedious process fraught with potential errors and omissions, whereas the SQL fix is still only three lines of code that are easy to check, debug and repeat down the line when you get a raw data update (that will happen!).

“The ability to by-pass the somewhat clunky and slow data import functionality of these apps is a big time saver and a godsend for me! Try it, and you won’t want to go back!”

Another advantage of SQL, compared to the three other tool options that have been discussed, is its performance, i.e., speed, when performing a query. This aspect is growing in importance as datasets become larger and larger.

SQL code is also easy to reuse in other projects. I’ve been adapting the same bulk import script for the past 5 years.

Lastly and perhaps most importantly, SQL, unlike Excel and Access, allows a user to directly write to the tables in the underlying data schema of your Supply Chain Design application (e.g., Llamasoft’s Supply Chain Guru, LogicNet Plus, etc.). The ability to by-pass the somewhat clunky and slow data import functionality of these apps is a big time saver and a godsend for me! Try it, and you won’t want to go back!

Python

In terms of computer languages, SQL is an easy if somewhat limited one. If you want to further push the envelope and automate the data manipulation process, it can be done using a richer programming language. My recommendation would be Python, but any other advanced programming language would probably do. I have shied away from using Python in order to make my work more accessible to my clients that typically do not have programming language experience.

Hadoop and Spark

I have yet to meet an SCD project that requires the storage capacity, capability and performance of Hadoop with Spark. If you have, I’d love to hear from you to understand your experience and how your SCD software handled the model! I’m including these Big Data tools here for the sake of completeness.

Conclusion

I like to use the crawl/walk/run analogy when discussing data manipulation tools. Excel and other spreadsheet applications are in the Crawl category. Access is in the Walk category. Alteryx and SQL are in the Run category. If you want to sprint, go ahead and use Python. 

JBF Consulting can help you and your staff get up to speed with the best practices around data manipulation and Supply Chain Design.


Louis Bourassa is the Analytics & Optimization Practice Head at JBF Consulting. He provides analytical and optimization support to JBF clients. Louis has a diverse background with a mix of industry, consulting and software roles that allowed him to develop a strong business acumen and expert knowledge of supply chain analysis and design.

Founded in 2003, JBF Consulting is a supply chain execution strategy and systems integrator to logistics-intensive companies of every size and any industry. Our background and deep experience in the field of packaged logistics technology implementation positions us as industry leaders whose craftsmanship exceeds our client expectations. We expedite the transformation of supply chains through logistics & technology strategy, packaged & bespoke software implementation, and analytics & optimization. For more information, visit us at www.jbf-consulting.com

Image by macrovector / Freepik