5 Pro-Tips For Data Scientists To Write Good Code
Use Version Control
This is important for both collaboration and backups. It allows us to track the changes to a project as it undergoes development, useful for coordinating tasks and encouraging due diligence. Git is a powerful version control software, with the ability to branch parts of the development, track and commit changes, push and fetch from remote respositories, and merge code pieces together overcoming conflicts as necessary .
Make it Readable
A key component of collaborative coding is the ability to hand it over to other developers for review and use, meaning it has to be readable. This includes using appropriate variable and function names with explanatory comments where necessary, and regular inclusion of docstrings that introduce the piece of code and its details. It is also important to follow the relevant style guide for the language you’re using, e.g., PEP-8 in Python
Keep it Modular
When writing code it’s important to keep it modular.That is, to break it up into smaller pieces that execute separate tasks as part of the overall algorithm. This level of functionality makes it easy to:
- control the scoping of variables,
- reuse modules of code,
- refactor code during further development,
- read, review and test code.
Write Unit Tests
A unit test generally exercises the functionality of the smallest possible unit of code (which could be a method, class, or component) in a repeatable way. For example, if you are unit testing a class, your test might check that the class is in the right state. Typically, the unit of code is tested in isolation: your test affects and monitors changes to that unit only. Ideally this forms part of a “Test Driven Development” framework for encouraging that all pieces of software are fully reviewed and tested before being integrated or deployed,minimizing time spent refactoring and debugging later on.
Code for Production
Try to write your code as if you’re putting it into production. This will form good habits as well as make it easy to scale-up when it inevitably (hopefully) does go into production.
Consider “algorithm efficiency” and try to optimise to reduce runtime and memory use. “Big-O notation” is important here .
Also consider your code environment or ecosystem and avoid dependencies by, e.g., virtualisation either at the code level (e.g.Python virtualenv) or at the operating system level (e.g.Docker containers).
Production level code should also employ “logging” to make it easy to review, inspect and diagnose issues when executing the code.
Follow me on twitter: @datascienceuni and here
Recent Comments