What we’ll cover
- Basic shell navigation
- Setting up a data analysis project with Cookiecutter
- Managing virtual environments
- Creating a test data pipeline
- Doing a groupby aggregation
- Writing basic unit tests
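The groupby aggregation and unit-testing items above can be illustrated together. In pandas the aggregation is usually written as `df.groupby("key")["value"].sum()`. The sketch below shows the same split-apply-combine idea in plain Python, so it runs without extra dependencies, and pairs it with a small `unittest` test case. The function `groupby_sum` and the `site`/`count` fields are hypothetical names chosen for illustration, not part of the seminar material.

```python
from collections import defaultdict
import unittest


def groupby_sum(rows, key, value):
    """Sum `value` within each group defined by `key` (split-apply-combine).

    `groupby_sum` is a hypothetical helper for illustration only.
    """
    totals = defaultdict(int)
    for row in rows:
        # Split: pick the group; apply + combine: add to that group's total.
        totals[row[key]] += row[value]
    return dict(totals)


class TestGroupbySum(unittest.TestCase):
    # The "site" and "count" fields are hypothetical example data.
    def test_sums_per_group(self):
        rows = [
            {"site": "A", "count": 2},
            {"site": "B", "count": 5},
            {"site": "A", "count": 3},
        ]
        self.assertEqual(groupby_sum(rows, "site", "count"), {"A": 5, "B": 5})

    def test_empty_input(self):
        self.assertEqual(groupby_sum([], "site", "count"), {})
```

Saved as, say, `test_groupby.py` (a hypothetical filename), the tests can be run with `python -m unittest test_groupby`.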
What we won’t cover
This seminar is just a taster of what we can cover in a full-day workshop, so we won’t be covering:
- hardware and GPUs
- working on clusters
- dynamic data linking
- file formats
- using the key Python libraries in more detail
- unit testing in more depth
- version control with git
- working with larger datasets
- interactive visualisations
- reproducible containers like Docker or Singularity
- working on your own data
Activity
What do you feel are the major bottlenecks in your analysis process?