Data analytics and data science – What I wish I knew when I started out

Many people considering a career in data analytics or data science have asked me about my experiences and what day-to-day work in the field looks like. Here I share answers to some of the most common questions, in case others find them useful.

What personality type is suited to either data analytics or data science as a career?

If you enjoy problem solving, investigating patterns and improving processes, your thought processes are already in line with the field. It is a career where you can:

  • Efficiently test an idea or run an experiment
  • Continuously learn new things, if you know how to ask the right questions
  • Apply your skills in almost any industry
  • Develop your strategic planning and critical-analysis skills
  • Gain a fact-based understanding of company processes
  • Grow rapidly alongside the ever-increasing knowledge base, technology and volume of data

What is the difference between a data engineer, scientist and analyst?

The data engineer, scientist and analyst play integrated roles at different stages of the data process. The engineer works out how best to acquire, transport, transform and store the data. The scientist works out which tools and techniques will get the most out of the data, often using predictive modelling techniques ranging from linear regression to tree-based methods and machine learning. Finally, the analyst works out how to create value from the available data. Each role informs the others, and outside the industry the titles are often used interchangeably.

Which one am I suited for?

It depends on where your interests and strengths sit on the spectrum between technical work and business-facing work.

You don’t necessarily have to pursue one of these career paths specifically. In smaller companies, you could be wearing two or three of the hats. Even in large companies, there is often overlap between the role of an analyst and a scientist. Rather than focusing on the title that interests you, focus on the activities that interest you.

What do I need to study?

There are data science degrees you can take; however, most people I’ve met in the industry have not studied them. In my experience, most data analysts in the field acquired transferable skills through other degrees such as engineering, statistics, computer science and mathematics. At the same time, I have colleagues who studied commerce, architecture and psychology. You do need a decent grounding in math, but in my opinion what matters more is being analytically minded: knowing what questions to ask and where to look for the answers.

What other skills can I learn to prepare?

This varies depending on whether you’re aiming to become an engineer, analyst or scientist. Some good tools to start with are:

  1. Microsoft Office – Excel (particularly Pivot tables)
  2. SQL (the query language used by most relational databases; MySQL is a popular free, open-source database to learn it on)
  3. Python or R (start with one of them; Python is the more versatile of the two)
  4. Java (for the data engineering route)
  5. PySpark (the Python API for Apache Spark, used to process large volumes of data)
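
To show how Excel pivot tables and SQL relate, here is a minimal sketch of a pivot table written as a SQL query. It runs SQLite through Python's built-in sqlite3 module so no database server is needed; the sales table, its columns and its rows are hypothetical. The SUM(CASE WHEN ...) with GROUP BY pattern is standard SQL and should also work in MySQL.

```python
import sqlite3

# Hypothetical sales table, kept in memory so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "A", 100.0),
        ("North", "B", 50.0),
        ("South", "A", 70.0),
        ("South", "A", 30.0),
        ("South", "B", 20.0),
    ],
)

# Same result as a pivot table with regions as rows, products as columns
# and the sum of amount in each cell, plus a row total.
query = """
SELECT region,
       SUM(CASE WHEN product = 'A' THEN amount ELSE 0.0 END) AS product_a,
       SUM(CASE WHEN product = 'B' THEN amount ELSE 0.0 END) AS product_b,
       SUM(amount) AS total
FROM sales
GROUP BY region
ORDER BY region
"""
pivot = conn.execute(query).fetchall()
for row in pivot:
    print(row)
conn.close()
```

Each output row is one region, so the query returns the same grid you would build by dragging fields into an Excel pivot table.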

Where is the best place to learn?

Unlike fields that require a certificate before you can work in them, in data science it matters more that you can prove your ability than that you hold a particular qualification. An advanced degree is certainly one way to set yourself apart from the crowd, but beyond that you need to be able to demonstrate that you can translate data into insights and solutions.

I would suggest starting with free online courses. Class Central is a great place to begin: it lists courses from across the web on almost every topic and ranks them. A highly recommended Python MOOC from the University of Michigan is available on Coursera.

Soft skills matter too. Once you have done the hard work, it means little if you can’t communicate it clearly and concisely. This article by Consultant’s Mind covers several good and bad ways of presenting data.

Machine learning! What’s that about?

In practice, machine learning usually means applying one of several predictive modelling algorithms. Most of these algorithms learn from labelled examples (supervised learning): the model makes predictions, measures its error and adjusts its parameters over many iterations to reduce that error. Understanding the theory behind ML is important so that you can justify its use in your work, but beyond that, the hard work has already been done for you in the libraries available for each language.
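
The idea of a model improving itself over many iterations can be sketched in a few lines of plain Python. This minimal example, on hypothetical data, fits a straight line by gradient descent: on each pass it measures the error of its current predictions and nudges the slope and intercept in the direction that reduces the mean squared error. The learning rate and iteration count are illustrative choices, not recommendations; libraries do the same job far more robustly.

```python
# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys, learning_rate=0.05, iterations=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Prediction errors for the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of mean(error^2) with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

With this data the fitted values should land very close to the slope of 2 and intercept of 1 used to generate it.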

Consider not starting with machine learning right away. It is a powerful solution to many problems, but it is by no means a solution to everything. First learn the languages and the simpler predictive modelling techniques; linear and logistic regression can cover much of what you will need. Later on, when you’re ready, you can try out various machine learning packages and techniques through the tutorials and exercises on Kaggle, which has plenty of real-life datasets to play with.
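
As an example of one of the simpler techniques, here is a minimal logistic regression sketch in plain Python. It estimates the probability of a yes/no outcome from one input, using hypothetical hours-studied versus pass/fail data. It is fitted with plain gradient descent on the log loss, an illustrative choice; in practice you would use a library such as scikit-learn or statsmodels, which use more robust solvers.

```python
import math

# Hypothetical data: hours studied vs. pass (1) / fail (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.5, iterations=20000):
    """Fit P(y = 1) = sigmoid(w*x + b) by gradient descent on mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # For log loss, the gradient uses (predicted probability - label).
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(r * x for r, x in zip(residuals, xs)) / n
        b -= learning_rate * sum(residuals) / n
    return w, b


w, b = fit_logistic(hours, passed)
for h in (1.0, 3.5):
    print(f"{h} hours -> P(pass) = {sigmoid(w * h + b):.2f}")
```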

Most importantly, have fun and keep digging deeper into the data.

If you already work in data, what do you wish you had known when you started out? If you are just starting out, is there anything else that doesn’t make sense, or have you tried any of the above?