What I’ve Learned & What You Need To Know About Data Science
So…Data Science, huh?
Was it difficult? To me, sometimes yes. Was it fun? Definitely. Was it worth it? Sure. It’s one of the most in-demand skill sets of this century. Although I already have a job in a GIS company, I decided to pursue it because 1) it’d be an advantage for my current job, 2) I have to adapt to change, and 3) I like to think of myself as a multi-functional tool: I am one thing but also a lot of things. If people need me to be a content writer, I could be that. If they need me to be a data analyst, I could be that. Something like that.
So I just completed my Master’s degree in Data Science at one of the local universities while holding a full-time job. I had almost zero social life for two years; all I did was study and get paid. I had classes on weekends. The only thing I regret is that I didn’t pursue it a little sooner. But what can I do, the post-grad data science course is still new at the uni. I was in the second batch. There were only 15-ish students (6 students in the first batch). Maybe there are more now.
The students came from different backgrounds. I myself have a computer science background and working experience in GIS. Others were doctors, engineers, statisticians, system developers, etc. There was even someone from the military. Data science can be applied in many areas because it’s about solving problems. The challenge with different backgrounds is that we may understand the same thing differently. Like when one lecturer told the class that the data is kept in a ‘table’, and a doctor went, ‘A table? How do you keep data IN a table?’ He was thinking of an actual TABLE (the one you put food on). He wasn’t familiar with the term; he would have called it a ‘store’ instead of a ‘table’. But to someone with a computer science background, it makes total sense. We have a database, the database has tables with rows and columns, and that’s where we keep the data, while a ‘data store’ means a different thing.
Anyway, your background doesn’t matter if you’re willing to learn.
Here are some of the subjects I’ve learned:
- Data Organization
- Enterprise Data Analytics
- Statistical Computing
- Data Mining
- Advanced Decision Support System
- Data Visualization
- Text Analytics
- Nature-inspired Computing
Data Organization
Okay, this is more of a computer science thing. You learn about the types of data (structured, unstructured, semi-structured) and how to store and manage them to make them useful. You learn about databases and the flow of data in a system; basically, you learn how to organize data in a system based on the nature of the business.
You know how a food menu is categorized by type, right? It’s sort of like that. We organize the items so that it’s easier for customers to understand and find what they want.
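To make the ‘table’ idea a bit more concrete, here is a small sketch using Python’s built-in sqlite3 module. The table name, the column names, the menu items and the prices are all made up for illustration; the point is only that structured data sits in rows and columns and can be grouped by category.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table is just named columns; every menu item becomes one row.
conn.execute("CREATE TABLE menu (name TEXT, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO menu VALUES (?, ?, ?)",
    [
        ("Nasi lemak", "main", 5.0),    # hypothetical items and prices
        ("Mee goreng", "main", 6.0),
        ("Teh tarik", "drink", 2.5),
        ("Kopi O", "drink", 2.0),
        ("Cendol", "dessert", 4.0),
    ],
)

# Count how many items fall under each category.
rows = conn.execute(
    "SELECT category, COUNT(*) FROM menu GROUP BY category ORDER BY category"
).fetchall()

# Look up only the drinks, cheapest first.
drinks = conn.execute(
    "SELECT name, price FROM menu WHERE category = ? ORDER BY price", ("drink",)
).fetchall()

print(rows)
print(drinks)
```

Because the data is organized by category, finding ‘all the drinks’ is a one-line query instead of a manual search.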
Enterprise Data Analytics
When we talk about ‘enterprise’, we talk about ‘money’. That means ‘business’. It could be anything from banking (loan applications), telecommunications (customer churn), human resources (employee turnover), retail (customer purchases), etc. For example, did you notice that if you browse scarves on the Tudungpeople website, suddenly all the ads on the websites you visit show the scarves you saw there? Or on Zalora, when you look for black shoes, the site recommends shoes similar to the ones you’re looking at. On Netflix, notice the ‘89% Match’ or ‘98% Match’ on TV shows you haven’t watched. The system learns from your browsing history and what you clicked on; it basically learns your behaviour, your taste, what you like, so that you’ll spend more money and time on it. It sounds evil when I say it like that, but believe it or not, this is all just business. And in this subject, you’ll learn how to do that. You think all your Facebook posts and tweets are completely private? Don’t be silly.
Statistical Computing
You know how people in data science love to compare R and Python? Okay, for those who are wondering, R and Python are programming languages that are widely used for data analysis. I personally prefer Python over R because it’s a general-purpose language, it’s highly flexible, and in my experience it ran faster. R, however, is specifically designed for statistical computing. Which language you should use depends on the application. You’ll also learn a little bit of math and techniques like bootstrapping and the jackknife (for data resampling), Monte Carlo simulation, visualization, interpreting statistical results, etc.
Python, on the other hand, is more popular and widely used for data analysis. I even used Python for my data science project. It had all the packages I needed and it processed my data faster. Maybe I prefer it because I also use Python for work, so it’s more familiar to me.
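Since bootstrapping and the jackknife came up, here is a rough sketch of both in plain Python. The sample values, the function names and the number of resamples are assumptions chosen for illustration. The bootstrap part uses the simple percentile method, which is only one of several ways to build an interval.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Resample WITH replacement, same size as the original sample.
        resample = [rng.choice(data) for _ in range(n)]
        means.append(statistics.mean(resample))
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def jackknife_means(data):
    """Leave-one-out means: drop each observation once and re-average."""
    total = sum(data)
    n = len(data)
    return [(total - x) / (n - 1) for x in data]


sample = [12, 15, 9, 20, 14, 11, 17, 13]  # made-up numbers
low, high = bootstrap_mean_ci(sample)
print("bootstrap 95% interval for the mean:", low, high)
print("jackknife leave-one-out means:", jackknife_means(sample))
```

Both techniques reuse the one sample you have instead of collecting new data: the bootstrap draws many random resamples, while the jackknife systematically leaves one value out at a time.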
Data Mining
This is one of the subjects that I enjoyed. Data mining is how we discover patterns, anomalies or correlations in large datasets, and then use them to make predictions. It’s part of the core knowledge you need if you want to venture into data science. It is the technique we use to turn data into information. We learn data mining methods that include various machine learning techniques in supervised and unsupervised learning.
Some of the widely used machine learning techniques for classification, regression and prediction are covered in this subject, such as Artificial Neural Networks, Support Vector Machines, Naive Bayes, Decision Trees, and Logistic Regression. Besides that, we also learn clustering (a method in unsupervised learning) and market basket analysis (association rules).
Also, we’ll be doing some math here and there, but nothing too complex. It really teaches you how to think analytically and critically. SPSS and SAS Enterprise Miner (E-Miner) are some of the software that we use to perform data mining tasks.
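For a taste of the ‘some math here and there’, here is a minimal sketch of the two basic measures behind market basket analysis, support and confidence, in plain Python. The baskets and function names are made up for illustration; dedicated tools use smarter algorithms (such as Apriori) rather than checking every pair like this.

```python
from itertools import combinations

# Made-up shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_pairs(transactions, min_support):
    """Brute-force search for item pairs that meet the support threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            result[pair] = s
    return result


print(frequent_pairs(transactions, min_support=0.5))
print(confidence({"bread"}, {"butter"}, transactions))
```

In this toy data, bread and butter show up together in 3 of the 5 baskets, and 3 of the 4 baskets with bread also have butter, which is the kind of rule a shop could use to place items side by side.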
Advanced Decision Support System
This is also one of those computer science things. A Decision Support System (DSS) is a system that analyzes data and information to support a business’s decision-making activities. For example, GPS navigation. Ever wonder how the navigation system determines which route to use so that you can reach your mother’s cousin’s friend’s house as soon as possible with a lower chance of getting lost? How modern farmers determine the best time to plant, fertilize and harvest crops? How medical personnel diagnose illnesses? Those are just a few examples of DSS. But what does ‘advanced’ mean here?
In this context, an advanced decision support system is a DSS with an intelligent system (artificial intelligence, or AI) applied to it, so that it has the ability to learn from the data that is fed into it. You mean like a robot? Pretty much, yeah. For example, IBM Watson. Watson is an advanced question-answering computer system that uses AI approaches to assist health professionals in making decisions about diagnoses and treatment options. Basically, massive amounts of data such as clinical literature, health records and test results are fed into its database. Medical personnel can then query the system about a patient’s symptoms, and Watson processes the input to identify the pieces of information and facts that are relevant to the patient’s medical history. The system then forms and tests hypotheses and provides a list of individualized treatment recommendations.
And that’s just one example. Imagine what else we can do with such technology. That’s what we learn in this subject.
Data Visualization
Sounds easy, right? Just create a couple of charts here and there, throw in some text, use different colours, and voila. NO. There’s a technique for this, okay? Imagine all that you have is numbers and short texts in a structured file, and you have to tell a story about it, and make it interactive.
At first I thought this subject was easy, until I actually had to do it for projects. Data visualization is about telling stories using graphical representations. It’s about what you want to tell the audience: what do they need to know, what do they NOT need to know, and how do you present the data without making them read too much? Many of my friends claimed that it’s not a complex subject, and I agree; maybe some of us think so deeply that we forget what we were actually looking for. But believe me, it’s a fun subject to learn.
The software commonly used for data visualization includes Tableau and Power BI, and if you’re into simplicity, you can actually just use Excel. I personally think Power BI is easier to use than Tableau. Tableau is great though; it just took me a little longer to get used to it.
Text Analytics
Two words: sentiment analysis. Basically, we learn about identifying emotions in text. The challenge is the language. Most text analytics software only handles text in English. Since Malaysians usually mix Malay and English words in their sentences, performing sentiment analysis on that kind of text is not that easy. Natural language processing is a pretty complex subject to learn. People keep creating new words. Let’s not even start with short forms like ‘lmao’ for ‘laughing my ass off’, or curse words and sarcasm. As humans, we understand the emotion in a text by understanding the context. A machine, on the other hand, doesn’t have the capability to understand context yet. For example,
“This app is fucking awesome.”
“It’s lit but the extra feature is trash.”
“Just gr8. I paid $25 for nothing”
And some Malay text:
“Game ni best gila!”
“Nape asyik crash?”
“Kejadah suruh bayo? Bagi free la!”
Can you identify the emotion in each sentence? Is it positive, negative, or neutral? How does a machine do it? In sentiment analysis, the sentence is segmented into words and the words are analyzed one by one. Like ‘awesome’, that’s positive. The problem is words like ‘lit’ and ‘gr8’. And we know that ‘trash’ is negative, but machines don’t know that unless we teach them (by using a corpus). Malay text? That’s another problem, because our people are used to short forms such as ‘nape’ for ‘kenapa’ (why) and to spelling words the way they sound (‘bayo’ instead of ‘bayar’ (pay)). We also mix in English words since we’re all still internally colonized.
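The word-by-word idea above can be sketched in a few lines of Python. Everything here is a toy assumption: the tiny lexicon, the scores, and the short-form table are hand-made for illustration, while real systems rely on much larger corpora and smarter models.

```python
import re

# Hand-made toy lexicon (assumed scores): +1 positive, -1 negative.
LEXICON = {
    "awesome": 1, "great": 1, "lit": 1, "best": 1,
    "trash": -1, "crash": -1,
}

# Hand-made table mapping short forms to their standard spelling.
NORMALIZE = {"gr8": "great", "nape": "kenapa", "bayo": "bayar"}


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentiment(text):
    """Score each word separately, then label the sentence by the total."""
    tokens = [NORMALIZE.get(t, t) for t in tokenize(text)]
    score = sum(LEXICON.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


examples = [
    "It's lit but the extra feature is trash.",
    "Just gr8. I paid $25 for nothing",
    "Game ni best gila!",
    "Nape asyik crash?",
    "Kejadah suruh bayo? Bagi free la!",
]
for sentence in examples:
    print(sentiment(sentence), "<-", sentence)
```

Notice what goes wrong: the sarcastic ‘Just gr8’ review comes out positive because the scorer only sees the word ‘great’, and the complaint about having to pay comes out neutral because none of its words are in the lexicon. That’s the context problem in a nutshell.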
It’s a cool subject to learn, and a huge area to venture into. Maybe you can contribute and become an expert in sentiment analysis that focuses on the Malay language. That would be cool.
Nature-inspired Computing
I consider this one of my favourite subjects. It’s not an easy subject, but I’ve always had a thing for complicated stuff. Even in people. People with complex characters are my favourite. Like you just wanna crack them open and figure them out. That’s how I feel about this subject.
We learn about nature-inspired computation techniques/algorithms: Evolutionary Algorithms (EA) like the genetic algorithm (GA), Swarm Intelligence methods such as particle swarm optimization (PSO) and ant colony optimization (ACO), and others like cuckoo search, the firefly algorithm and the gravitational search algorithm. We also learn how these algorithms are used to solve problems such as traffic signal control, hydroelectric system scheduling, image edge detection, job scheduling, assembly sequence planning, etc. For the genetic algorithm, we learn a little bit of biology, like how meiosis works. For particle swarm, we learn how animals that travel in swarms, such as birds, move. For ant colony, we learn how ants travel from their nest to find food. Have you ever wondered how ants always move in order and all use the same route? We learn about all that and then we create algorithms based on it. Fascinating, right? You’ll also be learning a lot of math. When you learn data science, you just can’t escape from doing at least a little calculation.
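To give a feel for how the bird-flock idea turns into code, here is a rough sketch of particle swarm optimization in plain Python. The function name pso, the parameter values (w, c1, c2), the swarm size and the toy ‘sphere’ objective are all assumptions picked for illustration; this is one common textbook-style variant, not the only way to write it.

```python
import random


def pso(objective, dim, n_particles=20, n_iters=100,
        bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective with a basic particle swarm (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Each particle has a position, a velocity, and a memory of its own best.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # The swarm also remembers the best position found by anyone so far.
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Keep some momentum, pull toward own best, pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, but stay inside the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective: sum of squares, smallest (zero) at the origin."""
    return sum(v * v for v in x)


best_pos, best_val = pso(sphere, dim=2)
print(best_pos, best_val)
```

Each ‘bird’ balances three pulls: keep flying the way it was going, head back to the best spot it has seen itself, and head toward the best spot anyone in the flock has seen. On this toy problem the swarm should end up very close to the origin.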
Maybe if you pay enough attention to nature, you might create a new algorithm. There’s even an algorithm called Social Emotional Optimization. I’m not sure how that works, but it does sound interesting.
So What?
What can I do with all this knowledge? A data scientist, is that all I can be? What’s wrong with being a data scientist though? It’s one of the highest-paying jobs of the century, that’s why everyone wants to be a data scientist. Okay, not everyone.
But that’s not all you can be. You can work as a machine learning engineer, data engineer, data analyst, automated system engineer, and many more. If you love figuring things out, getting into details, math, tech, and the dark side, you will enjoy doing this.