Pro5 recently held a webinar panel discussion on Careers and Skills in Data Engineering: What you need to know.
Our distinguished guest speakers were:
- Melecio Valerio from Maya Philippines,
- Maheshkumar Paik from Maltem Asia-Pacific; and,
- John Masud Parvez from Vietnam Social Health Revolution Group.
The panel discussion featured pre-prepared and on-the-spot questions from the audience.
Below is the list of questions that were asked along with the panel’s insightful answers.
How would you define data engineering?
MEL VALERIO: “In a nutshell, I would say that it is the practice of designing and building systems to store and analyze data at scale.
When I say at scale, I mean the system should be flexible while taking in data from disparate sources. It is a broad field with applications in every industry.
Organizations have the ability to collect massive amounts of data, and this is where data engineers come in. Organizations need the right people and the right technology to ensure that the data is in a highly usable state in time for the data scientists and analysts to use.
In other words, data engineering is a bridge between the source systems and technology and our data consumers, making the data usable. That is how data engineering works as a practice.
In summary, it is about designing, building, collecting, storing, and analyzing data that will be useful to our business and to the data consumers.
That’s how I picture it or define it.”
Why is data engineering a good career?
MEL VALERIO: “It depends on how you approach it and if it’s really your passion. It can be both rewarding and challenging. You play an important role in an organization’s success by providing access to data.
Data analysts, data scientists, and decision-makers need data to do their jobs. So, those roles need the data engineering team to provide accessible, available, and accurate data to solve their problems and create scalable solutions. Data engineers will always be in demand.”
JOHN PARVEZ: “There’s a huge demand in data, and there’s a lot of work to be done because data is a new thing.
Ten years ago, nobody even understood what data was about. It is a new thing that has now become an upper layer for IT and its systems. Data sits on top of them because data can help businesses create new products and services, and improve the existing ones too.
Most other fields are more mature, meaning more people are already there. Maybe they’re ahead of you, smarter than you, or maybe they can just work harder than you. In the data industry, by contrast, there’s not much of a traffic jam on the road when you want to go forward, because it’s a new thing, and a lot of jobs are going to open.
Basically, data is cannibalizing a lot of old, out-of-date IT and data departments, so it’s a great place to go.”
How do data engineering and data science differ from and relate to artificial intelligence (AI)?
MAHESH PAIK: “I’ve been in the data business for the last 10-11 years and I’ve seen the data industry progress. The growth has been amazing, and I feel that the branding of AI has been done so nicely that it is so popular nowadays.
But we have to remember that AI is just the tip of the iceberg. There are a lot of things that happen below that, and data engineering is one of those things.
So, AI comes at the end of all the projects; if you have proper data management, proper data governance, data strategy, at the end you can do some ML (machine learning) and AI.”
JOHN PARVEZ: “Think about science and business. When you only innovate, you become like a scientist.
But when you actually know how to market an innovation, and how to solve a business problem with it, that’s where engineering comes in.
If you see the innovation graph, a scientist is closer to the innovation and an engineer would be closer to a packaged solution in the market where the customer will be willing to pay for it.”
If one wants to start a career in data engineering, what’s the best role to start with considering that it is not a course that’s offered in school?
MEL VALERIO: “Many data engineers have a bachelor’s degree in computer science or a related field. By earning a degree, you can build a foundation of knowledge that you will need in this fast-evolving field.
You will definitely need to develop your data engineering skills.
When I say engineering skills, those are the fundamentals of cloud computing, coding, and perhaps database design, which is also a starting point to another career in data science.
Coding is a critical skill for data engineering, so you should be proficient in the relevant languages. Consider taking courses to learn common ones like SQL and Python.
You also have to know about relational and non-relational databases, and the process of data engineering, which is to Extract, Transform, and Load (ETL).
Some data engineers started out as software engineers. Coming from there, you already have a foundation in coding, so moving into a data engineering role is more about how you develop those coding skills further.
You will be working with a lot of databases, tables and doing some modeling stuff. There are a lot of channels that you can go through in going into a data engineering role. There are a lot of opportunities.”
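For readers who have not seen it in practice, the Extract, Transform, and Load (ETL) process mentioned above can be sketched in a few lines using the two languages the panel recommends, Python and SQL. This is only a minimal illustration under stated assumptions: the `RAW_ORDERS` data, the `orders` table, and the function names are all hypothetical, and a real pipeline would read from actual source systems rather than an inline string.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (stands in for a CSV file).
RAW_ORDERS = """order_id,customer,amount
1, alice ,10.50
2,BOB,4.00
3,alice,not_a_number
"""


def extract(text):
    """Extract: read rows from the raw CSV text as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise customer names, parse amounts, skip bad rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # amount could not be parsed, so the row is dropped
        customer = row["customer"].strip().lower()
        clean.append((int(row["order_id"]), customer, amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run all three steps, then query the result the way an analyst might."""
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    load(conn, transform(extract(RAW_ORDERS)))
    query = (
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()
```

Calling `run_etl()` returns the total spend per customer. The row with the unparseable amount is dropped during the transform step, which is the kind of cleaning that makes the loaded data usable downstream.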
JOHN PARVEZ: “Take an online program immediately. It will probably take a few weeks, and it’s free. The world is amazing.
When I was a student, everything was in books because there was no internet at that time. I suffered a lot. You don’t have to suffer like me; you have everything available to you.
First, take a few free courses online. Don’t pay for them, because you’re not sure yet whether you’re really going to like it. By getting them free, you have less commitment. It’s a good way to get started.
Take a few of them and then start making your resume better.
Start sending your CV to get an internship with these companies. If they reject you, keep sending your CV every three months, and probably you can have a backup list of companies where you could get started too.
One of the best things you can do when coming into an internship is to do research about the company and come up with an idea of what they can do better with their data, or what they are doing wrong with their data.
Just comment on it or say it during the interview. If you do it nicely, and if you practice a few times so you know how to answer some of the interview questions, they’re going to love you.
That will set your tone for a crushing journey with the data engineers. They’re going to learn and they’re going to hire you before you even graduate from school.”
In your current role, how do you attract or find the right talent for your data engineering teams?
MAHESH PAIK: “Whenever I am looking for any resource, any consultant, I always try to see one thing, like curiosity. Curiosity is very important because I feel that when you are doing any kind of development, it’s going to be straightforward, but with a database, the customer would say one thing but they would expect something very different.
You have to come up with a lot of things, a lot of proof, a lot of biases, and a lot of research, all of which you have to do by yourself. That’s one thing.
Communication also plays a very important role when you’re working with data because you have to talk to the stakeholders, the in-designers, and the technical people, and you have to be very straightforward.
After this, of course, you need to have really good technical skills. SQL is a default skill. SQL is like the blood of data engineering as a whole.
Once you’re very good in SQL, NoSQL comes very easy, very straightforward. And you cannot skip Python. You need to learn Python.
There are a lot of videos, a lot of information out there. I feel that is one of the benefits and that is also one of the disadvantages because you can get it immediately.
You have to be very clear, as John said. You have to write down exactly what you want to do. If you look up something on the internet you will always get thousands of results, so you have to be very clear. So yeah, curiosity, communication, and technical skills.”
Despite not being good at coding, is there still an opportunity to become a data engineer?
MAHESH PAIK: “I remember when I started my career, it was not the coding that was most important; the algorithm was just as important.
If you’re working with tools like ETL, everything is drag and drop nowadays, and it will start working. So, if you are already very amazing at Lego, then you can be very good at these kinds of tools. You just have to build it by dragging and dropping all the stuff.
But to do this, you need to have a very good understanding of the client’s needs. You have to go to the specs again and again. You must understand the description and then you have to start building up.
No code is revolutionary for sure, I have no doubt on that, but at some point, if something is not working you have to deep dive and you have to start coding.”
MEL VALERIO: “Robots cannot replace data engineers in the future. But yes, if you are not very proficient in coding, you can still be a data engineer.
As a data engineer, you are a problem solver, and that doesn’t mean you will be coding a hundred percent of the time. It’s more about how you understand the business, the requirements of the business, and how you are going to present the data to your data consumers so they can make use of it.
Those ETL tools can make data engineering easy, but again, if you really have a passion for data, you will always unleash your potential, you will strive hard, and you will not just depend on those very nice ETL tools.
If something doesn’t work well, then you will definitely be forced to do a lot of validation and a lot of checking what went wrong. You need to become a logical thinker, and eventually you will do your coding. So, you will not get away from coding.
But it doesn’t mean that you don’t have a chance or opportunity to become a data engineer.
If you’re not good with coding, it will come, as long as you have the foundations of a data engineer which are being a problem solver, a logical thinker, and [you have] a passion for data.
That’s how I see it.”
What would be a good place to start learning these technologies?
MAHESH PAIK: “It all comes back to SQL. Once you have a good understanding of SQL, Python, and any kind of technology, you can build up your own skillset and create a custom set of tools that you know.
Make your base very good.
Any kind of tool is good, but you need to have a very good understanding of it and what it does.
You can start building your own ecosystem of tools, you just have to get the look and feel of it. So just have SQL, Python, and one of the programming languages, then you’re good to go.”
What types of businesses or industries are currently showing a demand for data engineers?
All three of them answered that every industry needs a data engineer and John provided a shortlist of industries below:
JOHN PARVEZ: “Number one, healthcare, they have a massive need in the security sector. If somebody is interested in information security, this industry is very relevant. The second one is retail. The third one I will say is insurance. There is also big data coming from their industry.
Another one I could mention that could be very important is the food and beverage industry. They have a lot of stuff to do because they want to create a lot more feel for their existing products and they want to create new ones too. Because in the food and beverage industry, all the food is available and they think, how do I differentiate my product? How do I add something new?
Building a new product can be easily crafted when you look at the data.
Did you know they collect a lot of data? For example, you go to a convenience store, you buy chips, and nobody cared about that before. But now, they start to realize that they need to collect that, like most retail stores have their own apps.
At the end of the day, the retail store can sell or share the data to the company so they can build an understanding of the buying behavior of their market, and can innovate their products too.
These are the industries at the top of my mind, and to bring us back to the original answer: everybody needs it because it’s a new thing.”
MEL VALERIO: “Choose a company that’s going for a digital transformation because they will definitely need a data engineer to process all their big data and make use of that for better customer experience.”
Can you give some concrete examples of a problem that a data engineer solves?
MEL VALERIO: “Nowadays, every organization has this vision of giving better customer experience, right? Meaning to say, you need to collect all data coming from different channels because it is a complex problem for the data analytics and data science team; how can they see what a particular consumer is doing, their behavior, how can we provide them a personalized offer, how can we run a marketing campaign?
The data will be coming from different sources—from their front-end systems, their recall systems, and other applications. So now it is very hard for the data science and data analytics team to combine everything because they will be running a lot of things in your model and all that. Here come the data engineers making wonders out of it.
They will be the ones responsible for consolidating everything. Based on the business rules, they will transform your data into a business dataset that is scalable and flexible for daily use by your data consumers.
So now you are a problem solver for those kinds of challenges that the company is facing.
So, you consolidate all your data and you apply all the necessary data governance and data management disciplines: accuracy, validation, and all that. Your business is now confident that the data coming from different sources is consolidated.”
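To make the consolidation problem above more concrete, here is a minimal sketch of combining customer events from several channels into one profile per customer, with a basic validation step. Everything in it is an assumption made for illustration: the three event lists, the field names (`customer_id`, `channel`, `spend`), and the validation rule are hypothetical and do not describe any panelist's actual systems.

```python
from collections import defaultdict

# Hypothetical events from three separate channels.
APP_EVENTS = [
    {"customer_id": "C1", "channel": "app", "spend": 20.0},
    {"customer_id": "C2", "channel": "app", "spend": 5.0},
]
WEB_EVENTS = [
    {"customer_id": "C1", "channel": "web", "spend": 7.5},
    {"customer_id": None, "channel": "web", "spend": 3.0},  # missing ID
]
STORE_EVENTS = [
    {"customer_id": "C2", "channel": "store", "spend": -1.0},  # bad amount
]


def is_valid(event):
    """Assumed rule: an event needs a customer ID and a non-negative spend."""
    return bool(event.get("customer_id")) and event.get("spend", -1) >= 0


def consolidate(*sources):
    """Merge events from every source into one profile per customer.

    Returns the profiles plus the rejected events, so data quality
    problems stay visible instead of being silently discarded.
    """
    profiles = defaultdict(lambda: {"total_spend": 0.0, "channels": set()})
    rejected = []
    for source in sources:
        for event in source:
            if not is_valid(event):
                rejected.append(event)
                continue
            profile = profiles[event["customer_id"]]
            profile["total_spend"] += event["spend"]
            profile["channels"].add(event["channel"])
    return dict(profiles), rejected


profiles, rejected = consolidate(APP_EVENTS, WEB_EVENTS, STORE_EVENTS)
```

After this runs, `profiles` holds a single consolidated view per customer that analysts can use for things like personalized offers, while `rejected` lists the records that failed validation and need follow-up.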
JOHN PARVEZ: “In Vietnam, last year, they launched a COVID vaccination passport. That means it has [a record of] different doses, the first dose, second dose, and third dose. So, 90 million people all over the country were taking vaccine shots. All the data was consolidated in a mobile app for everyone to install so people can easily see their current status.
Everything has been settled in their mobile apps. And this mobile app has these QR codes, so everywhere they go, to the mall or their office, they just scan the QR code and they go in. So now, all the data is going to a central place. Now if somebody gets COVID positive, then immediately we can know what path they’ve taken for the last 2 weeks; and which people they interacted with.
They will be F0, F1, and F2, and that will make the work easy for the administering team. A report will be sent to their team and they can take action easily.
That played a powerful role in controlling COVID-19 in Vietnam, and that was incredible. That powerful tool can now be used not only for COVID but for any kind of national level contamination, or even for a massive flood or a disaster issue, the same app can be used.”
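As a rough illustration of how centralized check-in data makes this kind of tracing possible, the sketch below labels people as F0 (the confirmed case), F1 (direct contacts), and F2 (contacts of F1). It is not the logic of the actual app: the `CHECKINS` records, the names, and the rule that two people are in contact when they share a place on the same day are all simplifying assumptions, and the sketch ignores the order in which visits happened.

```python
from collections import defaultdict

# Hypothetical QR check-ins: (person, place, day).
CHECKINS = [
    ("an", "mall", 1), ("binh", "mall", 1),
    ("binh", "office", 2), ("chi", "office", 2),
    ("dung", "cafe", 3),
]


def trace(checkins, positive):
    """Label the positive case F0, direct contacts F1, their contacts F2."""
    # Group people by the place and day they checked in.
    visitors = defaultdict(set)
    for person, place, day in checkins:
        visitors[(place, day)].add(person)

    # Assumed rule: two people are contacts if they shared a place and day.
    contacts = defaultdict(set)
    for group in visitors.values():
        for person in group:
            contacts[person] |= group - {person}

    # Walk outward two levels from the positive case.
    levels = {positive: "F0"}
    frontier = [positive]
    for label in ("F1", "F2"):
        next_frontier = []
        for person in frontier:
            for other in sorted(contacts[person]):
                if other not in levels:
                    levels[other] = label
                    next_frontier.append(other)
        frontier = next_frontier
    return levels
```

With the sample data, `trace(CHECKINS, "an")` labels `an` as F0, `binh` as F1, and `chi` as F2, while `dung` does not appear because they shared no check-in with anyone in the chain.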
How do you prioritize what data to focus on, how to convert it, and how to prepare it?
MEL VALERIO: “As the head of data governance, I always work with the team of data engineering and data science. We work and align together with the vision of creating business value out of data.
If you want to create a business value out of data, you have to align your metrics, your dashboard, or your KPIs with the company’s mission.
Pick the top three, work out which data they need, and identify the critical data elements, because we cannot boil the ocean.
So, focus on those critical data elements and make sure those elements are really critical in terms of providing relevant information to generate a report, an analysis, or a model.
From there, you can create a data pipeline strategy where all the data comes from different sources.
We make sure that when the data goes through our data lake infrastructure or architecture, it is already curated, always available, accurate, relevant, and timely, so that our data consumers can use it for decision-making. That is how we build a data strategy or a roadmap.
We focus on a domain; we identify a consumer domain, non-consumer domain, product domain, and those critical data elements so that it can become a minimum viable product (MVP).
Meaning, that in an agile world, it can already be beneficial or have an impact on the business. You have to take an incremental approach.”
Have you ever been part of a company that has a Chief Data Officer (CDO) and do you need to wait for the next ten years to have one?
MEL VALERIO: “Companies should not wait 10 years to have a CDO. Right now, almost all organizations have a CDO or are already looking for one.
The primary roles of a CDO are to always look for the data, utilize the data, monetize the data, and make sure that it will add big value to the business.
I worked before with a CDO and it’s a really good motivation for any data professional. Regardless, if you would want to stay in your own data career, like in data engineering, and if you really want to step up the ladder, then definitely it is a stepping stone to becoming a CDO.”
JOHN PARVEZ: “The way things are changing in the industry, it’s very fast. So, it’s not going to be 10 years, probably 4 to 5 years, maybe 3, almost all major companies will have a CDO. But which position will be in the package?
At the end of the day, it seems to me that a CDO will take on a lot more of a role on the digital marketing side. The Chief Marketing Officer (CMO) may lose part of their role to the CDO, and probably even the head of product development’s [tasks] will be well connected with the CDO too. The third position is the CIO. The Chief Information Officer will now kind of turn into a Chief Transformation Officer. We’re not seeing a lot of CDOs yet.
I think the biggest problem for a company not having a CDO is not because they don’t want a CDO. They want to have one, but they cannot find people with that kind of experience and expertise yet.”
MAHESH PAIK: “It’s not like the companies don’t want [a CDO]; they just don’t have a person who has such depth of understanding yet. It is going to happen very soon.
I can give you one example. One of the major banks here in Singapore recently put forward a very good purpose around data and is trying to educate everyone. They have started a school of data where they’ll be able to educate everyone across the globe about [tech like] Tableau and SQL.
It is getting very important to have at least basic knowledge about data, right? The CDO and CIO roles are going to happen for sure.”
Did you find this blog interesting? Stay tuned for more as we share the latest and top learning resources for aspiring data engineers!