Are you a recruiter or hiring manager hunting for a highly talented ETL developer?
Business users & senior executives rely on data warehouse information to make important business decisions, and getting the best out of a company’s unstructured data is one of the biggest hurdles they face. To overcome this challenge, bringing a top-notch ETL developer on board is a must. And believe me, finding one is even more challenging!
But don’t worry, I can help you with this!
In this article, along with proven ETL interview questions, I have written down the detailed process to follow in order to grab the right talent. It mainly consists of identifying the responsibilities & skills required, screening the candidates & then interviewing the right ones. Let’s discuss each step in detail.
Step 1: Identify the Skills & Responsibilities of an ETL Developer:
1. The Responsibilities of an ETL Developer:
Before proceeding further, let’s understand what exactly ETL is.
ETL (Extract, Transform & Load) is the process performed in a data warehouse environment that involves extracting data from multiple data sources, transforming it into a suitable format & then loading the transformed data into the destination system. In simple words, if information is the oil, then the pipeline built to carry that oil is the ETL process, & this is what ETL developers do! They work on creating & strengthening this pipeline to extract & transfer data from one place to another.
I. Extracting the Data:
An ETL developer is responsible for developing & managing the extraction tools that pull homogeneous or heterogeneous data from sources such as relational databases, SaaS services, mobile applications, flat files, etc. The size of the extracted data can vary from hundreds of kilobytes up to gigabytes. Extraction is often the toughest & most time-consuming of the three tasks, so the three tasks are usually run in tandem as a pipeline: data that has already been extracted is transformed & loaded while the rest is still being pulled.
II. Transformation/Transportation of Data:
An ETL developer is also responsible for transforming the source data into a suitable format that meets the specification of the destination database. This involves data mapping & writing efficient code to implement the business logic.
III. Loading the Data:
This is the last step & the reverse of extraction. It focuses on how effectively & efficiently the transformed data is loaded into the target system (a database, a file or another data warehouse).
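To make the three steps above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a production design: the sample CSV text, the `orders` table and all function names are hypothetical, and an in-memory SQLite database stands in for the real target system.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file export standing in for a real data source.
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,carol,not_a_number
"""


def extract(text):
    """Extract: read the raw rows from the flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, cast types, skip rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            # Malformed amount: a real job would log or quarantine this row.
            continue
        customer = row["customer"].strip().lower()
        clean.append((int(row["order_id"]), customer, amount))
    return clean


def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract, transform and load in order, then read back the result."""
    load(transform(extract(text)), conn)
    query = "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for record in run_pipeline(SOURCE_CSV, connection):
        print(record)
```

A candidate who can walk through a sketch like this and point out what is missing (logging of rejected rows, incremental loads, handling of duplicate keys) is usually showing the kind of thinking the role needs.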
2. ETL Developer Skills Required:
- Knowledge of one or more programming languages (Python, Java, R, Scala, etc.) & of the MapReduce programming model.
- Knowledge of big data platforms & stores: Hadoop, MongoDB, Spark, etc.
- Hands-on experience with ETL tools: DataStage, Informatica, etc.
- Data Modeling: The technique used to define the detailed business requirements for a specific database in order to design the best possible ETL process.
- Testing Knowledge: Should be able to unit test the generated code.
- Analytical & Logical Thinking: A good ETL developer/data engineer should be able to analyze the given data & judge all the possible outcomes.
- Communication Skills: An ETL developer should have good verbal & written communication skills.
Step 2: Assess Candidates Before The Interview:
If you think that resumes speak the truth, then you are in big trouble! Blindly trusting resumes & asking candidates to come directly for an interview is risky, time-consuming & tedious. During the interview, you can gauge a candidate’s personality: whether he/she is dedicated, passionate & a cultural fit. But what about their skills? As an employer, you must opt for a scientific approach to validate the technical expertise of the candidate.
Before asking candidates to come for the interview, assess their working skills using Interview Mocha’s valid & reliable tests, listed below:
- ETL Assessments are designed & developed by our subject matter experts to quantify the skills of an ETL developer.
- Java Coding Test for Experienced is scientifically designed to assess the Java coding skills of an ETL developer.
- Scala Online Test helps employers to quantify the Scala programming skills of an ETL developer.
- Python Coding Test validates the Python programming skills of an ETL developer.
- MapReduce Online Test helps to gauge the candidate’s skills in the MapReduce programming model.
- Hadoop Online Test helps recruiters & hiring managers to analyze the Hadoop knowledge of applicants.
- MongoDB Online Test is designed to quantify a candidate’s expertise in MongoDB, a NoSQL database, before an interview.
- Spark Online Test is created to assess skills in Spark, a cluster computing engine often used alongside Hadoop.
- Data Mining Test is developed to evaluate the candidate’s working skills in data mining.
- Data Modeling Test checks the candidate’s hands-on knowledge of data modeling techniques.
- Data Science and Analytics Test is developed to assess the candidate’s expertise in extracting data from various sources using R tools.
- IBM InfoSphere DataStage Online Test helps you to evaluate the candidate’s application skills in DataStage.
- Software Testing Online Tests are designed to validate the software testing skills of candidates before hiring.
For more details, you can visit Interview Mocha, an applicant testing software. Employers can also ask for customized assessments. Our content team will be happy to provide you with a custom test as per your specific needs!
Step 3: Interview Round:
After screening for qualified candidates, ask them to appear for the interview. Here, I have compiled proven ETL interview questions that will help you to assess the ETL skills of applicants.
ETL Interview Questions to Assess & Hire ETL Developers:
- Models such as budgeting, financial reporting, allocations, etc. are heavy on calculations and do not translate well into SQL. Among multi-dimensional, real-time and relational OLAP, which type of OLAP is not suitable when computations have to be carried out on these models, and why?
- What do you know about an operational data store in data warehousing?
- How is a distributed data warehouse different from a virtual data warehouse?
- Do you know about a data cube? What is the significance of data cubes?
- You have a product dimension and a sales fact table in your data warehouse. A new product is created in the OLTP system, sales transactions happen for that product, and then you run an extract from the OLTP system. As a result, only the sales transactions arrive in the staging environment and not the product. Why did this happen?
- Can you explain the difference between parametric and nonparametric predictive models? Also give one example of each.
- Shown is a graph database model skeleton. Let N be the nodes (representing entities) and E be the edges (representing relationships among entities). How do you represent entity attributes, plus metadata, in such a data model?
- What can you say about a database column that has high cardinality?
- Between a fact table and a dimension table, which one remains in denormalized form?
- A business model has employee details, company details, and salary details. The model (schema) has only one foreign key i.e. Employee_ID for all the dimensions which are necessary. What type of model is being referred to here?
- Do you know about association rule mining? Explain it with one practical use case.
- Suppose i1, i2, i3, i4, i5, i6, i7, i8 are the items sold at a grocery store. Customers can buy any combination of these items as per their requirements. After collecting the data from all transactions, it was found that i5, i6, i7 is the most frequently bought itemset, while i2, i4, i5 is bought least frequently. What can you say about the purchase behavior of the i5i6 itemset and the i2i4i5i7 itemset?
- What do you think is the primary role of a data mining engine in any data mining system?
- Below are the steps of a typical data mining process, listed out of order:
i) Perform ETL onto the data warehouse system. ii) Provide data access to analysts. iii) Store and manage the data in a multidimensional DB system. iv) Analyze and present the data.
Can you rearrange these steps into the correct order for a data mining process?
- Consider data mining techniques like Clustering, Associations, Link analysis, and Deviation detection, which one would you use for finding a relationship between two entities?
- Suppose you’re asked to reduce an existing data model using IBM DataStage ETL (assume you are not using Parallel Extender). Can you do this as percentage-wise reductions on the rows in the data source? If yes, how? If no, what is the workaround?
- Between a hash file and a sequential file, which one is faster to search for data, and why?
- Why is the sparse lookup in DataStage faster than Normal lookup?
- You have two time-type field values in a DataStage transformer. In one record the value is 00:59:59 while in the other record the value is 00:10:10. If you want to add these two values so as to get the output 01:10:09, how would you do that?
- Can you explain the Merge stage in DataStage in the context of jobs? Also, explain how the merge is performed and what is mandatory when the merge is performed on a key field.
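Two of the questions above (adding time values and reasoning about itemsets) come down to small calculations, so it can help to have a reference result on hand when evaluating answers. The sketch below is plain Python, not DataStage code, and it only illustrates the arithmetic. The function names and the sample `TRANSACTIONS` baskets are hypothetical, made up for this example.

```python
from datetime import timedelta


def parse_hms(text):
    """Turn an 'HH:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta):
    """Turn a timedelta back into an 'HH:MM:SS' string."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def add_times(first, second):
    """Add two 'HH:MM:SS' values, carrying seconds into minutes and hours."""
    return format_hms(parse_hms(first) + parse_hms(second))


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    wanted = set(itemset)
    hits = sum(1 for basket in transactions if wanted <= set(basket))
    return hits / len(transactions)


# Made-up baskets in which i5, i6, i7 sell together often
# and i2, i4, i5 sell together rarely.
TRANSACTIONS = [
    {"i5", "i6", "i7"},
    {"i1", "i5", "i6", "i7"},
    {"i3", "i5", "i6", "i7"},
    {"i5", "i6"},
    {"i2", "i4", "i5"},
    {"i1", "i8"},
]

if __name__ == "__main__":
    print(add_times("00:59:59", "00:10:10"))
    # A subset of a frequent itemset is bought at least as often as it is.
    print(support({"i5", "i6"}, TRANSACTIONS),
          support({"i5", "i6", "i7"}, TRANSACTIONS))
    # A superset of a rare itemset is bought at most as often as it is.
    print(support({"i2", "i4", "i5", "i7"}, TRANSACTIONS),
          support({"i2", "i4", "i5"}, TRANSACTIONS))
```

The itemset part illustrates the property that a strong answer should mention: every subset of a frequent itemset is itself frequent, and every superset of an infrequent itemset is itself infrequent.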
ETL is a never-ending process, & so are the techniques & methods used in it! Though I have covered many aspects of the role & responsibilities of ETL developers, there may be more to it depending upon the job role assigned. But this guide will surely help you to streamline your hiring process & pick up the few small but essential things that make a big difference in getting the right hire!
If you have any queries, or other approaches that worked really well for you, we would love to hear about them. Please feel free to write them down in the comments below.