Fivetran: 7 practical best-practice tips for designing modern data stacks
A modern data stack turns raw data into valuable insights. However, designing one can be tedious without a sound method and the right tools. Veronica Zhai, analytics director at Fivetran, previously worked as an options trader at J.P. Morgan. She designed the first modern data stack. In a study published by Fivetran, she shares seven best practices for achieving a high-quality, durable result.
Register for the HUBDAY Data & AI For Business on May 17th and 18th
A data stack is a suite of tools for integrating data. In its modern version, most of these tools are hosted in the cloud (or are cloud-native), which enables automation, lowers costs and makes the tools easier for end users to work with throughout the data life cycle.
First tip: design your data stack methodically
To design your data stack efficiently, follow a precise framework, which Fivetran's study summarizes in four steps:
- Set up a cloud-based data warehouse: compare the different data warehouse offerings, their scalability and their pricing models. Technical characteristics, such as the supported programming languages, are also worth taking into account.
- Next, connect the data warehouse you have chosen to a cloud-native business intelligence (BI) tool. This gives you better visualization of your data and a user-friendly interface, and it makes collaboration across different roles easier.
- Choose a data pipeline to extract data from your applications and operational systems and load it into your data warehouse. Here again, pipeline providers take different approaches, which should be compared against your needs.
- Finally, choose the tools for transforming your data, so that you can produce unique insights and forecast future trends from historical data.
Second tip: avoid pitfalls when building a modern data stack
Veronica Zhai identifies several mistakes that can hinder companies' ability to use data effectively.
According to her, companies often hold on to expensive, high-maintenance on-premises storage infrastructure. On-site infrastructure does not scale well under heavy load, whereas in the cloud, additional storage or compute resources can be provisioned easily.
It is also advisable to avoid building a custom data pipeline yourself. Doing so raises serious engineering challenges and is error-prone. A home-made pipeline is also very time-consuming and laborious to build and maintain. Try to outsource this task and automate it as early as possible.
Use ELT rather than ETL for data integration. With ELT (Extract, Load, Transform), data is loaded into the warehouse automatically in a near-raw state, and analysts model it afterwards, whereas ETL transforms the data before loading it. This is much more convenient and preserves the integrity of the source data.
Finally, be disciplined in how you manage data, especially as you integrate more and more data and adopt new tools. For example, it is useful to put guardrails in place that detect and remove data assets that are no longer useful. This makes searching the data warehouse more efficient and minimizes errors.
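The difference between ELT and ETL is easiest to see in a small sketch. The example below is a minimal illustration, assuming Python's built-in sqlite3 module as a stand-in for a cloud data warehouse; the table names, columns and records are hypothetical. Records are loaded exactly as extracted (amounts as text, inconsistent country casing), and the cleanup happens afterwards, in SQL, inside the warehouse.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a cloud data warehouse.
conn = sqlite3.connect(":memory:")

# Extract: hypothetical records pulled from an operational system.
source_records = [
    {"order_id": 1, "amount": "19.90", "country": "fr"},
    {"order_id": 2, "amount": "5.00", "country": "FR"},
    {"order_id": 3, "amount": "42.50", "country": "de"},
]

# Load: store the records in a near-raw state, with no cleaning or type conversion.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (:order_id, :amount, :country)",
    source_records,
)

# Transform: analysts model the raw data with SQL, inside the warehouse.
# The raw table is left untouched, so the original values remain available.
conn.execute(
    """
    CREATE TABLE orders_by_country AS
    SELECT UPPER(country) AS country_code,
           SUM(CAST(amount AS REAL)) AS revenue
    FROM raw_orders
    GROUP BY country_code
    """
)

for row in conn.execute(
    "SELECT country_code, revenue FROM orders_by_country ORDER BY country_code"
):
    print(row)
```

Because the raw table is never overwritten, a modeling mistake can be fixed by rewriting the SQL and rebuilding the derived table, without extracting the data again.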
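One simple form of such a guardrail is a periodic check for tables that nobody has queried for a while. The sketch below assumes you can obtain each table's last query date (most cloud warehouses expose some form of query history, but how you retrieve it is not shown here); the table names, dates and the 90-day threshold are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical metadata: the last date each warehouse table was queried.
last_queried = {
    "analytics.orders_by_country": date(2024, 5, 2),
    "analytics.legacy_campaign_2019": date(2023, 1, 15),
    "staging.tmp_import_backup": date(2023, 11, 30),
}


def find_stale_assets(last_queried, today, max_idle_days=90):
    """Return the tables not queried within max_idle_days, sorted by name."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, last in last_queried.items() if last < cutoff)


stale = find_stale_assets(last_queried, today=date(2024, 5, 10))
for name in stale:
    # Flag the table for review instead of dropping it automatically.
    print(f"Review for archiving or deletion: {name}")
```

Flagging assets for review, rather than deleting them outright, is a deliberately cautious choice: a table that is queried only once a year (for an annual report, say) would otherwise be removed by mistake.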
Third tip: use a framework to recruit successfully for your data team
Successful recruiting for your data team means quickly hiring talent with the required technical skills. To that end, Veronica Zhai proposes a multi-step selection process.
- First, consider testing your candidates' technical skills, especially advanced SQL skills.
- In the second round, test your candidates' execution skills. For example, ask them to write a 30-60-90-day plan and evaluate their business strategy.
- You can then test their analytical skills by asking your candidates to present insights and visualizations from complex sample data.
- Look for cultural fit and alignment with your company values. Prioritize new hires whose traits and abilities complement your team. When in doubt, consider making reference calls.
Finally, if you use SQL for modeling and transformation, you can do without data engineers for a while. However, keep building your team of analysts and consider hiring a data architect to optimize the overall system.
Fourth tip: use the crucial first 180 days to build a solid data foundation
The first six months are crucial for laying the groundwork for your company's analytics efforts. First, create a centralized data team that reports to the company's leadership or to a technical executive and serves as the link with specialized teams such as product, sales or marketing analysts.
By forging alliances between teams that already use analytics and those that don't, you can help them optimize their data integration and avoid duplicated work. Finally, it is important to use the same performance indicators as senior management and to ensure that they are reflected in business intelligence and business strategy. These could be revenue metrics, customer growth and churn, or the number of daily active users of your product.
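Sharing indicators with senior management works best when each indicator has a single, written-down definition. The sketch below shows one common way to define daily active users (distinct users per day) and churn rate (share of customers present at the start of a period who are gone at the end); these definitions are assumptions for illustration, not ones given in Fivetran's study, and the event and customer data are hypothetical.

```python
from datetime import date

# Hypothetical product events: (user_id, day of activity).
events = [
    ("u1", date(2024, 5, 1)),
    ("u2", date(2024, 5, 1)),
    ("u1", date(2024, 5, 1)),  # repeat activity on the same day counts once
    ("u3", date(2024, 5, 2)),
    ("u1", date(2024, 5, 2)),
]


def daily_active_users(events):
    """Count distinct active users per day."""
    users_by_day = {}
    for user_id, day in events:
        users_by_day.setdefault(day, set()).add(user_id)
    return {day: len(users) for day, users in sorted(users_by_day.items())}


def churn_rate(customers_at_start, customers_at_end):
    """Share of the customers present at the start who are no longer present at the end."""
    if not customers_at_start:
        raise ValueError("customers_at_start must not be empty")
    lost = customers_at_start - customers_at_end
    return len(lost) / len(customers_at_start)


dau = daily_active_users(events)
# New customer c5 does not offset the loss of c2 under this definition.
rate = churn_rate({"c1", "c2", "c3", "c4"}, {"c1", "c3", "c4", "c5"})
```

Whatever definitions you choose, computing them in one shared place keeps the BI dashboards and the management reports from drifting apart.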
Fifth tip: manage your data team like an R&D team
Do not think of your data team as simply a technical support team, an engineering team or a product team, but rather as a combination of all three. Like a product team, it should work toward product goals, with customer satisfaction as its main indicator.
Similarly, the team will need to follow engineering principles: to perform well, it should invest at least 25% of its resources in building an easy-to-navigate, scalable data infrastructure. Finally, the analytics team must provide a customer-centric service.
This team will be able to provide onboarding and ongoing support, work with partner teams to resolve production issues, and develop training materials for end users.
Sixth tip: use systems thinking to tame data complexity
The complexity of a company's data grows throughout the data life cycle and management cycle. This is especially true for large organizations like J.P. Morgan, which have gone through numerous mergers and acquisitions and have had to integrate many different data-generating systems.
Systems thinking makes it possible to simplify how data is organized by breaking down silos and centralizing data. First, it lets different teams minimize the number of queries, because they all work from the same data. It is also an opportunity to use machine learning algorithms to automate updates to shared indicators and give teams real-time visibility.
Seventh tip: identify virtuous and vicious circles of information
With high-quality data, you can also better illustrate how your company operates, particularly through "flywheels". These are virtuous (or vicious) circles that synthesize the information flowing through your company. For example, a good understanding of your customers lets you build a higher-quality product, which drives customer satisfaction and grows your customer base.
These flywheels can be assembled into causal loop diagrams, which illustrate even more precisely the cause-and-effect links between your company's operations.
Causal loop diagrams bring together in one place the different factors that act on your company and its activities. This makes it easier to identify, one by one, the areas where you can become more competitive. Data then lets you quantify the different elements of your operations, giving you a more precise view of your resources and making it easier to highlight your results.
"This way of thinking about the use of data can truly revolutionize the way we operate," says Veronica Zhai. It can have a profoundly positive impact on the business world and help professionals reach a higher level of awareness.
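A causal loop diagram can be represented as a directed graph whose links carry a sign: "+" when the cause pushes the effect in the same direction, "-" when it pushes it the other way. By the usual system-dynamics convention, a loop is reinforcing (a flywheel, virtuous or vicious) when the product of its signs is positive, and balancing otherwise. The sketch below encodes the customer flywheel described above; the closing link from customer base back to customer understanding and the "support load" loop are illustrative assumptions, not taken from the study.

```python
# Hypothetical causal links: (cause, effect) -> polarity (+1 or -1).
links = {
    ("customer understanding", "product quality"): +1,
    ("product quality", "customer satisfaction"): +1,
    ("customer satisfaction", "customer base"): +1,
    # Assumption: a larger customer base yields more data, hence better understanding.
    ("customer base", "customer understanding"): +1,
    # Assumption: more customers mean more support load, which can hurt satisfaction.
    ("customer base", "support load"): +1,
    ("support load", "customer satisfaction"): -1,
}


def simple_cycles(links):
    """Enumerate simple directed cycles, each reported once, starting from its smallest node."""
    graph = {}
    for cause, effect in links:
        graph.setdefault(cause, []).append(effect)
    cycles = []

    def dfs(start, node, path):
        for nxt in graph.get(node, []):
            if nxt == start:
                cycles.append(list(path))
            elif nxt > start and nxt not in path:
                dfs(start, nxt, path + [nxt])

    for start in sorted(graph):
        dfs(start, start, [start])
    return cycles


def loop_type(cycle, links):
    """A loop is reinforcing if the product of its link signs is positive, else balancing."""
    sign = 1
    for cause, effect in zip(cycle, cycle[1:] + cycle[:1]):
        sign *= links[(cause, effect)]
    return "reinforcing" if sign > 0 else "balancing"


for cycle in simple_cycles(links):
    print(" -> ".join(cycle + cycle[:1]), ":", loop_type(cycle, links))
```

Once each link is quantified with real data, the balancing loops point to where growth is being held back, which is one way to pick the areas to work on first.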
Meet Fivetran at the HUBDAY Data & AI For Business, on May 17th and 18th