Building an Event-Based Pipeline and Data Quality Framework in Energy Supply & Sales
Measuring your business operations is one of the best ways to understand them and to improve them where it matters most. Our client, a European energy trading company, wanted to make information rapidly available so that market analysts, traders, and internal systems could act on it, while also ensuring that the data meets defined quality thresholds. Scalefocus created an event-driven pipeline that applies real-time triggers across the organization, so front and middle offices can take proactive steps to mitigate risks and act on insights as they unfold.
Raw input turns into actionable data
Modern data pipeline architecture
Our client is an international energy company that operates numerous power plants and leverages advanced technologies and innovations to manage large-scale power systems and infrastructure. The company sources oil, natural gas, and global commodities and trades electricity, emissions certificates, natural gas, LNG, and coal.
Our client’s trading and market analytics teams conduct a range of analytical, modeling, and optimization activities to leverage positions in international commodity markets. Market, pricing, weather, and other data types play a crucial role in quickly identifying opportunities and assessing corresponding risks. The backbone of their operations is a big data analytics platform used for analysis and forecasting. This platform is a flexible, state-of-the-art, scalable solution for flowing market data across systems, processes, and departments, ensuring rapid and high-quality insights.
The platform had to be integrated with a series of external downstream tools, processes, and workflows and trigger them on specific events, enabling a faster and more automated way of working. On top of this, we had to ensure that the data ingested into the time series meets the quality thresholds expected for various use cases.
The data sets are enormous: they contain tens of thousands of data series with more than a billion records. The event triggering and quality check processes therefore had to be developed in a scalable and efficient way without jeopardizing the existing ingestion process.
The client also needed an experienced partner who could define the data quality goals of the analytics platform and implement a data quality framework that continuously profiles data for errors, completeness, consistency, and reliability.
The solution was split into two major phases.
First, we established the event-based pipeline (EBP) architecture through Azure API Management so that events can trigger various downstream tools. The rule checks, and the EBP actions that follow them, can be configured to start as soon as the data for the monitored objects is ingested. This allows end users to automate or semi-automate many of their processes once the data is available or complete.
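To make the idea of event-based triggering concrete, here is a minimal in-process sketch. It is not the client's implementation, which routes events through Azure API Management; the `IngestionEvent` and `EventBus` names, the series identifiers, and the handler are hypothetical and only illustrate how a downstream action can be tied to the ingestion of a monitored series.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class IngestionEvent:
    """Hypothetical event emitted when data for a monitored series lands."""
    series_id: str
    records_ingested: int


Handler = Callable[[IngestionEvent], None]


class EventBus:
    """Minimal in-process stand-in for the event routing layer (an assumption)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, series_id: str, handler: Handler) -> None:
        # Register a downstream action for one monitored series.
        self._handlers[series_id].append(handler)

    def publish(self, event: IngestionEvent) -> int:
        # Call every handler registered for the series; return how many ran.
        handlers = self._handlers.get(event.series_id, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


triggered: List[str] = []
bus = EventBus()
bus.subscribe(
    "power_price_de",
    lambda event: triggered.append(f"refresh report for {event.series_id}"),
)

bus.publish(IngestionEvent("power_price_de", 96))  # one handler runs
bus.publish(IngestionEvent("gas_price_nl", 24))    # no subscriber, nothing runs
```

In a production setup the handler would call a downstream tool or API instead of appending to a list; the point is only that nothing runs until the relevant data arrives.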
Second, we developed a data quality framework (DQF) that ensures the ingested data meets the quality thresholds required for specific use cases. Its rules fall mainly into two categories, Availability and Completeness, with a value threshold variant of the Availability rule:
Data Availability – checks whether enough data points were inserted into the system. It counts the number of records received in the Snowflake database for a specified time range. For example, a user may want to check that there are exactly 96 data points per day for the energy pricing curves they have been following over the past 3 months. Based on these quality criteria for specific time series, the rule can automatically trigger a dashboard in Tableau, for example, or notify the user if the number of data points differs from the expected one. The rule can be configured to start at a set date and time or whenever new data arrives through the ingestion pipeline. We designed the system to scale easily: users can combine multiple availability rules into a single rule, a so-called meta-rule.
Values Threshold – the Availability rule described above can also be configured to trigger downstream processes when a value meets specified criteria. For example, if any newly ingested value exceeds 100 EUR/MWh, a report or notification is triggered.
Data Completeness – checks whether the required records have been received for each expected time point in a given time range. For example, for hourly data curves, the system can report exactly which hours within a 30-day range have missing data points. Similar rules can be defined for time series with other data point intervals: minutes, quarter hours, days, months, etc.
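The three rule types above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the framework itself: the function names and the sample data are hypothetical, and the timestamps are held in plain Python collections, whereas the real rules run against the Snowflake database.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, List


def availability_violations(
    timestamps: Iterable[datetime],
    first_day: date,
    last_day: date,
    expected_per_day: int,
) -> Dict[date, int]:
    """Availability: days whose record count differs from the expected count."""
    counts = Counter(ts.date() for ts in timestamps)
    violations: Dict[date, int] = {}
    day = first_day
    while day <= last_day:
        if counts[day] != expected_per_day:
            violations[day] = counts[day]
        day += timedelta(days=1)
    return violations


def missing_points(
    timestamps: Iterable[datetime],
    start: datetime,
    end: datetime,
    step: timedelta,
) -> List[datetime]:
    """Completeness: expected time points in [start, end) that have no record."""
    seen = set(timestamps)
    missing: List[datetime] = []
    current = start
    while current < end:
        if current not in seen:
            missing.append(current)
        current += step
    return missing


def threshold_breaches(values: Dict[datetime, float], limit: float) -> List[datetime]:
    """Values threshold: time points whose value is above the limit."""
    return [ts for ts, value in sorted(values.items()) if value > limit]


# Sample quarter-hourly curve for one day with a single point removed.
start = datetime(2024, 1, 1)
quarter_hour = timedelta(minutes=15)
received = [start + i * quarter_hour for i in range(96)]
received.remove(datetime(2024, 1, 1, 10, 15))

daily_gaps = availability_violations(received, date(2024, 1, 1), date(2024, 1, 1), 96)
gaps = missing_points(received, start, start + timedelta(days=1), quarter_hour)

# Sample prices in EUR/MWh checked against the 100 EUR/MWh example limit.
prices = {start: 85.0, start + quarter_hour: 101.5}
breaches = threshold_breaches(prices, 100.0)
```

With this sample data, the availability check reports 95 instead of 96 records for 1 January 2024, the completeness check pinpoints 10:15 as the missing quarter hour, and the threshold check flags the 00:15 price. The first answers "how many", the second answers "which ones", which is why the two rules complement each other.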
The technological challenge was to cover all the data sources while maintaining the system’s performance. The rules of the DQF can require scraping and reading data from hundreds of data sources simultaneously – processing hundreds of data objects and time series. Thanks to the platform’s maturity, we had a clear vision of how we wanted to develop it further and where the EBP and DQF could add real business value without jeopardizing the existing performance.
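One common way to evaluate many independent rules at once is a worker pool. The sketch below assumes a thread pool from the Python standard library; the source does not state how concurrency was implemented on the platform, and the `run_rules` function and rule names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_rules(rules: Dict[str, Callable[[], bool]], max_workers: int = 8) -> Dict[str, bool]:
    """Evaluate independent rules in parallel and return pass/fail per rule name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every rule first so they can run concurrently.
        futures = {name: pool.submit(check) for name, check in rules.items()}
        # Then collect the results; result() blocks until the rule finishes.
        return {name: future.result() for name, future in futures.items()}


# Placeholder rules; real ones would query the data store for each series.
rules = {
    "power_de_has_96_points": lambda: 96 == 96,
    "gas_nl_has_24_points": lambda: 23 == 24,
}
results = run_rules(rules)
```

Threads suit this kind of workload when each rule spends most of its time waiting on a database or network call, since the checks do not block one another.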
Scalefocus upgraded the client’s existing platform with new functionalities and features that heavily optimized its performance.
We improved the data quality, and users can now rely on fast data ingestion and a data quality framework that provides complete visibility of all data series. If data is inconsistent or missing, users are notified, which is very important for the operations of the middle office and the traders. Previously, users were often unaware that they were missing vital data points. Now they receive up-to-date reports and can see within seconds which data is missing, which is very useful when you have to monitor hundreds of data objects.
- Increased Resilience – event-driven architecture that reduces dependencies between applications.
- Scalability – modern data pipeline architecture designed to integrate new data sources seamlessly without disrupting business operations.
- Predictability – it is easy to follow the path of data and monitor any missing time series.
- Raw input turns into actionable information.