Best Practices when working with Azure Synapse
How to work efficiently with serverless SQL-Pools
--
If you are in the Microsoft Azure Cloud and want to run big data analytics there and run a Data Warehouse, you might well choose Synapse Analytics here.
What is Azure Synapse Analytics?
Azure Synapse Analytics is an analytics service that combines data integration, data warehousing, and Big Data analytics. You can either use the serverless or dedicated option. Azure Synapse brings these worlds together in a unified interface to capture, explore, prepare, transform, manage, and deliver data for out-of-the-box business intelligence and machine learning applications [1]. If you are interested in how it compares to other solutions, this article might be interesting for you.
What are the Best Practices?
When working with SaaS Data Warehouses, you should also always think about the costs. The cloud can save a lot of money, but you should also follow best practices, otherwise antipatterns may lead to additional or unnecessary costs.
Therefore, here are some tips that I have found useful during my work with Synpase and that Microsoft also recommends here [2]:
- Collect all of your client applications with a SQL pool that functions without any servers.
- Utilize a non-server SQL Pool that is in near distance to the machine of your client (e.g. Power BI, SQL Server Management Studio or Azure Data Studio).
- Only query the data that is really needed.
- Make sure that the storage and the serverless SQL pool are located in the same area.
- Make sure to get the most out of the storage layout by using partitioning and keep your files between 100 MB and 10 GB [2].
- Filter the results by string columns with the usage of BIN2_UTF8 sort [2].
- Cache the results on side of the client by using the Power BI Import Mode or the Azure Analysis Services.
Summary
If you plan to move from an on-premise to a Data Warehouse, then many opportunities arise, but there can also be immense costs. Therefore, in addition to monitoring and controlling, best practices must be observed, especially in data analysis, because all large cloud providers pay well for this. If you pay attention to this, you can save a lot of unnecessary costs. I hope the best practices for Microsoft Azure Synapse shown above will help you.
If you are mor interested in Azure Synapse these articles might be also interesting for you:
Sources and Further Readings
[1] Microsoft, Azure Synapse Analytics (2022)
[2] Microsoft, Best practices for serverless SQL pool in Azure Synapse Analytics (2022)