Azure Synapse

Rather than hosting your own SQL Server for data warehousing, you can make use of the Azure Synapse service.

Earlier, Azure only had a service for hosting a SQL data warehouse; later they brought in the Azure Synapse service. Initially, Azure Synapse was used mainly to host a SQL data warehouse.

Over time, many services have been introduced into Azure Synapse itself. You can now host a SQL data warehouse using the SQL option that is part of the entire Azure Synapse ecosystem. At the same time, you can also make use of Apache Spark when it comes to analyzing your data. And you can bring your data much closer to your analytical needs by attaching your Azure Data Lake storage account to your Azure Synapse workspace.


You can ingest your data using the data ingestion tools on the left-hand side, and use the tools on the right-hand side for visualization.

-- Creating an Azure Synapse workspace :

The first step in working with Azure Synapse is creating a workspace.


  • -- One thing we can do with an Azure Synapse workspace is host a SQL data warehouse.
  • -- The other thing we can do with an Azure Synapse workspace is analyze the data in the Azure Data Lake.

When it comes to transferring data from a data source (this could be semi-structured data, or data in a SQL database) to a SQL data warehouse where it can be analyzed by your business users:
-- You first need to perform some transformations on the data before it can be transferred to the SQL data warehouse. The transformations you apply to the data in the Azure Data Lake depend on your analysis of the data itself in the data lake.

-- We saw that some of our tables had rows with NULL values; we can make the decision not to transfer those NULL rows to the data warehouse. So you need to perform some initial analysis on your source data.

The data in Parquet-based files is in a binary format, so you cannot simply open the files to analyze the data.

You can use Azure Synapse to analyse your data even while it still sits in your data lake.

When it comes to Azure Synapse, there are two SQL compute options.

  • -- We also have an Apache Spark pool, which we will discuss later on.

The two compute options are the dedicated SQL pool and the serverless SQL pool.

Dedicated SQL Pool: Your dedicated SQL pool is used to build your data warehouse.

-- If you want to have tables in place with data, you can build a dedicated SQL pool. Use it if you need to persist your data.
-- This is similar to having tables in an Azure SQL database or any SQL database.

The first thing your data engineers can do is cleanse your data, transform it, and load it into the SQL data warehouse, where it can be analysed by the business users.

Serverless SQL Pool: You do not get charged for servers or storage; you only get charged for the data you analyse. The data itself stays in an Azure Data Lake Gen 2 storage account.

You can use the serverless SQL pool to quickly perform ad-hoc analysis, and then use tools within Azure Synapse to transfer the data to the dedicated SQL pool. In the serverless pool you are charged based on how much data you process, and you cannot persist data on disk.
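For the kind of ad-hoc analysis described above, the serverless SQL pool lets you query files in the data lake directly with OPENROWSET. A minimal sketch; the storage account name and file path below are assumptions:

```sql
-- Ad-hoc query over a CSV file sitting in the data lake (serverless SQL pool).
-- 'mystorageaccount' and the path are placeholders for your own account.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/data/raw/Log.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',  -- parser 2.0 supports HEADER_ROW
    HEADER_ROW = TRUE
) AS [rows];
```

You pay only for the data scanned by the query; nothing is persisted.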

External Tables:

External tables can be used to read and write data in Azure Blob storage, Hadoop, or Azure Data Lake Storage. External tables are available in both the serverless SQL pool and the dedicated SQL pool.
With an external table, you define the structure of a table just as you normally would in an RDBMS, and the table is pointed at the data in the Azure Data Lake storage account.

To work with external tables, there are three core commands that you need to use, apart from other supporting commands: CREATE EXTERNAL DATA SOURCE, CREATE EXTERNAL FILE FORMAT, and CREATE EXTERNAL TABLE.
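Put together, the three commands might look like the sketch below for a CSV file in a serverless SQL pool. The data source name, file format name, column list, and storage paths are all assumptions for illustration:

```sql
-- 1. Point at the storage container (URL is a placeholder)
CREATE EXTERNAL DATA SOURCE log_data_source
WITH (
    LOCATION = 'https://mystorageaccount.dfs.core.windows.net/data'
);

-- 2. Describe how the files are encoded
CREATE EXTERNAL FILE FORMAT csv_format
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2)  -- skip the header row
);

-- 3. Define the table structure and point it at the data
CREATE EXTERNAL TABLE [logdata]
(
    [Id]            INT,
    [Operationname] VARCHAR(200),
    [Time]          VARCHAR(100)
)
WITH (
    LOCATION    = '/raw/Log.csv',
    DATA_SOURCE = log_data_source,
    FILE_FORMAT = csv_format
);
```

Once created, the external table can be queried with ordinary SELECT statements, while the data stays in the data lake.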

 

47 : Lab : Using External Tables - CSV Part 2






48 : Lab : External Tables Parquet File

SAS token: it expires after 1 day unless you specify a specific expiration date.

You need to drop the SAS credential if it has expired.

> DROP EXTERNAL TABLE [logdata];
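As a rough sketch of the credential workflow, a SAS token is stored in a database scoped credential; when the token expires, drop the dependent external table and the credential, then recreate both with a fresh token. The credential name and the secret below are placeholders:

```sql
-- A master key may be required once per database before creating credentials
-- CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong-password>';

-- Store the SAS token (paste it without the leading '?')
CREATE DATABASE SCOPED CREDENTIAL sas_credential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET   = '<sas-token>';

-- When the token has expired: drop dependents first, then the credential
DROP EXTERNAL TABLE [logdata];
DROP DATABASE SCOPED CREDENTIAL sas_credential;
```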


49. Lab - External Tables - Multiple Parquet

51. Lab - Creating a SQL pool

52. Lab - SQL Pool - External Tables - Parquet

53. Lab - SQL Pool - External Tables - CSV









57. Lab - Loading data into a table - COPY Command - Parquet
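The COPY command loads files straight into a dedicated SQL pool table. A hedged sketch; the table name, storage URL, and SAS secret are placeholders:

```sql
-- Load a Parquet file into an existing table in the dedicated SQL pool
COPY INTO [logdata]
FROM 'https://mystorageaccount.blob.core.windows.net/parquet/Log.parquet'
WITH (
    FILE_TYPE  = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
);
```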

58. Loading data - Bulk Load - Quick Note

59. Lab - Loading data using PolyBase
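A PolyBase-style load typically stages the file as an external table and then copies it into an internal table with CREATE TABLE AS SELECT. A minimal sketch, assuming an external table [logdata_external] has already been defined over the source file:

```sql
-- Dedicated SQL pool: load from the external (staged) table into an
-- internal table with a CTAS statement
CREATE TABLE [logdata]
WITH (
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM [logdata_external];
```

ROUND_ROBIN distribution is a simple default for staging; a fact table would more often use HASH distribution on a join key.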










60. Lab - Loading data - Pipelines - Storage accounts













61. Lab - Loading data - Pipelines - Azure SQL database

62. Designing a data warehouse

63. More on dimension tables

64. Lab - Building a Fact Table

 

 

 
