Posts

Showing posts from November, 2023

SQL With Python

https://www.youtube.com/watch?v=zrNHkRgWzTI
SQL queries using pandas DataFrames: https://www.youtube.com/watch?v=oPuVYSC_kpo
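As a rough illustration of the topic (not taken from the linked videos), here is a minimal sketch of running SQL from Python with the standard-library sqlite3 module. The table name, the sample rows, and the helper names_older_than are all made up for this example.

```python
import sqlite3

# Hypothetical sample data, used only for illustration.
rows = [("alice", 30), ("bob", 25), ("carol", 35)]

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people (name, age) VALUES (?, ?)", rows)
conn.commit()


def names_older_than(connection, min_age):
    """Return the names of people older than min_age, sorted by name."""
    cursor = connection.execute(
        # The ? placeholder passes the value safely instead of string formatting.
        "SELECT name FROM people WHERE age > ? ORDER BY name",
        (min_age,),
    )
    return [name for (name,) in cursor.fetchall()]


print(names_older_than(conn, 28))
```

If pandas is installed, the same connection can be passed to pandas.read_sql_query(sql, conn) to get the result back as a DataFrame instead of a list.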

Azure Synapse

Beyond SQL Server, you can make use of the Azure Synapse service. Earlier there was only a service for hosting a SQL data warehouse; Azure Synapse came later. Initially, Azure Synapse itself was mainly a place to host a SQL data warehouse. Over time, many more services have been added to it. You can now host a SQL database using the SQL option within the Azure Synapse ecosystem, and you can also use Apache Spark to analyze your data. You can bring your data closer to your analytical needs by attaching an Azure Data Lake to your Azure Synapse workspace. The tools on the right-hand side can be used for visualization, and the data ingestion tools on the left can be used to ingest your data.
-- Creating an Azure Synapse workspace: The first thing about working with Azure Synapse is creati...

Queries

Learning to write queries: SELECT 1+1. Here SELECT works much like print: it simply returns the value of the expression. The GO command ends a batch. We will learn what a batch is.
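GO is not itself a T-SQL statement; client tools such as sqlcmd and SSMS recognize it and send everything before it to the server as one batch. The sketch below imitates that idea in Python under simplifying assumptions: the helper split_batches is hypothetical, it only treats a line containing nothing but GO as a separator (it ignores forms like "GO 5" and any GO inside strings or comments), and SQLite stands in for SQL Server so the example can run anywhere.

```python
import sqlite3


def split_batches(script):
    """Split a SQL script into batches on lines that contain only GO."""
    batches, current = [], []
    for line in script.splitlines():
        if line.strip().upper() == "GO":
            # GO closes the current batch; skip batches that are empty.
            text = "\n".join(current).strip()
            if text:
                batches.append(text)
            current = []
        else:
            current.append(line)
    # Anything after the last GO forms a final batch.
    tail = "\n".join(current).strip()
    if tail:
        batches.append(tail)
    return batches


script = "select 1+1\nGO\nselect 2*3\nGO\n"

# SQLite is an assumption here, used only to execute each batch.
conn = sqlite3.connect(":memory:")
results = [conn.execute(batch).fetchone()[0] for batch in split_batches(script)]
print(results)  # [2, 6]
```

Each batch is sent and executed separately, which is why SELECT 1+1 comes back as its own result, much like a print.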

Data Engineer Skillsets

  Scala is a programming language that combines object-oriented and functional programming paradigms. It is designed to be concise, elegant, and interoperable with Java. Scala runs on the Java Virtual Machine (JVM), which makes it compatible with existing Java libraries and frameworks. Apache Spark, on the other hand, is an open-source distributed computing system that provides a fast and general-purpose cluster computing framework for big data processing. Spark is designed to be fast and flexible and supports various programming languages, including Scala, Java, Python, and R. Scala is one of the primary programming languages for Apache Spark. Many of Spark's core components and APIs are written in Scala, and Spark applications can be developed using Scala. The combination of Scala and Spark allows developers to leverage the expressive and concise syntax of Scala while taking advantage of Spark's distributed computing capabilities for processing large datasets. Some key points...

Study Materials - Udemy :

Data Engineer Associate DP-203: https://www.udemy.com/course/data-engineering-on-microsoft-azure/learn/lecture/27327228?start=30#overview
Azure Databricks, Spark for Data Engineer: https://www.udemy.com/course/azure-databricks-spark-core-for-data-engineers/learn/lecture/27514570?start=0#overview
Hadoop Big Data: https://www.udemy.com/course/the-ultimate-hands-on-hadoop-tame-your-big-data/learn/lecture/11863332?start=15#overview
Python: https://www.udemy.com/course/complete-python-developer-zero-to-mastery/learn/lecture/22727561?start=75#overview
T-SQL: https://www.udemy.com/course/70-461-session-2-querying-microsoft-sql-server-2012/learn/lecture/11725694#overview
Apache Spark with Scala: https://www.udemy.com/course/apache-spark-with-scala-hands-on-with-big-data/learn/lecture/11863448#overview

Exam DP-203: Data Engineering on Microsoft Azure

As a candidate for this exam, you should have subject matter expertise in integrating, transforming, and consolidating data from various structured, unstructured, and streaming data systems into a suitable schema for building analytics solutions. As an Azure data engineer, you help stakeholders understand the data through exploration, and build and maintain secure and compliant data processing pipelines by using different tools and techniques. You use various Azure data services and frameworks to store and produce cleansed and enhanced datasets for analysis. This data store can be designed with different architecture patterns based on business requirements, including modern data warehouse (MDW), big data, and lakehouse architecture. As an Azure data engineer, you also help to ensure that the operationalization of data pipelines and data stores are high-performing, efficient, organized, and reliable, given a set of business requirements a...