Description
Written by a Senior Data Architect with over twenty-five years of experience in the business, Data Engineering with AWS is a book whose sole aim is to make you proficient in using the AWS ecosystem. Using a thorough and hands-on approach to data, this book will give aspiring and new data engineers a solid theoretical and practical foundation to succeed with AWS.

As you progress, you'll be taken through the services and the skills you need to architect and implement data pipelines on AWS. You'll begin by reviewing important data engineering concepts and some of the core AWS services that form a part of the data engineer's toolkit. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how the transformed data is used by various data consumers. You'll also learn about populating data marts and data warehouses along with how a data lakehouse fits into the picture. Later, you'll be introduced to AWS tools for analyzing data, including those for ad-hoc SQL queries and creating visualizations.
In the final chapters, you'll understand how the power of machine learning and artificial intelligence can be used to draw new insights from data.

By the end of this AWS book, you'll be able to carry out data engineering tasks and implement a data pipeline on AWS independently.

Table of contents:

Data Engineering with AWS
Contributors
About the author
Additional contributors
About the reviewers
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Share Your Thoughts

Section 1: AWS Data Engineering Concepts and Trends

Chapter 1: An Introduction to Data Engineering
  Technical requirements
  The rise of big data as a corporate asset
  The challenges of ever-growing datasets
  Data engineers: the big data enablers
  Understanding the role of the data engineer
  Understanding the role of the data scientist
  Understanding the role of the data analyst
  Understanding other common data-related roles
  The benefits of the cloud when building big data analytic solutions
  Hands-on: creating and accessing your AWS account
  Creating a new AWS account
  Accessing your AWS account
  Summary

Chapter 2: Data Management Architectures for Analytics
  Technical requirements
  The evolution of data management for analytics
  Databases and data warehouses
  Dealing with big, unstructured data
  A lake on the cloud and a house on that lake
  Understanding data warehouses and data marts: fountains of truth
  Distributed storage and massively parallel processing
  Columnar data storage and efficient data compression
  Dimensional modeling in data warehouses
  Understanding the role of data marts
  Feeding data into the warehouse: ETL and ELT pipelines
  Building data lakes to tame the variety and volume of big data
  Data lake logical architecture
  Bringing together the best of both worlds with the lake house architecture
  Data lakehouse implementations
  Building a data lakehouse on AWS
  Hands-on: configuring the AWS Command Line Interface tool and creating an S3 bucket
  Installing and configuring the AWS CLI
  Creating a new Amazon S3 bucket
  Summary

Chapter 3: The AWS Data Engineer's Toolkit
  Technical requirements
  AWS services for ingesting data
  Overview of Amazon Database Migration Service (DMS)
  Overview of Amazon Kinesis for streaming data ingestion
  Overview of Amazon MSK for streaming data ingestion
  Overview of Amazon AppFlow for ingesting data from SaaS services
  Overview of Amazon Transfer Family for ingestion using FTP/SFTP protocols
  Overview of Amazon DataSync for ingesting from on-premises storage
  Overview of the AWS Snow family of devices for large data transfers
  AWS services for transforming data
  Overview of AWS Lambda for light transformations
  Overview of AWS Glue for serverless Spark processing
  Overview of Amazon EMR for Hadoop ecosystem processing
  AWS services for orchestrating big data pipelines
  Overview of AWS Glue workflows for orchestrating Glue components
  Overview of AWS Step Functions for complex workflows
  Overview of Amazon managed workflows for Apache Airflow
  AWS services for consuming data
  Overview of Amazon Athena for SQL queries in the data lake
  Overview of Amazon Redshift and Redshift Spectrum for data warehousing and data lakehouse architectures
  Overview of Amazon QuickSight for visualizing data
  Hands-on: triggering an AWS Lambda function when a new file arrives in an S3 bucket
  Creating a Lambda layer containing the AWS Data Wrangler library
  Creating new Amazon S3 buckets
  Creating an IAM policy and role for your Lambda function
  Creating a Lambda function
  Configuring our Lambda function to be triggered by an S3 upload
  Summary

Chapter 4: Data Cataloging, Security, and Governance
  Technical requirements
  Getting data security and governance right
  Common data regulatory requirements
  Core data protection concepts
  Personal data
  Encryption
  Anonymized data
  Pseudonymized data/tokenization
  Authentication
  Authorization
  Putting these concepts together
  Cataloging your data to avoid the data swamp
  How to avoid the data swamp
  The AWS Glue/Lake Formation data catalog
  AWS services for data encryption and security monitoring
  AWS Key Management Service (KMS)
  Amazon Macie
  Amazon GuardDuty
  AWS services for managing identity and permissions
  AWS Identity and Access Management (IAM) service
  Using AWS Lake Formation to manage data lake access
  Hands-on: configuring Lake Formation permissions
  Creating a new user with IAM permissions
  Transitioning to managing fine-grained permissions with AWS Lake Formation
  Summary

Section 2: Architecting and Implementing Data Lakes and Data Lake Houses

Chapter 5: Architecting Data Engineering Pipelines
  Technical requirements
  Approaching the data pipeline architecture
  Architecting houses and architecting pipelines
  Whiteboarding as an information-gathering tool
  Conducting a whiteboarding session
  Identifying data consumers and understanding their requirements
  Identifying data sources and ingesting data
  Identifying data transformations and optimizations
  File format optimizations
  Data standardization
  Data quality checks
  Data partitioning
  Data denormalization
  Data cataloging
  Whiteboarding data transformation
  Loading data into data marts
  Wrapping up the whiteboarding session
  Hands-on: architecting a sample pipeline
  Detailed notes from the project "Bright Light" whiteboarding meeting of GP Widgets, Inc
  Summary

Chapter 6: Ingesting Batch and Streaming Data
  Technical requirements
  Understanding data sources
  Data variety
  Data volume
  Data velocity
  Data veracity
  Data value
  Questions to ask
  Ingesting data from a relational database
  AWS Database Migration Service (DMS)
  AWS Glue
  Other ways to ingest data from a database
  Deciding on the best approach for ingesting from a database
  Ingesting streaming data
  Amazon Kinesis versus Amazon Managed Streaming for Kafka (MSK)
  Hands-on: ingesting data with AWS DMS
  Creating a new MySQL database instance
  Loading the demo data using an Amazon EC2 instance
  Creating an IAM policy and role for DMS
  Configuring DMS settings and performing a full load from MySQL to S3
  Querying data with Amazon Athena
  Hands-on: ingesting streaming data
  Configuring Kinesis Data Firehose for streaming delivery to Amazon S3
  Configuring Amazon Kinesis Data Generator (KDG)
  Adding newly ingested data to the Glue Data Catalog
  Querying the data with Amazon Athena
  Summary

Chapter 7: Transforming Data to Optimize for Analytics
  Technical requirements
  Transformations: making raw data more valuable
  Cooking, baking, and data transformations
  Transformations as part of a pipeline
  Types of data transformation tools
  Apache Spark
  Hadoop and MapReduce
  SQL
  GUI-based tools
  Data preparation transformations
  Protecting PII data
  Optimizing the file format
  Optimizing with data partitioning
  Data cleansing
  Business use case transforms
  Data denormalization
  Enriching data
  Pre-aggregating data
  Extracting metadata from unstructured data
  Working with change data capture (CDC) data
  Traditional approaches: data upserts and SQL views
  Modern approaches: the transactional data lake
  Hands-on: joining datasets with AWS Glue Studio
  Creating a new data lake zone: the curated zone
  Creating a new IAM role for the Glue job
  Configuring a denormalization transform using AWS Glue Studio
  Finalizing the denormalization transform job to write to S3
  Create a transform job to join streaming and film data using AWS Glue Studio
  Summary

Chapter 8: Identifying and Enabling Data Consumers
  Technical requirements
  Understanding the impact of data democratization
  A growing variety of data consumers
  Meeting the needs of business users with data visualization
  AWS tools for business users
  Meeting the needs of data analysts with structured reporting
  AWS tools for data analysts
  Meeting the needs of data scientists and ML models
  AWS tools used by data scientists to work with data
  Hands-on: creating data transformations with AWS Glue DataBrew
  Configuring new datasets for AWS Glue DataBrew
  Creating a new Glue DataBrew project
  Building your Glue DataBrew recipe
  Creating a Glue DataBrew job
  Summary

Chapter 9: Loading Data into a Data Mart
  Technical requirements
  Extending analytics with data warehouses/data marts
  Cold data
  Warm data
  Hot data
  What not to do: anti-patterns for a data warehouse
  Using a data warehouse as a transactional datastore
  Using a data warehouse as a data lake
  Using data warehouses for real-time, record-level use cases
  Storing unstructured data
  Redshift architecture review and storage deep dive
  Data distribution across slices
  Redshift Zone Maps and sorting data
  Designing a high-performance data warehouse
  Selecting the optimal Redshift node type
  Selecting the optimal table distribution style and sort key
  Selecting the right data type for columns
  Selecting the optimal table type
  Moving data between a data lake and Redshift
  Optimizing data ingestion in Redshift
  Exporting data from Redshift to the data lake
  Hands-on: loading data into an Amazon Redshift cluster and running queries
  Uploading our sample data to Amazon S3
  IAM roles for Redshift
  Creating a Redshift cluster
  Creating external tables for querying data in S3
  Creating a schema for a local Redshift table
  Running complex SQL queries against our data
  Summary

Chapter 10: Orchestrating the Data Pipeline
  Technical requirements
  Understanding the core concepts for pipeline orchestration
  What is a data pipeline, and how do you orchestrate it?
  How do you trigger a data pipeline to run?
  How do you handle the failures of a step in your pipeline?
  Examining the options for orchestrating pipelines in AWS
  AWS Data Pipeline for managing ETL between data sources
  AWS Glue Workflows to orchestrate Glue resources
  Apache Airflow as an open source orchestration solution
  Pros and cons of using MWAA
  AWS Step Function for a serverless orchestration solution
  Pros and cons of using AWS Step Function
  Deciding on which data pipeline orchestration tool to use
  Hands-on: orchestrating a data pipeline using AWS Step Function
  Creating new Lambda functions
  Creating an SNS topic and subscribing to an email address
  Creating a new Step Function state machine
  Configuring AWS CloudTrail and Amazon EventBridge
  Summary

Section 3: The Bigger Picture: Data Analytics, Data Visualization, and Machine Learning

Chapter 11: Ad Hoc Queries with Amazon Athena
  Technical requirements
  Amazon Athena: in-place SQL analytics for the data lake
  Tips and tricks to optimize Amazon Athena queries
  Common file format and layout optimizations
  Writing optimized SQL queries
  Federating the queries of external data sources with Amazon Athena Query Federation
  Querying external data sources using Athena Federated Query
  Managing governance and costs with Amazon Athena Workgroups
  Athena Workgroups overview
  Enforcing settings for groups of users
  Enforcing data usage controls
  Hands-on: creating an Amazon Athena workgroup and configuring Athena settings
  Hands-on: switching Workgroups and running queries
  Summary

Chapter 12: Visualizing Data with Amazon QuickSight
  Technical requirements
  Representing data visually for maximum impact
  Benefits of data visualization
  Popular uses of data visualizations
  Understanding Amazon QuickSight's core concepts
  Standard versus enterprise edition
  SPICE: the in-memory storage and computation engine for QuickSight
  Ingesting and preparing data from a variety of sources
  Preparing datasets in QuickSight versus performing ETL outside of QuickSight
  Creating and sharing visuals with QuickSight analyses and dashboards
  Visual types in Amazon QuickSight
  Understanding QuickSight's advanced features: ML Insights and embedded dashboards
  Amazon QuickSight ML Insights
  Amazon QuickSight embedded dashboards
  Hands-on: creating a simple QuickSight visualization
  Setting up a new QuickSight account and loading a dataset
  Creating a new analysis
  Summary

Chapter 13: Enabling Artificial Intelligence and Machine Learning
  Technical requirements
  Understanding the value of ML and AI for organizations
  Specialized ML projects
  Everyday use cases for ML and AI
  Exploring AWS services for ML
  AWS ML services
  Exploring AWS services for AI
  AI for unstructured speech and text
  AI for extracting metadata from images and video
  AI for ML-powered forecasts
  AI for fraud detection and personalization
  Hands-on: reviewing reviews with Amazon Comprehend
  Setting up a new Amazon SQS message queue
  Creating a Lambda function for calling Amazon Comprehend
  Adding Comprehend permissions for our IAM role
  Adding a Lambda function as a trigger for our SQS message queue
  Testing the solution with Amazon Comprehend
  Summary
  Further reading

Chapter 14: Wrapping Up the First Part of Your Learning Journey
  Technical requirements
  Looking at the data analytics big picture
  Managing complex data environments with DataOps
  Examining examples of real-world data pipelines
  A decade of data wrapped up for Spotify users
  Ingesting and processing streaming files at Netflix scale
  Imagining the future: a look at emerging trends
  ACID transactions directly on data lake data
  More data and more streaming ingestion
  Multi-cloud
  Decentralized data engineering teams, data platforms, and a data mesh architecture
  Data and product thinking convergence
  Data and self-serve platform design convergence
  Implementations of the data mesh architecture
  Hands-on: cleaning up your AWS account
  Reviewing AWS Billing to identify the resources being charged for
  Closing your AWS account
  Summary

Why subscribe?
Other Books You May Enjoy
Packt is searching for authors like you
Share Your Thoughts