In-Memory Analytics with Apache Arrow (E-book)

169,00 

Description

Apache Arrow is designed to accelerate analytics and make it easy to exchange data across big data systems.

In-Memory Analytics with Apache Arrow begins with a quick overview of the Apache Arrow format, then helps you understand Arrow's versatility and benefits as you walk through a variety of real-world use cases. You'll cover key tasks such as enhancing data science workflows with Arrow, using Arrow and Apache Parquet with Apache Spark and Jupyter for better performance and hassle-free data translation, and working with Perspective, an open-source interactive graphical and tabular analysis tool for browsers. As you advance, you'll explore the different data interchange and storage formats and become well-versed in the relationships between Arrow, Parquet, Feather, Protobuf, Flatbuffers, JSON, and CSV. In addition to understanding the basic structure of the Arrow Flight and Flight SQL protocols, you'll learn about Dremio's usage of Apache Arrow to enhance SQL analytics and discover how Arrow can be used in web-based browser apps. Finally, you'll get to grips with the upcoming features of Arrow to help you stay ahead of the curve.

By the end of this book, you will have all the building blocks to create useful, efficient, and powerful analytical services and utilities with Apache Arrow.
Table of contents:

In-Memory Analytics with Apache Arrow
Foreword
Acknowledgments
Contributors
  About the author
  About the reviewers
Preface
  Who this book is for
  To get the most out of this book
  Download the example code files
  Download the color images
  Conventions used
  Get in touch
  Share Your Thoughts

Section 1: Overview of What Arrow Is, its Capabilities, Benefits, and Goals
  Chapter 1: Getting Started with Apache Arrow
    Technical requirements
    Understanding the Arrow format and specifications
    Why does Arrow use a columnar in-memory format?
    Learning the terminology and physical memory layout
    Quick summary of physical layouts, or TL;DR
    How to speak Arrow
    Arrow format versioning and stability
    Would you download a library? Of course!
    Setting up your shooting range
    Using pyarrow for Python
    C++ for the 1337 coders
    Go Arrow go!
    Summary
    References
  Chapter 2: Working with Key Arrow Specifications
    Technical requirements
    Playing with data, wherever it might be!
    Working with Arrow tables
    Accessing data files with pyarrow
    Accessing data files with Arrow in C++
    pandas firing Arrow
    Putting pandas in your quiver
    Making pandas run fast
    Keeping pandas from running wild
    Sharing is caring... especially when it's your memory
    Diving into memory management
    Managing buffers for performance
    Crossing the boundaries
    Summary
  Chapter 3: Data Science with Apache Arrow
    Technical requirements
    ODBC takes an Arrow to the knee
    Lost in translation
    SPARKing new ideas on Jupyter
    Understanding the integration
    Everyone gets a containerized development environment!
    SPARKing joy with Arrow and PySpark
    Interactive charting powered by Arrow
    Stretching workflows onto Elasticsearch
    Indexing the data
    Summary

Section 2: Interoperability with Arrow: pandas, Parquet, Flight, and Datasets
  Chapter 4: Format and Memory Handling
    Technical requirements
    Storage versus runtime in-memory versus message-passing formats
    Long-term storage formats
    In-memory runtime formats
    Message-passing formats
    Summing up
    Passing your Arrows around
    What is this sorcery?!
    Producing and consuming Arrows
    Learning about memory cartography
    The base case
    Parquet versus CSV
    Mapping data into memory
    Too long; didn't read (TL;DR): Computers are magic
    Summary
  Chapter 5: Crossing the Language Barrier with the Arrow C Data API
    Technical requirements
    Using the Arrow C data interface
    The ArrowSchema structure
    The ArrowArray structure
    Example use cases
    Using the C Data API to export Arrow-formatted data
    Importing Arrow data with Python
    Exporting Arrow data with the C Data API from Python to Go
    Streaming across the C Data API
    Streaming record batches from Python to Go
    Other use cases
    Some exercises
    Summary
  Chapter 6: Leveraging the Arrow Compute APIs
    Technical requirements
    Letting Arrow do the work for you
    Input shaping
    Value casting
    Types of functions
    Executing compute functions
    Using the C++ compute library
    Using the compute library in Python
    Picking the right tools
    Adding a constant value to an array
    Summary
  Chapter 7: Using the Arrow Datasets API
    Technical requirements
    Querying multifile datasets
    Creating a sample dataset
    Discovering dataset fragments
    Filtering data programmatically
    Expressing yourself: a quick detour
    Using expressions for filtering data
    Deriving and renaming columns (projecting)
    Using the Datasets API in Python
    Creating our sample dataset
    Discovering the dataset
    Using different file formats
    Filtering and projecting columns with Python
    Streaming results
    Working with partitioned datasets
    Summary
  Chapter 8: Exploring Apache Arrow Flight RPC
    Technical requirements
    The basics and complications of gRPC
    Building modern APIs for data
    Efficiency and streaming are important
    Arrow Flight's building blocks
    Horizontal scalability with Arrow Flight
    Adding your business logic to Flight
    Other bells and whistles
    Understanding the Flight Protocol Buffer definitions
    Using Flight, choose your language!
    Building a Python Flight Server
    Building a Go Flight server
    What is Flight SQL?
    Setting up a performance test
    Running the performance test
    Flight SQL, the new kid on the block
    Summary

Section 3: Real-World Examples, Use Cases, and Future Development
  Chapter 9: Powered by Apache Arrow
    Swimming in data with Dremio Sonar
    Clarifying Dremio Sonar's architecture
    The library of the Gods... of data analysis
    Spicing up your ML workflows
    Bringing the AI engine to where the data lives
    Arrow in the browser using JavaScript
    Gaining a little perspective
    Taking flight with Falcon
    Summary
  Chapter 10: How to Leave Your Mark on Arrow
    Technical requirements
    Contributing to open source projects
    Communication is key
    You don't necessarily have to contribute code
    There are a lot of reasons why you should contribute!
    Preparing your first pull request
    Navigating JIRA
    Setting up Git
    Orienting yourself in the code base
    Building the Arrow libraries
    Creating the PR
    Understanding the CI configuration
    Development using Archery
    Find your interest and expand on it
    Getting that sweet, sweet approval
    Finishing up with style!
    C++ styling
    Python code styling
    Go code styling
    Summary
  Chapter 11: Future Development and Plans
    Examining Flight SQL (redux)
    Why Flight SQL?
    Defining the Flight SQL protocol
    Firing a Ballista using Data(Fusion)
    What about Spark?
    Looking at Ballista's development roadmap
    Building a cross-language compute serialization
    Why Substrait?
    Working with Substrait serialization
    Getting involved with Substrait development
    Final words

Why subscribe?
Other Books You May Enjoy
  Packt is searching for authors like you
  Share Your Thoughts
