Description

As part of Sqoop, Hive, and Impala for Data Analysts (formerly CCA 159), you will learn key skills such as Sqoop, Hive, and Impala.

This comprehensive course covers all aspects of the certification with real-world examples and data sets.

Overview of the Big Data ecosystem

  • Overview of Distributions and Management Tools

  • Properties and Properties Files – General Guidelines

  • Hadoop Distributed File System

  • YARN and MapReduce 2

  • Submitting a MapReduce Job

  • Determining the Number of Mappers and Reducers

  • Understanding YARN and MapReduce Configuration Properties

  • Review and Override Job Properties

  • Reviewing MapReduce Job Logs

  • MapReduce Job Counters

  • Overview of Hive

  • Databases and Query Engines

  • Overview of Data Ingestion in Big Data

  • Data Processing using Spark
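
To give a flavor of the MapReduce items above, here is a minimal sketch of submitting the word count example that ships with Hadoop, overriding a job property, and pulling the logs afterwards. The jar path, HDFS paths, and application id are placeholders for whatever your distribution produces.

    # Submit the bundled word count job, overriding the number of reducers
    yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
      wordcount -D mapreduce.job.reduces=2 \
      /user/$USER/input /user/$USER/wordcount_output

    # Review the application logs once the job finishes
    yarn logs -applicationId application_1234567890123_0001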

HDFS Commands to manage files

  • Introduction to HDFS for Certification Exams

  • Overview of HDFS and Properties Files

  • Overview of Hadoop CLI

  • Listing Files in HDFS

  • User Spaces or Home Directories in HDFS

  • Creating Directories in HDFS

  • Copying Files and Directories into HDFS

  • File and Directory Permissions Overview

  • Getting Files and Directories from HDFS

  • Previewing Text Files in HDFS

  • Copying or Moving Files and Directories within HDFS

  • Understanding Size of File System and Files

  • Overview of Block Size and Replication Factor

  • Getting File Metadata using hdfs fsck

  • Resources and Exercises
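
These operations map to a handful of hdfs dfs invocations; a quick sketch follows, with the retail_db paths standing in for the course data sets.

    # List files under your home directory (/user/<username>)
    hdfs dfs -ls /user/$USER

    # Create a directory and copy a local file into HDFS
    hdfs dfs -mkdir -p /user/$USER/retail_db/orders
    hdfs dfs -put /data/retail_db/orders/part-00000 /user/$USER/retail_db/orders

    # Preview a text file and check sizes
    hdfs dfs -cat /user/$USER/retail_db/orders/part-00000 | head
    hdfs dfs -du -s -h /user/$USER/retail_db

    # Get block size, replication factor, and block locations for a file
    hdfs fsck /user/$USER/retail_db/orders/part-00000 -files -blocks -locations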

Getting Started with Hive

  • Overview of Hive Language Manual

  • Launching and using Hive CLI

  • Overview of Hive Properties

  • Hive CLI History and hiverc

  • Running HDFS Commands in Hive CLI

  • Understanding the Warehouse Directory

  • Creating and Using Hive Databases

  • Creating and Describing Hive Tables

  • Retrieve Metadata of Tables using DESCRIBE

  • Role of Hive Metastore Database

  • Overview of beeline

  • Running Hive Commands and Queries using beeline
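
Connecting with beeline and running a few of the commands above might look like this; host, port, and database names are placeholders for your cluster.

    # Connect to HiveServer2 and run a query non-interactively
    beeline -u jdbc:hive2://localhost:10000/default -n $USER -e 'SHOW DATABASES;'

    -- Inside a beeline session: create and inspect objects
    CREATE DATABASE IF NOT EXISTS retail_db;
    USE retail_db;
    SHOW TABLES;
    DESCRIBE FORMATTED orders;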

Creating Tables in Hive using Hive QL

  • Creating Tables in Hive – orders

  • Overview of Basic Data Types in Hive

  • Adding Comments to Columns and Tables

  • Loading Data into Hive Tables from Local File System

  • Loading Data into Hive Tables from HDFS

  • Loading Data – Overwrite vs. Append

  • Creating External Tables in Hive

  • Specifying Location for Hive Tables

  • Difference between Managed Tables and External Tables

  • Default Delimiters in Hive Tables using Text File Format

  • Overview of File Formats in Hive

  • Differences between Hive and RDBMS

  • Truncate and Drop Tables in Hive

  • Resources and Exercises
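
A minimal DDL sketch for this section, assuming the comma-delimited retail_db orders data; paths and column definitions are illustrative.

    -- Managed table with column comments, text format, and an explicit delimiter
    CREATE TABLE IF NOT EXISTS orders (
      order_id          INT    COMMENT 'unique order id',
      order_date        STRING COMMENT 'date of the order',
      order_customer_id INT,
      order_status      STRING
    ) COMMENT 'orders from the retail_db data set'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- Load from the local file system; OVERWRITE replaces data, omit it to append
    LOAD DATA LOCAL INPATH '/data/retail_db/orders' OVERWRITE INTO TABLE orders;

    -- External table: dropping it removes only the metadata, not the data files
    CREATE EXTERNAL TABLE orders_ext LIKE orders
    LOCATION '/user/training/external/orders';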

Loading/Inserting Data into Hive Tables using Hive QL

  • Introduction to Partitioning and Bucketing

  • Creating Tables using ORC Format – order_items

  • Inserting Data into Tables using Stage Tables

  • Load vs. Insert in Hive

  • Creating Partitioned Tables in Hive

  • Adding Partitions to Tables in Hive

  • Loading into Partitions in Hive Tables

  • Inserting Data into Partitions in Hive Tables

  • Insert Using Dynamic Partition Mode

  • Creating Bucketed Tables in Hive

  • Inserting Data into Bucketed Tables

  • Bucketing with Sorting

  • Overview of ACID Transactions

  • Create Tables for Transactions

  • Inserting Individual Records into Hive Tables

  • Update and Delete Data in Hive Tables
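
Partitioning, dynamic-partition inserts, and bucketing, sketched against the orders table created earlier; the SET properties are standard Hive settings.

    -- Partitioned, ORC-backed table: each order_month becomes an HDFS subdirectory
    CREATE TABLE orders_part (
      order_id          INT,
      order_date        STRING,
      order_customer_id INT,
      order_status      STRING
    ) PARTITIONED BY (order_month STRING)
    STORED AS ORC;

    -- Enable dynamic partition mode and populate all partitions in one insert
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    INSERT OVERWRITE TABLE orders_part PARTITION (order_month)
    SELECT o.*, substr(o.order_date, 1, 7) AS order_month
    FROM orders o;

    -- Bucketed table: rows are hashed on order_id into 8 files
    CREATE TABLE orders_bucket (
      order_id INT, order_date STRING, order_customer_id INT, order_status STRING
    ) CLUSTERED BY (order_id) INTO 8 BUCKETS;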

Overview of Functions in Hive

  • Overview of Functions

  • Validating Functions

  • String Manipulation – Case Conversion and Length

  • String Manipulation – substr and split

  • String Manipulation – Trimming and Padding Functions

  • String Manipulation – Reverse and Concatenating Multiple Strings

  • Date Manipulation – Current Date and Timestamp

  • Date Manipulation – Date Arithmetic

  • Date Manipulation – trunc

  • Date Manipulation – Using date_format

  • Date Manipulation – Extract Functions

  • Date Manipulation – Dealing with Unix Timestamp

  • Overview of Numeric Functions

  • Data Type Conversion Using Cast

  • Dealing with Null Values

  • Query Example – Get Word Count
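
The word count example is a classic combination of split and explode; a sketch assuming a table named lines with a single STRING column named line.

    -- Split each line into words, explode the arrays into rows, then count
    SELECT word, count(1) AS word_count
    FROM (
      SELECT explode(split(line, ' ')) AS word
      FROM lines
    ) words
    GROUP BY word
    ORDER BY word_count DESC;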

Writing Basic Queries in Hive

  • Overview of SQL or Hive QL

  • Execution Life Cycle of a Hive Query

  • Reviewing Logs of Hive Queries

  • Projecting Data using SELECT and Overview of FROM

  • Derive Conditional Values using CASE and WHEN

  • Projecting Distinct Values

  • Filtering Data using the WHERE Clause

  • Boolean Operations in the WHERE Clause

  • Boolean OR vs. the IN Operator

  • Filtering Data using the LIKE Operator

  • Performing Basic Aggregations using Aggregate Functions

  • Performing Aggregations using GROUP BY

  • Filtering Aggregated Data Using HAVING

  • Global Sorting using ORDER BY

  • Overview of DISTRIBUTE BY

  • Sorting Data within Groups using SORT BY

  • Using CLUSTER BY
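
Most of these clauses compose into a single statement; a sketch against the orders table from earlier (the dates are illustrative).

    -- Conditional values with CASE
    SELECT order_id,
           CASE WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'DONE'
                ELSE 'PENDING'
           END AS order_flag
    FROM orders
    LIMIT 10;

    -- Filter, aggregate, filter the aggregates, then sort globally
    SELECT order_status, count(1) AS order_count
    FROM orders
    WHERE order_date LIKE '2013-08%'
    GROUP BY order_status
    HAVING count(1) >= 100
    ORDER BY order_count DESC;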

Joining Data Sets and Set Operations in Hive

  • Overview of Nested Subqueries

  • Nested Subqueries – Using the IN Operator

  • Nested Subqueries – Using the EXISTS Operator

  • Overview of Joins in Hive

  • Performing Inner Joins using Hive

  • Performing Outer Joins using Hive

  • Performing Full Outer Joins using Hive

  • Map Side Join and Reduce Side Join in Hive

  • Joining in Hive using Legacy Syntax

  • Cross Joins in Hive

  • Overview of Set Operations in Hive

  • Perform Set Union between Two Hive Query Results

  • Set Operations – Intersect and Minus Not Supported
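
Joins and unions in miniature; the order_items column names assume the retail_db data set.

    -- Inner join: orders with their items
    SELECT o.order_id, o.order_date, oi.order_item_subtotal
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_item_order_id;

    -- Left outer join to find orders with no items
    SELECT o.order_id
    FROM orders o
    LEFT OUTER JOIN order_items oi ON o.order_id = oi.order_item_order_id
    WHERE oi.order_item_id IS NULL;

    -- UNION ALL is supported; INTERSECT and MINUS have to be rewritten, e.g. as joins
    SELECT order_customer_id FROM orders WHERE order_date LIKE '2013-07%'
    UNION ALL
    SELECT order_customer_id FROM orders WHERE order_date LIKE '2013-08%';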

Windowing or Analytics Functions in Hive

  • Prepare HR Database in Hive with Employees Table

  • Overview of Analytics or Windowing Functions in Hive

  • Performing Aggregations using Hive Queries

  • Create Tables to Get Daily Revenue using CTAS in Hive

  • Getting Lead and Lag using Windowing Functions in Hive

  • Getting First and Last Values using Windowing Functions in Hive

  • Applying Rank using Windowing Functions in Hive

  • Applying Dense Rank using Windowing Functions in Hive

  • Applying Row Number using Windowing Functions in Hive

  • Difference between rank, dense_rank, and row_number in Hive

  • Understanding the Order of Execution of Hive Queries

  • Overview of Nested Subqueries in Hive

  • Filtering Data on Top of Window Functions in Hive

  • Getting Top 5 Products by Revenue for Each Day using Windowing Functions in Hive – Recap
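
The recap item typically looks like this: aggregate revenue per product per day, rank within each day, then filter; column names assume the retail_db data set.

    SELECT order_date, order_item_product_id, revenue
    FROM (
      SELECT t.order_date, t.order_item_product_id, t.revenue,
             rank() OVER (PARTITION BY t.order_date
                          ORDER BY t.revenue DESC) AS rnk
      FROM (
        SELECT o.order_date, oi.order_item_product_id,
               round(sum(oi.order_item_subtotal), 2) AS revenue
        FROM orders o
        JOIN order_items oi ON o.order_id = oi.order_item_order_id
        WHERE o.order_status IN ('COMPLETE', 'CLOSED')
        GROUP BY o.order_date, oi.order_item_product_id
      ) t
    ) q
    WHERE rnk <= 5;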

Running Queries using Impala

  • Introduction to Impala

  • Role of Impala Daemons

  • Impala StateStore and Catalog Server

  • Overview of Impala Shell

  • Relationship between Hive and Impala

  • Overview of Creating Databases and Tables using Impala

  • Loading and Inserting Data into Tables using Impala

  • Running Queries using Impala Shell

  • Reviewing Logs of Impala Queries

  • Syncing Hive and Impala – Using INVALIDATE METADATA

  • Running Scripts using Impala Shell

  • Assignment – Using NYSE Data

  • Assignment – Solution
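
A sketch of impala-shell usage for the items above; the impalad host and script name are placeholders.

    # Run a single query against an Impala daemon
    impala-shell -i impalad-host:21000 -q 'SELECT count(1) FROM retail_db.orders'

    # After creating or loading tables through Hive, refresh Impala's metadata
    impala-shell -i impalad-host:21000 -q 'INVALIDATE METADATA retail_db.orders'

    # Run a script file
    impala-shell -i impalad-host:21000 -f daily_revenue.sql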

Getting Started with Sqoop

  • Introduction to Sqoop

  • Validate Source Database – MySQL

  • Review JDBC Jar to Connect to MySQL

  • Getting Help using Sqoop CLI

  • Overview of the Sqoop User Guide

  • Validate Sqoop and MySQL Integration using Sqoop list-databases

  • Listing Tables in a Database using Sqoop

  • Run Queries in MySQL using Sqoop eval

  • Understanding Logs in Sqoop

  • Redirecting Sqoop Job Logs into Log Files
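
These validation steps map to a few commands; host names and credentials below are placeholders (-P prompts for the password).

    # Confirm connectivity by listing databases, then tables
    sqoop list-databases \
      --connect jdbc:mysql://mysql-host:3306 \
      --username retail_user -P

    sqoop list-tables \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P

    # Run a query on the source database without importing anything
    sqoop eval \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --query 'SELECT count(1) FROM orders'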

Importing Data from MySQL to HDFS using Sqoop Import

  • Overview of Sqoop Import Command

  • Import Orders using target-dir

  • Import Order Items using warehouse-dir

  • Managing HDFS Directories

  • Sqoop Import Execution Flow

  • Reviewing Logs of Sqoop Import

  • Sqoop Import – Specifying Number of Mappers

  • Review the Output Files Generated by Sqoop Import

  • Sqoop Import – Supported File Formats

  • Validating Avro Files using Avro Tools

  • Sqoop Import Using Compression
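
An import sketch showing target-dir vs. warehouse-dir, the mapper count, an alternative file format, and compression; connection details are placeholders.

    # Explicit target directory, two mappers, plain text output
    sqoop import \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --table orders \
      --target-dir /user/$USER/sqoop_import/orders \
      --num-mappers 2

    # warehouse-dir creates a subdirectory per table; write compressed Avro
    sqoop import \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --table order_items \
      --warehouse-dir /user/$USER/sqoop_import/retail_db \
      --as-avrodatafile \
      --compress \
      --compression-codec org.apache.hadoop.io.compress.SnappyCodec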

Apache Sqoop – Importing Data into HDFS – Customizing

  • Introduction to Customizing Sqoop Import

  • Sqoop Import by Specifying Columns

  • Sqoop Import Using Boundary Query

  • Sqoop Import while Filtering Unnecessary Data

  • Sqoop Import Using split-by to Distribute the Import on a Non-Default Column

  • Getting Query Results using Sqoop eval

  • Dealing with Tables with Composite Keys while using Sqoop Import

  • Dealing with Tables with Non-Numeric Key Fields while using Sqoop Import

  • Dealing with Tables with No Key Fields while using Sqoop Import

  • Using autoreset-to-one-mapper to Use Only One Mapper while Importing Data from Tables with No Key Fields

  • Default Delimiters Used by Sqoop Import for Text File Format

  • Specifying Delimiters for Sqoop Import using Text File Format

  • Dealing with Null Values using Sqoop Import

  • Import Multiple Tables from the Source Database using Sqoop Import
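
A sketch of these customizations; the table without a key field (order_items_nopk) is hypothetical.

    # Specific columns, a row filter, and a custom split column
    sqoop import \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --table orders \
      --columns order_id,order_date,order_status \
      --where "order_status = 'COMPLETE'" \
      --split-by order_id \
      --target-dir /user/$USER/sqoop_import/orders_complete

    # Custom delimiters and null handling; fall back to a single mapper
    # when the table has no primary key
    sqoop import \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --table order_items_nopk \
      --autoreset-to-one-mapper \
      --fields-terminated-by '|' \
      --null-string '\\N' \
      --null-non-string '-1' \
      --warehouse-dir /user/$USER/sqoop_import/retail_db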

Importing Data from MySQL to Hive Tables using Sqoop Import

  • Quick Overview of Hive

  • Create Hive Database for Sqoop Import

  • Create Empty Hive Table for Sqoop Import

  • Import Data into Hive Table from Source Database Table using Sqoop Import

  • Managing Hive Tables while Importing Data using Sqoop Import with Overwrite

  • Managing Hive Tables while Importing Data using Sqoop Import – Errors Out If Table Already Exists

  • Understanding Execution Flow of Sqoop Import into Hive Tables

  • Review Files Generated by Sqoop Import in Hive Tables

  • Sqoop Delimiters vs. Hive Delimiters

  • Different File Formats Supported by Sqoop Import while Importing into Hive Tables

  • Sqoop Import All Tables into Hive from the Source Database
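
Hive import in miniature; database and table names are illustrative.

    # Import one table straight into a Hive database, replacing existing data
    sqoop import \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --table orders \
      --hive-import \
      --hive-database retail_db_hive \
      --hive-table orders \
      --hive-overwrite \
      --num-mappers 2

    # Or bring over every table in the source database
    sqoop import-all-tables \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --hive-import \
      --hive-database retail_db_hive \
      --autoreset-to-one-mapper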

Exporting Data from HDFS/Hive to MySQL using Sqoop Export

  • Introduction to Sqoop Export

  • Prepare Data for Sqoop Export

  • Create Table in MySQL for Sqoop Export

  • Perform Simple Sqoop Export from HDFS to a MySQL Table

  • Understanding Execution Flow of Sqoop Export

  • Specifying Number of Mappers for Sqoop Export

  • Troubleshooting Issues Related to Sqoop Export

  • Merging or Upserting Data using Sqoop Export – Overview

  • Quick Overview of MySQL – Upsert using Sqoop Export

  • Update Data using Update Key with Sqoop Export

  • Merging Data using allowinsert in Sqoop Export

  • Specifying Columns using Sqoop Export

  • Specifying Delimiters using Sqoop Export

  • Using Stage Table for Sqoop Export
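
An export sketch; the MySQL table must exist beforehand, and \001 is Hive's default text delimiter. Database, table, and path names are placeholders.

    # Plain export, staged through a temporary table so a failed run
    # leaves the target table untouched
    sqoop export \
      --connect jdbc:mysql://mysql-host:3306/retail_export \
      --username retail_user -P \
      --table daily_revenue \
      --export-dir /user/hive/warehouse/retail_db_hive.db/daily_revenue \
      --input-fields-terminated-by '\001' \
      --staging-table daily_revenue_stage \
      --clear-staging-table

    # Upsert: update rows matching the key, insert the rest
    # (staging tables cannot be combined with --update-key)
    sqoop export \
      --connect jdbc:mysql://mysql-host:3306/retail_export \
      --username retail_user -P \
      --table daily_revenue \
      --export-dir /user/hive/warehouse/retail_db_hive.db/daily_revenue \
      --update-key order_date \
      --update-mode allowinsert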

Submitting Sqoop Jobs and Incremental Sqoop Imports

  • Introduction to Sqoop Jobs

  • Adding Password File for Sqoop Jobs

  • Creating a Sqoop Job

  • Run Sqoop Job

  • Overview of Incremental Loads using Sqoop

  • Incremental Sqoop Import – Using Where

  • Incremental Sqoop Import – Using Append Mode

  • Incremental Sqoop Import – Create Table

  • Incremental Sqoop Import – Create Sqoop Job

  • Incremental Sqoop Import – Execute Job

  • Incremental Sqoop Import – Add Additional Data

  • Incremental Sqoop Import – Rerun Job

  • Incremental Sqoop Import – Using Last Modified
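
A Sqoop job wrapping an incremental append import, with the password read from a restricted file; all paths and names are placeholders.

    # Create the job; note the space after the bare "--"
    sqoop job --create daily_orders_import \
      -- import \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user \
      --password-file /user/$USER/.sqoop/mysql.password \
      --table orders \
      --warehouse-dir /user/$USER/sqoop_incremental \
      --incremental append \
      --check-column order_id \
      --last-value 0

    # Each run imports only rows beyond the stored last-value, then updates it;
    # use "--incremental lastmodified" with a timestamp column to pick up updates
    sqoop job --exec daily_orders_import
    sqoop job --show daily_orders_import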

Here are the objectives for this course.

Provide Structure to the Data

Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.

  • Create tables using a variety of data types, delimiters, and file formats

  • Create new tables using existing tables to define the schema

  • Improve query performance by creating partitioned tables in the metastore

  • Alter tables to modify the existing schema

  • Create views in order to simplify queries

Data Analysis

Use Query Language (QL) statements in Hive and Impala to analyze data on the cluster.

  • Prepare reports using SELECT commands, including unions and subqueries

  • Calculate aggregate statistics, such as sums and averages, during a query

  • Create queries against multiple data sources by using join commands

  • Transform the output format of queries by using built-in functions

  • Perform queries across a group of rows using windowing functions

Exercises will be provided so that you get enough practice to improve at Sqoop as well as at writing queries using Hive and Impala.

All the demos are given on our state-of-the-art Big Data cluster. If you do not have a multi-node cluster, you can enroll in our labs and practice on our multi-node cluster. You will be able to practice Sqoop and Hive on the cluster.

If the coupon is not opening, disable Adblock or try another browser.
