Description

As part of Sqoop, Hive, and Impala for Data Analysts (formerly CCA 159), you will learn key skills such as Sqoop, Hive, and Impala.

This comprehensive course covers all aspects of the certification with real-world examples and data sets.

Overview of the Big Data ecosystem

  • Overview of Distributions and Management Tools

  • Properties and Properties Files – General Guidelines

  • Hadoop Distributed File System

  • YARN and MapReduce 2

  • Submitting a MapReduce Job

  • Determining the Number of Mappers and Reducers

  • Understanding YARN and MapReduce Configuration Properties

  • Review and Override Job Properties

  • Reviewing MapReduce Job Logs

  • MapReduce Job Counters

  • Overview of Hive

  • Databases and Query Engines

  • Overview of Data Ingestion in Big Data

  • Data Processing using Spark
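
To give a flavor of the MapReduce items above, here is a minimal sketch of submitting the word count example that ships with Hadoop, overriding a job property, and pulling the logs afterwards. The jar path, HDFS paths, and application id are placeholders for whatever your distribution produces.

    # Submit the bundled word count job, overriding the number of reducers
    yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
      wordcount -D mapreduce.job.reduces=2 \
      /user/$USER/input /user/$USER/wordcount_output

    # Review the application logs once the job finishes
    yarn logs -applicationId application_1234567890123_0001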

HDFS Commands to manage files

  • Introduction to HDFS for Certification Exams

  • Overview of HDFS and Properties Files

  • Overview of Hadoop CLI

  • Listing Files in HDFS

  • User Spaces or Home Directories in HDFS

  • Creating Directories in HDFS

  • Copying Files and Directories into HDFS

  • File and Directory Permissions Overview

  • Getting Files and Directories from HDFS

  • Previewing Text Files in HDFS

  • Copying or Moving Files and Directories within HDFS

  • Understanding Size of File System and Files

  • Overview of Block Size and Replication Factor

  • Getting File Metadata using hdfs fsck

  • Resources and Exercises
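
These operations map to a handful of hdfs dfs invocations; a quick sketch follows, with the retail_db paths standing in for the course data sets.

    # List files under your home directory (/user/<username>)
    hdfs dfs -ls /user/$USER

    # Create a directory and copy a local file into HDFS
    hdfs dfs -mkdir -p /user/$USER/retail_db/orders
    hdfs dfs -put /data/retail_db/orders/part-00000 /user/$USER/retail_db/orders

    # Preview a text file and check sizes
    hdfs dfs -cat /user/$USER/retail_db/orders/part-00000 | head
    hdfs dfs -du -s -h /user/$USER/retail_db

    # Get block size, replication factor, and block locations for a file
    hdfs fsck /user/$USER/retail_db/orders/part-00000 -files -blocks -locations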

Getting Started with Hive

  • Overview of Hive Language Manual

  • Launching and using Hive CLI

  • Overview of Hive Properties

  • Hive CLI History and hiverc

  • Running HDFS Commands in Hive CLI

  • Understanding the Warehouse Directory

  • Creating and Using Hive Databases

  • Creating and Describing Hive Tables

  • Retrieve Metadata of Tables using DESCRIBE

  • Role of Hive Metastore Database

  • Overview of beeline

  • Running Hive Commands and Queries using beeline
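
Connecting with beeline and running a few of the commands above might look like this; host, port, and database names are placeholders for your cluster.

    # Connect to HiveServer2 and run a query non-interactively
    beeline -u jdbc:hive2://localhost:10000/default -n $USER -e 'SHOW DATABASES;'

    -- Inside a beeline session: create and inspect objects
    CREATE DATABASE IF NOT EXISTS retail_db;
    USE retail_db;
    SHOW TABLES;
    DESCRIBE FORMATTED orders;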

Creating Tables in Hive using Hive QL

  • Creating Tables in Hive – orders

  • Overview of Basic Data Types in Hive

  • Adding Comments to Columns and Tables

  • Loading Data into Hive Tables from Local File System

  • Loading Data into Hive Tables from HDFS

  • Loading Data – Overwrite vs. Append

  • Creating External Tables in Hive

  • Specifying Location for Hive Tables

  • Difference between Managed Tables and External Tables

  • Default Delimiters in Hive Tables using Text File Format

  • Overview of File Formats in Hive

  • Differences between Hive and RDBMS

  • Truncate and Drop Tables in Hive

  • Resources and Exercises
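
A minimal DDL sketch for this section, assuming the comma-delimited retail_db orders data; paths and column definitions are illustrative.

    -- Managed table with column comments, text format, and an explicit delimiter
    CREATE TABLE IF NOT EXISTS orders (
      order_id          INT    COMMENT 'unique order id',
      order_date        STRING COMMENT 'date of the order',
      order_customer_id INT,
      order_status      STRING
    ) COMMENT 'orders from the retail_db data set'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- Load from the local file system; OVERWRITE replaces data, omit it to append
    LOAD DATA LOCAL INPATH '/data/retail_db/orders' OVERWRITE INTO TABLE orders;

    -- External table: dropping it removes only the metadata, not the data files
    CREATE EXTERNAL TABLE orders_ext LIKE orders
    LOCATION '/user/training/external/orders';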

Loading/Inserting Data into Hive Tables using Hive QL

  • Introduction to Partitioning and Bucketing

  • Creating Tables using ORC Format – order_items

  • Inserting Data into Tables using Stage Tables

  • Load vs. Insert in Hive

  • Creating Partitioned Tables in Hive

  • Adding Partitions to Tables in Hive

  • Loading into Partitions in Hive Tables

  • Inserting Data into Partitions in Hive Tables

  • Insert Using Dynamic Partition Mode

  • Creating Bucketed Tables in Hive

  • Inserting Data into Bucketed Tables

  • Bucketing with Sorting

  • Overview of ACID Transactions

  • Create Tables for Transactions

  • Inserting Individual Records into Hive Tables

  • Update and Delete Data in Hive Tables
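
Partitioning, dynamic-partition inserts, and bucketing, sketched against the orders table created earlier; the SET properties are standard Hive settings.

    -- Partitioned, ORC-backed table: each order_month becomes an HDFS subdirectory
    CREATE TABLE orders_part (
      order_id          INT,
      order_date        STRING,
      order_customer_id INT,
      order_status      STRING
    ) PARTITIONED BY (order_month STRING)
    STORED AS ORC;

    -- Enable dynamic partition mode and populate all partitions in one insert
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    INSERT OVERWRITE TABLE orders_part PARTITION (order_month)
    SELECT o.*, substr(o.order_date, 1, 7) AS order_month
    FROM orders o;

    -- Bucketed table: rows are hashed on order_id into 8 files
    CREATE TABLE orders_bucket (
      order_id INT, order_date STRING, order_customer_id INT, order_status STRING
    ) CLUSTERED BY (order_id) INTO 8 BUCKETS;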

Overview of Functions in Hive

  • Overview of Functions

  • Validating Functions

  • String Manipulation – Case Conversion and Length

  • String Manipulation – substr and split

  • String Manipulation – Trimming and Padding Functions

  • String Manipulation – Reverse and Concatenating Multiple Strings

  • Date Manipulation – Current Date and Timestamp

  • Date Manipulation – Date Arithmetic

  • Date Manipulation – trunc

  • Date Manipulation – Using date_format

  • Date Manipulation – Extract Functions

  • Date Manipulation – Dealing with Unix Timestamp

  • Overview of Numeric Functions

  • Data Type Conversion Using Cast

  • Dealing with Null Values

  • Query Example – Get Word Count
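
The word count example is a classic combination of split and explode; a sketch assuming a table named lines with a single STRING column named line.

    -- Split each line into words, explode the arrays into rows, then count
    SELECT word, count(1) AS word_count
    FROM (
      SELECT explode(split(line, ' ')) AS word
      FROM lines
    ) words
    GROUP BY word
    ORDER BY word_count DESC;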

Writing Basic Queries in Hive

  • Overview of SQL or Hive QL

  • Execution Life Cycle of a Hive Query

  • Reviewing Logs of Hive Queries

  • Projecting Data using SELECT and Overview of FROM

  • Derive Conditional Values using CASE and WHEN

  • Projecting Distinct Values

  • Filtering Data using the WHERE Clause

  • Boolean Operations in the WHERE Clause

  • Boolean OR vs. the IN Operator

  • Filtering Data using the LIKE Operator

  • Performing Basic Aggregations using Aggregate Functions

  • Performing Aggregations using GROUP BY

  • Filtering Aggregated Data Using HAVING

  • Global Sorting using ORDER BY

  • Overview of DISTRIBUTE BY

  • Sorting Data within Groups using SORT BY

  • Using CLUSTER BY
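
Most of these clauses compose into a single statement; a sketch against the orders table from earlier (the dates are illustrative).

    -- Conditional values with CASE
    SELECT order_id,
           CASE WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'DONE'
                ELSE 'PENDING'
           END AS order_flag
    FROM orders
    LIMIT 10;

    -- Filter, aggregate, filter the aggregates, then sort globally
    SELECT order_status, count(1) AS order_count
    FROM orders
    WHERE order_date LIKE '2013-08%'
    GROUP BY order_status
    HAVING count(1) >= 100
    ORDER BY order_count DESC;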

Joining Data Sets and Set Operations in Hive

  • Overview of Nested Subqueries

  • Nested Subqueries – Using the IN Operator

  • Nested Subqueries – Using the EXISTS Operator

  • Overview of Joins in Hive

  • Performing Inner Joins using Hive

  • Performing Outer Joins using Hive

  • Performing Full Outer Joins using Hive

  • Map Side Join and Reduce Side Join in Hive

  • Joining in Hive using Legacy Syntax

  • Cross Joins in Hive

  • Overview of Set Operations in Hive

  • Perform Set Union between Two Hive Query Results

  • Set Operations – Intersect and Minus Not Supported
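
Joins and unions in miniature; the order_items column names assume the retail_db data set.

    -- Inner join: orders with their items
    SELECT o.order_id, o.order_date, oi.order_item_subtotal
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_item_order_id;

    -- Left outer join to find orders with no items
    SELECT o.order_id
    FROM orders o
    LEFT OUTER JOIN order_items oi ON o.order_id = oi.order_item_order_id
    WHERE oi.order_item_id IS NULL;

    -- UNION ALL is supported; INTERSECT and MINUS have to be rewritten, e.g. as joins
    SELECT order_customer_id FROM orders WHERE order_date LIKE '2013-07%'
    UNION ALL
    SELECT order_customer_id FROM orders WHERE order_date LIKE '2013-08%';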

Windowing or Analytics Functions in Hive

  • Prepare HR Database in Hive with Employees Table

  • Overview of Analytics or Windowing Functions in Hive

  • Performing Aggregations using Hive Queries

  • Create Tables to Get Daily Revenue using CTAS in Hive

  • Getting Lead and Lag using Windowing Functions in Hive

  • Getting First and Last Values using Windowing Functions in Hive

  • Applying Rank using Windowing Functions in Hive

  • Applying Dense Rank using Windowing Functions in Hive

  • Applying Row Number using Windowing Functions in Hive

  • Difference between rank, dense_rank, and row_number in Hive

  • Understanding the Order of Execution of Hive Queries

  • Overview of Nested Subqueries in Hive

  • Filtering Data on Top of Window Functions in Hive

  • Getting Top 5 Products by Revenue for Each Day using Windowing Functions in Hive – Recap
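
The recap item typically looks like this: aggregate revenue per product per day, rank within each day, then filter; column names assume the retail_db data set.

    SELECT order_date, order_item_product_id, revenue
    FROM (
      SELECT t.order_date, t.order_item_product_id, t.revenue,
             rank() OVER (PARTITION BY t.order_date
                          ORDER BY t.revenue DESC) AS rnk
      FROM (
        SELECT o.order_date, oi.order_item_product_id,
               round(sum(oi.order_item_subtotal), 2) AS revenue
        FROM orders o
        JOIN order_items oi ON o.order_id = oi.order_item_order_id
        WHERE o.order_status IN ('COMPLETE', 'CLOSED')
        GROUP BY o.order_date, oi.order_item_product_id
      ) t
    ) q
    WHERE rnk <= 5;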

Running Queries using Impala

  • Introduction to Impala

  • Role of Impala Daemons

  • Impala StateStore and Catalog Server

  • Overview of Impala Shell

  • Relationship between Hive and Impala

  • Overview of Creating Databases and Tables using Impala

  • Loading and Inserting Data into Tables using Impala

  • Running Queries using Impala Shell

  • Reviewing Logs of Impala Queries

  • Syncing Hive and Impala – Using INVALIDATE METADATA

  • Running Scripts using Impala Shell

  • Assignment – Using NYSE Data

  • Assignment – Solution
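
A sketch of impala-shell usage for the items above; the impalad host and script name are placeholders.

    # Run a single query against an Impala daemon
    impala-shell -i impalad-host:21000 -q 'SELECT count(1) FROM retail_db.orders'

    # After creating or loading tables through Hive, refresh Impala's metadata
    impala-shell -i impalad-host:21000 -q 'INVALIDATE METADATA retail_db.orders'

    # Run a script file
    impala-shell -i impalad-host:21000 -f daily_revenue.sql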

Getting Started with Sqoop

  • Introduction to Sqoop

  • Validate Source Database – MySQL

  • Review JDBC Jar to Connect to MySQL

  • Getting Help using Sqoop CLI

  • Overview of the Sqoop User Guide

  • Validate Sqoop and MySQL Integration using Sqoop list-databases

  • Listing Tables in a Database using Sqoop

  • Run Queries in MySQL using Sqoop eval

  • Understanding Logs in Sqoop

  • Redirecting Sqoop Job Logs into Log Files
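
These validation steps map to a few commands; host names and credentials below are placeholders (-P prompts for the password).

    # Confirm connectivity by listing databases, then tables
    sqoop list-databases \
      --connect jdbc:mysql://mysql-host:3306 \
      --username retail_user -P

    sqoop list-tables \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P

    # Run a query on the source database without importing anything
    sqoop eval \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --query 'SELECT count(1) FROM orders'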

Importing Data from MySQL to HDFS using Sqoop Import

  • Overview of Sqoop Import Command

  • Import Orders using target-dir

  • Import Order Items using warehouse-dir

  • Managing HDFS Directories

  • Sqoop Import Execution Flow

  • Reviewing Logs of Sqoop Import

  • Sqoop Import – Specifying Number of Mappers

  • Review the Output Files Generated by Sqoop Import

  • Sqoop Import – Supported File Formats

  • Validating Avro Files using Avro Tools

  • Sqoop Import Using Compression
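
An import sketch showing target-dir vs. warehouse-dir, the mapper count, an alternative file format, and compression; connection details are placeholders.

    # Explicit target directory, two mappers, plain text output
    sqoop import \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --table orders \
      --target-dir /user/$USER/sqoop_import/orders \
      --num-mappers 2

    # warehouse-dir creates a subdirectory per table; write compressed Avro
    sqoop import \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --table order_items \
      --warehouse-dir /user/$USER/sqoop_import/retail_db \
      --as-avrodatafile \
      --compress \
      --compression-codec org.apache.hadoop.io.compress.SnappyCodec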

Apache Sqoop – Importing Data into HDFS – Customizing

  • Introduction to Customizing Sqoop Import

  • Sqoop Import by Specifying Columns

  • Sqoop Import Using Boundary Query

  • Sqoop Import while Filtering Unnecessary Data

  • Sqoop Import Using split-by to Distribute the Import on a Non-Default Column

  • Getting Query Results using Sqoop eval

  • Dealing with Tables with Composite Keys while using Sqoop Import

  • Dealing with Tables with Non-Numeric Key Fields while using Sqoop Import

  • Dealing with Tables with No Key Fields while using Sqoop Import

  • Using autoreset-to-one-mapper to Use Only One Mapper while Importing Data from Tables with No Key Fields

  • Default Delimiters Used by Sqoop Import for Text File Format

  • Specifying Delimiters for Sqoop Import using Text File Format

  • Dealing with Null Values using Sqoop Import

  • Import Multiple Tables from the Source Database using Sqoop Import
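
A sketch of these customizations; the table without a key field (order_items_nopk) is hypothetical.

    # Specific columns, a row filter, and a custom split column
    sqoop import \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --table orders \
      --columns order_id,order_date,order_status \
      --where "order_status = 'COMPLETE'" \
      --split-by order_id \
      --target-dir /user/$USER/sqoop_import/orders_complete

    # Custom delimiters and null handling; fall back to a single mapper
    # when the table has no primary key
    sqoop import \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --table order_items_nopk \
      --autoreset-to-one-mapper \
      --fields-terminated-by '|' \
      --null-string '\\N' \
      --null-non-string '-1' \
      --warehouse-dir /user/$USER/sqoop_import/retail_db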

Importing Data from MySQL to Hive Tables using Sqoop Import

  • Quick Overview of Hive

  • Create Hive Database for Sqoop Import

  • Create Empty Hive Table for Sqoop Import

  • Import Data into Hive Table from Source Database Table using Sqoop Import

  • Managing Hive Tables while Importing Data using Sqoop Import with Overwrite

  • Managing Hive Tables while Importing Data using Sqoop Import – Errors Out If Table Already Exists

  • Understanding Execution Flow of Sqoop Import into Hive Tables

  • Review Files Generated by Sqoop Import in Hive Tables

  • Sqoop Delimiters vs. Hive Delimiters

  • Different File Formats Supported by Sqoop Import while Importing into Hive Tables

  • Sqoop Import All Tables into Hive from the Source Database
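
Hive import in miniature; database and table names are illustrative.

    # Import one table straight into a Hive database, replacing existing data
    sqoop import \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --table orders \
      --hive-import \
      --hive-database retail_db_hive \
      --hive-table orders \
      --hive-overwrite \
      --num-mappers 2

    # Or bring over every table in the source database
    sqoop import-all-tables \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user -P \
      --hive-import \
      --hive-database retail_db_hive \
      --autoreset-to-one-mapper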

Exporting Data from HDFS/Hive to MySQL using Sqoop Export

  • Introduction to Sqoop Export

  • Prepare Data for Sqoop Export

  • Create Table in MySQL for Sqoop Export

  • Perform Simple Sqoop Export from HDFS to a MySQL Table

  • Understanding Execution Flow of Sqoop Export

  • Specifying Number of Mappers for Sqoop Export

  • Troubleshooting Issues Related to Sqoop Export

  • Merging or Upserting Data using Sqoop Export – Overview

  • Quick Overview of MySQL – Upsert using Sqoop Export

  • Update Data using Update Key with Sqoop Export

  • Merging Data using allowinsert in Sqoop Export

  • Specifying Columns using Sqoop Export

  • Specifying Delimiters using Sqoop Export

  • Using Stage Table for Sqoop Export
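
An export sketch; the MySQL table must exist beforehand, and \001 is Hive's default text delimiter. Database, table, and path names are placeholders.

    # Plain export, staged through a temporary table so a failed run
    # leaves the target table untouched
    sqoop export \
      --connect jdbc:mysql://mysql-host:3306/retail_export \
      --username retail_user -P \
      --table daily_revenue \
      --export-dir /user/hive/warehouse/retail_db_hive.db/daily_revenue \
      --input-fields-terminated-by '\001' \
      --staging-table daily_revenue_stage \
      --clear-staging-table

    # Upsert: update rows matching the key, insert the rest
    # (staging tables cannot be combined with --update-key)
    sqoop export \
      --connect jdbc:mysql://mysql-host:3306/retail_export \
      --username retail_user -P \
      --table daily_revenue \
      --export-dir /user/hive/warehouse/retail_db_hive.db/daily_revenue \
      --update-key order_date \
      --update-mode allowinsert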

Submitting Sqoop Jobs and Incremental Sqoop Imports

  • Introduction to Sqoop Jobs

  • Adding Password File for Sqoop Jobs

  • Creating a Sqoop Job

  • Run Sqoop Job

  • Overview of Incremental Loads using Sqoop

  • Incremental Sqoop Import – Using Where

  • Incremental Sqoop Import – Using Append Mode

  • Incremental Sqoop Import – Create Table

  • Incremental Sqoop Import – Create Sqoop Job

  • Incremental Sqoop Import – Execute Job

  • Incremental Sqoop Import – Add Additional Data

  • Incremental Sqoop Import – Rerun Job

  • Incremental Sqoop Import – Using Last Modified
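
A Sqoop job wrapping an incremental append import, with the password read from a restricted file; all paths and names are placeholders.

    # Create the job; note the space after the bare "--"
    sqoop job --create daily_orders_import \
      -- import \
      --connect jdbc:mysql://mysql-host:3306/retail_db \
      --username retail_user \
      --password-file /user/$USER/.sqoop/mysql.password \
      --table orders \
      --warehouse-dir /user/$USER/sqoop_incremental \
      --incremental append \
      --check-column order_id \
      --last-value 0

    # Each run imports only rows beyond the stored last-value, then updates it;
    # use "--incremental lastmodified" with a timestamp column to pick up updates
    sqoop job --exec daily_orders_import
    sqoop job --show daily_orders_import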

Here are the objectives for this course.

Provide Structure to the Data

Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.

  • Create tables using a variety of data types, delimiters, and file formats

  • Create new tables using existing tables to define the schema

  • Improve query performance by creating partitioned tables in the metastore

  • Alter tables to modify the existing schema

  • Create views in order to simplify queries

Data Analysis

Use Query Language (QL) statements in Hive and Impala to analyze data on the cluster.

  • Prepare reports using SELECT commands, including unions and subqueries

  • Calculate aggregate statistics, such as sums and averages, during a query

  • Create queries against multiple data sources by using join commands

  • Transform the output format of queries by using built-in functions

  • Perform queries across a group of rows using windowing functions

Exercises will be provided so that you get enough practice to improve at Sqoop as well as at writing queries using Hive and Impala.

All the demos are given on our state-of-the-art Big Data cluster. If you do not have a multi-node cluster, you can enroll in our labs and practice on our multi-node cluster. You will be able to practice Sqoop and Hive on the cluster.

If the coupon is not opening, disable Adblock or try another browser.
