
Sqoop, Hive and Impala for Data Analysts (Formerly CCA 159)


As part of Sqoop, Hive, and Impala for Data Analysts (formerly CCA 159), you'll learn key skills such as Sqoop, Hive, and Impala.

This comprehensive course covers all aspects of the certification with real-world examples and data sets.

Overview of Big Data ecosystem

HDFS Commands to manage files

  • Introduction to HDFS for Certification Exams

  • Overview of HDFS and Properties Files

  • Overview of Hadoop CLI

  • Listing Files in HDFS

  • User Spaces or Home Directories in HDFS

  • Creating Directories in HDFS

  • Copying Files and Directories into HDFS

  • File and Directory Permissions Overview

  • Getting Files and Directories from HDFS

  • Previewing Text Files in HDFS

  • Copying or Moving Files and Directories within HDFS

  • Understanding Sizes of File Systems and Files

  • Overview of Block Size and Replication Factor

  • Getting File Metadata using hdfs fsck

  • Resources and Exercises
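
Under the assumption of a running Hadoop installation, the commands covered above look roughly like this (the user name and paths are illustrative):

```shell
# List files in a user's home directory (/user/<username> by default)
hdfs dfs -ls /user/itversity

# Create a directory and copy local files into HDFS
hdfs dfs -mkdir -p /user/itversity/retail_db
hdfs dfs -put /data/retail_db/orders /user/itversity/retail_db

# Preview a text file and check sizes
hdfs dfs -tail /user/itversity/retail_db/orders/part-00000
hdfs dfs -du -s -h /user/itversity/retail_db

# Review block size, replication factor, and other file metadata
hdfs fsck /user/itversity/retail_db/orders -files -blocks
```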

Getting Started with Hive

  • Overview of the Hive Language Manual

  • Launching and Using Hive CLI

  • Overview of Hive Properties

  • Hive CLI History and hiverc

  • Running HDFS Commands in Hive CLI

  • Understanding the Warehouse Directory

  • Creating and Using Hive Databases

  • Creating and Describing Hive Tables

  • Retrieving Metadata of Tables using DESCRIBE

  • Role of the Hive Metastore Database

  • Overview of beeline

  • Running Hive Commands and Queries using beeline
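
As a rough sketch, the Hive CLI and beeline can be launched as shown below; the HiveServer2 host and port are placeholders:

```shell
# Hive CLI: properties, HDFS commands, and queries can all be run at the prompt
hive
#   hive> SET hive.metastore.warehouse.dir;    -- inspect a property
#   hive> dfs -ls /user/hive/warehouse;        -- run an HDFS command
#   hive> SHOW DATABASES;

# beeline connects to HiveServer2 over JDBC
beeline -u jdbc:hive2://localhost:10000 -e "SHOW DATABASES;"
```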

Creating Tables in Hive using Hive QL

  • Creating Tables in Hive – orders

  • Overview of Basic Data Types in Hive

  • Adding Comments to Columns and Tables

  • Loading Data into Hive Tables from the Local File System

  • Loading Data into Hive Tables from HDFS

  • Loading Data – Overwrite vs. Append

  • Creating External Tables in Hive

  • Specifying Location for Hive Tables

  • Difference between Managed Tables and External Tables

  • Default Delimiters in Hive Tables using Text File Format

  • Overview of File Formats in Hive

  • Differences between Hive and RDBMS

  • Truncating and Dropping Tables in Hive

  • Resources and Exercises
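
A minimal sketch of the DDL and load statements discussed above; the column names for the orders table are assumptions based on a typical retail data set:

```sql
-- Managed table with comments and an explicit delimiter
CREATE TABLE orders (
  order_id INT COMMENT 'unique order id',
  order_date STRING,
  order_customer_id INT,
  order_status STRING
) COMMENT 'orders from the retail data set'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Load from the local file system; OVERWRITE replaces, omitting it appends
LOAD DATA LOCAL INPATH '/data/retail_db/orders' OVERWRITE INTO TABLE orders;

-- External table: DROP TABLE removes only metadata, not the data files
CREATE EXTERNAL TABLE orders_ext LIKE orders
LOCATION '/user/itversity/external/orders';
```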

Loading/Inserting Data into Hive Tables using Hive QL

  • Introduction to Partitioning and Bucketing

  • Creating Tables using ORC Format – order_items

  • Inserting Data into Tables using Stage Tables

  • Load vs. Insert in Hive

  • Creating Partitioned Tables in Hive

  • Adding Partitions to Tables in Hive

  • Loading into Partitions in Hive Tables

  • Inserting Data into Partitions in Hive Tables

  • Inserting Using Dynamic Partition Mode

  • Creating Bucketed Tables in Hive

  • Inserting Data into Bucketed Tables

  • Bucketing with Sorting

  • Overview of ACID Transactions

  • Creating Tables for Transactions

  • Inserting Individual Records into Hive Tables

  • Updating and Deleting Records in Hive Tables
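
The partitioning and bucketing topics above can be illustrated roughly as follows (same assumed schema as before):

```sql
-- Partitioned table; the partition column is not one of the data columns
CREATE TABLE orders_part (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
) PARTITIONED BY (order_month STRING)
STORED AS ORC;

-- Static partition insert from a stage table
INSERT INTO TABLE orders_part PARTITION (order_month = '2014-01')
SELECT * FROM orders WHERE substr(order_date, 1, 7) = '2014-01';

-- Dynamic partition insert: the partition value comes from the query itself
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE orders_part PARTITION (order_month)
SELECT o.*, substr(o.order_date, 1, 7) AS order_month FROM orders o;

-- Bucketed table with sorted buckets
CREATE TABLE orders_bucketed (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
) CLUSTERED BY (order_id) SORTED BY (order_id) INTO 8 BUCKETS
STORED AS ORC;
```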

Overview of Functions in Hive

  • Overview of Functions

  • Validating Functions

  • String Manipulation – Case Conversion and Length

  • String Manipulation – substr and split

  • String Manipulation – Trimming and Padding Functions

  • String Manipulation – Reverse and Concatenating Multiple Strings

  • Date Manipulation – Current Date and Timestamp

  • Date Manipulation – Date Arithmetic

  • Date Manipulation – trunc

  • Date Manipulation – Using date_format

  • Date Manipulation – Extract Functions

  • Date Manipulation – Dealing with Unix Timestamps

  • Overview of Numeric Functions

  • Data Type Conversion Using CAST

  • Handling Null Values

  • Query Example – Get Word Count
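
For reference, a few of the functions listed above as they are typically used in Hive (exact availability depends on the Hive version):

```sql
-- String manipulation
SELECT upper('hive'), length('hive'),        -- case conversion and length
       substr('2014-01-05', 1, 7),           -- '2014-01'
       split('a,b,c', ',')[1],               -- 'b'
       trim('  x  '), lpad('7', 3, '0');     -- trimming and padding

-- Date manipulation
SELECT current_date, current_timestamp,
       date_add(current_date, 7),            -- date arithmetic
       trunc(current_date, 'MM'),            -- first day of the month
       date_format(current_date, 'yyyy-MM'),
       from_unixtime(1404194400);            -- Unix timestamp to string

-- Type conversion and null handling
SELECT cast('123' AS INT), nvl(NULL, 'n/a');
```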

Writing Basic Queries in Hive

  • Overview of SQL or Hive QL

  • Execution Life Cycle of a Hive Query

  • Reviewing Logs of Hive Queries

  • Projecting Data using SELECT and Overview of FROM

  • Deriving Conditional Values using CASE and WHEN

  • Projecting Distinct Values

  • Filtering Data using the WHERE Clause

  • Boolean Operations in the WHERE Clause

  • Boolean OR vs. the IN Operator

  • Filtering Data using the LIKE Operator

  • Performing Basic Aggregations using Aggregate Functions

  • Performing Aggregations using GROUP BY

  • Filtering Aggregated Data Using HAVING

  • Global Sorting using ORDER BY

  • Overview of DISTRIBUTE BY

  • Sorting Data within Groups using SORT BY

  • Using CLUSTER BY
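
Putting several of these clauses together, a basic query might look like this (schema assumed as before):

```sql
-- Filtering, conditional values, aggregation, and global sorting
SELECT order_status,
       CASE WHEN order_status IN ('COMPLETE', 'CLOSED') THEN 'DONE'
            ELSE 'PENDING'
       END AS status_group,
       count(*) AS order_count
FROM orders
WHERE order_date LIKE '2014-01%'
GROUP BY order_status
HAVING count(*) >= 100
ORDER BY order_count DESC;

-- Note: CLUSTER BY x is shorthand for DISTRIBUTE BY x SORT BY x
```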

Joining Data Sets and Set Operations in Hive

  • Overview of Nested Subqueries

  • Nested Subqueries – Using the IN Operator

  • Nested Subqueries – Using the EXISTS Operator

  • Overview of Joins in Hive

  • Performing Inner Joins using Hive

  • Performing Outer Joins using Hive

  • Performing Full Outer Joins using Hive

  • Map-Side Joins and Reduce-Side Joins in Hive

  • Joining in Hive using Legacy Syntax

  • Cross Joins in Hive

  • Overview of Set Operations in Hive

  • Performing a Set Union between Two Hive Query Results

  • Set Operations – INTERSECT and MINUS Not Supported
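
Sketches of the join and set-operation patterns above (the order_items column names are assumptions):

```sql
-- Inner join; LEFT OUTER JOIN would keep orders with no matching items
SELECT o.order_id, o.order_status, oi.order_item_subtotal
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_item_order_id;

-- Set union between two query results (UNION ALL keeps duplicates)
SELECT order_id FROM orders WHERE order_status = 'COMPLETE'
UNION ALL
SELECT order_id FROM orders WHERE order_status = 'CLOSED';

-- With INTERSECT and MINUS unsupported, NOT EXISTS (or LEFT SEMI JOIN for
-- intersection) achieves the same effect
SELECT o.order_id
FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM order_items oi
                  WHERE oi.order_item_order_id = o.order_id);
```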

Windowing or Analytics Functions in Hive

  • Preparing the HR Database in Hive with the Employees Table

  • Overview of Analytics or Windowing Functions in Hive

  • Performing Aggregations using Hive Queries

  • Creating Tables to Get Daily Revenue using CTAS in Hive

  • Getting Lead and Lag using Windowing Functions in Hive

  • Getting First and Last Values using Windowing Functions in Hive

  • Applying Rank using Windowing Functions in Hive

  • Applying Dense Rank using Windowing Functions in Hive

  • Applying Row Number using Windowing Functions in Hive

  • Difference Between rank, dense_rank, and row_number in Hive

  • Understanding the Order of Execution of Hive Queries

  • Overview of Nested Subqueries in Hive

  • Filtering Data on Top of Window Functions in Hive

  • Getting the Top 5 Products by Revenue for Each Day using Windowing Functions in Hive – Recap
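
The top-5-per-day recap can be sketched as below; daily_product_revenue stands for the CTAS table created earlier in the section, with assumed column names:

```sql
-- rank leaves gaps after ties, dense_rank does not, row_number ignores ties
SELECT * FROM (
  SELECT order_date, product_id, revenue,
         rank()       OVER (PARTITION BY order_date ORDER BY revenue DESC) AS rnk,
         dense_rank() OVER (PARTITION BY order_date ORDER BY revenue DESC) AS drnk,
         row_number() OVER (PARTITION BY order_date ORDER BY revenue DESC) AS rn
  FROM daily_product_revenue
) q
WHERE q.rnk <= 5
ORDER BY q.order_date, q.rnk;
```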

Running Queries using Impala

  • Introduction to Impala

  • Role of Impala Daemons

  • Impala State Store and Catalog Server

  • Overview of Impala Shell

  • Relationship between Hive and Impala

  • Overview of Creating Databases and Tables using Impala

  • Loading and Inserting Data into Tables using Impala

  • Running Queries using Impala Shell

  • Reviewing Logs of Impala Queries

  • Syncing Hive and Impala – Using INVALIDATE METADATA

  • Running Scripts using Impala Shell

  • Assignment – Using NYSE Data

  • Assignment – Solution
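
A rough sketch of working with Impala Shell; the daemon host, port, and database name are placeholders:

```shell
# Connect to an Impala daemon and run a single query
impala-shell -i impalad-host:21000 -q 'SELECT count(*) FROM retail.orders'

# Run a script file
impala-shell -i impalad-host:21000 -f daily_revenue.sql

# After changing tables or data through Hive, refresh Impala's metadata cache
# from within impala-shell:
#   INVALIDATE METADATA retail.orders;  -- new table or sweeping changes
#   REFRESH retail.orders;              -- new data files in an existing table
```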

Getting Started with Sqoop

  • Introduction to Sqoop

  • Validating the Source Database – MySQL

  • Reviewing the JDBC Jar to Connect to MySQL

  • Getting Help using Sqoop CLI

  • Overview of the Sqoop User Guide

  • Validating Sqoop and MySQL Integration using Sqoop List Databases

  • Listing Tables in a Database using Sqoop

  • Running Queries in MySQL using Sqoop Eval

  • Understanding Logs in Sqoop

  • Redirecting Sqoop Job Logs into Log Files
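
The validation steps above can be sketched as follows; the host, database, and credentials are placeholders:

```shell
# Validate connectivity by listing databases and tables
sqoop list-databases \
  --connect jdbc:mysql://mysql-host:3306 \
  --username retail_user --password itversity

sqoop list-tables \
  --connect jdbc:mysql://mysql-host:3306/retail_db \
  --username retail_user --password itversity

# Run an arbitrary query on the source database
sqoop eval \
  --connect jdbc:mysql://mysql-host:3306/retail_db \
  --username retail_user --password itversity \
  --query 'SELECT count(1) FROM orders'

# Sqoop logs go to stderr, so appending 2>sqoop.log redirects them to a file
```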

Importing Data from MySQL to HDFS using Sqoop Import

  • Overview of the Sqoop Import Command

  • Importing Orders using target-dir

  • Importing Order Items using warehouse-dir

  • Managing HDFS Directories

  • Sqoop Import Execution Flow

  • Reviewing Logs of Sqoop Import

  • Sqoop Import – Specifying the Number of Mappers

  • Reviewing the Output Files Generated by Sqoop Import

  • File Formats Supported by Sqoop Import

  • Validating Avro Files using Avro Tools

  • Sqoop Import Using Compression
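
Sketches of the import variations above, with the same placeholder connection details:

```shell
# Import into an explicit directory with 2 mappers and gzip compression
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/retail_db \
  --username retail_user --password itversity \
  --table orders \
  --target-dir /user/itversity/retail_db/orders \
  --num-mappers 2 \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.GzipCodec

# warehouse-dir creates a subdirectory named after the table;
# --as-avrodatafile, --as-sequencefile, etc. switch the file format
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/retail_db \
  --username retail_user --password itversity \
  --table order_items \
  --warehouse-dir /user/itversity/retail_db \
  --as-avrodatafile
```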

Apache Sqoop – Importing Data into HDFS – Customizing

  • Introduction to Customizing Sqoop Import

  • Sqoop Import – Specifying Columns

  • Sqoop Import – Using a Boundary Query

  • Sqoop Import – Filtering Out Unnecessary Data

  • Sqoop Import – Using split-by to Distribute the Import on a Non-Default Column

  • Getting Query Results using Sqoop Eval

  • Dealing with Tables with Composite Keys while using Sqoop Import

  • Dealing with Tables with Non-Numeric Key Fields while using Sqoop Import

  • Dealing with Tables with No Key Fields while using Sqoop Import

  • Using autoreset-to-one-mapper to Use Only One Mapper when Importing from Tables with No Key Fields

  • Default Delimiters Used by Sqoop Import for Text File Format

  • Specifying Delimiters for Sqoop Import using Text File Format

  • Dealing with Null Values using Sqoop Import

  • Importing Multiple Tables from the Source Database using Sqoop Import
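
A sketch combining several of the customizations above (all names and values are illustrative):

```shell
# Select columns, filter rows, pick a split column, and control delimiters
# and null handling
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/retail_db \
  --username retail_user --password itversity \
  --table orders \
  --columns order_id,order_date,order_status \
  --where "order_status = 'COMPLETE'" \
  --split-by order_id \
  --fields-terminated-by '|' \
  --null-string '\\N' \
  --null-non-string '-1' \
  --target-dir /user/itversity/retail_db/orders_complete

# For tables with no key field, pass --split-by explicitly or add
# --autoreset-to-one-mapper so Sqoop falls back to a single mapper

# import-all-tables brings every table in the database under one directory
sqoop import-all-tables \
  --connect jdbc:mysql://mysql-host:3306/retail_db \
  --username retail_user --password itversity \
  --warehouse-dir /user/itversity/retail_db
```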

Importing Data from MySQL to Hive Tables using Sqoop Import

  • Quick Overview of Hive

  • Creating a Hive Database for Sqoop Import

  • Creating an Empty Hive Table for Sqoop Import

  • Importing Data into a Hive Table from a Source Database Table using Sqoop Import

  • Managing Hive Tables while Importing Data using Sqoop Import – Using Overwrite

  • Managing Hive Tables while Importing Data using Sqoop Import – Erroring Out If the Table Already Exists

  • Understanding the Execution Flow of Sqoop Import into Hive Tables

  • Reviewing Files Generated by Sqoop Import into Hive Tables

  • Sqoop Delimiters vs. Hive Delimiters

  • Different File Formats Supported by Sqoop Import when Importing into Hive Tables

  • Sqoop Import All Tables into Hive from the Source Database
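
A minimal Hive-import sketch with the same placeholder connection details:

```shell
# --hive-import creates and loads the Hive table; --hive-overwrite replaces
# existing data, while --create-hive-table makes the job fail if the
# table already exists
sqoop import \
  --connect jdbc:mysql://mysql-host:3306/retail_db \
  --username retail_user --password itversity \
  --table orders \
  --hive-import \
  --hive-database retail \
  --hive-table orders \
  --hive-overwrite \
  --num-mappers 2
```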

Exporting Data from HDFS/Hive to MySQL using Sqoop Export

  • Introduction to Sqoop Export

  • Preparing Data for Sqoop Export

  • Creating a Table in MySQL for Sqoop Export

  • Performing a Simple Sqoop Export from HDFS to a MySQL Table

  • Understanding the Execution Flow of Sqoop Export

  • Specifying the Number of Mappers for Sqoop Export

  • Troubleshooting Issues Related to Sqoop Export

  • Merging or Upserting Data using Sqoop Export – Overview

  • Quick Overview of MySQL – Upsert using Sqoop Export

  • Updating Data using an Update Key with Sqoop Export

  • Merging Data using allowinsert in Sqoop Export

  • Specifying Columns using Sqoop Export

  • Specifying Delimiters using Sqoop Export

  • Using a Stage Table for Sqoop Export
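
Export sketches; the target database, table, and export directory are placeholders, the target MySQL table must already exist, and the input delimiter must match what the exported files actually use:

```shell
# Simple export of a Hive table's files back to MySQL
sqoop export \
  --connect jdbc:mysql://mysql-host:3306/retail_export \
  --username retail_user --password itversity \
  --table daily_revenue \
  --export-dir /user/hive/warehouse/retail.db/daily_revenue \
  --input-fields-terminated-by ',' \
  --num-mappers 1

# Upsert: update rows that match the key, insert the rest
sqoop export \
  --connect jdbc:mysql://mysql-host:3306/retail_export \
  --username retail_user --password itversity \
  --table daily_revenue \
  --export-dir /user/hive/warehouse/retail.db/daily_revenue \
  --update-key order_date \
  --update-mode allowinsert
```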

Submitting Sqoop Jobs and Incremental Sqoop Imports

  • Introduction to Sqoop Jobs

  • Adding a Password File for Sqoop Jobs

  • Creating a Sqoop Job

  • Running a Sqoop Job

  • Overview of Incremental Loads using Sqoop

  • Incremental Sqoop Import – Using Where

  • Incremental Sqoop Import – Using Append Mode

  • Incremental Sqoop Import – Creating the Table

  • Incremental Sqoop Import – Creating the Sqoop Job

  • Incremental Sqoop Import – Executing the Job

  • Incremental Sqoop Import – Adding More Data

  • Incremental Sqoop Import – Rerunning the Job

  • Incremental Sqoop Import – Using Last Modified
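
A sketch of a saved, incremental Sqoop job; the password-file path and check column are illustrative:

```shell
# Store the database password in HDFS so the saved job can run unattended
echo -n "itversity" > sqoop.password
hdfs dfs -put sqoop.password /user/itversity/sqoop.password

# Create a job that appends only rows with order_id above the last value
sqoop job --create orders_incremental \
  -- import \
  --connect jdbc:mysql://mysql-host:3306/retail_db \
  --username retail_user \
  --password-file /user/itversity/sqoop.password \
  --table orders \
  --target-dir /user/itversity/retail_db/orders \
  --incremental append \
  --check-column order_id \
  --last-value 0

# Each run records the new last value, so reruns import only new rows
sqoop job --exec orders_incremental
```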

Here are the objectives for this course.

Provide Structure to the Data

Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.

  • Create tables using a variety of data types, delimiters, and file formats

  • Create new tables using existing tables to define the schema

  • Improve query performance by creating partitioned tables in the metastore

  • Alter tables to modify the existing schema

  • Create views in order to simplify queries

Data Analysis

Use Query Language (QL) statements in Hive and Impala to analyze data on the cluster.

  • Prepare reports using SELECT commands, including unions and subqueries

  • Calculate aggregate statistics, such as sums and averages, during a query

  • Create queries against multiple data sources by using join commands

  • Transform the output format of queries by using built-in functions

  • Perform queries across a group of rows using windowing functions

Exercises will be provided so that you get enough practice with Sqoop as well as with writing queries using Hive and Impala.

All the demos are given on our state-of-the-art Big Data cluster. If you do not have a multi-node cluster, you can sign up for our labs and practice on our multi-node cluster. You will be able to practice Sqoop and Hive on the cluster.


