Google Cloud Certified Professional Data Engineer 2023

Description

Designing data processing systems

Selecting the appropriate storage technologies. Considerations include:

● Mapping storage systems to business requirements

● Data modeling

● Trade-offs involving latency, throughput, transactions

● Distributed systems

● Schema design
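
Schema design for a wide-column store such as Cloud Bigtable mostly comes down to row-key design. Rows are stored sorted by key, so a key that starts with a timestamp funnels all new writes to one part of the key space. The following is a minimal sketch that assumes a hypothetical IoT table keyed by device ID plus a reversed timestamp, so the newest reading for each device sorts first. The `#` separator, the 13-digit padding, and `MAX_MILLIS` are illustrative choices, not a prescribed format.

```python
from datetime import datetime, timezone

# Hypothetical upper bound used to reverse epoch-millisecond timestamps.
MAX_MILLIS = 10**13


def row_key(device_id: str, event_time: datetime) -> str:
    """Build a key of the form '<device_id>#<reversed_millis>'.

    Leading with the device ID spreads writes across devices instead of
    piling them onto the latest timestamp; the reversed, zero-padded
    timestamp makes newer events sort before older ones per device.
    """
    millis = int(event_time.timestamp() * 1000)
    reversed_millis = MAX_MILLIS - millis
    return f"{device_id}#{reversed_millis:013d}"


older = row_key("sensor-42", datetime(2023, 1, 1, tzinfo=timezone.utc))
newer = row_key("sensor-42", datetime(2023, 6, 1, tzinfo=timezone.utc))
# Lexicographic order (how the keys would be stored) puts the newer event first.
print(sorted([older, newer])[0] == newer)  # True
```

A prefix scan on `sensor-42#` would then return that device's readings newest-first without a separate sort.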

Designing data pipelines. Considerations include:

● Data publishing and visualization (e.g., BigQuery)

● Batch and streaming data (e.g., Dataflow, Dataproc, Apache Beam, Apache Spark and Hadoop ecosystem, Pub/Sub, Apache Kafka)

● Online (interactive) vs. batch predictions

● Job automation and orchestration (e.g., Cloud Composer)
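
Windowing makes the difference between batch and streaming data concrete: a streaming pipeline cuts an unbounded stream into finite windows by event time before it aggregates. Apache Beam, the SDK that Dataflow runs, provides `FixedWindows` for this. The sketch below reimplements fixed (tumbling) windows in plain Python so it runs without Beam. It is a simplification that ignores watermarks, triggers, and late data, all of which a real Beam pipeline has to handle.

```python
from collections import defaultdict


def fixed_window_sums(events, window_size_s):
    """Sum values per (key, fixed window), assigning windows by event time.

    events: iterable of (key, event_time_seconds, value) tuples in any order.
    Returns {(key, window_start): total}.
    """
    totals = defaultdict(int)
    for key, event_time, value in events:
        # Each event falls into exactly one window: [start, start + size).
        window_start = event_time - (event_time % window_size_s)
        totals[(key, window_start)] += value
    return dict(totals)


events = [
    ("clicks", 5, 1),
    ("clicks", 59, 2),
    ("clicks", 61, 4),   # belongs to the next 60 s window
    ("clicks", 30, 3),   # arrives out of order but is windowed by event time
]
print(fixed_window_sums(events, 60))
# {('clicks', 0): 6, ('clicks', 60): 4}
```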

Designing a data processing solution. Considerations include:

● Choice of infrastructure

● System availability and fault tolerance

● Use of distributed systems

● Capacity planning

● Hybrid cloud and edge computing

● Architecture options (e.g., message brokers, message queues, middleware, service-oriented architecture, serverless functions)

● Event processing guarantees (e.g., at-least-once, in-order, exactly-once)
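
By default, Pub/Sub delivers messages at least once, so a subscriber can receive the same message more than once. One common way to get effectively-once results on top of at-least-once delivery is an idempotent consumer, which remembers the IDs of messages it has already applied. The following is a minimal in-memory sketch. The `Message` type is hypothetical, and the unbounded `seen` set is a simplification. A production consumer would keep processed IDs in durable storage with an expiry.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    message_id: str  # hypothetical: a unique ID assigned by the broker
    amount: int


@dataclass
class IdempotentConsumer:
    balance: int = 0
    seen: set = field(default_factory=set)

    def handle(self, msg: Message) -> bool:
        """Apply a message once; return False for a redelivered duplicate."""
        if msg.message_id in self.seen:
            return False  # already applied: acknowledge and skip
        self.balance += msg.amount
        self.seen.add(msg.message_id)
        return True


consumer = IdempotentConsumer()
deliveries = [Message("m1", 10), Message("m2", 5), Message("m1", 10)]  # m1 redelivered
results = [consumer.handle(m) for m in deliveries]
print(results, consumer.balance)  # [True, True, False] 15
```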

Migrating data warehousing and data processing. Considerations include:

● Awareness of the current state and how to migrate a design to a future state

● Migrating from on-premises to cloud (Data Transfer Service, Transfer Appliance, Cloud Networking)

● Validating a migration
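
Validating a migration usually means comparing source and target data on more than row counts. The following is a minimal sketch that assumes both sides can be read as lists of row tuples, which are hypothetical inputs here. It compares the row count together with an order-independent checksum built from a hash of each row. Reordered rows still match, while changed or missing rows do not.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum), ignoring row order.

    Each row is serialized with repr() and hashed with SHA-256; per-row
    hashes are summed modulo 2**64, so duplicate rows still count (unlike XOR).
    """
    count, checksum = 0, 0
    for row in rows:
        digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % 2**64
        count += 1
    return count, checksum


source = [(1, "alice", 10.5), (2, "bob", 3.0)]
target_ok = [(2, "bob", 3.0), (1, "alice", 10.5)]    # same rows, different order
target_bad = [(1, "alice", 10.5), (2, "bob", 3.25)]  # one value changed in transit

print(table_fingerprint(source) == table_fingerprint(target_ok))   # True
print(table_fingerprint(source) == table_fingerprint(target_bad))  # False
```

The repr()-based serialization assumes that both sides yield identical Python values. Type differences, such as a numeric column read as `Decimal` on one side and `float` on the other, must be normalized before the comparison.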

Building and operationalizing data processing systems

Building and operationalizing storage systems. Considerations include:

● Effective use of managed services (Cloud Bigtable, Cloud Spanner, Cloud SQL, BigQuery, Cloud Storage, Datastore, Memorystore)

● Storage costs and performance

● Life cycle management of data
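
Cloud Storage expresses life cycle management as bucket lifecycle rules. Each rule pairs an action, such as `SetStorageClass` or `Delete`, with conditions, such as an object's `age` in days. The sketch builds a set of rules as a Python dict shaped like the `lifecycle` field of a bucket resource and serializes it to JSON. The 30/90/365-day thresholds are hypothetical. The `action_for_age` helper is a simplified illustration of rule matching and does not reproduce Cloud Storage's actual evaluation logic.

```python
import json

# Hypothetical policy: move objects to colder classes as they age, delete after a year.
rules = [
    {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
     "condition": {"age": 30, "matchesStorageClass": ["STANDARD"]}},
    {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
     "condition": {"age": 90, "matchesStorageClass": ["NEARLINE"]}},
    {"action": {"type": "Delete"}, "condition": {"age": 365}},
]
payload = json.dumps({"lifecycle": {"rule": rules}}, indent=2)


def action_for_age(age_days, storage_class):
    """Simplified illustration (not Cloud Storage's real precedence rules):
    among matching rules, return the action with the highest age threshold."""
    best = None
    for rule in rules:
        cond = rule["condition"]
        classes = cond.get("matchesStorageClass")
        if age_days >= cond["age"] and (classes is None or storage_class in classes):
            if best is None or cond["age"] > best["condition"]["age"]:
                best = rule
    return None if best is None else best["action"]


print(action_for_age(45, "STANDARD"))   # {'type': 'SetStorageClass', 'storageClass': 'NEARLINE'}
print(action_for_age(400, "COLDLINE"))  # {'type': 'Delete'}
```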

Building and operationalizing pipelines. Considerations include:

● Data cleansing

● Batch and streaming

● Transformation

● Data acquisition and import

● Integrating with new data sources
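
Data cleansing in a pipeline is typically a transform that normalizes fields and routes unrepairable records to a separate dead-letter output, so they do not fail the whole job. The sketch shows that pattern in plain Python. The record fields (`user_id`, `email`, `signup_date`) and the validation rules are hypothetical.

```python
from datetime import date


def cleanse(records):
    """Split raw dict records into (clean, rejected) lists.

    Clean records get a trimmed ID, a trimmed lower-cased email, and a parsed
    ISO date; bad records are kept with a reason (a dead-letter output).
    """
    clean, rejected = [], []
    for rec in records:
        user_id = str(rec.get("user_id") or "").strip()
        email = str(rec.get("email") or "").strip().lower()
        if not user_id:
            rejected.append((rec, "missing user_id"))
            continue
        if email.count("@") != 1:
            rejected.append((rec, "malformed email"))
            continue
        try:
            signup = date.fromisoformat(str(rec.get("signup_date", "")).strip())
        except ValueError:
            rejected.append((rec, "bad signup_date"))
            continue
        clean.append({"user_id": user_id, "email": email, "signup_date": signup})
    return clean, rejected


raw = [
    {"user_id": " 7 ", "email": "Ana@Example.com ", "signup_date": "2023-02-01"},
    {"user_id": "8", "email": "no-at-sign", "signup_date": "2023-02-02"},
    {"user_id": "", "email": "x@y.z", "signup_date": "2023-02-03"},
    {"user_id": "9", "email": "b@c.d", "signup_date": "02/03/2023"},
]
clean, rejected = cleanse(raw)
print(clean)
print([reason for _, reason in rejected])
# ['malformed email', 'missing user_id', 'bad signup_date']
```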

Building and operationalizing processing infrastructure. Considerations include:

● Provisioning resources

● Monitoring pipelines

● Adjusting pipelines

● Testing and quality control
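
Monitoring a pipeline often comes down to comparing a few signals against thresholds. Typical signals are data freshness, meaning how far the newest processed event lags behind the current time, and backlog size. The following is a minimal sketch of such a check. The thresholds and inputs are hypothetical. In practice, these values would come from collected metrics, for example the metrics that Cloud Monitoring gathers for Dataflow and Pub/Sub, rather than being passed in by hand.

```python
from datetime import datetime, timedelta, timezone


def check_pipeline_health(newest_event_time, backlog_messages, now=None,
                          max_lag=timedelta(minutes=5), max_backlog=10_000):
    """Return a list of alert strings; an empty list means healthy.

    newest_event_time: timezone-aware event time of the most recent record
    processed. backlog_messages: number of unprocessed messages waiting.
    """
    now = now or datetime.now(timezone.utc)
    alerts = []
    lag = now - newest_event_time
    if lag > max_lag:
        alerts.append(f"data freshness lag {lag} exceeds {max_lag}")
    if backlog_messages > max_backlog:
        alerts.append(f"backlog {backlog_messages} exceeds {max_backlog}")
    return alerts


now = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
print(check_pipeline_health(now - timedelta(minutes=2), 500, now=now))  # []
print(check_pipeline_health(now - timedelta(minutes=12), 50_000, now=now))
# ['data freshness lag 0:12:00 exceeds 0:05:00', 'backlog 50000 exceeds 10000']
```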

Operationalizing machine learning models

Leveraging pre-built ML models as a service. Considerations include:

● ML APIs (e.g., Vision API, Speech API)

● Customizing ML APIs (e.g., AutoML Vision, AutoML text)

● Conversational experiences (e.g., Dialogflow)

Deploying an ML pipeline. Considerations include:

● Ingesting appropriate data

● Retraining of machine learning models (AI Platform Prediction and Training, BigQuery ML, Kubeflow, Spark ML)

● Continuous evaluation
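
Continuous evaluation scores a deployed model on fresh labeled data, compares the result against a baseline, and triggers retraining when quality drops. The following is a minimal sketch of that decision rule. The choice of accuracy as the metric, the 0.05 tolerance, and the sample labels are all hypothetical. A real setup would run this check on a schedule, for example from an orchestrator, rather than inline.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def should_retrain(baseline_accuracy, y_true, y_pred, tolerance=0.05):
    """Return (retrain?, current accuracy); retrain when accuracy falls
    more than `tolerance` below the baseline."""
    current = accuracy(y_true, y_pred)
    return current < baseline_accuracy - tolerance, current


labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
predictions = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]
print(should_retrain(0.90, labels, predictions))  # (True, 0.6)
```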

Choosing the appropriate training and serving infrastructure. Considerations include:

● Distributed vs. single machine

● Use of edge compute

● Hardware accelerators (e.g., GPU, TPU)

Measuring, monitoring, and troubleshooting machine learning models. Considerations include:

● Machine learning terminology (e.g., features, labels, models, regression, classification, recommendation, supervised and unsupervised learning, evaluation metrics)

● Impact of dependencies of machine learning models

● Common sources of error (e.g., assumptions about data)
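
Of the terms listed, evaluation metrics are among the easiest to get subtly wrong. For binary classification, precision is TP / (TP + FP) and recall is TP / (TP + FN). F1 is the harmonic mean of the two. The sketch computes all three from label lists. Treating 1 as the positive class is an assumption, and returning 0.0 when a denominator is zero is a common convention rather than a universal rule.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]  # 3 TP, 1 FP, 1 FN
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```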

Ensuring solution quality

Designing for security and compliance. Considerations include:

● Identity and access management (e.g., Cloud IAM)

● Data security (encryption, key management)

● Ensuring privacy (e.g., Data Loss Prevention API)

● Legal compliance (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), FedRAMP, General Data Protection Regulation (GDPR))
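
The Data Loss Prevention API detects sensitive values, which it classifies by infoType, and can de-identify them, for example by character masking. The sketch below imitates only the masking idea, using two hand-written regular expressions so that it runs offline. It is not the DLP API. The dictionary keys borrow DLP infoType names for readability, and the patterns are deliberately simple, so they would miss many real-world formats.

```python
import re

# Hypothetical, deliberately simple patterns; real detectors are far more thorough.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SOCIAL_SECURITY_NUMBER": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_sensitive(text, mask_char="#"):
    """Replace every character of each match with mask_char, preserving length."""
    for pattern in PATTERNS.values():
        text = pattern.sub(lambda m: mask_char * len(m.group()), text)
    return text


print(mask_sensitive("Contact jo@example.com, SSN 123-45-6789."))
# Contact ##############, SSN ###########.
```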

Ensuring scalability and efficiency. Considerations include:

● Building and running test suites

● Pipeline monitoring (e.g., Cloud Monitoring)

● Assessing, troubleshooting, and improving data representations and data processing infrastructure

● Resizing and autoscaling resources
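
Resizing and autoscaling decisions usually depend on incoming traffic, backlog, and per-worker throughput. The following simplified sketch applies one such rule: choose enough workers to keep up with incoming messages and also drain the current backlog within a target time, then clamp the result to minimum and maximum bounds. The formula and all parameters are hypothetical and do not describe how Dataflow's or any other service's autoscaler actually decides.

```python
import math


def target_workers(backlog, incoming_per_s, per_worker_per_s,
                   drain_target_s=300, min_workers=1, max_workers=50):
    """Workers needed to absorb incoming traffic and clear the backlog
    within drain_target_s seconds, clamped to [min_workers, max_workers]."""
    if per_worker_per_s <= 0:
        raise ValueError("per_worker_per_s must be positive")
    required_rate = incoming_per_s + backlog / drain_target_s
    needed = math.ceil(required_rate / per_worker_per_s)
    return max(min_workers, min(max_workers, needed))


print(target_workers(backlog=0, incoming_per_s=800, per_worker_per_s=200))        # 4
print(target_workers(backlog=300_000, incoming_per_s=800, per_worker_per_s=200))  # 9
print(target_workers(backlog=10**8, incoming_per_s=800, per_worker_per_s=200))    # 50
```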

Ensuring reliability and fidelity. Considerations include:

● Performing data preparation and quality control (e.g., Dataprep)

● Verification and monitoring

● Planning, executing, and stress testing data recovery (fault tolerance, rerunning failed jobs, performing retrospective re-analysis)

● Choosing between ACID, idempotent, and eventually consistent requirements
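
Rerunning a failed job is only safe when its writes are idempotent. The sketch contrasts two write strategies. An append duplicates rows whenever a job is retried. An upsert keyed on a primary key produces the same final state no matter how many times it runs. The in-memory list and dict stand in for tables, and the `order_id` key is hypothetical.

```python
def append_write(table, rows):
    """Not idempotent: every run adds the rows again."""
    table.extend(rows)


def upsert_write(table, rows, key="order_id"):
    """Idempotent: rows are keyed, so a rerun overwrites instead of duplicating."""
    for row in rows:
        table[row[key]] = row


batch = [{"order_id": 1, "total": 20}, {"order_id": 2, "total": 35}]

appended, upserted = [], {}
for _attempt in range(2):  # simulate a job that ran, "failed", and was retried
    append_write(appended, batch)
    upsert_write(upserted, batch)

print(len(appended), len(upserted))  # 4 2
```

In BigQuery, the same idea is usually expressed as a `MERGE` statement keyed on the primary-key columns.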

Ensuring flexibility and portability. Considerations include:

● Mapping to current and future business requirements

● Designing for data and application portability (e.g., multicloud, data residency requirements)

● Data staging, cataloging, and discovery
