The following JD received from GlobalLogic shows the standard and quality of the company. The same points appear twice under different headings, "Job Description" and "Job Responsibilities". It is just copy and paste.
Job Description
Data engineer with 6+ years of hands-on experience working on Big Data platforms
Experience building and optimizing Big Data pipelines and data sets ranging from data ingestion to processing to data visualization.
Good experience in writing and optimizing Spark jobs, Spark SQL etc. Should have worked on both batch and streaming data processing
Good experience in any one programming language - Scala/Python, Python preferred.
Experience in writing and optimizing complex Hive and SQL queries to process huge data. Good with UDFs, tables, joins, views etc
Experience in using Kafka or any other message brokers
Configuring, monitoring and scheduling of jobs using Oozie and/or Airflow
Processing streaming data directly from Kafka using Spark jobs, experience in Spark Streaming is a must
Should be able to handle different file formats (ORC, Avro and Parquet) and unstructured data
Should have experience with any one NoSQL database like Amazon S3 etc
Should have worked on any of the data warehouse tools like AWS Redshift or Snowflake or BigQuery etc
Work experience on any one cloud: AWS or GCP or Azure
Good to have skills:
Experience in AWS cloud services like EMR, S3, Redshift, EKS/ECS etc
Experience in GCP cloud services like Dataproc, Google storage etc
Experience in working with huge Big data clusters with millions of records
Experience in working with ELK stack, especially Elasticsearch
Experience in Hadoop MapReduce, Apache Flink, Kubernetes etc
Job Responsibilities
Data engineer with 6+ years of hands-on experience working on Big Data platforms
Experience building and optimizing Big Data pipelines and data sets ranging from data ingestion to processing to data visualization.
Good experience in writing and optimizing Spark jobs, Spark SQL etc. Should have worked on both batch and streaming data processing
Good experience in any one programming language - Scala/Python, Python preferred.
Experience in writing and optimizing complex Hive and SQL queries to process huge data. Good with UDFs, tables, joins, views etc
Experience in using Kafka or any other message brokers
Configuring, monitoring and scheduling of jobs using Oozie and/or Airflow
Processing streaming data directly from Kafka using Spark jobs, experience in Spark Streaming is a must
Should be able to handle different file formats (ORC, Avro and Parquet) and unstructured data
Should have experience with any one NoSQL database like Amazon S3 etc
Should have worked on any of the data warehouse tools like AWS Redshift or Snowflake or BigQuery etc
Work experience on any one cloud: AWS or GCP or Azure
Good to have skills:
Experience in AWS cloud services like EMR, S3, Redshift, EKS/ECS etc
Experience in GCP cloud services like Dataproc, Google storage etc
Experience in working with huge Big data clusters with millions of records
Experience in working with ELK stack, especially Elasticsearch
Experience in Hadoop MapReduce, Apache Flink, Kubernetes etc