Podcasts

Table of contents
  1. Data Lineage: Understanding Data Lineage at Scale with Julien Le Dem
  2. Open Telemetry

Data Lineage: Understanding Data Lineage at Scale with Julien Le Dem

podcast name Data Skeptic
description An overview of the OpenTelemetry project
listened on 02/24/2022
thoughts While ostensibly about data lineage, they don’t much get into it until the very end. It’s mostly a history of data engineering, from its outset with Hadoop and HDFS at Yahoo through to the move towards SQL, to where we are now, which is a more mature space. They talk about storage formats like Parquet, or Apache Arrow, as well as compute engines like Spark or Flink, to orchestration/event platforms like Apache Pig or Airflow. Julien Le Dem is a highly regarded data engineer in the industry.
questions Does event sourcing also fulfil this data lineage need?
more reading https://softwareengineeringdaily.com/2021/07/12/data-lineage-understanding-data-lineage-at-scale-with-julien-le-dem/

Open Telemetry

podcast name Data Skeptic
description An overview of the OpenTelemetry project
listened on 02/23/2022
thoughts OpenTelemetry Cloud Native Computing Foundation (CNCF) sponsored project and is an attempt to standardize service instrumentation. Broadly speaking, it is a set of specifications, APIs, and libraries/SDKs. You implement them in your services, and then you can implement a generic collector which all services pass data to. OpenTelemetry isn’t about the storage or viz layers of an observability stack, but rather standardizing the schemas for three types of observability signals:
- metrics
- logs
- traces
questions Where does the ELK stack fit into OpenTelemetry?
more reading https://newrelic.com/blog/best-practices/what-is-opentelemetry
https://github.com/open-telemetry