TensorFlow Extended: An End-to-End Machine Learning Platform for TensorFlow

TensorFlow Extended (TFX)
An End-to-End ML Platform
Konstantinos (Gus) Katsiapis
Google
Ahmet Altay
Google

[Figure: TFX platform overview. Components: Data Ingestion; Data Analysis + Validation; Feature Engineering; Trainer; Tuner; Model Evaluation and Validation; Serving; Logging. Shared infrastructure: utilities for garbage collection and data access controls; pipeline storage; a shared configuration framework and job orchestration; an integrated frontend for job management, monitoring, debugging, and data/model/evaluation visualization.]
TensorFlow Extended (TFX) is an end-to-end ML pipeline for TensorFlow.

TFX powers our most important bets and products (including major products and Alphabet bets)…

… and some of our most important partners.

What is Apache Beam?
- A unified batch and stream distributed processing API
- A set of SDK frontends: Java, Python, Go, Scala, SQL
- A set of Runners which can execute Beam jobs on
various backends: Local, Apache Flink, Apache Spark,
Apache Gearpump, Apache Samza, Apache Hadoop,
Google Cloud Dataflow, …

Building Components out of Libraries
- Data Ingestion: ExampleGen
- TensorFlow Data Validation: StatisticsGen, SchemaGen, Example Validator
- TensorFlow Transform: Transform
- Estimator Model: Trainer
- TensorFlow Model Analysis: Evaluator, Model Validator
- Honoring Validation Outcomes: Pusher
- TensorFlow Serving: Model Server
Several of these libraries are powered by Beam.

What makes a Component
Example: the Model Validator
- Packaged binary or container
- Well-defined inputs and outputs: it takes the last validated model and a new (candidate) model, and produces a validation outcome
- Well-defined configuration (Config)
- Context: a Metadata Store

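[Figure: with the Metadata Store and the shared Config as context, the Trainer produces a new (candidate) model; the Model Validator compares it against the last validated model and records a validation outcome; the Pusher takes the new model and the validation outcome and deploys the model.]
Deployment targets:
- TensorFlow Serving
- TensorFlow Lite
- TensorFlow JS
- TensorFlow Hub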

Metadata Store? That’s new
- Task-aware pipelines: Transform → Trainer
- Task- and data-aware pipelines: Input Data → Transform → Transformed Data → Trainer → Trained Models → Deployment, backed by Pipeline + Metadata Storage (training data, transformed data, trained models)

What’s in the Metadata Store?
- Type definitions of Artifacts and their Properties
  E.g., Models, Data, Evaluation Metrics
- Execution Records (Runs) of Components
  E.g., Runtime Configuration, Inputs + Outputs
- Lineage Tracking Across All Executions
  E.g., to recurse back to all inputs of a specific artifact
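To make these three kinds of records concrete, here is a minimal in-memory sketch. It is not the ML Metadata (MLMD) API that TFX actually uses: the `MetadataStore` and `Execution` classes, the artifact ids, and the config values are hypothetical. Each execution record names the artifacts it read and wrote, which is enough to recurse back from a model to every input it was derived from.

```python
from dataclasses import dataclass, field


@dataclass
class Execution:
    """Hypothetical execution record: one component run with config, inputs, outputs."""
    component: str
    config: dict
    inputs: list
    outputs: list


@dataclass
class MetadataStore:
    """Hypothetical in-memory metadata store (not the ML Metadata API)."""
    artifact_types: dict = field(default_factory=dict)  # artifact id -> type name
    executions: list = field(default_factory=list)      # Execution records (runs)

    def record(self, component, config, inputs, outputs):
        """Log one run; `outputs` maps newly produced artifact ids to their types."""
        self.artifact_types.update(outputs)
        self.executions.append(
            Execution(component, config, list(inputs), list(outputs)))

    def lineage(self, artifact_id):
        """Recurse back through executions to every upstream input of an artifact."""
        ancestors, frontier = set(), [artifact_id]
        while frontier:
            current = frontier.pop()
            for execution in self.executions:
                if current in execution.outputs:
                    for parent in execution.inputs:
                        if parent not in ancestors:
                            ancestors.add(parent)
                            frontier.append(parent)
        return ancestors


store = MetadataStore(artifact_types={"raw_csv": "ExternalData"})
store.record("CsvExampleGen", {}, ["raw_csv"], {"examples": "Examples"})
store.record("StatisticsGen", {}, ["examples"], {"stats": "Statistics"})
store.record("SchemaGen", {}, ["stats"], {"schema": "Schema"})
store.record("Transform", {"module_file": "taxi.py"}, ["examples", "schema"],
             {"transformed": "Examples", "transform_graph": "TransformGraph"})
store.record("Trainer", {"train_steps": 10000},
             ["transformed", "transform_graph", "schema"], {"model": "Model"})
print(sorted(store.lineage("model")))
# → ['examples', 'raw_csv', 'schema', 'stats', 'transform_graph', 'transformed']
```

The same execution records also answer the forward question ("which models were trained on this data?") by scanning inputs instead of outputs.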

Examples of Metadata-Powered Functionality
- List all training runs and their attributes
- Visualize the lineage of a specific model (starting from the model artifact that was created)
- Visualize the data a model was trained on
- Visualize sliced eval metrics associated with a model
- Launch TensorBoard for a specific model run
- Compare data statistics for multiple models
Use-cases enabled by lineage tracking
- Compare previous model runs
- Carry over state from previous models
- Re-use previously computed outputs
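Re-using previously computed outputs comes down to recognizing that a component already ran with identical inputs and configuration. A minimal sketch, assuming a hypothetical `CachingRunner` that fingerprints each run (component name, config, input URIs) and checks that fingerprint before invoking the executor; in the design described here, that lookup would be answered by the Metadata Store rather than an in-process dictionary.

```python
import hashlib
import json


def cache_key(component, config, input_uris):
    """Fingerprint a run from component name, config, and (order-insensitive) inputs."""
    payload = json.dumps(
        {"component": component, "config": config, "inputs": sorted(input_uris)},
        sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CachingRunner:
    """Hypothetical runner that re-uses outputs of identical earlier runs."""

    def __init__(self):
        self.previous_outputs = {}  # cache key -> output artifact URIs
        self.executions = 0         # how many times an executor actually ran

    def run(self, component, config, input_uris, executor):
        key = cache_key(component, config, input_uris)
        if key in self.previous_outputs:
            return self.previous_outputs[key]  # cache hit: skip the executor
        self.executions += 1
        outputs = executor(input_uris, config)
        self.previous_outputs[key] = outputs
        return outputs


def fake_statsgen(input_uris, config):
    """Hypothetical stand-in executor: 'writes' one statistics file per input."""
    return [uri + ".stats" for uri in input_uris]


runner = CachingRunner()
first = runner.run("StatisticsGen", {}, ["/data/train"], fake_statsgen)
second = runner.run("StatisticsGen", {}, ["/data/train"], fake_statsgen)  # re-used
third = runner.run("StatisticsGen", {}, ["/data/eval"], fake_statsgen)    # new input
print(first == second, runner.executions)  # True 2
```

Changing either the config or any input URI changes the fingerprint, so stale outputs are never re-used for a different run.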

How do we orchestrate TFX?
[Figure: a chain of components that share a Metadata Store; each component consists of a Driver, an Executor, and a Publisher.]
Executors do the work
[Figure: executors such as Transform sit between a Driver and a Publisher and run on Beam, which can execute on Spark or Dataflow.]

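# Imports (module paths assumed from early TFX releases).
import os

import apache_beam as beam
import tensorflow as tf
from tensorflow_data_validation.api import stats_api
from tensorflow_data_validation.coders import tf_example_decoder
from tensorflow_metadata.proto.v0 import statistics_pb2
from tfx.components.base import base_executor


class Executor(base_executor.BaseExecutor):
  """Generic TFX statsgen executor."""
  ...
  def Do(...) -> None:
    """Computes stats for each split of input using tensorflow_data_validation.

    ...
    """
    with beam.Pipeline(argv=self._get_beam_pipeline_args()) as p:
      for split, instance in split_to_instance.items():
        ...
        output_path = os.path.join(output_uri, _DEFAULT_FILE_NAME)
        # Per split: read TFRecords, decode tf.Examples, compute statistics
        # with TFDV, and write them to a single unsharded output file.
        _ = (
            p
            | 'ReadData.' + split >> beam.io.ReadFromTFRecord(file_pattern=input_uri)
            | 'DecodeData.' + split >> tf_example_decoder.DecodeTFExample()
            | 'GenerateStatistics.' + split >> stats_api.GenerateStatistics(stats_options)
            | 'WriteStatsOutput.' + split >> beam.io.WriteToTFRecord(
                output_path,
                shard_name_template='',
                coder=beam.coders.ProtoCoder(
                    statistics_pb2.DatasetFeatureStatisticsList)))
        tf.logging.info('Statistics written to {}.'.format(output_uri))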

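[Figure: the Trainer executor, between its Driver and Publisher and sharing the Metadata Store, runs TensorFlow.]

[Figure: Pusher and the other components follow the same Driver / Executor / Publisher pattern around the Metadata Store.]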
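The Driver / Executor / Publisher split can be sketched in a few lines of plain Python. This is an illustration, not TFX's actual driver or executor classes: `Store`, `Component`, and the URIs are hypothetical. The driver resolves input artifacts from the store, the executor does the work (where TFX would run Beam or TensorFlow), and the publisher records the outputs so downstream drivers can find them.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for the Metadata Store: artifact name -> URI."""
    artifacts: dict = field(default_factory=dict)


class Component:
    """Hypothetical component split into driver, executor and publisher phases."""

    def __init__(self, name, inputs, outputs, executor):
        self.name = name
        self.inputs = inputs      # artifact names this component reads
        self.outputs = outputs    # artifact names this component writes
        self.executor = executor  # callable that does the actual work

    def driver(self, store):
        """Resolve input artifacts from the store; fail if any is missing."""
        missing = [a for a in self.inputs if a not in store.artifacts]
        if missing:
            raise LookupError(f"{self.name}: missing inputs {missing}")
        return {a: store.artifacts[a] for a in self.inputs}

    def publisher(self, store, produced):
        """Record the executor's outputs so downstream drivers can resolve them."""
        for name in self.outputs:
            store.artifacts[name] = produced[name]

    def run(self, store):
        resolved = self.driver(store)
        produced = self.executor(resolved)
        self.publisher(store, produced)


store = Store({"examples": "/pipeline/examples"})
statsgen = Component(
    "StatisticsGen", ["examples"], ["statistics"],
    lambda inputs: {"statistics": inputs["examples"] + "/stats"})
statsgen.run(store)
print(store.artifacts["statistics"])  # /pipeline/examples/stats
```

Keeping the executor free of any store access is the design point: the same executor can run under any orchestrator, while drivers and publishers handle the metadata bookkeeping.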

[Figure: the TFX Config, the Metadata Store, and a chain of components, each with a Driver and a Publisher around its Executor.]

import os

# Assumes data_root, _taxi_module_file and _airflow_config are defined elsewhere,
# that csv_input, the component classes and AirflowDAGRunner are imported from
# TFX, and that create_pipeline is wrapped (as in early TFX examples) by a
# decorator that turns the returned component list into a pipeline.
# "..." marks arguments elided on the slide.
def create_pipeline():
  """Implements the chicago taxi pipeline with TFX."""
  examples = csv_input(os.path.join(data_root, 'simple'))
  example_gen = CsvExampleGen(input_base=examples)
  statistics_gen = StatisticsGen(input_data=...)
  infer_schema = SchemaGen(stats=...)
  validate_stats = ExampleValidator(stats=..., schema=...)
  # Performs transformations and feature engineering in training and serving
  transform = Transform(
      input_data=example_gen.outputs.examples,
      schema=infer_schema.outputs.output,
      module_file=_taxi_module_file)
  trainer = Trainer(...)
  model_analyzer = Evaluator(examples=..., model_exports=...)
  model_validator = ModelValidator(examples=..., model=...)
  pusher = Pusher(model_export=..., model_blessing=..., serving_model_dir=...)
  return [example_gen, statistics_gen, infer_schema, validate_stats, transform,
          trainer, model_analyzer, model_validator, pusher]


pipeline = AirflowDAGRunner(_airflow_config).run(create_pipeline())

… Back to orchestration

Bring your very own favorite orchestrator
[Figure: the same TFX Config, Metadata Store, and Driver / Executor / Publisher components run on an Airflow runtime, a Kubeflow runtime, or your own runtime.]
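Whatever the runtime (Airflow, Kubeflow, or your own), its core job is to run each component only after the components whose outputs it consumes. A minimal sketch of such a runtime, assuming Python 3.9+ for the standard-library `graphlib` module; the `DEPENDENCIES` map and `run_pipeline` helper are hypothetical, written by hand from the component wiring described in this deck, whereas a real runner derives the graph from the pipeline definition.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: component -> upstream components whose outputs it reads.
DEPENDENCIES = {
    "CsvExampleGen": set(),
    "StatisticsGen": {"CsvExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"CsvExampleGen", "SchemaGen"},
    "Trainer": {"Transform", "SchemaGen"},
    "Evaluator": {"CsvExampleGen", "Trainer"},
    "ModelValidator": {"CsvExampleGen", "Trainer"},
    "Pusher": {"Trainer", "ModelValidator"},
}


def run_pipeline(dependencies, run_component):
    """Hypothetical 'own runtime': run components in a dependency-respecting order."""
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        run_component(name)  # e.g. driver -> executor -> publisher
        order.append(name)
    return order


order = run_pipeline(DEPENDENCIES, lambda name: print("running", name))
```

This runs components sequentially; a production orchestrator would also launch independent components (for example ExampleValidator and Transform) in parallel and retry failures.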

Examples of orchestrated TFX pipelines
[Screenshots: Airflow; Kubeflow Pipelines]

Overview
- Data Ingestion
- Data Analysis & Validation
- Data Transformation

Component: ExampleGen
Inputs and Outputs: ExampleGen ingests Raw Data (CSV or TF Record) and emits Split TF Record Data (Training, Eval).

Configuration
examples = csv_input(os.path.join(data_root, 'simple'))
example_gen = CsvExampleGen(input_base=examples)
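The Training/Eval split should be stable across runs. One common way to achieve that is to hash each record and assign it to a split by bucket. A minimal sketch, assuming a 2:1 train/eval ratio and MD5 hashing from `hashlib`; the `split_examples` helper and the sample rows are hypothetical, and ExampleGen's own splitting logic and defaults may differ.

```python
import hashlib


def split_examples(records, train_buckets=2, eval_buckets=1):
    """Hypothetical helper: deterministically assign each record to 'train' or 'eval'."""
    total = train_buckets + eval_buckets
    splits = {"train": [], "eval": []}
    for record in records:
        # Same record -> same hash -> same split, on every run and every worker.
        digest = hashlib.md5(record.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % total
        splits["train" if bucket < train_buckets else "eval"].append(record)
    return splits


rows = [f"trip_{i},fare={i % 17}.5" for i in range(3000)]
splits = split_examples(rows)
print(len(splits["train"]), len(splits["eval"]))  # roughly 2000 and 1000
```

Because the assignment depends only on the record's content, re-running ingestion never moves an example from Eval into Training.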

Data Analysis & Validation

Component: StatisticsGen
Inputs and Outputs: StatisticsGen takes Data from ExampleGen and computes Statistics for
● Training
● Eval
● Serving logs (for skew detection)

Statistics
● Capture the shape of the data
● Visualization highlights unusual stats
● Overlay helps with comparison

Configuration
statistics_gen = StatisticsGen(input_data=example_gen.outputs.examples)
[Screenshot: statistics visualization]
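A toy version of what such statistics capture, in plain Python: counts, missing values, numeric ranges and means, and the most frequent string values. The real component computes much richer statistics with TensorFlow Data Validation on Beam; the `feature_statistics` helper, its output layout, and the taxi-like sample rows are hypothetical.

```python
import math
from collections import Counter


def is_missing(value):
    """Treat absent features and NaN floats as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def feature_statistics(examples):
    """Hypothetical helper: per-feature counts plus numeric range/mean or top values."""
    names = sorted({name for example in examples for name in example})
    stats = {}
    for name in names:
        values = [example.get(name) for example in examples]
        present = [v for v in values if not is_missing(v)]
        entry = {"count": len(present), "missing": len(values) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            entry.update(min=min(present), max=max(present),
                         mean=sum(present) / len(present))
        else:
            entry["top_values"] = Counter(present).most_common(3)
        stats[name] = entry
    return stats


trips = [
    {"trip_start_hour": 8, "fare": 12.0, "payment_type": "Cash"},
    {"trip_start_hour": 9, "fare": float("nan"), "payment_type": "Credit Card"},
    {"trip_start_hour": 17, "fare": 30.0, "payment_type": "Cash"},
    {"trip_start_hour": 7, "payment_type": "Cash"},
]
stats = feature_statistics(trips)
print(stats["fare"])
# → {'count': 2, 'missing': 2, 'min': 12.0, 'max': 30.0, 'mean': 21.0}
```

Computing the same summary over training data and over serving logs, and comparing the two, is what makes skew visible.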

Why are my tip predictions bad in the morning hours?

Component: SchemaGen
Inputs and Outputs: SchemaGen takes Statistics from StatisticsGen and produces a Schema
● High-level description of the data
○ Expected features
○ Expected value domains
○ Expected constraints
○ and much more!
● Codifies expectations of “good” data
● Initially inferred, then user-curated

Configuration
infer_schema = SchemaGen(stats=statistics_gen.outputs.output)
[Screenshot: schema visualization]
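To make "initially inferred, then user-curated" concrete, here is a minimal sketch that derives a schema from per-feature statistics (here a hand-written dictionary). The statistics layout, the `infer_schema` helper, and the schema fields are hypothetical simplifications of the TensorFlow Metadata `Schema` proto: each feature gets a type, whether it is required, and, for strings with few distinct values, an expected value domain.

```python
def infer_schema(stats, max_domain_size=10):
    """Hypothetical helper: infer type, presence, and string domain per feature."""
    schema = {}
    for name, entry in stats.items():
        # Required only if the feature was never missing in the observed data.
        feature = {"required": entry["missing"] == 0}
        if "mean" in entry:
            feature["type"] = "float"
        else:
            feature["type"] = "string"
            values = entry.get("distinct_values", [])
            if 0 < len(values) <= max_domain_size:
                feature["domain"] = sorted(values)
        schema[name] = feature
    return schema


stats = {
    "fare": {"count": 2, "missing": 2, "mean": 21.0},
    "payment_type": {"count": 4, "missing": 0,
                     "distinct_values": ["Credit Card", "Cash"]},
}
schema = infer_schema(stats)
print(schema["payment_type"])
# → {'required': True, 'type': 'string', 'domain': ['Cash', 'Credit Card']}
```

A user would then curate the result, for example by relaxing `required` for a legitimately sparse feature or adding a newly launched payment type to the domain.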

What are expected values for payment types?

Component: ExampleValidator
Inputs and Outputs: ExampleValidator takes Statistics from StatisticsGen and the Schema from SchemaGen, and produces an Anomalies Report
● Missing features
● Wrong feature valency
● Training/serving skew
● Data distribution drift
● ...

Configuration
validate_stats = ExampleValidator(
    stats=statistics_gen.outputs.output,
    schema=infer_schema.outputs.output)
[Screenshot: anomalies visualization]
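A few of these checks can be sketched directly against a schema. A minimal illustration, assuming a simplified, hypothetical schema dictionary (type, required flag, optional string domain) rather than TFDV's actual anomaly protos; the `find_anomalies` helper and the company names are illustrative. It flags missing required features, wrong types, values outside the expected domain, and features absent from the schema.

```python
def find_anomalies(examples, schema):
    """Hypothetical helper: return (example index, feature, problem) tuples."""
    anomalies = []
    for i, example in enumerate(examples):
        for name, feature in schema.items():
            value = example.get(name)
            if value is None:
                if feature.get("required"):
                    anomalies.append((i, name, "missing required feature"))
                continue
            if feature["type"] == "float" and not isinstance(value, (int, float)):
                anomalies.append(
                    (i, name, f"expected float, got {type(value).__name__}"))
            elif "domain" in feature and value not in feature["domain"]:
                # A typo or a new company? The report only says "out of domain".
                anomalies.append((i, name, f"unexpected value {value!r}"))
        for name in example:
            if name not in schema:
                anomalies.append((i, name, "feature not in schema"))
    return anomalies


schema = {
    "fare": {"type": "float", "required": True},
    "company": {"type": "string", "required": False,
                "domain": ["Flash Cab", "Taxi Affiliation Services"]},
}
serving_batch = [
    {"fare": 12.5, "company": "Flash Cab"},
    {"fare": "12.5", "company": "Flash Cabb"},
    {"company": "Taxi Affiliation Services", "tips": 2.0},
]
for anomaly in find_anomalies(serving_batch, schema):
    print(anomaly)
```

Deciding whether "Flash Cabb" is a typo or a genuinely new company is left to a human, who then either fixes the data or extends the domain in the schema.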

Is this new taxi company name a typo or a new company?

Using tf.Transform for feature transformations
[Figure: feature transformations in Training and in Serving.]

Component: Transform
Inputs and Outputs: Transform takes Data from ExampleGen, the Schema from SchemaGen, and Code, and produces a Transform Graph and Transformed Data for the Trainer
● User-provided transform code (TF Transform)
● Schema for parsing

Outputs
● Transform Graph
○ Applied at training time
○ Embedded in serving graph
● Transformed Data
○ Optional, for performance optimization

Configuration
transform = Transform(
    input_data=example_gen.outputs.examples,
    schema=infer_schema.outputs.output,
    module_file=taxi_module_file)
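Code
import tensorflow as tf
import tensorflow_transform as tft


def preprocessing_fn(inputs):
  """tf.Transform callback from the taxi module file ("..." marks elided code)."""
  outputs = {}
  # Scale each dense float feature to zero mean and unit variance.
  for key in _DENSE_FLOAT_FEATURE_KEYS:
    outputs[_transformed_name(key)] = tft.scale_to_z_score(
        _fill_in_missing(inputs[key]))
  # ...
  # Label: 0 if the fare is NaN, else whether the tip exceeds 20% of the fare.
  outputs[_transformed_name(_LABEL_KEY)] = tf.where(
      tf.is_nan(taxi_fare),
      tf.cast(tf.zeros_like(taxi_fare), tf.int64),
      tf.cast(
          tf.greater(tips, tf.multiply(taxi_fare, tf.constant(0.2))), tf.int64))
  # ...
  return outputs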
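The key idea behind the Transform Graph is that full-pass statistics (such as the mean and standard deviation that `scale_to_z_score` needs) are computed once over the training data and then frozen, so exactly the same function runs at training time and inside the serving graph. A plain-Python sketch of that analyze/transform split, plus the label rule from the module code; the function names are hypothetical, and using the population standard deviation is an assumption of this sketch.

```python
import math
from statistics import fmean, pstdev


def analyze(training_fares):
    """Hypothetical analyze phase: one full pass over training data, constants frozen."""
    return {"mean": fmean(training_fares), "std": pstdev(training_fares)}


def scale_to_z_score(fare, constants):
    """Hypothetical transform phase: reuse frozen constants, so training == serving."""
    return (fare - constants["mean"]) / constants["std"]


def big_tipper_label(fare, tips):
    """Label rule from the module code: 0 if fare is NaN, else tip > 20% of fare."""
    if math.isnan(fare):
        return 0
    return int(tips > 0.2 * fare)


constants = analyze([10.0, 20.0, 30.0])
print(scale_to_z_score(20.0, constants))  # 0.0 (the training mean)
print(big_tipper_label(10.0, 3.0), big_tipper_label(10.0, 1.0),
      big_tipper_label(float("nan"), 5.0))  # 1 0 0
```

At serving time only `scale_to_z_score` runs, with the constants computed during training; recomputing them on serving traffic would reintroduce training/serving skew.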

Component: Trainer
Inputs and Outputs: Trainer takes Data from Transform, the Schema from SchemaGen, the Transform Graph, and Code, and produces Model(s) for the Evaluator, Model Validator, and Pusher
● User-provided training code (TensorFlow)
● Optionally, transformed data

Highlight: SavedModel Format
[Figure: a SavedModel holds the train, eval, and inference graphs. TensorFlow Serving uses the inference graph through its SignatureDef; TensorFlow Model Analysis uses the eval graph through its SignatureDef plus eval metadata.]

Configuration
trainer = Trainer(
    module_file=taxi_module_file,
    transformed_examples=transform.outputs.transformed_examples,
    schema=infer_schema.outputs.output,
    transform_output=transform.outputs.transform_output,
    train_steps=10000,
    eval_steps=5000,
    warm_starting=True)
Code: Just TensorFlow :)

Component: Evaluator
Inputs and Outputs: Evaluator takes Data from ExampleGen and a Model from Trainer, and produces Evaluation Metrics
● Evaluation split of data
● Eval spec for slicing of metrics

Configuration
model_analyzer = Evaluator(
    examples=example_gen.outputs.examples,
    eval_spec=taxi_eval_spec,
    model_exports=trainer.outputs.output)
[Screenshot: sliced evaluation metrics]
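Sliced metrics are what answer questions like "Why are my tip predictions bad in the morning hours?". A minimal plain-Python sketch of slicing: accuracy is computed overall and separately for each value of one feature. The `sliced_accuracy` helper and the data are made up for illustration; TensorFlow Model Analysis computes such metrics at scale on Beam rather than like this.

```python
from collections import defaultdict


def sliced_accuracy(examples, slice_key):
    """Hypothetical helper: accuracy overall and per value of `slice_key`."""
    totals, correct = defaultdict(int), defaultdict(int)
    for example in examples:
        hit = int(example["prediction"] == example["label"])
        # Every example counts towards "Overall" and towards its own slice.
        for slice_name in ("Overall", f"{slice_key}={example[slice_key]}"):
            totals[slice_name] += 1
            correct[slice_name] += hit
    return {name: correct[name] / totals[name] for name in totals}


eval_examples = [
    {"trip_start_hour": 8, "label": 1, "prediction": 0},
    {"trip_start_hour": 8, "label": 0, "prediction": 1},
    {"trip_start_hour": 8, "label": 1, "prediction": 1},
    {"trip_start_hour": 20, "label": 1, "prediction": 1},
    {"trip_start_hour": 20, "label": 0, "prediction": 0},
]
metrics = sliced_accuracy(eval_examples, "trip_start_hour")
for name, accuracy in sorted(metrics.items()):
    print(f"{name}: {accuracy:.2f}")
```

Here the overall accuracy (0.60) hides that the 8 a.m. slice is much worse than the 8 p.m. slice, which is exactly what slicing is meant to expose.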

Component: ModelValidator
Inputs and Outputs: ModelValidator takes Data from ExampleGen and two Models from Trainer, and produces a Validation Outcome
● Evaluation split of data
● Last validated model
● New candidate model

Configuration
model_validator = ModelValidator(
    examples=example_gen.outputs.examples,
    model=trainer.outputs.output,
    eval_spec=taxi_mv_spec)
● Configuration options
○ Validate using current eval data
○ “Next-day eval”, validate using unseen data
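The core of model validation can be sketched as: score the last validated model and the new candidate on the same evaluation data, and bless the candidate only if it is not worse. A minimal illustration; the `validate_model` helper, the accuracy metric, the 0.01 tolerance, the toy models, and blessing the first model when no baseline exists are all assumptions of this sketch, not ModelValidator's documented behavior.

```python
def accuracy(model, examples):
    """Fraction of (features, label) pairs that the model predicts correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)


def validate_model(candidate, baseline, eval_examples, tolerance=0.01):
    """Hypothetical helper: return (blessed, candidate_accuracy, baseline_accuracy)."""
    candidate_acc = accuracy(candidate, eval_examples)
    if baseline is None:  # no last validated model yet: bless the first one
        return True, candidate_acc, None
    baseline_acc = accuracy(baseline, eval_examples)
    return candidate_acc >= baseline_acc - tolerance, candidate_acc, baseline_acc


# Hypothetical toy task: the label is 1 when the single feature exceeds 5.
eval_examples = [(x, int(x > 5)) for x in range(10)]


def last_validated(x):
    return int(x > 6)  # misclassifies x == 6, accuracy 0.9


def new_candidate(x):
    return int(x > 5)  # perfect on this data, accuracy 1.0


def always_zero(x):
    return 0  # wrong whenever the label is 1, accuracy 0.6


print(validate_model(new_candidate, last_validated, eval_examples))  # (True, 1.0, 0.9)
print(validate_model(always_zero, last_validated, eval_examples)[0])  # False
```

For "next-day eval", `eval_examples` would simply be fresh data that neither model has seen.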

Component: Pusher
Inputs and Outputs: Pusher takes the Validation Outcome from the Model Validator and pushes the model to one of several Deployment Options

Configuration
pusher = Pusher(
    model_export=trainer.outputs.output,
    model_blessing=model_validator.outputs.blessing,
    serving_model_dir=serving_model_dir)
● Block push on validation outcome
● Push destinations supported today
○ Filesystem (TensorFlow Lite, TensorFlow JS)
○ TensorFlow Serving
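Blocking the push on the validation outcome reduces to a guarded copy. A minimal filesystem sketch; the `push` helper, the file names, and the integer version subdirectory (the layout TensorFlow Serving watches for new model versions) are illustrative assumptions, not the Pusher's implementation.

```python
import shutil
import tempfile
from pathlib import Path


def push(model_dir, blessed, serving_model_dir, version):
    """Hypothetical helper: copy the model to serving_model_dir/<version> if blessed."""
    if not blessed:
        return None  # validation failed: block the push, serving keeps the old model
    destination = Path(serving_model_dir) / str(version)
    shutil.copytree(model_dir, destination)  # also creates missing parent dirs
    return destination


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "trainer_output"
    model_dir.mkdir()
    (model_dir / "saved_model.pb").write_bytes(b"")  # placeholder model file
    serving = Path(tmp) / "serving"
    print(push(model_dir, blessed=False, serving_model_dir=serving, version=1))  # None
    pushed = push(model_dir, blessed=True, serving_model_dir=serving, version=1)
    print(sorted(p.name for p in pushed.iterdir()))  # ['saved_model.pb']
```

Writing each push into a new version directory, rather than overwriting in place, lets the server switch to the new model atomically and roll back by version.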

TFX Data Parallel Processing
Apache Beam and Apache Spark

TFX + Apache Beam
[Figure: the TFX platform overview labeled with the library behind each stage: Data Ingestion; TensorFlow Data Validation; TensorFlow Transform; Estimator or Keras Model; TensorFlow Model Analysis; TensorFlow Serving; Logging. Shared infrastructure: utilities for garbage collection and data access controls; pipeline storage; tuner; a shared configuration framework and job orchestration; an integrated frontend for job management, monitoring, debugging, and data/model/evaluation visualization.]

Beam Vision
Provide a comprehensive portability framework
for data processing pipelines, one that allows you
to write your pipeline once in your language of
choice and run it with minimal effort on the
execution engine of choice.

Apache Beam
Sum Per Key, in each SDK:
- Python: input | Sum.PerKey()
- Java: input.apply(Sum.integersPerKey())
- Go: stats.Sum(s, input)
- SQL: SELECT key, SUM(value) FROM input GROUP BY key
Runners: Cloud Dataflow, Apache Spark, Apache Flink, Apache Apex, Gearpump, Apache Samza, Apache Nemo (incubating), IBM Streams
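All four snippets express the same computation. To pin down its semantics independently of any SDK or runner, here is a plain-Python sketch of what Sum Per Key produces; the `sum_per_key` helper is hypothetical and runs on a single machine, whereas a Beam runner distributes the grouping and can combine partial sums before the shuffle.

```python
from collections import defaultdict


def sum_per_key(pairs):
    """Hypothetical helper: group (key, value) pairs by key and sum each group."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


print(sum_per_key([("a", 1), ("b", 2), ("a", 3)]))  # {'a': 4, 'b': 2}
```

In the Beam Python SDK the same result is commonly written as `beam.CombinePerKey(sum)`.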

How does Beam (Java) map to Spark?
Beam Java: Already runs on Spark!

How does Beam (Python) map to Spark?
Beam Portability (Python, …)
• Active work in progress!
• Several PRs are already in!
– Supports: Impulse, ParDo, GroupByKey,
Combine, Flatten, PAssert, Metrics, Side
inputs, ...
– Missing: State/Timers, SDF (Splittable DoFn), ...

How does Beam map to Spark?
Call to action!
• Help with code, reviews, testing.
• Tracking JIRA(s)
– BEAM-2891
– BEAM-2590

Get started with TensorFlow Extended (TFX)
An End-to-End ML Platform
github.com/tensorflow/tfx
tensorflow.org/tfx
