Flock: Data Science Platform @ CISL

Applied research group
Collaborating with Azure Data product group
Open-sourcing our code
Apache Hadoop, REEF, Heron, MLflow

Our labs by numbers
637 Patents
GAed or Public Preview features just this year
0.5M LoC in OSS
130+ Publications in top-tier conferences/journals
1.1M LoC in products
600k Servers running our code in Azure/Cosmos

Systems considered thus far
Compared across: Cloud Providers, Private Services, OSS

Training: Experiment Tracking, Managed Notebooks, Pipelines / Projects, Multi-Framework, Proprietary Algos, Distributed Training, Auto ML
Serving: Batch Prediction, On-prem Deployment, Model Monitoring, Model Validation
Data Management: Data Provenance, Data Testing, Feature Store, Featurization DSL, Labelling, In-DB ML
Rating scale: Good Support, OK Support, No Support, Unknown

Let Data Scientists do Data Science!

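Figure: Flock end-to-end workflow. Offline (data-driven development): application telemetry and tracking feed LightGBM model training; the trained model is transformed into a neural network (NN), converted to ONNX, and optimized (ONNX’). Online (solution deployment): the optimized ONNX’ model is deployed as a pyfunc together with policies (Dhalion); given a job ID and its job telemetry, the system closes or updates incidents.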
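The "model transform" step turns the trained LightGBM trees into a neural network (in the demo code this happens inside the Flock helpers `get_tree_parameters` and `LightGBMBinaryClassifier_Batched`, whose internals are not shown). One well-known way to express a decision tree as pure tensor operations is the GEMM formulation used by Microsoft's Hummingbird. The sketch below assumes that formulation; it is not necessarily what Flock's helpers do, and the toy tree and its leaf values are hypothetical. Splits send a sample left when `x[feature] <= threshold`, matching LightGBM's default `<=` decision type.

```python
import numpy as np

# Hypothetical toy tree. Internal nodes: id -> (feature, threshold, left, right).
INTERNAL = {0: (0, 0.5, 1, 2), 1: (1, 2.0, 3, 4), 2: (1, -1.0, 5, 6)}
LEAVES = {3: 0.1, 4: 0.2, 5: 0.3, 6: 0.4}  # leaf id -> leaf value


def traverse(x, internal, leaves, root=0):
    """Reference prediction: walk the tree for a single sample."""
    node = root
    while node not in leaves:
        feature, threshold, left, right = internal[node]
        node = left if x[feature] <= threshold else right
    return leaves[node]


def tree_to_gemm(internal, leaves, n_features, root=0):
    """Build the A, B, C, D, E tensors of the GEMM formulation."""
    inner_ids, leaf_ids = sorted(internal), sorted(leaves)
    col = {n: i for i, n in enumerate(inner_ids)}
    lcol = {n: j for j, n in enumerate(leaf_ids)}
    A = np.zeros((n_features, len(inner_ids)))     # feature selected by each node
    B = np.zeros(len(inner_ids))                   # threshold of each node
    C = np.zeros((len(inner_ids), len(leaf_ids)))  # +1 left / -1 right on a leaf's path
    for n, (feature, threshold, _, _) in internal.items():
        A[feature, col[n]] = 1.0
        B[col[n]] = threshold
    stack = [(root, [])]
    while stack:  # enumerate every root-to-leaf path
        node, path = stack.pop()
        if node in leaves:
            for inner, sign in path:
                C[col[inner], lcol[node]] = sign
            continue
        _, _, left, right = internal[node]
        stack.append((left, path + [(node, 1.0)]))
        stack.append((right, path + [(node, -1.0)]))
    D = (C == 1.0).sum(axis=0)  # number of left turns on each leaf's path
    E = np.array([leaves[n] for n in leaf_ids])
    return A, B, C, D, E


def gemm_predict(X, A, B, C, D, E):
    """Batched prediction using only matrix products and comparisons."""
    T = (X @ A <= B).astype(float)    # 1 where a sample goes left at a node
    hit = (T @ C == D).astype(float)  # one-hot over leaves: the leaf reached
    return hit @ E


def predict_proba(X, gemm_trees):
    """Binary-classifier output: sum the tree scores, then apply a sigmoid."""
    raw = sum(gemm_predict(X, *t) for t in gemm_trees)
    return 1.0 / (1.0 + np.exp(-raw))
```

Usage: `params = tree_to_gemm(INTERNAL, LEAVES, n_features=2)` followed by `gemm_predict(X, *params)` on an `(n_samples, 2)` array. A leaf column matches `D` only when every left turn on its path evaluates to 1 and every right turn to 0, so exactly one leaf is selected per row, and the whole batch runs as dense matrix math that exports cleanly to ONNX.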

DEMO Python code
import pandas as pd
import lightgbm as lgb
from sklearn import metrics
data_train = pd.read_csv("global_train_x_label_with_mapping.csv")
data_test = pd.read_csv("global_test_x_label_with_mapping.csv")
train_x = data_train.iloc[:,:-1].values
train_y = data_train.iloc[:,-1].values
test_x = data_test.iloc[:,:-1].values
test_y = data_test.iloc[:,-1].values
n_leaves = 8
n_trees = 100
clf = lgb.LGBMClassifier(num_leaves=n_leaves, n_estimators=n_trees)
clf.fit(train_x, train_y)
score = metrics.precision_score(test_y, clf.predict(test_x), average='macro')
print("Precision Score on Test Data: " + str(score))
import mlflow
import mlflow.onnx
import mlflow.pyfunc
import mlflow.sklearn
import multiprocessing
import torch
import onnx
# onnx.optimizer was removed in ONNX 1.9; newer versions ship it as the
# separate onnxoptimizer package (import onnxoptimizer as optimizer).
from onnx import optimizer
from functools import partial
from flock import get_tree_parameters, LightGBMBinaryClassifier_Batched
import pandas as pd
import lightgbm as lgb
from sklearn import metrics
data_train = pd.read_csv('global_train_x_label_with_mapping.csv')
data_test = pd.read_csv('global_test_x_label_with_mapping.csv')
train_x = data_train.iloc[:, :-1].values
train_y = data_train.iloc[:, -1].values
test_x = data_test.iloc[:, :-1].values
test_y = data_test.iloc[:, -1].values
n_leaves = 8
n_trees = 100
clf = lgb.LGBMClassifier(num_leaves=n_leaves, n_estimators=n_trees)
mlflow.log_param('clf_init_n_estimators', n_trees)
mlflow.log_param('clf_init_num_leaves', n_leaves)
clf.fit(train_x, train_y)
mlflow.sklearn.log_model(clf, 'clf_model')
score = metrics.precision_score(test_y, clf.predict(test_x), average='macro')
mlflow.log_param('precision_score_average', 'macro')
mlflow.log_metric('score', score)
print('Precision Score on Test Data: ' + str(score))
activation = 'sigmoid'
torch.set_num_threads(1)
device = torch.device('cpu')
model_name = 'griffon'
# Dump the trained LightGBM ensemble and convert each tree into NN parameters.
model = clf.booster_.dump_model()
n_features = clf.n_features_
tree_infos = model['tree_info']
with multiprocessing.Pool(8) as pool:
    parameters = pool.map(partial(get_tree_parameters, n_features=n_features), tree_infos)
lgb_nn = LightGBMBinaryClassifier_Batched(parameters, n_features, activation).to(device)
# Export the NN to ONNX, falling back to ATen ops where no ONNX operator exists.
torch.onnx.export(lgb_nn, torch.randn(1, n_features).to(device), model_name + '_nn.onnx',
                  export_params=True,
                  operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK)
# Graph rewrite passes applied by the ONNX optimizer.
passes = ['eliminate_deadend', 'eliminate_identity',
          'eliminate_nop_monotone_argmax', 'eliminate_nop_transpose',
          'eliminate_unused_initializer', 'extract_constant_to_initializer',
          'fuse_consecutive_concats', 'fuse_consecutive_reduce_unsqueeze',
          'fuse_consecutive_squeezes', 'fuse_consecutive_transposes',
          'fuse_matmul_add_bias_into_gemm', 'fuse_transpose_into_gemm',
          'lift_lexical_references']
model = onnx.load(model_name + '_nn.onnx')
opt_model = optimizer.optimize(model, passes)
mlflow.onnx.log_model(opt_model, 'opt_model')
# Reload the logged ONNX model as a generic pyfunc and score one test row.
run_id = mlflow.active_run().info.run_id
pyfunc_loaded = mlflow.pyfunc.load_model('runs:/{}/opt_model'.format(run_id))
scoring = pyfunc_loaded.predict(pd.DataFrame(test_x[:1].astype('float32'))).values
print('Scoring through mlflow pyfunc: ', scoring)
mlflow.log_param('pyfunc_scoring', scoring[0][0])
User code (first listing) → Flock → Instrumented code (second listing)
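The instrumented listing differs from the user code mainly by added MLflow calls such as `mlflow.log_param('clf_init_num_leaves', n_leaves)`. The slides do not show how Flock produces them; a minimal sketch, assuming an AST rewrite in Python's standard `ast` module, is below. The class and function names are hypothetical, and the `<var>_init_<kwarg>` key naming simply mirrors the keys in the instrumented listing.

```python
import ast
import copy


class LogConstructorKwargs(ast.NodeTransformer):
    """After `var = f(k=v, ...)`, insert `mlflow.log_param('var_init_k', v)`."""

    def visit_Assign(self, node):
        call = node.value
        if not (len(node.targets) == 1 and isinstance(node.targets[0], ast.Name)
                and isinstance(call, ast.Call)):
            return node
        var = node.targets[0].id
        logs = []
        for kw in call.keywords:
            if kw.arg is None:  # skip **kwargs unpacking
                continue
            log_call = ast.Call(
                func=ast.Attribute(value=ast.Name(id="mlflow", ctx=ast.Load()),
                                   attr="log_param", ctx=ast.Load()),
                # The value expression is evaluated a second time, so this
                # assumes it is side-effect free (a name or constant, as in the demo).
                args=[ast.Constant(value=f"{var}_init_{kw.arg}"), copy.deepcopy(kw.value)],
                keywords=[],
            )
            logs.append(ast.Expr(value=log_call))
        return [node, *logs]


def instrument(source):
    """Return `source` with `import mlflow` prepended and parameter logging added."""
    tree = LogConstructorKwargs().visit(ast.parse(source))
    tree.body.insert(0, ast.Import(names=[ast.alias(name="mlflow", asname=None)]))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+
```

Usage: `instrument(open("train.py").read())` returns the rewritten source as a string. Working on the AST rather than on text keeps the rewrite robust to formatting, which is also why the instrumented listing is reformatted relative to the user code.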

Current OnCall Workflow
A job goes out of SLA and Support is alerted.
A support engineer (SE) spends hours of manual labor looking through hundreds of metrics.
After 5-6 hours of investigation, the reason for the job slowdown is found.

Revised OnCall Workflow with Griffon
A job goes out of SLA and the SE is alerted.
The Job ID is fed through Griffon, and the top reasons for the job slowdown are generated automatically.
Either the reason is found in the top five generated by Griffon, or all the metrics Griffon has looked at can be ruled out and the SE can direct their efforts to a smaller set of metrics.
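The slides do not describe the model Griffon uses to rank reasons. To illustrate only the interface of the revised workflow (job metrics in, top-five candidate causes out), here is a sketch that substitutes a plain z-score heuristic: each metric of the slow job is compared with its values in earlier healthy runs. This is a stand-in technique, not Griffon's method, and the metric names in the usage example are hypothetical.

```python
import statistics


def top_reasons(job_metrics, history, k=5):
    """Rank a slow job's metrics by how far they deviate from past runs.

    job_metrics: metric name -> value for the job that missed its SLA.
    history: metric name -> list of values from earlier, healthy runs.
    Returns up to k metric names, most anomalous first.
    """
    scores = {}
    for name, value in job_metrics.items():
        past = history.get(name, [])
        if len(past) < 2:
            continue  # not enough history to estimate a spread
        sigma = statistics.stdev(past)
        if sigma == 0:
            continue  # constant history: z-score undefined, skipped in this sketch
        scores[name] = abs(value - statistics.mean(past)) / sigma
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, with a history where `queue_wait_sec` normally sits around 5 and the slow job shows 60, `top_reasons(job, history)` puts `queue_wait_sec` first; metrics near their usual values land at the bottom, which is the part of the search space the SE can deprioritize.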

ONNX: Interoperability across ML frameworks
Open format to represent ML models
Backed by Microsoft, Amazon, Facebook, and several hardware vendors

Train a model using a popular framework such as TensorFlow
Convert the model to ONNX format
Perform inference efficiently across multiple platforms and hardware using ONNX Runtime

ONNX Runtime and optimizations
Key design points:
Graph IR
Support for multiple backends (e.g., CPU, GPU, FPGA)
Graph optimizations
Rule-based optimizer inspired by DB optimizers
Improved inference time and memory consumption
Examples: 117 ms → 34 ms (about 3.4x faster); 250 MB → 200 MB (20% less memory)
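To make "rule-based optimizer" concrete, here is a toy rewriter over a hypothetical minimal graph IR (not ONNX Runtime's actual data structures). It implements two rules modeled on passes listed in the demo code, `eliminate_identity` and `fuse_matmul_add_bias_into_gemm`, and applies them until no rule fires, much like the rewrite phase of a DB query optimizer. Real passes also check shapes (ONNX `Gemm` expects 2-D inputs) and protect graph outputs; this sketch assumes intermediate tensors are not graph outputs and skips shape checks.

```python
from dataclasses import dataclass


@dataclass
class Node:
    op: str        # operator type, e.g. "MatMul"
    inputs: list   # names of input tensors
    outputs: list  # names of output tensors


def eliminate_identity(nodes):
    """Drop an Identity node by renaming its input tensor to its output name."""
    produced = {o for n in nodes for o in n.outputs}
    for n in nodes:
        if n.op == "Identity" and n.inputs[0] in produced:
            src, dst = n.inputs[0], n.outputs[0]
            rest = [m for m in nodes if m is not n]
            for m in rest:  # rewire the producer and other consumers (in place)
                m.inputs = [dst if t == src else t for t in m.inputs]
                m.outputs = [dst if t == src else t for t in m.outputs]
            return rest, True
    return nodes, False


def fuse_matmul_add(nodes):
    """Rewrite MatMul(X, W) -> t, Add(t, b) -> y into Gemm(X, W, b) -> y."""
    for mm in nodes:
        if mm.op != "MatMul":
            continue
        t = mm.outputs[0]
        consumers = [m for m in nodes if t in m.inputs]
        if len(consumers) != 1 or consumers[0].op != "Add":
            continue  # t is shared or not followed by Add: fusing would be unsafe
        add = consumers[0]
        bias = [x for x in add.inputs if x != t]
        if len(bias) != 1:
            continue
        gemm = Node("Gemm", mm.inputs + bias, list(add.outputs))
        return [gemm if m is mm else m for m in nodes if m is not add], True
    return nodes, False


def optimize(nodes, rules=(eliminate_identity, fuse_matmul_add)):
    """Apply rewrite rules until a fixpoint; each rule removes at least one node."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            nodes, fired = rule(nodes)
            changed = changed or fired
    return nodes
```

Because every rule strictly shrinks the graph, the fixpoint loop always terminates; the fused `Gemm` replaces two kernel launches and one intermediate buffer, which is the kind of saving behind the latency and memory numbers above.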

ONNX Runtime in production
~40 ONNX models in production
>10 orgs are migrating their models to ONNX Runtime
Average speedup: 2.7x

ONNX Runtime in production
Office – Grammar Checking Model
14.6x reduction in latency

Deploy the server:
mlflow models serve -m /artifacts/model -p 1234
Perform inference:
curl -X POST -H "Content-Type: application/json; format=pandas-split" \
  --data '{"columns":["alcohol","chlorides","citric acid","density","fixed acidity","free sulfur dioxide","pH","residual sugar","sulphates","total sulfur dioxide","volatile acidity"],"data":[[12.8,0.029,0.48,0.98,6.2,29,3.33,1.2,0.39,75,0.66]]}' \
  http://127.0.0.1:1234/invocations
[6.379428821398614]
ONNX Runtime is automatically invoked.
