SlideShare a Scribd company logo
Apache HBase at Airbnb
JINGWEI LU, LIYIN TANG, AND JASON ZHANG
1
Data Infrastructure at Airbnb
Event
Logs
MySQL
Dumps
Gold Cluster
HDFS
Hive
Kafk
a
Sqoo
p
Silver Cluster Spark Cluster
Spark
ReAi
r
Airflow Scheduling
S3
Presto Cluster
AirPal
Caravel
Tableau
Batch Infrastructure
Yarn HDFS
Hive
Yarn
Jingwei Lu, Liyin Tang, Jason Zhang
3
Streaming at Airbnb
Event
Logging
MySQL
BINLOG
Cluster
HDFS
Hive
Spinal tap
Presto Cluster
Yarn
Kafk
a
HBase
Spark Streaming
Datadog
Druid
Kafka
Jingwei Lu, Liyin Tang, Jason Zhang
4
Growing Pain
Stateless
Jingwei Lu, Liyin Tang, Jason Zhang
Computation SinkSource
DStream DF DF
Stateful
Jingwei Lu, Liyin Tang, Jason Zhang
Computatio
n
Source
DStream DF DF
Sink1
Sink2
Sink N
State Storage
RDD
Multiple Streams
Jingwei Lu, Liyin Tang, Jason Zhang
DataFrame
Sink1
Process
A
Sink2
Sink3
SinkN
…
DataFrame
Sink1
Process
N
Sink2
Sink3
SinkN
…
Source
DStream
Align by Time
DataFram
e
DataFram
e
State
Storage
Source
DStream
…
Streaming + Batch
Jingwei Lu, Liyin Tang, Jason Zhang
DataFrame
Sink1
Process
A
Sink2
Sink3
SinkN
…
DataFram
e
State
Storage
Sourc
e
DStream
Sourc
e
…
Align by Time
…
DataFrame
Sink1
Process
A
Sink2
Sink3
SinkN
…
Simplify and Unify
AirStream Architecture
Jingwei Lu, Liyin Tang, Jason Zhang
Sources
Stream #1 Stream #N
Hive Tables
HBase
Tables
Virtual Table Views for Computation
Sinks
…
Customized ComputationSpark SQL
Simple Config
HBase Services
Streaming
Sources
Druid
AirStream Architecture
Jingwei Lu, Liyin Tang, Jason Zhang
Sources
Stream #1 Stream #N
Hive Tables
HBase
Tables
Virtual Table Views for Computation
Sinks
…
Customized ComputationSpark SQL
HBase Services
Streaming
Sources
Druid
Same Computation
for Batch
processing
Stateful
Jingwei Lu, Liyin Tang, Jason Zhang
State Store
• Merge changes
• Provide fast lookup
• Fast persistent storage across
streaming and batch jobs
14
Why HBase
Jingwei Lu, Liyin Tang, Jason Zhang
Rich Functionalities
Rich Integration with Hadoop EcoSystem
Easy Management
Strong Community
Reliable and Scalable
HBase State Store
Operators in Airstream
Jingwei Lu, Liyin Tang, Jason Zhang
16
Full Table Scan
Simple Aggregation
Bulk Upload
Key/Prefix Lookup
Update
Jingwei Lu, Liyin Tang, Jason Zhang
Computation DAG
17
Input Data
Left Outer Join Result
Key Lookup
Jingwei Lu, Liyin Tang, Jason Zhang
Key Space Design
• Hash partition key space
for load balance
• Composite key for K -> V
• Support full key lookup
• Prefix lookup supported for
all keys used in hash
function
Hash key1 key2 key3
Hash based on key prefix
Hash key1 key2
Lookup based on key prefix
key1 = ‘value1’ and key2 = ‘value2’
18
• Partition based on key before write
• Use bulk upload for large volume update
Write Performance
Jingwei Lu, Liyin Tang, Jason Zhang
19
Case Study
Jingwei Lu, Liyin Tang, Jason Zhang
Experiment realtime feedback
Update
Experiment Assignment
Event
Lookup
HBase
with TTL
Booking Event
Druid
Datado
g
20
one airstream job
Realtime Data Ingestion
Realtime Ingestion on HBase
Data Infrastructure
MySQL
Analytica
l Events
Kafka
Spark
Streamin
g
HBase
HDFS
Presto/Hive/Spar
k
Source
Inges
t
Realtime
Query
Snapsh
ot
Batch
Query
Jingwei Lu, Liyin Tang, Jason Zhang
22
Access Data in HBase
Jingwei Lu, Liyin Tang, Jason Zhang
HBase
Hive Presto
Spark
SQL
Spark
Streaming
Batch Jobs Interactive Query Streaming
HDFS
Snapshot
Table Mapping/Unifed View on realtime data
23
Snapshot & Reseed
Jingwei Lu, Liyin Tang, Jason Zhang
HBase HDFS
Snapshot(HFile Links)
Bulk Upload
24
Case Study 1: Events Ingestion
Jingwei Lu, Liyin Tang, Jason Zhang
Kafka
topic
…
topic
topic
Spark
Executor
1
…
Executor
2
Executor
K
HBase
DeDup
HDFS
Region1…Region2R
egion M
Daily
Snapshot
Realtime
Query
Hive
Presto
Events
Partition
25
Case Study 2: Streaming DB Export
KafkaRDS
Table1
…
Spinalta
p.
Table1
…
Table2
TableN
Spinaltap.
Table2
Spinaltap.
TableN
Spark
Executor
1
…
Executor2
Executor K
HBase
Region1
…
Region2
Region M
HDFS
Region1…Region2R
egion M
Daily Snapshot
Realtime Query
Jingwei Lu, Liyin Tang, Jason Zhang
26
Case Study: Streaming DB Export
Rows CF: Colums Version Value
<ShardKey><DB_TABLE_#1><PK_a=A> id Fri May 19 00:33:19 2016 101
<ShardKey><DB_TABLE_#1><PK_a=A> city Fri May 19 00:33:19 2016 San Francisco
<ShardKey><DB_TABLE_#1><PK_a=A> city Fri May 10 00:34:19 2016 New York
<ShardKey><DB_TABLE_#2><PK_a=A’> id Fri May 19 00:33:19 2016 1
Jingwei Lu, Liyin Tang, Jason Zhang
27
Case Study: Streaming DB Export
TXN 1
Commit_TS:
101
…
TXN 2
Commit_TS:
102
TXN 3
Commit_TS:
103
TXN N
Commit_TS:
N’
Binlog Order
Jingwei Lu, Liyin Tang, Jason Zhang
28
Case Study: Streaming DB Export
TXN 1
Commit_TS:
101
…
TXN 2
Commit_TS:
103
TXN 3
Commit_TS:
102
TXN N
Commit_TS:
N’
NTP
Binlog Order
Jingwei Lu, Liyin Tang, Jason Zhang
29
Case Study: Streaming DB Export
TXN 1
Commit_TS:
101
…
Binlog Order
TXN 2
Commit_TS:
103
TXN 3
Commit_TS:
102
TXN N
Commit_TS:
N’
Point-in-Time Restore on TS 102
Jingwei Lu, Liyin Tang, Jason Zhang
30
Case Study: Streaming DB Export
Rows CF: Colums Version Value
<ShardKey><DB_TABLE_#1><PK_a=A> id bin100 101
<ShardKey><DB_TABLE_#1><PK_a=A> city bin101 San Francisco
<ShardKey><DB_TABLE_#1><PK_a=A> city bin102 New York
<ShardKey><DB_TABLE_#2><PK_a=A’> id bin100 1
Jingwei Lu, Liyin Tang, Jason Zhang
31
Case Study: Streaming DB Export
Rows Version (Logical Offset) Value
<ShardKey><DB_TABLE_#1><2016-05-23 23><100> 100 mysql-bin.00000:100
<ShardKey><DB_TABLE_#1><2016-05-23 23><101> 101 mysql-bin.00000:101
<ShardKey><DB_TABLE_#1><2016-05-23 23><103> 103 mysql-bin.00000:103
<ShardKey><DB_TABLE_#1><2016-05-24 00><102> 102 mysql-bin.00000:102
Jingwei Lu, Liyin Tang, Jason Zhang
32
Case Study: Streaming DB Export
Rows Version (Logical Offset) Value
<ShardKey><DB_TABLE_#1><2016-05-23 23><100> 100 mysql-bin.00000:100
<ShardKey><DB_TABLE_#1><2016-05-23 23><101> 101 mysql-bin.00000:101
<ShardKey><DB_TABLE_#1><2016-05-23 23><103> 103 mysql-bin.00000:103
<ShardKey><DB_TABLE_#1><2016-05-24 00><102> 102 mysql-bin.00000:102
Jingwei Lu, Liyin Tang, Jason Zhang
33
Summary
Jingwei Lu, Liyin Tang, Jason Zhang
Scalable and Reliable
Rich Stateful Computation
Rich Integration with Hadoop EcoSystem
Easy Operation
35

More Related Content

What's hot (20)

Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
Benjamin Leonhardi
 
HDFS Erasure Coding in Action
HDFS Erasure Coding in Action HDFS Erasure Coding in Action
HDFS Erasure Coding in Action
DataWorks Summit/Hadoop Summit
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Cloudera, Inc.
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


Cloudera, Inc.
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
 
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data TransportApache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
Wes McKinney
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
Vinoth Chandar
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
DataWorks Summit
 
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta LakeNear Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Databricks
 
Redis persistence in practice
Redis persistence in practiceRedis persistence in practice
Redis persistence in practice
Eugene Fidelin
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
Alexey Grishchenko
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
Flink Forward
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
强 王
 
Log Structured Merge Tree
Log Structured Merge TreeLog Structured Merge Tree
Log Structured Merge Tree
University of California, Santa Cruz
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Databricks
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kai Wähner
 
InnoDB Internal
InnoDB InternalInnoDB Internal
InnoDB Internal
mysqlops
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
Isheeta Sanghi
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Cloudera, Inc.
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


Cloudera, Inc.
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Spark Summit
 
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data TransportApache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
Wes McKinney
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
Vinoth Chandar
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta LakeNear Real-Time Data Warehousing with Apache Spark and Delta Lake
Near Real-Time Data Warehousing with Apache Spark and Delta Lake
Databricks
 
Redis persistence in practice
Redis persistence in practiceRedis persistence in practice
Redis persistence in practice
Eugene Fidelin
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
Flink Forward
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
强 王
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Databricks
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kai Wähner
 
InnoDB Internal
InnoDB InternalInnoDB Internal
InnoDB Internal
mysqlops
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
Isheeta Sanghi
 

Similar to Apache HBase at Airbnb (20)

Airstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbAirstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At Airbnb
Jen Aman
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Databricks
 
HBaseCon2017 Data Product at AirBnB
HBaseCon2017 Data Product at AirBnBHBaseCon2017 Data Product at AirBnB
HBaseCon2017 Data Product at AirBnB
HBaseCon
 
Riding the Elephant - Hadoop 2.0
Riding the Elephant - Hadoop 2.0Riding the Elephant - Hadoop 2.0
Riding the Elephant - Hadoop 2.0
Simon Elliston Ball
 
Building Stream Processing as a Service
Building Stream Processing as a ServiceBuilding Stream Processing as a Service
Building Stream Processing as a Service
Steven Wu
 
2014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part12014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part1
Adam Muise
 
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the JobAkka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
Lightbend
 
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
Flink Forward
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
Paolo Castagna
 
The Rise of Streaming SQL
The Rise of Streaming SQLThe Rise of Streaming SQL
The Rise of Streaming SQL
Sriskandarajah Suhothayan
 
[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL
WSO2
 
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
confluent
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)
Ferran Galí Reniu
 
Data platform evolution
Data platform evolutionData platform evolution
Data platform evolution
Lev Brailovskiy
 
Data pipeline with kafka
Data pipeline with kafkaData pipeline with kafka
Data pipeline with kafka
Mole Wong
 
Down the event-driven road: Experiences of integrating streaming into analyti...
Down the event-driven road: Experiences of integrating streaming into analyti...Down the event-driven road: Experiences of integrating streaming into analyti...
Down the event-driven road: Experiences of integrating streaming into analyti...
inovex GmbH
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Flink Forward
 
SQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQSQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQ
pivotalny
 
Extending Analytic Reach
Extending Analytic ReachExtending Analytic Reach
Extending Analytic Reach
Agilisium Consulting
 
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike LimcacoExtending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Data Con LA
 
Airstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbAirstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At Airbnb
Jen Aman
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Databricks
 
HBaseCon2017 Data Product at AirBnB
HBaseCon2017 Data Product at AirBnBHBaseCon2017 Data Product at AirBnB
HBaseCon2017 Data Product at AirBnB
HBaseCon
 
Riding the Elephant - Hadoop 2.0
Riding the Elephant - Hadoop 2.0Riding the Elephant - Hadoop 2.0
Riding the Elephant - Hadoop 2.0
Simon Elliston Ball
 
Building Stream Processing as a Service
Building Stream Processing as a ServiceBuilding Stream Processing as a Service
Building Stream Processing as a Service
Steven Wu
 
2014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part12014 sept 26_thug_lambda_part1
2014 sept 26_thug_lambda_part1
Adam Muise
 
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the JobAkka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
Lightbend
 
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
Flink Forward
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
Paolo Castagna
 
[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL
WSO2
 
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
confluent
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)
Ferran Galí Reniu
 
Data pipeline with kafka
Data pipeline with kafkaData pipeline with kafka
Data pipeline with kafka
Mole Wong
 
Down the event-driven road: Experiences of integrating streaming into analyti...
Down the event-driven road: Experiences of integrating streaming into analyti...Down the event-driven road: Experiences of integrating streaming into analyti...
Down the event-driven road: Experiences of integrating streaming into analyti...
inovex GmbH
 
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Virtual Flink Forward 2020: Netflix Data Mesh: Composable Data Processing - J...
Flink Forward
 
SQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQSQL and Machine Learning on Hadoop using HAWQ
SQL and Machine Learning on Hadoop using HAWQ
pivotalny
 
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike LimcacoExtending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Extending Analytic Reach - From The Warehouse to The Data Lake by Mike Limcaco
Data Con LA
 

More from HBaseCon (20)

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
HBaseCon
 
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beam
HBaseCon
 
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
HBaseCon
 
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
HBaseCon
 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
HBaseCon
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
HBaseCon
 
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台
HBaseCon
 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
HBaseCon
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
HBaseCon
 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
HBaseCon
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
HBaseCon
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
HBaseCon
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
HBaseCon
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
HBaseCon
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
HBaseCon
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
HBaseCon
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon
 
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
HBaseCon
 
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beam
HBaseCon
 
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
HBaseCon
 
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
HBaseCon
 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
HBaseCon
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
HBaseCon
 
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台
HBaseCon
 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
HBaseCon
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
HBaseCon
 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
HBaseCon
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
HBaseCon
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
HBaseCon
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
HBaseCon
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
HBaseCon
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
HBaseCon
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
HBaseCon
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon
 

Recently uploaded (20)

How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Auto Data Preparation in IBM SPSS Modeler.pptx
Auto Data Preparation in IBM SPSS Modeler.pptxAuto Data Preparation in IBM SPSS Modeler.pptx
Auto Data Preparation in IBM SPSS Modeler.pptx
Version 1 Analytics
 
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
ssuserb14185
 
uTorrent Pro Crack Latest Version free 2025
uTorrent Pro Crack Latest Version free 2025uTorrent Pro Crack Latest Version free 2025
uTorrent Pro Crack Latest Version free 2025
channarbrothers93
 
Greedy algorithm technique explained using minimal spanning tree(MST).pptx
Greedy algorithm technique explained using minimal spanning tree(MST).pptxGreedy algorithm technique explained using minimal spanning tree(MST).pptx
Greedy algorithm technique explained using minimal spanning tree(MST).pptx
riyalkhan462
 
How Coupang Leverages Distributed Cache to Accelerate ML Model Training
How Coupang Leverages Distributed Cache to Accelerate ML Model TrainingHow Coupang Leverages Distributed Cache to Accelerate ML Model Training
How Coupang Leverages Distributed Cache to Accelerate ML Model Training
Alluxio, Inc.
 
Shift Left using Lean for Agile Software Development
Shift Left using Lean for Agile Software DevelopmentShift Left using Lean for Agile Software Development
Shift Left using Lean for Agile Software Development
SathyaShankar6
 
WTFAST Crack Latest Version FREE Downlaod 2025
WTFAST Crack Latest Version FREE Downlaod 2025WTFAST Crack Latest Version FREE Downlaod 2025
WTFAST Crack Latest Version FREE Downlaod 2025
channarbrothers93
 
Adobe Illustrator Crack | Free Download & Install Illustrator
Adobe Illustrator Crack | Free Download & Install IllustratorAdobe Illustrator Crack | Free Download & Install Illustrator
Adobe Illustrator Crack | Free Download & Install Illustrator
usmanhidray
 
AI Testing Tools Breakdown: Which One is Right for Your QA Needs?
AI Testing Tools Breakdown: Which One is Right for Your QA Needs?AI Testing Tools Breakdown: Which One is Right for Your QA Needs?
AI Testing Tools Breakdown: Which One is Right for Your QA Needs?
Shubham Joshi
 
Agentic AI Use Cases using GenAI LLM models
Agentic AI Use Cases using GenAI LLM modelsAgentic AI Use Cases using GenAI LLM models
Agentic AI Use Cases using GenAI LLM models
Manish Chopra
 
Minitab 22 Full Crack Plus Product Key Free Download [Latest] 2025
Minitab 22 Full Crack Plus Product Key Free Download [Latest] 2025Minitab 22 Full Crack Plus Product Key Free Download [Latest] 2025
Minitab 22 Full Crack Plus Product Key Free Download [Latest] 2025
wareshashahzadiii
 
SpyHunter Crack Latest Version FREE Download 2025
SpyHunter Crack Latest Version FREE Download 2025SpyHunter Crack Latest Version FREE Download 2025
SpyHunter Crack Latest Version FREE Download 2025
channarbrothers93
 
Avast Free Antivirus Crack FREE Downlaod 2025
Avast Free Antivirus Crack FREE Downlaod 2025Avast Free Antivirus Crack FREE Downlaod 2025
Avast Free Antivirus Crack FREE Downlaod 2025
channarbrothers93
 
MindMaster Crack Latest Version FREE Download 2025
MindMaster Crack Latest Version FREE Download 2025MindMaster Crack Latest Version FREE Download 2025
MindMaster Crack Latest Version FREE Download 2025
mahmadzubair09
 
CCleaner Pro Crack Latest Version FREE Download 2025
CCleaner Pro Crack Latest Version FREE Download 2025CCleaner Pro Crack Latest Version FREE Download 2025
CCleaner Pro Crack Latest Version FREE Download 2025
channarbrothers93
 
Adobe Photoshop CC 2025 Crack Full Serial Key With Latest
Adobe Photoshop CC 2025 Crack Full Serial Key  With LatestAdobe Photoshop CC 2025 Crack Full Serial Key  With Latest
Adobe Photoshop CC 2025 Crack Full Serial Key With Latest
usmanhidray
 
Bandicam Crack FREE Download Latest Version 2025
Bandicam Crack FREE Download Latest Version 2025Bandicam Crack FREE Download Latest Version 2025
Bandicam Crack FREE Download Latest Version 2025
channarbrothers93
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
Itop vpn crack Latest Version 2025 FREE Download
Itop vpn crack Latest Version 2025 FREE DownloadItop vpn crack Latest Version 2025 FREE Download
Itop vpn crack Latest Version 2025 FREE Download
mahnoorwaqar444
 
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
How Valletta helped healthcare SaaS to transform QA and compliance to grow wi...
Egor Kaleynik
 
Auto Data Preparation in IBM SPSS Modeler.pptx
Auto Data Preparation in IBM SPSS Modeler.pptxAuto Data Preparation in IBM SPSS Modeler.pptx
Auto Data Preparation in IBM SPSS Modeler.pptx
Version 1 Analytics
 
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...Explaining GitHub Actions Failures with Large Language Models Challenges, In...
Explaining GitHub Actions Failures with Large Language Models Challenges, In...
ssuserb14185
 
uTorrent Pro Crack Latest Version free 2025
uTorrent Pro Crack Latest Version free 2025uTorrent Pro Crack Latest Version free 2025
uTorrent Pro Crack Latest Version free 2025
channarbrothers93
 
Greedy algorithm technique explained using minimal spanning tree(MST).pptx
Greedy algorithm technique explained using minimal spanning tree(MST).pptxGreedy algorithm technique explained using minimal spanning tree(MST).pptx
Greedy algorithm technique explained using minimal spanning tree(MST).pptx
riyalkhan462
 
How Coupang Leverages Distributed Cache to Accelerate ML Model Training
How Coupang Leverages Distributed Cache to Accelerate ML Model TrainingHow Coupang Leverages Distributed Cache to Accelerate ML Model Training
How Coupang Leverages Distributed Cache to Accelerate ML Model Training
Alluxio, Inc.
 
Shift Left using Lean for Agile Software Development
Shift Left using Lean for Agile Software DevelopmentShift Left using Lean for Agile Software Development
Shift Left using Lean for Agile Software Development
SathyaShankar6
 
WTFAST Crack Latest Version FREE Downlaod 2025
WTFAST Crack Latest Version FREE Downlaod 2025WTFAST Crack Latest Version FREE Downlaod 2025
WTFAST Crack Latest Version FREE Downlaod 2025
channarbrothers93
 
Adobe Illustrator Crack | Free Download & Install Illustrator
Adobe Illustrator Crack | Free Download & Install IllustratorAdobe Illustrator Crack | Free Download & Install Illustrator
Adobe Illustrator Crack | Free Download & Install Illustrator
usmanhidray
 
AI Testing Tools Breakdown: Which One is Right for Your QA Needs?
AI Testing Tools Breakdown: Which One is Right for Your QA Needs?AI Testing Tools Breakdown: Which One is Right for Your QA Needs?
AI Testing Tools Breakdown: Which One is Right for Your QA Needs?
Shubham Joshi
 
Agentic AI Use Cases using GenAI LLM models
Agentic AI Use Cases using GenAI LLM modelsAgentic AI Use Cases using GenAI LLM models
Agentic AI Use Cases using GenAI LLM models
Manish Chopra
 
Minitab 22 Full Crack Plus Product Key Free Download [Latest] 2025
Minitab 22 Full Crack Plus Product Key Free Download [Latest] 2025Minitab 22 Full Crack Plus Product Key Free Download [Latest] 2025
Minitab 22 Full Crack Plus Product Key Free Download [Latest] 2025
wareshashahzadiii
 
SpyHunter Crack Latest Version FREE Download 2025
SpyHunter Crack Latest Version FREE Download 2025SpyHunter Crack Latest Version FREE Download 2025
SpyHunter Crack Latest Version FREE Download 2025
channarbrothers93
 
Avast Free Antivirus Crack FREE Downlaod 2025
Avast Free Antivirus Crack FREE Downlaod 2025Avast Free Antivirus Crack FREE Downlaod 2025
Avast Free Antivirus Crack FREE Downlaod 2025
channarbrothers93
 
MindMaster Crack Latest Version FREE Download 2025
MindMaster Crack Latest Version FREE Download 2025MindMaster Crack Latest Version FREE Download 2025
MindMaster Crack Latest Version FREE Download 2025
mahmadzubair09
 
CCleaner Pro Crack Latest Version FREE Download 2025
CCleaner Pro Crack Latest Version FREE Download 2025CCleaner Pro Crack Latest Version FREE Download 2025
CCleaner Pro Crack Latest Version FREE Download 2025
channarbrothers93
 
Adobe Photoshop CC 2025 Crack Full Serial Key With Latest
Adobe Photoshop CC 2025 Crack Full Serial Key  With LatestAdobe Photoshop CC 2025 Crack Full Serial Key  With Latest
Adobe Photoshop CC 2025 Crack Full Serial Key With Latest
usmanhidray
 
Bandicam Crack FREE Download Latest Version 2025
Bandicam Crack FREE Download Latest Version 2025Bandicam Crack FREE Download Latest Version 2025
Bandicam Crack FREE Download Latest Version 2025
channarbrothers93
 
EASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License CodeEASEUS Partition Master Crack + License Code
EASEUS Partition Master Crack + License Code
aneelaramzan63
 
Itop vpn crack Latest Version 2025 FREE Download
Itop vpn crack Latest Version 2025 FREE DownloadItop vpn crack Latest Version 2025 FREE Download
Itop vpn crack Latest Version 2025 FREE Download
mahnoorwaqar444
 

Apache HBase at Airbnb

  • 1. Apache HBase at Airbnb JINGWEI LU, LIYIN TANG, AND JASON ZHANG 1
  • 3. Event Logs MySQL Dumps Gold Cluster HDFS Hive Kafk a Sqoo p Silver Cluster Spark Cluster Spark ReAi r Airflow Scheduling S3 Presto Cluster AirPal Caravel Tableau Batch Infrastructure Yarn HDFS Hive Yarn Jingwei Lu, Liyin Tang, Jason Zhang 3
  • 4. Streaming at Airbnb Event Logging MySQL BINLOG Cluster HDFS Hive Spinal tap Presto Cluster Yarn Kafk a HBase Spark Streaming Datadog Druid Kafka Jingwei Lu, Liyin Tang, Jason Zhang 4
  • 6. Stateless Jingwei Lu, Liyin Tang, Jason Zhang Computation SinkSource DStream DF DF
  • 7. Stateful Jingwei Lu, Liyin Tang, Jason Zhang Computatio n Source DStream DF DF Sink1 Sink2 Sink N State Storage RDD
  • 8. Multiple Streams Jingwei Lu, Liyin Tang, Jason Zhang DataFrame Sink1 Process A Sink2 Sink3 SinkN … DataFrame Sink1 Process N Sink2 Sink3 SinkN … Source DStream Align by Time DataFram e DataFram e State Storage Source DStream …
  • 9. Streaming + Batch Jingwei Lu, Liyin Tang, Jason Zhang DataFrame Sink1 Process A Sink2 Sink3 SinkN … DataFram e State Storage Sourc e DStream Sourc e … Align by Time … DataFrame Sink1 Process A Sink2 Sink3 SinkN …
  • 11. AirStream Architecture Jingwei Lu, Liyin Tang, Jason Zhang Sources Stream #1 Stream #N Hive Tables HBase Tables Virtual Table Views for Computation Sinks … Customized ComputationSpark SQL Simple Config HBase Services Streaming Sources Druid
  • 12. AirStream Architecture Jingwei Lu, Liyin Tang, Jason Zhang Sources Stream #1 Stream #N Hive Tables HBase Tables Virtual Table Views for Computation Sinks … Customized ComputationSpark SQL HBase Services Streaming Sources Druid Same Computation for Batch processing
  • 14. Jingwei Lu, Liyin Tang, Jason Zhang State Store • Merge changes • Provide fast lookup • Fast persistent storage across streaming and batch jobs 14
  • 15. Why HBase Jingwei Lu, Liyin Tang, Jason Zhang Rich Functionalities Rich Integration with Hadoop EcoSystem Easy Management Strong Community Reliable and Scalable
  • 16. HBase State Store Operators in Airstream Jingwei Lu, Liyin Tang, Jason Zhang 16 Full Table Scan Simple Aggregation Bulk Upload Key/Prefix Lookup Update
  • 17. Jingwei Lu, Liyin Tang, Jason Zhang Computation DAG 17 Input Data Left Outer Join Result Key Lookup
  • 18. Jingwei Lu, Liyin Tang, Jason Zhang Key Space Design • Hash partition key space for load balance • Composite key for K -> V • Support full key lookup • Prefix lookup supported for all keys used in hash function Hash key1 key2 key3 Hash based on key prefix Hash key1 key2 Lookup based on key prefix key1 = ‘value1’ and key2 = ‘value2’ 18
  • 19. • Partition based on key before write • Use bulk upload for large volume update Write Performance Jingwei Lu, Liyin Tang, Jason Zhang 19
  • 20. Case Study Jingwei Lu, Liyin Tang, Jason Zhang Experiment realtime feedback Update Experiment Assignment Event Lookup HBase with TTL Booking Event Druid Datado g 20 one airstream job
  • 22. Realtime Ingestion on HBase Data Infrastructure MySQL Analytica l Events Kafka Spark Streamin g HBase HDFS Presto/Hive/Spar k Source Inges t Realtime Query Snapsh ot Batch Query Jingwei Lu, Liyin Tang, Jason Zhang 22
  • 23. Access Data in HBase Jingwei Lu, Liyin Tang, Jason Zhang HBase Hive Presto Spark SQL Spark Streaming Batch Jobs Interactive Query Streaming HDFS Snapshot Table Mapping/Unifed View on realtime data 23
  • 24. Snapshot & Reseed Jingwei Lu, Liyin Tang, Jason Zhang HBase HDFS Snapshot(HFile Links) Bulk Upload 24
  • 25. Case Study 1: Events Ingestion Jingwei Lu, Liyin Tang, Jason Zhang Kafka topic … topic topic Spark Executor 1 … Executor 2 Executor K HBase DeDup HDFS Region1…Region2R egion M Daily Snapshot Realtime Query Hive Presto Events Partition 25
  • 26. Case Study 2: Streaming DB Export KafkaRDS Table1 … Spinalta p. Table1 … Table2 TableN Spinaltap. Table2 Spinaltap. TableN Spark Executor 1 … Executor2 Executor K HBase Region1 … Region2 Region M HDFS Region1…Region2R egion M Daily Snapshot Realtime Query Jingwei Lu, Liyin Tang, Jason Zhang 26
  • 27. Case Study: Streaming DB Export Rows CF: Colums Version Value <ShardKey><DB_TABLE_#1><PK_a=A> id Fri May 19 00:33:19 2016 101 <ShardKey><DB_TABLE_#1><PK_a=A> city Fri May 19 00:33:19 2016 San Francisco <ShardKey><DB_TABLE_#1><PK_a=A> city Fri May 10 00:34:19 2016 New York <ShardKey><DB_TABLE_#2><PK_a=A’> id Fri May 19 00:33:19 2016 1 Jingwei Lu, Liyin Tang, Jason Zhang 27
  • 28. Case Study: Streaming DB Export TXN 1 Commit_TS: 101 … TXN 2 Commit_TS: 102 TXN 3 Commit_TS: 103 TXN N Commit_TS: N’ Binlog Order Jingwei Lu, Liyin Tang, Jason Zhang 28
  • 29. Case Study: Streaming DB Export TXN 1 Commit_TS: 101 … TXN 2 Commit_TS: 103 TXN 3 Commit_TS: 102 TXN N Commit_TS: N’ NTP Binlog Order Jingwei Lu, Liyin Tang, Jason Zhang 29
  • 30. Case Study: Streaming DB Export TXN 1 Commit_TS: 101 … Binlog Order TXN 2 Commit_TS: 103 TXN 3 Commit_TS: 102 TXN N Commit_TS: N’ Point-in-Time Restore on TS 102 Jingwei Lu, Liyin Tang, Jason Zhang 30
  • 31. Case Study: Streaming DB Export Rows CF: Colums Version Value <ShardKey><DB_TABLE_#1><PK_a=A> id bin100 101 <ShardKey><DB_TABLE_#1><PK_a=A> city bin101 San Francisco <ShardKey><DB_TABLE_#1><PK_a=A> city bin102 New York <ShardKey><DB_TABLE_#2><PK_a=A’> id bin100 1 Jingwei Lu, Liyin Tang, Jason Zhang 31
  • 32. Case Study: Streaming DB Export Rows Version (Logical Offset) Value <ShardKey><DB_TABLE_#1><2016-05-23 23><100> 100 mysql-bin.00000:100 <ShardKey><DB_TABLE_#1><2016-05-23 23><101> 101 mysql-bin.00000:101 <ShardKey><DB_TABLE_#1><2016-05-23 23><103> 103 mysql-bin.00000:103 <ShardKey><DB_TABLE_#1><2016-05-24 00><102> 102 mysql-bin.00000:102 Jingwei Lu, Liyin Tang, Jason Zhang 32
  • 33. Case Study: Streaming DB Export Rows Version (Logical Offset) Value <ShardKey><DB_TABLE_#1><2016-05-23 23><100> 100 mysql-bin.00000:100 <ShardKey><DB_TABLE_#1><2016-05-23 23><101> 101 mysql-bin.00000:101 <ShardKey><DB_TABLE_#1><2016-05-23 23><103> 103 mysql-bin.00000:103 <ShardKey><DB_TABLE_#1><2016-05-24 00><102> 102 mysql-bin.00000:102 Jingwei Lu, Liyin Tang, Jason Zhang 33
  • 34. Summary Jingwei Lu, Liyin Tang, Jason Zhang Scalable and Reliable Rich Stateful Computation Rich Integration with Hadoop EcoSystem Easy Operation
  • 35. 35

Editor's Notes

  • #4: *Disaster recovery *High Slow SLA job isolation
  • #15: Slide why Stateful process vs stateless
  • #17: Use diagram to show operators
  • #22: Realtime ingestion provides fast feedback loop. Advanced monitoring infrastructure Tracking changes instead of full snapshot for RDS dump
  • #23: What is the goal of realtime ingestion: *fast feedback loop for experiment to reduce testing cycle *provide realtime view of production database for many offline workload(for example, machine learning)
  • #24: Table mapping provide a unified view to access realtime ingested data.
  • #25: For snapshot using scan it takes 10-30 minutes per table. This does not scale. Take 10 minutes to do the link and restore. All tables can be accessed afterward.
  • #27: Backup based db export restore takes 9 - 12 hours and it is subject to AWS network situation. Long latency and fragile. We just need to track changes and apply to snapshot. Provide near realtime snapshot of db. Unify across mysql and dynamodb