Dynamic Columns for SQL on NoSQL Data
(Incentives in HBase)
June 13, 2017
Anil Gupta
Amey Hegde
Agenda
• Overview of Incentives
• Components
• Why Phoenix?
• HBase Data Model
• Performance Tuning
• Learnings
About TrueCar
• TrueCar is an online marketplace for buying and selling cars.
• We are dedicated to being the most transparent brand in the automotive industry.
• We show consumers what others paid for the car they want, so they can recognize a fair price.
Example of Incentives
List of various Incentives
Overview of Incentives
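IncentiveId  TrimId  Amount  New/Used  Postal Code (94017, 94121, 90401)
30076        30610   1000    N         1 1 1
29653        28565   779     U         1
16455        24981   1200    N         1
• An incentive can be active in anywhere from 1 to 40K postal codes.
• Each postal code is its own column, set to 1 where the incentive is active, so rows are sparse and dynamic columns are a good fit.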
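To make the sparsity concrete, here is a minimal sketch of how one incentive's active postal codes could be turned into per-row columns. The function name is hypothetical; the "P" prefix is borrowed from the P90401 column in the deck's sample query, and the value 1 from the table above.

```python
def postal_code_columns(active_postal_codes):
    """Map each active postal code to a column name holding the flag 1.

    Postal codes where the incentive is not active get no column at all,
    which is what keeps the row sparse.
    """
    return {f"P{code}": 1 for code in active_postal_codes}


# First row of the example table: incentive 30076 is active in all three codes.
row_30076 = postal_code_columns(["94017", "94121", "90401"])
print(row_30076)
```

An incentive active in 40K postal codes would produce 40K entries, while one active in a single code produces one; no fixed schema has to list every possible postal code up front.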
Overview of Incentives
● Historical Incentives data
  • Snapshot or history of incentives over the last 18 months
  • Used in internal analytics jobs
● Current Incentives data
  • Latest OEM incentives to customers
  • Published to the website
Old Pipeline Dataflow Overview
[Database] Data from all sources
[Sqoop] Dumps data from SQL Server to HDFS
[Pig] Joins multiple datasets
HDFS
ES
[Mapper] Highly nested Avro/JSON data
Shortcomings of Old Pipeline
● Backend jobs interfered with live traffic
● Scalability limits with Elasticsearch
● Reads were complex
● Nested datasets increased post-processing time
Incentive Components
● HBase: Datastore for Historical Incentives
● Phoenix: SQL layer on top of HBase
● Elasticsearch: Stores current Incentive data for Front End.
● MapReduce: Computation engine for Incentives
● Avro: Serialization library for storing data on HDFS
Why Phoenix?
● Easy to use across all disciplines (multiple teams/roles)
● Standard SQL API and JDBC connection
● Dynamic Column Feature
● Fully integrated with Hadoop Ecosystem
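As an illustration of the dynamic column feature, the sketch below assembles a Phoenix UPSERT in which a column that is not part of the table definition is declared inline with its type. The helper name is hypothetical; the table name, key columns, and the O.P90401 column are taken from other slides in this deck, and the statement is only built as a string here, not executed.

```python
def build_dynamic_upsert(table, defined_columns, dynamic_columns):
    """Build a parameterized Phoenix UPSERT that declares dynamic columns inline.

    defined_columns: names of columns that already exist on the table.
    dynamic_columns: {column_name: sql_type} for columns declared at write time.
    """
    declared = list(defined_columns)
    declared += [f"{name} {sql_type}" for name, sql_type in dynamic_columns.items()]
    placeholders = ", ".join("?" for _ in declared)
    return f"UPSERT INTO {table} ({', '.join(declared)}) VALUES ({placeholders})"


sql = build_dynamic_upsert(
    "HIST_INCENTIVES",
    ["TRIM_ID", "SNAPSHOT_START_DATE", "VALUE_TYPE", "INCENTIVE_ID"],
    {"O.P90401": "INTEGER"},
)
print(sql)
```

The values would be bound through the JDBC connection mentioned above. Because identifiers are spliced into the statement text, column names should come from trusted code, never from user input.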
New Pipeline Dataflow Overview
[Database] Data from all sources
[Sqoop] Dumps data from SQL Server to HDFS
[Pig] Joins multiple datasets
HDFS
HBase
[Mapper] De-normalize, then parse and validate each record
Ingestion Logic in Mapper
Incoming Record → Check (Yes/No) → Insert or Update → Finish
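The flow above can be sketched as an insert-or-update step. The slide does not say what the check tests, so this assumes it asks whether a row with the same row key already exists; the dict standing in for the HBase table and the function name are hypothetical.

```python
def ingest(store, row_key, columns):
    """Insert the record if its row key is new, otherwise update it in place.

    `store` is a plain dict used as a stand-in for the table (assumption).
    Returns "insert" or "update" so the caller can count each path.
    """
    if row_key in store:                  # Check
        store[row_key].update(columns)    # Update: merge the new column values
        return "update"
    store[row_key] = dict(columns)        # Insert: first time this key is seen
    return "insert"


table = {}
print(ingest(table, "30610|1497386726|CUSTOMERCASH|30076", {"P94017": 1}))
print(ingest(table, "30610|1497386726|CUSTOMERCASH|30076", {"P94121": 1}))
```

The "|" separators in the example key are only for readability and are not taken from the deck.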
Initial Data Model
Table: INCENTIVES
Row Key: <TRIM_ID><SNAPSHOT_START_DATE><VALUE_TYPE><INCENTIVE_ID>
Column Family   Description                               Versions
S               Stores all static columns                 1
D               Stores dynamic columns for postal codes   1
Initial Performance
[Chart: minutes (axis 0-60) for Transaction Records; series: New Pipeline (HBase/Phoenix) and Old Pipeline (Elasticsearch)]
HBase Data Model
Table: INCENTIVES
Row Key: <TRIM_ID><SNAPSHOT_START_DATE><VALUE_TYPE><INCENTIVE_ID>
Column Family   Description                                       Versions
S               Stores all static columns                         1
E               Stores dynamic columns for even postal codes      1
O               Stores dynamic columns for odd postal codes       1
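A minimal sketch of how a postal code could be routed to one of the two dynamic column families. It assumes the "even/odd scheme" means the parity of the numeric postal code, which is consistent with the O.P90401 column in the deck's sample query (90401 is odd); the function name is hypothetical.

```python
def dynamic_column_for(postal_code):
    """Return the family-qualified dynamic column name for a postal code.

    Even postal codes go to family E, odd ones to family O (assumed scheme).
    The postal code is kept as a string so leading zeros survive.
    """
    family = "E" if int(postal_code) % 2 == 0 else "O"
    return f"{family}.P{postal_code}"


print(dynamic_column_for("90401"))
print(dynamic_column_for("94016"))
```

Splitting by parity roughly halves the number of columns per family, so a query that names one postal code only has to read the family that can contain it.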
Sample Select Query
SELECT * FROM HIST_INCENTIVES (O.P90401 INTEGER)
WHERE TRIM_ID = 30070
  AND SNAPSHOT_START_DATE = 1497386726
  AND VALUE_TYPE = 'CUSTOMERCASH'
  AND P90401 = 1
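The same query shape can be generated for any postal code. This is a sketch under the assumption that families are chosen by postal-code parity; the function name is hypothetical, and the row-key values are left as JDBC-style placeholders rather than inlined.

```python
def build_postal_code_query(postal_code):
    """Build the sample SELECT for one postal code, declaring its dynamic column."""
    # The postal code becomes part of an identifier, so accept digits only.
    if not (postal_code.isascii() and postal_code.isdigit()):
        raise ValueError(f"not a numeric postal code: {postal_code!r}")
    family = "E" if int(postal_code) % 2 == 0 else "O"   # assumed even/odd scheme
    column = f"P{postal_code}"
    return (
        f"SELECT * FROM HIST_INCENTIVES ({family}.{column} INTEGER) "
        "WHERE TRIM_ID = ? AND SNAPSHOT_START_DATE = ? "
        f"AND VALUE_TYPE = ? AND {column} = 1"
    )


print(build_postal_code_query("90401"))
```

The dynamic column has to be declared after the table name in every query that uses it, because it is not part of the table's stored schema.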
HBase Tuning
● Split postal code data into 2 column families (even/odd)
● Added row+column bloom filters
● Split regions
● Distributed data evenly across region servers
● Set time to live (TTL) to 540 days
● Sized regions at 8-10 GB
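HBase expresses a column family's TTL in seconds, so the 540-day setting has to be converted. The sketch below does that arithmetic and collects two of the settings above in a dict; the dict keys follow HBase's column-family attribute names (TTL, BLOOMFILTER with the ROWCOL type), but how the authors actually applied them is not shown in the deck, so treat the dict as an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_seconds(days):
    """Convert a retention period in days to the seconds value HBase expects."""
    return days * SECONDS_PER_DAY


# Hypothetical settings dict; only the values 540 days and row+column
# bloom filters come from the slide.
column_family_settings = {
    "TTL": ttl_seconds(540),
    "BLOOMFILTER": "ROWCOL",
}
print(column_family_settings)
```

540 days is 18 months of 30 days, which lines up with the 18-month history window for Historical Incentives mentioned earlier.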
Performance after tuning
[Chart: minutes (axis 0-60) for Transaction Records; series: New Pipeline (HBase/Phoenix) and Old Pipeline (Elasticsearch); annotation: 2.6x performance gain]
Performance Testing Results
Data Ingestion
  Old Pipeline
    • Write to Elasticsearch
    • MapReduce job
    • 32-35 min
    • Normalized dataset
  New Pipeline
    • Write to HBase using Phoenix
    • Map-only job
    • 48-50 min
    • De-normalized dataset
Data Retrieval
  Old Pipeline
    • Read from Elasticsearch
    • 48-50 min
    • Sequential action for all five providers
  New Pipeline
    • Read from HBase using Phoenix
    • 18-20 min
    • Parallel action for all providers
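As a quick check on these timings, the sketch below compares the midpoints of the reported ranges. Using midpoints is an assumption made here for illustration; the deck does not say how its headline figures were derived.

```python
def midpoint(low, high):
    """Midpoint of a reported time range, in minutes."""
    return (low + high) / 2


old_retrieval = midpoint(48, 50)   # Elasticsearch read
new_retrieval = midpoint(18, 20)   # HBase/Phoenix read
retrieval_speedup = old_retrieval / new_retrieval

old_ingestion = midpoint(32, 35)   # Elasticsearch write
new_ingestion = midpoint(48, 50)   # HBase/Phoenix write
ingestion_slowdown = new_ingestion / old_ingestion

print(f"retrieval: {retrieval_speedup:.1f}x faster")
print(f"ingestion: {ingestion_slowdown:.1f}x slower")
```

On these midpoints retrieval comes out about 2.6 times faster while ingestion takes about 1.5 times as long, so the new pipeline trades slower writes for faster reads.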
Random Facts
● Each run touches approximately 2.536 billion cells
● Data retrieval performance improved by 80%
● Data duplication was eliminated from the pipeline
● Post-processing after data retrieval is negligible because the data is de-normalized
Unit Testing
● Create an HBase minicluster
● Establish a Phoenix connection
● Create an HBase table
● Run test suites that validate all the use cases
Summary
● Improved performance of analytical jobs that use Historical Incentives
● Higher scalability with the new architecture of Historical Incentives
● Eliminated interference of offline jobs with live traffic
Thanks!
Questions?

More Related Content

What's hot

What’s new in MariaDB ColumnStore
What’s new in MariaDB ColumnStoreWhat’s new in MariaDB ColumnStore
What’s new in MariaDB ColumnStoreMariaDB plc
 
Performance tuning ColumnStore
Performance tuning ColumnStorePerformance tuning ColumnStore
Performance tuning ColumnStoreMariaDB plc
 
HBaseCon 2015: HBase Operations at Xiaomi
HBaseCon 2015: HBase Operations at XiaomiHBaseCon 2015: HBase Operations at Xiaomi
HBaseCon 2015: HBase Operations at XiaomiHBaseCon
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBaseCon
 
How QBerg scaled to store data longer, query it faster
How QBerg scaled to store data longer, query it fasterHow QBerg scaled to store data longer, query it faster
How QBerg scaled to store data longer, query it fasterMariaDB plc
 
ScyllaDB's Avi Kivity on UDF, UDA, and the Future
ScyllaDB's Avi Kivity on UDF, UDA, and the FutureScyllaDB's Avi Kivity on UDF, UDA, and the Future
ScyllaDB's Avi Kivity on UDF, UDA, and the FutureScyllaDB
 
HBase, crazy dances on the elephant back.
HBase, crazy dances on the elephant back.HBase, crazy dances on the elephant back.
HBase, crazy dances on the elephant back.Roman Nikitchenko
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践HBaseCon
 
ImpalaToGo and Tachyon integration
ImpalaToGo and Tachyon integrationImpalaToGo and Tachyon integration
ImpalaToGo and Tachyon integrationDavid Groozman
 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at NeteaseHBaseCon
 
Performance Testing: Scylla vs. Cassandra vs. Datastax
Performance Testing: Scylla vs. Cassandra vs. DatastaxPerformance Testing: Scylla vs. Cassandra vs. Datastax
Performance Testing: Scylla vs. Cassandra vs. DatastaxScyllaDB
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...Cloudera, Inc.
 
What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2MariaDB plc
 
in-memory database system and low latency
in-memory database system and low latencyin-memory database system and low latency
in-memory database system and low latencyhyeongchae lee
 
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...Cloudera, Inc.
 

What's hot (20)

What’s new in MariaDB ColumnStore
What’s new in MariaDB ColumnStoreWhat’s new in MariaDB ColumnStore
What’s new in MariaDB ColumnStore
 
Performance tuning ColumnStore
Performance tuning ColumnStorePerformance tuning ColumnStore
Performance tuning ColumnStore
 
HBaseCon 2015: HBase Operations at Xiaomi
HBaseCon 2015: HBase Operations at XiaomiHBaseCon 2015: HBase Operations at Xiaomi
HBaseCon 2015: HBase Operations at Xiaomi
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
 
How QBerg scaled to store data longer, query it faster
How QBerg scaled to store data longer, query it fasterHow QBerg scaled to store data longer, query it faster
How QBerg scaled to store data longer, query it faster
 
ScyllaDB's Avi Kivity on UDF, UDA, and the Future
ScyllaDB's Avi Kivity on UDF, UDA, and the FutureScyllaDB's Avi Kivity on UDF, UDA, and the Future
ScyllaDB's Avi Kivity on UDF, UDA, and the Future
 
HBase, crazy dances on the elephant back.
HBase, crazy dances on the elephant back.HBase, crazy dances on the elephant back.
HBase, crazy dances on the elephant back.
 
OLAP
OLAPOLAP
OLAP
 
Apache Gobblin at MZ
Apache Gobblin at MZApache Gobblin at MZ
Apache Gobblin at MZ
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
ImpalaToGo and Tachyon integration
ImpalaToGo and Tachyon integrationImpalaToGo and Tachyon integration
ImpalaToGo and Tachyon integration
 
141060753008 3715302
141060753008 3715302141060753008 3715302
141060753008 3715302
 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
 
Performance Testing: Scylla vs. Cassandra vs. Datastax
Performance Testing: Scylla vs. Cassandra vs. DatastaxPerformance Testing: Scylla vs. Cassandra vs. Datastax
Performance Testing: Scylla vs. Cassandra vs. Datastax
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
 
What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2What to expect from MariaDB Platform X5, part 2
What to expect from MariaDB Platform X5, part 2
 
HDFS Erasure Coding in Action
HDFS Erasure Coding in Action HDFS Erasure Coding in Action
HDFS Erasure Coding in Action
 
in-memory database system and low latency
in-memory database system and low latencyin-memory database system and low latency
in-memory database system and low latency
 
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
HBaseCon 2012 | Gap Inc Direct: Serving Apparel Catalog from HBase for Live W...
 

Similar to Dynamic Columns of Phoenix for SQL on Sparse(NoSql) Data

Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkDataWorks Summit
 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at HuaweiHBaseCon
 
Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results RevealedIs Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results RevealedRevolution Analytics
 
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...Flink Forward
 
TPC-H Column Store and MPP systems
TPC-H Column Store and MPP systemsTPC-H Column Store and MPP systems
TPC-H Column Store and MPP systemsMostafa Mokhtar
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index SolutionMichael Stack
 
Skills Portfolio
Skills PortfolioSkills Portfolio
Skills Portfoliorolee23
 
Overview of business intelligence
Overview of business intelligenceOverview of business intelligence
Overview of business intelligenceAhsan Kabir
 
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...StampedeCon
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modelingvivekjv
 
MIS5101 WK10 Outcome Measures
MIS5101 WK10 Outcome MeasuresMIS5101 WK10 Outcome Measures
MIS5101 WK10 Outcome MeasuresSteven Johnson
 
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalyticsconf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalyticsTom LaGatta
 
The_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdf
The_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdfThe_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdf
The_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdfDotInsight1
 
BI Portfolio
BI PortfolioBI Portfolio
BI Portfoliotcomeaux
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesSingleStore
 
Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger AnalyticsItzhak Kameli
 

Similar to Dynamic Columns of Phoenix for SQL on Sparse(NoSql) Data (20)

Why and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on FlinkWhy and how to leverage the simplicity and power of SQL on Flink
Why and how to leverage the simplicity and power of SQL on Flink
 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
 
Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results RevealedIs Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed
 
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
Flink Forward San Francisco 2018: Fabian Hueske & Timo Walther - "Why and how...
 
ITReady DW Day2
ITReady DW Day2ITReady DW Day2
ITReady DW Day2
 
TPC-H Column Store and MPP systems
TPC-H Column Store and MPP systemsTPC-H Column Store and MPP systems
TPC-H Column Store and MPP systems
 
hbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solutionhbaseconasia2019 Distributed Bitmap Index Solution
hbaseconasia2019 Distributed Bitmap Index Solution
 
Teradata a z
Teradata a zTeradata a z
Teradata a z
 
Skills Portfolio
Skills PortfolioSkills Portfolio
Skills Portfolio
 
Overview of business intelligence
Overview of business intelligenceOverview of business intelligence
Overview of business intelligence
 
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
Enabling Key Business Advantage from Big Data through Advanced Ingest Process...
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
 
MIS5101 WK10 Outcome Measures
MIS5101 WK10 Outcome MeasuresMIS5101 WK10 Outcome Measures
MIS5101 WK10 Outcome Measures
 
Value Stream Maps
Value Stream MapsValue Stream Maps
Value Stream Maps
 
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalyticsconf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
conf2015_TLaGatta_CHarris_Splunk_BusinessAnalytics_DeliveringHighLevelAnalytics
 
The_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdf
The_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdfThe_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdf
The_Case_for_Single_Node_Systems_Supporting_Large_Scale_Data_Analytics (1).pdf
 
BI Portfolio
BI PortfolioBI Portfolio
BI Portfolio
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming Architectures
 
Transaction processing
Transaction processingTransaction processing
Transaction processing
 
Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger Analytics
 

Recently uploaded

Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 

Recently uploaded (20)

Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 

Dynamic Columns of Phoenix for SQL on Sparse(NoSql) Data

  • 1. Dynamic Columns for SQL on NoSQL Data (Incentives in HBase) June 13, 2017 Anil Gupta Amey Hegde
  • 2. 2 Agenda • Overview of Incentives • Components • Why Phoenix? • HBase Data Model • Performance Tuning • Learnings
  • 3. 3 About TrueCar • TrueCar is an online marketplace for buying and selling cars. • We are dedicated to being the most transparent brand in automotive industry. • We show consumers what others paid for the car they want, so they can recognize a fair price.
  • 5. 5 List of various Incentives
  • 6. 6 Overview of Incentives IncentiveId TrimId Amount New/Used 94017 94121 90401 30076 30610 1000 N 1 1 1 29653 28565 779 U 1 16455 24981 1200 N 1 An Incentive can be active from 1-40K Postal Codes. So, dynamic columns will be good. Postal Code
  • 7. 7 Overview of Incentives ● Historical Incentives data  Snapshot or history of incentives over last 18 months  Used in internal analytics jobs ● Current Incentives data  Latest OEM incentives to customers  Used to published to the website
  • 8. Old Pipeline Dataflow Overview 8 [Database] Data from all sources [Sqoop] Dumps data from sql server to HDFS [Pig] Joins multiple data- sets HDFS ES [Mapper] Highly nested Avro/JSON data
  • 9. 9 Shortcomings of Old Pipeline ● Interference of backend job with live traffic ● Scalability with Elastic Search ● Reads were complex ● Nested dataset increases post processing time
  • 10. 10 Incentive Components ● HBase: Datastore for Historical Incentives ● Phoenix : SQL layer to operate on HBase. ● Elasticsearch: Stores current Incentive data for Front End. ● MapReduce: Computation engine for Incentives ● Avro: Serialization library for storing data on HDFS
  • 11. 111111 Why Phoenix ? ● Easy to use across all disciplines (multiple teams/roles) ● Standard SQL API and JDBC connection ● Dynamic Column Feature ● Fully integrated with Hadoop Ecosystem
  • 12. New Pipeline Dataflow Overview 12 [Database] Data from all sources [Sqoop] Dumps data from sql server to HDFS [Pig] Joins multiple data- sets HDFSHbase [Mapper] De-normalize then parse and validate record
  • 13. Ingestion Logic in Mapper 13 Incoming Record Insert Finish Check YesNo Update
  • 14. 14 Column Family Description Versions S • Stores all static columns 1 D • Stores dynamic columns for postal code 1 Table: INCENTIVES Row Key: <TRIM_ID><SNAPSHOT_START_DATE><VALUE_TYPE><INCENTIVE_ID> Initial Data Model
  • 16. 16 Column Family Description Versions S • Stores all static columns 1 E • Stores dynamic columns for even scheme of postal code 1 O • Stores dynamic columns for odd scheme of postal code 1 Table: INCENTIVES Row Key: <TRIM_ID><SNAPSHOT_START_DATE><VALUE_TYPE><INCENTIVE_ID> HBase Data Model
  • 17. 17 Sample Select Query SELECT * FROM HIST_INCENTIVES (O.P90401 INTEGER) WHERE TRIM_ID = 30070 AND SNAPSHOT_START_DATE= 1497386726 AND VALUE_TYPE = ’CUSTOMERCASH’ AND P90401=1
  • 18. 18 HBase Tuning ● Split postal code data into 2 column families(even/odd) ● Added bloom filter to Row-Columns ● Splitting regions ● Evenly distributed data across region servers ● Time to live (TTL) = 540 days ● Region size 8-10 GB
  • 19. 19 Performance after tuning 0 10 20 30 40 50 60 M i n u t e s Transaction Records New Pipeline(HBase/Phoenix) Old Pipeline (Elasticsearch) 2.6x performance gain
  • 20. 20 Description Old Pipeline New Pipeline Data Ingestion • Write to Elasticsearch • MapReduce Job • 32-35min • Normalized data-set • Write to HBase using Phoenix • Map only Job • 48-50 min • De-normalized data-set Data Retrieval • Read from Elasticsearch • 48-50 min • Sequential action for all five different provider • Read from HBase using Phoenix • 18-20 min • Parallel action for all provider Performance Testing Results
  • 21. 2121 Random Facts ● Affects approximately 2.536 billion cells in each run ● Data retrieval performance is improved by 80% ● Data duplication was eliminated from the pipeline ● Post processing after data retrieval is negligible due to de- normalized data
  • 22. 2222 Unit Testing ● Create HBase minicluster ● Establish phoenix connection ● Create a HBase table ● Created various test suites to validate all the use cases
  • 23. 2323 Summary ● Improvement in performance of analytical jobs that use Historical Incentives ● Higher Scalability with new architecture of Historical Incentives ● Eliminated intervention of offline jobs with live traffic
