Previously @
Senior Engineer - Data Infrastructure
Previously @
Engineer - News Feed Team
We are a personal styling company
Elasticsearch is unique
1. Cost in $$$
2. Operational Burden
3. Effectiveness of a hosted solution
Size of the Dataset - Cost $$$
Cost scales roughly linearly with the size of your dataset
The slope and intercept of the cost functions differ between hosted and on-premise, though
Elastic Cloud (1-node cluster w/ 16GB mem): $0.5221/hour ≈ $382/month
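The slide's point is that hosted and on-premise costs follow the same linear shape, cost = intercept + slope × size, with different coefficients. The minimal Python sketch below makes that concrete and finds where the two lines cross. Only the $0.5221/hour Elastic Cloud rate comes from the slide; the 730-hour month, the class `LinearCost`, the helper `break_even_gb`, and every slope and intercept are hypothetical placeholders to be replaced with your own quotes and estimates.

```python
from dataclasses import dataclass
from typing import Optional

# 8760 hours per year / 12 months; an assumed convention, not Elastic's billing rule.
HOURS_PER_MONTH = 730


def monthly_from_hourly(rate_per_hour: float, hours: float = HOURS_PER_MONTH) -> float:
    """Convert an hourly price into an approximate monthly price."""
    return rate_per_hour * hours


@dataclass(frozen=True)
class LinearCost:
    """Monthly cost modeled as intercept + slope * dataset size (GB)."""
    name: str
    intercept: float  # fixed dollars per month, regardless of data size
    slope: float      # marginal dollars per GB per month

    def monthly(self, size_gb: float) -> float:
        return self.intercept + self.slope * size_gb


def break_even_gb(a: LinearCost, b: LinearCost) -> Optional[float]:
    """Dataset size (GB) where a and b cost the same, or None if they never cross above 0 GB."""
    if a.slope == b.slope:
        return None  # parallel lines: one option is always cheaper (or they are identical)
    size = (b.intercept - a.intercept) / (a.slope - b.slope)
    return size if size > 0 else None


if __name__ == "__main__":
    # Only the $0.5221/hour rate is from the slide; both cost lines are hypothetical.
    print(round(monthly_from_hourly(0.5221), 2))  # about $381 with a 730-hour month
    hosted = LinearCost("hosted", intercept=0.0, slope=2.0)
    on_prem = LinearCost("on-prem", intercept=1500.0, slope=0.5)
    print(break_even_gb(hosted, on_prem))
    print(hosted.monthly(500), on_prem.monthly(500))
```

With these made-up numbers, hosting is cheaper below 1000 GB and on-premise is cheaper above it; the real crossover depends entirely on your own coefficients.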
Size of the Dataset - Operational Burden
! The larger your dataset, the more maintenance the cluster requires
! A small cluster can be operated more manually
! With bigger clusters, maintenance needs to be automated
! A very small cluster is easy to host on-premise, but it is also cheap to buy from a hosting service
! Other factors, like timeframe, will matter more if the dataset is small
Size of the Dataset - Effectiveness of a Hosted Solution
Regular datasets:
! Documents have a well-defined set of fields
Irregular datasets:
! Documents can have any field, lots of dynamic fields
! Data is not evenly distributed across the cluster. Hot spots may occur
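A rough way to tell a regular dataset from an irregular one, before it ever reaches a cluster, is to count the distinct field paths across a sample of documents, since with dynamic mapping every new path becomes a new mapped field. This is a minimal Python sketch under simplifying assumptions: it only approximates how Elasticsearch names fields (dotted paths, array elements mapped under their parent path), and the names `field_paths` and `field_stats`, the 0.5 "rare field" threshold, and the sample documents are all hypothetical. For context, Elasticsearch caps the number of mapped fields per index with the `index.mapping.total_fields.limit` setting (1000 by default).

```python
from collections import Counter
from typing import Any, Iterable, Iterator, Tuple


def field_paths(value: Any, prefix: str = "") -> Iterator[str]:
    """Yield dotted paths to leaf values, roughly as dynamic mapping would name them."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from field_paths(child, path)
    elif isinstance(value, list):
        # Arrays do not introduce new field names; elements map under the same path.
        for item in value:
            yield from field_paths(item, prefix)
    elif prefix:
        yield prefix


def field_stats(docs: Iterable[dict]) -> Tuple[int, Counter]:
    """Return (number of docs, Counter of how many docs contain each field path)."""
    counts: Counter = Counter()
    n_docs = 0
    for doc in docs:
        n_docs += 1
        counts.update(set(field_paths(doc)))  # count each path at most once per doc
    return n_docs, counts


if __name__ == "__main__":
    regular = [
        {"sku": "a1", "qty": 3, "location": {"aisle": 4}},
        {"sku": "b2", "qty": 1, "location": {"aisle": 7}},
    ]
    irregular = [
        {"user": "x", "attrs": {"color": "red"}},
        {"user": "y", "attrs": {"size_42": True}},
        {"user": "z", "attrs": {"promo_2019_q3": 1, "tags": ["a", "b"]}},
    ]
    for name, docs in (("regular", regular), ("irregular", irregular)):
        n_docs, counts = field_stats(docs)
        # Hypothetical threshold: a field is "rare" if under half the docs have it.
        rare = [path for path, c in counts.items() if c / n_docs < 0.5]
        print(name, n_docs, "docs,", len(counts), "fields,", len(rare), "rare")
```

In the regular sample every field appears in every document; in the irregular one most fields appear once, which is the pattern that grows a mapping without bound as data keeps arriving.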
Irregularity of the data set
! Finding, understanding, and fixing these issues requires in-depth knowledge of ES
! You also need access to detailed dashboards and monitoring tools
! Regardless of on-premise vs. SaaS, you need to know how to model your data, how to structure your
indexes, ES best practices, dos and don’ts, etc.
! If your data is irregular, you need an understanding of ES and its internals in order to ensure the stability
of your cluster
! Developing this knowledge is half the hurdle when it comes to operating and hosting ES yourself
! If your data is highly irregular, you could bite the bullet and host ES yourself
! On the other hand, paying someone to host for you will offload the burden of having to know anything at
all about Elasticsearch
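One common symptom of a hot spot is uneven disk usage across nodes. Here is a minimal Python sketch of the kind of check a monitoring dashboard might run. It assumes rows shaped like the JSON returned by `GET _cat/shards?format=json&bytes=b` (string fields such as `node` and `store`, with `null` for unassigned shards); the function names `bytes_per_node` and `hot_nodes`, the 1.5× threshold, and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def bytes_per_node(shard_rows: List[dict]) -> Dict[str, int]:
    """Sum on-disk shard size per node from _cat/shards-style rows (assumed shape)."""
    totals: Dict[str, int] = defaultdict(int)
    for row in shard_rows:
        node, store = row.get("node"), row.get("store")
        if node is None or store is None:
            continue  # unassigned shards have no node and no store size
        totals[node] += int(store)
    return dict(totals)


def hot_nodes(totals: Dict[str, int], factor: float = 1.5) -> List[str]:
    """Nodes storing more than `factor` times the mean. Nodes holding no shards
    are absent from `totals`, so they do not pull the mean down."""
    if not totals:
        return []
    mean = sum(totals.values()) / len(totals)
    return sorted(node for node, size in totals.items() if size > factor * mean)


if __name__ == "__main__":
    # Made-up sample; a real script would fetch this from the cluster.
    rows = [
        {"index": "events", "shard": "0", "prirep": "p", "node": "es-1", "store": "40000000000"},
        {"index": "events", "shard": "1", "prirep": "p", "node": "es-1", "store": "5000000000"},
        {"index": "events", "shard": "2", "prirep": "p", "node": "es-2", "store": "10000000000"},
        {"index": "events", "shard": "3", "prirep": "p", "node": "es-3", "store": "8000000000"},
        {"index": "events", "shard": "3", "prirep": "r", "node": None, "store": None},
    ]
    totals = bytes_per_node(rows)
    print(totals)
    print(hot_nodes(totals))  # ['es-1']
```

A check like this only tells you that a node is overloaded; working out why (a skewed routing key, one oversized shard, a burst of dynamic fields) is the in-depth ES knowledge the slides describe.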
Skills of your team
! Is your team mostly front-end engineers? You probably want hosted ES
! Is your team very small and strapped for cycles? Go with hosted ES
! Your team is small, strapped for cycles, and strapped for cash? Maybe on-premise
DevOps Skills
! If you are considering using Elasticsearch, you already have a software stack
! That stack is software built in-house that you already maintain, monitor, deploy, upgrade, and operate
! You may already have invested in DevOps to some degree
! If you already have strong DevOps chops, the upfront cost of deploying your own infrastructure is much lower
! For example, if your stack is mainly JVM with high SLAs, you likely already have the tools necessary to run
your own ES cluster
Timeframe
! If search is central to your business, having a better ES deployment than your competitors gives you a
competitive advantage
! If you host ES on-prem, it is easier to get more involved with the open source community
! It is also easier to write and submit patches upstream
! If search is important to the business, it may be valuable to play an active role in open-source
! If you need search to power an internal tool, the ROI is lower and a hosted solution could be better
Security Considerations
! If you are storing sensitive data (such as PII) in your ES cluster, you may not want to send the data to a
third party.
! Some regulations, e.g. HIPAA and the EU GDPR, can make it harder to work with hosted solutions
! If the data in your ES cluster was breached, how bad would it be for your company?
! How much bureaucracy is involved when you decide to work with a hosted solution?
! Would it be faster and easier to just host it yourself?
SLA Requirements
! Many ES stability issues happen because “you shot yourself in the foot”.
! They are self-inflicted by your query/indexing patterns, sharding, document structure, etc.
! If uptime is extremely important, hosting ES yourself may yield a more stable cluster
Some Examples
When hosted ES makes sense
1. ELK stack for a small to moderately sized team or department
2. To power search for a small and predictable dataset, e.g. inventory in a warehouse, blog posts, etc.
3. When prototyping a new search feature or product
4. To power search for an internal tool
When on-premise ES makes sense
1. You’re launching a search-centric business or product line, e.g. Yelp
2. Company-wide ELK deployment for a mid- to large-sized company
3. You’re building an app that searches highly sensitive data
Why my team uses hosted ES
! On-prem ES will perform better and be more stable, especially for large clusters
! Elasticsearch is not like other kinds of infrastructure. Do not expect plug-and-play
Some armchair thoughts…
Many open-source companies (Elastic included) are increasingly relying on hosted solutions
for revenue
They are incentivized to convince you that hosting their software is onerous
They are incentivized to get you to upgrade and use more resources
They are incentivized to make it difficult for you to host their software
Always think critically, consider your options deeply and understand the ROI
Thanks for coming!
Tweet me at @zzbennett
E-mail me at elizabeth.bennett@stitchfix.com