There are enterprise BI tools like SAS, Cognos, Microstrategy, and QlikView that
have good Hadoop compatibility. Then there are Hadoop-specific
visualization tools like Hunk, Datameer, and Platfora that are meant specifically for
visualizing Hadoop data. Thirdly, there are open source tools like Pentaho, BIRT, and
Jaspersoft that were early adopters of Hadoop and have probably made more investment
in Hadoop compatibility than some of the biggies. Finally, there are charting libraries like
RShiny, D3.js, and Highcharts that are mostly open source or low cost and offer good
visualization but require scripting and coding. As you move from the first category to the
fourth, software license costs go down, but ease of development and self-service
capabilities also go down. There are some exceptions to this general trend,
though.
2. Your existing BI tool - Most probably your company is already using one BI tool or
another. You may have SAS, Microstrategy, IBM Cognos, or OBIEE in your company. Most
of these tools have made tremendous investments in enhancing their compatibility
with the Hadoop ecosystem: they have connectors for Hadoop and NoSQL databases, and
graphical tools are available. It may be easiest for end users to work with something they are
already using, so think of using your existing BI tool for Hadoop data visualization unless
there are obvious drawbacks to it.
3. Hadoop distribution used - If you are using a Hadoop distribution from, say, Cloudera or
Hortonworks, you can safely select tools that are certified by these distributors of Hadoop.
For example, Tableau, Microstrategy, Pentaho, and QlikView are all certified by Cloudera and
have proven connectors to the Cloudera distribution of Hadoop. Similarly, most of these tools
are partners of Hortonworks as well. If your Big Data platform is IBM BigInsights, then going
for Cognos makes sense: both being IBM products, compatibility will not be an issue. It is
always advisable to check whether the tool you are selecting for visualization is certified for
the Hadoop distribution being used.
4. Nature of data - If the data you want to analyze is tabular, columnar data, then most of the
tools are capable of providing visualization facilities. However, if the data is, say, log data,
special-purpose charting libraries like timeplot may be a good option. Similarly, for social
media data, tools like Zoomdata provide better visualization capabilities.
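To illustrate the log-data case: before a timeplot-style library can chart events over time, raw log lines have to be bucketed into a time series. The sketch below assumes a hypothetical timestamp-first log format; real logs will need a different parse.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(log_lines):
    """Bucket timestamped log lines into per-hour event counts,
    the shape a timeplot-style charting library expects."""
    counts = Counter()
    for line in log_lines:
        # Assumed format: an ISO timestamp first, then the message,
        # e.g. "2016-03-01T10:15:42 ERROR disk full" (hypothetical).
        stamp = datetime.strptime(line.split(" ", 1)[0], "%Y-%m-%dT%H:%M:%S")
        counts[stamp.replace(minute=0, second=0)] += 1
    return sorted(counts.items())

lines = [
    "2016-03-01T10:15:42 ERROR disk full",
    "2016-03-01T10:47:03 WARN retry",
    "2016-03-01T11:02:10 INFO started",
]
print(hourly_counts(lines))  # two events in the 10:00 bucket, one in 11:00
```

The resulting (hour, count) pairs can be handed to any charting library as x/y series.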
5. End user profile - Who are your end users? Are they data scientists? Then a visualization
tool with very high-end visualization patterns will be required. If operational business users
(such as sales managers or finance managers) are the end users, then speed of delivery and
cost of the tool (since the number of users may be very high) matter more than advanced
visualization.
6. Programming skills available - If you have good Java and JavaScript skills available in
house, going for scripting-based tools makes sense. Also, if you are an R shop and have good
R programming capabilities, RShiny can be a good alternative. Standard BI tools such as
Microstrategy and Pentaho, on the other hand, allow writing SQL on top of Hadoop data, while
tools like Datameer are schema-free, drag-and-drop tools. In short, each tool comes with its
own set of programming skill requirements, and you need to make sure these requirements are
compatible with the programming skills available in house.
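The SQL-on-Hadoop pattern those BI tools rely on can be sketched through Python's DB-API. Here sqlite3 stands in for a Hive or Impala connection purely so the example is self-contained; on a real cluster only the connection line would change (for instance to a PyHive connection), while the query pattern stays the same.

```python
import sqlite3

# Stand-in for a Hive/Impala connector: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0)],
)

# The kind of aggregate a dashboard chart would request via plain SQL:
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('APAC', 50.0), ('EMEA', 200.0)]
```

This is the appeal of SQL-capable tools: the visualization layer only ever sees small aggregated result sets, not the raw Hadoop data.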
7. Operating system - This is a basic checkbox while selecting tools for visualization. We
come across customers who use Linux platforms only, and using Windows-based tools like
QlikView, Tableau, or Microsoft BI is not possible in that case. Also, if you are planning an
implementation on the cloud, make sure your cloud provider can provide the OS required by
the visualization tool.
8. Visualization features required - Traditional BI tools that have added Hadoop capabilities
are more mature than new entrants in providing commonly required visualization patterns.
For example, multiple Y axes, support for HTML5 and animation, and user-friendly drill-down
are features that are very mature in traditional BI tools but still evolving in new entrants,
open source BI tools, and some charting libraries. It is advisable to compare your
visualization needs to the capabilities offered by the tools.
9. Data volume - Data volume and the streaming nature of data are important considerations,
especially if you are thinking of an in-memory visualization tool. If your Hadoop
data store holds terabytes of data, data is being added in real time, and you plan to use an
in-memory visualization tool, then you need a mechanism to reduce the volume and feed data
continuously from Hadoop to the visualization tool. This is possible, but not very simple. Be
aware of the impact of real-time, high-volume data on an in-memory architecture.
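One such volume-reduction mechanism is pre-aggregating inside Hadoop before anything reaches the in-memory tool. A minimal Python sketch of the idea, assuming simple dict-shaped records (the field names are hypothetical):

```python
from collections import defaultdict

def downsample(records, bucket_key):
    """Collapse raw records into per-bucket averages so only the
    aggregate - not every raw row - is fed to the in-memory tool."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket -> [running total, count]
    for rec in records:
        s = sums[bucket_key(rec)]
        s[0] += rec["value"]
        s[1] += 1
    return {k: total / n for k, (total, n) in sums.items()}

# Hypothetical sensor feed: many raw readings shrink to one row per hour.
readings = [
    {"hour": "10:00", "value": 4.0},
    {"hour": "10:00", "value": 6.0},
    {"hour": "11:00", "value": 9.0},
]
print(downsample(readings, lambda r: r["hour"]))  # {'10:00': 5.0, '11:00': 9.0}
```

In practice this step would run as a Hive query or MapReduce/Spark job on the cluster; the point is that the in-memory tool only ever loads the downsampled output.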
10. Industry experience - It is always advisable to depend on dominant players in your
industry vertical. SAS, for example, has been used by banks for analyzing big data for
customer intelligence and risk management. In such cases, the availability of underlying
algorithms and visualization patterns makes big data project implementation much easier.
All of these factors need to be carefully thought through. Some of them, like operating
system, seem like no-brainers, but I have seen companies overlook them and select a
visualization tool that later needed to be changed. After shortlisting visualization
alternatives based on these factors, you are ready for the next step in the journey of
building a visualization platform for big data: initiating a Proof of Concept. More about
learnings from a data visualization Proof of Concept in the next blog.
Data volume, variety, and velocity: it's worth mentioning that QlikView has the flexibility
to be developed in small cycles, so there's an opportunity to develop in lieu of formal
requirements. Although QlikView is well suited to this style of development, I wouldn't
suggest requirements be skipped completely.
Other QlikView functions like document chaining and binary load accelerate the process of
exploring very large data sets. This is the path taken by many QlikView customers when
analyzing terabytes of data stored in data warehouses or Hadoop clusters and similar
archiving systems.
This hybrid approach offers business users the possibility to benefit from Big Data with no
programming skills, and the ability to add context and insight while drilling down to
granular details.
Users require only aggregated or summary data, i.e. hourly or daily averages, or record-level detail over a limited time period.
Users require access to record-level detail stored in a large fact table that will not fit in memory.
QlikView can analyze 50 billion rows' worth of retail data in seconds. In this powerful demo,
both QlikView and ParStream are running in Amazon's cloud. You'll see how QlikView is
able to connect to a ParStream database in real time using Direct Discovery.
Please refer to the link below for the demo app:
https://www.youtube.com/watch?v=xSFbCtV7__8
"Big Data"
The term "Big Data" has been thrown around for several years, and yet it continues to have a
very vague definition. In fact, no two big data installations and configurations are alike
(insert snowflake paradigm here). It's no surprise, given the unique nature of big data, that it
cannot be forced into an abstract model. These types of data systems evolve organically and
morph based on ever-changing business requirements.
If we accept that no two big data systems are alike, how can one deliver analytics from
those systems with a singular approach?
Picking one and only one method of analysis prevents the basic question "What problem is
the business user trying to solve?" from being answered. So what do I mean by picking one
version of analysis?
These solutions have their place, but to pick only one greatly limits a user's ability to
succeed, especially when the limits of each solution are reached.
So how does Qlik differentiate itself from the narrow approaches and tools that exist in
the market?
Simple answer: variety. Qlik is in a unique position, offering a set of techniques and
strategies that allow the widest range of capabilities within a big data ecosystem.
Below are some of the approaches with which Qlik addresses the big data community:
In-Memory Analytics: Get the data you need and accelerate it, which provides a
great solution for concepts such as data lakes. Qlik creates a "Synch and Drink"
strategy for big data. It is fast and powerful, but does not retrieve all the data, which
might be fine given the requirements. Think of it as a water tower for your data lake.
Do you really need 1 petabyte of log data, or maybe just the errors and anomalies over
the last 30 days?
Direct/Live Query: Sometimes you do need all the data, or a large set that isn't
realistic to fit into memory, or latency is a concern; in that case, use Qlik in live query
mode. The catch with this strategy is that you are completely dependent on the source system
to provide speed. This scenario works best when an accelerator (Teradata, Jethro, AtScale,
Impala, etc.) is used as a performance booster. Qlik uses its Direct Discovery
capability to enable this scenario.
API - App on Demand: This is an API evolution of the shopping cart method above,
but embedded within a process or environment of another interface or mashup. This
technique allows Qlik apps to be created temporarily (i.e. as session apps) or permanently,
based on inputs from another starting point. This is an ideal solution for big data
partners or OEMs who would like to build Qlik integration directly into their tools.
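The "water tower" idea from the in-memory approach above can be sketched as a simple filter over a trailing window; the record layout and field names here are hypothetical, invented only for illustration:

```python
from datetime import datetime, timedelta

def water_tower(log_records, now, days=30, level="ERROR"):
    """Keep only the slice worth accelerating in memory: records of a
    given severity from the trailing window, not the whole lake."""
    cutoff = now - timedelta(days=days)
    return [r for r in log_records if r["level"] == level and r["ts"] >= cutoff]

now = datetime(2016, 6, 30)
records = [
    {"ts": datetime(2016, 6, 29), "level": "ERROR", "msg": "disk full"},
    {"ts": datetime(2016, 6, 29), "level": "INFO",  "msg": "heartbeat"},
    {"ts": datetime(2016, 1, 1),  "level": "ERROR", "msg": "old failure"},
]
print(water_tower(records, now))  # only the recent ERROR survives
```

The petabyte stays in the lake; only the recent errors and anomalies are pulled up into memory for fast analysis.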
In summary, to prevent limited interactions with whatever big data system you use, you
need options. Qlik is uniquely positioned in this area due to the power of the QIX engine and
our ELT + Acceleration + Visualization three-in-one architecture. Since no two big data
systems are alike, Qlik offers the most flexible solutions in the market to adapt to any
data scenario, big or small.