
Five Reasons to Offload Analytics from Your Data Warehouse

Sponsored by ParAccel

Speakers: Barry Devlin, Founder and Principal of 9Sight Consulting, and Rick Glick, Vice President for Customer and Partner Development for ParAccel. Moderated by Ron Powell.
Ron Powell: Welcome everyone to our web event, Five Reasons to Offload Analytics from Your Data Warehouse, sponsored by ParAccel. ParAccel is a leading analytic platform provider that helps enterprises create a foundation for running big data analytics anytime, anywhere. I am Ron Powell, Editorial Director and Associate Publisher of the BI Network, a part of TechTarget, and I will be the moderator for the web seminar today. Our presentation features Barry Devlin and Rick Glick. Dr. Barry Devlin is Founder and Principal of 9Sight Consulting. He is also a founder of the data warehousing industry and one of the foremost authorities on business intelligence and beyond. He is a widely respected consultant, lecturer and author, and Barry has more than 30 years of experience providing strategic consulting and thought leadership to enterprises worldwide. Our second presenter is Rick Glick, Vice President for Customer and Partner Development for ParAccel. In his role at ParAccel over the past three years, he has been integral to the strategic direction of the ParAccel platform. Prior to ParAccel he was CTO of database engineering at Teradata. Barry's presentation for this web seminar will look at the changing business and technology landscape, explaining why a repositioning of the enterprise data warehouse is necessary. He will talk about the five main reasons you should consider offloading analytic workloads from your data warehouse, and he will even share four criteria for selecting the right platform for this offloading. We will have time at the end of the web seminar today for questions. Please feel free to submit your questions at any time during the event. If you have a specific question that you would like either Barry or Rick to answer, please note their name. With that, we will get started. It is now my pleasure to introduce Barry so he can begin today's presentation. Barry?

Barry Devlin: Ron, thank you so much. It is a great pleasure to be back here on the BI Network and to be working again with the great folks at ParAccel. My presentation today is entitled Five Reasons to Offload Analytics from Your Data Warehouse, and if we had space on the title page, it would also have said "and if you haven't put them on your data warehouse already, here are five reasons why you shouldn't." Let's move quickly on to the main course, skipping over the details about myself. I will just leave them there for a moment; you can read them later at your leisure. Ron, you have done a wonderful job of introducing me, so I will get going with the next main slide.

So, most of us know and love the original data warehouse architecture, or at least you do if you have been around as long as I have. This data warehouse architecture picture on the right-hand side of the slide dates from the early 90s, and in fact its genesis is even a few years earlier than that. As we note here, it is optimized for hard data, that is to say very structured data, the kind you get in relational databases and, I would say, in spreadsheets too: hard data from internally managed operational sources, operational systems, and that was what we had back then.

That was the world we dealt with, and if you take a look at this picture, the thing you will note is that the data flows into the enterprise data warehouse and flows out again. What it says is that if you have data in your operational systems, you had better think about putting it into your enterprise data warehouse in order to reconcile it, to make it consistent, to make it usable, and then you can spread it out to the data marts where people get at it. That is a very good data management technique and it is great for data governance, but of course it does put a bit of a bottleneck into the architecture. I and many other consultants have been using this picture for many years now, and we say to folks: this is what you need to do, this is a good thing to do, why don't you go and do it? And yet, if we are really honest with ourselves, we also know what is typically implemented today; I call it spaghetti architecture. I am not going to go through it all; it is far too complex to talk about every piece, and I don't really want to get bogged down in it, but let's just think about some of the key ideas that are still there in this picture.

First of all, as I just pointed out, the enterprise data warehouse does remain, in our thinking and in our usage, a primary source of information, although we have expanded it over the years with virtualization and federation-type approaches that enable us to get more immediate access to data, and to get data access in different places. The focus remains on summary and complete data, and looking back over the years, I think we really have improved the consistency and completeness of our data across the enterprise, and that has helped us to think beyond this box where everything has to go through a data staging area and/or an enterprise data warehouse, and even an operational data store. However, there are ongoing and growing issues, and we are probably all aware of them; that is why you are probably here, thinking it would be a good thing to offload analytics from the data warehouse, among other things. There are performance issues. There is the issue of big data, both the volumes and the variety of data that are becoming available and that we want to look at and use in our environment, along with the explosion of data marts and spreadsheets that we have dealt with over the years, though we are not going to say too much about that today. So there are many issues with this picture, and it has been a growing set of issues over the years. I have been thinking about it for probably the last three or four years, looking at it in terms of business and technology, and I have come up with three drivers that cause me to think we need to look at it differently. One is the idea that we are really involved in a world of closed-loop business. Business people I talk to want to link seamlessly from strategy to execution and back again without pause, so that is one issue that asks: do we have time to keep moving this data around and through the enterprise data warehouse?

As mentioned already, point #2 is the massive information volume: the growing volumes and types of information that are coming into use and into our crosshairs for analytics, and that is really the driver that is going to lead us to where we need to go with this presentation. The third point, the social networking piece, collaboration to innovate, is the key driver for the way businesses are going to work over the next few years. So that is the business side of the drivers. What has been happening on the technology side? I am sure Rick is going to get into some areas of this later on: hardware developments, MPP and storage developments, which are moving very fast, with very interesting advances over the last few years in terms of what we can do with data and how fast we can do it. There are advances on the software side in terms of relational database management systems, columnar databases for example, analytics software, and the NoSQL movement; all of that data processing capability is key to moving beyond the data warehouse as the place where we try to do everything. And point #3, which we are not really going to deal with in this session today, is moving beyond information to semantics and knowledge, and I am talking here about Enterprise 2.0 on the social networking side.

So that is what has been driving my thinking, and in 2009 I came up with a picture that is a new layered architecture. It is like the old layered architecture, which had three layers, but here, rather than having three layers of information, I have three layers: one of people, one of process and one of information. I have called this new architecture Business Integrated Insight, and the reason I changed the title but kept the BI in there is that I believe that when we move towards this new architecture, we move away from having separate operational, informational and collaborative platforms and places where we do those things, into a much more consolidated and integrated view of the business. This is about re-architecting IT, not just architecting BI. Briefly, these are the three layers. At the top we have the people layer. This is the personal action domain, where we think about and address all users' intents and actions, and the idea here is to hide the technical and physical aspects of process and information from the people who are actually using them. That is one that is going to develop slowly, because it requires some interesting developments in sociological and psychological thinking, on which we in IT probably have a little way to go yet. The middle layer is process, the business function assembly, which is all the process of the business, and I say all the process of the business, both business and IT, creating, managing and accessing information. Why do I say both business and IT? Because I notice that out there in the world, business folks are beginning more and more to build applications; they know how to build processes, to mash up things, and so on and so forth. That used to be IT work.
So we need a business function assembly which spans all of those processes as well. And finally, down at the bottom, which is of course our focus for today, is the business information resource: all of the information of the business. I heard a sharp intake of breath out there on the internet, and yes, all the information of the business. I mean the content, the transactions, the measurements; operational, informational, collaborative: all of that information in a single virtual layer across all storage types and locations. At the bottom of the slide I point you to a paper from 2009 where I talk about this in more detail, and later on in the slides I have some additional resources you can look at. But we are talking about this business information resource as a virtual layer, and somewhere in the middle of it there is a physical environment you might call the data warehouse, and around that there are other stores and pieces of storage, of various types and in various locations, which you might call analytical platforms, or content stores, or lots of other things. We are now going to focus in on the part that concerns the data warehouse and the analytics.

And now I am going to get into the theme, the promise, of this presentation: why offload analytics from your data warehouse? The first thing we can obviously think about doing is saying, well, you have got all of this analytics in your data warehouse; wouldn't it be good to put it in an environment that is actually optimized for analytics? That is essentially the first answer as to why you should offload: it boosts your overall analytic capabilities. There are new areas of analytics and new analytic functions coming up day by day, and I am not going to go into them in great depth in this presentation, but we can talk about revenue assurance, fraud detection, managing customer churn, marketing campaign optimization, market basket analysis, customer insight. All of these things can stretch a standard relational database. Here we have the possibility, when we think about an analytic platform, to do some of this work on a platform that has been optimized and designed for exactly that. This brings us into the big data area, because we are really talking about data that is beyond the normal scope of traditional data warehouses. I have mentioned briefly measurement or sensor data coming in from RFID chips like the one shown on the slide here, which is just a tiny little thing on the tip of your finger; billions of these have been sold over the last few years, and there are huge volumes of machine data arriving from everywhere in the world today; the Internet of Things is becoming a very common phrase. There is transitory data that we may want to use briefly and throw away, and there is social networking data, from looking at click streams all the way to analyzing Twitter feeds and Facebook pages. All of this information is key for the new types of analytics that all of our businesses want to do.
And I just pose this question to you: of course, if you are Google or eBay, you do know your analytic needs, or at least you probably do, because you have been playing with this for a long time. But it really is a question: do you know your analytic needs? Especially in the mid market, or the mid to high end of the market, these things are going to become more and more important and more interesting as we go forward.

So another reason we might think about offloading analytics is speed: the time to analytic results, and from there to innovation. If we have a powerful analytic platform, we are going to get quick access to data, access to advanced functions and higher query speed, and all of that gives us the ability to go more speedily through iterations of the analysis and get a quicker time to results. On the right-hand side of this slide I am showing something I have called, in the past, the adaptive information cycle. The thought here is that if you think about the way a real business analyst works, they actually go around a cycle a number of times, between the event coming in, or whatever it is that triggers the question of what do I need to look at, and the decision and action of what am I going to do about it. So think about it this way. You get data, and it is somewhere in a database, somewhere in the operational systems or on the network or in the big data, whatever it is; you recall it. The first step is to make it usable and useful: you condition it. The next step, after the top of the circle, is to utilize it, which means let me do some analysis on it, let me understand what's going on, let me pose some questions, let me think about some possibilities. Normally what happens at that stage, rather than going straight to decision and action, is that you discover, hey, I actually need more data, or maybe I need to go talk to my peers, and you go around the loop and you assimilate. That is the part where you go back and think about it again, and then you may find you need to recall more data, you condition it, and you go into the analytics phase again, and then you present it to your manager, and he says, hey Joe, did you think about this or that; is this really what I wanted? So this loop goes around a number of times before you come up with a decision and action.
The interesting thing is that if you can iterate more quickly in your analytics environment, you get more quickly to the innovation and the business answers that you really need, and this is a key point in terms of how we can imagine doing analytics as we go forward, and in terms of having a platform that really supports us.
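The adaptive information cycle can be sketched as a loop. This is a toy illustration of the iteration Barry describes; the stage names follow his recall, condition, utilize and assimilate steps, but the stopping rule and the data are my own invented simplification, not anything from the talk:

```python
def analyze(trigger, rounds_needed=3, max_rounds=10):
    """Iterate recall -> condition -> utilize -> assimilate until a decision."""
    data = [trigger]                          # recall: fetch the initial data
    for round_no in range(1, max_rounds + 1):
        conditioned = sorted(data, key=str)   # condition: make it usable and useful
        insight = len(conditioned)            # utilize: analyze, pose questions
        if round_no >= rounds_needed:         # good enough to act on?
            return round_no, insight          # decision and action
        data.append(round_no)                 # assimilate: go recall more data, loop

    return max_rounds, len(data)

rounds, insight = analyze("customer-churn-event")
```

The point of the sketch is simply that the total time to a decision is the per-pass time multiplied by the number of passes, so a faster analytic platform shortens every trip around the loop.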

Number three is perhaps a simple thought: lowering overall acquisition and maintenance costs. On the left-hand side here we have the reasonably expensive general-purpose relational database hardware and software that we use in data warehouses. We have highly trained DBAs, and we have the extensive modeling that goes on in that environment. I am not saying stop doing it; it is certainly needed for lots of things, but do you need it for everything? I think a lot of the information we want for analytics may not actually need to go through that environment: the hardware, software and people costs that attend the environment on the left-hand side. We have the possibility of using lower-cost analytic platforms, with lower modeling and maintenance costs, and, as we have already mentioned, quicker time to results. So this gives us the possibility of lowering the overall acquisition and maintenance cost, and we all know that the major cost component these days is actually the people cost, the knowledge and skills, when compared to hardware and software platforms.

As we move on to reason number four, we are really beginning to say that if we can move our analytics onto a different platform, or not put it onto the data warehouse platform in the first place, then we can refocus our data warehouse and business intelligence capacity back to what it was originally intended for. Large-scale analytics can be offloaded, while the enterprise data warehouse platform keeps doing what it was optimized for. We built it like a Swiss army knife in many ways, but we optimized it for reconciliation, for cleansing, and for managing and building the history that we know is important for the long-term management of our enterprise information resource, making sure we present a consistent view of the history and the position of the organization to the outside world. Our BI environment and its data are optimized for reporting, for traditional querying and for slicing and dicing, and that is not the same thing as focusing on and optimizing for analytics. What we are seeing here is the traditional long-term tradeoff we have always made between specialization and generalization, and the emergence of analytic platforms and specific analytic functions gives us the ability to do another piece of specialization, moving work off that generalized platform.

And our fifth bullet, our fifth reason for offloading our analytics, is that we can actually extend the life of our data warehouse environment. Because we have taken the analytics off, we can allow the current data warehouse environment to be maintained at its optimum query performance and usage. We can perhaps reduce the ETL load on the data warehouse by routing the analytic data directly to the analytics platform, or by moving the analytic functions to where the data lies, and removing the analytics actually provides growth capacity for the traditional workload on our old, dare I say old, our traditional data warehouse environment. So these are five good reasons for thinking about putting our analytics on a different platform, in a different space.

So the question that then arises is: what does that space look like? What are the considerations for that offload platform? Of course the first one we come to is the question: is it fast enough? I think this is fairly obvious, but I really do want to point out that it is important to think about the speed of the full analytic process. It is not just about the query. We as data warehouse and BI people probably know that, but I bring it to the fore because sometimes we forget it when we move to a new platform. The speed of actually getting the data into the environment, and the speed of design and implementation, are key considerations when we ask whether it is fast enough. So we really need to think about: are we reducing data movement, or getting faster data movement? Are we getting a design and implementation phase that is easier for us to manage? We have to think about our own data. I think this analytic environment and the analytic tools are still at an early stage of development and evolution, and the data that you have and the sort of analytics you need to do are probably pretty specific to your industry, or maybe even to your particular company. So you need to do a proof of concept. You need to figure out whether this is the way you are going to do it, whether it will work for you on this platform, and whether this platform will be fast enough for you. And of course it is not only about performance, not only about speed; it is also about price-performance.

The second consideration in choosing the offload platform is agility. I love the picture of this lady; I have used it so many times, and every time I see it I think, I hope I am going to be able to do that when I am that age. Hey, I can't even do that now. So anyway, how agile is the complete solution? We need to think about how easy it is to deploy new analytics, new tools, new functions, and new ways of analyzing the data. That is very important; as I said, it is an evolving environment. Can we run many scenarios at once? Can we run many advanced functions? How can we bring all this together into an environment which gives our users, our business analysts, the ability to play with the data in the way they want to? Can we bring in the data for analytics when we need it? Is it easy to load, and can we load it quickly? How much agility do you need? Because, like speed, there is a tradeoff here: the more agility you want, probably the more expensive and complex the hardware you need. So it becomes a bit of a tradeoff, but I think in this particular instance it is important to err on the side of more agility, because, as I said, it is very much an evolving environment.

The third point is: can it support the complexity you need? We are really talking here about moving into pretty complex business decisions, and as we go further into the analytic environment, I think we are going to have more and more analytic functions to involve, and we need to move beyond the analytic function itself into questions like: how do I join it into the decision making? How do I make sure I can combine analytics from different areas within my company, or even from outside the company? Can I bring in different kinds of data when I need to? All of those areas of complexity need to be considered as we look at how to choose the platform, and as the little cartoon on the right-hand side says, the simple answer from Ajax is probably not going to do the job.

Finally, the fourth question is: will it scale with you? Can the platform you are thinking about handle any amount of data? Can it go to any depth of analytics, any combination of analytics? Can it support any kind of analyst, and any number of analysts? I am sure Rick will spend some time talking about some or all of these issues as he gets into his part of the presentation. But before we do that, I just want to come back to that architecture picture, to give you a flavor of what that large box at the bottom of the slide might look like, and how we might have a physical implementation of the virtual and extensive architecture I drew earlier in Business Integrated Insight.

So I think this will have a pretty different look from the traditional enterprise data warehouse and data marts. I think what we will see is very powerful personal machines, with significant amounts of storage and memory, where quite a significant amount of data will be processed, managed and used locally; maybe we are talking here about spreadsheets on steroids, and we will need to manage and control that environment, but that is another day's work. Underneath that there is some level of data virtualization, because at the bottom here my new vision says there are going to be many stores where the data lives. In the middle, I think there will still be something that looks like an enterprise data warehouse, but I think it is going to shrink in relative terms compared to what we have today, and I have started to call it core business information, because I think it will focus, as I said, on the core of business information, as opposed to being the warehouse through which everything must pass. In there will be some aspects of the operational data store, and there will be essential aspects of MDM, master data management, and so on, but the idea is that this core business information, if you like, pulls everything together. Around it are a variety of other sources: some of them content sources, some of them e-mail, and some of them the very unstructured data that is growing rapidly in volume and variety. On the other side are analytical data stores, analytical applications and platforms, and data marts, where some of the data, maybe most of the data, doesn't necessarily flow through the enterprise data warehouse anymore, and data virtualization bridges those different information resources, using a very extensive set of metadata.
I think that metadata, and the fact that it is evolving, becomes key to holding the whole process, the whole system, together. Essentially, what this is saying, as I have just discussed, is: move data and function from the enterprise data warehouse to the analytic platforms, and, as I mentioned in passing, don't put it in the enterprise data warehouse in the first place; go straight to the analytic platform for some portion of the data and information that we are going to use in our enterprises in the future.

So, in conclusion, what I have been saying to you is that we are moving into a very different world, a new world in terms of the data needs of the organization, and the drivers for change that we are seeing are really about the need for business intelligence and its operational use. I have mentioned that we have seen an explosion of data sources over the years, an explosion of volumes and variety, and an increase in the demand for timeliness, in development, in execution and indeed in analysis. These drivers for change are really changing the way the entire system looks, and the way the world is going to look as we go forward. I have shown you a new architecture with disparate, distributed data, which I call BI to the power of 2, or Business Integrated Insight, where data is stored in the technology best suited for its storage and accessed through data virtualization technology. This enables the offloading of traditional data warehouse environments to analytic platforms, and what we have focused on in this presentation is the advantages of offloading and the choice of analytic platforms.

I would just like to leave you with some further resources you can go and read: on my website, on the BI Network, on the blog and in various other places, I think even on YouTube, for goodness' sake; talk about moving into the 21st century. And as I leave you with that, I would like to say thank you, and I would like to hand over to Rick and let him take us into some more detail.

Rick Glick: Thanks, Barry, for those very insightful and articulate reasons to offload analytics, and for that view of a new architecture. That was fantastic. What I want to do is talk about our analytic platform and how it might fit into your architecture. Barry gave us a list of reasons, but I would like to start with some of the symptoms we see when we talk to different organizations, the challenges that drive them toward a new architecture. This is an amalgamation of a bunch of different customers and organizations we have talked to. Over and over again, I have this conversation about the amount of pent-up demand people have, and how the IT and existing information infrastructure don't allow the analytics. You see people with six-month, one-year and 18-month backlogs of new applications and new things that they want to do and provide to the business and simply can't get to, and the reason is that the existing platform and architecture aren't actually allowing them to. They are struggling with how to do different kinds of workloads in this monolithic EDW environment of today, how to integrate different kinds of tools, how to pull this all together, and how to provide a different scope and breadth of analytics than is currently possible. And then, on top of that, on top of straining at the traditional things, there is all the new data from social media and other sources, and there are new architectures and new platforms springing up around NoSQL, clouds and the like, which often never talk to the relational folks at all.
After the EDW, in the early 90s, data integration gave you a single version of the truth; now you have another source of data and you have stovepipes all over again. So if you recognize these things in your environment, those are what other folks are recognizing too, and that leads to what Barry was talking about: here are reasons you want to make a change. The goal, then, is to create unconstrained analytics: to enable your users to build new things just in time, to build new applications and new analyses, and to use the right tool for the right job, because not every job requires the same set of tools. Right now we have a rather narrow set of tools and a narrow set of standards within an organization; everything goes into the EDW, and I think that really constrains things and leads to the objections we had before. I think companies need to look at the right levels of governance and the right levels of security for the task at hand; different purposes need different levels of governance and different levels of security. You also need platforms that scale to any degree, can handle any kind of data, and pull different tools together into one environment, so you can have what I often dub a cooperative computing environment. So let us talk for just a minute about the ParAccel platform and how it fits into all of that. It is built on an MPP shared-nothing architecture. Fundamentally that gives you scale: scale for any amount of CPU processing on top and any amount of data underneath, and it scales linearly. What is really special about our platform is that at each level of the architecture, be it CPU, I/O, interconnect or memory, we do best-in-class optimizations: at the I/O level it is columnar, at the CPU level it is compiled execution, and the interconnect knits together commodity fabric with our own private interconnect protocol. On top of that we have a best-in-class SQL optimizer. We speak SQL, the lingua franca of data warehousing analysts, and that gives you an environment with some interesting characteristics: one, scale; two, agility. Barry talked about agility. The platform is schema-agnostic, with no need for ancillary performance structures like indexes and such, which actually create a more fragile and closed environment, not a very agile one.
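The columnar I/O point can be made concrete with a toy sketch. This is an illustration of the general idea, not of ParAccel internals: an analytic query that touches one column reads only that column's values from a column store, while a row store drags every field of every row through I/O.

```python
# A table of 1000 rows with three fields: (id, name, amount).
rows = [(i, "user%d" % i, i * 1.5) for i in range(1000)]

# Row-oriented layout: summing `amount` still reads every cell of every row.
row_cells_read = sum(len(r) for r in rows)

# Column-oriented layout: each column is stored contiguously,
# so the query reads only the `amount` column.
columns = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}
col_cells_read = len(columns["amount"])

total_amount = sum(columns["amount"])
```

With three columns, the column store touches a third of the cells for this query; real columnar engines compound the saving with compression, since values within one column are similar.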
That is the core database, and on top of it we have built an extensibility framework that allows you to add analytics of any nature into the core platform, where they become part of the core platform; they are not things running off in separate processes. It is all tied together with SQL, and the API is such that, while you are writing your analytics, you use database functionality to read through the data, and the parallelism lets you do all kinds of weird and wonderful things from a programmatic point of view. What is important is that you can totally integrate it into the environment and do any kind of SQL analytics you can think of, and it can all live in that environment. One of the things we have done with this environment is what we call on-demand integration, and this really gets to the heart of cooperative computing. We have the ability to reach out, in the context of SQL, and talk to other data sources, be they ODBC data sources, be they Teradata, be they Hadoop, all of them in the context of SQL. You can do Hadoop analytics, say on social media, joined to the market basket data in the retailer, so you can look at both sentiment and buying behavior in the context of a single query and expose that through the BI tools. That is really quite unique: the ability to take data from three, four, five different sources, run one or multiple analytics in the context of a single query, and expose them through the tool set that you have in your environment today.
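The on-demand integration idea can be sketched as a join across two sources in a single step. All the source names, fields and values below are invented for illustration; a real implementation would push SQL down to each system rather than join Python lists in memory:

```python
# Invented source 1: social-media sentiment scores (imagine these
# surfaced from a Hadoop analysis of Twitter feeds).
sentiment = [
    {"customer": "alice", "sentiment": 0.9},
    {"customer": "bob", "sentiment": -0.4},
]

# Invented source 2: market-basket spend from the warehouse.
baskets = [
    {"customer": "alice", "spend": 120.0},
    {"customer": "bob", "spend": 35.0},
    {"customer": "carol", "spend": 60.0},
]

# Join on customer, as a single federated query would, relating
# sentiment and buying behavior in one result set.
score = {s["customer"]: s["sentiment"] for s in sentiment}
joined = [
    {"customer": b["customer"], "spend": b["spend"],
     "sentiment": score[b["customer"]]}
    for b in baskets
    if b["customer"] in score
]
```

The value of doing this inside one SQL context, rather than in application code as here, is that the existing BI tools can consume the combined result directly.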
So that ability to tie your environment together in a very natural and cooperative way is, I think, a key thing that people need to think about, work on and learn to do. And as always, it is better to move the analytics to the data; in-database analytics may seem like a tired phrase by now, but it is important to have a wide variety of analytic functionality integrated into the database, and a growing library, one that you can expand on your own, always building your own stuff on top of a library of parallelized analytics. So this is just what one company looked at. When they were contemplating going down an analytic offload strategy they did a POC, as everybody does, and these were some of the results. Now, to be honest, these results are quite narrow in some respects. In this particular POC it was 12 nodes of ParAccel versus 20 nodes of the other platform, same generation of hardware, so these numbers aren't scaled appropriately. The 20 nodes running a complex ad hoc query were going against market-basket detail data, segmentation data and other things; if you scaled that workload so that the platforms were equal size, instead of 18x it would be closer to 30x (18 scaled by the 20/12 node ratio). Better performance, speed, agility: that is all important, and the data was simply loaded in and the queries turned around. The other result the customer said we should focus on was the 4x; scaled appropriately, that would be closer to what we would call 16x better. What is important is that all of this was done with an existing application; the BI tool was simply pointed at a different platform and nothing else changed, and out of the box this delivered the performance improvement. But the really interesting thing is that the current analytics were actually constrained to one quarter's worth of data, and another set of analytics was constrained to 10% of the data. A platform designed for analytics doesn't have those constraints, so we ran against the complete year's worth of history, which the other platform couldn't handle at all. If a workload just doesn't successfully run on your platform, that platform is definitely the wrong tool for complex analytics, and that is where we can come in with the right tools for the job. Anyway, in summary I would just like to say: right tools for the right job is an important consideration, and think of it not as simply one tool but as an environment where the tools play well together and cooperate, both from a data perspective and from an analytics perspective. With that I would like to turn it back over to Ron for questions and answers. Thanks. Ron Powell: Great, Rick. Thank you so much, Barry and Rick, for great presentations. It is now time to move to our Q&A, and again, as I mentioned earlier, if you have a question for either Barry or Rick, please include the name at the front of the question. So our first question is: we already do most of our analytics in analytic tools and business intelligence tools; how much work is involved in offloading to an analytics platform? I want to start with you, Rick.
Rick Glick: Well, if you have the right ability to talk to and work with the other tools in the environment, it should actually be really straightforward to bring in a new platform and turn it on within a matter of weeks, if not days, and start getting value back, especially in ad hoc kinds of analytic environments where you are discovering things. Different applications, reports and processes will take a variable amount of time. Ron Powell: Great. Barry, a question for you: are you killing the EDW? Barry Devlin: Killing my baby? No, I am not. I just think that the baby needs to grow up a bit and get a bit more focused. I think what has happened over the years is that the EDW has become a sort of dumping ground for an awful lot of stuff, probably because there was nowhere else to put it in the olden days. Now there are other, better places, so I think the sorts of applications that Rick was talking about, and that I mentioned earlier, belong on the specialized tools that can handle them. And when you talk about content, I really don't think we want to be bringing all of our text information into the EDW. So it becomes much more about focusing the EDW on the thing that it is for, and keeping in mind the reason it was thought up in the first place. I was there, so I know: it was really about those inconsistencies. It was getting a place where everything that needed to come together came together, not everything that ever existed. Ron Powell: Rick, are there certain types of workloads that you recommend offloading to an analytics platform, and how do you determine what to offload? Rick Glick: Yes, there are.
I think the simple answer is that in every enterprise warehouse, on every platform, you will see a set of workloads that are working really well, and then there are others that are more complex and take a disproportionate amount of resources compared to the rest. Highly complex, unpredictable workloads, the ones that put large strains on the system and force you to jump through hoops to manage your enterprise warehouse environment, are the things you want to offload. And I just think it is important to put out there that there is a whole bunch of things people simply aren't doing today because of constraints in existing technology. So there is an offload approach, and then, very much in this space as well, there are things that people just simply don't do. I wouldn't ask just what do we move, but also what do we want to do that we haven't been able to do with the existing technology. Ron Powell: Right, I totally understand that. And Barry, this is a question that I get also, and I would love to hear your answer as an expert in big data: could you give us an idea of where most companies are today? Barry Devlin: Well, you know, I am isolated down here in Cape Town, so I can make some interesting guesses about the rest of the world. My feeling is that we are a lot further behind where the hype would make us think we are. For sure, the large web denizens, as I sometimes call them, the Googles and the eBays and the Twitters, and some of the smaller ones as well, have been doing it for years now; they have been handling terabytes of data and really making it go. I think it is beginning to move into the larger retail organizations, typically the people that got involved in data warehousing first in the early days: people with significant amounts of customer information and customer knowledge who believe more of it is to be found on Twitter or on Facebook or whatever, and they see that coming in.
The next wave, I think, is going to be around sensor data, and I think it is starting already and still growing. Sensor data is going to start coming in, in large quantities and large volumes, and that is going to affect not just the business intelligence environment and the analytic environment but the operational environment as well. That is where, probably more unexpectedly, a lot of the growth in analytics is going to come from, because at the moment everybody is really making noise about the social networking aspects, who said what on Twitter and when, and is it good stuff or bad stuff. But consider the volumes of data being generated, for example, by a 747 with four engines in flight; it is terabytes per minute, as far as I understand it. If that stuff is really going to be analyzed in depth, then we have a huge analytics possibility which is coming but is still not there. And then we get down to the small and medium businesses, where I think it is going to be much smaller. Well, big data is big data, and it is defined relative to the amount of data you are currently using, so everything eventually becomes big data; but in true volume terms, I don't think it is really going to get down to the smaller businesses. Ron Powell: Intriguing, intriguing. Next question. Rick, how quickly can we offload to a ParAccel environment and realize the benefit from an operational perspective?
Rick Glick: Well, again, it depends on what exactly you are trying to accomplish. If you have the hardware in place, you can stand up ParAccel in a matter of hours, and in an environment where the on-demand integration technology I briefly spoke about is in place, within a day or two you can be starting to offload at least the ad hoc workload and populating data into ParAccel. Other workloads will take longer; it depends on deploying things in different places and moving processes, which is more internal BI and IT kind of work, and those go on more traditional time frames. Ron Powell: Interesting, interesting. You know, this next question goes along the same lines, and I think it relates to the agility you talked about a little earlier, Barry. We hear a lot today about analytic sandboxes; are you seeing this as a use case for data offloading? Barry, do you want to go at it first, and then we will ask Rick? Barry Devlin: Yeah, sure. This sandboxing idea has been around for a long time, and I know that folks at Teradata have championed it for many years. I think it is an excellent use of the data and the processing strength and functions that are available on analytics platforms, and it is really important to keep in mind that sandboxing really is about bringing in data from different sources. Rick pictured this in his presentation, with the data coming from both Hadoop and Teradata into the analytics platform, and that gives an example of the sort of thing that becomes very powerful when we want to do sandboxing. But I think we have to be careful about it. It is really important to remember that this is powerful stuff, with powerful conclusions that one can draw from it, so you have to be a little bit careful about who gets to do the sandboxing. Giving our ordinary business users the sort of power that these analytics playrooms can offer might be considered something like giving Kalashnikovs to school kids. You have to be very careful about who you give it to and how you manage that process, because you can do a lot of damage without knowing it; these analytic functions are statistical, and they are not intuitive by nature, so we have to be careful about who gets to play with these particular toys. Ron Powell: Rick, any comments on the use of a sandbox for offloading as well? Rick Glick: Absolutely. The idea has been around for a long time, and I think there are actually a couple of things that make it easier to think about today. After all, when you want to do a sandbox, you have to carve out space and resources; in order to play in your sandbox there has to be storage and CPU and all of those sorts of things. One of the notions beginning to address that is the cloud, managing your resources across a greater set of things. The one thing that has historically been missing from sandboxes is the ability to very, very quickly provision new resources, so I am actually starting to be a big fan of notions around the private cloud, partly so that you can take new studies or new projects and spin them up and provision them very quickly. I think there is a platform evolution going on that is starting to put more meat behind the notion of analytic sandboxes. Ron Powell: Well, that is very good, and I know we are getting near the end here; I have one last question for both of you.
You know, obviously, when we talk about something new, and offloading data and analytics is a fairly new area for some companies, what do you see as some major issues in moving towards an offloaded analytics environment, Barry? Barry Devlin: Yeah, the interesting thing is that when you look at the possibilities, and I think Rick put it so nicely talking about private clouds, there are so many technology options becoming available for doing all of this sort of stuff that the main issue I see is for IT to get their hands around what is going on. There is the possibility here, I think, of the spreadsheets problem, the spreadmart problem that Wayne Eckerson wrote about, what was it, 6 or 7 years ago, becoming an even bigger issue, because the tools have become so powerful and the storage has become so huge that people are running terabyte databases under their desks these days. And with the analytics platforms, and I am guessing, maybe it does, but I am guessing that ParAccel doesn't sit under a desk just now, the more machines you get out there with more disks, and the more possibilities of having yet more copies of data, the data management issues are the ones I worry about. Rick Glick: I agree with Barry, and to put a slightly finer point on it, it is the manageability, the governance, and doing the right amount of those things. You have all of these tools and all of these different data sources, and you no longer have one big data dictionary; knowing how to manage that and keep control over it is absolutely the biggest issue right now. We are starting to get wonderful tools, and they are starting to play together nicely, not as well as they should, but beginning to; but I don't think things like metadata and governance are keeping pace, especially in ways that do not get in the way of empowering users.
Barry Devlin: We always end up with the joy of metadata at the end of these presentations, don't we? Rick Glick: That's a good point; I have never heard it put that way, Barry. But you know, the other thing, when you look at big data and you look at the data warehouse, this analytics problem, and Barry, I would love your comment on this, has really been a pain point right now for most enterprises: trying to do analytics alongside the rest of their data warehouse environment. This seems like a fairly straightforward solution to help overall data warehouse performance. Do you concur with that? Barry Devlin: Absolutely. In terms of what we have seen emerging in technology over the last two years, and I think ParAccel has been well in the forefront of this particular wave, we have seen the ability to do things that were not possible before, things that were put on the back burner because we hadn't got the power and hadn't got the ability to do them overnight, or even in an hour, or whatever it was we needed. In terms of having the technology it is a no-brainer, and as we sort of talked about, it brings a new set of problems that need to be dealt with, and I think that is going to be the interesting thing we have to deal with over the coming years. I have no doubt about what we are going to see: it is not going to be just ParAccel, I am sure; we are going to see people moving significant amounts of data and significant amounts of processing onto these independent platforms. You know, I remember in my IBM days we were worried about Teradata, we were worried about the fact that data was moving off the mainframe, and I think we are going through the same thing again. Now we have the data warehouses, the Oracles, the Teradatas, the IBMs, whatever, and there is a set of new kids on the block with faster performance and better price-performance points and so on, and the data and the functions are going to move there. We just have to figure out how to manage it; there is no point in saying don't do that, because it is just going to happen. Rick Glick: Right, and the analytics have to be done in seconds; we cannot wait minutes or hours for analytic responses. It just isn't going to work in the new world today.
Ron Powell: Well, this concludes our web event today. I want to thank both Barry and Rick for their insightful presentations, and I would like to thank all of you for joining today. I want to make you aware that the web seminar is available on Bitpipe for replay, for any of your associates who were not able to make it to the web seminar today. If they would like to view it, or if you would like to see it again, please go to Bitpipe and the BI Network to replay it. Thank you very much for attending.
