Abstract
Thrift [1] is an open source library that expedites the development and implementation of efficient and scalable back-end services. Its lightweight framework and support for cross-language communication make it more robust and efficient than SOA-style alternatives (REST/SOAP) for many operations. However, Thrift's capabilities are challenged by emerging enterprise solutions like Big Data: because Thrift hosts only one service per port, an enterprise hosting multiple services over the network incurs high maintenance and administrative overheads. This paper addresses the challenge and details the approach that Impetus has devised to enhance Thrift and enable it to meet enterprise expectations.
Table of Contents
Introduction
What's so special about Thrift?
Thrift is powerful, yet lacks the prowess
Adding charm to the glorious API through multiplexing
    The approach
    Components
How to use thrift multiplexing
    Creating a multiplexing server with a lookup registry
Making a wise investment lucrative
Summary
Introduction
Thrift is a very lightweight framework for developing and accessing remote services that are highly reliable, scalable and efficient in communicating across languages. The Thrift API is extensively used across various enterprises for creating services like search, logging, mobile, ads, and the developer platform. The services of various Big Data open source initiatives like HBase [6], Hive [7] and Cassandra [8] are hosted on Thrift. Its simplicity, versioning support, development efficiency, and scalability make it a strong contender in the SOA market, helping it to compete successfully against more established integration approaches and products. Thrift can support a large number of functions for each service, communicating across languages. This capability can be further enhanced by extending Thrift to host multiple services on each server. In this white paper, we look at how the capabilities of Thrift can be enhanced to make optimum use of enterprise resources. We also present a framework that enables the creation of a server hosting multiple services, the registration of services, and the lookup of services based on a standard context.
Fig 1.1: Option 1 - Write a monolithic, unwieldy implementation and host it as a single service
If an enterprise follows the first option (see fig 1.1), the monolithic and unwieldy implementation raises the development cost of the solution, since its complexity keeps growing with the addition of every new service. Return on Investment (ROI) is also adversely affected by high maintenance overheads.
If an enterprise opts for the second option, the number of ports consumed for hosting multiple services will be high. Since ports are a limited enterprise resource that needs to be used judiciously, this poses a serious concern. This option is therefore challenged by high administrative and maintenance overheads. Also, to avoid the overhead of setting up a connection on each call, clients have to maintain many connections (at least one to each port). With the addition of every new service, a new port has to be opened on the firewall. The advantage of Thrift's flexible design is thus offset by high administrative overheads.
The approach
The baseline approach is to assign a symbolic name to each service, referred to as the 'service context' in this paper. This lets us host multiple services on each server, where each service is recognized by its respective service context. A client using the lookup service should be able to fetch the appropriate service context and use it to direct the service call to the respective servant.
Components
The solution extends the Thrift API [version 0.9.0] with the new components described below (highlighted with red boundaries in fig 1.3).

Multiplexer
The multiplexer is the processor at the heart of this solution. It acts as a server-side request broker and is responsible for identifying the service that the client has requested, based on the service context propagated by the client. It maintains a mapping between service contexts and services. While processing a request, it reads the service context from the underlying protocol and, based on the mapping, directs the request to the appropriate service.
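The routing idea behind the multiplexer can be sketched in plain Java without any Thrift dependency. This is a minimal illustration, not the Impetus implementation: `ServiceHandler`, `SimpleMultiplexer` and the context strings are hypothetical names, and a request is reduced to a string so that only the context-to-service mapping is shown.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a service processor: takes a request, returns a reply. */
interface ServiceHandler {
    String handle(String request);
}

/** Minimal request broker: keeps a mapping from service context to handler. */
class SimpleMultiplexer {
    private final Map<String, ServiceHandler> handlersByContext = new HashMap<>();

    /** Associates a service context with the handler that serves it. */
    void register(String serviceContext, ServiceHandler handler) {
        handlersByContext.put(serviceContext, handler);
    }

    /** Directs the request to the handler registered for the given context. */
    String process(String serviceContext, String request) {
        ServiceHandler handler = handlersByContext.get(serviceContext);
        if (handler == null) {
            throw new IllegalArgumentException(
                    "No service registered for context: " + serviceContext);
        }
        return handler.handle(request);
    }
}

class MultiplexerDemo {
    public static void main(String[] args) {
        SimpleMultiplexer mux = new SimpleMultiplexer();
        // Two services share one broker; the context strings are illustrative.
        mux.register("hr", request -> "HR handled " + request);
        mux.register("finance", request -> "Finance handled " + request);
        System.out.println(mux.process("finance", "getTaxDetails"));
    }
}
```

In the real component the context would be read from the incoming message rather than passed as an argument; passing it explicitly keeps the sketch focused on the lookup-and-dispatch step.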
Protocol
We have made our solution transport and protocol agnostic. We have created a wrapper around the underlying protocol (any TProtocol instance) that embeds the service context in the message on the client side and extracts it on the server side. To this end, we have added a new class, TMultiplexProtocol, as a wrapper around the existing TProtocol that overrides the behavior of the writeMessageBegin(TMessage) and readMessageBegin() methods. Any client that has to communicate with the multiplexer needs to wrap the underlying protocol in a TMultiplexProtocol instance.
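The wrapping technique can be illustrated with a small, self-contained sketch. It assumes, purely for illustration, that the service context is prefixed to the message name with a `:` separator; the paper does not state the actual encoding used by TMultiplexProtocol. `MessageProtocol`, `InMemoryProtocol`, `ContextProtocol` and `RoutedMessage` are hypothetical names that model only the message header.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, much-reduced protocol: only the message name is modeled. */
interface MessageProtocol {
    void writeMessageBegin(String messageName);
    String readMessageBegin();
}

/** In-memory stand-in for any concrete protocol implementation. */
class InMemoryProtocol implements MessageProtocol {
    private final Deque<String> wire = new ArrayDeque<>();

    public void writeMessageBegin(String messageName) {
        wire.addLast(messageName);
    }

    public String readMessageBegin() {
        return wire.removeFirst();
    }
}

/** Client-side wrapper that embeds the service context in the message name. */
class ContextProtocol implements MessageProtocol {
    static final String SEPARATOR = ":"; // assumed encoding, not from the paper

    private final MessageProtocol delegate;
    private final String serviceContext;

    ContextProtocol(MessageProtocol delegate, String serviceContext) {
        this.delegate = delegate;
        this.serviceContext = serviceContext;
    }

    public void writeMessageBegin(String messageName) {
        delegate.writeMessageBegin(serviceContext + SEPARATOR + messageName);
    }

    public String readMessageBegin() {
        return delegate.readMessageBegin();
    }
}

/** Server-side view of a message header split into context and method name. */
class RoutedMessage {
    final String serviceContext;
    final String messageName;

    private RoutedMessage(String serviceContext, String messageName) {
        this.serviceContext = serviceContext;
        this.messageName = messageName;
    }

    static RoutedMessage parse(String wireName) {
        int index = wireName.indexOf(ContextProtocol.SEPARATOR);
        if (index < 0) {
            throw new IllegalArgumentException("No service context in: " + wireName);
        }
        return new RoutedMessage(
                wireName.substring(0, index),
                wireName.substring(index + ContextProtocol.SEPARATOR.length()));
    }
}

class ProtocolDemo {
    public static void main(String[] args) {
        MessageProtocol wire = new InMemoryProtocol();
        new ContextProtocol(wire, "finance").writeMessageBegin("getTaxDetails");
        RoutedMessage message = RoutedMessage.parse(wire.readMessageBegin());
        System.out.println(message.serviceContext + " / " + message.messageName);
    }
}
```

Because the wrapper only touches the message header and delegates everything else, the same approach works over any underlying protocol, which is what makes the solution protocol agnostic.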
Registry and Lookup
To reduce the overhead of managing service contexts manually, we have created a registry component that manages information about all services hosted on a particular server. This component is hosted as one of the services on the underlying multiplexer and can be queried by the client on TMultiplexerConstants.LOOKUP_CONTEXT to obtain information about the hosted services.

The TRegistry interface is the basic client API for querying the lookup registry. It provides lookup methods for querying the registry by service context, service name and regular expression. It also lets users check whether a service context exists and list all service contexts available in the registry.

TRegistryHelper is the interface for the server API, which the server uses for binding, rebinding and unbinding service contexts with the lookup registry. We have provided one basic implementation of the registry API, TRegistryBase, which manages service contexts in memory. This component can be extended to override the default behavior, based on specific needs, and can be used along with the factory class. TRegistryClientFactory is the factory class for creating the registry client that facilitates remote lookup of the registry.

Service Information
The solution uses the URIContext class to represent information about the services hosted on a particular server. This object can be transmitted across the network and hence can be accessed remotely by the client. In the present solution it captures the service context, service name and description.

Multiplexer-extension for lookup
On its own, the multiplexer is capable of hosting multiple services. However, managing service information is an overhead for the client as well as the server administrator. To reduce this overhead, we have introduced a registry component that manages service information. To combine the capabilities of the multiplexer and the registry in a single processor, we have introduced a new processor, TLookupMultiplexer, which hosts multiple services along with an additional lookup service based on the registry. The processor creates an instance of the registry with all service information and exposes it as an additional service to clients. This enables clients to query the registry using the registry API and to access the underlying service using the service context obtained from the query.

Server
We have introduced a new abstract server, TMultiplexingServer, which is capable of hosting any server implementation on any transport and any protocol, using TLookupMultiplexer. This class hides the underlying complexity of object creation and exposes two abstract methods, viz. getServer and configureMultiplexer, to be implemented by any class extending it. The class lets a user specify the server transport and protocol at the time of server object creation, thus providing an additional degree of flexibility when hosting the same server with multiple services on different transports and protocols, with no additional coding effort. TMultiplexingServer internally wraps the TServer instance, allowing server startup and shutdown to be managed as required.

Source Code
We have extended the Thrift Java library [version 0.9.0] and added a new source folder named ext that contains the implementation of the multiplexing components. Also, build.xml has been amended to compile the existing and extended source code.
Compatibility of the solution has additionally been tested with the present stable version 0.8.0 of Thrift for seamless integration. To use the multiplexing capability of Thrift, one has to download or pull the source code of the extended Thrift library [9] from GitHub and run the ant command on the downloaded Thrift Java library. This generates libthrift-xxx.jar in the build folder, which developers can then use for creating their enterprise solutions.
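To make the registry component described under Components more concrete, the following is a minimal in-memory sketch in plain Java. It is not the TRegistryBase implementation: the class names, method names and semantics are assumptions chosen for illustration. In particular, it assumes that bind fails when a context is already bound while rebind replaces the binding, and that the regular expression lookup is matched against the service name.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical, reduced stand-in for the service information object. */
final class ServiceInfo {
    final String context;
    final String name;
    final String description;

    ServiceInfo(String context, String name, String description) {
        this.context = context;
        this.name = name;
        this.description = description;
    }
}

/** In-memory registry sketch; names and semantics are assumptions. */
class InMemoryRegistry {
    private final Map<String, ServiceInfo> byContext = new LinkedHashMap<>();

    /** Registers a new context; fails if the context is already bound. */
    void bind(ServiceInfo info) {
        if (byContext.containsKey(info.context)) {
            throw new IllegalStateException("Context already bound: " + info.context);
        }
        byContext.put(info.context, info);
    }

    /** Registers a context, replacing any existing binding. */
    void rebind(ServiceInfo info) {
        byContext.put(info.context, info);
    }

    /** Removes a binding; returns true if one existed. */
    boolean unbind(String context) {
        return byContext.remove(context) != null;
    }

    boolean exists(String context) {
        return byContext.containsKey(context);
    }

    Set<String> listContexts() {
        return new LinkedHashSet<>(byContext.keySet());
    }

    /** Returns the service bound to the context, or null if there is none. */
    ServiceInfo lookup(String context) {
        return byContext.get(context);
    }

    Set<ServiceInfo> lookupByName(String name) {
        Set<ServiceInfo> matches = new LinkedHashSet<>();
        for (ServiceInfo info : byContext.values()) {
            if (info.name.equals(name)) {
                matches.add(info);
            }
        }
        return matches;
    }

    /** Returns services whose name fully matches the regular expression. */
    Set<ServiceInfo> lookupByNamePattern(String regex) {
        Pattern pattern = Pattern.compile(regex);
        Set<ServiceInfo> matches = new LinkedHashSet<>();
        for (ServiceInfo info : byContext.values()) {
            if (pattern.matcher(info.name).matches()) {
                matches.add(info);
            }
        }
        return matches;
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        InMemoryRegistry registry = new InMemoryRegistry();
        registry.bind(new ServiceInfo("hr", "HumanResource_Service", "HR operations"));
        registry.bind(new ServiceInfo("fin", "Finance_Service", "Finance operations"));
        for (ServiceInfo info : registry.lookupByNamePattern(".*_Service")) {
            System.out.println(info.context + " -> " + info.name);
        }
    }
}
```

A lookup by name returns a set rather than a single entry because nothing in this sketch prevents two contexts from sharing a service name; a client can check the size of the result before using a context.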
Step 2: Optionally override the default constructor to accept the server transport and protocol.

    public Server1(T serverTransport, F protFactory) {
        super(serverTransport, protFactory);
    }

Step 3: Implement the configureMultiplexer() method to configure the lookup multiplexer. As part of this configuration, one has to create a list of MultiplexerArgs that captures the services to be hosted on the server and their respective service information. In the example below, we host the HR and Finance services on Server1.

    @Override
    protected List<MultiplexerArgs<URIContext, TProcessor>> configureMultiplexer() {
        // list of multiplexer arguments
        List<MultiplexerArgs<URIContext, TProcessor>> args =
                new ArrayList<MultiplexerArgs<URIContext, TProcessor>>();

        // configuring HR service context
        TProcessor processor = new HRService.Processor<HRServiceImpl>(new HRServiceImpl());
        URIContext context = new URIContext(Constants.HR_CONTEXT, "HumanResource_Service");
        MultiplexerArgs<URIContext, TProcessor> arg =
                new MultiplexerArgs<URIContext, TProcessor>(processor, context);
        args.add(arg);

        // configuring FIN service context
        processor = new FinanceService.Processor<FinanceServiceImpl>(new FinanceServiceImpl());
        context = new URIContext(Constants.FIN_CONTEXT, "Finance_Service");
        arg = new MultiplexerArgs<URIContext, TProcessor>(processor, context);
        args.add(arg);

        return args;
    }

Step 4: Implement the getServer() method to create an instance of the desired server. In the example below, we create an instance of TThreadPoolServer using the arguments.

    @Override
    protected TServer getServer(TServerTransport serverTransport,
            TProtocolFactory protFactory, TProcessor processor) {
        // creating server args
        Args serverArgs = new Args(serverTransport);
        serverArgs.protocolFactory(protFactory);
        serverArgs.transportFactory(new TTransportFactory());
        serverArgs.processor(processor);
        serverArgs.minWorkerThreads = 1;
        serverArgs.maxWorkerThreads = 5;

        // creating server instance
        return new TThreadPoolServer(serverArgs);
    }

Step 5: Create an instance of the server class, using the appropriate transport and protocol, and start the server.

    public static void main(String[] args) {
        // identifying server transport
        TServerSocket SERVER1_TRANSPORT = new TServerSocket(Constants.SERVICE1_PORT);

        // identifying server protocol
        Factory SERVER1_FACTORY = new TBinaryProtocol.Factory();

        // creating server instance for the specific transport and protocol
        Server1<TServerSocket, TBinaryProtocol.Factory> server1 =
                new Server1<TServerSocket, TBinaryProtocol.Factory>(SERVER1_TRANSPORT, SERVER1_FACTORY);

        // starting server
        server1.start();
    }

Creating a client for querying the registry and using the service context
A client for querying the multiplexing server's registry can be obtained from the org.apache.thrift.registry.TRegistryClientFactory class. TRegistryClientFactory is a convenience class that provides multiplexing client instances. On the client side, one can use the static method getClient(..) of this factory to obtain the registry client. The client can then be used to query the registry and identify the appropriate service for processing the request. The example code below shows a client that retrieves the tax details of an employee using the finance service:

    public double getTaxDetails(int empId) {
        TTransport transport = null;
        TProtocol protocol = null;
        try {
            // transport
            transport = new TSocket(Constants.SERVICE_IP, Constants.SERVICE1_PORT, 60);

            // multiplexing protocol
            protocol = Factory.getProtocol(new TBinaryProtocol(transport), TConstants.LOOKUP_CONTEXT);

            // procuring registry client
            TRegistry client = TRegistryFactory.getClient(protocol);

            // opening transport
            transport.open();

            // querying registry to get context
            Set<URIContext> contexts = client.lookupByName("Finance_Service");

            // executing the request on the appropriate service using the context
            if (contexts.size() == 1) {
                URIContext uricontext = contexts.iterator().next();
                protocol = new TMultiplexProtocol(new TBinaryProtocol(transport), uricontext.getContext());
                com.service.FinanceService.Client finService =
                        new com.service.FinanceService.Client(protocol);
Summary
In recent times Thrift has emerged as a powerful technology for communicating across programming languages in a reliable and efficient manner. Enterprises dealing with Big Data and other advanced technologies can use the Thrift solution to host multiple services on the network by efficiently utilizing enterprise resources, at low maintenance costs.

References
[1] http://thrift.apache.org/
[2] http://avro.apache.org/
[3] http://msgpack.org/
[4] http://code.google.com/p/protobuf/
[5] http://bsonspec.org/
[6] http://hbase.apache.org/
[7] http://hive.apache.org/
[8] http://cassandra.apache.org/
[9] git://github.com/impetus-opensource/thrift.git
About Impetus
Impetus Technologies offers Product Engineering and Technology R&D services for software product development. With ongoing investments in research and application of emerging technology areas, innovative business models, and an agile approach, we partner with our client base comprising large scale ISVs and technology innovators to deliver cutting-edge software products. Our expertise spans the domains of Big Data, SaaS, Cloud Computing, Mobility Solutions, Test Engineering, Performance Engineering, and Social Media among others.

Impetus Technologies, Inc.
5300 Stevens Creek Boulevard, Suite 450, San Jose, CA 95129, USA
Tel: 408.213.3310 | Email: inquiry@impetus.com
Regional Development Centers - INDIA: New Delhi, Bangalore, Indore, Hyderabad
Visit: www.impetus.com
Disclaimers
The information contained in this document is the proprietary and exclusive property of Impetus Technologies Inc. except as otherwise indicated. No part of this document, in whole or in part, may be reproduced, stored, transmitted, or used for design purposes without the prior written permission of Impetus Technologies Inc.