
INFORMATION TECHNOLOGY, ELECTRONICS & COMMUNICATIONS

DEPARTMENT

GOVERNMENT OF ANDHRA PRADESH


HYDERABAD

REQUEST FOR PROPOSAL


FOR
SELECTION OF SYSTEM INTEGRATOR
OF
DATALYTICS PACKAGE - e-Pragati IN ANDHRA PRADESH STATE
Volume I

August 2016

Andhra Pradesh Technology Services Limited


4th Floor, B-Block, BRKR Bhavan
Tank bund Road, Hyderabad – 500 063

e-Pragati Requirements Specification Document – DataLytics Page 1 of 101


Proprietary & Confidential

No part of this document can be reproduced in any form or by any means, disclosed or
distributed to any person without the prior consent of APTS except to the extent required
for submitting bid and no more.



RFP Structure

This RFP invites proposals from interested companies capable of delivering the
services described herein. The content of this RFP has been documented as a set of three
volumes, explained below:

Volume I: Functional, Technical and Operational Requirements


Volume I of RFP intends to bring out all the details with respect to solution and other
requirements that Information Technology Electronics and Communication (ITE&C)
department deems necessary to share with the potential bidders. The information set out in
this volume has been broadly categorized as Functional, Technical, and Operational covering
multiple aspects of the requirements.

Volume II: Instructions to Bidders, Scope of Work and Financial and Bidding Terms & Forms

Volume II of RFP sets out everything that potential bidders may need in order to
understand the Terms & Conditions, project implementation approach, commercial terms
and bidding process details.

Volume III: Contractual and Legal Specifications


Volume III of RFP is essentially devoted to explaining the contractual terms that ITE&C
department wishes to specify at this stage. It also includes a draft of the Master Services
Agreement.

This document is Volume I

Kindly note that all volumes of the RFP have to be read in conjunction, as there are
cross-references between sections in these volumes. The selected System Integrator will be solely
responsible for any gaps in scope coverage caused by not referring to all three volumes.



e-Pragati Requirements Specification Document
DataLytics
Table of Contents
1. E-PRAGATI PROGRAM REQUIREMENTS
1.1. OVERVIEW OF E-PRAGATI PROGRAM
1.2. VISION
1.3. VALUE PROPOSITION OF E-PRAGATI
1.4. COMMON MANDATORY STANDARDS OF E-PRAGATI
2. SYSTEM OVERVIEW OF DATALYTICS
2.1. OVERVIEW
2.1.1. BUSINESS DRIVERS
2.1.2. OBJECTIVES
2.1.3. STAKEHOLDER ANALYSIS
2.1.4. FACTSHEET
2.2. BUSINESS ARCHITECTURE
2.2.1. USER SCENARIOS
2.3. APPLICATION ARCHITECTURE
2.3.1. LOGICAL VIEW
2.4. TECHNOLOGY ARCHITECTURE
2.4.1. DEPLOYMENT VIEW
2.4.2. NETWORK VIEW
2.4.3. DATALYTICS SECURITY VIEW
3. SCOPE OF WORK OF SI SELECTED FOR DATALYTICS
3.1. OVERVIEW OF SCOPE OF WORK
3.2. DETAILED SCOPE OF WORK
3.3. DELIVERABLES
3.4. IT SERVICE DELIVERY
4. TRAINING AND CHANGE MANAGEMENT
4.1. INDICATIVE TRAINING REQUIREMENTS
4.2. TRAINING DELIVERABLES
4.3. TRAINING RESPONSIBILITY & DURATION
4.4. TRAINING NEED ANALYSIS
4.5. CHANGE MANAGEMENT



Annexures
ANNEXURE 1 – FUNCTIONAL REQUIREMENTS OF DATALYTICS
ANNEXURE 2 – TECHNICAL REQUIREMENTS OF DATALYTICS
ANNEXURE 3 – NON-FUNCTIONAL REQUIREMENTS OF DATALYTICS
ANNEXURE 4 – BILL OF MATERIALS OF DATALYTICS
ANNEXURE 5 – SOFTWARE TESTING & QUALITY ASSURANCE



List of Figures
FIGURE 1: LOGICAL VIEW OF DATALYTICS SYSTEM
FIGURE 2: DEPLOYMENT ARCHITECTURE VIEW
FIGURE 3: NETWORK ARCHITECTURE VIEW



List of Tables
TABLE 1: PRINCIPLES OF E-PRAGATI
TABLE 2: E-PRAGATI COMMON MANDATORY STANDARDS
TABLE 3: DATALYTICS FACTSHEET
TABLE 4: DATALYTICS USER SCENARIOS
TABLE 5: COMPONENTS OF DATALYTICS DEPLOYMENT ARCHITECTURE
TABLE 6: NETWORK ARCHITECTURE COMPONENTS
TABLE 6: BROAD SCOPE OF WORK
TABLE 7: DATALYTICS PACKAGE DELIVERABLES
TABLE 8: INDICATIVE TRAINING REQUIREMENTS
TABLE 9: TRAINING DELIVERABLES
TABLE 10: DATALYTICS TRAINING REQUIREMENTS
TABLE 11: CHANGE MANAGEMENT REQUIREMENTS
TABLE 12: CHANGE MANAGEMENT – SPECIAL CONSIDERATIONS
TABLE 13: SERVICE CATEGORIES
TABLE 14: USER CATEGORIES
TABLE 15: ESTIMATED DATA GROWTH
TABLE 16: QPS ESTIMATE
TABLE 17: QUERY RESPONSE TIMES
TABLE 18: CONCURRENT QUERY SUPPORT
TABLE 20: DATA INGESTION RATE
TABLE 21: GENERIC REQUIREMENTS
TABLE 22: DATA SIZE AND GROWTH RATES
TABLE 23: STRUCTURED DATA UNIT
TABLE 24: ADVANCED DISCOVERY LAB UNIT
TABLE 25: UNSTRUCTURED DATA PROCESSING UNIT
TABLE 28: APPLICATION AND SYSTEM SOFTWARE
TABLE 29: MANPOWER REQUIREMENTS
TABLE 30: SERVICE LEVEL AGREEMENT



List of ACRONYMS
ACRONYM FULL FORM

APSRDH Andhra Pradesh State Resident Data Hub

APSWAN Andhra Pradesh State Wide Area Network

B2G Business to Government

C2G Citizen to Government

COTS Commercial Off-The-Shelf

CR Core Package

DC/SDC Data Centre / State Data Centre

DR Disaster Recovery

GoAP Government of Andhra Pradesh

G2B Government to Business

G2C Government to Citizen

G2G Government to Government

GIS Geo-spatial Information System

GoI Government of India

IAM Identity and Access Management

ITE&C Information Technology Electronics & Communications

PMU Project Monitoring Unit

RTM Reverse Traceability Matrix

SI System Integrator

SLA Service Level Agreement

SSO Single Sign-On

UML Unified Modelling Language



1. e-Pragati Program Requirements

1.1. Overview of e-Pragati Program

The use of information technology to automate and digitise Government activities and services is not
new to India. The Digital India program launched by the Government of India aims to propel the
country to the next level of e-Governance maturity. Envisaged as a programme to transform India
into a digitally empowered society and knowledge economy, it sets the long-term direction.
Similarly, States have their respective e-Governance initiatives. Collectively, there is no dearth of
programmes and projects to implement this vision. The question that emerges is: with so many
initiatives under way, what should bind them together into a holistic approach that ensures
convergence and coherence?

e-Pragati, the Andhra Pradesh State Enterprise Architecture, is the new paradigm that provides this
binding. It is a Whole-of-Government framework and adopts a mission-centric approach to
implementation. e-Pragati seeks to help realise the vision of Sunrise AP 2022 by supporting the
seven development Missions launched by the Government in the areas of Primary Sector
(Agriculture & Allied), Social Empowerment (Education & Healthcare), Skill Development, Urban
Development, Infrastructure, Industrial Development, and the Services Sector.

1.2. Vision

e-Pragati is not a project. It is a large program with a long-term vision for creating a sustainable
ecosystem of e-Governance. The vision of e-Pragati is stated below:
"e-Pragati is a new paradigm in governance based on a Whole-of-Government
framework, transcending the departmental boundaries. It adopts a Mission-centric
approach in its design and implementation and seeks to realize the Vision of Sunrise
AP 2022, by delivering citizen-centric services in a coordinated, integrated, efficient
and equitable manner."
e-Pragati is a framework to provide integrated services to citizens through a free flow of
information, and to usher in an era of good governance, characterised by efficiency, effectiveness,
transparency, and foresight. The different dimensions of the vision of e-Pragati are described below:

Developmental
1. e-Pragati will be a catalyst for enhancing the effectiveness of implementing various
developmental projects and welfare schemes undertaken by the Government, by providing
insights and foresights through analysis of data.
2. Planning and monitoring of public sector schemes and projects shall take advantage of IT,
GIS, and satellite imaging technologies.

Aspirational
1. e-Pragati shall be an effective tool in realising the vision of Sunrise Andhra Pradesh.



2. e-Pragati shall be rated among the best in the world and help Andhra Pradesh achieve a high
rank in the global e-Governance Development Index.
3. Andhra Pradesh will focus on quality of life of its citizens with a special emphasis on quality
of education, healthcare, skill development, agriculture, infrastructure, and services.

Citizen-Centric

1. Citizens and businesses will have a seamless and smooth interface with Government.
2. Departments and Government agencies will interoperate with ease and provide integrated
services to citizens and businesses.
3. The medium of paper will be minimised in all G2C, C2G, G2B, B2G, and G2G interactions.

Inclusive

1. The digital divide will be adequately addressed, especially by leveraging mobile technologies.
2. e-Pragati will enhance realisation of participative and inclusive governance, by analysing the
devolution of the benefits of development upon various sections of the society and by
measuring the impact thereof.
3. Citizen engagement will be accomplished with ease.

Technological
1. Government and citizens will be enabled to take advantage of leading technologies like
SMAC, IoT, and Big Data Analysis.
2. Principles of open data, open standards and open APIs will be ingrained in the designs of all
information systems.
3. e-Pragati will ensure the right balance between information security and privacy of personal
data.

1.3. Value Proposition of e-Pragati

In line with its vision, e-Pragati seeks to move away from the existing systems of Governance
(Government 1.0) towards establishing Government 2.0. The siloed and hierarchical systems will be
replaced by an integrated and collaborative operating model. The single-channel, 'one-size-fits-all'
models of service delivery will give way to personalised services delivered through multiple
channels. The output-driven processes will be replaced by transparent, outcome-driven procedures.
The citizens will no longer be passive spectators of governance and mere recipients of services, but
will be empowered to be active participants in the governance process.

All these aspirations will be achieved through establishment of a common and shared digital
infrastructure and applications, delivering a set of integrated and cross-cutting services based on
common standards and enterprise principles.

Value to Government:
1. The effectiveness of implementing various development projects and welfare schemes
undertaken by the Government will be enhanced, through extensive use of Enterprise
Project/Program/Scheme Management Systems.



2. Planning and/or monitoring of Government schemes and projects shall be more effective
through use of, inter alia, data analytics tools.
3. There would be considerable savings to the exchequer, through:
i. Improved means of targeting beneficiaries
ii. Better control on project and scheme costs
iii. Consolidation of IT assets, like hardware, system software and applications.
iv. Arresting leakages of revenue through use of, inter alia, data analytics tools.
4. There would be an all-round development of the State through alignment of e-Pragati with
the goals of Sunrise AP and a clear focus on e-Pragati Indicators in each of the seven
missions.
5. The coordination between the Government departments would significantly increase due to
free exchange of information via the e-Highway.
6. e-Pragati envisages extensive use of Data Analytics tools that would not only throw up
trends, patterns, gaps and areas of improvement in various Government schemes, but will
also provide useful suggestions and ideas for the future actions (preventive and accelerative)
through predictive analytics.

Value to Citizens and Businesses:


1. Citizen-centric services provided by e-Pragati would enhance convenience and transparency.
2. e-Pragati will facilitate the concept of single-window and enhance the ease of transacting
with the Government.
3. The Certificate-Less Governance System (CLGS) will reduce significant burden, both on the
citizens and the Government departments, by reducing/eliminating unproductive work.
4. Programs like e-Health, e-Education, e-Agriculture and e-Market shall enhance quality of life
and productivity as well as income of the citizens.
5. Direct Benefits Transfer will ensure hassle-free receipt of the various benefit programs of the
Government.
6. Widespread use of e-Office and other productivity tools will enhance transparency of public
agencies.

Value to Society:

1. The implementation of e-Pragati will unlock potential in the software, hardware,
electronics, and networking sectors. This is expected to have a significant multiplier effect,
to the tune of 4X.
2. e-Pragati, being based on open technologies, will open up new windows for innovation and
produce IT and non-IT employment in various sectors.
3. The economic development of the State will be spurred by increased productivity in all the
seven major sectors comprising the Sunrise AP Mission.
4. The successful implementation of e-Pragati is likely to motivate several such initiatives
across the country, as witnessed in respect of the CARD and e-Seva projects pioneered by
AP, and would lead to faster development of the nation.



Mandatory Principles of e-Pragati

A large and complex program like e-Pragati may not make a radical impact unless certain
fundamental principles of Enterprise Architecture are adopted by all the stakeholders, namely, the
Government, the System Integrators and the users. These are stated below. It is the responsibility
of the System Integrator selected for the implementation of DataLytics to meticulously observe
and/or promote the observance of these principles by the other stakeholders.

Principle #1: Uphold the Primacy of these Principles

These principles of information management apply to all organisations within the
Government.

Principle #2: Maximise Benefit to the Government as a Whole

All decisions associated with information management are made to provide maximum benefit
to the Government as a whole. Applications and components should be shared across
organisational boundaries.

Principle #3: Information Management is Everybody's Business

All organisations in the Government participate in the information management decisions
needed to accomplish business objectives, and implement such decisions with full
commitment, devoting the right and adequate resources. Respective Domain Owners and
Managers shall develop their own sub-architectures following these principles, and federate
the same to APSEA (the Andhra Pradesh State Enterprise Architecture).

Principle #4: Government is a Data Trustee

Each data element has a trustee accountable for data quality. Only the data trustee can
decide on the content of data, and authorise its modification. Information should be
captured electronically once and immediately validated as close to the source as possible.
Quality control measures must be implemented to ensure the integrity of the data.

Principle #5: Establish Common Vocabulary and Data Definitions

Data is defined consistently throughout Government, and the definitions are understandable
and available to all users. Defining Metadata and Data Standards (MDDS) within each
domain assumes great significance.

Principle #6: Data is Shared

Data is shared across enterprise functions and organisations. It is more cost-effective to
maintain timely, accurate data in a single application, and then share it, than to maintain
duplicative data in multiple applications. Shared data will result in faster and improved
decisions.

Principle #7: Data is an Asset

Data is an asset that has a specific and measurable value to the Government and hence, it
must be managed accordingly.

Principle #8: Build Once, Use Many Times

Common functionalities used across all or several departments should be built once and
used commonly by all, to reduce cost and complexity.

Principle #9: Enterprise Architecture is Technology Independent

Applications are independent of specific technology choices and therefore can operate on a
variety of technology platforms.

Principle #10: Adopt Service Oriented Architecture

The enterprise architecture is based on a design of services which mirror real-world activities
required to conduct the business of Government. Open Standards-based Service Oriented
Architecture shall be adopted in all implementations to realise interoperability and location
transparency.

Principle #11: Interoperability

Software and hardware should conform to defined standards that promote interoperability
for data, applications, and technology.

Principle #12: Data Security

Data is protected from unauthorised use and disclosure through a data governance
framework. This includes, but is not limited to, protection of pre-decisional, sensitive, source
selection-sensitive, personal and proprietary information. Data standards applicable to GoI
are identified and listed out in the standards document.

TABLE 1: PRINCIPLES OF E-PRAGATI



1.4. Common Mandatory Standards of e-Pragati

A set of Common Mandatory Standards is prescribed for strict compliance by all the System
Integrators selected for implementing different packages of e-Pragati, to ensure interoperability,
maintainability, and uniformity of user experience across the entire landscape of e-Pragati. These
relate to the areas listed below:

Standard Code    Standard

SD01             Interoperability
SD02             Software Engineering
SD03             Documentation
SD04             Testing
SD05             Usability
SD06             IT Services
SD07             Mobility
SD08             Security
SD09             Document Management
SD10             Imaging
SD11             GIS
SD12             Localisation
SD13             Metadata
SD14             Master Data
SD15             Data Definition
SD16             Data and Information Exchange
SD17             Access, Presentation and Style

TABLE 2: E-PRAGATI COMMON MANDATORY STANDARDS

The technical specifications of the above standards can be accessed at the URL
http://e-pragati.ap.gov.in/bestpractices.html.
Whenever the standards are revised during the currency of the contract, the e-Pragati systems
must be upgraded to the relevant standards. It is part of the responsibility of the SI (to be
selected through this RFP) to follow the relevant standards for system upgrade.

THIS RFP SEEKS TO SELECT AN APPROPRIATE ENTERPRISE-CLASS DATA ANALYTICS PRODUCT
TOGETHER WITH A SYSTEM INTEGRATOR, EXPERIENCED IN THE DEPLOYMENT AND USE OF SUCH
PRODUCT, WHICH WILL BE THE UNIFIED PLATFORM FOR SERVING ALL THE ANALYTICS
REQUIREMENTS OF THE GOVERNMENT IN GENERAL, AND 18 DEPARTMENTS SPECIFIED IN THE RFP,
IN PARTICULAR.



2. System Overview of DataLytics
2.1. Overview

Business Intelligence is a set of methodologies, processes, and architectures that leverage the
output of information management processes for analysis, reporting, performance management,
and information delivery. Data Analytics is the process of developing actionable insights through
problem definition and application of statistical models and analysis against existing and/or
simulated future data. Big data analytics is the process of examining large data sets containing a
variety of data types -- i.e., big data -- to uncover hidden patterns, unknown correlations, and other
useful information.

A Business Intelligence and Data Analytics system comprises applications and technologies for
gathering, storing, analysing, and providing access to data to help decision makers in making
decisions. Typically, the applications include decision support systems, query and reporting, Online
Analytical Processing (OLAP), statistical analysis, forecasting, and data mining.

Business Intelligence and Data Analytics technologies can help government policy makers draw
key conclusions from data, and can become a critical component of a large e-Governance initiative
like e-Pragati. These technologies serve G2G interactions more than other forms. All major
government plans can be designed, and major decisions arrived at, on the basis of detailed
multi-dimensional analyses of all the relevant data, which, in the context of a Government, is bound
to be immense. Business Intelligence and Data Analytics can play a role in gaining a better insight
into what the citizens' needs are, and the manner in which they should be met. The use of such
analytical tools also allows decision makers to gain insights, take better-informed decisions,
and plan more effectively to introduce new services and improve the quality of existing services.

The DataLytics application is an integrated Business Intelligence and Data Analytics system which
covers both conventional and Big Data. The system is proposed to be a state-wide Analytical Engine
which takes in data from various government department databases, the internet, sensors, machine
logs and other sources, transforms them, and presents them in an analysable format. DataLytics also
provides tools for performing analysis on the data, gaining insights, making data-based predictions,
and identifying the best course of action for improving operational efficiency and governance.
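
The "take in, transform, present in an analysable format" flow described above can be illustrated with a minimal sketch. It is not part of the requirements: the two source layouts, the column names and the common Record shape are all hypothetical, chosen only to show how differently shaped departmental extracts might be brought to one schema before analysis.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """Common, analysable shape shared by every source (hypothetical schema)."""
    department: str
    district: str
    period: date
    metric: str
    value: float


def from_health_row(row: dict) -> Record:
    # Hypothetical health extract: free-text district name, month given as "YYYY-MM".
    year, month = (int(part) for part in row["month"].split("-"))
    return Record("Health", row["district"].strip().title(),
                  date(year, month, 1), "outpatient_visits", float(row["visits"]))


def from_agri_row(row: dict) -> Record:
    # Hypothetical agriculture extract: different column names, quantity as "1,050.5".
    return Record("Agriculture", row["DIST"].strip().title(),
                  date(int(row["YEAR"]), int(row["MON"]), 1),
                  "paddy_procured_tonnes", float(row["QTY_T"].replace(",", "")))


def ingest(health_rows: list, agri_rows: list) -> list:
    """Transform every source row into the common shape and sort it for analysis."""
    records = [from_health_row(r) for r in health_rows]
    records += [from_agri_row(r) for r in agri_rows]
    return sorted(records, key=lambda r: (r.district, r.period, r.department))
```

Once every source is mapped to one shape, district names and reporting periods line up across departments, which is what makes cross-department analysis possible. A real deployment would use the ETL tooling of the selected product rather than hand-written mappers.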

The role of Big data analytics in Government is gaining increasing significance, as indicated below:

Big Data Value Proposition:

1. Insight: Big Data Analytics opens up the opportunity to use virtually unlimited amounts of data,
particularly data from the internet, to understand public sentiment and improve governance. Data
from newspapers, blogs, social media, news channels, emails, reports, satellites, images, etc. can
be scanned, related, and structured in a manner that can be used for analysis. The insights
provided by big data analytics enable the Government to anticipate future developments and plan
appropriate interventions or make mid-course corrections in all societally important sectors like
healthcare, education, welfare, urban/rural development and infrastructure. Regulatory functions
of the government can be more effective with such insights.
2. Timeliness: With the use of Data Analytics, Government can be PROACTIVE rather than REACTIVE.
It is possible to achieve a timely and agile response to the opportunities, threats and challenges
that the state faces. While traditional data could be used to identify these opportunities and
threats, it takes time to collect, cleanse, standardise, and then analyse it. In the current world,
governments simply cannot afford to wait for long periods to make the necessary interventions.
3. Breadth: Provides a single view of diverse data sources: People, Entities, Land, etc. By combining
traditional and big data under one platform, a true 360-degree analytical view can be achieved.
In so far as e-Pragati is a Whole-of-Government initiative, it is possible to fully utilise the power of
the Big Data Engines to create, analyse and benefit from such multi-dimensional views in
diverse sectors.

In short, the e-Pragati DataLytics system is proposed to be an Integrated Business Intelligence and
Big Data Analytics system. It is envisioned to provide data-based and informed decision-making
support to GoAP in its mission of realising Sunrise AP 2022.

2.1.1. Business Drivers

The Business Drivers of DataLytics are described below:

Value Proposition and Benefits

Improved Governance
• Data-analysis-based insights improving quality of governance
• Data-analysis-driven decisions leading to right planning and right targeting
• Insights leading to effective regulation and better governance through less government

Improved Department Performances
• Insights into performance with respect to KPIs
• Recommendations and interventions to improve performances

Advanced Analytical capability
• Capability to foresee key events and take appropriate and timely actions
• Capability to understand socio-economic events and conditions and align government
  policies to suit the same

Utility of Data (Data as a true asset)
• Better utilisation of data, not merely for producing statistical reports on the past but
  for intelligent reports that throw light on the future
• Utilisation of data from multiple departments and sources, leading to creation of a
  holistic picture of events and happenings, and thereby enabling a more balanced
  administration

2.1.2. Objectives

The key objectives of DataLytics are as listed below:


1. To provide insights into how government schemes and policies are performing, and
why (Descriptive and Causal Analyses)
2. To determine likely future scenarios and recommend best courses of action (Predictive and
Prescriptive Analyses)
3. To gauge sentiments of people of the state, and understand their perceptions of and attitudes
towards government policies
4. To provide a system of dashboards that enables administrators to monitor and implement
Government programs effectively
5. To improve collaboration among departments.
6. To provide a tool for research in Data Sciences and statistical analysis
7. To enhance the effectiveness of regulatory and tax collection systems of the government

2.1.3. Stakeholder Analysis

The Table below lists the stakeholders of DataLytics and their expectations from it:

Stakeholder: Government Departments
   Requirement: Analytical insights
   Expectations from DataLytics:
      1. Insights into what has happened, what is happening, and why
      2. Insights on what is likely to happen (scenarios)
      3. Guidance on the best course of action to take

Stakeholder: DataLytics Team
   Requirement: A Platform for Big Data Analytics
   Expectations from DataLytics: A system that supports Big data analytics and statistical analysis.

Stakeholder: Educational Institutes & Research bodies
   Requirement: A platform for research and discovery
   Expectations from DataLytics: A platform that enables them to perform statistical and other
   forms of data-science research.

Stakeholder: Government
   Requirement: A Tool for good governance
   Expectations from DataLytics: Understand sentiments, perceptions and attitudes and mould
   Governance in a proactive manner.

2.1.4. Factsheet

DataLytics is a new, whole-of-state analytics system and a pioneering initiative among governance
systems. The Table below gives the key facts of the proposed DataLytics initiative.

1. Structured Data Size
   Description: 50 TB
   Rate of Growth: 25% YoY growth
2. Unstructured Data Size
   Description: 200 TB
   Rate of Growth: 20% YoY growth
3. Discovery Lab
   Description: 25 TB
   Rate of Growth: 20% YoY growth
4. Number of citizens in AP
   Description: 5 cr
   Rate of Growth: Currently at replacement level
5. Number of citizen interactions with Government
   Description: 50 lakhs per day, out of which 5% are handled electronically now.
   Rate of Growth: Electronic transactions are expected to grow to 20% with the
   implementation of e-Pragati.
6. Number of parcels of Land and transactions
   Description:
      Urban Parcels – 0.5 cr
      Rural Parcels – 2 cr
      Transactions – 75 lakhs p.a (Sales, other transfers/devolutions, Property tax payments)
   Rate of Growth: 10%
7. Interest Groups
   Description:
      Farmers – 60 lakhs
      Students – 25 lakhs
      Govt Employees – 10 lakhs
      Patients –
      Children below 5 years of age –
      Women in Self-help groups –
   Rate of Growth: -
8. Number of Government Departments to be addressed by the DataLytics Solution
   Description: 22
   Rate of Growth: -
9. List of Government Departments to be addressed by the DataLytics Solution
   Description:
      1. Cross-sectoral
      2. Planning
      3. Civil Supplies
      4. Health
      5. Agriculture & Allied
      6. Finance
      7. Police
      8. Marketing
      9. Municipal Administration
      10. IT
      11. Energy
      12. Registration
      13. Revenue
      14. Excise
      15. Irrigation
      16. Tourism
      17. 4 Welfare Departments (SC, BC, ST, Minorities)
      18. Women Development
      19. Labour
      20. Housing
      21. PR (MNREGA, RWS)
      22. Transportation
   Rate of Growth: 11 departments in Year 1, including 10 line departments and one
   cross-sectoral (shown in Bold italics) + 11 departments in Year 2
10. Deployment Model
   Description: Solution to be deployed on premise at the Interim DC of GoAP near the Andhra
   Pradesh new Capital, with infrastructure owned and provided by GoAP.
   Whenever the new AP state data center at Amaravati is operational, the DataLytics system
   will be migrated to it at the cost of GoAP. However, the SI needs to support the migration
   activity.
11. Business Model
   Description: Application Service Provider Model, with the SI getting paid on the basis of
   analytical reports provided to the departments on a quarterly basis for a period of 3 years.
   For details please refer to Volume III of this RFP.
12. Project period
   Description: 3 years, post go-live of the DataLytics solution

TABLE 3: DATALYTICS FACTSHEET
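
For sizing purposes, the growth rates in the factsheet can be turned into indicative storage figures. The sketch below is illustrative only: it assumes growth compounds annually from the stated baseline, which the factsheet does not spell out, and the function name is hypothetical.

```python
def projected_size_tb(base_tb: float, yoy_growth: float, years: int) -> float:
    """Size in TB after `years` of compound year-on-year growth (assumed model)."""
    return base_tb * (1 + yoy_growth) ** years


# Baselines (TB) and YoY growth rates taken from rows 1 to 3 of the factsheet.
FACTSHEET = {
    "Structured Data": (50, 0.25),
    "Unstructured Data": (200, 0.20),
    "Discovery Lab": (25, 0.20),
}

for name, (base_tb, growth) in FACTSHEET.items():
    sizes = [round(projected_size_tb(base_tb, growth, y), 1) for y in range(4)]
    print(f"{name}: {sizes} TB for years 0 to 3")
```

If growth is meant to be applied differently (for example, linearly on the baseline), only the body of `projected_size_tb` would change.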

2.2. Business Architecture
2.2.1. User Scenarios

The following Table lists DataLytics user scenarios. The scenarios are grouped by departments, and
are meant to provide a quick view of how the proposed DataLytics system will be used to improve
department processes.

S. No User Scenarios
Integrated Scenarios
1. Analysing the content in electronic and social media and other sources to understand
public sentiment on the 10 selected flagship programs of the Government, conducting a
root-cause analysis and suggesting appropriate interventions and mid-course corrections
to improve the delivery of the programs. (Complex)
2. Predicting a disaster (drought only), identifying the areas (districts & mandals) likely to be
affected, and suggesting advance interventions required to mitigate the adverse impact
on the population. (Complex)
3. Analysing the text inputs (unstructured data) in the Grievance Portal (Mee Kosam) and
the popular print media (3 newspapers), identifying key problem areas (Region / Type
of Problem / Frequency / Severity) and suggesting suitable remedial action (once a month)
(Medium)
4. Designing a Happiness Index, appropriate to the socio-economic profile of the State,
supporting the Government in conducting appropriate sample surveys, analysing the
results and making suitable recommendations for enhancement of the Index. (twice in
the project period) (Medium)
Planning Department
1. Analysing the patterns of public expenditure on top 10 sectors of the economy,
identifying the correlations with the progress in achieving the relevant Sustainable
Development Goals and suggesting the desired areas and sectors for intervention
(Annually) (Simple)
2. Analysing the medium-term impact of development and welfare schemes, identifying the
gaps and realigning the schemes for enhanced effectiveness. (half-yearly) (Simple)
3. Analysing the geographical spread of various schemes and making corrections for even
distribution (Annual) (Simple)
4. Analysing the distribution of community assets, identifying the gaps in demand vs
locations and suggesting the right locations for creating new community assets. (Annual)
(Simple)
5. Analysing the trends of growth of GSDP, geographically and sector-wise (top 20 sectors),
identifying causal factors for high and low growth rates and suggesting the right mix of
interventions required to optimise the growth rate of the economy of the State.
(Complex)
Department of Energy
1. Demand-Supply Analytics and Optimization of generation and power purchase planning
(Medium)

2. Transmission and Distribution Losses (T&D Losses) (Medium)
3. Outage & Transformer Performance Analysis, leading to timely predictions of overloads/
failures and for triggering advance action for mitigation. (Simple)
4. Compliance with SOPs (Complex)
- Hours of power supply across various sectors (e.g. Agriculture, Industries, Domestic etc.)
- Release of new services within SOP norms (Standard of Performance)
- Consumer grievances analysis within SOP norms
Welfare departments (SC, BC, Minorities and ST)
1. Analysing the geographical distribution of social benefits, identifying areas of uneven
distribution and recommending corrective action (half-yearly) (Simple)
2. Monitoring the trends in changes of levels of poverty among different categories of the
vulnerable sections of the population, conducting root cause analysis of the positive and
negative factors in the efforts to alleviate poverty and making appropriate
recommendations for intervention. (yearly) (Complex)
Municipal Administration Department
1. Analyse the Trends of assessment and collection of property taxes in various
municipalities and the average collection per property, and provide suitable advisories to
the Municipalities to improve tax collections. (Simple)
2. Identify leakages of taxes and other major revenues, conduct causal analysis and provide
decision support (Medium)
3. Monitor the sanitary conditions, analyse w.r.t climatic and other conditions and predict
the outbreak of communicable diseases to enable the department to take corrective
action. (Simple)
4. Monitor the condition of the roads and provide advance recommendations on optimal
resource utilisation for producing best impact on taxpayers. (bi-annually) (Medium)
5. Analyse the patterns of investments on infrastructure made in different wards, compare them
with norms and the specific requirements of the ward and provide decision support to
the city planners. (bi-annually) (Simple)
6. Conduct sentiment analysis based on social media and electronic media, and provide
appropriate inputs for action by the municipality. (Complex)
Agriculture Department
1. Providing generic advisories to the farmers with the granularity at the mandal level about
the possible variations in weather conditions that may have an impact (positive and
negative), at least 48 hours in advance, and help them make the right choices on farm
management practices. (Complex)
2. Analysis of trends of cropped areas and economics of various crops, district- and
mandal-wise, over the last 5 years, and the demand-supply position for different agricultural
produce across the country, to arrive at the optimised crop area planning for various
crops in different agro-climatic regions of the State and to give decision support to
agricultural planners. (Complex)
3. Analysis of soil health records of the last 5 years, along with the crops grown during the

period, rainfall, irrigation, yield and other parameters, to arrive at a plan for maximising
micro-nutrient corrections, through focused interventions. (Simple)
4. Prediction of commodity prices for the next 6 months to 1 year, and dissemination of the
information to the farmers, to enable them to take informed decisions on storing their
non-perishable agricultural produce and selling it at the most appropriate time.
(Complex)
5. Real-time analysis of climatic conditions, early detection of sporadic pests in AP and the
neighbouring states and other pertinent parameters to predict pest attacks on
agricultural crops and fish/prawn ponds and enhancing the preparedness for the same
through appropriate advisories to farmers and by adequate stocking of inputs required.
(Complex)
6. Analysis of global commodity prices and provision of advisories to farmers on the export
markets to be preferred for exporting grain and horticultural products. (Simple)
7. Analysis of the incidents of farmers under distress, with all the pertinent factors, like
seasonal conditions, degree of indebtedness, WPI and identification of farmers who need
special attention through aid and counselling. (Complex)
Health Department
1. Comparison between Government and Private hospitals in terms of Infrastructure,
Facilities and footfalls of patients and providing decision support to healthcare planners.
(half-yearly) (Simple)
2. Understanding factors affecting Child mortality rate, Pre-mature births, and still-births,
and determining correlation between socio-economic, demographic, and geographic
factors and above listed issues. Also arriving at best possible intervention for the
government in order to reduce IMR (Medium)
3. Integrating climatic, economic, and social data along with quality of healthcare provided,
identifying geographic regions that are vulnerable to viral diseases and providing decision
support to the department (real-time) (Complex)
4. Monitoring, analysing and predicting the incidence and geographical distribution of
malnutrition of different degrees among children and adolescent women belonging to the
various vulnerable strata of the society and providing decision support to the department
for early interventions. (Medium)
5. Providing insights into prevalence of unfavourable practices like non-institutional
deliveries, not following vaccination cycles, and providing decision support to
administrators for taking suitable action. (Medium)
Labour Department analysing labour force in the State
1. Generating reports on distribution of labour across State grouped by skill sets, salaries,
and other important parameters (Medium)
2. Identifying key skills required for labour force in different districts to become employable
in market and advise on training programmes to implement those skills (Medium)
3. Analysis of the incidents of unorganised labour under distress, with all the pertinent
factors, like adverse seasonal conditions, degree of indebtedness, WPI and identification
of areas (Gram Panchayats) where labourers fitting a particular profile need special
attention through aid and counselling (Complex)
Civil Supplies Department
1. Analyse the trends of the factors affecting the prices of essential commodities, predicting
their prices for 6 months and one year ahead, and provide decision support to the
department for undertaking supply interventions where needed. (Complex)
2. Analyse the trends of prices of agricultural produce (top 5 crops for the last 5 years)
predict the prices over the next 6 to 12 months and advise the department on the market
interventions required. (Complex)
3. Analyse the demand, supply and consumption patterns and other appurtenant factors to
identify the areas where diversion of essential commodities is likely to happen or is
happening. (Complex)
4. Analyse the estimated demand for ECs over the next 6 months and advise the department
on optimum quantities for procurement and storage, to minimise overheads. (Medium)
IT Department
1. Analyse the uptake of various e-Services at various outlets, in rural and urban areas
separately, analyse the reasons for variations from the average, the causal factors for the
same and advise the dept on the ways to correct the skew. (Complex)
2. Analyse the patterns of services taken through self-service and through agency, the
causal factors for each and recommend any interventions for a desirable change in the
mix. (Simple)
3. Conduct sentiment analysis and provide suggestions to the department on launching
new e-Services or m-Services, dispensing with any service, or enhancing the quality of
delivery of any service. (Complex)
Panchayat Raj Department
1. Coverage of Habitations by providing potable drinking water supply to the rural areas in
the State

- Data Analysis & DSS: Qualitative & Quantitative analysis of potable drinking water
supplied to the rural people in the habitations as per defined norms through
implementation of various water supply schemes under different programs in the
State (Complex)

2. Coverage of Habitations by connecting Habitations through all-weather roads in the State

- Data Analysis & DSS: Analysis of bus-plying all-weather road connectivity to
habitations through implementation of various road-laying works under different
programs in the State (Medium)

TABLE 4: DATALYTICS USER SCENARIOS

Notes:
1. The GoAP will be responsible to share the internal data relating to or generated by the
departments, as required for the SI to generate the reports/ insights. SI will be responsible
for creating the formats and online collection mechanisms for such data.
2. The SI shall be responsible to collect all the required data external to GoAP.

3. GoAP will be responsible to undertake any sample survey required for collection of data. The SI
shall be responsible for designing and monitoring the conduct of the survey.
4. The 43 scenarios described above may be refined by the SI in consultation with the
concerned departments during the System Study.
5. The number, nature, complexity and profiles of the Reports / insights required for 10 Y2
departments are assumed to be the same as those for the 10 Y1 departments. The effort
estimate and sizing of the infrastructure shall be made by the SI uniformly on this
presumption.
6. The abstract of the number of user scenarios and complexity, department-wise is given
below (it is quite possible that each user scenario can lead to generating multiple reports):

                           User Scenarios                  Distribution across the Year                   No. of User Scenarios
Department                 Complex  Medium  Simple  Total  Real-time  Annual  Half-yrly  Qtrly  Monthly  in a Year
Cross-sectoral             2        2       -       4      -          -       -          3      1        24
Planning                   1        -       4       5      -          3       1          1      -        9
Energy                     1        2       2       5      1          -       -          4      -        17
Welfare                    1        -       1       2      -          1       1          -      -        3
Municipal                  1        2       3       6      -          -       2          4      -        20
Agriculture                5        -       2       7      -          -       -          7      -        28
Health                     1        3       1       5      1          -       1          3      -        15
Labour                     1        2       -       3      -          -       -          3      -        12
Civil Supplies             3        1       -       4      -          -       -          4      -        16
IT                         1        -       1       2      -          -       -          2      -        8
PR                         2        1       1       4      -          -       1          3      -        14
Total User Scenarios for first (11 Departments) in a Year:
                           19       13      15      47     2          4       6          34     1        166
Total User Scenarios for second (11 Departments):
                           19       13      15      47     2          4       6          34     1        166

Project Periods                First 11 Depts.   Second 11 Depts.   Total Reports in Year
Year 1                         26                0                  26
Year 2                         166               86                 252
Year 3                         166               166                332
Year 4                         166               166                332
Total Reports by Departments   524               418                942
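
The per-department yearly figures in the distribution table can be reproduced with simple arithmetic. The sketch below is illustrative: the runs-per-year weights (real-time and annual counted once, half-yearly twice, quarterly four times, monthly twelve times) are inferred from the table's own totals and are not stated in this RFP, and the assignment of counts to frequency columns is likewise read from the totals.

```python
# Reports generated per year by one scenario at each frequency.
# ASSUMPTION: weights inferred from the table totals, not stated in the RFP.
RUNS_PER_YEAR = {"real_time": 1, "annual": 1, "half_yearly": 2,
                 "quarterly": 4, "monthly": 12}

# Number of scenarios per department at each frequency, as read from the table.
SCENARIOS = {
    "Cross-sectoral": {"quarterly": 3, "monthly": 1},
    "Planning": {"annual": 3, "half_yearly": 1, "quarterly": 1},
    "Energy": {"real_time": 1, "quarterly": 4},
    "Welfare": {"annual": 1, "half_yearly": 1},
    "Municipal": {"half_yearly": 2, "quarterly": 4},
    "Agriculture": {"quarterly": 7},
    "Health": {"real_time": 1, "half_yearly": 1, "quarterly": 3},
    "Labour": {"quarterly": 3},
    "Civil Supplies": {"quarterly": 4},
    "IT": {"quarterly": 2},
    "PR": {"half_yearly": 1, "quarterly": 3},
}


def yearly_reports(freq_counts):
    """Total scenario executions in a year for one department."""
    return sum(RUNS_PER_YEAR[freq] * n for freq, n in freq_counts.items())


per_department = {dept: yearly_reports(c) for dept, c in SCENARIOS.items()}
total = sum(per_department.values())

for dept, count in per_department.items():
    print(f"{dept}: {count}")
print("Total for 11 departments:", total)
```

Under these assumed weights the per-department figures and the total of 166 for one set of 11 departments match the table.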

The following Table provides the indicative details on distribution of user scenarios as per their
complexity and year wise execution.

Project Period Complex Medium Simple


Year 1 11 7 8
Year 2 102 70 80
Year 3 67 92 106
Year 4 67 92 106

Note:
I. The number of reports of each complexity is derived by percentage and rounded off
to the nearest whole number, to give an indicative number of reports.
II. In the first year, the SI has to deliver the last quarter's reports, after PoC and Pilot,
for the first 11 departments.
III. In the second year, the SI has to deliver the last two quarters' reports for the second 11
departments.
IV. The first 11 departments shall be brought into the project scope from Year 1 onwards,
and the additional 11 departments from Year 2 onwards. Hence the SI needs to
procure the necessary software as per the solution requirements on a year-on-year
incremental basis and should also propose the rollout strategy for scaling up
hardware in the technical proposal. Hardware shall be deployed at the SDC/NoC near the
new Capital of GoAP. Not all the hardware need be deployed on day one;
however, the bidder should ensure that all hardware is installed by the end of 2
years from the date of signing of the contract. Hardware deployment shall be on an
incremental basis, spread over 2 years as needed, to meet the performance
requirements of the System as per the SLA specified in the RFP.
V. The indicative complexity of Year 1 and Year 2 user scenarios is provided in this
Volume of the RFP. The description of the user scenarios of Year 2 and any pending user
scenarios of Year 1 will be provided to the SI during the SRS phase of the project.
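
Note I can be illustrated with a short calculation. The sketch below assumes, as an inference from the figures rather than a stated rule, that each year's report total is apportioned in the ratio of Complex, Medium and Simple scenarios (19:13:15 out of 47) and rounded to the nearest whole number. The function name is hypothetical.

```python
# Scenario counts by complexity for one set of 11 departments (from the table above).
COMPLEXITY_COUNTS = {"Complex": 19, "Medium": 13, "Simple": 15}


def split_by_complexity(total_reports: int) -> dict:
    """Apportion a yearly report total by each complexity's share of scenarios.

    ASSUMPTION: proportional split with rounding to the nearest whole number.
    """
    n = sum(COMPLEXITY_COUNTS.values())  # 47 scenarios in all
    return {k: round(total_reports * v / n) for k, v in COMPLEXITY_COUNTS.items()}


year1 = split_by_complexity(26)    # total reports in Year 1
year2 = split_by_complexity(252)   # total reports in Year 2
year3 = split_by_complexity(332)   # total reports in Year 3 (and Year 4)

for label, split in (("Year 1", year1), ("Year 2", year2), ("Year 3", year3)):
    print(label, split)
```

Under this assumption the Year 1 and Year 2 rows are reproduced, as are the Medium and Simple figures for Years 3 and 4. For Complex in Years 3 and 4 the same split gives 134, whereas the table prints 67, so that figure may need to be confirmed during the System Study.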

2.3. Application Architecture


2.3.1. Logical View

The logical view of DataLytics shows the key components and layers of the DataLytics system. A brief
description of these components and layers is provided in this section.

FIGURE 1: LOGICAL VIEW OF DATALYTICS SYSTEM

Data Sources and Types

DataLytics shall support processing of all types of data from a variety of sources. Given below is an
indicative list of data sources, and categories.

Internal
   Structured:
      1. Departmental Database
      2. Data Hubs
      3. Data Warehouse
      4. Data Marts
   Semi/Unstructured:
      1. e-Mails
      2. Documents
      3. XML documents
External
      1. Sensor Data
      2. Log Stream Data
      3. Web sites
      4. Satellite Data
      5. Social media
      6. Bioinformatics
      7. Blogs/Articles
      8. Documents
      9. E-mails
      10. Audio-visuals
      11. XML

1. Streaming Data – Streaming data comprises unstructured data coming in from various
sources. The data shall be held in a buffer area and, when a set limit is reached, it shall be
transmitted to the DataLytics system (Hold-Transmit).
2. Batch Data – Batch data is normally extracted from within Government departments using ETL
or ELT processes. Structured data may be loaded directly to the Data Warehouse, and
unstructured/semi-structured data to Hadoop or an equivalent (or better) unstructured data
processing platform.
Special Note: Unstructured Data Processing Platform considerations – The unstructured data
processing platform shall be open source and horizontally scalable, and should be capable of
handling large volumes, variety, and velocity of unstructured and structured data (Big Data).
Apache Hadoop fulfills most of the requirements. However, there might be better
alternatives available in the market today. The rate at which these products are emerging
is staggering, and it is possible that an entirely new open-source Big Data analytics
framework might be in place. In this context, the term “Hadoop” is used in this
document to represent “Open source Big Data processing framework” for the sake of
convenience.
3. Near Real-time data analytics zone – This capability shall process incoming stream data in real time
to provide quick insights into the data. This data may then be persisted on the Hadoop system. Near
real-time analytics shall provide capabilities like log stream analysis, sensor data analysis etc. The
real-time analytics system must be able to quickly identify useful data and discard data that is
not useful. Near real-time data shall augment insights obtained from batch analysis.

4. Batch Data analytics zone – The batch data zone shall ingest large amounts of data in batch mode,
and also insights obtained in the Real-time analytical zone.
5. Staging area – The staging area or landing area is a buffer space that holds data from various sources
(structured/unstructured). The data is then transformed by applying standard transformation rules.
6. Hadoop (or equivalent or better platform) – shall be used to store and process
structured/unstructured data. The system shall support batch and near real-time data
processing.
7. ETL/ELT – Used for batch data movement and transformation. ETL is Extract, Transform, and
Load, where data from the source is extracted to a staging area, transformed and then loaded to the
target. ELT, on the other hand, extracts data from the source and loads it directly to the target, and
transformation could be part of the loading process or done after loading. Both ETL and ELT support
complex transformations such as cleansing, reformatting, aggregating, and converting large
volumes of data from many sources. The DataLytics architecture shall use ETL/ELT for structured data
and unstructured data, as applicable. ELT/ETL jobs shall be scheduled to run once every 24 hours,
preferably during off-business hours.
8. Structured data from various department applications and databases may be
extracted, transformed and loaded using the ETL mechanism.
9. Text Processing/ Natural Language Processing (NLP) – This technology converts, segments, tags,
and derives meaning from information. Machine learning algorithms and linguistically based
rules perform NLP. NLP detects language and parts of speech, de-duplicates files, extracts text
patterns, and recognises entities and the relationships between them. It transforms unstructured
source materials to semantically annotated information compliant with standard models for
data interchange such as RDF and XML.
10. Data Quality – Policies and processes that ensure quality of data shall be adopted in line with
Enterprise Data Quality principles.
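
The Hold-Transmit behaviour described for streaming data in item 1 can be pictured with a small sketch. It is illustrative only and does not prescribe an implementation: the class name, the record-count limit and the `transmit` callback (standing in for delivery to the DataLytics system) are all hypothetical, and the limit could equally be expressed in bytes or elapsed time.

```python
from typing import Callable, List


class HoldTransmitBuffer:
    """Holds incoming records and transmits them as a batch once a limit is reached."""

    def __init__(self, limit: int, transmit: Callable[[List[str]], None]):
        self.limit = limit        # hypothetical threshold, here a record count
        self.transmit = transmit  # callback standing in for delivery to DataLytics
        self._held: List[str] = []

    def add(self, record: str) -> None:
        """Hold a record; transmit the batch when the set limit is reached."""
        self._held.append(record)
        if len(self._held) >= self.limit:
            self.flush()

    def flush(self) -> None:
        """Transmit whatever is currently held, then start a fresh buffer."""
        if self._held:
            self.transmit(self._held)
            self._held = []


# Usage: collect transmitted batches in a list to see the batching behaviour.
batches: List[List[str]] = []
buffer = HoldTransmitBuffer(limit=3, transmit=batches.append)
for i in range(7):
    buffer.add(f"event-{i}")
buffer.flush()  # send the remaining partial batch
print([len(b) for b in batches])
```

A production system would also need to decide what happens when transmission fails; the sketch leaves that out.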

Derived Data Sources – The Big Data platform includes Data Warehouses, Hadoop Data Store, Discovery
Lab, Data Marts, Data Hubs.
1. Data Warehouse – A State-wide Data Warehouse shall be created to store structured data. The
warehouse shall store whole-of-state level data, comprising structured data from
departmental databases and data hubs. The warehouse shall support Massively Parallel
Processing and provide optimal performance considering structured and unstructured data. It
shall have no single point of failure.
2. Data Lake - The lake shall have capabilities required to make it easy for developers, data
scientists and analysts to store data of any size, shape and speed and do all types of processing
and analytics. It shall ease ingesting and storing all data while making it faster to get up and
running with batch, streaming and interactive analytics.
3. Data Marts – Data marts shall hold department specific data. For example: Agriculture
department mart, Welfare department mart etc.
4. Data Model – The Big Data Platform shall have a common data model, which should be able to
manage all the information, both structured and unstructured, so that the department users are
able to get a complete view for supporting enterprise Reporting, Statistical analysis, Forecasting,
and high performance Advanced Analytics.

5. Metadata Repository – A metadata repository shall be created separately for structured data,
unstructured data and discovery data. Whether it is for structured or unstructured data,
metadata shall contain enough information to understand, track, explore, clean, and transform data.
DataLytics shall have the capability to apply metadata on incoming data without any manual
intervention.
a) Metadata for Structured Data (DWH) – It includes Technical, Business, and Process
metadata. Besides these, rules of precedence, such as which source tables can update
which data elements in which order of precedence, must be defined and stored.
b) Metadata for Unstructured Data – contains rules, definitions, and datasets that help
filter out valuable data from incoming data streams or batch loads, and persist only such
data as is useful. Metadata should enable lineage tracking of data that is loaded into the
DataLytics system.
c) Metadata for Discovery Platform – The SI is to propose the recommended metadata solution as
per requirements.
d) Reusing Data Objects – Standard queries, models, and metadata can be moved into one
layer and virtualised so that these objects may be reused.
6. Information Life-cycle Management (ILM) – policies, processes, practices and technologies used to
manage data from source to sink. Data compression is an important aspect of ILM, and the
compression ratio defined for the Analytical system shall be applied. DataLytics should have
an in-built Backup, Archive, and Restore (BAR) solution to protect data and ensure availability.
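
The rules of precedence mentioned under metadata for structured data (item 5a) can be sketched as a lookup that decides which source is allowed to update a data element. The example is illustrative only: the element names, source table names and values are hypothetical, and the SI's metadata solution may represent such rules quite differently.

```python
# Hypothetical precedence metadata: for each data element, the source tables
# allowed to update it, highest precedence first.
PRECEDENCE = {
    "citizen.address": ["revenue_master", "civil_supplies_card", "municipal_tax"],
    "citizen.mobile": ["civil_supplies_card", "revenue_master"],
}


def resolve(element, candidates):
    """Pick the value from the highest-precedence source that supplied one.

    `candidates` maps source table name -> proposed value. Sources that are not
    listed for the element are ignored, because they may not update it.
    Returns a (source, value) pair, or None if no permitted source has a value.
    """
    for source in PRECEDENCE.get(element, []):
        if candidates.get(source) is not None:
            return source, candidates[source]
    return None


winner = resolve(
    "citizen.address",
    {
        "municipal_tax": "Ward 12, Guntur",
        "civil_supplies_card": "Ward 14, Guntur",
        "unlisted_source": "Ward 99",  # not permitted to update this element
    },
)
print(winner)
```

Storing the rules as data, rather than in ETL code, is what allows them to be applied to incoming data without manual intervention.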

Analytical Data Virtualisation – Data Virtualisation shall provide a layer of abstraction and hide
complexity of data storage and retrieval underneath. It shall hide cryptic names of tables and
columns from users and provide business friendly definitions of data which can be used to create
reports even by non-technical people. Also, the data abstraction layer shall have the capability to
access structured data, unstructured data, or both in a single query. The query language shall be a
standard RDBMS query language, and a query initiated at any level should be able to process data
from all data stores (structured and unstructured). The layer should support a strong optimiser to
tune query execution, for response time as well as throughput.
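
The idea of hiding cryptic table and column names behind business-friendly definitions can be shown with a database view. This is a minimal sketch using Python's built-in `sqlite3` module purely for illustration; the table, column and view names are hypothetical, and it covers only the renaming aspect for structured data, not access to unstructured stores or query optimisation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Physical table with cryptic names (hypothetical).
conn.execute("CREATE TABLE t_agr_017 (d_cd TEXT, crp_nm TEXT, ar_ha REAL)")
conn.executemany(
    "INSERT INTO t_agr_017 VALUES (?, ?, ?)",
    [("GNT", "Paddy", 1200.0), ("GNT", "Cotton", 800.0), ("KRS", "Paddy", 950.0)],
)

# Abstraction layer: a view exposing business-friendly names.
conn.execute(
    """
    CREATE VIEW cropped_area AS
    SELECT d_cd AS district_code, crp_nm AS crop_name, ar_ha AS area_hectares
    FROM t_agr_017
    """
)

# A report author queries the view without knowing the physical names.
rows = conn.execute(
    """
    SELECT district_code, SUM(area_hectares) AS total_area
    FROM cropped_area
    GROUP BY district_code
    ORDER BY district_code
    """
).fetchall()
print(rows)
```

If the physical table is later renamed or restructured, only the view definition changes and existing reports continue to work.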

Data Usage Layer – These are the usage scenarios of DataLytics. Different users may want different
types of outputs based on their role, responsibilities, and functions. DataLytics shall provide the
following usage capabilities:
1. Reports and Ad-hoc queries – Analytical reporting (based on data warehouse/data mart).
The system shall provide a scripting language and the ability to handle complex headers, footers,
nested subtotals, and multiple report bands on a single page.
2. The system shall support simple, medium, and complex queries against both structured and
unstructured data.
3. User specific reports shall be configured by the SI after soliciting requirements from
respective users.
4. Online Analytical Processing (OLAP) – Slicing and dicing, measuring dependent variables
against multiple independent variables. It enables users to regroup, re-aggregate, and re-sort
by dimensions.

5. Advanced Analytics – This includes predictive, prescriptive, descriptive, causal, statistical,
spatial, and mathematical analysis, using structured and unstructured data.
6. Dashboards – Display a variety of information in one page/screen. Typically they display Key
Performance Indicators visually.
7. Textual Analytics – Textual analytics refers to the process of deriving high-quality
information from text in documents, emails, Government orders, web, etc. This is useful in
sentiment analysis, understanding hot topics of discussion in public, and maintaining
government image.
8. Performance management – Analytical data can be used by departments to understand
their performance, and reasons for current levels of performance measured in terms of KPIs.
9. Data mining, Discovery, and Visualisation - It is about searching for patterns and values
within data streams such as sensor based data, social media, satellite images etc. Data
exploration is primarily used by Data scientists or statisticians to create new Analytical
models and test them so that they can be used for Analytics.
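
The slicing, dicing and re-aggregation described under OLAP (item 4) can be illustrated with a small in-memory example. The fact rows, dimension names and the `aggregate` helper below are hypothetical and use only the Python standard library; an actual OLAP engine would run against the data warehouse or data marts.

```python
from collections import defaultdict

# Hypothetical fact rows: (district, year, scheme, beneficiaries)
FACTS = [
    ("Guntur", 2015, "Pension", 120),
    ("Guntur", 2016, "Pension", 150),
    ("Guntur", 2016, "Housing", 40),
    ("Krishna", 2016, "Pension", 90),
    ("Krishna", 2016, "Housing", 60),
]
DIMENSIONS = {"district": 0, "year": 1, "scheme": 2}
MEASURE = 3  # position of the beneficiaries count


def aggregate(facts, group_by, where=None):
    """Sum the measure grouped by `group_by` dimensions, after an optional slice.

    `where` maps dimension name -> required value (the "slice").
    """
    where = where or {}
    totals = defaultdict(int)
    for row in facts:
        if all(row[DIMENSIONS[d]] == v for d, v in where.items()):
            key = tuple(row[DIMENSIONS[d]] for d in group_by)
            totals[key] += row[MEASURE]
    return dict(totals)


by_district = aggregate(FACTS, ["district"])                       # regroup
by_scheme_2016 = aggregate(FACTS, ["scheme"], where={"year": 2016})  # slice, then regroup
print(by_district)
print(by_scheme_2016)
```

Changing `group_by` or `where` re-aggregates the same facts along different dimensions, which is the behaviour users see when they regroup or re-sort a report.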

Delivery Layer – Describes how users and applications consume output from the DataLytics system. This
may be in the form of DataLytics Services, alerts on emails and phones, actions, integration with
office applications like Word, Excel etc., collaboration (discussion threads etc.), mobile and so on. The
Delivery layer shall support delivery through the following mechanisms:

1. DataLytics services – These offer the ability to embed actions, alerts, and reports in another
application, tool or UI. They shall have the ability to refresh automatically based on a predefined
schedule.
2. Alerts – These notify stakeholders if a certain event has occurred. Alerts may be delivered
in the form of email, reports, or messages.
3. Actions – Enable users to take some action based on alerts or reports. For example: removing a
duplicate record or fixing corrupted data.
4. Portal – Portals provide a mechanism to catalogue and index, classify, and search for
DataLytics objects such as reports or dashboards. All DataLytics reports are to be made available
to department users on the portals, based on their roles and responsibilities.
5. Mobile – Reports, dashboards, and portals shall be accessible on mobile devices too.
6. Office Applications – The system should integrate with standard Office products at the
minimum. The data and reports should be importable and exportable from/to Office
products.
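
As an illustration of the Alerts mechanism (item 2), the sketch below evaluates a set of rules against current metric values and produces one notification per rule whose event has occurred. The rule names, metric names, thresholds and the `evaluate` function are hypothetical; actual delivery over email or messaging is out of scope for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlertRule:
    name: str
    condition: Callable[[Dict[str, float]], bool]  # True when the event has occurred
    channel: str                                   # "email", "report" or "message"


def evaluate(rules: List[AlertRule], metrics: Dict[str, float]) -> List[str]:
    """Return one notification line per rule whose condition holds."""
    return [f"[{rule.channel}] {rule.name}" for rule in rules if rule.condition(metrics)]


# Hypothetical rules and thresholds.
rules = [
    AlertRule("Transformer load above 90%",
              lambda m: m["transformer_load_pct"] > 90, "message"),
    AlertRule("Grievances above monthly norm",
              lambda m: m["grievances"] > 500, "email"),
]

notifications = evaluate(rules, {"transformer_load_pct": 93.5, "grievances": 120})
print(notifications)
```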

DataLytics Tools and GUI

The complete Big Data Platform, comprising the Active Data Warehouse, Hadoop and the advanced
discovery lab, should have a single web-based tool to manage and administer hardware, OS and
software. The management tool should allow DBAs to determine system status, trends, capacity
usage and even individual queries across the Big Data Platform. It should allow administration of
system throughput, congestion and health. It should provide a rewind feature to go back in time and
understand the changes in performance a query has gone through. The management tool should be an
integral part of the big data platform. It should provide security through a fully featured
role-based permission engine that can function standalone or integrate seamlessly with enterprise
authentication, such as Active Directory, via LDAP. The management tool should provide pre-built
portlets for self-service, administration, workload management, collecting and viewing statistics on
data temperature and storage grade of cylinder, and backup and recovery.

DataLytics shall contain a GUI that allows users to configure the system, manage metadata, set up
workflow and business rules, set up ETL rules etc. This interface shall be available to the DataLytics
application maintenance team. It should support removing certain entities or attributes, cleaning up
the text, or merging/splitting attributes.

2.4. Technology Architecture


2.4.1. Deployment View

It is envisaged that the DataLytics system will be deployed in the interim data centre of GoAP near
Andhra Pradesh New Capital. The SI has to propose a suitable deployment architecture taking this
assumption into consideration.

An indicative DataLytics system Deployment Architecture is shown below:

[Figure: deployment architecture diagram. Recoverable labels: Cloud (Other e-Pragati Packages);
DataLytics Delivery Channels (Reports, Alerts, BI Services, Mobile, Portals, Office Apps, Actions);
Data Centre with Core Switch and Switches; Security Systems; Identity & Access Management
(Security/SSO Server, LDAP Server, Web Server, DB Server); DataLytics System (Multi Node Data
Warehouse System, Data Collection & Processing System with Hadoop/Discovery Nodes 1 to N,
Portal Server, Reporting Server, DB Server, SAN for Structured Data, Backup Solution);
Centralized Monitoring System.]

FIGURE 2: DEPLOYMENT ARCHITECTURE VIEW

The following Table describes components of deployment architecture briefly:


Component                Description

Hadoop System (or equivalent or better)
    • This should be a special type of computational cluster designed for storing
      and analysing vast amounts of unstructured data in a distributed computing
      environment. It will host DataLytics Application components.
    • The solution should consist of a minimum of:
        • 3 Name nodes - one primary and 2 secondary
        • 3 data nodes
        • 1 loader node
    • The system should have a high performance level and be easily scalable.

Data Warehouse System
    • This should be a Database Server Cluster to support the Data Warehouse,
      Datamart and Virtual Data Layer. The Database Management System and database
      capabilities such as ETL and Analytics will be part of the same cluster. The
      Data Warehouse System shall be connected with the Hadoop systems through a
      high-speed network. An appliance based solution or similar, which can
      provide the required performance and scalability, shall be considered for this
      purpose.

Advance Discovery Lab
    • The Advance Discovery Lab shall include multiple solution components,
      including but not limited to SQL MapReduce, Path, Text Analysis, SQL Graph
      Analysis, Statistics & ANSI SQL, in a single platform.

Core Switch
    • The Core Switch will provide physical connectivity for department servers to pull
      and push data to the Hadoop cluster.

Rack Switch
    • The Rack Switch will connect with all the server nodes in the rack for data movement.

Identity & Access Management (IDM)
    • Identity and Access Management shall provide Authorisation, Authentication,
      Multi-factor authentication, a Federation Management Server and a Fraud
      Prevention System. This component is part of the Core Package implementation,
      which will be made available to the SI for integration.

Centralised Monitoring System
    • The Centralised Monitoring System shall manage and monitor all enterprise
      systems and IT Infrastructure components. It provides real time alert /
      notification & dashboard functionality for analysis and performance
      management.

TABLE 5: COMPONENTS OF DATALYTICS DEPLOYMENT ARCHITECTURE
A detailed bill of materials has been provided in Annexure 4.
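
As a rough aid for reviewing a proposed deployment, the minimum Hadoop node counts listed in the table above (3 name nodes, 3 data nodes, 1 loader node) can be checked mechanically. The sketch below is an assumption-laden illustration: the function name, the role keys and the dictionary format of the proposal are hypothetical and do not come from this RFP or from any Hadoop distribution.

```python
# Minimum node counts taken from the Hadoop System row of the table above.
MINIMUM_NODES = {"name": 3, "data": 3, "loader": 1}


def check_cluster(proposed: dict) -> list:
    """Return a list of shortfalls; an empty list means the minimum is met."""
    shortfalls = []
    for role, minimum in MINIMUM_NODES.items():
        count = proposed.get(role, 0)   # a missing role counts as zero nodes
        if count < minimum:
            shortfalls.append(f"{role} nodes: proposed {count}, minimum {minimum}")
    return shortfalls


# Hypothetical proposals used only to exercise the check.
compliant = check_cluster({"name": 3, "data": 5, "loader": 1})
deficient = check_cluster({"name": 2, "data": 3})
```

A proposal that exceeds the minimum, such as five data nodes, passes the check, in line with the scalability requirement.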

2.4.2. Network View

The e-Pragati green field and brown field applications are distributed across multiple Data Centres
and different cloud environments. Hence the deployment location of the DataLytics system shall be
connected with these Data Centres and cloud environments with suitable network bandwidth. The
GoAP will procure and establish the network connectivity with the required bandwidth.
The Figure below indicates the network architecture view of the DataLytics system.

FIGURE 3: NETWORK ARCHITECTURE VIEW

The following Table describes components of network architecture briefly:


Technology Component     Requirement

DataLytics
    The DataLytics system shall be deployed on premise in the Interim Data Center of GoAP
    near the new Capital of Andhra Pradesh.

World Wide Web / Internet
    The DataLytics system shall have access to the internet to access external applications
    for gathering data.

e-Pragati Applications on Cloud
    DataLytics shall be able to retrieve data from third party clouds that are hosting
    e-Pragati applications through dedicated links.

APSWAN
    Department users will access the DataLytics system through APSWAN.

DC of GoAP near Andhra Pradesh Capital
    A new 40-rack Interim Data Center of GoAP will be set up near the new Capital of Andhra
    Pradesh, where other e-Pragati Packages will be deployed. It will have a
    dedicated 2 Gbps link with the existing Hyderabad SDC. The Interim Data Center will
    host some of the e-Pragati applications such as People Hub, Land Hub, and the
    e-Pragati portal.

SDC Hyderabad
    Many existing applications such as Webland, CARD and MeeSeva are deployed in the
    Hyderabad State Data Center.

TABLE 6: NETWORK ARCHITECTURE COMPONENTS

2.4.3. DataLytics Security View

To protect information throughout the DataLytics life cycle, security controls in participating IT
systems must adhere to the information system security policies. To achieve this objective, the
following processes should be considered and implemented at a minimum:

S. No    Policy Category    Policy Section

1        Foundation
         a) Security Policies, Standards and Guidelines:
         These are the high-level guidelines that describe
         how users access resources through security
         mechanisms, and will govern systems design,
         development and operations. A draft/approved
         GoAP security policies document will be provided,
         which needs to be adhered to by the SI while
         implementing the DataLytics package. The e-Pragati IT
         security standards and guidelines are available at the
         e-Pragati website:
         http://e-pragati.ap.gov.in/bestpractices.html

         b) Risk Management: This deals with threats
         that exploit vulnerabilities in the assets
         possessed by the Department/Government, and with the
         countermeasures to be taken against them. The SI shall
         come up with a risk management strategy specific to the
         DataLytics Package by taking its IT assets into consideration.

2        Process
         a. Identity Management: This component deals with
         authentication, authorization and auditing
         functionality that is relevant to this package. The
         common Identity and Access Management (IAM)
         component will be part of the Core Package/e-Pragati
         Portal implementation, which shall be leveraged by the
         rest of the e-Pragati packages. The SI is responsible for
         implementing/customizing authentication and
         authorization functionality specific to the DataLytics
         Package and integrating/converging the same with the Core
         Package/e-Pragati Portal IAM component to enable
         Single Sign-On (SSO) functionality. The section below
         on Access Control Mechanism of DataLytics Package
         describes the IAM integration approach in detail.

         b. Threat Management: This component deals with
         viruses, Trojans, worms, malicious hackers,
         intentional and unintentional attacks on the system,
         and force majeure. The SI needs to design a detailed
         threat model specific to the DataLytics Package and
         deploy all required tools and processes to address
         envisaged threats to the system, namely security
         monitoring and incident management, firewalls,
         cryptography and forensic analysis tools.

         c. Vulnerability Management: While threat
         management deals with unknown security risks to
         the system, vulnerability management deals
         with known risks of the system at various levels. The
         SI shall identify all vulnerabilities of the system
         pertaining to data, application, host and network,
         and implement processes and tools to monitor and
         mitigate them.

         d. Software Development: The SI shall implement
         security best practices and patterns across all
         software development life cycle (SDLC) phases of
         this package, to ensure the defined policies and
         processes are implemented in the various architectural
         components of the system.

3        Logical Access Controls
         User Access Management
         User identification
         Creation of user ID
         Control of user ID
         User transfer or termination controls

4        Application access policy
         Identification – the process of distinguishing one user from all others.
         Authentication – the process of verifying the identity of the user.
         Authorization and Access control – the means of establishing and enforcing user rights and privileges.
         Administration – the functions required to establish, manage and maintain security.
         Audit – the process of reviewing and monitoring activities.

5        Information classification schemes
         Need-to-Know, Applicability (all info), what data?
         Data Classification Matrix (Secret/Top Secret)
         Facility of Upgrade/Downgrade levels of Data classification
         Custom classification Matrix for ePragati information vs stakeholder access levels
         Embedding Security levels and encryption during data exchange with other agencies

6        Network Security
         Network Management Controls
         Network Devices
         Remote Access
         Network Diagnostic Tools
         Encryption of data while in transit between data centers and within a data center

7        Computing Environment Management
         Documented Operating Procedures
         Change Control
         Incident Management Procedures
         Computer Virus Control

8        Information Storage
         Encrypted data on all storage media (e.g. hard disks, floppy disks, magnetic tapes and CD-ROMs)
         Physically secure all information storage media when not in use.
         Back-up media to be stored in fire resistant safes or cabinets.
         Physical access to storage media restricted to authorised personnel based on job responsibilities.

9        Application Security
         Role based access to application and information
         Adequate logging, Input validation
         Mandatory security testing before release to production
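
To make the interplay between role based access, the data classification matrix and auditing in the table above more concrete, a minimal sketch follows. It is illustrative only: the role names, the clearance assigned to each role and the two lower classification levels are assumptions (the table itself names only Secret and Top Secret), and the actual matrix is to be defined for ePragati during implementation.

```python
# Ordered from least to most sensitive. "Public" and "Restricted" are
# hypothetical levels added for illustration; the table names only
# Secret and Top Secret.
LEVELS = ["Public", "Restricted", "Secret", "Top Secret"]

# Hypothetical role-to-clearance matrix.
ROLE_CLEARANCE = {
    "analyst": "Restricted",
    "department_head": "Secret",
    "security_officer": "Top Secret",
}

audit_log = []  # every decision is recorded, allowed or not


def can_access(role: str, classification: str) -> bool:
    """Allow access only when the role's clearance is at least the data's level."""
    clearance = ROLE_CLEARANCE.get(role)
    allowed = (clearance is not None
               and LEVELS.index(clearance) >= LEVELS.index(classification))
    audit_log.append((role, classification, allowed))
    return allowed
```

Unknown roles are denied by default, and denied attempts are logged as well as granted ones so that the audit trail supports review of activity.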


3. Scope of Work of SI Selected for DataLytics
The overall scope of the project involves requirements study, design, customisation/development, testing,
operationalisation and maintenance of the DataLytics application.
The Scope of Work of the SI selected for the implementation of DataLytics has been defined in
different sections of the RFP. However, a high-level specification of the same is given in this section.

3.1. Overview of Scope of Work

The following Table gives high level scope of work of the SI in implementing DataLytics project:

Sl. Broad Scope of Work of SI

 1   Deployment of a COTS Product based Solution
        a. System Requirements Study including the user scenarios described in section
2.2.1
b. Solution Analysis, Architecture and Design
c. Product Customisation/Configuration of a COTS Product
d. Application Testing
e. Documentation
f. PoC, Pilot, and Rollout Strategy
2 Procurement and Deployment :
a. Procurement and deployment of the COTS Solution on premise at Interim DC of
GoAP near Andhra Pradesh New Capital.
b. Deployment of a team of Data Analysts, Data Scientists and Domain Experts, to
meet the reporting requirements specified in the RFP.
c. User Acceptance Testing
d. Change Management and Capacity Building
e. Integration with the 22 departments specified in section 2.1.4
f. Solution Certification
g. Go-Live and Operational Acceptance
3 Operations and Maintenance for 3 Years from the date of Go-Live:
        a. Generation and submission of reports to departments as per the user scenario
             requirements
b. Application Support & Maintenance
c. Technology Upgrade/Refresh to align with emerging technologies.
d. Information Security Services
e. Management of Technical Support
TABLE 7: BROAD SCOPE OF WORK


3.2. Detailed Scope of Work
The following sections describe detailed scope of work, for implementation of a COTS product
based solution. Bespoke development of DataLytics solution is NOT ENVISAGED and hence
shall not be accepted. While this section focuses on scope of work, the detailed list of associated
deliverables has been provided in Section 3.5.

SI shall prepare the SRS document, Solution Architecture and Design documents in the form of a
BLUEPRINT for the complete functionality of DataLytics. Other remaining activities including
Software Product configuration/customisation, testing etc. shall be undertaken as per software
development processes.

Deployment of COTS Product based Solution


Performing System Requirement Study:
a. The SI shall perform a detailed assessment of the functional and technical requirements
for DataLytics
b. Based on the assessment, the SI shall submit the inception report and detailed Project
Work Plan for DataLytics project life cycle including COTS product Customisation,
Procurement & Deployment, and Operations & Maintenance, and schedule for delivery
of the predefined Analytical reports, and get it approved by the PMU.
    c. The SI should prepare a Blueprint document that serves the following objectives:
i. Provides a situational analysis of the themes relating to all the specified
analytical reports for the 22 departments
ii. Defines a set of outcomes that can be expected by each target department,
through the use of the DataLytics Platform.
iii. Identifies and defines the data requirements for each of the analytical reports,
and the sources for the same, internal or external to the Government;
             iv. Describes the specific methodology proposed to be adopted for capturing, collecting,
                  extracting and transforming the required mass of data in respect of each theme,
                  and each department.
              v. Prepares a RACI statement in respect of the collection of data.
vi. Defines the process flows and the algorithms designed to be used to undertake
the data analysis;
vii. Maps the standard functions of the DataLytics Platform that can enable the SI to
achieve the functionality and provide the specified outputs in the form of
insights or analytical reports;
viii. Defines the periodicity of the analytical reports for each theme/ scenario;
             ix. Provides a set of advisories to the department concerned on the interventions
                  recommended for achieving the outcomes envisaged in (ii) above.
x. Creates a methodology for measuring the effectiveness of the DataLytics
Platform vis-à-vis enabling the department to achieve the envisaged outcomes.
        The SI should preferably follow the latest versions of the industry standards IEEE 830 and
        IEEE 1233 for drafting the Blueprint document.

d. The Blueprint document should be accompanied with a detailed use case document of
all functions of DataLytics system in line with the minimum requirements specified in the
FRS (Refer Application specific sections of this Volume).
e. The SI shall provide for connectivity between Data sources (SDCs, Third party data
centers, and cloud service providers), and the facility where DataLytics application is
       hosted. The SI shall also provision for the required bandwidth. In case connectivity to
       other data centers and clouds has already been established, the SI shall obtain the
       relevant approvals to reuse the connectivity as part of the data governance/data collection
       process.
f. The SI shall also prepare a Requirements Traceability Matrix (RTM) mapping the
requirements specified in the FRS with the sections dealing with those in the Blueprint.
The template for preparing the blueprint has to be approved by the PMU before it is
used to document the blueprint.
g. A formal sign-off should be obtained from the PMU, before proceeding with the Design,
Development, Customisation and Implementation of the System. The documents to be
presented for sign-off should include the Blueprint as well as the RTM.
    h. The SI shall also customise and use an industry standard tool, or the tool supplied by
       the OEM of the DataLytics Product, for managing the lifecycle of the solution
       development and maintenance.
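
The Requirements Traceability Matrix called for in item (f) is, at its simplest, a mapping from each FRS requirement to the Blueprint sections that address it. The following is a minimal sketch of a completeness check before sign-off; the requirement identifiers and section labels are invented for illustration, and the actual RTM template is whatever the PMU approves.

```python
def untraced_requirements(frs_ids, rtm):
    """Return FRS requirement IDs that have no Blueprint section mapped in the RTM."""
    return [req for req in frs_ids if not rtm.get(req)]


# Hypothetical identifiers; real ones would come from the FRS and the Blueprint.
frs_ids = ["FRS-001", "FRS-002", "FRS-003"]
rtm = {
    "FRS-001": ["Blueprint 4.1", "Blueprint 4.3"],
    "FRS-002": [],                      # listed in the RTM but not yet mapped
}

gaps = untraced_requirements(frs_ids, rtm)
```

Here both the requirement with an empty mapping and the one missing from the RTM altogether are reported as gaps.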

Solution Analysis, Architecture and Design


1. The SI shall prepare the technical solution architecture and design, keeping in mind the
proposed solution architecture provided in various sections of this volume as well as the
approved Blueprint.
2. The Solution Architecture document should highlight the major components of the
solution and map it to the requirements identified in the Blueprint. It should also specify
the rationale of hardware and software sizing.
3. The detailed design document should describe complete low level details of various
components identified in solution architecture. This document covering all the modules
and sub-modules of DataLytics shall be prepared following the industry standard
practices. These will be complete with UML class diagrams, interfaces, application
control, database schema and GUI. All design components shall be mapped to solution
architecture components. AS THE RFP REQUIRES THE DEPLOYMENT OF A COTS
PRODUCT BASED SOLUTION, THE SI SHALL PROVIDE THE STANDARD SYSTEM
       DOCUMENTATION OF THE OEM. THE SI SHALL PROVIDE DETAILED HLD AND LLD DOCUMENTS
       IN RESPECT OF THE CUSTOMISATIONS DONE AND COMPONENTS DEVELOPED BY IT IN
       ORDER TO FULFIL THE BUSINESS AND FUNCTIONAL REQUIREMENTS OF THE RFP.
4. The SI shall undertake IT Infrastructure sizing and prepare documents based on
estimations and judgments drawn after undertaking a thorough study of the functional,
non-functional & technical requirements of the project, and subsequent completion of
       PoC. The indicative number of existing users is provided in various sections of this
       Volume. The scale up and scale out scenarios should be envisaged, and the
infrastructure requirements sized accordingly.
5. The System should be based on open standards. The objective of the designing exercise
should be to identify all possible mechanisms of IT implementation within the current
business context, identify reuse of existing components (both software and hardware)
and remove redundancies within the system.
6. SI shall carry out a comprehensive training needs analysis and accordingly design the
training program in consultation with the PMU.
    7. The SI shall develop an Information Security Management System (ISMS) for DataLytics
       using global standards such as ISO 27001. The ISMS developed by the SI shall be in
       conformity with the security policy of the GoAP, especially the policy on Unique
       Identification of Residents, 2015 notified by GoAP.
8. The SI shall submit the solution architecture and design documents to the PMU and
obtain a sign off on these documents before commencing the
development/customisation/implementation of the COTS product based solution.

Product Customisation/configuration of COTS


1. The product configuration/customisation shall extend to the full functionality of
DataLytics system described in various sections of this document.
2. Development of all interfaces for integration between DataLytics and other e-Pragati
Applications and components shall form part of the scope.
3. The SI shall consult the PMU while developing the user interfaces and design the
interfaces as per the PMU’s requirements.
    4. The SI shall ensure that the DataLytics platform is able to interoperate with legacy
       systems of the target departments so as to extract any data required for providing the
       DataLytics services.
5. Data sharing shall be secure, and data shall be encrypted where needed. The SI shall
ensure that relevant security standards and industry best practices are incorporated in
securing the data.
6. The protocol for communication including data formats and the exact mechanism of
data transfer with participating entities/ stakeholders has not been explicitly defined in
this RFP and the SI should capture these during the requirements gathering phase and
ensure that all of these are documented as part of the Blueprint document.
7. All licenses required to deploy the COTS Solution
a. Should be held by and in the name of the GoAP as full use, enterprise licenses,
unrestricted and irreversible. This includes licenses for the COTS Products,
Database, Application Server and any of their components or any other software
required to run the application successfully.
b. Should in no way restrict the users of the department in terms of
view/write/modify rights. All Licenses quoted should provide the complete
rights to all department users.
           c. The license shall permit 50 concurrent users of the application at any
              instant of time.


Solution Testing
    The SI shall conduct its testing process as per the Quality Assurance Plan and testing strategy
    (including test plan and test cases) prepared and provided by it at the Solution Architecture
    & Design stage. The objective of testing is to ensure that the entire system in totality,
    including all hardware and software components together with the users, performs as per the
    objectives laid down in this RFP. The software solution testing shall include (but not be limited
    to) the following activities:

1. The SI shall perform the testing of the solution based on the approved test plan and
criteria; document the results and shall fix the bugs found during testing.
2. The application should undergo comprehensive testing which includes at least Unit
Testing, System Testing, Integration Testing, Performance Testing, Regression
Testing (in case of any change in the software) and Load & Stress testing.
3. The SI should preserve the test case results and should make them available to the
third party auditor (if any, to be appointed by the PMU at its own cost) for review,
on the directions of the PMU.
4. The SI shall share the tools used for testing the application system with the PMU. If
the tool is a proprietary tool then it should share at least one license with the PMU.
5. The testing of the application system shall include all components vis-à-vis the
functional, operational, performance and security requirements of the project, as
envisioned in this RFP.
6. Though the PMU is required to provide the formal approval for the test plan, it is
ultimately the responsibility of the SI to ensure that the end product delivered
meets all the requirements specified in this RFP and the signed off SRS. The
responsibility of testing the system lies with the SI.
7. The SI shall create a testing environment and ensure that all the application
software upgrades/releases are appropriately tested in this environment before
applying them on the actual production system. Any downtime/system outage for
Application system caused by applying such patches shall be attributed to the SI as
system downtime and shall attract penalties as per SLA.
8. GoAP shall engage a panel of TPAs for conducting the Acceptance Testing of
DataLytics system. A detailed Testing and Quality Assurance methodology to be
followed by SI is given in Annexure 5.

Solution Documentation
The SI shall prepare/update the documents including the following minimal set of Project
Documentation:
a. User Manual
b. Training Manual
c. Operations Manual
     d. Maintenance Manual
     e. Administrator Manual

PoC, Pilot and Rollout Strategy


As the use of Data Analytics on such a large scale by a Government is being attempted for
the first time, it is necessary for the SI to adopt a well-calibrated approach, which shall
consist of 3 distinct stages – PoC, Pilot and Rollout. The scope of each of these stages is
described below:

PoC
    After the initial period of system study and analysis, the SI shall design a Proof-of-Concept
    covering 3 scenarios for each of the following 2 departments:

i. Civil Supplies
ii. Agriculture
The purpose of the PoC shall be to:
1. Establish the basic functionalities of the Product deployed for the DataLytics
solution
2. Establish the method of extracting data from a variety of sources (e-Pragati Projects,
legacy systems, external sources, web and social media), in a variety of modes
(online, batch mode, messages from sensors and other IoT devices)
3. Establish the ETL methodology
4. Prove the initial set of algorithms designed by the data scientists of the SI in
consultation with the SMEs of the 2 departments chosen for the PoC.
        5. Provide 3 Analytical Reports for each of the 2 departments
        6. Validate the soundness of the methodology adopted for the preparation of the
           blueprint and confirm/change/enhance the contents thereof.
7. Validate the results thrown up by the PoC by comparing with ground realities.

    THE POC SHALL BE COMPLETED WITHIN A PERIOD OF 3 MONTHS OF SIGNING OF THE CONTRACT
    WITH THE SI, after which the Pilot shall be done.

Pilot
After the successful execution of the PoC, the SI shall undertake the Pilot, with the scope of
80% functionality/ analytical requirements of 4 departments providing access to all the
target users of the pilot departments. The following departments shall be included in pilot
phase:
i. Civil Supplies
ii. Agriculture
iii. Healthcare
iv. Labour

    The pilot phase shall deploy the solution as fully designed, configured, customised
    and tested, to cover 80% of the functionality and all the envisaged analytical
    reports for the 4 Pilot Departments.

Rollout
    THE PILOT IN 4 DEPARTMENTS SHALL BE COMPLETED IN ALL RESPECTS AND ACCEPTED BY
    THE TARGET DEPARTMENTS IN A PERIOD NOT EXCEEDING 5 MONTHS FROM THE SIGNING OF
    THE AGREEMENT WITH THE SI. After the successful implementation of the PoC and Pilot in the
    Civil Supplies, Agriculture, Healthcare and Labour Departments, the rollout for the remaining
    6 departments, namely Planning, Municipal Administration, ITE&C, Energy, Welfare and
    Panchayat Raj, and one Cross sectional of phase 1 shall be taken up for the development of
    Use case scenarios as per the terms of the RFP. In the second phase, development of use
    cases for the remaining 11 departments, namely Finance, Police, Marketing, Registration,
    Revenue, Excise, Irrigation, Tourism, Women Development, Housing and Transportation,
    shall be taken up as per the terms of the RFP.

The Rollout shall happen over a period of 12 months from the launch of the pilot.
It shall be in full compliance with all the requirements of the RFP i.e., remaining departments
targeted for Year 1 shall be covered in 7 months from the launch of the pilot and 11 more
departments in the following 6 months.
    The Table below briefly summarizes the various phases of implementation:

    Phase                     No. of         No. of User Scenarios    Duration
                              Departments    to be Covered            (T = Start of Project)
    Proof of Concept (PoC)    2              2 x 3 = 6                T + 3 months
    Pilot                     4              4 x 3 = 12               T + 5 months
    Year 1 Implementation     11             47                       T + 12 months
    Year 2 Implementation     11             47                       T + 18 months

Note: Total project scope is 1 year for development and 3 years of maintenance.
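
For planning purposes, the month offsets in the table above can be turned into calendar dates once T is known. The sketch below is only an illustration: the start date used is hypothetical, and the helper add_months is our own (Python's standard library has no built-in month arithmetic), clamping to the last day of shorter months.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add whole months to a date, clamping the day to the end of the target month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Offsets in months from T, as given in the phase table above.
MILESTONES = [
    ("Proof of Concept (PoC)", 3),
    ("Pilot", 5),
    ("Year 1 Implementation", 12),
    ("Year 2 Implementation", 18),
]


def milestone_dates(t: date) -> dict:
    """Map each phase to its target completion date for a given project start T."""
    return {phase: add_months(t, offset) for phase, offset in MILESTONES}


# Hypothetical project start date, chosen to show the month-end clamping.
schedule = milestone_dates(date(2016, 8, 31))
```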

Procurement and Deployment Phase


Procurement and Deployment:
    GoAP shall procure and provide the required Infrastructure at its cost so as to satisfy the
    performance requirements of the DataLytics Solution. However, Hardware sizing is the
    responsibility of the SI. The following mandatory requirements shall be ensured by the SI:

        1. Certification of the IT Infrastructure
        2. Maintenance of Security Standards
        3. Maintenance of the confidentiality of the data collected by the SI
        4. Maintenance of the privacy of the personal data of citizens
        5. Entering into an NDA with all the Service Providers and their employees in respect of
           any piece of data allowed to be accessed or extracted by the SI from any source,
           government or private
6. The infrastructure on which the DataLytics solution is deployed at Interim DC of
GoAP near Andhra Pradesh New Capital
7. The data, information, and reports generated as a part of the DataLytics project shall
be surrendered to the GoAP on conclusion of the project period.
8. The authorised personnel of GoAP or the PMU shall have access to the facilities
where the DataLytics solution is hosted, and take extracts of the logs of the system
as may be required.

User Acceptance Testing:


a. The PMU shall perform a detailed User Acceptance Testing (UAT) over the software
application deployed in the Data Centre from the select locations of all stakeholders,
       from where the System is expected to be accessed. This testing shall be performed
       with the sample data, after the data migration has been performed.
b. The SI shall provide and ensure all necessary support to the PMU conducting the
Acceptance Testing including sharing necessary project documentation, source code
(wherever applicable), and systems designed & developed, testing strategy, test cases
developed for the project executable, test results etc. The SI shall be required to
facilitate this process and it shall be incumbent upon the SI to meet all the criteria.
c. Acceptance testing shall focus on testing the system for both AP SWAN and internet
connectivity. The interaction between application system and other stakeholders shall
also be tested as part of the acceptance testing. The testing shall be conducted by the
users, PMU and external agency users by simulating various real-time scenarios.
d. The quality of hardware procured under the contract shall be verified by the PMU or its
authorised agency. The PMU shall use the hardware devices procured under this
contract for a period of one month, after Go-Live is initiated, before giving its
acceptance.
e. SI should prepare and submit UAT Reports including:
i. Various Tests performed
ii. Test results
iii. Resolution reports for the issues identified during Testing

Change Management and Capacity Building


a. SI needs to execute the Change management and capacity building activity as per the
approved Change Management and Capacity Building Plan prepared by the SI and
approved by the PMU at the software Solution Analysis & Design Stage. The Change
Management and Capacity Building Plan shall be as per the Transition Plan prepared by
the SI and approved by the PMU.

b. The SI shall help the PMU in managing the transition from the existing system (both
manual and automated) to DataLytics as per the transition plan reviewed and approved
by the PMU and accepted by SI.
           i. The SI has to ensure zero service disruption during transition and shall
                plan accordingly. Training of the officials may preferably be provided on
                government holidays or before/after office hours to the best possible extent.
The deployment of hardware and software should be done on a government
holiday or after office hours so as to ensure that there is no disruption in regular
work.
     c. The SI shall prepare the required training material and manuals as desired.
d. The SI shall conduct the training at the regional offices. This is to ensure the effective
outcome and minimum disturbance in the regular working of the employees.
e. The duration of the training shall be jointly decided by PMU and SI, however, the
duration shall be sufficient to meet the training requirements of the user and facilitate
user in carrying out the routine activities on system. The training batch shall not have
more than 40 users.
f. In addition to above, the PMU shall facilitate the SI to deliver the training through video
conferencing at district level. Video conferencing facility is available at district level.
g. SI shall also provide effective training and access to the GoAP/PMU on project
management methodology and mechanism to be adopted to use these methodologies
(including tools), to manage DataLytics effectively.
h. SI shall also develop an e-training module that users can download and practise with, to
facilitate online training.
i. The training shall also focus on what-if scenarios and practical case studies with respect
to all the transactions/ entries made in DataLytics.
j. SI shall also provide trainings to the personnel, whenever any changes are made in the
application system.
k. The Training sessions should be participative in nature and respond to the queries/
doubts of the user.
l. SI shall also adopt the train the trainer approach and create champions amongst the
user offices, so that the training usage on the job becomes more sustainable.
m. Implementing change management:
i. Conduct at least two change management workshops (including presentation
materials and related documents) before the Go Live of the solution.
ii. Monitoring and reporting on PMU's preparedness to adopt planned changes and
identifying corrective actions to achieve the desired goals at all times.
iii. To provide the GoAP team assigned to the PMU with the necessary training in
the methods, principles and standards of change management to be adopted for
institutionalisation of the planned DataLytics implementation.
n. SI needs to submit the training and change management report after successful
completion of each training session, including user feedback and duly filled in User
Feedback form.

o. In addition to above, SI needs to submit the consolidated training and change
management report after successful completion of training(s) rollout.
p. SI shall work with PMU in creating a central DataLytics team and in identifying the skill
sets needed by its members. Capability building includes forming and enabling this central
team so that it can cater to the analytical needs of various departments.

Solution Certification
a. The application has to be free from any security threat and the SI shall have to produce
the third party audit certification for the same.
b. Further, the SI shall get the third party certification from the agencies which are
empaneled and approved by APTS/GoAP and shall submit the testing certificate to the
PMU before Go-Live of the DataLytics. The cost of certification shall be borne by
APTS/ITE&C dept.
c. In addition PMU at its own cost may also engage any other third party agency and get
the application tested. SI has to provide full support for this activity.

Go-Live and Operational Acceptance


It is mandatory to implement both the POC and Pilot phases successfully before the system
enters the Go-Live and acceptance phase; this applies separately to the Year 1 and Year 2
implementations.
The following description of Go-Live and Operational Acceptance is equally applicable for
both Year 1 and Year 2 rollouts.

Operational Acceptance for the DataLytics shall be awarded by the PMU only if all user
scenarios are met by the SI ensuring the full functionality.

In order to ensure effective completion/accomplishment of the above, the PMU, in
association with SI, shall carry out all the necessary operational acceptance tests including
but not limited to:
i. Functionality Test
ii. Database Test
iii. Integration Testing
iv. Unit Test
v. System Test
vi. Security Compliance
vii. Stress test
viii. Performance test, etc.

In order to accept the System, PMU must be satisfied that all of the work has been
completed and delivered to PMU’s complete satisfaction and that all aspects of the System
perform acceptably. The operational acceptance of the system shall only be certified when
the proposed system is installed and configured at the sites according to the design and that
all the detailed procedures of operating them have been carried out by the SI in the
presence of PMU staff. The system is said to be “Go-Live” when it is installed, configured,
and operationalised for all use scenarios related to 11 departments, including cross-sectoral
scenarios.

a. SI should submit a report for obtaining OPERATIONAL ACCEPTANCE after the Go-Live
phase. The report should include following:
i. All required activities for DataLytics delivered by the SI and accepted by the
PMU
ii. All required system functionalities of DataLytics delivered by the SI and
accepted by PMU
iii. All required documentation for the DataLytics prepared by the SI and
accepted by PMU
iv. All required training for DataLytics imparted by the SI and accepted by PMU
v. All identified shortcomings/defects in the Systems have been addressed to
PMU’s complete satisfaction
vi. All the required Project Documents (manuals, SOP, etc.) have been
submitted and accepted by the PMU
vii. No. of users that have access to the System and are using the System for the
respective functional areas
viii. Any other work which is required to be complied with by the SI. The
application has to be free from any security threat and the SI shall have to
produce the third party audit certification for the same.

Based on the above and only after being completely satisfied that at least a minimum
percentage of all the users of internal stakeholders have access to the System and are using
the System for the respective functional areas, the PMU shall issue such OPERATIONAL
ACCEPTANCE of DataLytics.

Operations and Maintenance


The SI shall be responsible for the overall management of DataLytics for a period of 3 years (from the
effective date of Contract) including software and entire related ICT Infrastructure. The operation
and maintenance phase shall commence after Go-Live of Phase by the SI.

SI has to work with departments for data collection and design of new models for implementation of
new use case scenarios, if any.

SI should develop the Standard Operating Procedures (SOPs), in accordance with the ISMS, ISO
27005 & ITIL standards. These SOPs shall cover all the aspects including Infrastructure installation,
monitoring, management, data backup & restoration, security policy, business continuity & disaster
recovery, operational procedures etc. The SI shall obtain sign-offs on the SOPs from the department
and shall make necessary changes, as and when required, to the fullest satisfaction of GoAP. GoAP IT
and IT-related policies and the security policy shall be adhered to.

SI shall provide automated tool based monitoring of all performance indices and online reporting
system for SLAs defined in Volume III of RFP. The tools should have the capability for the PMU to log
in anytime to see the status.

Submission of reports to departments
Additionally, SI should also prepare and submit the Daily, Weekly, Monthly and Quarterly SLA report
during Operation and Maintenance phase based on the SLAs provided in the Volume III of RFP.

The weekly SLA report is the summary of the daily SLA reports. The Monthly SLA report is the
summary of the Weekly SLA reports. The Quarterly SLA report is the summary of the Monthly SLA
reports.
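
The roll-up of reports described above can be sketched as follows. This is only a minimal illustration: the metric (service uptime), the field names and the choice to carry additive minute counts instead of averaging percentages are assumptions made for the example; the actual SLA parameters are those defined in Volume III of RFP.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SlaSummary:
    """Additive SLA figures for one reporting period (hypothetical fields)."""
    period: str            # label such as "day-1", "week-1", "month-1"
    minutes_observed: int  # minutes for which the service was monitored
    minutes_down: int      # minutes for which the service was unavailable

    @property
    def uptime_pct(self) -> float:
        if self.minutes_observed == 0:
            return 100.0
        return 100.0 * (1 - self.minutes_down / self.minutes_observed)


def summarise(period: str, reports: Iterable[SlaSummary]) -> SlaSummary:
    """Summarise lower-level reports into one report for a longer period."""
    reports = list(reports)
    return SlaSummary(
        period=period,
        minutes_observed=sum(r.minutes_observed for r in reports),
        minutes_down=sum(r.minutes_down for r in reports),
    )


# Seven daily reports are summarised into one weekly report.
daily = [SlaSummary(f"day-{d}", 1440, 0) for d in range(1, 8)]
daily[2] = SlaSummary("day-3", 1440, 72)  # 72 minutes of downtime on day 3
weekly = summarise("week-1", daily)

# Weekly reports are summarised into a monthly report in the same way,
# and monthly reports into a quarterly report.
monthly = summarise("month-1", [weekly] * 4)
```

Carrying the underlying counts upward keeps the weekly, monthly and quarterly figures consistent with the daily reports they summarise, which an average of percentages would not guarantee when periods differ in length.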

Besides the SLA reports, the SI also needs to submit the following annually:
a. Certification stating all patches/ upgrades/ service releases have been properly installed
b. Asset Information Register
c. Standard operating procedure
d. Updated Project Exit Management Plan

Further, in the last quarter of the Operation and Maintenance phase, the SI needs to submit the
Project Exit report.

The broad activities to be undertaken by the SI during the operation and maintenance phase are
discussed in subsequent paragraphs.

Providing Application Support and Maintenance


During the contract period, the SI shall be completely responsible for defect free functioning of the
application software and shall resolve any issues that include bug fixing, improvements in
presentation and/or functionality and others at no additional cost during the operations &
maintenance period, within a duration specified in SLA.

The SI's responsibilities shall include, but not be limited to:

a. Providing for 24X7 onsite warranty/support for the software (Application/system/support)
Developed/Customised.
b. Ensuring compliance to uptime and performance requirements as indicated in the SLA
c. Management of Integration Component including the Component for integrating with
external agencies and any third party component used in the application software
d. Providing and installing patches and upgrades without any additional cost for contract
period for the quoted hardware, software, etc. In case the software patches are not
available free of charge, the cost of the same should be included in the contract price.
e. Ensuring timely resolution and fixing of bug/defects reported.
f. Undertaking performance tuning of the System (application and database) to enhance
System’s performance and comply with SLA requirements on a continuous basis.
g. Version management, License Management and software documentation management,
reflecting current features and functionality of the solution.

Technology Upgrade /Refresh

a. GoAP, at its own cost, in consultation with PMU may ask SI for Technology refresh at the end of
the fourth year of the project duration to take advantage of latest and Green technology at that
time and also save on maintenance cost of old hardware.
b. For the above purpose, SI, at its own cost, shall conduct a study for Technology upgrade/ refresh
and also consider Trade-in option for the hardware including Server, Storage, Networking,
Desktop etc. at Data Centre site to ensure increased efficiency in overall performance of the
System (taking into consideration increased no. of transactions, higher volume of database,
future requirements at that point in time, etc.)
c. SI shall ensure that the necessary migration of the data / application to new system is successful
during the above refresh.
d. SI shall also upgrade all the relevant documents including IT asset register, architecture
documents and other documents for the above refresh.

Information Security Services


a. The SI shall be responsible for ensuring overall information security of DataLytics, including but
not limited to:
i. Web Portal
ii. Application software
iii. System Software
iv. Support Software
v. Data
vi. Hardware
b. The SI shall be responsible for the regular update of the security policy as formulated during
project development/ customisation phase.
c. The SI is responsible for implementing measures to ensure complete security of the DataLytics
and confidentiality of the related data, in conformity with the security policy (framed by the SI in
consultation with the PMU).
d. The SI shall be responsible for guarding the Systems against virus, malware, spyware and spam
infections using the latest Antivirus corporate/Enterprise edition suites which include anti-
malware, anti-spyware and anti-spam solution for DataLytics solution deployment.
e. The SI shall monitor security and intrusions, which mandatorily shall include taking necessary
preventive and corrective actions.
f. The SI, with appropriate co-operation of the PMU, shall manage the response to security
incidents. The incident response process shall seek to limit damage and shall include the
investigation of the incident and notification to the appropriate authorities. A summary of all
security incidents should be made available to the PMU on a weekly basis; however the
significant security incidents should be reported immediately on occurrence of the incident.
g. The SI shall have to maintain strict privacy and confidentiality of all the data it gets access to and
adequate provisions shall be made not to allow unrestricted access to the data. In particular the
SI cannot give access to data to people in its organisation who have not signed the Non-
Disclosure Agreement (NDA). The SI cannot sell or part with any data in any form.
h. The above security services are subject to guidelines/ procedures of hosting server and other ICT
equipment at SDC facility.

i. PMU shall maintain the physical security of the DC sites

Setting up and Management of Technical Support (including manpower & other field support staff)
a. The SI shall be required to provide Technical Support (Tech Support) services to enable effective
support to the internal and external users for technical issues.
b. Additionally SI shall be required to provide Field Support Staff to enable effective field support.
c. The SI shall ensure that the helpdesk facility has the following:
i. Call logging mechanism through Phone
ii. Call logging mechanism through e-mail
iii. Call logging mechanism through portals and applications
d. The SI shall provide at least the following services:
i. Provision and supervision of personnel for the Tech Support. Minimum qualification
requirements for personnel for this process are stated in the Volume II of RFP. Further SI
shall ensure that helpdesk resources and also the technical field staff should have
knowledge of local language i.e. Telugu.
ii. Helpdesk shall provide its services on all working days of GoAP between 08:00 Hrs. and
20:00 Hrs. However, minimal support shall be available for the remaining hours of the day
and on non-working days.
iii. All grievances shall be assigned a ticket number and the number shall be made available
to the user along with the identification of the agent, without the user having to make a
request in this regard, at the beginning of the interaction.
iv. Tech Support team shall provide support for technical queries and other software
related issues arising during day to day operations
e. The Physical space for the helpdesk and any other required infrastructure shall be provided by
the SI.
f. The SI shall categorise the technical issues and potential faults in three levels – Low, Medium
and High, in consultation with the PMU. The levels shall be based on the following criteria.
i. Impact on business through disruption of services and operations;
ii. Number of offices and geographical locations being affected by the issue;
g. The SI shall adhere to the service level agreement with respect to the resolution of issues at
various levels.
h. The interactions shall also be recorded and the records maintained for reference for a period of
3 months from the date of resolution of the problem.
i. All complaints/ grievances of users shall be recorded and followed up for resolution and an
escalation matrix to be developed for any delay in resolution.
j. In addition to the helpdesk recording grievances received through telephone and e-mail, a
portal facility should be made available to the users to record their grievances.
k. The Technical Team should register the complaints to the Tech Support for the
server/network/Application related problems. Complaints logged by the technical team shall be
treated on a high-priority basis.
l. The SI shall provide the following helpdesk performance monitoring reports –
i. Calls per week, month or other period;
ii. Numeric and graphical representation of call volume;

iii. Calls for each interaction tracked by type (calls for information on specific service, calls
for specific enquiries);
iv. Number of dropped calls after answering, including:
1. Calls that ended while on hold, indicating that the caller hung up;
2. Calls that ended due to entry errors using the automated system, indicating difficulty in
using the system;
m. Field support staff shall be deployed physically at Datacenter locations. Field support staff shall
provide first-hand technical support to the end users.
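
The helpdesk performance monitoring figures listed above can be derived from a simple call log. The sketch below is only illustrative: the record layout, the call-type labels and the drop-cause labels are hypothetical and are not prescribed by this RFP.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Call:
    """One helpdesk call record (all field names are hypothetical)."""
    day: date
    call_type: str                  # e.g. "information" or "enquiry"
    dropped: Optional[str] = None   # None, "on_hold" or "entry_error"


def helpdesk_report(calls: List[Call]) -> dict:
    """Count calls per ISO week, per type, and dropped calls by cause."""
    per_week = Counter(tuple(c.day.isocalendar())[:2] for c in calls)
    per_type = Counter(c.call_type for c in calls)
    dropped = Counter(c.dropped for c in calls if c.dropped is not None)
    return {
        "per_week": dict(per_week),   # keys are (ISO year, ISO week)
        "per_type": dict(per_type),
        "dropped": dict(dropped),
    }


calls = [
    Call(date(2017, 1, 2), "information"),
    Call(date(2017, 1, 3), "enquiry", dropped="on_hold"),
    Call(date(2017, 1, 9), "enquiry", dropped="entry_error"),
    Call(date(2017, 1, 10), "information"),
]
report = helpdesk_report(calls)
```

The same counters can feed both the numeric and the graphical representation of call volume; only the grouping key changes when a month or another period is required.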

3.3. Deliverables

The following Table outlines the detailed deliverables expected from the SI. A PMU will be created
for each package with an SME (department person), and the PMU will be responsible for the
sign-off of deliverables.

Sl. Milestone Deliverables from SI

Planning and Inception Phase


1 Signing of the Contract Performance Bank Guarantee for 10 % of total contract value.

2 Submission of Project Inception report
a. Detailed Project Work Plan/Inception report for Design, Development, Implementation,
Testing, Operations and Maintenance of DataLytics, based on COTS product.
b. Comprehensive Project Risk Management Plan

Systems Study, Architecture and Design Phase

3 Submission of Blueprint Documents
a. Blueprint Document for the solution in 3 phases, viz.
1. PoC
2. Pilot and
3. Rollout
b. Functional Requirements Traceability Report
c. Data Assessment Report which should include Data quality,
availability of all Data elements required for DataLytics
d. Data Access Requirements Report which will specify what
level of access is required for which users on which data
elements.
e. Data Security Requirements Report which will describe
how Data Security Standards and Requirements will be
met.
f. Data Extraction & Sourcing Strategy
g. Data Cleansing and Standardisation Strategy
h. Data and Application Integration Strategy (shall be SOA
based)
i. Change Management and Capacity Building Strategy
j. Quality Assurance Plan

4 Submission of Solution Architecture Documents
a. Detailed Software Architecture document describing various stakeholder requirements,
architecture views addressing the requirements, architecture styles and patterns, design
decisions and strategies addressing key quality attributes, technology components of the
DataLytics platform and their relationships, tools and methodologies for interaction with
external entities, standards that will be followed in relation to the data management cycle
5 Submission of Solution Design Documents
a. Design documents for all application modules and sub-modules
b. Technical / System Design Document including but not limited to:
i. High level modules and sub modules structure, including detailed specifications for
interfaces at method and attribute levels
ii. Class Diagrams, Collaboration diagrams in UML notations
iii. Component and Deployment views of the Application
iv. Detailed specification of workflow logic for modules and sub-modules
v. Updated Conceptual Data models
vi. Logical and Physical Data models
vii. Data Dictionary
viii. Updated metadata and reference data
ix. Database structures including ER, Data Dictionaries and DFD diagrams
x. Security design
xi. NoSQL database and Data lake design
c. Detailed interface design specifications to integrate data and applications with e-Pragati
applications and external systems
d. Document on Testing Approach and Strategy along with the test cases and test results
including but not limited to:
i. Type of inputs (Functional / Performance / Stress / Acceptance / Structural), including
Test Coverage / Boundary conditions
ii. Machine Configuration
iii. Test Assumptions
iv. Exact test stimuli as applicable
v. Response Time / Execution Time / Throughput
e. Design of Wireframes and User Interface Screens in consultation with end users and
stakeholders
f. Design documents for all relevant atomic and compound Services
g. Security architecture & policies (subject to SDC, APSWAN’s policies or any other IT policy
by GoAP)
h. Network architecture covering various user locations, deployment locations, bandwidth
requirements, software and hardware components
i. Deployment architecture covering Network, Servers, Storage, System Software
j. Data backup & recovery strategy
k. Integration approach with e-mail, Helpdesk services of GoAP
l. Software/ Hardware Configuration Management Plan
m. Data Backup and Management Policies

Implementation Phase

6 Submission of Implementation Plan
a. Development, Testing, and Deployment Plan which should at minimum contain the following:
i. Breakdown of total scope into phases or iterations
ii. Phase or Iteration-wise Identification of requirements for Software development
iii. Phase-wise Testing and Deployment Strategy
iv. Phase-wise User Sign-off Strategy
b. Automated testing and automated test case generation are recommended to ensure
appropriate test cases are generated, reducing waste and enhancing application quality. The
scope and coverage of test cases and their results are to be verified and signed off by PMU.

7 Completion of Configuration/ Customisation
At the end of software configuration/ customisation phase, the following artefacts shall be
delivered:
a. Updated Test Plans and Test Results of
i. Unit Testing
ii. Integration Testing (with other e-Pragati Packages)
iii. System Testing
b. Source code, library files, DLLs, Programs and other relevant software components, to the
extent of customisation and any required development. Please refer to Volume III of this RFP
for more details
c. Fully functional Software Applications
d. Updated Requirement Traceability indicating fulfilled requirement
e. Application Readiness Report, Development completion report including minimum following:
i. Source code (soft copy)
ii. Report formats (soft copy)
iii. Test scripts (soft copy)
iv. Databases (soft copy)
v. Data digitised and Migrated (soft copy)
vi. Executable files (soft copy)
vii. Product CDs
viii. Other relevant documents deemed necessary
f. Configuration management strategy
g. Technical documentation to explain source code, solution design maintenance manuals for
Data Center, Software, Networks, Servers and other hardware.

8 Completion of Data Migration / extraction (for each Phase: PoC, Pilot and Rollout)
a. Data migration strategy report.
b. Report on data migration/ extraction including the extent of data migrated/ extracted,
methodology for data cleaning, etc.

9 Completion of User Acceptance Testing and Third Party Audit
a. UAT Reports including
i. Various Tests performed
ii. Test results
iii. Resolution reports for the issues identified during Testing
b. PMU Sign-off of UAT
c. Application Audit Compliance and Security Certification from Third Party Agency

10 Insights and Analytical Reports
a. Insights, analytical reports and specific recommendations for interventions to achieve the
outcomes, in conformity with the broad requirements for the 22 departments given in
Section 2.1.4 above.
b. Delivery of the insights, analytical reports and specific recommendations for interventions
at the specified timelines for the PoC, Pilot and Rollout.

Change Management and Capacity Building

11 Submission of Change Management Plan
a. Change Management Workshops including presentation materials and related documents.
b. Updated Legacy Application Retirement strategy for existing application (if applicable).
c. Change Communication Strategy – Expectations from various Core Package stakeholders, how
the change impacts them.

12 Completion of User Training
a. Training plan & material for various kinds of trainings, training manuals and schedules.
b. Completion of trainings for all users as identified in Capacity Building Plan, in terms of:
i. Sensitisation training
ii. Basic computer awareness
iii. Application administration training
iv. Functional user training
c. Training feedback forms (post training sessions).
d. Training effectiveness score (based on feedback from participants).
Deployment Phase

13 Proof of Concept (PoC)
a. Successfully implemented use cases that were identified for PoC phase, meeting the defined
objectives
b. Analytical and statistical models, custom algorithms if any and other relevant artefacts
along with generated DataLytics reports, are delivered to the departments
c. Learnings from the PoC phase are incorporated into design artefacts

14 Pilot Implementation
a. Successfully implemented use cases that were identified for Pilot phase, meeting the
defined objectives
b. Analytical and statistical models, custom algorithms if any and other relevant artefacts
along with generated DataLytics reports, are delivered to the departments

15 Go-Live (Separate milestones for Year 1 and Year 2 targeted departments)
a. All necessary hardware, software and human resources are deployed as per the requirements
b. PoC and Pilot phases deliverables are approved by Departments
c. Successfully implemented use cases that were identified for Year 1 and Year 2 departments,
meeting the defined objectives
d. Analytical and statistical models, custom algorithms if any and other relevant artefacts
along with generated DataLytics reports, are delivered to the departments
e. Go-Live Certificate

Operations and Maintenance

16 Post Deployment
a. Submission of DataLytics reports along with other artefacts such as analytical and
statistical models, custom algorithms if any and other relevant artefacts to departments as
per the requirements, on a monthly, quarterly, half-yearly and annual basis
b. Post Implementation Support
c. Call Log & Resolution Reports for Helpdesk.
d. Daily/Weekly/Fortnightly/Monthly Performance Monitoring Reports
e. Operational Document on Strategic Control of DataLytics over the Project.
f. Quarterly Report for Operations and Maintenance Activities carried out during the quarter,
including:
i. Post Implementation Support to PMU
ii. Software change logs, etc.
g. Annual Certification stating all patches/ upgrades/ service releases have been properly
installed
h. Other Reports as mentioned in the Scope of Work.
TABLE 8: DATALYTICS PACKAGE DELIVERABLES

3.4. IT Service Delivery

The e-Pragati program is user-centric and aims at providing better government services to the
end user. The goal is to build a common vision of success amongst all stakeholders and ensure
that the institutional and implementation arrangements support the strategy.

a. The SI shall ensure that the IT Service Delivery complies with the SLA set in
this RFP and adheres to global best practices and the principles of e-Pragati.
b. The SI should ensure that the service delivery mechanism is improved through e-Pragati,
that better information management and transparency are achieved, and that citizens are
involved to the utmost in participative governance.

4. Training and Change Management
In order to successfully address the capacity requirements for the rollout of the DataLytics system
at various levels of the Government, a comprehensive training plan has been proposed for DataLytics.
The training needs at various levels have been identified based on the functional changes in
the working of government officials in the new set up.

The trainer needs to provide special training on DataLytics to the training / faculty team of the
departments.

4.1. Indicative Training Requirements

Sl. Description

1. Stakeholders shall be trained on basic functionalities, features, workflows, business
processes, reports and operations that can be performed, as relevant to user types.

2. Where necessary, relevant case studies may be given.

3. A detailed training plan shall be created, training material shall be prepared, and soft
copies shall be distributed to the participants. Five printed copies of the training manuals
shall be supplied to each of the 22 departments.

4. The training plan shall include details such as participant names, training location, date,
and time. All necessary arrangements shall be made to enable smooth running of
sessions.

5. At the end of training sessions, assessments will have to be performed in order to
evaluate the level of understanding of the participants. Assessments may be in the form
of quiz, tests, or real-life simulation. All necessary arrangements, including preparation of
test materials, administering tests, and evaluating test reports must be planned and
done in advance.

6. PMU will conduct an exit test at the end of the training session and allot grades to
all the participants. There will not be any fail criteria. All the participants with the bottom
grade will have to be re-trained. If more than 30% of the participants receive the bottom
grade, then the training will have to be re-conducted.

7. The nature of trainings can be:
i. Need for Analytics in government
ii. Use cases/scenarios specific to departments
iii. Proposed DataLytics System overview.
iv. Services and capabilities of DataLytics System.
v. Information security and its relevance and importance to department
data confidentiality. SI has to adhere to security policies of GoAP.
vi. Knowledge of Departmental Systems, Operational Procedures etc.
vii. System Administration training to IT Operation Management team
viii. Technical support training
ix. Data Security and privacy

8. Training will be department-wise, and training manuals need to be provided for each
department. Coverage of training may differ between departments.
9. The size of each batch should not be more than 40.
10. CBT based trainings shall also be provided for all key trainings especially for new joiners
and also as refresher trainings. The SI shall finalise the list of CBT based trainings after
consultation with PMU.
TABLE 9: INDICATIVE TRAINING REQUIREMENTS
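
The exit-test rule in item 6 of the table above can be expressed as a small check. This is a sketch under stated assumptions: the grade labels, the participant identifiers and the function name are hypothetical; only the 30% threshold and the re-training of bottom-grade participants come from the requirement itself.

```python
from typing import Dict, List, Tuple


def retraining_decision(grades: Dict[str, str], bottom_grade: str,
                        threshold_pct: int = 30) -> Tuple[bool, List[str]]:
    """Return (reconduct_whole_training, participants_with_bottom_grade).

    Participants with the bottom grade are always re-trained. The whole
    training is re-conducted only if MORE than threshold_pct percent of
    the participants received the bottom grade.
    """
    bottom = sorted(p for p, g in grades.items() if g == bottom_grade)
    # Integer comparison avoids floating-point issues at exactly 30%.
    reconduct = len(bottom) * 100 > threshold_pct * len(grades)
    return reconduct, bottom


# Ten participants, three of whom received the (hypothetical) bottom grade "D".
grades = {f"user{i}": "A" for i in range(1, 8)}
grades.update({"user8": "D", "user9": "D", "user10": "D"})
reconduct, retrain = retraining_decision(grades, bottom_grade="D")
```

With exactly 30% of participants on the bottom grade the session is not re-conducted, because the requirement says "more than 30%"; the three bottom-grade participants are still re-trained.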

4.2. Training Deliverables

The following Table describes the DataLytics training deliverables:

Sl. Details
1. Training Plan
2. Training Manuals
3. User Guides and Materials
4. Documented Evidence of Successful User Training.
TABLE 10: TRAINING DELIVERABLES

4.3. Training Responsibility & Duration

Training shall introduce the GoAP resources to systems, procedures and processes in an
elaborate manner. The actual requirement of training may be assessed while implementing
DataLytics and will be decided mutually by the Government designated team and the SI. A
Trainer’s Training program will be organised by the Government designated team to train the
trainers of the DataLytics SI.

The trainings shall be conducted at four locations (Vishakhapatnam, Vijayawada, Tirupati
and Kurnool). The expected duration for the training at each location will be 15 days and it
should cover all the users who are required to be trained as per the RFP. Based on the
training need, the SI has to develop the training material. The SI would have to maintain the
repository of the material and would have to train service agents on account of general
expansion or attrition. Trainings which are not related to functionality of the process and
client applications would have to be provided by the SI itself; this will be technical training
on general application usage and applications provided by SI.

4.4. Training Need Analysis

A training need analysis of all key stakeholders has to be done, and a training plan will then
have to be developed in line with the overall project plan. Given below are the high level
requirements of the DataLytics Training Plan:

Sl. Training Modules and Target Audiences

1. e-Pragati DataLytics Overview
   Content: Overview of the entire system, value proposal, stakeholders and
   benefits, roll out strategy, expectations, roles and responsibilities of
   departments
   Target Audiences:
   a. Department HoDs, Department Users – 7 from each department on an
      average (22*7) = 154
   b. Central DataLytics team – 10

2. DataLytics Deep-dive: Department Users
   Content: Functionalities available to department users, Training on using
   DataLytics system to generate basic reports, run queries, Working with
   DataLytics team for advanced analytics and reporting
   Target Audiences:
   a. Department Users – 5 from each department – Total 110
   b. DataLytics team – 10

3. DataLytics Deep-dive: DataLytics Team
   Content: DataLytics component wise training, roles and responsibilities of
   each member of the DataLytics team, team standard operating procedure and
   SLAs
   Target Audiences: DataLytics team – 10

4. Data Governance
   Content: Data collection strategy, decision on data access, and exchange,
   and Reviewing and approving major enhancements to DataLytics system
   Target Audiences: Data Governance team – 1 representative from each
   department and DataLytics team, governance members – Total 32
TABLE 11: DATALYTICS TRAINING REQUIREMENTS
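
As an indicative illustration only, the audience figures in Table 11 can be combined with the batch size limit of 40 stated in this section to estimate the minimum number of batches per training module. The sketch below assumes that the DataLytics team attends the same sessions as the department users and that all trainees of a module are pooled together; since the trainings are spread over four locations, the actual number of batches may be higher. The names `modules` and `batches_needed` are hypothetical and used for illustration.

```python
import math

DEPARTMENTS = 22       # number of departments, as stated in the RFP
MAX_BATCH_SIZE = 40    # upper limit on the size of each batch, as stated in the RFP

# Trainees per module, taken from Table 11.
# Assumption: department users and DataLytics team members are trained together.
modules = {
    "e-Pragati DataLytics Overview": DEPARTMENTS * 7 + 10,           # 154 + 10
    "DataLytics Deep-dive: Department Users": DEPARTMENTS * 5 + 10,  # 110 + 10
    "DataLytics Deep-dive: DataLytics Team": 10,
    "Data Governance": 32,                                           # total as stated
}


def batches_needed(trainees, max_batch=MAX_BATCH_SIZE):
    """Return the minimum number of batches so that no batch exceeds max_batch."""
    return math.ceil(trainees / max_batch)


for name, trainees in modules.items():
    print(f"{name}: {trainees} trainees, at least {batches_needed(trainees)} batch(es)")
```

Under these assumptions, the Overview module needs at least five batches and the Department Users deep-dive at least three, while the remaining two modules fit in a single batch each.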

4.5. Change Management

Change Management initiative shall focus on addressing key aspects of project including
building awareness among stakeholders. Change management shall also include
development and execution of communication strategy for stakeholders.

Change Management workshops shall be planned and conducted based on needs of various
stakeholders of DataLytics. Key considerations for Change Management Process are given
below:

Sl. Description

1. Impact Assessment – In the light of changes, how current functioning, Org
structures, roles and responsibilities are impacted.

2. Assess Change Readiness – How ready departments and stakeholders are? Are
there potential blockers? Stakeholder issues and concerns etc.

3. Design Change Management Approach – This is to come up with an optimal way
of implementing DataLytics (Phases, Pilot Groups etc.) and time frames.

4. Develop Change Plan – This includes creating plan, identifying milestones,
developing benefit tracking mechanisms.

5. Define Change Governance – Including appropriate decision making and review
structures.
TABLE 12: CHANGE MANAGEMENT REQUIREMENTS

Special consideration will have to be given to the change communication strategy, planning
and execution. The recommended steps are listed below:

Sl. Description

1. Conduct a Baseline Communication Assessment.

2. Develop and validate Communications Strategy.

3. Develop and Validate Communication Plan.

4. Implement Communications Programs.

5. Measure Results of Communication Plan.

6. Adjust Communications Program.


TABLE 13: CHANGE MANAGEMENT – SPECIAL CONSIDERATIONS

ANNEXURE 1 – FUNCTIONAL REQUIREMENTS OF DATALYTICS
In order to drive state business intelligence for improved service delivery to all stakeholders,
DataLytics application would provide analysis and reporting based on all types of data
(structured/unstructured, real time/ batch data, current/historical department data). DataLytics
would be providing big data analytics capabilities to provide a unified, 360-degree view of e-Pragati
data enabling users to make smarter decisions based on deep and relevant insights.

At a broad level, DataLytics big data analytics functions will combine:

Data Analytics Data Analytics capabilities should provide minimum functionalities of data
integration, ETL/ELT, data quality, reporting, dashboards, data query, advanced
analytics (descriptive, causal, mining, predictive, prescriptive, statistical,
mathematical) and real time analytics (Sensor analytics, log stream analytics).
These functionalities would be available for all types of data i.e. structured,
semi structured and unstructured
Big Data Platform The Big Data platform would integrate and analyse all types of data i.e.
Structured or unstructured, Big or Small, Real-time or Batch across multiple
data stores. This platform will comprise the following three core components:
the Enterprise Data Warehouse (EDW), the Advance Discovery Lab, and the
Hadoop Platform.
a. An EDW will be RDBMS based warehouse and it would hold structured
data in centralised, consistent manner and will deliver strategic and
operational analytics.
b. A Hadoop Platform will be used for capturing, storing, archiving, and
refining all types of structured/unstructured data.
c. An Advance Discovery Lab will be used by business analysts to unlock
insights from big data with rapid exploration capabilities and a variety of
analytic techniques.
The Big data platform should also provide seamless query across the three
components.

The Table below describes minimal functional requirements of DataLytics:

Sl. No Requirements Description


Report Generation
1. Pre-defined Analytical reports A set number of pre-packaged reports are available to
the user from the DataLytics system. The indicative list
of analytical reports required by the 22 departments is
provided in Section 2.1.4
2. Additional Analytical Reports In addition to pre-defined scenarios and the
consequent insights, analytical reports and
recommendations, GoAP may prescribe additional
scenarios to be created and the SI will be compensated
extra at the rates arrived at through bid process for
simple, medium and complex scenarios.
3. Report formats Users should be able to generate report in multiple
formats including at the minimum – PDF, Excel, Word,
CSV, XML, Web Report
4. Interactive reports Users should be able to analyse data from a lower and
higher level of the hierarchy (drill-down/drill-up), filter,
sort, find, outline view of data etc.
5. OLAP Viewer Must be intuitive for end users in terms of drill and pivot
functions. Charts must be drillable. Tool must support
user defined hierarchies, user defined calculations, and
must have write-back capabilities
6. Spread sheet view Users should be able to export /import data to/from
spreadsheet without losing formatting.
7. Compatibility with Office Users should be able to import DataLytics reports into
tools their Office suite
8. Exception Reporting DataLytics platform should be able to meet the
emergency analytical requirements arising out of natural
or man-made disasters in a rapid manner. The SI will be
compensated extra at the rates arrived at through bid
process for simple, medium and complex scenarios
9. Compatibility with Web Reports shall be viewable on web using any standard
browsers including but not limited to Internet Explorer,
Chrome, Firefox, Safari
10. Compatibility with mobile Majority of the reports should be accessible on mobile
devices devices and tablets on apps as well as on browsers
11. Layout and Cosmetics Users should be able to view multiple objects on a single
document and must have columnar, cross-tabular, and
banded displays with multiple report tabs and pivot
capability. Font and color schemes must be pleasant and
professional
12. Graphical capabilities Tool must support rich variety of graphical displays ( 2D,
3D, multiple scales, split scales, maps, custom etc) and
chart types (Pie, bar, stacked bar, line, histogram, radar,
etc)
13. Report linking & Interaction Report linking should be enabled between two reports.
By selecting one parameter in one report relevant data
should be filtered or highlighted in the linked report
14. Report prompting Where user inputs are required, the tool must prompt
users to provide them. Example: Filters, hierarchy,
column selection, dropdowns, saving the report etc.
15. Report development/ease of Non-Technical users must be able to easily develop
use reports using mobile device or standard computers by
drag and drop of relevant fields. The development must
be interactive wizard driven. Users without any
knowledge in querying or software development must
be able to develop reports in minimum time
Dashboards
16. Dashboard/Scorecards for The system should provide a set of metrics that provide
Key outcomes an “at-a-glance” summary of the outcomes envisaged as
a part of the Blueprint
Data Querying Requirements
17. Query data from multiple Users must be able to query data from multiple sources,
sources and multiple formats easily. The tool must facilitate this
process by providing a virtual data layer which hides all
complexities and presents data in a simple, business-
understandable terms. The users should be able to build
query interactively using wizards, but the tool should
also provide option to write SQL scripts
18. Simple Query Users should be able to retrieve data from single tables
using simple queries
19. Medium Query Users should be able to retrieve data from up to three
tables using a simple join or similar statements
20. Complex Queries Users should be able to apply filters, subqueries, and
apply set operators like (Union, intersection etc) using
simple wizards as well as native SQL scripts
21. Scheduled Queries Tool must be able to run queries at scheduled
times/intervals and/or on relevant business events (e.g. for
compliance reporting)
22. Multi pass query To improve performance, tools must support breaking
down queries and running them in parallel and then
aggregating them back into one report
23. High-volume query Tools must support in-memory analytics query against
very large datasets including big data
24. Ad-hoc query generator Users should be able to run their own query on ad-hoc
basis
25. Auto-Charting System must decide best graph/chart for the selected
data for less tech-savvy users
Data Management
26. Data Quality Tools should support data profiling, cleansing, and
methodologies to support preparation of data for
DataLytics applications for Structured, Semi-structured
and Unstructured data of large variety, flowing in large
volumes
27. Single View of Data System should provide fuzzy matching capabilities and
provide single view of entities such as citizen,
household, commodities, criminal etc.
28. Quality Knowledge Base System should provide inbuilt algorithms for
standardisation, data Parsing, Matching as part of the
Quality Knowledge Base. The fuzzy algorithms should be
India specific. System should also provide interactive
interface to customise these algorithms
29. Data Quality Monitoring System should provide business rules repository to
monitor the quality of incoming data on regular basis by
creating rules and tasks to take corrective/auditing
actions.
30. Business data Management Should provide capabilities for interactive and
collaborative management of business data glossary and
Reference data glossary (such as valid values of Gender,
Schemes etc.) that promotes a common understanding
between stakeholders in the organisation.
31. Master data Tool must support preparation of a system of
record(Master data) for DataLytics applications and
reporting
32. Metadata System must support creation, consolidation, ongoing
auditing and reporting on Whole-of-government level
metadata. Separate meta data management systems for
structured, unstructured data and Discovery Platform
shall be created
The system should support technical, business, and
analytical metadata management
Integration Requirements
33. APIs/Webservices Tool must provide APIs/Webservices that integrate with
other applications of various types and technologies
(REST/SOAP compatible)
34. Native Application integration System should integrate seamlessly with other
enterprise applications
35. Data Integration The suite must provide batch integration through ETL
and ELT and/or data federation. The integration must
work on all types of data – Structured, unstructured, and
semi-structured.
36. Batch updates System should allow users to request records, make
updates to the records, and then send the updates back
to the data source at some other time without
maintaining a connection to the database
37. Interactive updates The system should have ability to change information in
real-time
38. Data Warehouse write back Ability to update data in the warehouse by certain users
(if necessary)
39. Big Data Integration The tool should provide relevant access components to
access data in Hadoop and apply all the ETL
transformation on it.
Analytical Requirements
40. Advanced Analytics Support for descriptive, causal, predictive, prescriptive,
statistical, mathematical and big data analytics
41. Real-time Analytics Sensor Analytics, Log Stream Analytics
At the minimum the following Analytical capabilities shall be provided
42. OLAP and Data mining Must support data mining tasks like decision trees and
neural networks.
Must support OLAP Analysis including slicing and dicing,
drill up and down analyses
43. Data Mining a. Collaborative and easy-to-use GUI with an
integrated, complete view of data. Incorporate
analytics in a secure and scalable manner.
b. Build more, better models faster. Scalable
processing
c. Deliverable via the web.
d. Should support asynchronous model training.
e. Provides XML diagram exchange.
f. Batch processing.
g. Tool should help develop custom models.
h. Should be able to reuse diagrams as templates for
other projects or users.
i. Should have interactive self-documenting process
flow diagram environment to shorten model
development time
j. Easily derive insights in a self-sufficient and
automated manner.
k. Parallel processing – run multiple tools and
diagrams concurrently.
l. Should support concurrent multithreaded
execution of modelling algorithms.
m. Should support In-database and in-Hadoop scoring
to deliver faster.
n. Should be able to access and integrate structured
and unstructured data sources, including time
series data, web paths and survey data etc.
o. Should support iterative processing, stratified
modelling, bagging and boosting, multiple targets,
and cross-validation.
p. Tool should document the analysis process and
facilitate results sharing.
q. Tool should support easy-to-use large repository
of both batch and interactive graph plots and
visualisation. Plots and tables should be
interactively linked, supporting tasks such as
brushing and banding.
Should support embedding training and scoring
processes into customised applications.
44. Forecasting/Planning Tools a. System must provide tools for forecasting and
planning. The tools may be used by specific
departments at their discretion.
b. Tool should provide large-scale automatic
forecasting solution that offers unsurpassed
scalability.
c. It should enable automatic diagnostics and statistical
forecasting in batch mode or through the interactive
graphical user interface.
d. Should have easy-to-use GUI.
e. Should automatically determine the forecasting
models that are most suitable for the historical data.
f. Should have a web interface which provides
standardised workflow.
g. Should provide large scale, hierarchical forecasting
and reconciliation. Also should support flexible user-
customisable hierarchies.
h. Should have features to convert time-stamped data
to time-series data.
i. Should provide user override facility, flexible
scenario analysis.
j. Should incorporate events management and
support exceptional rule settings.
k. Should have an extensible model repository that
includes intermittent demand models, unobserved
components models, ARIMAX models, dynamic
regression, exponential smoothing models with
optimised parameters, as well as user-defined
models.
l. Tool should have capabilities to identify data
problems, such as outlier detection and
management, missing values, and date ID issues.
m. It should support Enhanced performance through
multithreading of the diagnostic and forecasting
engines.
Should support rules-based segmentation of time series
45. Descriptive and prescriptive a. Tool must compute correlation statistics,
analysis nonparametric measures of association and the
probabilities associated with these statistics.
b. Computes directly and indirectly standardised rates
and risks
c. Should compute descriptive statistics based on
moments (including skewness and kurtosis),
quantiles, percentiles (such as the median),
frequency tables, and extreme values and goodness-
of-fit tests for a variety of distributions.
d. Should have techniques for Global/Local Search
Optimisation and Constraint Programming.
e. Should support Discrete-Event Simulation
f. Should have capabilities for Project and Resource
Scheduling through a single, integrated system.
46. Statistical Modelling Tools a. System must provide robust sampling and Data
(including predictive partitioning techniques.
modeling) b. Tool should support state of the art data
replacement, imputation and Transformations
techniques.
c. Tool should provide exhaustive techniques for
dimension reduction and variable selection.
d. Tool should provide interactive variable binning capabilities
and support creation of ad hoc data-driven rules and
policies.
e. Incorporate prior probabilities into the model
development process
f. Tool should support decision trees methodologies
like CHAID, CART, Bagging, boosting, bootstrap
forest, gradient boosting. Tree selection based on
profit or lift objectives and prune accordingly, K-fold
cross-validation. Decision tree algorithms should
support multiple splitting criteria.
g. Tools should support multiple regression techniques
both linear and non-linear, cross validation, fast
forward stepwise least squares regression
h. Tool should support variable binning to detect
nonlinear relationships, class variable reduction
techniques.
i. Tool should support multiple neural networks
techniques.
j. Tool should incorporate Partial Least Squares, Two-
stage modelling, MBR, Recursive predictive
modelling, incremental response models.
k. Tool should support time series data mining.
l. Should support link analysis and survival analysis
m. Tool should also have features to automatically
generate predictive models for a variety of business
problems.
n. Should support ensemble modelling, automatic
model comparison and exhaustive model evaluation
techniques.
o. Support interactive scoring of new data. Scoring
code should capture modelling, clustering,
transformations and missing value imputation code.
p. Support in-database modelling.
q. Should support model registration and
management.
47. Cluster Analysis a. System must support Cluster analysis.
b. Should support User defined or automatic selection
of clusters.
c. Should handle missing values.
d. Should support multiple strategies for encoding
class variables into the analysis.
e. Tools should have capabilities to show variable
segment profile plots, distribution of the inputs and
other factors within each cluster.
f. Tool should have capabilities of executing Self
Organising maps (SOMs) for Statistical Pattern
Recognition
g. Interact with other modelling techniques.
48. Planning, Budgeting and Standard planning and budgeting operations like Net
Financial Functions Present Value, ROI etc. should be supported by the tool
49. Dedicated department The tool must have dedicated
specific business application applications/APIs/functionalities to support department
suites specific analytics. For e.g. performance management of
Agriculture department, or other analytics which
department may use only for internal purposes.
50. Textual analytics a. Tools should have a single system for guided text
model development and deployment
b. The system must have single, point-and-click GUI
interface for natural language processing.
c. Should support hybrid approach to categorising
documents
d. Should have ability to create, modify and enable (or
disable) custom concepts and test linguistic rule
definitions with validation checks within the same
interactive interface.
e. Should have centralised metadata management for
all project properties.
f. Automatic and user based discovery of topics.
Should have ability to control the number of topics
generated by splitting topics (splitting one topic into
two, repeatedly if necessary), and merging similar
topics into one topic.
g. Tool should have features to interrogate term
relationships within topics, explore with term clouds
(with configurable thresholds), interactive term
maps and by drilling into topics to evaluate
relevancy and refine discovered topics.
h. Should support generation of configurable rules and
improved linguistic context.
i. Rules can be tested on an input data set prior to
deployment.
j. Should provide a seamless and scalable text
analytics solution.
51. Sentiment analysis a. Sentiment analysis (also known as opinion mining)
refers to the use of natural language processing, text
analysis and computational linguistics to identify and
extract subjective information in source materials.
DataLytics must have the capability to perform
sentiment analysis.
b. Tool should have unique hybrid approach to
sentiment analysis. Should provide capabilities to
incorporate both statistical rigor and linguistic rules
to define sentiment models driving more detailed
sentiment evaluations.
c. Tool should provide dynamic sentiment analysis.
Tool should have real-time and batch run
capabilities to scour intranet and Internet websites
and to process document collections.
d. Easy-to-use interface for model development and
refinement.
e. Support complex linguistic rules for one or several
matches of a term, regular expressions, part-of-
speech tags etc. Also should have prebuilt tasks to
simplify linguistic pattern identification.
f. Support High-performance, multithreaded crawler
that can be deployed in a distributed and/or grid
mode to maximise processing and support
extremely large-scale crawls for both internal file
system and Internet crawls.
g. Should support Crawler plugins for social media
sites.
52. Location based analysis Analyses based on geographic locations. The data is
normally collected from internet when people check-in
to locations, IP address from where the data is
originating from and so on. DataLytics should have
capability to perform location based analysis
53. Pattern detection a. Discovery of pattern in large amount of data in areas
as diverse as health, farming, climate, etc. DataLytics
should have capability to perform pattern
recognition functions.
b. Tool should support machine learning algorithms for
pattern detection.
c. Tool should support supervised and Unsupervised
learning with techniques like SOMs, ANN, nearest-
neighbour mapping, k-means clustering and SVDs
etc.
54. Forecasting/Planning Tools System must provide tools for forecasting and planning.
The tools may be used by specific departments at their
discretion
55. Descriptive and Decision Tool must support descriptive and decision modelling
modelling
56. Regression Analysis System should include regression analysis tool (Linear
and Logistic regression)
57. Topic detection Topic detection is the task of detecting topics that are
previously unknown to the system. Topic means
abstraction of a cluster of stories that discuss the same
event.
58. Pattern detection Discovery of pattern in large amount of data in areas as
diverse as health, farming, climate, etc. DataLytics
should have capability to perform pattern recognition
functions
Information Delivery Capabilities
59. Report bursting Ability to send out single reports to multiple locations at
the same time
60. Time based scheduled Ability to send out reports to multiple locations at
reporting defined time intervals
61. Event based reporting Ability to send out reports to multiple locations after
critical business events (month end, quarterly etc)
62. Alert/Alarm Automatic notification system that can be triggered to
alert a user when an event occurs
63. Content and Report Archiving Versioning and archiving reports
64. Search Ability to search databases and find reports and
unstructured data
65. Dedicated DataLytics portal Provides users with a dedicated and personalised access
to organisational data through a single portal interface,
where they can collaborate and share data. Enables
remote access.
66. Integration with 3rd Party Tool can publish reports and data to 3rd Party tools like
Portals SharePoint
67. Mobile Device support Tool must enable access to data via apps in handheld
mobile devices (smart phones and tablets). Reports,
alerts, notification, dashboard, and certain basic query
facilities must be available on handheld devices.
68. Mobile Device Management System should allow management of devices which can
connect to the reporting application by providing device
block/white listing and other such capabilities
Technical Specifications
69. Client Operating System DataLytics system must support standard operating
Supported systems like Windows, Linux, iOS, Android, Unix or any
standard operating systems.
Specifications for Big Data Platform
70. Enterprise Data warehouse a. The proposed EDW platform should offer enough
( EDW) redundancy and load balancing features to maximise
availability. If one node fails then the workload
should be distributed to other nodes without any
user intervention.
b. EDW System should have "Hot Pluggable" disks.
Global hot spare drives for quick recovery are
desirable.
c. EDW platform shall be capable of storing different
data types including but not limited to Geospatial,
Temporal, XML, JSON etc. natively.
d. EDW shall provide a Logical Data Model or the
capability to design the same in the 3NF,
representing the information relevant to state
government in the most logical form and shall be
easily customisable. EDW shall support normalised
data and Dimensional data.
e. EDW should support advanced data compression to
manage larger data sets more efficiently. The tool
should support at least 3X compression ratio.
f. Depending on type of data and business
requirements, EDW solution should be capable of
storing data in both row and column format
simultaneously in a given table.
g. The EDW should have built in functionality of
temporal data types to support date and time based
analytics
h. EDW shall provide support and tight integration
capabilities with leading ETL/ELT, BI and Analytics
tools.
i. EDW should efficiently manage simultaneous mixed
workload like loading, transformation, Business
Analytics, data mining, development, etc.
j. EDW solution should scale in a linear fashion and
behave consistently with growth in data, number of
concurrent users and complexity of queries. The
System should have the ability to deliver very
predictable performance improvements as new
nodes are added into the system without any
performance penalty. When nodes/racks are added
in the environment, the overall network bandwidth,
performance, concurrency, storage etc. should also
scale linearly.
k. EDW solution should provide capability of linear
scalability from terabytes to petabytes of data. The
data warehouse System should be proven to scale to
support minimum 5 Petabytes of data or more.
l. EDW should support a strong optimizer to tune
query execution, for response time as well as
throughput.
m. EDW should be designed and sized to provide
in-database support for transformation functions (ELT),
mining, GIS Analytics and other analytical functions.
It should combine large amounts of data with
sophisticated analytical processing capabilities
available within database for fast, efficient, parallel
and scalable execution of queries.
n. EDW solution should offer intelligent memory
management which determines data temperature
based on frequency-of-access and least-recently-
used methods simultaneously. Depending on
temperature, data should move between disk,
memory and cache without any user intervention.
o. The EDW platform should know what data is
available in memory and automatically use that
copy instead of going to disk.
p. For better performance, the SI should propose
required SAN disks. It is recommended that disk size
could be offered as per the proposed solution.
q. The EDW should have the capability to classify users
and prioritize their queries. It should automatically
define workloads based on business needs. The
work load management should do the following
i. Establish service level goals for multiple
workload types.
ii. Develop prioritization strategies according
to business rules.
iii. Separate system resources into virtual
partitions for departments, geographic
regions, or funding models.
iv. Monitor adherence to your service level
goals.
v. Priority decisions based on business
objectives
vi. CPU as well as IO prioritization
vii. Ability to set automatic points where query
priorities can be adjusted to either give
more or less resources to a query for
dynamic query scaling.
viii. Ability to dynamically adjust priorities
manually while the query is running.

r. EDW platform should have Role Based Access
Control (RBAC) Mechanism. These controls should
be configured to fully protect the data against
unauthorised access and monitoring of activity by
privileged users (system administrators, DBAs,
power users with full, ad hoc query access). Clearly
defined Separation of Duties should be
implemented and enforced throughout the EDW
Platform.
s. EDW Platform solution should provide data
encryption.
t. EDW Platform solution should have Tamper proof
Audit trail mechanism for user access. It should
record details of all access to the system at the
individual user level. This includes, in part,
logon/logoff times, query run times, data accessed
or modified, etc.
u. EDW Platform should also track and log attempted
access to unauthorised data. In case of critical data
it should trigger an alert and send a notification to
the administration and security personnel
v. The data warehouse Systems should provide
network interconnects based on minimum 40Gbps
InfiniBand or equivalent network.
w. As scalability is one of the key requirements for e-
Pragati and considering fast developments in
hardware industry, it is essential that the data
warehouse System platform should allow co-
residence support of at least two generations of
hardware. Co-residence would preserve the current
investment in data warehouse system as new
hardware generations are added to the existing Data
Warehouse system and overall act as one
warehouse.
71. Hadoop (equivalent or better) a. If the solution is based on Apache Hadoop, it should
support technology distribution from Hortonworks
or Cloudera or MapR. If the solution is not based on
Hadoop, then it should still be based on an
equivalent or better open source framework.
b. It is recommended to have an appliance based or
equivalent solution that is easily scalable,
maintainable, and provides high levels of
performance even when volume, variety, and
velocity of data is high.
c. The solution should have network interconnect
within and across appliances of 40 Gbps InfiniBand
or better. This interconnect should be redundant for
failover
d. The system should come with built in High
Availability and Redundancy of all the components.
e. The system should be linearly scalable and capable
of holding multiple petabytes of data.
f. The system should have a single management tool
to monitor hardware, OS, software, administration
and management.
g. It should have features and interfaces such as
MapReduce, embedded statistical algorithm and mining
libraries, predictive modelling integration, decision
automation, and mixed workload management.
h. The system solution should tightly integrate with
data warehouse and advanced discovery lab at
architecture level. These systems should be able to
initiate a SQL query which can get executed on
Hadoop environment and result given back to them
respectively without any data movement for query
processing, duplicate copies of data or human
intervention.
Metadata for Unstructured Data Processing
a. The system would perform analytics on
unstructured and semi-structured data, so it is
important to have metadata. The system should
dynamically and automatically profile datasets and
collect statistics on the underlying data, its
attributes, and the statistical profiles of their values.
b. Metadata Management tool should have capabilities
for data understanding, tracking, exploring, cleaning,
and transformation
c. Metadata Management tool should track the lineage
of every dataset from the beginning when it is
loaded into HDFS, throughout its entire lifecycle as it
is processed, cleaned, and refined
d. As data is transformed, whether the transformation
happens through cleaning and wrangling tools or
through other tools in the ecosystem, these
transformations should be tracked by the Metadata
Management tool and stored in its registry.
e. The tool should automatically detect when new data
is registered in HDFS or Hive, and perform this
profiling without user intervention.
f. The solution should be based on an open API (such as
a RESTful API) so that it can be extended to other Big
Data environments in future.
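
The profiling and lineage requirements above can be illustrated with a minimal Python sketch. It assumes an in-memory list of records standing in for a dataset registered in HDFS or Hive. The names profile_dataset and LineageRegistry, and the dataset identifiers passed to them, are hypothetical and do not refer to any specific metadata product.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


def profile_dataset(records):
    """Collect simple per-attribute statistics for a list of dict records."""
    profile = {}
    attributes = {key for record in records for key in record}
    for attr in sorted(attributes):
        values = [record.get(attr) for record in records]
        present = [value for value in values if value is not None]
        profile[attr] = {
            "count": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),  # assumes hashable scalar values
            "types": dict(Counter(type(value).__name__ for value in present)),
        }
    return profile


@dataclass
class LineageRegistry:
    """Hypothetical registry that records each transformation of a dataset."""
    events: list = field(default_factory=list)

    def record(self, source, target, operation):
        self.events.append({
            "source": source,
            "target": target,
            "operation": operation,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def lineage(self, dataset):
        """Walk back from a dataset to its original source (assumes no cycles)."""
        parents = {event["target"]: event["source"] for event in self.events}
        chain = [dataset]
        while chain[-1] in parents:
            chain.append(parents[chain[-1]])
        return list(reversed(chain))
```

In this sketch each dataset has a single parent; a real registry would also need to handle datasets derived from several sources.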
72. Advanced Discovery Lab Data Exploration
a. The solution should provide connectors to connect
to industry-leading databases to fetch data for
analysis. If required, DataLytics Discovery platform
should be able to connect and acquire data from
new sources on its own.
b. The Lab should be capable of storing data in row and
column format as well as in file format. This will
bring flexibility in the architecture.
c. The Lab should demonstrate near similar
performance with growth in data volume, and the
algorithms should be highly scalable without having
to sample the data. The tool should be able to
support both model building and model scoring on
complete data.
d. The Lab should have query management to
minimise data movement and execute query where
data resides. All analytical queries initiated through
Lab should be intelligent to determine query
execution based on where data resides across Big
Data Platform, break the query accordingly, push
the part of the query to where data resides and
subsequently merge the results at Lab. The user
initiating the query should not need to know where
the data resides.
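
The push-down-and-merge behaviour described in item d can be sketched as follows. This is a minimal Python illustration, assuming two in-memory row lists standing in for the data warehouse and the Hadoop platform, and an average computed from partial (sum, count) results. The function names partial_aggregate and federated_average are illustrative only and are not part of any product API.

```python
def partial_aggregate(rows, column):
    """Runs where the data resides and returns only a small (sum, count) pair."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return sum(values), len(values)


def federated_average(stores, column):
    """Push the aggregate to every store, then merge the partial results."""
    total, count = 0.0, 0
    for rows in stores.values():
        part_sum, part_count = partial_aggregate(rows, column)
        total += part_sum
        count += part_count
    return total / count if count else None
```

Only the partial sums and counts cross the boundary between platforms, so the underlying rows never move and the caller does not need to know which store holds them.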
Advanced Analytics
Discovery platform shall have all Advanced Analytical
capabilities that are specified as part of DataLytics
requirements – Descriptive, prescriptive, causal,
predictive, mathematical, statistical, and Big Data
Analysis.
e. The Lab should offer ready-to-deploy advanced
analytical functions (Graph, Path, Advanced Analytics,
etc.) that can be used in a single SQL query without
the need for coding or juggling multiple tools,
and should be able to integrate with various
open source libraries such as “R”.
f. It should have prebuilt functions such as:
o Graph Analysis (analysis and visual
representation of complex networks)
o Path Discovery Analysis (to describe
directed dependencies among a set of
variables, as an extension of multiple
regression, and to provide estimates of the
magnitude and significance of hypothesised
causal connections between sets of
variables)
o Pattern Discovery Analysis (branch of
machine learning that focuses on the
recognition of patterns and regularities in
data)
o Statistical Analysis (study of the collection,
analysis, interpretation, presentation, and
organisation of data)
o Machine Learning Analysis (pattern
recognition and computational learning
theory)
o Text and Sentiment Analysis (use of natural
language processing, text analysis and
computational linguistics to identify and
extract subjective information in source)
o Temporal and Time Series Analysis (support
for handling data involving time, being
related to the changing dimensions)
g. The Lab should discover new patterns and trends of
interest and any potential correlations or triggering
events in any defined subject area.
h. The Lab should be designed for rapid, on the fly
discovery of insights that allows for iterations that
keep enriching previously gained insights.
Pre-Built Functions
i. DataLytics Discovery platform shall provide a wide
variety of pre-built functionalities.
Visual IDE
j. A visual IDE is required for Data scientists and other
users of Discovery platform so that they can focus
on data discovery. DataLytics must be able to get
the best out of available systems, and should allow Data
scientists and Statisticians to evaluate and manipulate
statistical algorithms if required.
k. The Lab should allow users to package pre-built advanced
analytical functions or analytical code into Apps (like
an app store on a smartphone) that can be parameterised
and run repeatedly, thus simplifying and accelerating the
app building, configuring, running and sharing
experience. End users should be able to run such
an App instead of executing technical queries.
Support for R/open source analytical platform
l. Tool should have capabilities to code in R, and to
train and score supervised and unsupervised R models.
m. R functions should run against all the data and run in
a coordinated parallel manner, often performing
iterations and aggregations where a single answer is
returned.
n. Generate model comparisons using internal models
and models made using R.
Note: Functionality given here can be met as a
separate solution component or as part of the overall
DataLytics solution. The SI can propose a better
optimised best practice for the Discovery Platform;
however, the same should be demonstrated and proven
during the PoC acceptance stage.

73. Server Operating System DataLytics system must be deployable on Open source
Big Data Processing Platform (Hadoop equivalent or
better). The system should be deployable on Windows,
Linux, or any other common operating system.
74. Application development Application development must be language agnostic and
interactive with no dependency on Hadoop or any niche
skill sets.
75. Development tools Tools must provide interactive and wizard driven
application development environment. But at the same
time must provide native development kits or tools so
that people with programming knowledge and skills can
develop and deploy Applications
76. Cloud readiness DataLytics solution should be deployable in virtualised
environments.

ANNEXURE 2 – TECHNICAL REQUIREMENTS OF DATALYTICS
The Table below provides high-level technical requirements of DataLytics system:

Sl Requirements
no
1. The SI shall provide a single system that is optimised and tuned to provide maximum
performance, scalability, and efficiency for DataLytics
2. The hardware and software configuration must be built to protect against component failures
such as disk failures, CPU failures, memory failure, network card failures, and system
controller failures.
5. The proposed system should have an integrated management and monitoring system from
disk to applications.
6. The proposed system should have a unified patching approach where a single release should
patch the entire system, viz. firmware, BIOS, OS, server, network and system software.
7. The proposed system should have a high-speed network interconnect between all
components.
9. The SI should provide single support to all the DataLytics components, operating system, and
hardware.

• Operating System
• Virtualisation
• Servers
• Storage
• Network
• Embedded network switching technology

ANNEXURE 3 – NON-FUNCTIONAL REQUIREMENTS OF DATALYTICS
3.1. Audit
Logging and exception management help in tracing and troubleshooting specific problems. Logging
may be required for fulfilling regulatory requirements as well. A robust system will contain dual-purpose
logs and activity traces for audit and monitoring, and make it easy to track a transaction without
excessive effort or access to the system.
S. No Requirements Description
1. DataLytics System shall maintain log of all transactions that take place in the system.

2. Audit Logs should be written once and be readable on multiple devices in a secure manner

3. Audit Logs should be backed up periodically to a permanent storage

4. DataLytics System shall develop and apply Data Lifecycle policies to the Log files

5. DataLytics System shall ensure security and confidentiality of Log data

6. Audit Logs should be useful for debugging, error reconstruction, and attack detection

7. Audit Logs shall capture audit trails to a secure location

8. Exception messages shall ensure that no unintended information, which could compromise
data, is displayed
9. DataLytics System shall always be fail safe

10. DataLytics shall have a tamper-proof audit trail mechanism for user access. It should record
details of all access to the system at the individual user level. This includes, in part,
logon/logoff times, query run times, data accessed or modified, etc.
11. DW Platform should also track and log attempted access to unauthorised data. In case of
critical data it should trigger an alert and send a notification to the administration and
security personnel
12. DataLytics should have the capability to perform database activity monitoring and blocking
on the network combined with consolidation of audit data from popular databases like
Oracle, MySQL, Microsoft SQL Server, SAP Sybase, and IBM DB2 as well as from operating
system logs.
13. The system should have the capability to perform whitelist, blacklist, and exception-list
based enforcement on the network.
14. DataLytics should have extensible audit collection framework with templates for XML and
table-based audit data.

15. DataLytics should have built-in customisable interactive compliance reports (in various
formats like pdf, excel and so on) combined with proactive alerting and notifications.
16. The system should have fine grained source-based authorisations for auditors and
administrators
17. Audit Data from multiple sources should be stored in a secured place for convenience, and
reliability
18. The system should have a centralised, secured and highly scalable architecture to support
large number of databases with high traffic volume and also should provide high availability
support
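
One possible way to approach the tamper-proof audit trail requirement in the list above is a hash chain, in which every log entry commits to the hash of the entry before it. The sketch below is a minimal Python illustration under that assumption. The class name AuditLog and its fields are hypothetical, and this is not presented as the mechanism any particular product uses.

```python
import hashlib
import json


class AuditLog:
    """Append-only log in which each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body):
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, user, action, detail):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"user": user, "action": action, "detail": detail, "prev": prev_hash}
        self.entries.append({**body, "hash": self._digest(body)})

    def verify(self):
        """Detects edited, reordered, or removed entries (except removal from the tail)."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {key: entry[key] for key in ("user", "action", "detail", "prev")}
            if entry["prev"] != prev_hash or entry["hash"] != self._digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

Detecting truncation of the most recent entries would additionally require storing the latest hash somewhere outside the log, for example on write-once media.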

3.2. Service Categorisation


An indicative categorisation of services is shown below. This classification shall be used to
determine requirements such as data retention, resource priority, etc.

Criticality \ Usage    High      Medium    Low
High                   Type 1    Type 2    Type 3
Medium                 Type 4    Type 5    Type 6
Low                    Type 7    Type 8    Type 9
TABLE 14: SERVICE CATEGORIES
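
The service category matrix can be expressed as a simple lookup. The Python sketch below assumes the row-major numbering shown in Table 14 (criticality selects the row, usage selects the column). The function name service_type is illustrative.

```python
LEVELS = ("High", "Medium", "Low")


def service_type(criticality, usage):
    """Return the Table 14 service type number (1 to 9) for a criticality/usage pair."""
    row = LEVELS.index(criticality)  # raises ValueError for an unknown level
    col = LEVELS.index(usage)
    return row * len(LEVELS) + col + 1
```

For example, a service with Medium criticality and Low usage maps to Type 6.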

Criticality shall be defined in terms of Applications and Users. An indicative categorisation matrix is
provided below. Resources shall be assigned in the order of priority.

Criticality   Users                                  Applications
High          CM, ministers, police department,      Applications used by high-priority users,
              and other top-priority users as        and critical applications such as
              identified by government               disaster management
Medium        Department-level users                 Dashboard applications, and e-Pragati
                                                     applications that require analytical services
Low           Data scientists, researchers,          Discovery platform
              statisticians, business users
TABLE 15: USER CATEGORIES

3.3. In-memory Capabilities
In-memory capability requirements are given below:

Sl. no Description
1 Database should have capability to form in-memory columnar format to accelerate
analytics
2 The in-memory capability should work transparently with existing applications, BI and
Reporting tools
3 The in-memory capability should be compatible with Cloud Computing, Big Data and Data
Warehousing
4 The in-memory capability should be scalable and should ensure data availability
5 The in-memory capability should not have any hardware lock-in or limitations
6 The in-memory capability should not have any database size limit

3.4. In-Database Processing capabilities


DataLytics system shall have In-database processing capabilities, and the requirements are provided
below:

Sl. no Description
1 The DataLytics system shall have in-database mining, analytical and querying capabilities,
and should be able to interoperate with other DBMS.

2 It should combine large amounts of data with sophisticated analytical processing capabilities
available within database for fast, efficient, parallel and scalable execution of queries.

3 Features and interfaces such as MapReduce, embedded statistical algorithm and mining
libraries, predictive modelling integration, decision automation, and mixed workload
management would be preferred.

3.5. Search Capabilities


DataLytics content management system must provide the capability to search archived reports using
keywords. It should also provide rich content-based indexing and advanced search capability to quickly
find specific near real-time or historic events.
3.6. Content Management
DataLytics system must integrate with the Enterprise Content Management system. The system must
provide a facility to store reports, documents and other artefacts created by DataLytics system. The
content management system must provide search, subscription, versioning, access control, and
publishing facilities, and the ability to hold content management metadata.
3.7. Security, Access Control and Privacy
High level Security, Access Control and Privacy requirements are provided below:

S. No Requirements Description
1. The following principles govern the assurance of the privacy of personal information:
a. Notice—residents should be given notice when their data is being collected.
b. Purpose—data should only be used for the purpose stated and not for any other
purposes.
c. Consent—data should not be disclosed without the resident’s consent.
d. Security—collected data should be kept secure from any potential abuses.
e. Disclosure—residents should be informed as to who is collecting their data.
f. Access—residents should be allowed to access their data and make corrections to any
inaccurate data.
g. Accountability—residents should have a method available to them to hold data
collectors accountable for not following the above principles.

2. Personal information can be processed only in the following circumstances:

a. for specified, explicit and legitimate purposes and not in a way incompatible with those
purposes;
b. when processing is necessary for the performance of a task carried out in the public
interest;
c. when processing is necessary for compliance with a legal obligation.

3. Personal information may be processed only insofar as it is adequate, relevant and not
excessive in relation to the purposes for which it is collected and/or further processed.

4. The personal information must be accurate and, where necessary, kept up to date. Every
reasonable step must be taken to ensure that personal information which is inaccurate or
incomplete, having regard to the purposes for which it was collected or for which it
is further processed, is deleted or rectified.
5. The personal information in the custody of the DataLytics or any other body, which is a part
of DataLytics, shall not be transmitted to any other body or person without the appropriate
legal authority.
6. Encryption requirements must be identified and applied where relevant. For example:
Passwords, and other sensitive data
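
For passwords specifically, a widely used practice (offered here as an assumption, not as a stated requirement of this document) is to store a salted, deliberately slow hash instead of reversible ciphertext. The Python sketch below uses the standard library function hashlib.pbkdf2_hmac. The iteration count and the function names hash_password and verify_password are illustrative choices.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune to the deployment's hardware


def hash_password(password, salt=None):
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256."""
    salt = salt if salt is not None else os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password, salt, expected_key):
    """Recompute the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)
```

Only the salt and the derived key would be stored; other sensitive data that must be read back would need reversible encryption with managed keys instead.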

3.8. Special Accessibility Requirements


DataLytics system must be able to analyse textual content in Telugu and Hindi, in addition to
English.
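
A first step in multilingual text analysis is usually routing each text to the right language pipeline. The Python sketch below is a simple heuristic, assuming that script alone is a good enough signal: it counts characters in the Unicode Telugu block (U+0C00 to U+0C7F), the Devanagari block (U+0900 to U+097F, the script used for Hindi), and ASCII letters. The function name detect_script is illustrative.

```python
def detect_script(text):
    """Classify text as 'telugu', 'hindi', 'english' or 'unknown' by dominant script."""
    counts = {"telugu": 0, "hindi": 0, "english": 0}
    for ch in text:
        code = ord(ch)
        if 0x0C00 <= code <= 0x0C7F:      # Unicode Telugu block
            counts["telugu"] += 1
        elif 0x0900 <= code <= 0x097F:    # Unicode Devanagari block
            counts["hindi"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["english"] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"
```

Script detection is not language identification: Devanagari is shared by several languages and Latin letters by many more, so a production system would need a proper language identifier on top of this.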

3.9. Environmental Requirements


DataLytics shall support production, pre-production, testing and development environments in line
with best practices. This will ensure that quality of the software is maintained, and a standard
procedure for software change deployment is enforced. Other benefits include – reduced
production incidents, early identification and resolution of defects, reduced data security incidents
etc.

The pre-production, test, and development environments shall mirror the production environment, and
efforts must be made to keep them in sync. However, data in production and pre-production shall
not be replicated to test as is; if it has to be replicated, exceptions will have to be obtained and
non-public information must be scrubbed.
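
Scrubbing non-public information before data is copied to a test environment can be sketched as below. This minimal Python example assumes records are dictionaries and that the sensitive field names are known in advance. The field names in SENSITIVE_FIELDS and the function name scrub_record are hypothetical.

```python
import hashlib
import hmac

SENSITIVE_FIELDS = {"name", "phone", "resident_id"}  # hypothetical field names


def scrub_record(record, secret):
    """Return a copy in which sensitive values are replaced by keyed, one-way tokens."""
    scrubbed = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS and value is not None:
            mac = hmac.new(secret.encode("utf-8"), str(value).encode("utf-8"), hashlib.sha256)
            scrubbed[key] = "masked-" + mac.hexdigest()[:12]
        else:
            scrubbed[key] = value
    return scrubbed
```

Because the same input always yields the same token under a given secret, joins between scrubbed tables still work, while the original values cannot be read from the test data.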
3.10. Data Acquisition Considerations
Data acquisition considerations for DataLytics are given below:

S. No Requirements Description
1. DataLytics system shall have capability to collect structured, unstructured, and semi-
structured data from various sources such as logs, webpages, sensors, emails, documents
etc. Relevant pull or push based mechanisms shall be used to collect data. Example: Web
crawler to pull data from Web pages.
2. DataLytics shall provide capability to hold and transmit raw data collected from various
sources to Data centre
3. DataLytics shall provide capability to transfer collected data within data centres to place
it on the right devices for processing
4. DataLytics system shall provide mechanisms to cleanse different types of data –
Traditional, Sensor based, Log Data, and data from internet. Relevant data cleansing
frameworks have to be included. For example: BIO-AJAX framework for standardising
biological data and so on
5. DataLytics system shall provide mechanisms to eliminate data redundancy. Data
deduplication, redundancy detection, and data compression are some of the common
ways to eliminate redundancy, and DataLytics shall adopt these in an optimal manner so
as not to adversely impact computational capability
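
Two of the redundancy-elimination techniques named in item 5, deduplication and compression, can be sketched with the Python standard library. The example assumes whole-payload deduplication by content hash and zlib compression; the function names deduplicate and compression_ratio are illustrative, and real platforms typically work at block or column level.

```python
import hashlib
import zlib


def deduplicate(blobs):
    """Keep the first occurrence of each distinct payload, identified by SHA-256."""
    seen, unique = set(), []
    for blob in blobs:
        digest = hashlib.sha256(blob).digest()
        if digest not in seen:
            seen.add(digest)
            unique.append(blob)
    return unique


def compression_ratio(blob, level=6):
    """Return uncompressed size divided by zlib-compressed size."""
    return len(blob) / len(zlib.compress(blob, level))
```

Higher compression levels cost more CPU per byte, which is the trade-off against computational capability that the requirement refers to.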

3.11. Data Transformation


Structured and unstructured data shall be transformed according to business transformation rules
before being persisted. Detailed transformation rules shall be identified during the SRS phase.

3.12. Scalability
DataLytics shall provide for horizontal scalability from Terabytes to Petabytes of data, without
affecting performance, predictability, or stability of the system. Given below are the initial size and an
estimated growth rate of data. Please note that these are high-level estimates.

Components               Initial Load (TB)    Y-o-Y Growth (expected)
Structured Data          50                   20%
Unstructured Data        200                  25%
Business Intelligence    10                   20%
Discovery                5                    20%
TABLE 16: ESTIMATED DATA GROWTH
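
The growth figures in Table 16 can be turned into a year-by-year projection. The Python sketch below assumes that the Y-o-Y rate compounds on the total data held at the end of the previous year, which the table does not state explicitly. The names INITIAL_TB, projected_size_tb and projection are illustrative.

```python
INITIAL_TB = {  # (initial load in TB, expected Y-o-Y growth) from Table 16
    "Structured Data": (50, 0.20),
    "Unstructured Data": (200, 0.25),
    "Business Intelligence": (10, 0.20),
    "Discovery": (5, 0.20),
}


def projected_size_tb(initial_tb, growth, years):
    """Compound growth: data size after the given number of full years."""
    return initial_tb * (1 + growth) ** years


def projection(years):
    """Projected size per component, rounded to one decimal place."""
    return {name: round(projected_size_tb(tb, growth, years), 1)
            for name, (tb, growth) in INITIAL_TB.items()}
```

Under this assumption, the structured data store grows from 50 TB to 60 TB after one year and 72 TB after two.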

As scalability is one of the key requirements for e-Pragati and considering fast developments in
hardware industry, it is essential that the data warehouse system platform should allow co-
residence support of at least two generations of hardware. Co-residence would preserve the current
investment in data warehouse system as new hardware generations are added to the existing Data
Warehouse system and overall act as one warehouse.

3.13. Performance
3.13.1. Queries per second
DataLytics shall support the following number of queries per second:

Simple Queries    Medium Queries    Complex Queries
350               100               50
TABLE 17: QPS ESTIMATE

3.13.2. Service Prioritisation


Resource priority shall be based on Service types, in ascending order of Service type (Type 1 to Type
9; see Section 3.2).

3.13.3. Response Time


Query Type         Response Time    Refresh Rate
Simple Queries     <= 20 sec        90% of queries shall refresh within stipulated time
Medium Queries     <= 60 sec        90% of queries shall refresh within stipulated time
Complex Queries    <= 180 sec       90% of queries shall refresh within stipulated time
TABLE 18: QUERY RESPONSE TIMES
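
Compliance with Table 18 can be checked from measured response times. The Python sketch below assumes a list of observed response times in seconds per query type and treats an empty sample as non-compliant, which is an assumption of this example. The function name meets_sla is illustrative.

```python
STIPULATED_SECONDS = {"simple": 20, "medium": 60, "complex": 180}  # from Table 18
REQUIRED_PERCENT = 90


def meets_sla(query_type, response_times):
    """True if at least 90% of observed response times are within the stipulated time."""
    if not response_times:
        return False  # assumption: without measurements, compliance cannot be shown
    limit = STIPULATED_SECONDS[query_type]
    within = sum(1 for seconds in response_times if seconds <= limit)
    # Integer arithmetic avoids floating-point edge cases at exactly 90%.
    return within * 100 >= REQUIRED_PERCENT * len(response_times)
```

For instance, nine simple queries at 5 seconds and one at 30 seconds meet the target, while eight at 5 seconds and two at 30 seconds do not.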

3.13.4. Concurrent Query Support
DataLytics Concurrent Query Support requirements are provided below:

Query type Concurrency YoY Growth


Simple 350 10%
Medium 100 10%
Complex 50 10%
TABLE 19: CONCURRENT QUERY SUPPORT

Number of power users - 15

Total number of users – 150 with an anticipated growth rate of 5% per annum

3.14. Data Ingestion Rate


A very high-level estimate of data ingestion is provided in the Table below. The peak ingestion load
depends on many factors, including events such as disasters. These values will have to be further
verified and validated by the SI.

Data Type       Ingestion Rate
Structured      30 GB per day (uncompressed data)
Unstructured    Real-time/Streaming: 50 GB per day (uncompressed, usable data)
                Batch: 100 GB per day (uncompressed, usable data)
TABLE 20: DATA INGESTION RATE
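
The daily figures in Table 20 translate into an average sustained rate and an annual volume as sketched below in Python. The example uses decimal units (1 GB = 1000 MB, 1 TB = 1000 GB) and a 365-day year; both are assumptions of this illustration, as are the function names.

```python
DAILY_INGEST_GB = {  # from Table 20
    "structured": 30,
    "unstructured_streaming": 50,
    "unstructured_batch": 100,
}


def total_daily_gb():
    """Sum of all ingestion streams in GB per day."""
    return sum(DAILY_INGEST_GB.values())


def average_rate_mb_per_s(daily_gb):
    """Average sustained rate in MB/s if the load were spread evenly over 24 hours."""
    return daily_gb * 1000 / 86_400


def annual_volume_tb(daily_gb, days=365):
    """Total volume in TB ingested over the given number of days."""
    return daily_gb * days / 1000
```

These are averages only; as noted above, peak ingestion during events such as disasters may be considerably higher and must be validated separately.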

3.15. Data Compression and Storage Tiers


Data compression and storage tiers shall be based on the type and value/usage of data. Frequently
accessed data shall be made available in-memory for quick access and consumption. DataLytics shall
provide at least 3x data compression capability. Storage tiers and compression ratio have a direct
effect on the performance of an analytical system. Relevant compression techniques shall be applied so as
to minimise impact on query performance.

3.16. Manageability
DataLytics should efficiently manage simultaneous mixed workloads such as loading, transformation,
business analytics, data mining, development, etc.

3.17. Usability
Sl Description
no
1 DataLytics shall describe the solution's capability to support multiple user interfaces and any
limitations to the ability to support major web browsers (e.g. Internet Explorer, Firefox,
etc.).

2 As scalability is one of the key requirements for e-Pragati and considering fast
developments in hardware industry, it is essential that the data warehouse system
platform should allow co-residence support of at least two generations of hardware. Co-
residence would preserve the current investment in data warehouse system as new
hardware generations are added to the existing Data Warehouse system and overall act
as one warehouse.

ANNEXURE 4 – BILL OF MATERIALS OF DATALYTICS
4.1. Sizing Requirements
The Bidder is required to size the hardware as per the indicative generic requirements given in this
section.

4.1.1. Generic Requirements


Generic requirements of the DataLytics system are provided below:

S.No. Requirements
1. The SI shall provide a single system that is optimised and tuned to provide maximum
performance, scalability, and efficiency for DataLytics
2. The hardware and software configuration must be built to protect against component
failures such as disk failures, CPU failures, memory failure, network card failures, and
system controller failures.
3. The proposed system should have an integrated management and monitoring system
from disk to applications.
4. The proposed system should have a unified patching approach where a single release
should patch the entire system, viz. firmware, BIOS, OS, server, network and system
software.
5. The proposed system should have a high-speed network interconnect between all
components.
6. The SI should provide single support to all the DataLytics components,
operating system, and hardware.
• Operating System
• Virtualisation
• Servers
• Storage
• Network
• Embedded network switching technology
TABLE 21: GENERIC REQUIREMENTS

4.1.2. Hardware Requirements
The indicative bill of material has to be suggested by the Bidder in their proposal. The following sub-
sections provide a snapshot of bill of material required to be in place for the DataLytics.

4.1.3. Sizing Parameters


Minimum Hardware requirement for DataLytics system should be based on the below parameters:

1. Proposed solution should be scalable and upgradable


2. The Data Warehouse platform should have built-in redundancy and should have balancing
features to maximise availability. If one node fails then the workload should be distributed to
other nodes without any user intervention.
3. The Data Warehouse system should have "Hot Pluggable" disks. Global hot spare drives for quick
recovery are desirable.
4. The Data Warehouse system should support advanced data compression to manage larger data
sets more efficiently. The tool should support at least 3X compression ratio.
5. The Data Warehouse solution should scale in a linear fashion: when nodes/racks are added to
the environment, the overall network bandwidth, performance, concurrency, storage etc. should
also scale linearly.
6. The Data Warehouse system solution should provide capability of linear scalability from
terabytes to petabytes of data. The data warehouse system should be proven to scale to support
minimum 5 Petabytes of data or more.
7. The data warehouse system should provide network interconnects based on minimum 40Gbps
InfiniBand or equivalent. This interconnect should be redundant for failover
8. The System should support High Availability and Redundancy of all the components.

4.1.4. Data Size


S No.   Type of Data             Initial Data Size (in TB),    YoY growth
                                 uncompressed capacity
1       Structured               50                            25%
2       Unstructured             200                           20%
3       Business Intelligence    10                            20%
4       Discovery Lab            5                             20%
TABLE 22: DATA SIZE AND GROWTH RATES
Notes:
1. The data size mentioned in the above Table represents raw and uncompressed data capacity.
2. The sizing of the proposed solution has to be done based on above data points without assuming
any compression
3. It is expected that the proposed system would be highly data-parallel and compute-parallel to
address scalability needs of e-Pragati. We are introducing the concept of “unit” to build the
required infrastructure that is driven by the demand in a modular fashion. Such a unit will
contain the compute, storage and other hardware capacity necessary to host the DataLytics
requirements.

4. Based on the type of data, the following three types of units are required. The ratio in which key
components are to be provided in a unit type is also mentioned. The SI has to provide other
required components of the unit based on their solution design and best practices.

The Tables given below provide the details on compute, storage and memory requirements for
structured and unstructured data processing units, and discovery platform:

4.1.4.1. Structured Data Processing Unit

Usable Data Storage (uncompressed):
25 TB storage (indicative usable storage; the SI can choose any standard storage, e.g. 20 TB or
any XX TB (Unit)).

Usable Data & compute core ratio:
GoAP will provide a standard Usable Data & Compute ratio as part of the proposal as acceptance
criteria (Unit), and for any future change in usable data store during the contract period, the SI
should install the required compute as per the acceptance criteria.

Memory:
GoAP will provide a standard RAM (Unit) for the proposed usable storage and compute as a
standard. For any addition of storage or compute during the contract period, GoAP will add
equivalent memory to the DPU.
TABLE 23: STRUCTURED DATA UNIT

4.1.4.2. Advanced Discovery lab (Unstructured and Structured) Unit

Usable Data Storage (uncompressed):
5 TB storage (indicative usable storage; the SI can choose any standard storage, e.g. 5 TB or
any XX TB (Unit)).

Usable Data & compute core ratio:
GoAP will provide a standard Usable Data & Compute ratio as part of the proposal as acceptance
criteria (Unit), and for any future change in usable data store during the contract period, the SI
should install the required compute as per the acceptance criteria.

Memory:
GoAP will provide a standard RAM (Unit) for the proposed usable storage and compute as a
standard. For any addition of storage or compute during the contract period, GoAP will add
equivalent memory to the DPU.
TABLE 24: ADVANCED DISCOVERY LAB UNIT

4.1.4.3. Unstructured Data Processing Unit

Raw Data Storage (uncompressed):
48 TB storage (indicative usable storage; the SI can choose any standard storage, e.g. 20 TB or
any XX TB (Unit)).

Raw Data & compute core ratio:
GoAP will provide a standard Usable Data & Compute ratio as part of the proposal as acceptance
criteria (Unit), and for any future change in usable data store during the contract period, the SI
should install the required compute as per the acceptance criteria.

Memory:
GoAP will provide a standard RAM (Unit) for the proposed usable storage and compute as a
standard. For any addition of storage or compute during the contract period, GoAP will add
equivalent memory to the DPU.
TABLE 25: UNSTRUCTURED DATA PROCESSING UNIT

Notes:

From an implementation perspective, GoAP will provision infrastructure sized for the initial
data and YoY growth.
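
The number of units needed for the initial data and subsequent growth can be estimated as sketched below in Python. The example assumes the indicative unit capacities from Tables 23 to 25, the sizes and growth rates from Table 22, and compound annual growth; it ignores replication and other overheads, which the SI would have to add. The names WORKLOADS and units_required are illustrative.

```python
import math

# (initial size in TB, YoY growth, indicative unit capacity in TB)
WORKLOADS = {
    "structured": (50, 0.25, 25),     # Table 22 and Table 23
    "unstructured": (200, 0.20, 48),  # Table 22 and Table 25
    "discovery": (5, 0.20, 5),        # Table 22 and Table 24
}


def units_required(initial_tb, growth, unit_tb, years):
    """Whole units needed to hold the data after the given years of compound growth."""
    size_tb = initial_tb * (1 + growth) ** years
    # Rounding guards against floating-point noise before taking the ceiling.
    return math.ceil(round(size_tb / unit_tb, 9))
```

With these inputs, the initial load needs 2 structured units, 5 unstructured units and 1 discovery unit; after one year of growth the structured requirement rises to 3 units.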

Server Specifications for reference

GoAP will procure the hardware for Data warehouse, Business Intelligence/Reporting, Hadoop
system and Advanced Discovery Lab at minimum. For this, SI has to propose an appropriate
infrastructure which:

• Should be deployable at the Interim Data Center of GoAP near the Andhra Pradesh New Capital.
• Meets the proposed solution requirement and SLA requirements for DataLytics.
• Should support the concept of “Unit” for modular infrastructure requirements of DataLytics,
and:
o The SI has to provision the appropriate type and number of cells for initial data and YoY
growth.
o The data sizing (either usable or raw) has to be done as per Unit definitions.
o In case additional infrastructure capacity is required beyond planned YoY growth,
GoAP has to supply the number of “Units” to address that requirement.
o Supports high performing optimized infrastructure (server, storage, and network)
deployment for a reduced data center footprint.

• Should have separate environments for Development, Testing, and Pre-production for the entire
project duration.

All other supporting hardware, such as load balancers, network equipment, and backup architecture, will be
considered by GoAP to support the implementation and operation of the proposed DataLytics solution.

4.1.5. Application Software & System Software


S. No  Components                                                        QTY

1.     Application Software (COTS)                                       As per Solution
2.     Operating System                                                  As per Solution
3.     Database Server                                                   As per Solution
4.     Portal Server                                                     As per Solution
5.     If the system is Hadoop: Hortonworks, Cloudera or MapR            As per Solution
       distribution, covering licenses for Name Node, Data Node,
       metadata, query connectors, management and others.
       If not Hadoop, the SI shall consider equivalent parameters
       for other unstructured/Big Data processing platforms.
6.     Data Warehouse System (including database software, query         As per Solution
       connectors, backup & archival, DC replication, management
       and others)
7.     Advanced Discovery Lab (including query connectors,               As per Solution
       management and others)
TABLE 26: APPLICATION AND SYSTEM SOFTWARE

4.1.6. Indicative Manpower requirement to design, implement and provide services to GoAP

The following Table describes the minimum manpower requirements of the DataLytics solution for the
total project duration. However, if the number of resources proposed by the SI differs from these
figures, the SI has to provide a detailed justification of how the proposed resources would meet the
overall manpower requirements for the total project duration. Please also note that, as the Phase 2
implementation will still be in progress in Year 2 of the project, the entire operations team may not
be required. The Operations and Maintenance column of the table below represents the indicative
resources from Year 3 onwards.

Sl.  Designated Role / Title                    Implementation  Operations and
                                                Phase           Maintenance Phase
1.   Project Manager (e-Pragati)                1               1
2.   System Analysts                            7               3
3.   Analytics Expert (SME), preferably with    1
     Government Experience
4.   Data Scientist/Analytics                   4               2
     Modeller/Reporting Specialist
5.   IT Infrastructure/Data Center Manager      1               1
6.   Database Administrator                     1               1
7.   Systems Administrator                      1               1
8.   Network Support staff                      0               1
9.   Big Data Administrator                     1               1
10.  Change Management Lead                     1               0
11.  Technical Support Staff                    0               3
TABLE 27: MANPOWER REQUIREMENTS

Specific SLA
The following table describes the SLAs for the various types of queries:

S No.  Type of   Number of            Typical response          Stipulated  Concurrent  YoY
       query     tables joined                                  time        queries     Growth
1      Simple    Query with up to 2   At least 90% of the       <= 20 sec   350         10%
                 table joins with     simple queries should
                 index selection      refresh within the
                                      stipulated time
2      Medium    Query with up to 4   At least 90% of the       <= 60 sec   100         10%
                 table joins with     medium queries should
                 index selection      refresh within the
                                      stipulated time
3      Complex   Query with up to 6   At least 90% of the       <= 180 sec  50          10%
                 table joins with     complex queries should
                 sort and group by    refresh within the
                 clauses              stipulated time
TABLE 28: SERVICE LEVEL AGREEMENT
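
As a hedged illustration of how the SLAs in Table 28 might be checked from measured response times,
consider the following minimal sketch. The stipulated times, concurrent query counts, the 90%
threshold and the 10% YoY growth are taken from the table. The function names, the annual
compounding of growth and the rounding up of concurrency figures are assumptions made for the
example.

```python
import math
from fractions import Fraction

# Stipulated time (seconds) and concurrent queries per query type, from Table 28.
SLA = {
    "simple": {"max_seconds": 20, "concurrent": 350},
    "medium": {"max_seconds": 60, "concurrent": 100},
    "complex": {"max_seconds": 180, "concurrent": 50},
}
REQUIRED_FRACTION = Fraction(90, 100)  # "at least 90%" of queries
YOY_GROWTH = Fraction(10, 100)         # 10% YoY growth in concurrent queries


def meets_sla(query_type, response_times):
    """True if at least 90% of the measured times are within the stipulated time."""
    if not response_times:
        raise ValueError("no measurements supplied")
    limit = SLA[query_type]["max_seconds"]
    within = sum(1 for seconds in response_times if seconds <= limit)
    return within >= REQUIRED_FRACTION * len(response_times)


def projected_concurrency(query_type, year):
    """Concurrent queries to plan for after `year` years.

    Assumption: growth compounds annually and fractions are rounded up.
    """
    base = SLA[query_type]["concurrent"]
    return math.ceil(base * (1 + YOY_GROWTH) ** year)


if __name__ == "__main__":
    sample = [5] * 9 + [25]  # nine fast queries, one over the 20 sec limit
    print(meets_sla("simple", sample))
    print(projected_concurrency("simple", 2))
```

Exact fractions are used instead of floating point so that a sample sitting exactly on the 90%
boundary is not misjudged because of rounding.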

ANNEXURE 5 - SOFTWARE TESTING & QUALITY ASSURANCE
Quality Assurance (QA) comprises the processes followed across an organisation, by every
department and group, which mature to different levels over a period of time. The Systems
Integrator is expected to follow a meticulous Quality Assurance process, as defined by its
organisation, during all activities and phases of the complete project duration. However, it is
mandatory to check the recommended Government of Andhra Pradesh policies, standards, guidelines
and specifications at all stages, always using the latest revisions.

SOFTWARE QUALITY CONTROL (SQC)


The quality control teams should follow a meticulous process for testing. The SQC leads are expected
to participate in bottom-up estimation, project planning and requirements analysis, to create test
plans and test models, and to collect test data and run the tests in cycles and phases.

The teams are expected to follow a very systematic approach and use appropriate tools for
bidirectional traceability, including defect tracking. The tool is expected to provide end-to-end
traceability from requirements to defects and vice versa (reverse traceability). The traceability has
to be achieved, at a minimum, by mapping the defects to test cases, which in turn are mapped to
requirements that are bundled into sub-modules under modules. As every Business Process is
expected to be achieved by a composition of services (Orchestration or Choreography), all such
services have to be mapped to specific requirements. Similarly, all non-functional requirements
have to be mapped to test cases, which in turn should help to substantiate
the SLAs related to Performance, Scalability, Availability, Security and other criteria as defined in
the Non-functional requirement sections of each application-specific ePRS document of Volume I.
The methodology of development is the decision of the SI. The developed solution should adhere to
industry standards and the principles of e-Pragati.
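
The bidirectional traceability described above can be pictured with the following minimal sketch.
It is not a prescribed tool or data model. The class name, method names and identifiers such as
"REQ-1" are hypothetical and serve only to show the forward trace (requirement to defects) and the
reverse trace (defect to requirements) through the test cases.

```python
from collections import defaultdict


class TraceabilityMatrix:
    """Maps requirements to test cases and test cases to defects."""

    def __init__(self):
        self.tests_by_requirement = defaultdict(set)
        self.defects_by_test = defaultdict(set)

    def link_test(self, requirement_id, test_id):
        self.tests_by_requirement[requirement_id].add(test_id)

    def link_defect(self, test_id, defect_id):
        self.defects_by_test[test_id].add(defect_id)

    def defects_for_requirement(self, requirement_id):
        """Forward trace: requirement -> test cases -> defects."""
        defects = set()
        for test_id in self.tests_by_requirement.get(requirement_id, ()):
            defects |= self.defects_by_test.get(test_id, set())
        return defects

    def requirements_for_defect(self, defect_id):
        """Reverse trace: defect -> test cases -> requirements."""
        return {
            requirement_id
            for requirement_id, tests in self.tests_by_requirement.items()
            if any(defect_id in self.defects_by_test.get(t, ()) for t in tests)
        }

    def untested_requirements(self, all_requirement_ids):
        """Requirements that have no test case mapped to them yet."""
        return {r for r in all_requirement_ids
                if not self.tests_by_requirement.get(r)}


if __name__ == "__main__":
    matrix = TraceabilityMatrix()
    matrix.link_test("REQ-1", "TC-1")
    matrix.link_test("REQ-2", "TC-1")
    matrix.link_defect("TC-1", "DEF-7")
    print(sorted(matrix.requirements_for_defect("DEF-7")))
    print(sorted(matrix.untested_requirements(["REQ-1", "REQ-2", "REQ-3"])))
```

A requirement with open defects or with no mapped test case would, under this model, be a candidate
for a "fail" when a release is certified.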

The quality certification for every intermediate, demo or test release is expected to be given based
on detailed analysis and the necessary reports substantiating the relevant criteria. Passing or failing
requirements based on traceability is mandatory. Every business process, service, module and sub-module
has to be certified based not only on the requirements that have passed or failed but also on the
test cases and the defects still pending. The senior managers from development, quality
assurance and project management, the technical managers and GoAP will decide whether a version is fit
for release based on the reports. TPA is only for the Security Audit and does not involve testing the
application. Any improved quality control processes and tools beyond the minimal scope stated
above are welcome, and a system integrator proposing them will be given further attention during
the proposal review.

Different types of testing are anticipated, such as user interface testing, functional testing,
compliance testing, acceptance testing, smoke testing, integration testing, systems integration
testing, operational readiness testing, performance testing, load testing, stress testing,
pre-production testing and production testing. All Services and Business Processes have to be tested
in standalone mode. A brief description of the different types of testing anticipated is given below
in this section. Please note that in all such testing, defects are raised and fixed. In production
applications, however, tickets are raised and addressed in support mode. Industry-accepted best
practices should be used for testing.

Products, Applications (Desktop, Browser or Mobile) and Portals have to be tested for all the user
interface requirements in addition to the functional and non-functional requirements, as applicable.
Appropriate tools must be used, wherever possible, at all stages and in all kinds of testing.

User Interface Testing


Graphics harmony, usability, navigation and functionality have to be tested using the same
traceability approach for the appropriate requirements. For repeated testing of the user interface,
recording, scripts and other techniques and tools are advised. Standards and policies for graphical
design have to be followed, and adherence to them is expected to be tested.

Functional Testing
As a part of the functional testing all the services (granular web services), business services and
business processes are expected to be tested independently in standalone mode using appropriate
tools. All messages (request/response) have to be tested for requirements including the compliance,
security, performance and other criteria. The products, applications and portals have to be tested
for the functional and non-functional requirements.

Integration Testing
Integration testing is the functional testing of the integration requirements. All requirements for
integration between sub-modules and modules, both intra-package and inter-package, that are identified
during the requirements documentation have to be tested in this phase.

Systems Integration Testing


Integration with systems external to those developed by the system integrator is to be tested in this
phase. These systems could be other products like CRM, CMS, ERP, business processes, payment gateways,
other government products and other packages in e-Pragati. All such requirements have to be
identified/tagged in the requirements document.

Compliance Testing
All requirements that are cross mapped to specific clauses in a specification, policy or standard have
to be tested in this phase. This testing is expected for all the clauses that are relevant in a
specification, standard or policy for the services, processes, products, applications and portals. A
report on the clauses of the specification with pass or fail results for compliance is mandatory.

Performance Testing
Performance Testing includes, but is not limited to, load, stress, scalability and availability testing
requirements and related criteria. To meet specific SLAs or requirements, the necessary testing tools
have to be used to confirm that the results meet the defined criteria.

Security Testing
Security has to be tested at different levels, from services to products and applications. Services are
to be tested not only for licensing, access, authorisation and other aspects but also for penetration
and injections for the web, application and information tiers. User interfaces have to be tested for
specific security requirements like URL rewriting, bots and others. Testing has to cover not only the
access, authorisation, auditing, validating, confidentiality, integrity and availability aspects of an
application, service, process, product or portal but also the appropriate specifications referred to or
listed in the requirements document. All security protocols (SSL, HTTPS, etc.), encryptions and other
requirements have to be tested in this phase.

Acceptance Testing
Acceptance testing, otherwise called User Acceptance Testing, is testing by the product
owners/stakeholders, who validate all functionalities as per the business needs. Such a subject matter
expert team can also review requirements and can pass or fail them using the User Acceptance Test
cases, which are usually end-to-end in nature.

Smoke Testing
The application or services are tested after deployment in the target environment to ensure the
application's sanity. Some basic test cases are identified and run to check this.

Operational Readiness Testing


The production-ready infrastructure and environments are tested for their capacity, size, licenses,
upgrades, versions and all other aspects of compatibility, including integration with other systems.
Scalability, availability in terms of redundancy, performance and all such requirements are verified.
In a cloud, the servers and environment procured initially also have to be tested to ensure
compatibility with the requirements. The necessary loads may have to be generated using tools for
scalability testing in a cloud environment.

Pre-Production Testing
This is otherwise called limited user testing. Before a release version is taken to production, a
limited set of users is identified and the release is rolled out to them so that the product or
application can be monitored. Any fixes required are applied before the release is taken to production.

Production Testing
A full release version is tested with full loads by end users for a limited period. This is otherwise
called the warranty state or stabilisation period.

A solution for automated testing and automated test case generation is recommended. This
ensures that complete and appropriate test cases are generated, reducing waste and enhancing
application quality, as long as the scope and coverage of the test cases and their results are verified
and signed off by the PMU.

ANNEXURE 6 – REPORT CATEGORISATION
The system integrator is required to configure reports after gathering detailed report requirements
from departments. Broadly, reports can be categorised as MIS reports, which use day-to-day
transactional data, and analytical reports, which use historical and current data. While Online
Transaction Processing (OLTP) systems such as People Hub, Land Hub and other e-Pragati applications
take care of the former, DataLytics caters to the needs of the latter.

Analytical reports may be categorised as Simple, Medium, and Complex reports. On a high-level, the
criteria for categorisation are:

1. Number of fields to be displayed on reports – As the number of fields increases, the complexity
increases.
2. Number of data sources from which data has to be extracted – As the number of data sources
increases, the complexity increases.
3. Timeliness of Reports (Real-time or batch) – More effort and resources are required to
configure real-time reports than batch reports. Also, if a large report has to be generated in a few
seconds or a minute, it will naturally require more resources than are required to run the same
report in a few hours.
4. Type of Data – Structured or Unstructured. It is much easier to run a query and generate a report
using structured data than unstructured data.
5. Complexity of query – Queries using complex joins and multiple sources will increase the
complexity of the reports.

The Table below gives rules of thumb to determine whether a report is simple, medium or complex.
Note that these rules are to be treated as high-level guidelines and, if required, the SI may
use its own rationale/rules to determine the complexity of reports in consultation with departments
during the system study:

Report Complexity    Parameters

Simple               Number of fields: <= 20
                     Number of Data Sources: 1-2
                     Timeliness: Batch Mode
                     Query Complexity: Simple
                     Data Type: Structured Data
Medium               Number of fields: 21-35
                     Number of Data Sources: 3-5
                     Timeliness: Batch Mode/Real-time
                     Query Complexity: Medium
                     Data Type: Mostly structured, but may include
                     unstructured data too
Complex              Number of fields: > 36
                     Number of Data Sources: 5-6
                     Timeliness: Mostly real-time
                     Query Complexity: Medium-Complex
                     Data Type: Mostly unstructured data, but may include
                     structured data too
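
As a hedged illustration, the rules of thumb above could be applied programmatically as in the
following minimal sketch. The thresholds are taken from the table, but several choices are assumptions
made for the example and would need to be agreed during the system study: a report is assigned the
highest complexity indicated by any single parameter, a report with exactly 36 fields is treated as
Complex (the table leaves this value unassigned), exactly 5 data sources are treated as Medium (the
table lists 5 under both Medium and Complex), any real-time report is at least Medium, and a report
on mostly unstructured data is Complex. The function and parameter names are hypothetical.

```python
LEVELS = ["Simple", "Medium", "Complex"]


def level_from_fields(field_count):
    """Complexity level (0-2) implied by the number of fields."""
    if field_count <= 20:
        return 0
    if field_count <= 35:
        return 1
    return 2  # assumption: 36 fields is treated as Complex


def level_from_sources(source_count):
    """Complexity level (0-2) implied by the number of data sources."""
    if source_count <= 2:
        return 0
    if source_count <= 5:
        return 1  # assumption: exactly 5 sources is treated as Medium
    return 2


def classify_report(field_count, source_count,
                    real_time=False, mostly_unstructured=False):
    """Return "Simple", "Medium" or "Complex" for one report.

    Assumption: the highest level indicated by any parameter wins.
    """
    level = max(level_from_fields(field_count),
                level_from_sources(source_count))
    if real_time:
        level = max(level, 1)  # Simple reports are batch-mode only
    if mostly_unstructured:
        level = 2              # mostly unstructured data appears only under Complex
    return LEVELS[level]


if __name__ == "__main__":
    print(classify_report(15, 2))
    print(classify_report(30, 4, real_time=True))
    print(classify_report(40, 6, real_time=True, mostly_unstructured=True))
```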

The following Table provides anticipated distribution of report complexity. However, these are high-
level figures, and it is the responsibility of the SI to validate these numbers.

Report Complexity    Percentage of Total Number of Reports

Simple               30%
Medium               40%
Complex              30%
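
One possible way for the SI to validate these figures, once reports have been categorised during the
system study, is sketched below. The anticipated percentages are taken from the table above. The
function name and the sample data are hypothetical.

```python
from collections import Counter

# Anticipated share of reports per complexity category (percent), from the table.
ANTICIPATED = {"Simple": 30, "Medium": 40, "Complex": 30}


def compare_distribution(labels):
    """Return {category: (actual percent, deviation from anticipated percent)}.

    `labels` is a list with one complexity label per categorised report.
    """
    if not labels:
        raise ValueError("no categorised reports supplied")
    counts = Counter(labels)
    total = len(labels)
    result = {}
    for category, expected in ANTICIPATED.items():
        actual = 100 * counts[category] / total
        result[category] = (actual, actual - expected)
    return result


if __name__ == "__main__":
    # Hypothetical outcome of categorising ten reports.
    sample = ["Simple"] * 5 + ["Medium"] * 3 + ["Complex"] * 2
    for category, (actual, deviation) in compare_distribution(sample).items():
        print(f"{category}: {actual:.0f}% (deviation {deviation:+.0f} points)")
```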

*** End of Document ***
