
CONTENTS    DECEMBER 2017    ISSN-2456-4885

REGULAR FEATURES
07 FossBytes    16 New Products    104 Tips & Tricks


CONTENTS

20  "AI must be viewed in a holistic manner" (Arjun Vishwanathan, associate director, emerging technologies, IDC India)
43  Hive: The SQL-like Data Warehouse Tool for Big Data
62  Keras: Building Deep Learning Applications with High Levels of Abstraction
66  Using jq to Consume JSON in the Shell

EDITOR
RAHUL CHOPRA

EDITORIAL, SUBSCRIPTIONS & ADVERTISING
DELHI (HQ)
D-87/1, Okhla Industrial Area, Phase I, New Delhi 110020
Ph: (011) 26810602, 26810603; Fax: 26817563
E-mail: info@efy.in

MISSING ISSUES
E-mail: support@efy.in

BACK ISSUES
Kits 'n' Spares
New Delhi 110020
Ph: (011) 26371661, 26371662
E-mail: info@kitsnspares.com

NEWSSTAND DISTRIBUTION
Ph: 011-40596600
E-mail: efycirc@efy.in

ADVERTISEMENTS
MUMBAI: Ph: (022) 24950047, 24928520; E-mail: efymum@efy.in
BENGALURU: Ph: (080) 25260394, 25260023; E-mail: efyblr@efy.in
PUNE: Ph: 08800295610/ 09870682995; E-mail: efypune@efy.in
GUJARAT: Ph: (079) 61344948; E-mail: efyahd@efy.in
JAPAN: Tandem Inc., Ph: 81-3-3541-4166; E-mail: japan@efy.in
SINGAPORE: Publicitas Singapore Pte Ltd, Ph: +65-6836 2272; E-mail: singapore@efy.in
TAIWAN: J.K. Media, Ph: 886-2-87726780 ext. 10; E-mail: taiwan@efy.in
UNITED STATES: E & Tech Media, Ph: +1 860 536 6677; E-mail: usa@efy.in

Printed, published and owned by Ramesh Chopra. Printed at Tara Art Printers Pvt Ltd, A-46,47, Sec-5, Noida, on 28th of the previous month, and published from D-87/1, Okhla Industrial Area, Phase I, New Delhi 110020. Copyright © 2017. All articles in this issue, except for interviews, verbatim quotes, or unless otherwise explicitly mentioned, will be released under Creative Commons Attribution-NonCommercial 3.0 Unported License a month after the date of publication. Refer to http://creativecommons.org/licenses/by-nc/3.0/ for a copy of the licence. Although every effort is made to ensure accuracy, no responsibility whatsoever is taken for any loss due to publishing errors. Articles that cannot be used are returned to the authors if accompanied by a self-addressed and sufficiently stamped envelope. But no responsibility is taken for any loss or delay in returning the material. Disputes, if any, will be settled in a New Delhi court only.

DVD OF THE MONTH: December 2017
• Solus 3 GNOME — Linux for your desktop. Solus is an operating system that is designed for home computing. It ships with a variety of software out-of-the-box, so you can set it up without too much fuss.
• A collection of open source software for Windows.
Recommended system requirements: P4, 1GB RAM, DVD-ROM drive.
In case this DVD does not work properly, write to us at support@efy.in for a free replacement. CD team e-mail: cdteam@efy.in
Note: Any objectionable material, if found, is unintended, and should be attributed to the complex nature of Internet data.

SUBSCRIPTION RATES
Year     Newsstand Price (₹)    You Pay (₹)    Overseas
Five     7200                   4320
Three    4320                   3030
One      1440                   1150           US$ 120

Kindly add ₹50/- for outside Delhi cheques. Please send payments only in favour of EFY Enterprises Pvt Ltd. Non-receipt of copies may be reported to support@efy.in—do mention your subscription number.


FOSSBYTES
Compiled By: OSFY Bureau

Azure Functions gets Java support
Support for Java functions has been added to Microsoft's Azure Functions serverless computing platform. The new beta inclusion is in addition to the existing support for JavaScript, C#, F#, Python, PHP, Bash, PowerShell and Batch codes. Azure Functions has received all the features of the Java runtime, such as triggering options, data bindings and serverless models, with auto-scaling. The new support comes as an addition to the company's recently announced capabilities to run the Azure Functions runtime on .NET Core.
Developers with Java skills can use their existing tools to build new creations using Azure Functions. There is also support for plugins, and Microsoft has enabled native integration of Maven projects using a specific plugin.
Azure Functions' serverless computing platform already supports a list of development languages and platforms. It competes with Amazon Web Services' AWS Lambda, which is widely known for its out-of-the-box serverless experience. Oracle, too, has recently announced its Fn project, which competes with Azure Functions.

Canonical drops 32-bit Ubuntu desktop ISO
Canonical has finally decided to drop support for the 32-bit live ISO release of the Ubuntu distribution. With most of the architecture today being 64-bit, it was only a matter of time before Linux distros stopped releasing 32-bit ISOs. Confirming the development, Canonical engineer Dimitri John Ledkov wrote, "…remove Ubuntu desktop i386 daily-live images from the release manifest for beta and final milestones of 17.10 and therefore, do not ship ubuntu-desktop-i386.iso artifact for 17.10."
It is worth noting that Canonical will only stop building the 32-bit Ubuntu Desktop Live ISO. The company will continue to focus on i386, which is becoming more of a purpose-built architecture for embedded devices. Canonical mainly wants to focus its efforts on the Internet of Things (IoT), where x86-32-bit is still very common.
You can continue to install Ubuntu on your 32-bit machines. However, Canonical will no longer release any new live ISO for these machines. Canonical will continue to release minimal network installation ISOs for 32-bit hardware. These images will receive updates and security patches until the next announcement.
Alongside Canonical, open source distributions such as Arch Linux have also recently phased out 32-bit support to encourage users to switch to newer hardware. The 64-bit processors started becoming common with the launch of the AMD Opteron and Athlon 64 in 2003. Today, every single mainstream processor available in the market is based on either the AMD64 or Intel 64 architecture.

Debian 9.2 'Stretch' brings out 66 security fixes
The Debian Project has announced the second maintenance update to the Debian 9 Stretch operating system. Debuted as version 9.2, the new platform includes a number of new features and security patches.
The official announcement confirms that the new point release is not a new version of Debian 9, but merely improves the included packages. Therefore, instead of performing a clean install of Debian 9.2, you can opt for Debian's up-to-date mirror.
"The Debian Project is pleased to announce the second update of its stable distribution Debian 9 (codenamed 'Stretch'). This point release mainly adds corrections for security issues, along with a few adjustments for serious problems," read the official announcement.
Debian GNU/Linux 9.2 includes a total of 87 bug fixes and 66 new security improvements. Various apps and core components have also been improved in this release. Another notable change is the inclusion of Linux kernel 4.9.51 LTS.
If you keep your Debian Stretch installation updated, you need not update these packages using the point release. The detailed changelog is published on the official Web page.




OpenMessaging debuts to provide an open standard for distributed messaging
The Linux Foundation has announced a new project to bring about an open standard for distributed messaging. Called OpenMessaging, this project is aimed at establishing a governance model and structure for companies working on messaging APIs. Leading Internet companies like Yahoo, Alibaba, Didi and Streamlio are contributing to the OpenMessaging project, primarily to solve the challenges of messaging and streaming applications. The open standard design from this project will be deployed in on-premise, cloud and hybrid infrastructure models.
Scaling with messaging services is a big problem. The lack of compatibility between wire-level protocols and standard benchmarking is the most common issue faced. When data gets transferred across different streaming and messaging platforms, compatibility becomes a problem. Additional resources as well as higher maintenance costs are the main complaints about messaging platforms. Existing solutions lack standardised guidelines for fault tolerance, load balancing, security and administration.
The needs of modern cloud-oriented messaging and streaming applications are very different. The Linux Foundation plans to address all these issues with the OpenMessaging project. The project is also designed to address the issue of redundant work for developers, and make it easier to meet the cutting-edge demands around smart cities, IoT and edge computing. The project contributors plan to facilitate a standard benchmark for application testing and enable platform independence.

SUSE Linux Enterprise Server for SAP applications coming to the IBM Cloud
SUSE has announced that SUSE Linux Enterprise Server for SAP applications will be available as an operating system for SAP solutions on the IBM Cloud. In addition, IBM Cloud is now a SUSE cloud service provider, giving customers a supported open source platform that makes them more agile and reduces operating costs, as they only pay for what they use.
"Customers need access to a secure and scalable cloud platform to run mission-critical workloads, one with the speed and agility of IBM Cloud, which is one of the largest open public cloud deployments in the world," said Phillip Cockrell, SUSE VP of worldwide alliance sales. "As the public cloud grows increasingly more popular for production workloads, SUSE and IBM are offering enterprise-grade open source software fully supported on IBM Cloud. Whether big iron or public cloud, SUSE is committed to giving our customers the environments they need to succeed," he added.
Jay Jubran, director of offering management, IBM Cloud, said, "IBM Cloud is designed to give enterprises the power and performance they need to manage their mission-critical business applications. IBM provides a spectrum of fully managed and Infrastructure as a Service solutions to support SAP HANA applications, including SUSE Linux Enterprise Server as well as new bare metal servers with up to 8TB of memory."

Spotlight on adopting serverless technologies
According to Gartner, "By 2022, most Platform as a Service (PaaS) offerings will evolve to a fundamentally serverless model, rendering the cloud platform architectures dominant in 2017 as legacy architectures." Serverless is one of the hottest technologies in the cloud space today. The Serverless Summit organised on October 27 in Bengaluru by CodeOps Technologies put a spotlight on serverless technologies. The conference helped bring together people who are passionate about learning and adopting serverless technologies in their organisations. With speakers from three different continents and 250 participants from all over India, the event was a wonderful confluence of experts, architects, developers, DevOps practitioners, CXOs and enthusiasts.




The highlight of the conference was the keynote by John Willis (of 'The DevOps Handbook' fame), who travelled all the way from the US for the event. He talked about 'DevOps in a Serverless World', covering the best practices and how they manifest in a serverless environment. He also conducted a post-conference workshop on DevOps principles and practices.
Serverless technology is an interesting shift in the architecture of digital solutions, where there is a convergence of serverless architecture, containers, microservices, events and APIs in the delivery of modular, flexible and dynamic solutions. This is what Gartner calls the 'Mesh App and Services Architecture' (or MASA, for short). With that theme, there were sessions on serverless frameworks and platforms like the open source Fn platform and Kubernetes frameworks (especially Fission), Adobe's I/O runtime, and Microsoft's Azure platform. Serverless technology applications covered at the event included sessions like 'Serverless and IoT (Internet of Things) devices', 'Serverless and Blockchain', etc. The hands-on sessions included building chatbots and artificial intelligence (AI) applications with serverless architectures. The conference ended with an interesting panel discussion between Anand Gothe (Prowareness), Noora (Euromonitor), John Willis (SJ Technologies), Sandeep Alur (Microsoft) and Vidyasagar Machupalli (IBM).
Open Source For You (OSFY) was the media partner and the Cloud Native Computing Foundation (CNCF) was the community partner for the conference.

Linux support comes to Arduino Create
The Arduino team has announced a new update to the Arduino Create Web platform. The initial release has been sponsored by Intel and supports X86/X86_64 boards. This enables fast and easy development and deployment of Internet of Things (IoT) applications with integrated cloud services on Linux-based devices. With Arduino Create supporting Linux on Intel chips, users are now able to program their Linux devices as if these were regular Arduinos.
The new Arduino Create features a Web editor, as well as cloud-based sharing and collaboration tools. The software provides a browser plugin, letting developers upload sketches to any connected Arduino board from the browser.
Arduino Create now allows users to manage individual IoT devices, and configure them remotely and independently from where they are located. To further simplify the user journey, the Arduino team has also developed a novel out-of-the-box experience that will let anyone set up a new device from scratch via the cloud, without any previous knowledge, by following an intuitive Web-based wizard.
In the coming months, the team has plans to expand support for Linux based IoT devices running on other hardware architectures too.

Microsoft announces new AI, IoT and machine learning tools for developers
At Connect(); 2017, Microsoft's annual event for professional developers, executive VP Scott Guthrie announced Microsoft's new data platform technologies and cross-platform developer tools. These tools will help increase developer productivity and simplify app development for intelligent cloud and edge technologies, across devices, platforms or data sources. Guthrie outlined the company's vision and shared what is next for developers across a broad range of Microsoft and open source technologies. He also touched on key application scenarios and ways developers can use built-in artificial intelligence (AI) to support continuous innovation and continuous deployment of today's intelligent applications.
"With today's intelligent cloud, emerging technologies like AI have the potential to change every facet of how we interact with the world," Guthrie said. "Developers are in the forefront of shaping that potential. Today at Connect(); we're announcing new tools and services that will help developers build applications and services for the AI-driven future, using the platforms, languages and collaboration tools they already know and love," he added.
Microsoft is continuing its commitment to delivering open technologies and contributing to and partnering with the open source community.




New version of Red Hat OpenShift Container Platform launched for hybrid cloud environments
Red Hat has launched Red Hat OpenShift Container Platform 3.7, the latest version of Red Hat's enterprise-grade Kubernetes container application platform. The new platform helps IT organisations to build and manage applications that use services from the data centre to the public cloud.
The newest iteration is claimed to be the industry's most comprehensive enterprise Kubernetes platform; it includes native integrations with Amazon Web Services (AWS) Service Brokers that enable developers to bind services across AWS and on-premise resources to create modern applications, while providing a consistent, open standards-based foundation to drive business evolution.
"Modern, cloud-native applications are not monolithic stacks with clear-cut needs and resources; so to more effectively embrace modern applications, IT organisations need to re-imagine how their developers find, provision and consume critical services and resources across a hybrid architecture. Red Hat OpenShift Container Platform 3.7 addresses these needs head-on by providing hybrid access to services through its service catalogue, enabling developers to more easily find and bind the necessary services to their business-critical applications—no matter where these services exist—and adding close integration with AWS to further streamline cloud-native development and deployment," said Ashesh Badani, vice president and general manager, OpenShift, Red Hat.
Red Hat OpenShift Container Platform 3.7 will ship with OpenShift Template Broker, which turns any OpenShift Template into a discoverable service for application developers using OpenShift. OpenShift Templates are lists of OpenShift objects that can be implemented within specific parameters, making it easier for IT organisations to deploy reusable, composite applications comprising microservices.
Also included with the new platform is OpenShift Ansible Broker for provisioning and managing services through the OpenShift service catalogue by using Ansible to define OpenShift Services. OpenShift Ansible Broker enables users to provision services both on and off the OpenShift platform, helping to simplify and automate complex workflows involving varied services and applications across on-premise and cloud-based resources.

Announcing the general availability of Bash in Azure Cloud Shell
Microsoft has announced the availability of Bash in Azure Cloud Shell. Bash in Cloud Shell comes equipped with commonly used CLI tools, including Linux shell interpreters, Azure tools, text editors, source control, build tools, container tools, database tools, and more.
Justin Luk, programme manager, Azure Compute, announced that Bash in Cloud Shell will provide an interactive, Web-based, Linux command line experience from virtually anywhere. With a single click through the Azure portal, Azure documentation, or the Azure mobile app, users will gain access to a secure and authenticated Azure workstation to manage and deploy resources from a native Linux environment held in Azure.
Bash in Cloud Shell will enable simple, secure authentication to use Azure resources with Azure CLI 2.0. Azure file shares enable file persistence through CloudDrive to store scripts and settings.

MariaDB Foundation gets a new platinum level sponsor
MariaDB Foundation recently announced that Microsoft has become a platinum sponsor. The sponsorship will help the Foundation in its goals to support continuity and open collaboration in the MariaDB ecosystem, and to drive adoption, serving an ever-growing community of users and developers.
"Joining the MariaDB Foundation as a platinum member is a natural next step in Microsoft's open source journey. In addition to Microsoft Azure's strong support for open source technologies, developers can use their favourite database as a fully managed service on Microsoft Azure that will soon include MariaDB," said Rohan Kumar, GM for database systems at Microsoft.
Monty Widenius, founder of MySQL and MariaDB, stated, "Microsoft is here to learn from and contribute to the MariaDB ecosystem. The MariaDB Foundation welcomes and supports Microsoft towards this goal."
One of the fundamental principles in Azure is about choice. Customers of Azure will now be able to run the apps they love, and Microsoft wants to make sure that the MySQL and MariaDB experience on Windows and Linux hosts in Azure is excellent.
Microsoft's community engagement through open source foundations helps to nurture and advance the core technologies that the IT industry relies upon. MariaDB is a natural partner to Microsoft, as it is the fastest growing open source database. In most Linux distributions, MySQL has already been completely replaced with MariaDB.




Audacity 2.2.0 released with an improved look and feel
Audacity, a popular open source audio editing software, has received a significant update. The new version, dubbed Audacity 2.2.0, comes with four pre-configured, user-selectable themes. This enables you to choose the look and feel for Audacity's interface. It also has playback support for MIDI files, and better organised menus. Around 198 bugs have been fixed in this newly released version — one of the major changes is the improved recovery from full file system errors.
The menus are shorter and clearer than in previous Audacity versions, and have been simplified without losing functionality. The most commonly used functions are found in the top levels of the menus. The functions that have moved down into lower sub-menus are better organised.
You can download the update from www.audacityteam.org to try it out on Windows, Mac or any Linux based operating system.

Blender tool to be used in French animation movie
The soon-to-be-made animated movie 'I Lost My Body' will use the open source Blender software tool. The film will combine Blender's computer graphics with hand-drawn elements.
At the recent Blender conference in Amsterdam, French filmmaker Jérémy Clapin and his crew gave a presentation on the processes and tools to be used in the making of 'I Lost My Body'. The film will start production next year, with a likely release in 2019, adding to the open source animation showreel, thanks to Blender software.

All new Ubuntu 17.10 released
With the new release of Ubuntu, there is some good news for GNOME lovers. After a long time, Ubuntu has come up with some major changes. The new release has GNOME as the default desktop environment instead of Unity.
Ubuntu 17.10 comes with the newest software enhancements and nine months of security and maintenance updates. It is based on the Linux kernel release series 4.13. It includes support for the new IBM z14 mainframe CPACF instructions and new KVM features. 32-bit installer images are no longer provided for Ubuntu Desktop.
Apart from this, GDM has replaced LightDM as the default display manager. The login screen now uses virtual terminal 1 instead of virtual terminal 7. Window control buttons are back on the right for the first time since 2010. Apps provided by GNOME have been updated to 3.26. Driverless printing support is now available for IPP Everywhere, Apple AirPrint and Wi-Fi Direct. LibreOffice has been updated to 5.4 and Python 2 is no longer installed by default. Python 3 has been updated to 3.6.
Ubuntu 17.10 will be supported for nine months until July 2018. If you need long term support, it is recommended you use Ubuntu 16.04 LTS instead.

End of Linux Mint with the KDE desktop environment
Linux Mint founder Clement Lefebvre announced in a blog post that the upcoming Linux Mint 18.3 Sylvia operating system will be the last release to feature a KDE edition.
In the post, Lefebvre said, "Users of the KDE edition represent a portion of our user base. I know from their feedback that they really enjoy it. They will be able to install KDE on top of Linux Mint 19 of course, and I'm sure the Kubuntu PPA will continue to be available. They will also be able to port Mint software to Kubuntu itself, or they might want to trade a bit of stability away and move to a bleeding edge distribution such as Arch to follow upstream KDE more closely." He added: "KDE is a fantastic environment but it's also a different world, one which evolves away from us and away from everything we focus on. Their apps, their ecosystem and the QT toolkit, which is central there, have very little in common with what we're working on."
The bottom line is that Linux Mint 19 will be available only in Cinnamon, Xfce and MATE editions.

For more news, visit www.opensourceforu.com



Guest Column: Exploring Software

Aadhaar Could Simplify Life in Important Ways
Anil Seth

All of you have perhaps been busy fulfilling the requirements of linking your Aadhaar numbers with various organisations like banks, mobile phone service providers, etc, just as I have been. I am reminded of the time when relational databases were becoming popular. Back then, the IT departments had to worry about consistency of data when normalising it and removing duplication.

My name seems simple enough but it gets routinely misspelled. All too often it does not matter and you, like me, may choose to ignore the incorrect spelling. We tend to be more particular that the address is correct with an online shopping store, even if the name is misspelled. On the other hand, you will want to make sure that the name is correct on a travel document, even if there is a discrepancy in the address.
Reconciling data can be a very difficult process. Hence, it comes as a surprise that once PAN is linked to the bank accounts and Aadhaar is linked to the PAN, why create the potential for discrepancies by forcing banks to link the accounts with Aadhaar, especially as companies do not have Aadhaar and one person can create many companies?
This set me thinking about some areas where the UID would have saved me a lot of effort and possibly, at no risk to my identity in the virtual world. Obviously, the use cases are for digitally comfortable citizens and should never be mandatory.

When registering a will
While an unregistered will may be valid, the formalities become much simpler if the will is registered. A local government office told me that registering a will is simple — just bring two witnesses, one of whom should be a gazetted officer (I wonder if there is an app to find one).
It would be much simpler if I could register the will using an Aadhaar verification, using biometrics. No witnesses needed. Now, no one needs to know what I would like to happen after I am no longer around.

When registering a nominee
If the Aadhaar ID of a nominee is mentioned, the nominee does not need to provide any documentation or witnesses other than the death certificate, for the nomination formalities to be completed. Even the verification of the person can be avoided if any money involved is transferred to the account linked to that Aadhaar ID.

Notarisation of documents
The primary purpose of notarisation is to ensure that the document is an authentic copy and the person signing could be prosecuted if the information therein is incorrect. This requires you to submit physical paper documents, whereas the desire is to be online and paperless.
An easy option is to seed the public key with the Aadhaar database. It does not need to be issued by a certification authority. Any time a person digitally signs a digital document with his private key, it can be verified using the public key. There is no need to worry about securing access to the public key as, by its very nature, it is public.
This can save on court time as well. In case of any disputes, no witnesses need be called; and even after many years, there will be no need to worry about the fallibility of the human mind.
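To see why no witnesses or certification authority would be needed, here is a minimal sketch of digital signing and verification in Python, using the third-party cryptography package. Everything in it, the key pair, the document and the workflow, is a purely illustrative assumption, not a description of any actual Aadhaar service.

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

# Illustrative only: the signer holds the private key; the public key could
# be published anywhere (for example, seeded against an identity database).
private_key = ed25519.Ed25519PrivateKey.generate()
public_key = private_key.public_key()

document = b"I bequeath my record collection to my niece."
signature = private_key.sign(document)

# Anyone holding the public key can verify, years later, with no witnesses.
try:
    public_key.verify(signature, document)
    print("Signature valid: the document is authentic and untampered.")
except InvalidSignature:
    print("Signature invalid: the document was altered or the key does not match.")

If even one byte of the document changes, verification fails, which is exactly the property that makes witnesses and the fallible human memory unnecessary.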
Elimination of life certificates
Even after linking the Aadhaar number to the pension account, one still needs to go through the annual ritual of physically proving that one is alive!
I am reminded of an ERP installation where the plant manager insisted on keeping track of the production status at various stages, against our advice. It took a mere week for him to realise his error. While his group was spending more time creating data, he himself was drowning in it. He had less control over production than he had before the ERP system was installed. Since we had anticipated this issue, it did not take us long to change the process to capture the minimum data, as we had recommended.
Pension-issuing authorities should assume that the pensioner is alive till a death certificate is issued. The amount of data needed in the latter case is considerably less!

Lessons from programming
Most programmers learn from experience that exception handling is the crucial part of a well written program. More often than not, greater effort is required to design and handle the exceptions.
Efficient programming requires that the common transactions take the minimal resources. The design and implementation must minimise the interactions needed with a user and not burden the user by providing unnecessary data.
One hopes that any new usage of UID will keep these lessons in mind.

By: Dr Anil Seth
The author has earned the right to do what interests him. You can find him online at http://sethanil.com, http://sethanil.blogspot.com, and reach him via email at anil@sethanil.com.



CODESPORT

In this month's column, we discuss a real life NLP problem, namely, detecting duplicate questions in community question-answering forums.

Sandya Mannarswamy

While we have been discussing many questions in machine learning (ML) and natural language processing (NLP), I had a number of requests from our readers to take up a real life ML/NLP problem with a sufficiently large data set, discuss the issues related to this specific problem and then go into designing a solution. I think it is a very good suggestion. Hence, over the next few columns, we will be focusing on one specific real life NLP problem, which is detecting duplicate questions in community question-answering (CQA) forums.
There are a number of popular CQA forums such as Yahoo Answers, Quora and StackExchange where netizens post their questions and get answers from domain experts. CQA forums serve as a common means of distilling crowd intelligence and sharing it with millions of people. From a developer perspective, sites such as StackOverflow fill an important need by providing guidance and help across the world, 24x7. Given the enormous number of people who use such forums, and their varied skill levels, many questions get asked again and again.
Since many users have similar informational needs, answers to new questions can typically be found either in whole or part from the existing question-answer archive of these forums. Hence, given a new incoming question, these forums typically display a list of similar or related questions, which could immediately satisfy the information needs of users, without them having to wait for their new question to be answered by other users. Many of these forums use simple keyword/tag based techniques for detecting duplicate questions.
However, often, these automated lists returned by the forums are not accurate, frustrating users looking for answers. Given the challenges in identifying duplicate questions, some forums put in manual effort to tag duplicate questions. However, this is not scalable, given the rate at which new questions get generated, and the need for specific domain expertise to tag a question as duplicate. Hence, there is a strong requirement for automated techniques that can help in identifying questions that are duplicates of an incoming question.
Note that identifying duplicate questions is different from identifying 'similar/related' questions. Identifying similar questions is somewhat easier as it only requires that there should be considerable similarity between a question pair. On the other hand, in the case of duplicate questions, the answer to one question can serve as the answer to the second question. This identification requires stricter and more rigorous analysis.
At first glance, it appears that we can use various text similarity measures in NLP to identify duplicate questions. Given that people express their information needs in widely different forms, it is a big challenge to identify the exact duplicate questions automatically. For example, let us consider the following two questions:
Q1: I am interested in trying out local cuisine. Can you please recommend some local cuisine restaurants that are wallet-friendly in Paris?
Q2: I like to try local cuisine whenever I travel. I would like some recommendations for restaurants which are not too costly, but serve authentic local cuisine in Athens?
Now consider applying different forms of text similarity measures. The above two questions score very high on various similarity measures—lexical, syntactic and semantic similarity. While it is quite easy for humans to focus on the one dissimilarity, which is that the locations discussed in the two questions are different, it is not easy to teach machines that 'some dissimilarities are more important than other dissimilarities.'
It also raises the question of whether the two words 'Paris' and 'Athens' would be considered as extremely dissimilar. Given that one of the popular techniques for word similarity measures these days is the use of word-embedding techniques such as Word2Vec, it is highly probable that 'Paris' and 'Athens' end up getting mapped as reasonably similar by the word-embedding techniques, since they are both European capital cities and often appear in similar contexts.
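Both effects are easy to reproduce. The sketch below first computes a simple word-overlap (Jaccard) score for the two cuisine questions, one possible stand-in for the lexical measures mentioned above, and then asks a pretrained Word2Vec model how close 'Paris' and 'Athens' are. It assumes the gensim library and a locally downloaded vector file; the file name is illustrative and the exact scores will vary with the vectors used.

import re

from gensim.models import KeyedVectors


def tokens(text):
    # Lowercase word tokens, punctuation stripped.
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    # Word-level Jaccard similarity: |intersection| / |union|.
    wa, wb = tokens(a), tokens(b)
    return len(wa & wb) / len(wa | wb)


q1 = ("I am interested in trying out local cuisine. Can you please recommend "
      "some local cuisine restaurants that are wallet-friendly in Paris?")
q2 = ("I like to try local cuisine whenever I travel. I would like some "
      "recommendations for restaurants which are not too costly, but serve "
      "authentic local cuisine in Athens?")

# Plain lexical overlap already gives this non-duplicate pair a noticeable score.
print('Jaccard overlap:', round(jaccard(q1, q2), 2))

# Word embeddings tend to make matters worse for the one word pair that matters:
# pretrained vectors usually place 'Paris' and 'Athens' close together, since
# both are European capital cities appearing in similar contexts.
vectors = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)  # illustrative file name
print('Paris vs Athens:', vectors.similarity('Paris', 'Athens'))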
Let us consider another example.
Q1: What's the fastest way to get from Los Angeles to New York?
Q2: How do I get from Los Angeles to New York in the least amount of time?
While there may not be good word-based text similarity between the above two questions, the information needs of both the questions are satisfied by a common answer, and hence this question pair needs to be marked as a duplicate.
Let us consider yet another example.
Q1: How do I invest in the share market?
Q2: How do I invest in the share market in India?
Though Q1 and Q2 have considerable text similarity, they are not duplicates, since Q2 is a more specific form of the question and, hence, cannot share the same answer as Q1.
These examples are meant to illustrate the challenges involved in identifying duplicate questions. Having chosen our task and defined it, now let us decide what would be our data set. Last year, the CQA forum Quora released a data set for the duplicate question detection task. This data set was also used in a Kaggle competition involving the same task. Hence let us use this data set for our exploration. It is available at https://www.kaggle.com/c/quora-question-pairs. So please download the train.csv and test.csv files for your exploratory data analysis.
Given that this was run as a Kaggle competition, there are a lot of forum discussions on Kaggle regarding the various solutions to this task. While I would encourage readers to go through them to enrich their knowledge, we are not going to use any non-text features as we attempt to solve this problem. For instance, many of the winners have used question ID as a feature in their solution. Some others have used graph features, such as learning the number of neighbours that a duplicate question pair would have compared to a non-duplicate question pair. However, we felt that these are extraneous features to text and are quite dependent on the data. Hence, in order to arrive at a reliable solution, we will only look at text based features in our approaches.
As with any ML/NLP task, let us begin with some exploratory data analysis. Here are a few questions to our readers (note: most of these tasks are quite easy, and can be done with simple commands in Python using Pandas, so I urge you to try them out).
1. Find out how many entries there are in train.csv.
2. What are the columns present in train.csv?
3. Can you find out whether this is a balanced data set or not? How many of the question pairs are duplicates?
4. Are there any NaNs present in the entries for the Question 1 and Question 2 columns?
5. Create a Bag of Words classifier and report the accuracy.
I suggest that our readers (specifically those who have just started exploring ML and NLP) try these experiments, sketched below, and share the results in a Python Jupyter notebook. Please do send me the pointer to your notebook and we can discuss it in this column.
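For readers who want a starting point, here is one possible way to tackle tasks 1 to 5 with Pandas and scikit-learn. The column names (question1, question2, is_duplicate) are those of the Kaggle data set; the vocabulary size and the logistic regression baseline are just reasonable first choices, not the column's prescribed solution.

import pandas as pd
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv('train.csv')

print(len(df))                                          # 1. number of entries
print(df.columns.tolist())                              # 2. column names
print(df['is_duplicate'].value_counts(normalize=True))  # 3. class balance
print(df['question1'].isna().sum(),
      df['question2'].isna().sum())                     # 4. NaNs in either column

# 5. A simple Bag of Words baseline: vectorise both questions, concatenate
# the two sparse matrices side by side, and fit a linear classifier.
df = df.dropna(subset=['question1', 'question2'])
vec = CountVectorizer(max_features=50000, stop_words='english')
vec.fit(pd.concat([df['question1'], df['question2']]))
X = sp.hstack([vec.transform(df['question1']),
               vec.transform(df['question2'])])
y = df['is_duplicate']
X_tr, X_te, y_tr, y_te = train_test_split(X.tocsr(), y, test_size=0.2,
                                          random_state=42)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_tr, y_tr)
print('Accuracy:', accuracy_score(y_te, clf.predict(X_te)))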
Another exercise that is usually recommended is to go over the actual data and see what types of questions are marked as duplicates and what are not.
It would also be good to do some initial text exploration of the data set. I suggest that readers use the Stanford CoreNLP tool kit for this purpose, because it is more advanced in its text analysis compared to NLTK. Since Stanford CoreNLP is Java based, you need to run it as a server and use a client Python package such as https://pypi.python.org/pypi/stanford-corenlp/. Please try the following experiments on the Quora data set (a sketch of querying the server follows the list).
1. Identify the different Named Entities present in the Quora train data set and test data set. Can you cluster these entities?
2. Stanford CoreNLP supports the parse tree. Can you use it for different types of questions such as 'what', 'where', 'when' and 'how' questions?
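Rather than depending on any particular wrapper package, you can also talk to the CoreNLP server over plain HTTP with the requests library, as in this sketch. It assumes a server already running locally on port 9000 (typically started with something like: java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000), and reuses the share-market question from our earlier example.

import json

import requests

TEXT = 'How do I invest in the share market in India?'
PROPS = {'annotators': 'tokenize,ssplit,pos,ner,parse',
         'outputFormat': 'json'}

# POST raw text to the running CoreNLP server; annotator properties are
# passed as a JSON string in the URL query parameters.
resp = requests.post('http://localhost:9000/',
                     params={'properties': json.dumps(PROPS)},
                     data=TEXT.encode('utf-8'))
doc = resp.json()

for sentence in doc['sentences']:
    # Tokens tagged 'O' carry no entity; everything else is a named entity.
    entities = [(tok['word'], tok['ner'])
                for tok in sentence['tokens'] if tok['ner'] != 'O']
    print('Named entities:', entities)
    print('Parse tree:', sentence['parse'])

Running this over both questions of each pair gives you the raw material for the two experiments above: a list of named entities to cluster, and parse trees to compare across question types.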
While we can apply many of the classical machine learning techniques after identifying the appropriate features, I thought it would be more interesting to focus on some of the neural network based approaches, since the data set is sufficiently large (Quora actually used a random forest classifier initially). Next month, we will focus on some of the simple neural network based techniques to attack this problem.
I also wanted to point out a couple of NLP problems related to this task. One is the task of textual entailment recognition where, given a premise statement and a hypothesis statement, the task is to recognise whether the hypothesis follows from the premise, contradicts the premise or is neutral to the premise. Note that textual entailment is a 3-class classification problem. Another closely related task is that of paraphrase identification: given two statements S1 and S2, the task is to identify whether S1 and S2 are paraphrases. Some of the techniques that have been applied for paraphrase identification and textual entailment recognition can be leveraged for our task of duplicate question identification. I'll discuss more on this in next month's column.
If you have any favourite programming questions/software topics that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com.

By: Sandya Mannarswamy
The author is an expert in systems software and is currently working as a research scientist at Conduent Labs India (formerly Xerox India Research Centre). Her interests include compilers, programming languages, file systems and natural language processing. If you are preparing for systems software interviews, you may find it useful to visit Sandya's LinkedIn group 'Computer Science Interview Training (India)' at http://www.linkedin.com/groups?home=&gid=2339182.



NEW PRODUCTS

Touch based gesture control headphones from Beyerdynamic
Audio equipment manufacturer Beyerdynamic has launched the Aventho wireless headphones with sound customisation technology. The headphones have been developed by Berlin based Mimi Hearing Technologies.
These headphones sport touch based gesture control on the right ear cup, through which users can receive and disconnect calls, increase or decrease volume levels, etc. They use the Bluetooth 4.2 protocol with the aptX HD codec from Qualcomm, which guarantees the best sound even without wires.
The Aventho wireless headphones come with the Tesla sound transducers to offer a great acoustic performance. The compact size and cushion cups ensure comfort over long hours. Additional features include an impedance of 32 ohms, a transmission rate of 48kHz/24 bits, and a play time of more than 20 hours on a single charge.
The Beyerdynamic Aventho headphones are packed in a sturdy fabric bag and are available in black and brown, at retail stores.
Price: ₹24,999
Address: Beyerdynamic India Pvt Ltd, 1, 10th Main Road, Malleshwaram West, Bengaluru, Karnataka 560003

Google Pixel 2 now available in India
Search giant Google has launched two successors to its first smartphone in its Pixel hardware line—the Pixel 2 and Pixel 2 XL.
The Pixel 2 comes with an 'always-on' display of 12.7cm (5 inches) with a full HD (1920 x 1080 pixels) AMOLED at 441ppi, and 2.5D Corning Gorilla Glass 5 protection. The Pixel 2 XL comes with a full 15.2cm (6 inches) QHD (2880 x 1440 pixels) pOLED at 538ppi, with 3D Corning Gorilla Glass protection.
The Pixel 2 is powered by a 1.9GHz octa-core Qualcomm Snapdragon 835 processor and runs on Android's latest 8.0.0 Oreo OS version. The Pixel 2 XL also runs on Android Oreo, and comes with the Adreno 540 GPU.
On the camera front, both the variants sport a 12.2 megapixel rear and an 8 megapixel front camera, which have optical and electronic image stabilisation along with fixed focus.
Both phones offer a RAM of 4GB and internal storage of 64GB and 128GB (in two variants), which is not further expandable. Users can enjoy unlimited online storage for photos and videos.
With a battery of 2700mAh for Pixel 2 and 3520mAh for Pixel 2 XL, the smartphones are designed with stereo front-firing speakers, and a headphone adaptor to connect a 3.5mm jack. On the connectivity and wireless front, the devices offer Wi-Fi 2.4GHz+5GHz 802.11/a/b/g/n/ac 2x2 MIMO with Bluetooth 5.0+LE, NFC, GPS, etc.
The feature-packed smartphones have an aluminium unibody with a hybrid coating, and IP67 standard water and dust resistance. Both the smartphones are available online and at retail stores in black, white and blue.
Price: ₹61,000 and ₹70,000 for the 64GB and 128GB variants of Pixel 2; and ₹73,000 and ₹82,000 for the 64GB and 128GB variants of Pixel 2 XL
Address: Google Inc., Unitech Signature Tower-II, Tower-B, Sector-15, Part-II, Village Silokhera, Gurugram, Haryana 122001; Ph: 91-12-44512900



iBall's latest wireless keyboard is silent
iBall, a manufacturer of innovative technology products, has launched a wireless 'keyboard and mouse' set, which promises a silent workspace at home or the office. The set comes with unique silent keys and buttons for quiet typing and a distraction-free work environment. Crafted with a rich, piano-like sheen, the set adds elegance to the desk space.
The ultra-slim and stylish keyboard has a sealed membrane, which ensures greater durability and reliability. It is designed with 104 special soft-feel keys, including the full numeric keypad.
The wireless mouse features blue-eye technology and an optical tracking engine, which ensures responsive and accurate cursor movement, enabling users to work on any surface, ranging from wood to glass, the sofa or even a carpet. The high-speed 1600cpi mouse allows users to adjust the speed of the mouse as per requirements. The device offers reliable performance up to 10 metres and 2.4GHz wireless transmission.
The keyboard and mouse set is available at all the leading stores across India with a three-year warranty.
Price: ₹3,499
Address: iBall, 87/93, Mistry Industrial Complex, MIDC Cross Road A, Near Hotel Tunga International, Andheri East, Mumbai, Maharashtra 400093; Ph: 02230815100

Voice assistant speakers launched by Amazon
Amazon has launched its voice assistant speaker in three variants – Echo Dot, Amazon Echo and Echo Plus. The devices connect to 'Alexa' — a cloud based voice assistant service that helps users to play music, set alarms, get information, access a calendar, get weather reports, etc.
The speakers support Wi-Fi 802.11 a/b/g/n and the advanced audio distribution profile (A2DP). They enable 360-degree omni-directional audio to deliver crisp vocals and dynamic bass. With seven microphones, beam-forming technology and noise cancellation, the speakers can take commands from any direction, even in noisy environments or while playing music. They are also capable of assisting users in controlling lights, switches, etc, with compatible connected devices, or even in ordering food online.
The Amazon Echo Dot is the most compact and affordable version, with a 1.52cm (0.6 inches) tweeter. The Amazon Echo and Echo Plus are larger variants, with 6.35cm (2.5 inches) woofers, and 1.52cm (0.6 inches) and 2.03cm (0.8 inches) tweeters, respectively.
All the speakers come with four physical buttons on the top to control the volume, the microphone, etc. At the bottom, the device has a power port and 3.5mm audio output.
All three variants are available in black, grey and white, via Amazon.in.
Price: ₹3,149 for Echo Dot, ₹6,999 for Amazon Echo and ₹10,499 for Echo Plus
Address: Amazon India, Brigade Gateway, 8th Floor, 26/1, Dr Rajkumar Road, Malleshwaram West, Bengaluru, Karnataka 560055

The prices, features and specifications are based on information provided to us, or as available on various websites and portals. OSFY cannot vouch for their accuracy. Compiled by: Aashima Sharma



Advertorial

The Growing Popularity of Bug Bounty Platforms
After a spate of high-profile ransomware and malware attacks infiltrated IT systems worldwide, Indian enterprises are now sitting up and adopting bug bounty programmes to protect their applications from hacking attacks.

The global security threat scenario has changed radically in recent times. If hackers of yore were mainly hobbyists testing the security limits of corporate systems as an intellectual challenge, the new threat comes from well-concerted plans hatched by criminal gangs working online with an eye to profit, or to compromise and damage information technology systems.
The widespread hack attacks have also become possible because of the high degree of connectivity of devices, like smartphones, laptops and tablets, that run a variety of operating systems.
When consumer data gets compromised, it has an immediate impact on the brand and reputation of the affected company, as was evident when Verizon cut its purchase price for Yahoo by US$ 350 million after an online portal revealed that it had been repeatedly hacked. When the data of a company gets compromised and is followed by frequent attempts to conceal the fact after the incident, it can seriously impact whether customers will continue to deal with the company in any way. In the final analysis, customers are not willing to put their data at risk with a vendor who does not value and protect their personal information.
India has not been spared in this regard. Recent reports allege that customer data at telecom giant Reliance Jio was compromised and, previously, this occurred at online restaurant guide Zomato.
Companies need to team up with the right kind of hackers. Organisations cannot on their own match the wiles of the thousands of very smart hackers. This battle cannot be fought with internal resources alone. Companies need to build a culture of information-sharing on security issues with government CERTs (computer emergency response teams), security companies and security researchers.



Countering malicious hackers needs a large number of 'ethical hackers', also known as 'white hats', who will probe your systems just as any hacker would, but responsibly report to you any vulnerabilities in your system. Many of them do this work for recognition, so don't hesitate to name the person who helped you. Do appreciate the fact that they are spending a lot of their time identifying the security holes in your systems.
This concept is not new. It has been tried by a number of Internet, information technology, automobile and core industry companies. Google, Facebook, Microsoft, ABN AMRO, Deutsche Telekom and the US Air Force are some of the many organisations that have set up their own reward programmes. And it has helped these companies spot bugs in their systems that were not evident to their own in-house experts, because the more pairs of eyes checking your code, the better.
Some companies might hesitate to work with hobbyist researchers, since it is difficult to know, for example, whether they are encouraging criminal hackers or not. What if the hobbyists steal company data?
As more and more organisations are becoming digital, startups now offer their services through Web or mobile applications, so their only assets are the software apps and customer data. Once a data breach happens, customer credentials get stolen or denial of service attacks occur, leading to huge losses in revenue, reputation and business continuity. By becoming part of a bug bounty platform, companies can create a security culture within their organisations.
Indian companies have a unique advantage if they decide to crowdsource the identification of security vulnerabilities in their IT infrastructure, since the country has one of the largest numbers of security researchers willing to help organisations spot a bug before a criminal does.
The 2017 Bugcrowd report cited 11,663 researchers in India that worked on bug bounty programmes, which is behind the US with about 14,244 white hat hackers. While most of them have jobs or identified themselves as students, 15 per cent of bug hunters were fully engaged in the activity, with this number expected to increase, according to Bugcrowd.
Although Indian hackers earned over US$ 1.8 million in bounties in 2016-17, the bounties paid by Indian companies added up to a paltry US$ 50, according to HackerOne, indicating that local firms are not taking advantage of the crowdsourcing option.
Part of the reason is that Indian companies are still wary of having their security infrastructure and any vulnerability in it exposed to the public. This over-cautious approach could backfire in the long term, as it is always better to look for bugs cooperatively with responsible hackers in a controlled environment, rather than have the vulnerabilities eventually spotted and exploited by criminals.
Companies also take cover behind a smokescreen of denial when they are actually hit by cyber attacks, as Indian law does not make it mandatory to report security incidents to the CERT or any government agency. However, the regulatory framework is expected to change, with the Reserve Bank of India, for example, making it mandatory for banks to report cyber security incidents within two to six hours of the attacks being noticed.
Indian organisations also do not have a local platform for engaging with researchers, which would define the financial, technical and legal boundaries for the interaction in compliance with local regulations. Such a platform would give these companies the confidence that they can engage safely with people who are not on their payroll, even if their main objective is to hack for bugs.
Bug bounty platforms like SafeHats are connecting enterprises with white hacker communities in India. Safehats.com, powered by Instasafe Technologies, is a leading Security as a Service provider. It offers a curated platform that helps organisations to create a responsible vulnerability disclosure policy that lays down the rules of engagement, empanels reputed researchers, and makes sure that the best and the safest white hackers get to your systems before the bad guys do.
SafeHats has been working with some leading banking organisations and e-commerce players in securing their applications. Once vulnerabilities are discovered, SafeHats helps to fix them and to launch secure apps to the market. The key difference with this kind of platform is that the organisations pay the security researchers only if a bug is found, and the amount paid is based on the severity of the bug.
A large number of Indian enterprises are in dire need of tightening up on their security, as the compute infrastructures of an increasing number of organisations are being breached. On the other hand, we see an opportunity for Indian companies to leverage the large talent pool of white hackers from India. SafeHats in Bengaluru was born out of the need to bring Indian companies and hackers together, in a safe environment.
More organisations are now aware of their security needs after the high-profile Wannacry and Petya ransomware attacks. A lot of growth stage startups have shown interest in adopting bug bounty programmes, as they have realised application security is key to their next round of funding.
Sandip Panda, CEO of Instasafe, says, "Security is now an important topic in every organisation's board room discussions. Investment in security is as important as investment in the product itself. Bug bounty platforms will create an entirely new security culture in India."

By: Shasanka Sahu
The author works at Instasafe Technologies Pvt Ltd.



For U & Me: Interview

Arjun Vishwanathan, associate director, emerging technologies, IDC India

"AI must be viewed in a holistic manner"

Artificial intelligence (AI) is touching new heights across all verticals, including consumer services, e-commerce, mobile phones, life sciences and manufacturing, among others. But how will AI transform itself over time? Arjun Vishwanathan, associate director, emerging technologies, IDC India, discusses the transformation of AI in an exclusive conversation with Jagmeet Singh of OSFY. Edited excerpts...

Q: What are the latest trends in the world of AI?
IDC predicts that by 2018, 75 per cent of enterprise and ISV development will include cognitive, AI or machine learning functionality in at least one application, including all business analytics tools. The adoption of AI solutions is set to grow at a fast pace, especially in the Asia Pacific region excluding Japan (APEJ). More than half of the organisations in this region are planning to adopt AI within a five-year timeline.

Q: What are your observations on the evolution of AI?
Global spending on cognitive and AI solutions will continue to see significant corporate investment over the next several years, achieving a compound annual growth rate (CAGR) of 54.4 per cent through 2020, when revenues will be more than US$ 46 billion. Around 59 per cent of organisations plan to make new software investments in cognitive or AI technologies, whereas 45 per cent will make new investments in hardware, IT services and business services. Data services have the lowest rank in all the categories. Overall, IDC forecasts that worldwide revenues for cognitive and AI systems will reach US$ 12.5 billion in 2017, an increase of 59.3 per cent over 2016.

Q: In what ways can AI become the icing on the cake for enterprises moving towards digital transformation (DX)?



The adoption status of cognitive/AI correlates highly with the information DX maturity of the organisations. More organisations that adopted AI solutions have moved into the later stages of information DX maturity, which is then managed and optimised by AI. To promote digital transformation that utilises IoT and cognitive systems, it is important for user companies to cultivate an 'agile' mindset. For instance, it is necessary to determine the ROI while using IoT and cognitive systems in actual situations.

Q: How do you see the growth of machine learning and deep learning in the growing space of AI?
IDC predicts that by 2019, all effective IoT efforts will merge streaming analytics with machine learning trained on data lakes and content stores, accelerated by discrete or integrated processors. An increase in the use of machine learning will lower reliance on the programmatic model of development.

Q: There is a belief that AI will one day become a major reason for unemployment in the IT world. What is your take on this?
AI must be viewed in a holistic manner. Having said as much, AI and cognitive developments are expected to make significant inroads into hitherto uncharted and mostly manual/human domains. IDC predicts that by 2022, nearly 40 per cent of operational processes will be self-healing and self-learning — minimising the need for human intervention or adjustments. Additionally, as IDC recently forecasted, as much as 5 per cent of business revenues will come through interaction with a customer's digital assistant by 2019.
All this merely proves that AI will increasingly complement businesses in driving new and more authentic experiences while also driving business value.

Q: What are the obstacles slowing down the growth of AI nowadays?
The primary barriers to AI solutions include a shortage of skill sets, an understanding of vendor solutions, and governance and regulatory implications.

Q: Do you think companies like Apple, Facebook, Google and Microsoft will take the current AI model to the next level in the future?
Amazon, Google, IBM and Microsoft certainly have the largest mindshare in the Asia-Pacific region. But domestic platforms, such as Alibaba PAI and Baidu PaddlePaddle, are even better known in local markets. Also, the IBM Watson platform has a larger mindshare compared with that of other bigger organisations.

Q: Why should enterprises focus on enabling AI advances to move towards a profitable future?
Increased employee productivity and greater process automation are the most common expectations among organisations adopting or planning to adopt AI solutions. AI is presumed to bring significant business value to half of the organisations in the APAC region within two years. Customer service and support are the business processes that receive the most immediate benefits, whereas supply chain and physical assets management see the least urgency.



Open Source India 2017

OSI 2017: The Show Continues to Grow

The beautiful city of Bengaluru, threatened by thick dark clouds, kept convention delegates anxious about reaching the venue on time, since they also knew they would be braving the city’s legendary traffic. Thankfully, the weather gods heard the OSI team’s prayers and the clouds refrained from drenching the visitors, allowing many from the open source industry as well as enthusiasts from the community to reach the NIMHANS Convention Centre well before the 8:30 a.m. registration time. While there were a lot of familiar faces (participants who have been loyally attending the event over the past years), it was also heartwarming to welcome new enthusiasts, who’d come on account of the positive word-of-mouth publicity the event has been building up over the years.
In terms of numbers, the 14th edition of the event, which happened on October 13 and 14, 2017, witnessed 2,367 unique visitors over the two days, breaking all previous attendance records. The event boasted of 70+ industry and community experts coming together to speak in the nine conference tracks and 14 hands-on workshops.

KEY FACTS
Show dates: October 13-14, 2017
Location: NIMHANS Convention Centre, Bengaluru, Karnataka, India
Number of exhibitors: 27
Brands represented: 33
Unique visitors: 2,367
Number of conferences: 9
Number of workshops: 14
Number of speakers: 70+

The visitors, as usual, comprised a cross-section of people in terms of their expertise and experience.




Since the tracks for the conferences are always planned with this diversity in mind, there were tracks for the creators (developers, project managers, R&D teams, etc) as well as for the implementers of open source software.
The star-studded speakers’ list included international experts like Tony Wasserman (professor of the software management practice at Carnegie Mellon University, Silicon Valley), Joerg Simon (ISECOM and Fedora Project), and Soh Hiong (senior consultant, NetApp). There was also active participation from the government of India, with Debabrata Nayak (project director, NeGD, MeitY) and K. Rajasekhar (deputy director general, NIC, MeitY) delivering speeches at the event.
Industry experts like Andrew Aitken (GM and global open source practice leader, Wipro Technologies), Sandeep Alur (director, technical engagements (partners), Microsoft Corporation India), Sanjay Manwani (MySQL India director, Oracle), Rajdeep Dua (director, developer relations at Salesforce), Gagan Mehra (director, information strategy, MongoDB), Rajesh Jeyapaul (architect, mentor and advocate, IBM), Valluri Kumar (chief architect, Huawei India) and Ramakrishna Rama (director - software, Dell India R&D) were amongst the 70+ experts who spoke at the event. The many intriguing topics covered compelled visitors to stay glued to their seats till late in the evening on both days.
A few topics that solicited special interest from the audience included a panel discussion on ‘Open Source vs Enterprise Open Source: Is this a Key Reason for the Success of Open Source?’, ‘Accelerating the Path to Digital with Cloud Data Strategy’, ‘Open Source - A Blessing or a Curse?’, ‘Intelligent Cloud and AI – The Next Big Leap’ and ‘IoT Considerations - Decisive Open Source Policy’.
“We owe our success to the active participation of the community and the industry. It’s exciting to see how this event, which had just a handful of exhibitors in its initial stages, has grown to the stage at which, today, the who’s who of open source are demonstrating their solutions to the audience. I hope that we continue to make this event even more exciting, and ensure it becomes one of the biggest open source events across the globe,” said Rahul Chopra, editorial director, EFY Group.
Delegates looking to buy workshop passes on the spot were disappointed, as all the workshops were sold out earlier, online, even before the commencement of the event. Workshops like ‘Self-Service Automation in OpenStack Cloud’, ‘Building Machine Learning Pipelines with PredictionIO’, ‘Analyzing Packets using Wireshark’, ‘OpenShift DevOps Solution’, ‘Make your First Open Source IoT Product’ and ‘Tools and Techniques to Dive into the Mathematics of Machine Learning’ drew a lot of interest from the techie audience.
“We would like to thank our sponsors, Microsoft, IBM, Oracle, Salesforce, 2nd Quadrant, Wipro Technologies, Zoho Corporation, Digital Ocean, SUSE, Siemens, Huawei and others for their valuable participation and support for the event. We look forward to having an even bigger showcase of open source technologies with the support of our existing partners as well as many new stalwarts from the tech industry,” said Atul Goel, vice president of events at EFY.
Adding to this, Rahul Chopra said, “With the overwhelming response of the tech audience and the demand for more sessions, we have decided to expand the event, starting from the 2018 edition. The event will now become a three-day affair instead of a two-day one. This means more tracks, more speakers, more workshops and more knowledge-sharing on open source. We have already announced the dates for the next edition. It’s going to happen right here, at the same venue, on October 11, 12 and 13, 2018.”




Key tracks @ OSI 2017


Open Source and You (Success Stories)
This was a full-day track with multiple sessions, during which enterprise end users (CXOs, IT heads, etc) shared their success stories. This track was led by speakers like K. Rajasekhar (deputy director general, NIC, MeitY), Ravi Trivedi (founder, PushEngage.com), Vikram Mehta (associate director, information security, MakeMyTrip), Dr Michael Meskes (CEO, credativ international GmbH) and Prasanna Lohar (head, technology, DCB Bank Ltd).

Application Development Day
This was a half-day track with multiple sessions on the role of open source in hybrid application development. Speakers included Vivek Sridhar (developer advocate, Digital Ocean), Rajdeep Dua (director – developer relations, Salesforce India) and Bindukaladeepan Chinasamy (senior technical evangelist, Microsoft), amongst others.

The Cloud and Big Data
This was a half-day track with multiple sessions that helped IT managers/heads in understanding the role of open source in different aspects of the cloud and in Big Data. This track had speakers like Sudhir Rawat (senior technical evangelist, Microsoft), Rajkumar Natarajan (CIO, Prodevans Technologies), Rajesh Jeyapaul (architect, mentor and advocate, IBM), Suman Debnath (project leader, Toshiba) and Sangeetha Priya (director, Axcensa Technologies), amongst others.

Database Day
Open source databases have always been of great importance. The event had speakers from leading database companies including Sujatha Sivakumar and Madhusudhan Joshi (Oracle India), Soh Hiong (senior consultant, NetApp), Pavan Deolasee (PostgreSQL consultant, 2nd Quadrant India), Gagan Mehra (director, information strategy, MongoDB) and Karthik P. R. (CEO and DB architect, Mydbops), amongst others.

OpenStack India
This was a half-day track with multiple sessions on what’s hot in OpenStack. This track was done in collaboration with the OpenStack community and attracted a lot of cloud architects, IT managers, CXOs, etc. Speakers for this track included famous OpenStack enthusiasts like Anil Bidari, Chandan Kumar, M. Ranga Swami and Janki Chhatbar, amongst others.

Cyber Security Day
Open source plays an important role with respect to security. So it was security experts, IT heads and project leaders working on mission-critical projects who attended this track, which had multiple sessions on understanding cyber security and open source. Speakers for this track included Biju George (co-founder, Instasafe), Sandeep Athiyarath (founder, FCOOS), and Joerg Simon (ISECOM and Fedora Project), amongst others.

Open Source in IoT
This half-day track with multiple sessions was on the role of open source in the Internet of Things (IoT). This track was one of the most sought after and saw thought leaders like Adnyesh Dalpati (director – solutions architect and presales, Alef Mobitech), Rajesh Sola (education specialist, KPIT Technologies), Aahit Gaba (counsel - open source, HP Enterprise) and Ramakrishna Rama (director - software, Dell India R&D) sharing their knowledge on the subject.

Container Day
This half-day track, conducted in collaboration with the Docker community, had leaders from the Docker community as well as experts from the industry share their thoughts. The speakers included Neependra Khare (founder, CloudYuga Technologies), Uma Mukkara (co-founder and COO, OpenEBS), Ananda Murthy (data centre solutions architect, Microfocus India) and Sudharshan Govindan (senior developer, IBM India), amongst others.

Open Source and You
This half-day track was about the latest in open source and how its future is shaping up. This track had speakers like Krishna M. Kumar and Sanil Kumar D. (chief architects in cloud R&D, Huawei India), Vishal Singh (VP – IT infra and solutions, Eli Research India), Biju K. Nair (executive director, Sflc.in) and Kumar Priyansh (developer, BackSlash Linux).

[Photo: A panel discussion on ‘Open Source vs Enterprise Open Source: Is this a key reason for the success of Open Source?’]




Key workshops @ OSI 2017


Self-Service Automation in OpenStack Cloud (by Soh Hiong, senior consultant, NetApp)
To build an OpenStack-powered cloud infrastructure, there is only one choice for block storage. SolidFire delivers a very comprehensive OpenStack block storage integration. This workshop helped participants to learn how SolidFire’s integration with OpenStack Cinder seamlessly enables self-service automation of storage and guarantees QoS for each and every application.

Building Machine Learning Pipelines with PredictionIO (by Manpreet Ghotra and Rajdeep Dua, Salesforce India)
Apache PredictionIO is an open source machine learning server built on top of a state-of-the-art open source stack for developers and data scientists to create predictive engines for any machine learning task. This workshop helped attendees understand how to build ML pipelines using PredictionIO.

Software Architecture: Principles, Patterns and Practices (by Ganesh Samarthyam, co-founder, CodeOps Technologies)
Developers and designers aspiring to become architects, and hence wanting to learn about the architecture of open source applications using case studies and examples, participated in this workshop. It introduced key topics on software architecture including architectural principles, constraints, non-functional requirements (NFRs), architectural styles and design patterns, viewpoints and perspectives, and architecture tools.

Serverless Platforms: What to Expect and What to Ask For (by Monojit Basu, founder and director, TechYugadi IT Solutions and Consulting)
The advent of serverless platforms evokes a feeling of déjà vu. This workshop narrowed down resource utilisation to the granularity of a function call while remaining in control of execution of the code. Serverless platforms offer various capabilities, and users of these platforms need to be aware of the right questions to ask. The goal of this workshop was to help attendees identify the right use cases for going serverless.

Analysing packets using Wireshark (by Sumita Narshetty, security researcher at QOS Technology)
The workshop helped attendees understand packet capture and analyse packets using Wireshark. It covered different aspects of packet capture and the tools needed to analyse captured packets. Attendees also learned how to use Wireshark for troubleshooting network problems.

OpenStack Cloud Solution (by Manoj and Asutosh, OpenStack consultants for Prodevans)
The workshop helped attendees to master their ability to work on Red Hat Enterprise Linux (RHEL) OpenStack platform services with Red Hat Ceph Storage, and implement advanced networking features using the OpenStack Neutron service.

OpenShift DevOps Solution (by Chanchal Bose, CTO, Prodevans)
This was a hands-on workshop to get familiar with containers. It covered OpenShift’s advantages and features as well as the DevOps CI/CD scenario on OpenShift.

Ansible Automation (by Manoj, Sohail and Varsha, Ansible consultants for Prodevans)
This workshop was for those with basic Linux knowledge and an interest in Linux systems administration. It covered an introduction to Ansible, and then highlighted its advantages and features. The key takeaway was on how to automate infrastructure using Ansible.

Make your First Open Source IoT Product (by Arzan Amaria, senior solutions architect for the cloud and IoT, CloudThat)
This workshop involved a hands-on session on building a small prototype of an ideal product using open source technologies. By following step-by-step instructions and practical assistance, participants were able to build and make a connected device, a process that inspired them to innovate new products. The instructor shared the right logic and resources required for anyone to jumpstart the journey of IoT development.

Hands-on experience of Kubernetes and Docker in action (by Sanil Kumar, chief architect, Cloud Team, Huawei India)
This workshop provided exposure to and visualisation of the cloud from the PaaS perspective. It also introduced Kubernetes and containers (Docker). It was a hands-on session, designed to help participants understand how to start setting up and using Kubernetes and containers, apart from getting to learn application deployment in a cloud environment, inspecting containers, pods, applications, etc.

[Photo: A workshop in progress at OSI 2017]




Obstacle Avoidance Robot with Open Hardware (by Shamik Chakraborty, Amrita School of Engineering)
This workshop explored the significance of robotics in Digital India and looked at how Make in India can be galvanised by robotics. It also propagated better STEM education for a better global job market for skilled professionals, and created a space for participants to envision the tech future.

Microservices Architecture with Open Source Framework (by Dibya Prakash, founder, ECD Zone)
This workshop was designed for developers, architects and engineering managers. The objective was to discuss the high-level implementation of the microservices architecture using Spring Boot and the JavaScript (Node.js) stack.

Hacking, Security and Hardening Overview for Developers – on the Linux OS and Systems Applications (by Kaiwan Billimoria, Linux consultant and trainer, kaiwanTECH)
The phenomenal developments in technology, and especially software-driven products (in domains like networking, telecom, embedded-automotive, infotainment, and now IoT, ML and AI), beg for better security on end products. Hackers are currently enjoying a field day and are only getting better at it, while product developers lag behind. This workshop was geared towards helping participants understand where software vulnerabilities exist, while programming and after; OS hardening techniques; and what tools and methodologies help prevent and mitigate security issues.

Tools and Techniques to Dive Into the Mathematics of Machine Learning (by Monojit Basu, founder and director, TechYugadi IT Solutions and Consulting)
In order to build an accurate model for a machine learning problem, one needs better insights into the mathematics behind these models. For those primarily focused on the programming aspects of machine learning initiatives, this workshop gave the opportunity to regain a bit of mathematical context into some of the models and algorithms frequently used, and to learn about a few open source tools that will come in handy when performing deeper mathematical analysis of machine learning algorithms.

By: Omar Farooq
The author is product head at Open Source India.

[Pictured: Asheem Bakhtawar, regional director, India, Middle East and Africa, 2ndQuadrant India Pvt Ltd; Divyanshu Verma, senior engineering manager, Intel R&D; Balaji Kesavaraj, head marketing, India and SAARC, Autodesk; Janardan Revuru, Open Source Evangelist; Dibya Prakash, founder, ECD Zone; Dhiraj Khare, national alliance manager, Liferay India]




Reduce Security Risks with SELinux

Discover SELinux, a security module that provides extra protocols to ensure access control security. It supports mandatory access controls (MAC) and is an integral part of RHEL’s security policy.

Security-Enhanced Linux or SELinux is an advanced access control built into most modern Linux distributions. It was initially developed by the US National Security Agency to protect computer systems from malicious tampering. Over time, SELinux was placed in the public domain and various distributions have incorporated it in their code. To many systems administrators, SELinux is uncharted territory. It can seem quite daunting and, at times, even confusing. However, when properly configured, SELinux can greatly reduce a system’s security risks, and knowing a bit about it can help you troubleshoot access related error messages.

Basic SELinux security concepts
Security-Enhanced Linux is an additional layer of system security. The primary goal of SELinux is to protect the users’ data from system services that have been compromised. Most Linux administrators are familiar with the standard user/group/other permissions security model. This is a user and group based model known as discretionary access control. SELinux provides an additional layer of security that is object based and controlled by more sophisticated rules, known as mandatory access control. To allow remote anonymous access to a Web server, firewall ports must be opened. However, this gives malicious users an opportunity to crack the system through a security exploit if they compromise the Web server process and gain its permissions: the permissions of the Apache user and Apache group, which has read/write access to things like the document root (/var/www/html), as well as write access to /var, /tmp and any other directories that are world writable.
Under discretionary access control, every process can access any object. But when SELinux enables mandatory access control, a particular context is given to an object. Every file, process, directory and port has a special security label, called a SELinux context. A context is a name that is used by the SELinux policy to determine whether a process can access a file, directory or port. By default, the policy does not allow any interaction unless an explicit rule grants access. If there is no rule, no access is allowed.
SELinux labels have several contexts—user, role, type and sensitivity. The targeted policy, which is the default policy in Red Hat Enterprise Linux, bases its rules on the third context—the type context. The type context normally ends with _t. The type context for the Web server is httpd_t. The type context for files and directories normally found in /var/www/html is httpd_sys_content_t, and for files and directories normally found in /tmp and /var/tmp it is tmp_t. The type context for Web server ports is httpd_port_t.
There is a policy rule that permits Apache to access files and directories with a context normally found in /var/www/html and other Web server directories. There is no ‘allow’ rule for files found in the /var/tmp directory, so access is not permitted. With SELinux, a malicious user cannot access the /tmp directory. SELinux has a rule for remote file systems such as NFS and CIFS, although all files on such file systems are labelled with the same context.

SELinux modes
For troubleshooting purposes, SELinux protection can be temporarily disabled using SELinux modes. SELinux works in three modes: enforcing mode, permissive mode and disabled mode.
Enforcing mode: In the enforcing mode, SELinux actively denies access to Web servers attempting to read files with the tmp_t type context. In this mode, SELinux both logs the interactions and protects files.
Permissive mode: This mode is often used to troubleshoot issues. In permissive mode, SELinux allows all interactions, even if there is no explicit rule, and it logs the interactions that it would have denied in the enforcing mode. This mode can be used to temporarily allow access to content that SELinux is restricting. No reboot is required to go from enforcing mode to permissive mode.
Disabled mode: This mode completely disables SELinux. A system reboot is required to disable SELinux entirely, or to go from disabled mode to enforcing or permissive mode.

SELinux status
To check the present status of SELinux, run the sestatus command on a terminal. It will tell you the mode of SELinux.

# sestatus

Figure 1: Checking the status of SELinux

Changing the current SELinux mode
Run the command setenforce with either 0 or 1 as the argument. A value of 1 specifies enforcing mode; 0 would specify permissive mode.

# setenforce 1

Figure 2: Changing the SELinux mode to enforcing mode

Setting the default SELinux mode
The configuration file that determines what the SELinux mode is at boot time is /etc/selinux/config. Note that it contains some useful comments. Use /etc/selinux/config to change the default SELinux mode at boot time. In the example shown in Figure 3, it is set to enforcing mode.

Figure 3: Default configuration file of SELinux

Initial SELinux context
Typically, the SELinux context of a file’s parent directory determines its initial SELinux context: the context of the parent directory is assigned to newly created files. This works for commands like vim, cp and touch. However, if a file is created elsewhere and its attributes are preserved (as with mv or cp -a), the original SELinux context will be unchanged.

Figure 4: Checking the context of files

Changing the SELinux context of a file
There are two commands that are used to change the SELinux context of a file—chcon and restorecon. The chcon command changes the context of a file to the context specified as an argument to the command. Often, the -t option is used to specify only the type component of the context.
The restorecon command is the preferred method for changing the SELinux context of a file or directory. Unlike chcon, the context is not explicitly specified when using this command. It uses rules in the SELinux policy to determine what the context of a file should be.

Figure 5: Restoring context of the file with the parent directory

Defining SELinux default file context rules
The semanage fcontext command can be used to display or modify the rules that the restorecon command uses to set the default file context. It uses extended regular expressions to specify the path and filenames. The most common extended regular expression used in fcontext rules is (/.*)?, which means “optionally match a / followed by any number of characters”. It matches the directory listed before the expression and everything in that directory recursively.
The restorecon command is part of the policycoreutils package, and semanage is part of the policycoreutils-python package.
As shown in Figure 6, the context is preserved when using the mv command, while the cp command does not preserve it: the copied file receives the context of its parent directory. To restore the default context, run restorecon, which relabels the files according to the rules defined for the parent directory.

Figure 6: Preserving the context of the file

Figure 7 shows how to use semanage to add a context rule for a new directory. First, add the rule for the directory using the semanage fcontext command, and then use the restorecon command to apply it to all files contained in it.

Figure 7: Changing the context of a file

SELinux Booleans
SELinux Booleans are switches that change the behaviour of the SELinux policy. These are rules that can be enabled or disabled, and can be used by security administrators to tune the policy to make selective adjustments.
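The commands discussed so far can be combined in a short session. The following sketch is purely illustrative; the directory, the type and the Boolean name are example values, not taken from the article’s figures:

# semanage fcontext -a -t httpd_sys_content_t '/web(/.*)?'
# restorecon -Rv /web
# chcon -t httpd_sys_content_t /web/index.html
# getsebool -a
# setsebool -P httpd_enable_homedirs on
# semanage boolean -l -C

The first two lines add a default context rule for a hypothetical /web directory and apply it recursively; chcon sets a type directly on one file; the last three commands list all Booleans, persistently enable one (httpd_enable_homedirs is just a commonly cited example), and list the local Boolean modifications.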




The getsebool command is used to display SELinux Booleans and their current values; the -a option makes it list all the Booleans. setsebool is used to modify Booleans, and setsebool -P modifies the SELinux policy to make the modifications persistent. semanage boolean -l will show whether or not a Boolean is persistent, along with a short description of it. To list only local modifications to the state of the SELinux Booleans (any setting that differs from the default in the policy), the -C option with semanage boolean can be used. In Figure 8, the Boolean was first modified, and then this modification was made persistent; the -C option was used with semanage to list the modifications.

Figure 8: Changing the SELinux Boolean

Troubleshooting in SELinux
Sometimes, SELinux prevents access to files on the server. Here are the steps that should be followed when this occurs.
• Before thinking of making any adjustments, consider that SELinux may be doing its job correctly by prohibiting the attempted access. If a Web server tries to access the files in /home, this could signal a compromise of the service if Web content isn’t published by the users. If access should have been granted, then additional steps need to be taken to solve the problem.
• The most common SELinux issue is an incorrect file context. This can occur when a file is created in a location with one file context and moved into a place where a different context is expected. In most cases, running restorecon will correct the issue. Correcting issues in this way has a very narrow impact on the security of the rest of the system.
• Another remedy could be the adjustment of a Boolean. For example, the ftpd_anon_write Boolean controls whether anonymous FTP users can upload files. This Boolean may be turned on if you want to allow anonymous FTP users to upload files to a server.
• It is possible that the SELinux policy has a bug that prevents legitimate access. However, since SELinux has matured, this is a rare occurrence.

By: Kshitij Upadhyay
The author is RHCSA and RHCE certified and loves to write about new technologies. He can be reached at upadhyayk04@gmail.com.

OSFY Magazine Attractions During 2017-18


MONTH THEME
March 2017 Open Source Firewall, Network security and Monitoring
April 2017 Databases management and Optimisation
May 2017 Open Source Programming (Languages and tools)
June 2017 Open Source and IoT
July 2017 Mobile App Development and Optimisation
August 2017 Docker and Containers
September 2017 Web and desktop app Development
October 2017 Artificial Intelligence, Deep learning and Machine Learning
November 2017 Open Source on Windows
December 2017 Big Data, Hadoop, PaaS, SaaS, IaaS and Cloud
January 2018 Data Security, Storage and Backup
February 2018 Best in the world of Open Source (Tools and Services)




Unit Testing Ansible Code with Molecule and Docker Containers
Molecule is an open source framework that is easy to use for validating
and auditing code. It can be easily introduced into the CI-CD pipeline,
thus keeping Ansible scripts relevant.

DevOps teams rely on ‘Infrastructure as a Code’ (IaaC) for productivity gains. Automation and speed are prerequisites in the cloud environment, where resources are identified by cattle nomenclature rather than pet nomenclature, due to the sheer volumes.
Ansible is one of the leading technologies in the IaaC space. Its declarative style, ease of parameterisation and the availability of numerous modules make it the preferred framework to work with.
Any code, if not tested regularly, gets outdated and becomes irrelevant over time, and the same applies to Ansible. Daily testing is a best practice and must be introduced for Ansible scripts too—for example, keeping track of the latest version of a particular software during provisioning. Similarly, the dependency management repository used by apt and Yum may have broken dependencies due to changes. Scripts may also fail if a dependent URL is not available. Those working with Ansible would have faced these challenges.
This is where we introduce unit testing, which can run during the nightly build and can detect these failures well in advance. The Molecule project is a useful framework to introduce unit testing into Ansible code. One can use containers to test an individual role, or use an array of containers to test complex deployments. Docker containers are useful, as they can save engineers from spawning multiple instances or using resource-hogging VMs in the cloud or on test machines. Docker is a lightweight technology which is used to verify the end state of the system, and after the test, the provisioned resources are destroyed, thus cleaning up the environment.



Installing and working with Molecule is simple. Follow the steps shown below. First, get the OS updated as follows:

sudo apt-get update && sudo apt-get -y upgrade

Next, install Docker:

sudo apt install docker.io

Now install Molecule with the help of Python Pip:

sudo apt-get install python-pip python-dev build-essential
sudo pip install --upgrade pip
sudo pip install molecule

After the install, do a version check of Molecule. If the Molecule version is not the latest, upgrade it as follows:

sudo pip install --upgrade molecule

It is always good to work with the latest version of Molecule, as there are significant changes compared to earlier versions. Enabling or disabling modules is more effective in the latest version. For example, a common problem faced is the forced audit errors that make Molecule fail. When starting to test with Molecule, audit errors can pose a roadblock; disabling the Lint module during the initial phase can give you some speed to concentrate on writing tests rather than trying to fix the audit errors.
Here are a few features of Molecule explained, though the full toolset offers more.
1. Create: This creates a virtualised provider, which in our case is the Docker container.
2. Converge: Uses the provisioner and runs the Ansible scripts against the target running Docker containers.
3. Idempotency: Uses the provisioner to check the idempotency of Ansible scripts.
4. Lint: This does code audits of Ansible scripts, test code, test scripts, etc.
5. Verify: Runs the test scripts written.
6. Test: Runs the full sequence of steps needed by Molecule, i.e., create, converge, lint, verify and destroy.
The roles need to be initialised by the following command:

molecule init role --role-name abc

This will create all the folders needed to create the role, and is similar to the following command:

ansible-galaxy init abc

There are certain differences in the directory structure, and some amount of manual folder creation may be required, as shown in Table 1.

Table 1: A comparison of folders created by Ansible and Molecule
Folders created by ansible-galaxy init | Folders created by molecule init | Remarks
Default, Handler, Meta, README.md, Tasks, Vars | Default, Handler, Meta, README.md, Tasks, Vars | No changes are needed and the folder structure is identical
Files, templates | Files, templates | These folders need to be created manually to take advantage of the file and template modules of Ansible
(none) | molecule | All Molecule-related scripts and test scripts are placed in this folder

The molecule folder has files in which one can put in a pre-created Molecule playbook and test scripts. The environment that is created is named default. This can be changed as per the project’s requirements.
One file that will be of interest is molecule.yml, which is placed at:

./molecule/default/molecule.yml

Another file, which describes the playbook for the role, is playbook.yml, placed at:

./molecule/default/playbook.yml

Note: Molecule initialises the environment called default, but engineers can use a name as per the environment used in the project.

Figure 1: molecule.yml sample file
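For reference, a minimal molecule.yml for a Docker-based scenario might look like the sketch below. This assumes the Molecule 2.x file format; the image name and other values are illustrative and should be adapted to your project:

---
# ./molecule/default/molecule.yml (illustrative sketch)
dependency:
  name: galaxy        # fetch role dependencies from Ansible Galaxy
driver:
  name: docker        # use Docker containers as the test platform
lint:
  name: yamllint
platforms:
  - name: instance    # name of the test container
    image: ubuntu:16.04
provisioner:
  name: ansible       # converge the role with Ansible
verifier:
  name: testinfra     # validate the end state with TestInfra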




The sample Molecule file is shown in Figure 1. One needs to modify it as per the project’s requirements.

Note: Docker images used for testing need to be systemd enabled and should have privileges.

Take a look at the playbook (Figure 2). Make sure that you have made all the changes needed for all the parameters to be passed to the role.

Figure 2: playbook.yml sample file

Now we are ready to write test scripts using TestInfra (http://testinfra.readthedocs.io/en/latest/). We need to go to the folder molecule/default/tests, where we can create test assertions. Test assertions are one of the most important parts of a testing framework. Molecule uses TestInfra as its test validation framework. The tests are located in the Python file at ./molecule/default/tests. The file contains assertions as shown in Figure 3.

Figure 3: Sample assertions in test file
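As a point of reference, a test file of this kind could look like the following sketch. The package, service and port checked here are hypothetical; in a real role they would match whatever the role installs:

# ./molecule/default/tests/test_default.py (illustrative sketch)

def test_package_is_installed(host):
    # assumes the role installs nginx; substitute your own package
    assert host.package("nginx").is_installed

def test_service_is_running_and_enabled(host):
    service = host.service("nginx")
    assert service.is_running
    assert service.is_enabled

def test_port_is_listening(host):
    # assumes the role exposes a web server on port 80
    assert host.socket("tcp://0.0.0.0:80").is_listening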

As can be seen, these assertions are declarative; hence it requires no effort to add them. The overall test is run with the molecule test command, as shown in Figure 4. It runs the steps in an opinionated way—cleaning and provisioning the infrastructure, checking idempotency, testing assertions and, at the end, cleaning up resources when the tests are finished.

Figure 4: Molecule test run

Molecule can be extended to test complex scenarios and distributed clustered environments, too, on test machines. As seen from the steps, Molecule can spin up a Docker container, do all the testing required, and destroy the container after cleaning the infrastructure.

References
[1] https://github.com/metacloud/molecule
[2] http://testinfra.readthedocs.io
[3] https://github.com/dj-wasabi/ansible-zabbix-agent
[4] https://blog.opsfactory.rocks/testing-ansible-roles-with-molecule-97ceca46736a
[5] https://medium.com/@elgallego/ansible-role-testing-molecule-7a64f43d95cb
[6] http://giovannitorres.me/testing-ansible-roles-with-molecule-testinfra-and-vagrant.html#installing-packages-with-pip

By: Ranajit Jana
The author is a senior architect in service transformation (open source) at Wipro Technologies. He is interested in all the technologies related to microservices—containerisation, monitoring, DevOps, etc. You can contact him at Ranajit.jana@wipro.com.




DevOps Series
Using Ansible to Deploy a
Piwigo Photo Gallery
Piwigo is Web based photo gallery software that is written in PHP. In
this tenth article in our DevOps series, we will use Ansible to install and
configure a Piwigo instance.

Piwigo requires a MySQL database for its back-end, and has a number of extensions and plugins developed by the community. You can install it on any shared Web hosting service provider, or install it on your own GNU/Linux server. It basically uses the (G)LAMP stack. In this article, we will use Ansible to install and configure a Piwigo instance, which is released under the GNU General Public License (GPL).
You can add photos using the Piwigo Web interface or use an FTP client to synchronise the photos with the server. Each photo is made available in nine sizes, ranging from XXS to XXL. A number of responsive UI themes are available that make use of these different photo sizes, depending on whether you are viewing the gallery on a phone, tablet or computer. The software also allows you to add a watermark to your photos, and you can create nested albums. You can also tag your photos, and Piwigo stores metadata about the photos too. You can even use access control to make photos and albums private. My Piwigo gallery is available at https://www.shakthimaan.in/gallery/.

Linux
The Piwigo installation will be on an Ubuntu 15.04 image running as a guest OS using KVM/QEMU. The host system is a Parabola GNU/Linux-libre x86_64 system. Ansible is installed on the host system using the distribution package manager. The version of Ansible used is:

$ ansible --version
ansible 2.4.1.0
config file = /etc/ansible/ansible.cfg
configured module search path = [u’/home/shakthi/.ansible/plugins/modules’, u’/usr/share/ansible/plugins/modules’]
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.14 (default, Sep 20 2017, 01:25:59) [GCC 7.2.0]

The /etc/hosts file should have an entry for the guest




“ubuntu” VM, as indicated below:

192.168.122.4 ubuntu

You should be able to issue commands from Ansible to the guest OS. For example:

$ ansible ubuntu -m ping
ubuntu | SUCCESS => {
    “changed”: false,
    “ping”: “pong”
}

On the host system, we will create a project directory structure to store the Ansible playbooks:

ansible/inventory/kvm/
       /playbooks/configuration/
       /playbooks/admin/

An ‘inventory’ file is created inside the inventory/kvm folder that contains the following:

ubuntu ansible_host=192.168.122.4 ansible_connection=ssh ansible_user=xetex ansible_password=pass

Apache
The Apache Web server needs to be installed first on the Ubuntu guest VM. The Ansible playbook for the same is as follows:

- name: Install Apache web server
  hosts: ubuntu
  become: yes
  become_method: sudo
  gather_facts: true
  tags: [web]

  tasks:
    - name: Update the software package repository
      apt:
        update_cache: yes

    - name: Install Apache
      package:
        name: “{{ item }}”
        state: latest
      with_items:
        - apache2

    - wait_for:
        port: 80

The Ansible playbook updates the software package repository by running apt-get update and then proceeds to install the apache2 package. The playbook waits for the server to start and listen on port 80. An execution of the playbook is shown below:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/piwigo.yml --tags web -K
SUDO password:

PLAY [Install Apache web server] ******************

TASK [setup] ***********************************
ok: [ubuntu]

TASK [Update the software package repository] *************
changed: [ubuntu]

TASK [Install Apache] ***************************************
changed: [ubuntu] => (item=[u’apache2’])

TASK [wait_for] *********************************************
ok: [ubuntu]

PLAY RECAP **************************************************
ubuntu : ok=4 changed=2 unreachable=0 failed=0

The verbosity of the Ansible output can be increased by passing ‘v’ multiple times in the invocation; the more times ‘v’ is present, the greater the verbosity level. The -K option will prompt for the sudo password of the xetex user. If you now open http://192.168.122.4, you should be able to see the default Apache2 index.html page, as shown in Figure 1.

Figure 1: Apache2 default index page
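You can also verify this from the host terminal; a successful response header indicates that Apache is serving requests:

$ curl -I http://192.168.122.4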

MySQL




Piwigo requires a MySQL database server for its back-end, and at least version 5.0. As the second step, you can install the same using the following Ansible playbook:

- name: Install MySQL database server
  hosts: ubuntu
  become: yes
  become_method: sudo
  gather_facts: true
  tags: [database]

  tasks:
    - name: Update the software package repository
      apt:
        update_cache: yes

    - name: Install MySQL
      package:
        name: “{{ item }}”
        state: latest
      with_items:
        - mysql-server
        - mysql-client
        - python-mysqldb

    - name: Start the server
      service:
        name: mysql
        state: started

    - wait_for:
        port: 3306

    - mysql_user:
        name: guest
        password: ‘*F7B659FE10CA9FAC576D358A16CC1BC646762FB2’
        encrypted: yes
        priv: ‘*.*:ALL,GRANT’
        state: present

The APT software repository is updated first, and the required MySQL packages are then installed. The database server is started, and the Ansible playbook waits for the server to listen on port 3306. For this example, a guest database user account with osfy as the password is chosen for the gallery Web application. In production, please use a stronger password. The hash for the password can be computed from the MySQL client as indicated below:

mysql> SELECT PASSWORD(‘osfy’);
+-------------------------------------------+
| PASSWORD(‘osfy’)                          |
+-------------------------------------------+
| *F7B659FE10CA9FAC576D358A16CC1BC646762FB2 |
+-------------------------------------------+
1 row in set (0.00 sec)

Also, the default MySQL root password is empty. You should change it after installation. The playbook can be invoked as follows:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/piwigo.yml --tags database -K

PHP
Piwigo is written using PHP (PHP Hypertext Preprocessor), and it requires at least version 5.0 or later. The documentation website recommends version 5.2. The Ansible playbook to install PHP is given below:

- name: Install PHP
  hosts: ubuntu
  become: yes
  become_method: sudo
  gather_facts: true
  tags: [php]

  tasks:
    - name: Update the software package repository
      apt:
        update_cache: yes

    - name: Install PHP
      package:
        name: “{{ item }}”
        state: latest
      with_items:
        - php5
        - php5-mysql

The playbook updates the software package repository, and installs PHP5 and the php5-mysql database connectivity package. It can be invoked as follows:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/piwigo.yml --tags php -K

Piwigo
The final step is to download, install and configure Piwigo. The playbook for this is given below:

- name: Setup Piwigo
  hosts: ubuntu
  become: yes
  become_method: sudo
  gather_facts: true



  tags: [piwigo]

  vars:
    piwigo_dest: “/var/www/html”

  tasks:
    - name: Update the software package repository
      apt:
        update_cache: yes

    - name: Create a database for piwigo
      mysql_db:
        name: piwigo
        state: present

    - name: Create target directory
      file:
        path: “{{ piwigo_dest }}/gallery”
        state: directory

    - name: Download latest piwigo
      get_url:
        url: http://piwigo.org/download/dlcounter.php?code=latest
        dest: “{{ piwigo_dest }}/piwigo.zip”

    - name: Extract to /var/www/html/gallery
      unarchive:
        src: “{{ piwigo_dest }}/piwigo.zip”
        dest: “{{ piwigo_dest }}/gallery”
        remote_src: True

    - name: Restart apache2 server
      service:
        name: apache2
        state: restarted

The piwigo_dest variable stores the location of the default Apache hosting directory. The APT software package repository is then updated. Next, an exclusive MySQL database is created for this Piwigo installation. A target folder, gallery, is then created under /var/www/html to store the Piwigo PHP files. Next, the latest version of Piwigo is downloaded (2.9.2, as on date) and extracted under the gallery folder. The Apache Web server is then restarted.
You can invoke the above playbook as follows:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/piwigo.yml --tags piwigo -K

If you open the URL http://192.168.122.4/gallery in a browser on the host system, you will see the screenshot given in Figure 2 to start the installation of Piwigo.

Figure 2: Piwigo install page

After entering the database credentials and creating an admin user account, you should see the ‘success’ page, as shown in Figure 3.

Figure 3: Piwigo install success page

You can then go to http://192.168.122.4/gallery to see the home page of Piwigo, as shown in Figure 4.

Figure 4: Piwigo home page

Backup
The Piwigo data is present in both the installation folder and in the MySQL database. It is thus important to periodically make backups, so that you can use these archive files to restore data, if required. The following Ansible playbook creates a target backup directory, makes a tarball of the installation folder, and dumps the database contents to a .sql file. The epoch timestamp is used in the filenames. The backup folder can be rsynced to a different system or to secondary backup.

- name: Backup Piwigo
  hosts: ubuntu
  become: yes




  become_method: sudo
  gather_facts: true
  tags: [backup]

  vars:
    piwigo_dest: “/var/www/html”

  tasks:
    - name: Create target directory
      file:
        path: “{{ piwigo_dest }}/gallery/backup”
        state: directory

    - name: Backup folder
      archive:
        path: “{{ piwigo_dest }}/gallery/piwigo”
        dest: “{{ piwigo_dest }}/gallery/backup/piwigo-backup-{{ ansible_date_time.epoch }}.tar.bz2”

    - name: Dump database
      mysql_db:
        name: piwigo
        state: dump
        target: “{{ piwigo_dest }}/gallery/backup/piwigo-{{ ansible_date_time.epoch }}.sql”

The above playbook can be invoked as follows:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/piwigo.yml --tags backup -K

Two backup files that were created from executing the above playbook are piwigo-1510053932.sql and piwigo-backup-1510053932.tar.bz2.
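Restoring from these archives is not covered by the playbook, but a possible manual sequence on the guest would be along the following lines; the filenames are the ones generated above, and the extraction path should be adjusted to your layout:

$ cd /var/www/html/gallery/backup
$ mysql -u root -p piwigo < piwigo-1510053932.sql
$ tar xjf piwigo-backup-1510053932.tar.bz2 -C /var/www/html/gallery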
Cleaning up
You can uninstall the entire Piwigo installation using an Ansible playbook. This has to happen in the reverse order: you have to remove Piwigo first, followed by PHP, MySQL and Apache. A playbook to do this is included in the playbooks/admin folder and given below for reference:

---
- name: Uninstall Piwigo
  hosts: ubuntu
  become: yes
  become_method: sudo
  gather_facts: true
  tags: [uninstall]

  vars:
    piwigo_dest: “/var/www/html”

  tasks:
    - name: Delete piwigo folder
      file:
        path: “{{ piwigo_dest }}/gallery”
        state: absent

    - name: Drop database
      mysql_db:
        name: piwigo
        state: absent

    - name: Uninstall PHP packages
      package:
        name: “{{ item }}”
        state: absent
      with_items:
        - php5-mysql
        - php5

    - name: Stop the database server
      service:
        name: mysql
        state: stopped

    - name: Uninstall MySQL packages
      package:
        name: “{{ item }}”
        state: absent
      with_items:
        - python-mysqldb
        - mysql-client
        - mysql-server

    - name: Stop the web server
      service:
        name: apache2
        state: stopped

    - name: Uninstall apache2
      package:
        name: “{{ item }}”
        state: absent
      with_items:
        - apache2

The above playbook can be invoked as follows:

$ ansible-playbook -i inventory/kvm/inventory playbooks/admin/uninstall-piwigo.yml -K

You can visit http://piwigo.org/ for more documentation.

By: Shakthi Kannan
The author is a free software enthusiast and blogs at shakthimaan.com.




Hive: The SQL-like Data Warehouse Tool for Big Data
The management of Big Data is crucial if enterprises are to benefit from the huge
volumes of data they generate each day. Hive is a tool built on top of Hadoop that
can help to manage this data.

Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarise Big Data, and makes querying and analysing easy.
A little history about Apache Hive will help you understand why it came into existence. When Facebook started gathering data and ingesting it into Hadoop, the data was coming in at the rate of tens of GBs per day back in 2006. Then, in 2007, it grew to 1TB a day and, within a few years, increased to around 15TB a day. Initially, Python scripts were written to ingest the data into Oracle databases, but with the increasing data rate and also the diversity in the sources and types of incoming data, this was becoming difficult. The Oracle instances were getting filled pretty fast, and it was time to develop a new kind of system that handled large amounts of data. It was Facebook that first built Hive, so that most people who had SQL skills could use the new system with minimal changes, compared to what was required with other RDBMSs.
The main features of Hive are:
• It stores schema in a database and processes data into HDFS.
• It is designed for OLAP.
• It provides an SQL-type language for querying, called HiveQL or HQL.
• It is familiar, fast, scalable and extensible.
Hive architecture is shown in Figure 1. The components of Hive are listed in Table 1.



Figure 1: Hive architecture

Table 1: The components of Hive

User interface: Hive is data warehouse infrastructure software that can create interactions between the user and HDFS. The user interfaces that Hive supports are Hive Web UI, the Hive command line, and Hive HD Insight (in Windows Server).

Meta store: Hive chooses respective database servers to store the schema or metadata of tables, databases and columns in a table, along with their data types and HDFS mapping.

HiveQL process engine: HiveQL is similar to SQL for querying the schema information in the meta store. It replaces the traditional approach of the MapReduce program. Instead of writing the MapReduce program in Java, we can write a query for a MapReduce job and process it.

Execution engine: The conjunction part of the HiveQL process engine and MapReduce is the Hive execution engine, which processes the query and generates results that are the same as MapReduce results.

HDFS or HBASE: The Hadoop distributed file system or HBASE comprises the data storage techniques for storing data into the file system.

Figure 2: Hive configuration

The importance of Hive in Hadoop


Apache Hive lets you work with Hadoop in a very efficient
manner. It is a complete data warehouse infrastructure that
is built on top of the Hadoop framework. Hive is uniquely
placed to query data, and perform powerful analysis and
data summarisation while working with large volumes of
data. An integral part of Hive is the HiveQL query, which is
an SQL-like interface that is used extensively to query what
is stored in databases.
Hive has the distinct advantage of deploying high-speed data
reads and writes within the data warehouses while managing
large data sets that are distributed across multiple locations, all
thanks to its SQL-like features. It provides a structure to the
data that is already stored in the database. The users are able to
connect with Hive using a command line tool and a JDBC driver.

How to implement Hive
First, download Hive from http://apache.claz.org/hive/stable/. Next, download apache-hive-1.2.1-bin.tar.gz (dated 26-Jun-2015, about 89MB). Extract it manually and rename the folder as hive.

Figure 3: Getting started with Hive
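For instance, the download and extraction can be done from a terminal as shown below; the mirror path and version are the ones mentioned above:

$ wget http://apache.claz.org/hive/stable/apache-hive-1.2.1-bin.tar.gz
$ tar -xzf apache-hive-1.2.1-bin.tar.gz
$ mv apache-hive-1.2.1-bin hive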




In the command prompt, type the following commands:

sudo mv hive /usr/local/hive
sudo gedit ~/.bashrc

Below this line, write the following code:

# Set HIVE_HOME
export HIVE_HOME=/usr/local/hive
PATH=$PATH:$HIVE_HOME/bin
export PATH

Then run:

user@ubuntu:~$ cd /usr/local/hive
user@ubuntu:~$ sudo gedit hive-config.sh

Go to the line where the following statements are written:

# Allow alternate conf dir location.
HIVE_CONF_DIR="${HIVE_CONF_DIR:-$HIVE_HOME/conf}"
export HIVE_CONF_DIR=$HIVE_CONF_DIR
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH

Below this, write the following code:

export HADOOP_HOME=/usr/local/hadoop (write the path where the Hadoop file is)

Now, start Hadoop. Type hive to enter the Hive shell.
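Once you are at the Hive shell, you can try a few HiveQL statements. The table, columns and file path below are purely illustrative:

hive> CREATE TABLE employee (id INT, name STRING, salary FLOAT)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA LOCAL INPATH '/home/user/employee.csv' INTO TABLE employee;
hive> SELECT name, salary FROM employee WHERE salary > 30000;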

Pig vs Hive
Table 2 illustrates the differences between Pig and Hive.
Table 2: Pig vs Hive

Pig | Hive
A procedural data flow language | A declarative SQLish language
For programming | For creating reports
Mainly used by researchers and programmers | Mainly used by data analysts
Operates on the client side of a cluster | Operates on the server side of a cluster
Does not have a dedicated metadata database | Makes use of the exact variation of the dedicated SQL DDL language by defining tables beforehand
Pig is SQL-like but varies to a great extent | Directly leverages SQL and is easy to learn for database experts
Pig supports the Avro file format | Hive does not support this file format

References
[1] https://www.dezyre.com/article/difference-between-pig-and-hive-the-two-key-components-of-hadoop-ecosystem/79
[2] https://intellipaat.com/blog/what-is-apache-hive/
[3] https://hortonworks.com/apache/hive/
[4] https://www.tutorialspoint.com/hive/
[5] https://www.guru99.com/hive-tutorials.html

By: Prof. Prakash Patel and Prof. Dulari Bhatt
Prof. Prakash Patel is an assistant professor in the IT department of the Gandhinagar Institute of Technology. You can contact him at prakash.patel@git.org.in. Prof. Dulari Bhatt is also an assistant professor in the IT department of the Gandhinagar Institute of Technology. You can contact her at dulari.bhatt@git.org.in.




THE MAKING OF A ‘MADE IN INDIA’ LINUX DISTRO

Kumar Priyansh, the developer of BackSlash Linux

BackSlash Linux is one of the newest Linux distributions developed in India, and that too, by
a 20-year-old. The operating system has a mix of Ubuntu and Debian platforms, and offers
two different worlds under one roof, with KDE and GNOME integration.

“It is not very hard to build a Linux distribution,” says Kumar Priyansh, the 20-year-old developer who has single-handedly created BackSlash Linux. As a child, Priyansh had always been curious about how operating systems worked. But instead of being merely curious and dreaming of developing an operating system, he started making OpenSUSE-based distributions in 2011 to step into the world of open source platforms. He had used SUSE Studio to release three versions of his very own operating system, which he called Blueberry. All that experience helped Priyansh in bringing out a professional Linux platform that debuted as BackSlash Linux in November 2016.
“To try and build a new Linux distro, I decided to dedicate a lot of time for development, and started attending online tutorials,” says Priyansh. Going through many online tutorial sessions, the Madhya Pradesh resident observed that he needed to combine multiple parts of different tutorials to understand the basics. “I started connecting parts of different tutorials from the Web, making an authentic and working tutorial for myself that allowed me to build the necessary applications and compile the very first version of my own distribution,” he says.

It’s Ubuntu behind the scenes, but with many tweaks
Priyansh picked Ubuntu as the Linux distribution on which to build his platform. But to give users an even more advanced experience, he deployed Hardware Enablement (HWE) support, which allows the operating system to work with newer hardware and provides an up-to-date delivery of the Linux kernel. The developer also added a proprietary repository channel that helps users achieve better compatibility with their hardware. “I ship the platform with a proprietary repository channel enabled, which Ubuntu does not offer by default. This is because I don’t want


BackSlash users to hop on different websites for unsupported hardware," says Priyansh, a Samrat Ashok Technological Institute alumnus.
Alongside Ubuntu, the BackSlash maker opted for Debian package support. He initially wanted to integrate the Wayland protocol as well as enable a space for Qt and GTK apps. However, in the initial implementation, Priyansh found it a challenge to offer a unified experience across KDE and GNOME environments. "I found that GTK apps running on KDE didn't look as attractive as on GNOME. Therefore, all I had to do was to provide a theme available for both KDE and GTK, and put it into the respective locations from where the apps acquire an identical look and feel," Priyansh says.

Uniqueness all around
Although the original aim of BackSlash Linux wasn't to compete with any other Linux distros, it has a list of features that distinguish it from others. "I start building at the stage that the other distros stop their work. I craft solutions around what users want and continue to improve things until everything seems pixel perfect," affirms Priyansh.
The current beta version of the platform includes a new login screen that displays aerial background videos, fingerprint protection for logging in, access to the terminal and other apps, multi-touch gestures, the coverflow Alt+Tab switcher, Snap support, and an updated Plasma Shell. There are also new updates, including Wine 2.14 for running Windows packages, Redshift (a blue light filter), a new email client, an enhanced system optimiser with an advanced app uninstaller, and Google Play Music Desktop Edition.
Priyansh has chosen characters from the Disney movie 'Frozen' to name the versions of his operating system. The current beta version of BackSlash Linux is called Kristoff, which is the name of a Sami iceman in the animated film, while its first stable release was launched as 'Anna', named after a princess in 'Frozen'.

Key features that have helped BackSlash Linux clock 75,000 downloads
• Resembles Apple's MacOS
• Snap package support
• BackSlash Sidebar
• Fingerprint integration
• Redshift night light filter
• Microsoft fonts
• Backup utility onboard

Security features under the hood
BackSlash Linux is not targeted at enterprise users. Having said that, the operating system does have some security features to make the experience safer for those who want to begin with a Linux distribution. It receives security updates directly from Canonical to keep the environment secure. The preinstalled System Optimiser app also helps users optimise performance, toggle startup programs, and uninstall applications and packages.
Additionally, community feedback, which has recently started rolling in, enables Priyansh to enhance the security of the platform. "The current beta release is doing quite well and receiving much praise from the community," the developer says.

Sources of revenue
Experts often believe that selling an open source solution is more difficult than trading a proprietary technology. For BackSlash Linux, Priyansh has opted for a model that involves receiving donations and sponsorships. "Our primary source of revenue will always be donations and sponsorships for the project," Priyansh asserts.

Future plans
BackSlash Linux, available on AMD64 and Intel x64 platforms, certainly has the potential to grow bigger. Priyansh is planning to add his own technologies to the platform, going forward. He is set to develop his own Web browser, music player and some 'awesome apps' to take the 'Made in India' operating system to the global stage. "There are also plans to build a custom compiled Linux kernel in the future to deliver better support out-of-the-box," the developer concludes.

By: Jagmeet Singh
The author was an assistant editor at EFY until recently.


A Brief Introduction to
Puppet

As a server configuration management tool, Puppet offers an automated way to inspect, deliver and operate software regardless of its deployment. It provides control and enforces consistency, while allowing for any of the modifications dictated by business needs.

The burgeoning demand for scalability has driven technology into a new era with a focus on distributed and virtual resources, as opposed to the conventional hardware that drives most systems today. Virtualisation is a method for logically dividing the computing resources of a system between different applications. Tools offer either full virtualisation or para-virtualisation for a system, driving away from the 'one server one application' model that typically under-utilises resources, towards a model that is focused on more efficient use of the system. In hardware virtualisation, a virtual machine gets created and behaves like a real computer with an operating system. The terminology 'host' and 'guest' machine is used for the real and virtual systems respectively. With this paradigm shift, software and service architectures have undergone a transition to virtual machines, laying the groundwork for distributed and cloud computing.

Enterprise infrastructure
The classical definition of enterprise infrastructure — the data centre, a crucial piece of the puzzle serving to bolster the operations of the company — is evolving in a manner that would have been hard to fathom a decade ago. It was originally built as isolated chunks of machinery pieced together into a giant to provide storage and network support for day-to-day operations. Archetypal representations include a mass of tangled wires and rolled up cables connecting monster racks of servers churning data minute after minute. A few decades ago, this was the norm; today, companies require much more flexibility, to scale up and down!
With the advent of virtualisation, enterprise infrastructure has been cut down, eliminating unnecessary pieces of hardware, and managers have opted for a cloud storage network that they can scale on-the-fly. Today, the business of cloud service providers is booming because not only startups but corporations, too, are switching to a virtual internal infrastructure to avoid the hassles of preparing and maintaining their own set of servers, especially considering the man hours required for the task.

Figure 1: Overview of virtualisation (Source: VMware.com)
Figure 2: Enterprise infrastructure through the ages (Source: itxbit.com)
Figure 3: Puppet: The server configuration management tool (Source: puppet.com)

Former US Chief Information Officer (CIO) Vivek Kundra's paper on Federal Cloud Computing states: "It allows users to control the computing services they access, while sharing the investment in the underlying IT resources among consumers. When the computing resources are provided by another organisation over a wide area network, cloud computing is similar to an electric power utility. The providers benefit from economies of scale, which in turn enables them to lower individual usage costs and centralise infrastructure costs."
However, setting up new virtual machines at scale presents a new challenge when the size of the organisation increases and the use cases balloon. It becomes nearly impossible to manage the customisation and configuration of each individual instance, considering that an average time of a few minutes needs to be spent per deployment. This raises questions about the switch to the cloud itself, considering the overheads are now similar, in terms of the time spent, to those of physical infrastructure.

Figure 4: Automation plays a vital role in DevOps (Source: puppet.com)
Figure 5: Puppet offered on the AWS Marketplace (Source: puppet.com)
This is where Puppet enters the picture. It is a server configuration management tool that automates the deployment and modification of virtual machines and servers with a single configuration script that can serve to deploy thousands of virtual machines simultaneously.

Features of Puppet
As a server configuration management tool, Puppet gives you an automated way to inspect, deliver and operate software regardless of its deployment. It provides control and enforces consistency, while allowing for any modifications dictated by business needs. It uses a Ruby-like, easy-to-read language for preparing the deployment of the infrastructure. Working with both the cloud and the data centre, it is platform-independent, and can enforce and propagate all the necessary changes for the infrastructure. It also allows for the monitoring of each stage to ensure visibility and compliance.
Puppet supports DevOps by providing automation and enabling faster releases without sacrificing security or stability. It allows security policies to be set and monitored for regulatory compliance, so that the risk of misconfigurations and failed audits is minimised. By treating the infrastructure as code, Puppet ensures that deployments are faster, and the continuous shipping of code results in a lower risk of failure. It streamlines heterogeneous technologies and unifies them under a single configuration management interface. Puppet supports containers and encourages analysis of what a container is made of, at scale. Tools are provided to monitor any discrepancies in functionality, in order to gain deeper insight into the deployments of a product.
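To give a flavour of that Ruby-like declarative language, here is a minimal sketch of a Puppet manifest; the ntp package, configuration file and service used here are illustrative assumptions, not taken from this article:

# Keep the NTP service installed, configured and running
package { 'ntp':
  ensure => installed,
}

file { '/etc/ntp.conf':
  ensure  => file,
  source  => 'puppet:///modules/ntp/ntp.conf',
  require => Package['ntp'],
}

service { 'ntp':
  ensure    => running,
  enable    => true,
  subscribe => File['/etc/ntp.conf'],
}

The desired state is declared once; Puppet then enforces it on every node the manifest is applied to, restarting the service whenever the configuration file changes.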


Puppet has found widespread application across large infrastructure networks in companies like Google, Amazon and Walmart. In fact, it is an integrated solution provided with Amazon Web Services and the Google Cloud Engine as well. DevOps has been proven to yield remarkable results, and it is time organisations focused on reducing latency and frivolity within cycles to increase the efficiency of rollouts.

Figure 6: DevOps is gaining visibility (Source: puppet.com)
Figure 7: Companies that use Puppet

Where is Puppet used?
There is a generalised subset of applications where Puppet fits in, including automation, test-driven development and configuration management. The security and reliability of the product, combined with its ease of use, allow for quick adoption and integration into the software development cycle of a product. With less time spent on superfluous tasks, more focus can be afforded to core practices and product development, allowing for better returns for the company.

Case studies of Puppet
Case studies include that of Staples, which faced a challenge in terms of creating a private cloud with automated provisioning and speeding up development cycles. With the introduction of Puppet into the picture, the developers had the freedom to provision their own systems and create their own configurations. This resulted in increased stability and, needless to say, faster deployments. SalesForce and Hewlett Packard have also attested to how integrating Puppet into their workflow enabled code delivery timelines to shrink from weeks to hours, and allowed support for more efficient DevOps practices, including automation.
Getty Images, another popular service provider, which originally used the open source version of Puppet, decided to try out the enterprise version on a smaller scale. As it switched to an agile model, test-driven development and automation were key to its development cycles, and the use of Puppet expanded thereon.
Puppet offers a promising solution for configuration management, as well as an array of assorted tools to bolster its core product offering. It is a must-try for organisations facing code shipping issues, time constraints and deployment woes.

By: Swapneel Mehta
The author has worked with Microsoft Research, CERN and startups in AI and cyber security. An open source enthusiast, he enjoys spending his time organising software development workshops for school and college students. You can connect with him at https://www.linkedin.com/in/swapneelm and find out more at https://github.com/SwapneelM.


Use ownCloud to Keep Your Data Safe and Secure
ownCloud is an open source, self-hosted file ‘sync and share’ app platform, which
allows users to access, synchronise and share data anytime and anywhere, without
the worries associated with the public cloud.

The cloud is currently very popular across businesses. It has been stable for some time and many industries are moving towards cloud technologies. However, a major challenge in the cloud environment is privacy and the security of data. Organisations or individuals host their professional or personal data on the cloud, and there are many providers who promise a 99 per cent guarantee for the safety of user data. Yet, there are chances of security breaches and privacy concerns. Therefore, many companies have pulled their data from public clouds and started creating their own private cloud storage. Using ownCloud, one can create services similar to Dropbox and iCloud. We can sync and share files, calendars and more. Let's take a look at how we can do this.

ownCloud
ownCloud is free and open source software that operates like any cloud storage system on your own domain. It is very quick and easy to set up compared to other similar software. It can be used not only for file sharing but also to leverage many features like text editors, to-do lists, etc. ownCloud can be integrated with any desktop or mobile calendar and contact apps. So now, we don't really require a Google Drive or Dropbox account.

Requirements for ownCloud
Web hosting: You will need hosting space and a domain name so that you can access your storage publicly, anywhere and anytime. You can register a domain and hosting space with any service provider like GoDaddy or BigRock. The only requirement is that the hosting service should support PHP and MySQL, which most hosting providers usually do.

ownCloud server
There are different ways to install the ownCloud server but here, we will use the easiest and quickest method. It will not take more than five minutes to get your cloud ready. We need to download the latest version of the ownCloud server and install it.
To install and configure ownCloud, go to the link https://owncloud.org/install/. From this link, download the ownCloud server, https://owncloud.org/install/#instructions-server.
The current version is 10.0.3. Click on Download ownCloud Server. Select the Web installer from the top options in the left side menus. Check Figure 1 for more details.

Figure 1: ownCloud downloading the Web installer file

Figure 1 mentions the installation steps. We need to download the setup-owncloud.php file and upload it into the Web space.
We will upload the file using winSCP or any similar kind of software. Here, I am using winSCP. If you don't have winSCP, you can download it first from the link https://winscp.net/eng/download.php.
The next step is to get to know your FTP credentials in order to connect to your Web space. For that, you need to log in to your hosting provider portal. From the menu, you will find the FTP options, where you see your credentials. This approach varies based on your hosting provider.


Since I bought my hosting from ipage.com, this was how I found my FTP credentials. If you are not able to find them, then Google what specific steps you need to take for your hosting provider, for which you will get lots of documentation. Figure 2 shows my FTP connection to my Web space.

Figure 2: FTP connection to the Web space
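If you prefer a scriptable alternative to winSCP, the same upload can be done with Python's standard ftplib module. This is a minimal sketch; the host, credentials and remote folder are placeholders for the details shown in your hosting portal:

# upload the ownCloud Web installer to the hosting space over FTP
from ftplib import FTP

with FTP('ftp.example.com') as ftp:
    ftp.login(user='your-ftp-user', passwd='your-ftp-password')
    ftp.cwd('public_html')  # change to the folder that serves your site
    with open('setup-owncloud.php', 'rb') as f:
        ftp.storbinary('STOR setup-owncloud.php', f)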
Transfer your setup-owncloud.php file to your Web space by dragging it. Now you can load the URL yourdomainname.com/setup-owncloud.php into your browser. In my case, the domain name is coolstuffstobuy.com and hence my URL is coolstuffstobuy.com/setup-owncloud.php. Figure 3 demonstrates this step. As you can see, the ownCloud installation has started.

Figure 3: ownCloud installation page

While clicking on the Next button, if we get the error mentioned in Figure 4, it means we don't have proper access. This happens because of the root directory. In shared hosting, we don't have RW access to the root folder; so I have created one folder and put setup-owncloud.php into it. Hence, the current path is coolstuffstobuy.com/cloud/setup-owncloud.php.

Figure 4: Error while installing ownCloud
Figure 5: ownCloud installed

While creating an account, you can click on Database selection and provide your user name and password for the database. Click Next and then you will come to the home page of ownCloud. Figure 7 demonstrates that.

Figure 6: Creating an account
Figure 7: ownCloud home page

There are different options — you can download your own cloud storage as an Android or iPhone app. You can also connect to your Calendar or Contacts. Now everything else is self-explanatory, just as in Google Drive or Dropbox.
You can now upload any file and share it with friends and colleagues, like you would with any other cloud storage service.

Figure 8: Files uploaded in ownCloud

Reference
[1] https://owncloud.org/install/

By: Maulik Parekh
The author works at Cisco as a consulting engineer and has an M.Tech degree in cloud computing from VIT University, Chennai. He constantly strives to learn, grow and innovate. He can be reached at maulikparekh2@gmail.com. Website: https://www.linkedin.com/in/maulikparekh2.


Spark’s MLlib:
Scalable Support for
Machine Learning
Designated as Spark’s scalable machine learning library,
MLlib consists of common algorithms and utilities as well as
underlying optimisation primitives.

The world is being flooded with data from all sources. The hottest trend in technology is related to Big Data, and the evolving field of data science is a way to cope with this data deluge. Machine learning is at the heart of data science. The need of the hour is to have efficient machine learning frameworks and platforms to process Big Data. Apache Spark is one of the most powerful platforms for analysing Big Data. MLlib is its machine learning library, and is potent enough to process Big Data and apply all machine learning algorithms to it efficiently.

Apache Spark
Apache Spark is a cluster computing framework based on Hadoop's MapReduce framework. Spark has in-memory cluster computing, which helps to speed up computation by reducing the IO transfer time. It is widely used to deal with Big Data problems because of its distributed architectural support and parallel processing capabilities. Users prefer it to Hadoop on account of its stream processing and interactive query features. To provide a wide range of services, it has built-in libraries like GraphX, SparkSQL and MLlib. Spark supports Python, Scala, Java and R as programming languages, out of which Scala is the most preferred.

MLlib
MLlib is Spark's machine learning library. It is predominantly used in Scala but it is compatible with Python and Java as well. MLlib was initially contributed by AMPLab at UC Berkeley. It makes machine learning scalable, which provides an advantage when handling large volumes of incoming data.
The main features of MLlib are listed below.
• Machine learning algorithms: Regression, classification, collaborative filtering, clustering, etc
• Featurisation: Selection, dimensionality reduction, transformation, feature extraction, etc
• Pipelines: Construction, evaluation and tuning of ML pipelines
• Persistence: Saving/loading of algorithms, models and pipelines
• Utilities: Statistics, linear algebra, probability, data handling, etc
Some lower level machine learning primitives, like the generic gradient descent optimisation algorithm, are also present in MLlib. In the latest releases, the MLlib API is based on DataFrames instead of RDDs, for better performance.

The advantages of MLlib
The true power of Spark lies in its vast libraries, which are capable of performing every data analysis task imaginable. MLlib is at the core of this functionality. It has several advantages.


• Ease of use: MLlib integrates well with four languages — Java, R, Python and Scala. The APIs of all four provide ease of use to programmers of various languages, as they don't need to learn a new one.
• Easy to deploy: No preinstallation or conversion is required to use a Hadoop based data source such as HBase, HDFS, etc. Spark can also run standalone or on an EC2 cluster.
• Scalability: The same code can work on small or large volumes of data without the need to change it to suit the volume. As businesses grow, it is easy to expand vertically or horizontally without breaking down the code into modules for performance.
• Performance: The ML algorithms run up to 100x faster than MapReduce on account of the framework, which allows iterative computation. MLlib's algorithms take advantage of iterative computing properties to deliver better performance, surpassing that of MapReduce. The performance gain is attributed to in-memory computing, which is a speciality of Spark.
• Algorithms: The main ML algorithms included in the MLlib module are classification, regression, decision trees, recommendation, clustering, topic modelling, frequent item sets, association rules, etc. ML workflow utilities included are feature transformation, pipeline construction, ML persistence, etc. Singular value decomposition, principal component analysis, hypothesis testing, etc, are also possible with this library.
• Community: Spark is open source software under the Apache Foundation now. It gets tested and updated by the vast contributing community. MLlib is the most rapidly expanding component and new features are added every day. People submit their own algorithms and the resources available are unparalleled.

Basic modules of MLlib
SciKit-Learn: This module contains many basic ML algorithms that perform the various tasks listed below.
• Classification: Random forest, nearest neighbour, SVM, etc
• Regression: Ridge regression, support vector regression, lasso, logistic regression, etc
• Clustering: Spectral clustering, k-means clustering, etc
• Decomposition: PCA, non-negative matrix factorisation, independent component analysis, etc
Mahout: This module contains many basic ML algorithms that perform the tasks listed below.
• Classification: Random forest, logistic regression, naive Bayes, etc
• Collaborative filtering: ALS, etc
• Clustering: k-means, fuzzy k-means, etc
• Decomposition: SVD, randomised SVD, etc

Spark MLlib use cases
Spark's MLlib is used frequently in marketing optimisation, security monitoring, fraud detection, risk assessment, operational optimisation, preventative maintenance, etc. Here are some popular use cases.
• NBC Universal: International cable TV has tons of data. To reduce costs, NBC takes its media offline when it is not in use. Spark's MLlib is used to implement SVM to predict which files should be taken down.
• ING: MLlib is used in its data analytics pipeline to detect anomalies. Decision trees and k-means are implemented with MLlib to enable this.
• Toyota: Toyota's Customer 360 insights platform uses social media data in real-time to prioritise customer reviews and categorise them for business insights.

ML vs MLlib
There are two main machine learning packages — spark.mllib and spark.ml. The former is the original version and has its API built on top of RDDs. The latter has a newer, higher-level API built on top of DataFrames to construct ML pipelines. The newer version is recommended because of the DataFrames, which make it more versatile and flexible. The newer releases support the older version as well, due to backward compatibility. MLlib, being older, has more features as it has been in development longer. Spark ML allows you to create pipelines using machine learning to transform the data. In short, ML is new, has pipelines and DataFrames, and is easier to construct; MLlib is older, has RDDs and more features.
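To make this concrete, here is a minimal sketch using the newer DataFrame-based spark.ml package; the sample libsvm file ships with the Spark distribution, and the path is an assumption about where Spark is unpacked:

from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName('mllib-demo').getOrCreate()

# load a sample dataset stored in the libsvm format Spark reads natively
training = spark.read.format('libsvm').load('data/mllib/sample_libsvm_data.txt')

# train a simple logistic regression classifier
lr = LogisticRegression(maxIter=10, regParam=0.01)
model = lr.fit(training)

# transform() appends prediction columns to the DataFrame
model.transform(training).select('label', 'prediction').show(5)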
MLlib is the main reason for the popularity and widespread use of Apache Spark in the Big Data world. Its compatibility, scalability, ease of use, and good features and functionality have led to its success. It provides many inbuilt functions and capabilities, which makes things easy for machine learning programmers. Virtually all known machine learning algorithms in use can be easily implemented using either version of MLlib. In this era of data deluge, such libraries certainly are a boon to data science.

References
[1] spark.apache.org/
[2] www.tutorialspoint.com/apache_spark/

By: Preet Gandhi
The author is an avid Big Data and data science enthusiast. She can be reached at gandhipreet1995@gmail.com.


Apache CloudStack:
A Reliable and Scalable Cloud
Computing Platform
Apache CloudStack is yet another outstanding project that has contributed many
tools and projects to the open source community. The author has selected the
relevant and important extracts from the excellent documentation provided by the
Apache CloudStack project team for this article.

Apache CloudStack is one among the highly visible projects from the Apache Software Foundation (ASF). The project focuses on deploying open source software for public and private Infrastructure as a Service (IaaS) clouds. Listed below are a few important points about CloudStack.
• It is designed to deploy and manage large networks of virtual machines, as highly available and scalable Infrastructure as a Service (IaaS) cloud computing platforms.
• CloudStack is used by a number of service providers to offer public cloud services, and by many companies to provide on-premises (private) cloud offerings or as part of a hybrid cloud solution.
• CloudStack includes the entire 'stack' of features that most organisations desire with an IaaS cloud — compute orchestration, Network as a Service, user and account management, a full and open native API, resource accounting, and a first-class user interface (UI).
• It currently supports the most popular hypervisors — VMware, KVM, Citrix XenServer, Xen Cloud Platform (XCP), Oracle VM server and Microsoft Hyper-V.
• Users can manage their cloud with an easy-to-use Web interface, command line tools and/or a full-featured RESTful API. In addition, CloudStack provides an API that's compatible with AWS EC2 and S3 for organisations that wish to deploy hybrid clouds.
• It provides an open and flexible cloud orchestration platform to deliver reliable and scalable private and public clouds.

Features and functionality
Some of the features and functionality provided by CloudStack are:
• Works with hosts running XenServer/XCP, KVM, Hyper-V, and/or VMware ESXi with vSphere


• Provides a friendly Web-based UI for managing the cloud
• Provides a native API
• May provide an Amazon S3/EC2 compatible API
• Manages storage for instances running on the hypervisors (primary storage) as well as templates, snapshots and ISO images (secondary storage)
• Orchestrates network services from the data link layer (L2) to some application layer (L7) services, such as DHCP, NAT, firewall, VPN and so on
• Accounting of network, compute and storage resources
• Multi-tenancy/account separation
• User management

Figure 1: A simplified view of a basic deployment

Support for multiple hypervisors: CloudStack works with a variety of hypervisors and hypervisor-like technologies. A single cloud can contain multiple hypervisor implementations. As of the current release, CloudStack supports BareMetal (via IPMI), Hyper-V, KVM, LXC, vSphere (via vCenter), XenServer and Xen Project.
Massively scalable infrastructure management: CloudStack can manage tens of thousands of physical servers installed in geographically distributed data centres. The management server scales near-linearly, eliminating the need for cluster-level management servers. Maintenance or other outages of the management server can occur without affecting the virtual machines running in the cloud.
Automatic cloud configuration management: CloudStack automatically configures the network and storage settings for each virtual machine deployment. Internally, a pool of virtual appliances supports the configuration of the cloud itself. These appliances offer services such as firewalling, routing, DHCP, VPN, console proxy, storage access, and storage replication. The extensive use of horizontally scalable virtual machines simplifies the installation and ongoing operation of a cloud.
Graphical user interface: CloudStack offers an administrator's Web interface that can be used for provisioning and managing the cloud, as well as an end user's Web interface, for running VMs and managing VM templates. The UI can be customised to reflect the desired look and feel that the service provider or enterprise wants.
API: CloudStack provides a REST-like API for the operation, management and use of the cloud.
AWS EC2 API support: It provides an EC2 API translation layer to permit common EC2 tools to be used in the CloudStack cloud.
High availability: CloudStack has a number of features that increase the availability of the system. The management server itself may be deployed in a multi-node installation where the servers are load balanced. MySQL may be configured to use replication to provide for failover in the event of a database loss. For the hosts, CloudStack supports NIC bonding and the use of separate networks for storage, as well as iSCSI Multipath.

Deployment architecture
CloudStack deployments consist of the management server and the resources to be managed. During deployment, you inform the management server of the resources to be managed, such as the IP address blocks, storage devices, hypervisors and VLANs.
The minimum installation consists of one machine running the CloudStack management server and another machine acting as the cloud infrastructure. In its smallest deployment, a single machine can act as both the management server and the hypervisor host.
A more full-featured installation consists of a highly-available multi-node management server and up to tens of thousands of hosts using any of several networking technologies.
Management server overview: The management server orchestrates and allocates the resources in your cloud deployment. It typically runs on a dedicated machine or as a virtual machine. It controls the allocation of virtual machines to hosts, and assigns storage and IP addresses to the virtual machine instances. The management server runs in an Apache Tomcat container and requires a MySQL database for persistence.
The management server:
• Provides the Web interface for both the administrator and the end user
• Provides the API interfaces for both the CloudStack API as well as the EC2 interface
• Manages the assignment of guest VMs to a specific compute resource
• Manages the assignment of public and private IP addresses
• Allocates storage during the VM instantiation process
• Manages snapshots, disk images (templates) and ISO images
• Provides a single point of configuration for your cloud
Cloud infrastructure overview: Resources within the cloud are managed as follows.
• Regions: This is a collection of one or more geographically proximate zones managed by one or more management servers.


Figure 2: A region with multiple zones

• Zones: Typically, a zone is equivalent to a single data centre. It consists of one or more pods and secondary storage.
• Pods: A pod is usually a rack, or row of racks, that includes a Layer-2 switch and one or more clusters.
• Clusters: A cluster consists of one or more homogenous hosts and primary storage.
• Host: This is a single compute node within a cluster; often, a hypervisor.
• Primary storage: This is a storage resource typically provided to a single cluster for the actual running of instance disk images.
• Secondary storage: This is a zone-wide resource which stores disk templates, ISO images, and snapshots.
Networking overview: CloudStack offers many types of networking, but these typically fall into one of two scenarios.
• Basic: This is analogous to AWS-classic style networking. It provides a single flat Layer-2 network, where guest isolation is provided at Layer-3 by the hypervisor's bridge device.
• Advanced: This typically uses Layer-2 isolation such as VLANs, though this category also includes SDN technologies such as Nicira NVP.

Installation
In this section, let us look at the minimum system requirements and installation steps for CloudStack.
Management server, database and storage system requirements: The machines that will run the management server and the MySQL database must meet the following requirements. The same machines can also be used to provide primary and secondary storage, such as via local disks or NFS. The management server may be placed on a virtual machine.
• Preferred OS: CentOS/RHEL 6.3+ or Ubuntu 14.04(.2)
• 64-bit x86 CPU (more cores lead to better performance)
• 4GB of memory
• 250GB of local disk space (more space results in better capability; 500GB recommended)
• At least 1 NIC
• Statically allocated IP address
• Fully qualified domain name as returned by the hostname command
Host/hypervisor system requirements: The host is where the cloud services run in the form of guest virtual machines. Each host is one machine that meets the following requirements:
• Must support HVM (Intel-VT or AMD-V enabled)
• 64-bit x86 CPU (more cores result in better performance)
• Hardware virtualisation support required
• 4GB of memory
• 36GB of local disk
• At least 1 NIC
• Latest hotfixes applied to hypervisor software
• When you deploy CloudStack, the hypervisor host must not have any VMs already running
• All hosts within a cluster must be homogeneous; the CPUs must be of the same type, count, and feature flags

Figure 3: Installation complete

Installation steps: You may be able to do a simple trial installation, but for a full installation, do make sure you go through all the following topics from the Apache CloudStack documentation (refer to the section 'Installation Steps' of this documentation):
• Choosing a deployment architecture
• Choosing a hypervisor: Supported features
• Network setup
• Storage setup
• Best practices
The steps for the installation are as follows (you can refer to the Apache CloudStack documentation for detailed steps). Make sure you have the required hardware ready as discussed above.
Installing the management server (choose single- or multi-node): The procedure for installing the management server is:
• Prepare the operating system
• In the case of XenServer only, download and install vhd-util
• Install the first management server
• Install and configure the MySQL database
• Prepare NFS shares
• Prepare and start additional management servers (optional)
• Prepare the system VM template

Configuring your cloud
After the management server is installed and running, you can add the compute resources for it to manage. For an overview


of how a CloudStack cloud infrastructure is organised, see 'Cloud Infrastructure Overview' in the Apache CloudStack documentation.
To provision the cloud infrastructure, or to scale it up at any time, follow the procedures given below:
1. Define regions (optional)
2. Add a zone to the region
3. Add more pods to the zone (optional)
4. Add more clusters to the pod (optional)
5. Add more hosts to the cluster (optional)
6. Add primary storage to the cluster
7. Add secondary storage to the zone
8. Initialise and test the new cloud
When you have finished these steps, you will have a deployment with the basic structure, as shown in Figure 4.

Figure 4: Conceptual view of a basic deployment

For all the above steps, detailed instructions are available in the Apache CloudStack documentation. If you decide to increase the size of your deployment, you can add more hosts, primary storage, zones, pods and clusters. You may also see the additional configuration parameter setup, hypervisor setup, network setup and storage setup.

Initialising and testing
After everything is configured, CloudStack will perform its initialisation. This can take 30 minutes or more, depending on the speed of your network. When the initialisation has been completed successfully, the administrator's dashboard should be displayed in the CloudStack UI.
1. Verify that the system is ready. In the left navigation bar, select Templates. Click on the CentOS 5.5 (64-bit) no GUI (KVM) template. Check to be sure that the status is 'Download Complete'. Do not proceed to the next step until this message is displayed.
2. Go to the Instances tab, and filter on the basis of My Instances.
3. Click Add Instance and follow the steps in the wizard.
4. Choose the zone you just added.
5. In the template selection, choose the template to use in the VM. If this is a fresh installation, it is likely that only the provided CentOS template is available.
6. Select a service offering. Be sure that the hardware you have allows the starting of the selected service offering.
7. In data disk offering, if desired, add another data disk. This is a second volume that will be available to, but not mounted in, the guest. For example, in Linux on XenServer you will see /dev/xvdb in the guest after rebooting the VM. A reboot is not required if you have a PV-enabled OS kernel in use.
8. In the default network, choose the primary network for the guest. In a trial installation, you would have only one option here.
9. Optionally, give your VM a name and a group. Use any descriptive text you would like to.
10. Click on Launch VM. Your VM will be created and started. It might take some time to download the template and complete the VM startup. You can watch the VM's progress in the Instances screen. To use the VM, click the View Console button.
CloudStack installation from the GIT repo (for developers): See the section 'CloudStack Installation from the GIT repo for Developers' in the Apache CloudStack documentation to explore these steps for developers.

The CloudStack API
The CloudStack API is a query based API using HTTP, which returns results in XML or JSON. It is used to implement the default Web UI. This API is not a standard like OGF OCCI or DMTF CIMI but is easy to learn. A mapping exists between the AWS API and the CloudStack API, as will be seen in the next section. Recently, a Google Compute Engine interface was also developed, which maps the GCE REST API to the CloudStack API described here.
The CloudStack query API can be used via HTTP GET requests made against your cloud endpoint (e.g., http://localhost:8080/client/api). The API name is passed using the command key, and the various parameters for this API call are passed as key value pairs. The request is signed using the access key and secret key of the user making the call. Some calls are synchronous while some are asynchronous. Asynchronous calls return a JobID; the status and result of a job can be obtained with the queryAsyncJobResult call. Let's get started and look at an example of calling the listUsers API in Python.
First, you will need to generate keys to make requests. In the dashboard, go to Accounts, select the appropriate account and then click on Show Users. Select the intended users and generate keys using the Generate Keys icon. You will see an API Key and Secret Key field being generated. The keys will be in the following form:

API Key : XzAz0uC0t888gOzPs3HchY72qwDc7pUPIO8LxC-VkIHo4C3fvbEBY_Ccj8fo3mBapN5qRDg_0_EbGdbxi8oy1A
Secret Key: zmBOXAXPlfb-LIygOxUVblAbz7E47eukDS_0JYUxP3JAmknOYo56T0R-AcM7rK7SMyo11Y6XW22gyuXzOdiybQ


Open a Python shell and import the basic modules necessary to make the request. Do note that this request could be made in many different ways — this is just a very basic example. The urllib* modules are used to make the HTTP request and do URL encoding. The hashlib module gives us the sha1 hash function. It is used to generate the hmac (keyed hashing for message authentication) using the secret key. The result is encoded using the base64 module.

$python
Python 2.7.3 (default, Nov 17 2012, 19:54:34)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> import urllib
>>> import hashlib
>>> import hmac
>>> import base64

Define the endpoint of the cloud, the command that you want to execute, the type of the response (i.e., XML or JSON) and the keys of the user. Note that we do not put the secret key in our request dictionary, because it is only used to compute the hmac.

>>> baseurl='http://localhost:8080/client/api?'
>>> request={}
>>> request['command']='listUsers'
>>> request['response']='json'
>>> request['apikey']='plgWJfZK4gyS3mOMTVmjUVg-X-jlWlnfaUJ9GAbBbf9EdM-kAYMmAiLqzzq1ElZLYq_u38zCm0bewzGUdP66mg'
>>> secretkey='VDaACYb0LV9eNjTetIOElcVQkvJck_J_QljX_FcHRj87ZKiy0z0ty0ZsYBkoXkY9b7eq1EhwJaw7FF3akA3KBQ'

Build the base request string, which is the combination of all the key/value pairs of the request, URL encoded and joined with ampersands.

>>> request_str='&'.join(['='.join([k,urllib.quote_plus(request[k])]) for k in request.keys()])
>>> request_str
'apikey=plgWJfZK4gyS3mOMTVmjUVg-X-jlWlnfaUJ9GAbBbf9EdM-kAYMmAiLqzzq1ElZLYq_u38zCm0bewzGUdP66mg&command=listUsers&response=json'

Compute the signature with hmac, and do a base64 encoding and a URL encoding; the string used for the signature is similar to the base request string shown above, but the keys/values are lower cased and joined in a sorted order.

>>> sig_str='&'.join(['='.join([k.lower(),urllib.quote_plus(request[k].lower().replace('+','%20'))]) for k in sorted(request.iterkeys())])
>>> sig_str
'apikey=plgwjfzk4gys3momtvmjuvg-x-jlwlnfauj9gabbbf9edm-kaymmailqzzq1elzlyq_u38zcm0bewzgudp66mg&command=listusers&response=json'
>>> sig=hmac.new(secretkey,sig_str,hashlib.sha1).digest()
>>> sig
'M:]\x0e\xaf\xfb\x8f\xf2y\xf1p\x91\x1e\x89\x8a\xa1\x05\xc4A\xdb'
>>> sig=base64.encodestring(hmac.new(secretkey,sig_str,hashlib.sha1).digest())
>>> sig
'TTpdDq/7j/J58XCRHomKoQXEQds=\n'
>>> sig=base64.encodestring(hmac.new(secretkey,sig_str,hashlib.sha1).digest()).strip()
>>> sig
'TTpdDq/7j/J58XCRHomKoQXEQds='
>>> sig=urllib.quote_plus(base64.encodestring(hmac.new(secretkey,sig_str,hashlib.sha1).digest()).strip())

Finally, build the entire string by joining the baseurl, the request string and the signature. Then do an HTTP GET:

>>> req=baseurl+request_str+'&signature='+sig
>>> req
'http://localhost:8080/client/api?apikey=plgWJfZK4gyS3mOMTVmjUVg-X-jlWlnfaUJ9GAbBbf9EdM-kAYMmAiLqzzq1ElZLYq_u38zCm0bewzGUdP66mg&command=listUsers&response=json&signature=TTpdDq%2F7j%2FJ58XCRHomKoQXEQds%3D'
>>> res=urllib2.urlopen(req)
>>> res.read()
{
  "listusersresponse" : {
    "count":1 ,
    "user" : [
      {
        "id":"7ed6d5da-93b2-4545-a502-23d20b48ef2a",
        "username":"admin",
        "firstname":"admin",
        "lastname":"cloud",
        "created":"2012-07-05T12:18:27-0700",
        "state":"enabled",
        "account":"admin",
        "accounttype":1,
        "domainid":"8a111e58-e155-4482-93ce-84efff3c7c77",
        "domain":"ROOT",
        "apikey":"plgWJfZK4gyS3mOMTVmjUVg-X-jlWlnfaUJ9GAbBbf9EdM-kAYMmAiLqzzq1ElZLYq_u38zCm0bewzGUdP66mg",
        "secretkey":"VDaACYb0LV9eNjTetIOElcVQkvJck_J_QljX_FcHRj87ZKiy0z0ty0ZsYBkoXkY9b7eq1EhwJaw7FF3akA3KBQ",
        "accountid":"7548ac03-af1d-4c1c-9064-2f3e2c0eda0d"
      }
    ]
  }
}
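Note that the session above uses Python 2 idioms (urllib2, base64.encodestring, dict.iterkeys). As a minimal sketch, the same signing flow in Python 3 would look like this, with apikey and secretkey holding the strings generated earlier:

import base64, hashlib, hmac
from urllib.parse import quote_plus
from urllib.request import urlopen

baseurl = 'http://localhost:8080/client/api?'
request = {'command': 'listUsers', 'response': 'json', 'apikey': apikey}

# key=value pairs, URL encoded and joined with ampersands
request_str = '&'.join('='.join([k, quote_plus(request[k])]) for k in request)

# lower-cased, sorted variant that is used only for signing
sig_str = '&'.join('='.join([k.lower(), quote_plus(request[k].lower().replace('+', '%20'))]) for k in sorted(request))

digest = hmac.new(secretkey.encode(), sig_str.encode(), hashlib.sha1).digest()
sig = quote_plus(base64.b64encode(digest).decode())

print(urlopen(baseurl + request_str + '&signature=' + sig).read())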
Continued on page...65


Keras:
Building Deep Learning Applications
with High Levels of Abstraction
Keras is a high-level API for neural networks. It is written in Python and its biggest
advantage is its ability to run on top of state-of-art deep learning libraries/
frameworks such as TensorFlow, CNTK or Theano. If you are looking for fast
prototyping with deep learning, then Keras is the optimal choice.

Deep learning is the new buzzword among machine learning researchers and practitioners. It has certainly opened the doors to solving problems that were almost unsolvable earlier. Examples of such problems are image recognition, speaker-independent voice recognition, video understanding, etc. Neural networks are at the core of deep learning methodologies for solving problems. The improvements in these networks, such as convolutional neural networks (CNN) and recurrent networks, have certainly raised expectations, and the results they yield are also promising.
To make the approach simple, there are already powerful frameworks/libraries such as TensorFlow from Google and CNTK (Cognitive Toolkit) from Microsoft. The TensorFlow approach has already simplified the implementation of deep learning for coders. Keras is a high-level API for neural networks written in Python, which makes things even simpler. The uniqueness of Keras is that it can be executed on top of libraries such as TensorFlow and CNTK. This article assumes that the reader is familiar with the fundamental concepts of machine learning.
The primary reasons for using Keras are:
• Instant prototyping: This is the ability to implement deep learning concepts with higher levels of abstraction, with a 'keep it simple' approach.
• Keras has the potential to execute without any barriers on CPUs and GPUs.
• Keras supports convolutional and recurrent networks — combinations of both can also be used with it.

Keras: The design philosophy
As stated earlier, the ability to move into action with instant prototyping is an important characteristic of Keras. Apart from this, Keras is designed with the following guiding principles or design philosophy:
• It is an API designed with user-friendly implementation as the core principle. The API is designed to be simple and consistent, and it minimises the effort programmers are required to put in to convert theory into action.
• Keras' modular design is another important feature. The primary idea of Keras is layers, which can be connected seamlessly.


Figure 1: Primary reasons for using Keras
Figure 2: Keras' design philosophy

• Keras is extensible. If you are a researcher trying to bring in your own novel functionality, Keras can accommodate such extensions.
• Keras is all Python, so there is no need for tricky declarative configuration files.

Installation
It has to be remembered that Keras is not a standalone library. It is an API and works on top of existing libraries (TensorFlow, CNTK or Theano). Hence, the installation of Keras requires any one of these backend engines. The official documentation suggests a TensorFlow backend. Detailed installation instructions for TensorFlow are available at https://www.tensorflow.org/install/. From this link, you can infer that TensorFlow can be easily installed in all major operating systems such as MacOS X, Ubuntu and Windows (7 or later).
After the successful installation of any one of the backend engines, Keras can be installed using Pip, as shown below:

$sudo pip install keras

An alternative approach is to install Keras from the source (GitHub):

#1 Clone the source from GitHub
$git clone https://github.com/fchollet/keras.git

#2 Move to the source directory
cd keras

#3 Install using setup.py
sudo python setup.py install

The three optional dependencies that are required for specific features are:
• cuDNN (CUDA Deep Neural Network library): For running Keras on the GPU
• HDF5 and h5py: For saving Keras models to disk
• Graphviz and Pydot: For visualisation tasks

The way Keras works
The basic building block of Keras is the model, which is a way to organise layers. The sequence of tasks to be carried out while using Keras models is:
• Model definition
• Compilation of the model
• Model fitting
• Performing predictions

Figure 3: The sequence of tasks

The basic type of model is sequential. It is simply a linear stack of layers. The sequential model can be built as shown below:

from keras.models import Sequential
model = Sequential()

The stacking of layers can be done with the add() method:

from keras.layers import Dense, Activation
model.add(Dense(units=64, input_dim=100))
model.add(Activation('relu'))
model.add(Dense(units=10))
model.add(Activation('softmax'))

Keras has various types of pre-built layers. Some of the prominent types are:
• Regular Dense
• Recurrent layers, LSTM, GRU, etc
• One- and two-dimension convolutional layers
• Dropout
• Noise
• Pooling
• Normalisation, etc
Similarly, Keras supports most of the popularly used activation functions. Some of these are:
• Sigmoid
• ReLu
• Softplus
• ELU
• LeakyReLu, etc


The model can be compiled with compile(), as follows:

model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

Keras is very simple. For instance, if you want to configure the optimiser given in the above mentioned code, the following code snippet can be used:

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True))

The model can be fitted with the fit() function:

model.fit(x_train, y_train, epochs=5, batch_size=32)

In the aforementioned code snippet, x_train and y_train are Numpy arrays. The performance evaluation of the model can be done as follows:

loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)

The predictions on novel data can be done with the predict() function:

classes = model.predict(x_test, batch_size=128)
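The HDF5/h5py dependency mentioned earlier enables model persistence. As a quick sketch — the file name here is illustrative — a trained model can be saved and restored with one call each:

from keras.models import load_model

model.save('my_model.h5')           # stores architecture, weights and optimiser state
model = load_model('my_model.h5')   # recreates the compiled model later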
The methods of Keras layers
The important methods of Keras layers are shown in Table 1.

Table 1: Keras layers' methods
• get_weights(): This method is used to return the weights of the layer
• set_weights(): This method is used to set the weights of the layer
• get_config(): This method is used to return the configuration of the layer as a dictionary
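As a small illustration of these methods, any layer of the sequential model built earlier can be inspected and updated in place:

layer = model.layers[0]          # pick the first layer
weights = layer.get_weights()    # list of Numpy arrays (weights and biases)
layer.set_weights(weights)       # write them back unchanged
config = layer.get_config()      # the layer's settings as a dictionary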
MNIST training
MNIST is a very popular database among machine learning score = model.evaluate(x_test, y_test, verbose=0)
researchers. It is a large collection of handwritten digits. A print(‘Test loss:’, score[0])
complete example for deep multi-layer perceptron training on print(‘Test accuracy:’, score[1])
the MNIST data set with Keras is shown below. This source is
available in the examples folder of Keras (https://github.com/ If you are familiar with machine learning terminology,
fchollet/keras/blob/master/examples/mnist_mlp.py): the above code is self-explanatory.


Image classification with pre-trained models
An image classification code with the pre-trained ResNet50 model is as follows (https://keras.io/applications/):

from keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from keras.preprocessing import image
import numpy as np

model = ResNet50(weights='imagenet')
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
preds = model.predict(x)

# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])

# Predicted: [(u'n02504013', u'Indian_elephant', 0.82658225), (u'n01871265', u'tusker', 0.1122357), (u'n02504458', u'African_elephant', 0.061040461)]

The simplicity with which the classification tasks are carried out can be inferred from the above code.
Overall, Keras is a simple, extensible and easy-to-implement neural network API, which can be used to build deep learning applications with high levels of abstraction.

By: Dr K.S. Kuppusamy
The author is an assistant professor of computer science at the School of Engineering and Technology, Pondicherry Central University. He has 12+ years of teaching and research experience in academia and in industry. He can be reached at kskuppu@gmail.com.
description, probability) # (one such list for each sample in

Continued from page...61

All the clients you find on GitHub implement this signature technique, so you should not have to do it manually. Now that you have explored the API through the UI and you understand how to make low-level calls, pick your favourite client or use CloudMonkey. This is a sub-project of Apache CloudStack and gives operators/developers the ability to use any of the API methods.
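A quick sketch of what that looks like, assuming CloudMonkey is installed and the keys generated earlier are at hand (the host and port below follow the defaults used in this article; newer CloudMonkey releases may use a single set url instead):

$ cloudmonkey
> set host localhost
> set port 8080
> set apikey <your API key>
> set secretkey <your secret key>
> list users

Each API method, such as listUsers or deployVirtualMachine, becomes a verb-noun command at this prompt, with request signing handled for you.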
Testing the AWS API interface: While the native CloudStack API is not a standard, CloudStack provides an AWS EC2 compatible interface. A great advantage of this is that existing tools written with EC2 libraries can be reused against a CloudStack based cloud. In the installation section, we described how to run this interface by installing packages. In this section, we find out how to compile the interface with Maven and test it with the Python Boto module.
Using a running management server (with DevCloud, for instance), start the AWS API interface in a separate shell with the following command:

mvn -Pawsapi -pl :cloud-awsapi jetty:run

Log into the CloudStack UI at http://localhost:8080/client, go to Service Offerings and edit one of the compute offerings to have the name m1.small or any of the other AWS EC2 instance types. With access and secret keys generated for a user, you should now be able to use the Python Boto module:

import boto
import boto.ec2

accesskey="2IUSA5xylbsPSnBQFoWXKg3RvjHgsufcKhC1SeiCbeEc0obKwUlwJamB_gFmMJkFHYHTIafpUx0pHcfLvt-dzw"
secretkey="oxV5Dhhk5ufNowey7OVHgWxCBVS4deTl9qL0EqMthfPBuy3ScHPo2fifDxw1aXeL5cyH10hnLOKjyKphcXGeDA"

region = boto.ec2.regioninfo.RegionInfo(name="ROOT", endpoint="localhost")
conn = boto.connect_ec2(aws_access_key_id=accesskey, aws_secret_access_key=secretkey, is_secure=False, region=region, port=7080, path="/awsapi", api_version="2012-08-15")

images = conn.get_all_images()
print images
res = images[0].run(instance_type='m1.small', security_groups=['default'])

Note the new api_version number in the connection object, and also note that there was no need to perform a user registration as in the case of previous CloudStack releases.
Let us thank those at Apache for contributing yet another outstanding product to the open source community, along with the detailed documentation they have provided for CloudStack. All the contents, samples and pictures in this article are extracts from the CloudStack online documentation, and you may explore more about it at http://docs.cloudstack.apache.org/en/latest/.

By: Somanath T.V.
The author has 14+ years' experience in the IT industry and is currently doing research in machine learning and related areas, along with his tenure at SS Consulting, Kochi. He can be reached at somanathtv@gmail.com.



Using jq to Consume JSON in the Shell
This article is a tutorial on using jq as a JSON parser and fetching information
about the weather from different cities.

JSON has become the most prevalent way of consuming Web APIs. If you try to find the API documentation of a popular service, chances are that the API will respond in JSON format. Many mainstream languages even have JSON parsers built in. But when it comes to shell scripting, there is no inbuilt JSON parser, and the only hackish way of processing JSON is with a combination of awk and sed, which are very painful to use.
There are many JSON parsers apart from jq but, in this article, we will focus only on this option.

Installation
jq is a single binary program with no dependencies, so installation is as simple as downloading the binary from https://stedolan.github.io/jq/, copying it into /bin or /usr/bin and setting permissions. Many Linux distributions provide jq in their repositories, so installing it is as easy as using the following command:

sudo apt install jq

...or:

sudo pacman -S jq

Installation instructions may vary depending upon the distribution. Detailed instructions are available at https://stedolan.github.io/jq/download/.

Usage
For this demonstration, version 1.5 of jq was used. All the code examples are available at https://github.com/jatindhankhar/jq-tutorial. jq can be used in conjunction with other tools like cat and curl, by piping, or it can read directly from a file, although the former is more popular in practice. When working with jq, two fantastic resources can be used. The first one is the documentation at https://stedolan.github.io/jq/manual/, and the second is the Online Playground (https://jqplay.org/), where one can play with jq and even share snippets.
Throughout this article, we will use different API endpoints of the MetaWeather API (https://www.metaweather.com/api). The simplest use of jq is to pretty format JSON data.
Let's fetch the list of cities that contain the word 'new' in them, and then use this information to further fetch the details of a particular city, as follows:

curl -sS https://www.metaweather.com/api/location/search/?query=new

The above command will fetch all cities containing 'new' in their name. At this point, the output is not formatted:

[{"title":"New York","location_type":"City","woeid":2459115,"latt_long":"40.71455,-74.007118"},{"title":"New Delhi","location_type":"City","woeid":28743736,"latt_long":"28.643999,77.091003"},{"title":"New Orleans","location_type":"City","woeid":2458833,"latt_long":"29.953690,-90.077713"},{"title":"Newcastle","location_type":"City","woeid":30079,"latt_long":"54.977940,-1.611620"},{"title":"Newark","location_type":"City","woeid":2459269,"latt_long":"40.731972,-74.174179"}]

Let's pretty format the data by piping the curl output to jq, as follows:

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq

The screenshot shown in Figure 1 compares the output of both commands.
Now that we have some data to work upon, we can use jq to filter the keys. The simplest filter available is '.', which does nothing and filters the whole document as it is. Filters are passed to jq in single quotes. By looking at the output, we can see that all the objects are trapped inside a JSON array. To filter out the array, we use .[], which will display all the items inside it. To target a specific item by index, we place the index number inside the brackets, as in .[0].
To display the first item, use the following code:

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[0]'

{
  "title": "New York",
  "location_type": "City",
  "woeid": 2459115,
  "latt_long": "40.71455,-74.007118"
}

To display only the available cities, we add another filter, which is the key name itself (in our case, .title). We can combine multiple filters using the | (pipe) operator. Here we combine the .[] filter with .title in this way: .[] | .title. For simple queries, we can avoid the | operator and rewrite it as .[] .title, but we will use the | operator to combine queries.

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[] | .title'

"New York"
"New Delhi"
"New Orleans"
"Newcastle"
"Newark"

But what if we want to display multiple keys together? Just separate them by ','. Now, let's display each city along with its ID (woeid):

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[] | .title,.woeid'

"New York"
2459115
"New Delhi"
28743736
"New Orleans"
2458833
"Newcastle"
30079
"Newark"
2459269

Figure 1: Output comparison
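jq can also emit delimited text directly. As a small aside (not part of the original set of examples), the built-in @tsv format string turns an array into a single tab-separated row, which pairs nicely with the -r (raw output) option:

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq -r '.[] | [.title, .woeid] | @tsv'

Each city is printed with its woeid on one line, without the surrounding quotes.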




Figure 2: Basic filters

The output looks good, but what if we want to format the output and print it on a single line, as a sentence? For that, we can use string interpolation. To use keys inside a string pattern, we use a backslash and parentheses so that they are not executed.

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[] | "For \(.title) code is \(.woeid)"'

"For New York code is 2459115"
"For New Delhi code is 28743736"
"For New Orleans code is 2458833"
"For Newcastle code is 30079"
"For Newark code is 2459269"

In our case, the JSON is small, but if it is too big and we need to filter it based on a key value (like displaying the information for New Delhi only), jq provides the select keyword for that operation.

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[] | select(.title == "New Delhi")'

{
  "title": "New Delhi",
  "location_type": "City",
  "woeid": 28743736,
  "latt_long": "28.643999,77.091003"
}

Now that we have the Where on Earth ID (woeid) for New Delhi, we can retrieve more information about the city using the endpoint https://www.metaweather.com/api/location/woeid/. The JSON structure for this endpoint looks like what's shown in Figure 3. consolidated_weather contains an array of JSON objects with weather information, and the sources key contains an array of JSON objects from which the particular weather information was fetched.

Figure 3: JSON structure

This time, let's store the JSON in a file named weather.json instead of directly piping the data. This will help us avoid making an API call every time we want to perform an operation; instead, we can use the saved JSON, as follows:




curl -sS https://www.metaweather.com/api/location/28743736/ > weather.json

Now we can use jq in the format jq 'filters' weather.json. We can also load the filters from a file using the -f parameter, with the command jq -f filters.txt weather.json, but we can just load the JSON file and pass the filters on the command line.
Let's list the weather followed by the source name. Since both sources and consolidated_weather are of the same length (get the length using the length filter), we can use range to generate an index and use string interpolation. There are inbuilt filters like transpose and map as well; covering all of them won't be possible in a single article.
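As a quick taste of map, which applies a filter to every element of an array, here is a small illustrative aside (not part of the original set of examples; the_temp is a field of MetaWeather's consolidated_weather entries):

jq '[.consolidated_weather[].the_temp] | map(. * 9 / 5 + 32)' weather.json

This collects the forecast temperatures and converts them from Celsius to Fahrenheit. Coming back to the range-based approach, the command below prints each source alongside its predicted weather state: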

jq 'range(0;([.sources[]] | length)) as $i | " \(.sources[$i].title) predicts \(.consolidated_weather[$i].weather_state_name)"' weather.json

" BBC predicts Light Cloud"
" Forecast.io predicts Clear"
" Met Office predicts Clear"
" OpenWeatherMap predicts Clear"
" World Weather Online predicts Clear"
" Yahoo predicts Clear"

There are many more functions and filters, but we will use the sort_by and date functions, and end this article by printing the forecast for each day in ascending order.

# Format Date
# This function takes its value via the pipe (|) operator
def format_date(x):
  x | strptime("%Y-%m-%d") | mktime | strftime("%a - %d, %B");

def print_location:
  . | "
  Location: \(.title)
  Coordinates : \(.latt_long) ";

def print_data:
  . | "
  ------------------------------------------------
  | \(format_date(.applicable_date))\t\t |
  | Humidity : .\(.humidity)\t\t |
  | Weather State: \(.weather_state_name)\t\t\t |
  ------------------------------------------------";

def process_weather_data:
  . | sort_by(.applicable_date)[] | print_data;

. as $root | print_location, (.consolidated_weather | process_weather_data)

Save the above code as filter.txt. sort_by sorts the values by date. format_date takes a date as its parameter and extracts the short day name, the date and the month. print_location and print_data do not take any parameters and can be applied after the pipe operator; the default parameter for a parameterless function is '.'.

jq -f filter.txt weather.json -r

-r makes jq return raw strings. The output is shown in Figure 4.

Figure 4: Final output

I hope this article has given you an overview of all that jq can achieve. If you are looking for a tool that is easy to use in shell scripts, jq can help you out; so give it a try.

References
[1] https://stedolan.github.io/jq/manual/v1.5/
[2] https://github.com/jatindhankhar/jq-tutorial
[3] https://www.metaweather.com/api/
[4] https://jqplay.org/

By: Jatin Dhankhar
The author loves working with modern C++, Ruby, JavaScript and Haskell. He can be reached at jatin@jatindhankhar.in.




Developing Real-Time Notification and Monitoring Apps in IBM Bluemix
IBM Bluemix is a cloud PaaS that supports numerous programming languages
and services. It can be used to build, run, deploy and manage applications on the
cloud. This article guides the reader in building a weather app as well as an app to
remotely monitor vehicle drivers.

Cloud computing is one of the emerging research technologies today. With the wide use of sensor and wireless based technologies, the cloud has expanded to the Cloud of Things (CoT), which is the merger of cloud computing and the Internet of Things (IoT). These technologies provide for the transmission and processing of huge amounts of information on different channels with different protocols. This integration of technologies is generating volumes of data, which then has to be disseminated for effective decision making in multiple domains like business analytics, weather forecasting, location maps, etc.
A number of cloud service providers deliver cloud based services and application programming interfaces (APIs) to users and developers. Cloud computing has different paradigms and delivery models including IaaS, PaaS, SaaS, Communication as a Service (CaaS) and many others. A cloud computing environment that uses a mix of cloud services is known as a hybrid cloud. There are many hybrid clouds, which differ on the basis of the types of services and features in their cloud environment. The prominent cloud service providers include IBM Bluemix, Amazon Web Services (AWS), Red Hat OpenShift, Google Cloud Platform, etc.

IBM Bluemix and cloud services
IBM Bluemix is a powerful, multi-featured, hybrid cloud environment that delivers assorted cloud services, APIs and development tools without any complexities. In general, IBM Bluemix is used as a Platform as a Service (PaaS), as it has programming platforms for almost all applications. It provides programming platforms for PHP, Java, Go, Ruby, Node.js, ASP.NET, Tomcat, Swift, Ruby Sinatra, Python, Scala, SQL databases, NoSQL platforms, and many others. Function as a Service (FaaS) is also integrated in IBM Bluemix along with serverless computing, leading to a higher degree of performance and accuracy. IBM Bluemix's services began in 2014 and gained popularity within three years.

Creating real-time monitoring apps in IBM Bluemix
IBM Bluemix presents high performance cloud services with the integration of the Internet of Things (IoT) so that real-time applications can be developed for corporate as well as personal use. Different types of applications can be programmed for remote monitoring by using IBM and third party services. In the following scenarios, the implementation aspects of weather monitoring and vehicle driver behaviour analysis are covered.

Creating a weather notification app
A weather notification app can be easily created with IBM Bluemix so that real-time messages can be delivered to the client effectively. It can be used in real-life scenarios during trips and for other purposes. Those planning to travel to a particular place can get real-time weather notifications.




Figure 1: Login panel for IBM Bluemix
Figure 2: Creating an IoT app in IBM Bluemix
Figure 3: Selecting Weather Services in IBM Bluemix
The remote analytics of the weather, including temperature,
humidity and other parameters can be received by travellers,
and they can take the appropriate precautions.
First, a new IBM ID needs to be created on https://
console.bluemix.net/. IBM Bluemix provides users a
30-day trial, without need for credit card authentication.
Most other cloud services ask for international credit
card authentication. Users of IBM Bluemix, on the other
hand, can create, program and use the services with just
a unique e-mail ID.
After creating an account and logging in to the
Bluemix panel, there are many cloud based services which can be programmed for real-time use. IBM Bluemix is a hybrid cloud that delivers many types of cloud applications including infrastructure, containers, VMware, network management, storage, the Internet of Things, data analytics, high performance computation, mobile applications and Web apps.
To create a weather notification app, just search the IBM Bluemix catalogue for the Internet of Things boilerplates. The name of the new cloud app can then be specified. The other values and parameters can be left at their defaults, as we are working on a free trial account. Once the Cloud Foundry app is created, the next step is to connect this app with the Weather Services in IBM Bluemix. Search for Weather Company Data and IBM Bluemix will display the appropriate option. Using Weather Company Data, live temperature and weather parameters can be fetched effectively.
The newly created app will be visible in the dashboard of IBM Bluemix and, automatically, a Cloudant database will be created here. This database will be used to store and process the data in JSON format. IBM Cloudant is a cloud based database product of the non-relational, distributed variety.

Figure 4: Dashboard of IBM Bluemix with the weather notification app

Using the Node-RED editor, the overall flow of the process can be programmed and visualised for any type of service and process. Node-RED is a flow based integrated development tool created by IBM so that objects and their connections can be set up without complex programming. Once the connections are created with the IoT app and the weather service, the different channels, including the input, the weather services and the transformation of the temperature format, can be set up in Node-RED. After the successful deployment and launch of the service, the output will be visible in the user panel. The conditions related to different categories of weather (cloudy, sunny, rainy, humid, etc) can be specified using the Node-RED editor.
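To give a feel for the 'transformation of temperature format' step, a Node-RED function node carries a small JavaScript body that rewrites msg.payload before passing it on. The snippet below is a minimal sketch, not taken from the article's flow; the field name temp_f is a hypothetical example, since the actual key depends on the payload the weather service emits:

// Node-RED function node: convert Fahrenheit to Celsius
// 'temp_f' is a hypothetical field name, used here only for illustration
var f = msg.payload.temp_f;
msg.payload.temp_c = (f - 32) * 5 / 9; // attach the Celsius value
return msg; // forward the transformed message to the next node in the flow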

Creating an app for vehicles and to remotely monitor drivers
By using another IoT application, the multi-dimensional behaviour of vehicle drivers can be analysed remotely using IBM Bluemix on the Watson IoT platform.

Figure 5: Editing the Node-RED editor
Figure 8: Selecting the driver behaviour service in IoT

Figure 6: Message display panel

Figure 7: Creating a context mapping app for driver behaviour analysis


Figure 9: Credentials and tenant information
The following behavioural aspects of car drivers can be monitored remotely:
• Speed
• Braking attitude
• Sharp use of brakes
• Smooth or harsh accelerations
• Sharp turns
• Frequent U-turns
You will first need to visit the IoT Context Mapping Service in the IBM Bluemix console. The following credentials and authentication fields will be displayed after creating the service:
• Tenant ID
• User name
• Password
It should be noted that we have to select the free plan in every panel to avoid the charges for the cloud services provided by IBM Bluemix.
After getting the tenant information and authentication tokens, the next step is to create remote devices with information about their unique identities. Every sensor based device has a unique ID, which is set in the IBM Watson IoT platform so that live monitoring from remote places can be done. The Watson platform provides options to enter different devices with the help of 'vehicles' and gateways. By using this approach, real-time signals are transmitted to satellites and then to the end user for taking further action and decisions.

Figure 10: Creating remote devices with the related information

After setting all the parameters with respect to devices and gateways, click on Deploy and wait for the 'successfully deployed' notification.

Twilio for live messaging
If there is a need for live messaging to a mobile phone, Twilio can be used. Using the Twilio APIs, messages can be delivered to smartphones and other endpoints. Twilio provides APIs and authentication tokens, which can also be mapped in IBM Bluemix for live monitoring of weather or other IoT based applications.

By: Dr Gaurav Kumar
The author is the MD of Magma Research and Consultancy Pvt Ltd, Ambala. He is associated with various academic and research institutes, where he delivers lectures and conducts technical workshops on the latest technologies and tools. He can be contacted at kumargaurav.in@gmail.com.




Regular Expressions in Programming Languages: The JavaScript Story
Each programming language has its own way of parsing regular expressions. We have
looked at how regular expressions work with different languages in the earlier four articles
in this series. Now we explore regular expressions in JavaScript.

In the previous issue of OSFY, we tackled pattern matching in PHP using regular expressions. PHP is most often used as a server-side scripting language, but what if your client doesn't want to bother the server with all the work? Well, then you have to process regular expressions at the client side with JavaScript, which is almost synonymous with 'client-side scripting language'. So, in this article, we'll discuss regular expressions in JavaScript.
Though, technically, JavaScript is a general-purpose programming language, it is often used as a client-side scripting language to create interactive Web pages. With the help of JavaScript runtime environments like Node.js, JavaScript can also be used on the server side. However, in this article, we will discuss only the client-side scripting aspects of JavaScript because we have already discussed regular expressions in the server-side scripting language PHP. Just as with PHP in the previous article in this series, you will mostly see JavaScript code embedded inside HTML script. As mentioned earlier in the series, limited knowledge of HTML syntax will in no way affect the understanding of the regular expressions used in JavaScript.
Though we are mostly interested in the use of regular expressions, as always, let's begin with a brief discussion on the syntax and history of JavaScript. JavaScript is an interpreted programming language. ECMAScript is a scripting language specification from the European Computer Manufacturers Association (ECMA) and the International Organization for Standardization (ISO), standardised in ECMA-262 and ISO/IEC 16262 for JavaScript. JavaScript was introduced by Netscape Navigator (now defunct) in 1995; soon Microsoft followed with its own version of JavaScript, which was officially named JScript. The first edition of ECMAScript was released in June 1997 in an effort to settle the disputes between Netscape and Microsoft regarding the standardisation of JavaScript. The latest edition of ECMAScript, version 8, was released in June 2017. All modern Web browsers support JavaScript with the help of a JavaScript engine that is based on the ECMAScript specification. Chrome V8, often known simply as V8, is an open source JavaScript engine developed by Google for the Chrome browser. Even though JavaScript has borrowed a lot of syntax from Java, do remember that JavaScript is not Java.




Standalone JavaScript applications
Now that we have some idea about the scope and evolution of JavaScript, the next obvious question is, can it be used to develop standalone applications rather than only being used as an embedded scripting language inside HTML scripts? Well, anything is possible with computers and yes, JavaScript can be used to develop standalone applications. But whether it is a good idea to do so or not is debatable. Anyway, there are many different JavaScript shells that allow you to run JavaScript code snippets directly. But, most often, this is done during testing and not for developing useful standalone JavaScript applications.
Like standalone PHP applications, standalone JavaScript applications are also not very popular because there are other programming languages more suitable for developing standalone applications. JSDB, JLS, JSPL, etc, are some JavaScript shells that will allow you to run standalone JavaScript applications. But I will use Node.js, which I mentioned earlier, to run our standalone JavaScript file first.js, which contains the following single line of code:

console.log('This is a stand-alone application');

Open a terminal in the directory containing the file first.js and execute the following command:

node -v

This will make sure that Node.js is installed in your system. If Node.js is not installed, install it, and then execute the following command at the terminal to run the script first.js:

node first.js

The message 'This is a stand-alone application' is displayed on the terminal. Figure 1 shows the output of the script first.js. This and all the other scripts discussed in this article can be downloaded from opensourceforu.com/article_source_code/December17Javascript.zip

Figure 1: Standalone application in JavaScript

Hello World in JavaScript
Whenever someone discusses programming languages, it is customary to begin with 'Hello World' programs; so let us not change that tradition. The code below shows the 'Hello World' script hello.html in JavaScript:

<html>
<body>
<script>
alert('Hello World');
</script>
</body>
</html>

Now let us try to understand the code. The HTML part of the code is straightforward and needs no explanation. All the JavaScript code should be placed within the <script> tags (<script> and </script>). In this example, the following code uses the alert( ) function to display the message 'Hello World' in a dialogue box:

alert('Hello World');

To view the effect of the JavaScript code, open the file using any Web browser. I have used Mozilla Firefox for this purpose. Figure 2 shows the output of the file hello.html. Please note that a file containing JavaScript code alone can have the extension .js, whereas an HTML file with embedded JavaScript code will have the extension .html or .htm.

Figure 2: Hello World in JavaScript

Regular expressions in JavaScript
There are many different flavours of regular expressions used by various programming languages. In this series we have discussed two of the very popular regular expression styles. The Perl Compatible Regular Expressions (PCRE) style is very popular, and we have seen regular expressions in this style being used when we discussed the programming languages Python, Perl and PHP in some of the previous articles in this series. But we have also discussed the ECMAScript style of regular expressions when we discussed regular expressions in C++. If you refer to that article on regular expressions in C++, you will come across some subtle differences between PCRE and the ECMAScript style of regular expressions.




JavaScript also uses ECMAScript style regular expressions. JavaScript's support for regular expressions is built in and is available for direct use. Since we have already dealt with the syntax of the ECMAScript style of regular expressions, we can directly work with a simple JavaScript file containing regular expressions.

JavaScript with regular expressions
Consider the script called regex1.html shown below. To save some space I have only shown the JavaScript portion of the script and not the HTML code. But the complete file is available for download.

<script>
var str = "Working with JavaScript";
var pat = /Java/;
if(str.search(pat) != -1) {
document.write('<b>MATCH FOUND</b>');
} else {
document.write('<b>NO MATCH</b>');
}
</script>

Open the file regex1.html in any Web browser and you will see the message 'MATCH FOUND' displayed on the Web page in bold text. Well, this is an anomaly, since we did not expect a match. So, now let us go through the JavaScript code in detail to find out what happened. The following line of code stores a string in the variable str:

var str = "Working with JavaScript";

The line of code shown below creates a regular expression pattern and stores it in the variable pat:

var pat = /Java/;

Regular expression patterns are specified as characters within a pair of forward slash ( / ) characters. Here, the regular expression pattern specifies the word Java. The RegExp object is used to specify regular expression patterns in JavaScript. This regular expression can also be defined with the RegExp( ) constructor using the following line of code:

var pat = new RegExp("Java");

This is instead of the line of code:

var pat = /Java/;

A script called regex2.html with this modification is available for download. The output of the script regex2.html is the same as that of regex1.html. The next few lines of code involve an if-else block. The following line of code uses the search( ) method provided by the String object:

if(str.search(pat) != -1)

The search( ) method takes a regular expression pattern as an argument, and returns either the position of the start of the first matching substring or -1 if there is no match. If a match is found, the following line of code inside the if block prints the message 'MATCH FOUND' in bold:

document.write('<b>MATCH FOUND</b>');

Otherwise, the following line of code inside the else block prints the message 'NO MATCH' in bold:

document.write('<b>NO MATCH</b>');

Remember, the search( ) method searches for a substring match and not for a complete word. This is the reason why the script reports 'MATCH FOUND'. If you are interested in a literal search for the word Java, then replace the line of code:

var pat = /Java/;

…with:

var pat = /\sJava\s/;

The script with this modification, regex3.html, is also available for download. The notation \s is used to denote a whitespace; this pattern makes sure that the word Java is present in the string on its own and not just as a substring in words like JavaScript, Javanese, etc. If you open the script regex3.html in a Web browser, you will see the message 'NO MATCH' displayed on the Web page.

Methods for pattern matching
In the last section, we saw the search( ) method provided by the String object. The String object also provides three other methods for regular expression processing. The methods are replace( ), match( ) and split( ). Consider the script regex4.html shown below, which uses the method replace( ):




<html>
<body>
<form id="f1">
ENTER TEXT HERE: <input type="text" name="data" >
</form>
<button onclick="check( )">CLICK</button>
<script>
function check( ) {
var x = document.getElementById("f1");
var text = "";
text += x.elements[0].value;
text = text.replace(/I am/i,"We are");
document.write(text);
}
</script>
</body>
</html>

Open the file regex4.html in a Web browser and you will see a text box to enter data and a Submit button. If you enter a string like 'I am good', you will see the output message 'We are good' displayed on the Web page. Let us analyse the code in detail to understand how it works. There is an HTML form which contains the text box to enter data, with a button that, when pressed, will call a JavaScript method called check( ). The JavaScript code is placed inside the <script> tags. The following line of code gets the elements in the HTML form:

var x = document.getElementById("f1");

In this case, there is only one element in the HTML form, the text box. The following line of code reads the content of the text box into the variable text:

text += x.elements[0].value;

The following line of code uses the replace( ) method to test for a regular expression pattern and, if a match is found, the matched substring is replaced with the replacement string:

text = text.replace(/I am/i,"We are");

In this case, the regular expression pattern is /I am/i and the replacement string is We are. If you observe carefully, you will see that the regular expression pattern is followed by an 'i'. Well, we came across similar constructs throughout the series. This 'i' is an example of a regular expression flag, and this particular one instructs the regular expression engine to perform a case-insensitive match. So, you will get a match whether you enter 'I AM', 'i am' or even 'i aM'.
There are other flags also, like g, m, etc. The flag g will result in a global match rather than stopping after the first match. The flag m is used to enable the multi-line mode. Also note the fact that the replace( ) method did not replace the contents of the variable text; instead, it returned the modified string, which was then explicitly stored in the variable text. The following line of code writes the contents on to the Web page:

document.write(text);

Figure 3 shows the input for the script regex4.html and Figure 4 shows the output.

Figure 3: Input to regex4.html
Figure 4: Output of regex4.html

A method called match( ) is also provided by the String object for regular expression processing. search( ) returns the starting index of the matched substring, whereas the match( ) method returns the matched substring itself. What will happen if we replace the line of code:

text = text.replace(/I am/i,"We are");

…in regex4.html with the following code?

text = text.match(/\d+/);

If you open the file regex5.html, which has this modification, enter the string 'article part 5' in the text box and press the Submit button, you will see the number '5' displayed on the Web page. Here the regular expression pattern is /\d+/, which matches one or more occurrences of a decimal digit.
Another method provided by the String object for regular expression processing is the split( ) method. This breaks the string on which it was called into an array of substrings, using the regular expression pattern as the separator. For example, replace the line of code:

text = text.replace(/I am/i,"We are");

…in regex4.html with the code:

text = text.split(".");

…to obtain regex6.html.
If you open the file regex6.html, enter the IPv4 address 192.100.50.10 in dotted-decimal notation in the text box and press the Submit button, the IPv4 address will be displayed as '192, 100, 50, 10'. The IPv4 address string is split into substrings based on the separator '.' (dot).
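As a small aside (not one of the article's downloadable scripts), the g flag mentioned earlier can be combined with the match( ) method to collect every occurrence of a pattern instead of only the first one. Reusing the same IPv4 address:

<script>
var text = "192.100.50.10";
// with the g flag, match( ) returns an array of all the matches
document.write(text.match(/\d+/g)); // writes: 192,100,50,10
</script>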




String processing with regular expressions
In previous articles in this series we mostly dealt with regular expressions that processed numbers. For a change, in this article, we will look at some regular expressions to process strings. Nowadays, computer science professionals from India face difficulties in deciding whether to use American English spelling or British English spelling while preparing technical documents. I always get confused with colour/color, programme/program, centre/center, pretence/pretense, etc. Let us look at a few simple techniques to handle situations like this.
For example, the regular expression /colo(?:u)?r/ will match both the spellings 'color' and 'colour'. The question mark symbol ( ? ) is used to denote zero or one occurrence of the preceding group of characters. The notation (?:u) groups u with the grouping operator ( ) and the notation ?: makes sure that the matched substring is not stored in memory unnecessarily. So, here a match is obtained with and without the letter u.
What about the spellings 'programme' and 'program'? The regular expression /program(?:me)?/ will accept both these spellings. The regular expression /cent(?:re|er)/ will accept both the spellings, 'center' and 'centre'. Here the pipe symbol ( | ) is used as an alternation operator.
What about words like 'biscuit' and 'cookie'? In British English the word 'biscuit' is preferred over the word 'cookie', and the reverse is the case in American English. The regular expression /(?:cookie|biscuit)/ will accept both the words, 'cookie' and 'biscuit'. The regular expression /preten[cs]e/ will match both the spellings, 'pretence' and 'pretense'. Here the character class operator [ ] is used in the regular expression pattern to match either the letter c or the letter s.
I have only discussed specific solutions to the problems mentioned here so as to make the regular expressions very simple. But with the help of complicated regular expressions it is possible to solve many of these problems in a more general way rather than solving individual cases. As mentioned earlier, C++ also uses ECMAScript style regular expressions; so any regular expression pattern we have developed in the article on regular expressions in C++ can be used in JavaScript without making any modifications.
Just like the pattern followed in the previous articles in this series, after a brief discussion on the specific programming language, in this case, JavaScript, we moved on to the use of the regular expression syntax in that language. This should be enough for practitioners of JavaScript, who are willing to get their hands dirty by practising with more regular expressions. In the next part of this series on regular expressions, we will discuss the very powerful programming language, Java, a distant cousin of JavaScript.

By: Deepu Benson
The author is a free software enthusiast and his area of interest is theoretical computer science. He maintains a technical blog at www.computingforbeginners.blogspot.in. He can be reached at deepumb@hotmail.com.




Insights into Machine Learning


Machine learning is a fascinating study. If you are a beginner or simply curious
about machine learning, this article covers the basics for you.

Machine learning is a set of methods by which computers make decisions autonomously. Using certain techniques, computers make decisions by considering or detecting patterns in past records and then predicting future occurrences. Different types of predictions are possible, such as about weather conditions and house prices. Apart from predictions, machines have learnt how to recognise faces in photographs, and even filter out email spam. Google, Yahoo, etc, use machine learning to detect spam emails. Machine learning is widely implemented across all types of industries. If programming is used to achieve automation, then we can say that machine learning is used to automate the process of automation.
In traditional programming, we use data and programs on computers to produce the output, whereas in machine learning, data and output are run on the computer to produce a program. We can compare machine learning with farming or gardening, where the seeds are the algorithms, the nutrients are the data, and the gardener and the plants are the programs.
We can say machine learning enables computers to learn to perform tasks even though they have not been explicitly programmed to do so. Machine learning systems crawl through the data to find patterns and, when these are found, adjust the program's actions accordingly. With the help of pattern recognition and computational learning theory, one can study and develop algorithms (which can be built by learning from the sets of available data), on the basis of which the computer takes decisions. These algorithms are driven by building a model from sample records. These models are used in developing decision trees, through which the system takes all its decisions. Machine learning programs are also structured in such a way that when exposed to new data, they learn and improve over time.

Implementing machine learning
Before we understand how machine learning is implemented in real life, let's look at how machines are taught. The process of teaching machines is divided into three steps.
1. Data input: Text files, spreadsheets or SQL databases are fed as input to machines. This is called the training data for a machine.
2. Data abstraction: Data is structured using algorithms to represent it in simpler and more logical formats. Elementary learning is performed in this phase.
3. Generalisation: An abstract of the data is used as input to develop the insights. Practical application happens at this stage.
The success of the machine depends on two things:
• How well the generalisation of the abstracted data happens.
• The accuracy of machines when translating their learning into practical usage for predicting the future set of actions.
In this process, every stage helps to construct a better version of the machine.
Now let's look at how we utilise the machine in real life. Before letting a machine perform any unsupervised task, the five steps listed below need to be followed.




Figure 1: Traditional programming vs machine learning
Figure 2: The process of teaching machines

Collecting data: Data plays a vital role in the machine learning process. It can be from various sources and formats like Excel, Access, text files, etc. The higher the quality and quantity of the data, the better the machine learns. This is the base for future learning.
Preparing the data: After collecting the data, its quality must be checked, and unnecessary noise and disturbances that are not of interest should be eliminated from it. We need to take steps to fix issues such as missing data and the treatment of outliers.
Training the model: The appropriate algorithm is selected in this step and the data is represented in the form of a model. The cleaned data is divided into training data and testing data. The training data is used to develop the data model, while the testing data is used as a reference to ensure that the model has been trained well enough to produce accurate results.
Model evaluation: In this step, the accuracy and precision of the chosen algorithm is ensured based on the results obtained using the test data. This step is used to evaluate the choice of the algorithm.
Performance improvement: If the results are not satisfactory, then a different model can be chosen to implement the same, or more variables are introduced to increase efficiency.

Figure 3: Implementing machine learning
Figure 4: Classification of algorithms

Types of machine learning algorithms
Machine learning algorithms have been classified into three major categories.
Supervised learning: Supervised learning is the most commonly used. In this type of learning, algorithms produce a function which predicts the future outcome based on the input given (historical data). The name itself suggests that it generates output in a supervised fashion. So these predictive models are given instructions on what needs to be learnt and how it is to be learnt. Until the model achieves some acceptable level of efficiency or accuracy, it iterates over the training data.
To illustrate this method, we can use the algorithm for sorting apples and mangoes from a basket full of fruits. Here we know how we can identify the fruits based on their colour, shape, size, etc. Some of the algorithms we can use here are the neural network, nearest neighbour, Naïve Bayes, decision trees and regression.

Figure 5: Supervised learning model (Image credit: Google)
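To make the apples-and-mangoes illustration concrete, here is a minimal sketch of supervised learning in Python with a decision tree. It uses scikit-learn, a popular open source library (chosen here for brevity; it is not among the tools listed later in this article), and invented toy data in which each fruit is described by its weight in grams and a colour code:

from sklearn.tree import DecisionTreeClassifier

# Invented toy training data: [weight in grams, colour code (0 = red, 1 = yellow)]
features = [[150, 0], [170, 0], [130, 1], [140, 1]]
labels = ['apple', 'apple', 'mango', 'mango']

# Training: the labelled examples act as the supervision
clf = DecisionTreeClassifier()
clf.fit(features, labels)

# Prediction for a previously unseen fruit
print(clf.predict([[160, 0]]))  # expected output: ['apple']

The classifier infers the decision rules from the labelled examples on its own; the same fit-then-predict pattern applies to the other supervised algorithms mentioned above.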




Figure 6: Unsupervised learning model (Image credit: Google)

Unsupervised learning: The objective of unsupervised learning algorithms is to represent the hidden structure of the data set in order to learn more about the data. Here, we only have input data with no corresponding output variables. Unsupervised learning algorithms develop descriptive models, which approach the problems irrespective of the knowledge of the results. So it is left to the system to find out the pattern in the available inputs, in order to discover and predict the output. From many possible hypotheses, the optimal one is used to find the output.
Sorting apples and mangoes from a basket full of fruits can be done using unsupervised learning too. But this time the machine is not aware of the differentiating features of the fruits, such as colour, shape and size. We need to find similar features of the fruits and sort them accordingly. Some of the algorithms we can use here are the K-means clustering algorithm and hierarchical clustering.
Reinforcement learning: In this learning method, ideas and experiences supplement each other and are also linked with each other. Here, the machine trains itself based on the experiences it has had and applies that knowledge to solving problems. This saves a lot of time, as very little human interaction is required in this type of learning. It is also called the trial-error or association analysis technique, whereby the machine learns from its past experiences and applies its best knowledge to make decisions. For example, a doctor with many years of experience links a patient's symptoms to an illness based on that experience. So whenever a new patient comes, he uses his experience to diagnose the illness of the patient. Some of the algorithms we can use here are the Apriori algorithm and the Markov decision process.

Machine learning applications
Machine learning has ample applications in practically every domain. Some major domains in which it plays a vital role are shown in Figure 7.

Figure 7: Machine learning applications

Banking and financial services: Machine learning plays an important role in identifying customers for credit card offers. It also evaluates the risks involved with those offers. And it can even predict which customers are most likely to be defaulters in repaying loans or credit card bills.
Healthcare: Machine learning is used to diagnose fatal illnesses from the symptoms of patients, by comparing them with the history of patients with a similar medical history.
Retail: Machine learning helps to spot the products that sell. It can differentiate between the fast selling products and the rest. That analysis helps retailers to increase or decrease the stocks of their products. It can also be used to recognise which product combinations can work wonders. Amazon, Flipkart and Walmart all use machine learning to generate more business.
Publishing and social media: Some publishing firms use machine learning to address the queries and retrieve documents for their users based on their requirements and preferences. Machine learning is also used to narrow down search results and news feeds. Google and Facebook are the best examples of companies that use machine learning. Facebook also uses machine learning to suggest friends.
Games: Machine learning helps to formulate strategies for a game that requires the internal decision tree style of thinking and effective situational awareness. For example, we can build intelligent bots that learn as they play computer games.
Face detection/recognition: The most common example of face detection is this feature being widely available in smartphone cameras. Facial recognition has even evolved to the extent that the camera can figure out when to click – for instance, only when there is a smile on the face being photographed. Face recognition is used in Facebook to automatically tag people in photos. It's machine learning that has taught systems to detect a particular individual in a group photo.
Genetics: Machine learning helps to identify the genes associated with any particular disease.




Machine learning tools
There are enough open source tools or frameworks available to implement machine learning on a system. One can choose any of them, based on personal preferences for a specific language or environment.
Shogun: Shogun is one of the oldest machine learning libraries available in the market. It provides a wide range of efficient machine learning processes. It supports many languages such as Python, Octave, R, Java/Scala, Lua, C#, Ruby, etc, and platforms such as Linux/UNIX, MacOS and Windows. It is easy to use, and is quite fast at compilation and execution.
Weka: Weka is data mining software that has a collection of machine learning algorithms to mine the data. These algorithms can be applied directly to the data or called from Java code. Weka is a collection of tools for:
• Regression
• Clustering
• Association rules
• Data pre-processing
• Classification
• Visualisation
Apache Mahout: Apache Mahout is a free and open source project. It is used to build an environment to quickly create scalable machine learning algorithms for fields such as collaborative filtering, clustering and classification. It also supports Java libraries and Java collections for various kinds of mathematical operations.
TensorFlow: TensorFlow performs numerical computations using data flow graphs. It performs optimisations very well. It supports Python and C++, is highly flexible and portable, and also has diverse language options.
CUDA-Convnet: CUDA-Convnet is a machine learning library widely used for neural network applications. It has been developed in C++ and can even be used by those who prefer Python over C++. The resulting neural nets obtained as output from this library can be saved as Python-pickled objects, and those objects can be accessed from Python.
H2O: This is an open source machine learning as well as deep learning framework. It is developed using Java, Python and R, and it is used to control training due to its powerful graphic interface. H2O's algorithms are mainly used for business processes like fraud or trend predictions.

Languages that support machine learning
The languages given below support the implementation of machine learning:
• MATLAB
• R
• Python
• Java
But for a non-programmer, Weka is highly recommended when working with machine learning algorithms.

Advantages and challenges
The advantages of machine learning are:
• Machine learning helps the system to make decisions based on the training data provided, even in a dynamic or undetermined state.
• It can handle multi-dimensional, multi-variety data, and can extract implicit relationships within large data sets in a dynamic, complex and chaotic environment.
• It saves a lot of time by tweaking, adding, or dropping different aspects of an algorithm to better structure the data.
• It also uses continuous quality improvement for any large or complex process.
• There are multiple iterations that are done to deliver the highest level of accuracy in the final model.
• Machine learning allows easy application and comfortable adjustment of parameters to improve classification performance.
The challenges of machine learning are as follows:
• A common challenge is the collection of relevant data. Once the data is available, it has to be pre-processed depending on the requirements of the specific algorithm used, which has a serious effect on the final results.
• Machine learning techniques are such that it is difficult to optimise non-differentiable, discontinuous loss functions. Discontinuous loss functions are important in cases such as sparse representations. Non-differentiable loss functions are approximated by smooth loss functions without much loss in sparsity.
• It is not guaranteed that machine learning algorithms will always work in every possible case. It requires some awareness of the problem and also some experience in choosing the right machine learning algorithm.
• The collection of such large amounts of data can sometimes be an unmanageable and unwieldy task.

References
[1] https://electronicsforu.com/technology-trends/machine-learning-basics-newbies/3
[2] https://www.analyticsvidhya.com/blog/2015/06/machine-learning-basics/
[3] https://martechtoday.com/how-machine-learning-works-150366
[4] https://machinelearningmastery.com/basic-concepts-in-machine-learning/

By: Palak Shah
The author is an associate consultant in Capgemini and loves to explore new technologies. She is an avid reader and writer of technology related articles. She can be reached at palak311@gmail.com.




A Peek at Popular and Preferred Open Source Web Development Tools
Web development tools allow developers to test the user interface of a website or a
Web application, apart from debugging and testing their code. These tools should not
be mistaken for Web builders and IDEs.

Ever wondered why different commercial software applications such as eBay, Amazon or various social platforms like Facebook, Twitter, etc, were initially developed as Web applications? The obvious answer is that users can easily access or use different Web applications whenever they feel like it, with only the Internet. This is what helps different online retail applications lure customers to their products. There is no need to install Web applications specifically on a given system, and the user need not even worry about the platform dependency associated with that application. Apart from these, there are many other factors that make Web applications very user friendly, which we will discuss as we go along.
A Web application is any client-server software application that makes use of a website as its interface or front-end. The user interface for any Web application runs in an Internet browser. The main function of any browser is to display the information received from a server and also to send the user's data back to the server. Let's consider the instance of Microsoft Word and Google Docs. The former is a common desktop based word-processing application which uses the MS Word software installed on the desktop. Google Docs is also a word processing application, but all its users perform the word processing functions using the Web browser on which it runs, instead of using software installed on their computers.
Different Web applications use Web documents, which are written in a standard format such as JavaScript and HTML. All these formats are supported by a number of Web browsers. Web applications can actually be considered variants of the client-server software model, where the client software is downloaded to the client system when the relevant Web page is visited, using different standard procedures such as HTTP. There can be different client Web software updates, which may take place whenever we visit the Web page. During any session, the Web browser interprets and then displays the pages, and hence acts as the universal client for any Web application.




Figure 1: Different stages in Web application development (Image source: googleimages.com)

Web application development since its inception
Initially, each individual Web page was delivered as a static document to the client, but the sequence of the pages could still provide an interactive experience, since a user's input was returned using the Web form elements present in the page markup. In the 1990s, Netscape came up with a client-side scripting language named JavaScript. Programmers could now add some dynamic elements to the user interface, which ran on the client side. Just after the arrival of Netscape, Macromedia introduced Flash, which is actually a vector animation player that can be added to different browsers as a plugin to insert different animations on Web pages. It even allows the use of scripting languages to program the interactions on the client side, without communicating with the server.
Next, the concept of a Web application was introduced in the Java programming language in the Servlet Specification (version 2.2). This was when the XMLHttpRequest object had also been introduced on Internet Explorer 5 as an ActiveX object. After all this, Ajax came in, and applications like Gmail made their client sides more interactive. Now, a Web page script is able to actually contact the server for retrieving and storing data without downloading the entire Web page. We should not forget HTML5, which was developed to provide multimedia and graphic capabilities without any need for client side plugins. The APIs and the Document Object Model (DOM) are fundamental parts of the HTML5 specification. The WebGL API led the way for advanced 3D graphics using the HTML5 canvas and the JavaScript language. If we talk about the current situation, we have different programming languages like Python and PHP (apart from Java), which can be used to develop any Web application. We also have different frameworks and open source tools that really help in developing a full-fledged Web application quite easily. So let's discuss a few such tools as we go forward.

Figure 2: Benefits of Web application development (Image source: googleimages.com)

But first, let's discuss why Web applications are needed and what their benefits are.
1. Cost-effective development: Different users can access any Web application via a uniform environment, which is nothing but a Web browser. The interaction of the user with the application needs to be tested on different Web browsers, but the application need only be developed for a single operating system. As it is not necessary to develop and test the application on all possible operating system (OS) versions and configurations, it makes the development and troubleshooting tasks much easier.
2. Accessible anywhere: Web applications are accessible anywhere, anytime and on any PC, with the help of an Internet connection. This also opens up the possibilities of using Web applications for real-time collaboration and accessing them remotely.
3. Easily customisable: It is easier to customise the user interface of Web based applications than that of desktop applications. Hence, it's easier to update and customise the look and feel of Web applications, or the way their information is presented to different user groups.
4. Can be accessed by a range of devices: The content of any Web application can also be customised for use on any type of device connected to the Internet. This helps to access the application through mobile phones, tablets, PCs, etc. Users can receive or interact with the information in a way that best suits them.
5. Improved interoperability: The Web based architecture of an application makes it possible to easily integrate enterprise systems, and improve workflow and other such business processes. With the help of Internet technologies, we get an adaptable and flexible business model that can be changed according to market demands.
6. Easier installation and maintenance of the application: Any Web based application can be installed and maintained with comparatively fewer complications. Once a new version of the application is installed on the host server, all users can access it directly without any need to upgrade the PC of each user. The roll-out of new software can also be accomplished easily, requiring only that the users have updated browsers and plugins.

www.OpenSourceForU.com | OPEN SOURCE FOR YOU | DECEMBER 2017 | 83


Developers Overview

7. Can adapt to increased workloads: If a Web application requires higher power to perform certain tasks, then only the server hardware needs to be upgraded. The capacity of a Web application can be increased by 'clustering' the software on different servers simultaneously.
8. Increased security: Web applications are deployed on dedicated servers, which are maintained and monitored by experienced server administrators. This leads to tighter security, and any potential breaches are noticed far more quickly.
9. Flexible core technologies: We can use any of the available three core technologies for building Web applications, based on the requirements of that application. Java-based solutions (J2EE) involve technologies such as servlets and JSP. The recent Microsoft .NET platform makes use of SQL Server, Active Server Pages and .NET scripting languages. The last option is the open source platforms (PHP and MySQL), which are best suited for smaller and low budget websites.

Figure 2: Benefits of Web application development (Image source: googleimages.com)

Web application development since its inception
Initially, each individual Web page was delivered as a static document to the client, but the sequence of the pages could still provide an interactive experience, since a user's input was returned using Web form elements present in the page markup. In the 1990s, Netscape came up with a client-side scripting language named JavaScript. Programmers could now add some dynamic elements to the user interface, which ran on the client side. Just after the arrival of Netscape, Macromedia introduced Flash, which is actually a vector animation player that can be added to different browsers as a plugin to insert different animations on the Web pages. It even allows the use of scripting languages to program the interactions on the client side, without communicating with the server.
Next, the concept of a Web application was introduced in the Java programming language in the Servlet Specification (version 2.2). This was when the XMLHttpRequest object had also been introduced on Internet Explorer 5 as an ActiveX object. After all this, Ajax came in, and applications like Gmail made their client sides more interactive. Now, a Web page script is able to actually contact the server for retrieving and storing data without downloading the entire Web page.
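To make this Ajax-style interaction concrete, here is a minimal JavaScript sketch; the endpoint URL and the element ID are made up for illustration, and are not taken from any particular application:

    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/api/messages');   // hypothetical server endpoint
    xhr.onload = function () {
        if (xhr.status === 200) {
            // Update just one element instead of reloading the whole page
            document.getElementById('inbox').textContent = xhr.responseText;
        }
    };
    xhr.send();

Only the element named 'inbox' changes; the rest of the page stays as it is, which is exactly what made applications like Gmail feel so responsive.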
We should not forget HTML5, which was developed to provide multimedia and graphic capabilities without any need for client side plugins. The APIs and the Document Object Model (DOM) are fundamental parts of the HTML5 specification. The WebGL API led the way for advanced 3D graphics using the HTML5 canvas and the JavaScript language. If we talk about the current situation, we have different programming languages like Python and PHP (apart from Java), which can be used to develop any Web application. We also have different frameworks and open source tools that really help in developing a full-fledged Web application quite easily. So let's discuss a few such tools as we go forward.

Figure 1: Different stages in Web application development (Image source: googleimages.com)

Different open source tools for developing Web applications

KompoZer
This is an open source HTML editor, which is based on the Nvu editor. It's maintained as a community-driven software development project, and is a project on Sourceforge. KompoZer's WYSIWYG (what you see is what you get) editing capabilities are among its main attractions. The latest of its pre-release versions is KompoZer 0.8 beta 3. Its stable version was KompoZer 0.7.10, released in August 2007. It complies with the Web standards of the W3C. By default, the Web pages are created in accordance with HTML 4.01 Strict. It uses Cascading Style Sheets (CSS) for styling purposes, but the user can even change the settings and choose between the following styling options:
• HTML 4.01 and XHTML 1.0
• Strict and transitional DTD
• CSS styling and the old <font> based styling
KompoZer can actually call on the W3C HTML validator, which uploads different Web pages to the W3C Markup Validation Service and then checks for compliance.

Features
1. Available free of cost.
2. Easy to use, hence even non-techies can work with it.
3. Combines Web file management and easy-to-use WYSIWYG Web page editing.
4. Allows direct code editing.
5. Supports a split code/graphic view.

phpMyAdmin
phpMyAdmin is an open source software tool written in PHP. It handles the administration of MySQL over the Web. It supports a wide range of different operations on MariaDB and MySQL. Some of the frequently used operations (such as managing databases, relations, tables, indexes, users, etc) can be performed with the help of the user interface, while we can still directly execute any of the SQL statements.

Figure 3: Login page for phpMyAdmin (Image source: googleimages.com)

Features
1. Has an intuitive Web interface.
2. Imports data from SQL and CSV.
3. Can administer multiple servers.
4. Has the ability to search globally in a database or a subset of it.
5. Can create complex queries using QBE (Query-by-Example).
6. Can create graphics of our database layout in various formats.
7. Can export data to various formats like SQL, CSV, PDF, etc.
8. Supports most of the MySQL features such as tables, views, indexes, etc.

XAMPP
In XAMPP, the X denotes 'cross-platform', A stands for the Apache HTTP server, M for MySQL, and the two Ps for PHP and Perl. This platform is very popular, and is widely preferred for open source Web application development. Developing a Web application using XAMPP helps to easily stack together a number of different programs in order to constitute an application as desired. The best part of Web applications developed using XAMPP is that these are open source with no licensing required, and are free to use. They can be customised according to one's requirements. Although XAMPP can be installed on all the different platforms, its installation file is specific to a platform.


Features
1. Can be installed on all operating systems.
2. Easy installation and configuration.
3. Live community support.
4. Makes it easy to combine the operating system, application server, programming language and database used, so that any open source Web application can be developed for the desired outcome in an optimal development time.
5. It is an all-in-one solution, with just one control panel for installing and configuring all the packaged programs.

Firefox Web Developer Toolbar
The Firefox Developer Toolbar gives us command-line access to a large number of developer tools within Firefox. It's a graphical command line interpreter, which provides integrated help for its commands and also displays rich output with the power of a command line. It is considered to be extensible, as we can add our own local commands and even convert those into add-ons so that others can also install and use them.
We can open the Developer Toolbar by pressing Shift+F2. This will appear at the bottom of the browser, as shown in Figure 4. The command-line prompt takes up most of the toolbar, with the 'Close' button on its left and a button to toggle the Toolbox on the right. Pressing Shift+F2 again, or selecting the Developer Toolbar menu item, will dismiss the toolbar.

Some Developer Toolbar commands
console     – Open, close, and clear the console
appcache    – View and manipulate appcache entries
dbg         – Command to control the debugger
calllog     – Log function calls to the console
addon       – List all the installed add-ons, enable or disable a specific add-on
break       – List, add, or remove breakpoints
disconnect  – Disconnect from a remote server
edit        – Edit one of the resources loaded by the page
export      – Export the page
inspect     – Examine a node in the inspector

Figure 4: Firefox Developer Toolbar present at the bottom of the browser (Image source: googleimages.com)
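As an illustration of this command-line style, a few commands typed at the toolbar prompt might look like the following; the CSS selector and the script name here are hypothetical examples, not part of any particular page:

    help                        (list all available commands)
    console clear               (clear the Web console)
    inspect div.main-menu       (open the inspector on a matching node)
    break add line app.js 15    (set a breakpoint in a loaded script)

Each command prints its rich output directly below the prompt, and integrated help for any command is available via 'help <command>'.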

References
[1] http://www.wikipedia.org/
[2] http://www.magicwebsolutions.co.uk/
[3] https://www.rswebsols.com/
[4] http://binarysemantics.com/

By: Vivek Ratan

The author is a B.Tech in electronics and instrumentation engineering. He currently works as an automation test engineer at Infosys, Pune. He can be reached at ratanvivek14@gmail.com for any suggestions or queries.




Simplify and Speed Up App Development with OpenShift
OpenShift is Red Hat’s container application platform that brings Docker and
Kubernetes into play when deployed. With OpenShift, you can easily and quickly
build, develop and deploy applications, irrespective of the platform being used. It is
an example of a Platform as a Service.

Platform as a Service or PaaS is a cloud computing service model that reduces the complexity of building and maintaining the computing infrastructure. It gives an easy and accessible environment to create, run and deploy applications, saving developers all the chaotic work such as setting up, configuring and managing resources like servers and databases. It speeds up app development, allowing users to focus on the application itself rather than worry about the infrastructure and runtime environment.
Initially, PaaS was available only on the public cloud. Later, private and hybrid PaaS options were created. Hybrid PaaS is typically a deployment consisting of a mix of public and private deployments. PaaS services available in the cloud can be integrated with resources available on the premises. PaaS offerings can also include facilities for application design, application development, testing and deployment. PaaS services may include team collaboration, Web service integration, marshalling, database integration, security, scalability, storage, persistence, state management, application versioning and developer community facilitation, as well as mechanisms for service management, such as monitoring, workflow management, discovery and reservation.
There are some disadvantages of using PaaS. Every user may not have access to the full range of tools, or to high-end tools like the relational database. Another problem is that PaaS is open only for certain platforms. Users need to depend on the cloud service providers to update the tools and to stay in sync with other changes in the platform. They don't have control over this aspect.

OpenShift
OpenShift is an example of a PaaS and is offered by Red Hat. It provides an API to manage its services. OpenShift Origin allows you to create and manage containers. OpenShift helps you to develop, deploy and manage applications which are container-based, and enables faster development and release life cycles. Containers are standalone processes, with their own environment, and are not dependent on the operating system or the underlying infrastructure on which they run.

Types of OpenShift services
OpenShift Origin: OpenShift Origin is an open source application container platform from Red Hat, which has been released under the Apache licence. It uses the core of Docker container packaging and Kubernetes container cluster management, which enables it to provide services to create and manage containers easily. Essentially, it helps you to create, deploy and manage applications in containers that are independent of the operating system and the underlying infrastructure.
OpenShift Online: This is Red Hat's public cloud service.
OpenShift Dedicated: As its name suggests, this is Red Hat's offering for maintaining private clusters. Red Hat OpenShift Dedicated provides support for application images, database images, Red Hat JBoss middleware for OpenShift, and Quickstart application templates. Users can get this on the Amazon Web Services (AWS) and Google Cloud Platform (GCP) marketplaces.
OpenShift Container: OpenShift Container or OpenShift Enterprise is a private PaaS product from Red Hat. It includes the best of both worlds—containers powered by Docker and the management provided by Kubernetes. Red Hat announced OpenShift Container Platform 3.6 on August 9, 2017.

Figure 1: Types of OpenShift services

Figure 2: OpenShift Dedicated supports application images, database images, Red Hat JBoss middleware for OpenShift, and Quickstart application templates

OpenShift features
• In OpenShift, we can create applications by using programming languages such as Java, Node.js, .NET, Ruby, Python and PHP.
• OpenShift also provides templates that allow you to build (compile and create packages) and release application frameworks and databases.
• It provides service images and templates of JBoss middleware. These are available as a service on OpenShift. A user can build (compile and create packages) applications and deploy them across different environments.
• OpenShift provides full access to a private database copy with full-fledged control, as well as a choice of datastores like MariaDB, MySQL, PostgreSQL, MongoDB, Redis and SQLite.
• Users can benefit from a large community of Docker-formatted Linux containers. OpenShift has the capability to work directly with the Docker API, and unlocks a new world of content for developers.
• Simple methods are used to deploy to OpenShift, such as clicking a button or entering a Git push command.
• OpenShift is designed to reduce many systems administration problems related to building and deploying containerised applications. It permits the user to fully control the deployment life cycle.
• The OpenShift platform includes Jenkins, which is an open source automation server that can be used for continuous integration and delivery. It can integrate unit test case results, promote builds, and orchestrate build jobs. This is done by using upstream and downstream jobs, and creating a pipeline of jobs.
• OpenShift supports integration with IDEs such as Eclipse, Visual Studio and JBoss Developer Studio. It is easy to use any of these IDEs and work with OpenShift.

Getting started with OpenShift Online
Let's take a quick tour of OpenShift Online. Go to https://manage.openshift.com.

Figure 3: OpenShift Online dashboard
Figure 4: OpenShift Online login

Click on Login and log in by using any social media account.
Sign in to GitHub (Figure 5).
Click on Authorize redhat-developer (Figure 6).

Figure 5: OpenShift Online – sign in to GitHub

Figure 6: OpenShift Online – authorize Red Hat developer

Provide your account details. Then verify the email address using your email account. Next, select a starter plan, followed by the region you want. Then confirm the subscription. Now your account will be provisioned.
On the OpenShift Online dashboard, click on Create Project and, on the Welcome to OpenShift page, provide the name and display name. Next, click on Create. Select the language from the Browse Catalogue option. Select Red Hat JBoss Web Server (Tomcat). Select the version, and provide the name and the Git repository URL.
Next, click on Create. You will get the 'Application created' message. Click on Continue to overview. Go to the Overview section of the project created and verify the details related to the application.
By following the above steps, you have created your first project. Now, you can continue exploring further to get a better understanding of OpenShift Online.
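The same kind of project can also be created from a terminal with the oc command line client that OpenShift provides. A minimal sketch follows; the server URL, token, project name and repository here are placeholders rather than values from this walkthrough:

    oc login https://api.example.openshift.com --token=<your-token>
    oc new-project demo-project                        # create a project to hold the app
    oc new-app https://github.com/openshift/nodejs-ex  # build and deploy straight from a Git repo
    oc status                                          # check what was created

The 'oc new-app' command inspects the repository, picks a suitable builder image, and wires up the build and deployment for you, which mirrors what the Web console does behind the scenes.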

References
[1] https://www.openshift.com/
[2] https://www.openshift.org/

By: Bhagyashri Jain and Mitesh S.

Bhagyashri Jain is a systems engineer and loves Android development. She likes to read and share daily news on her blog at http://bjlittlethings.wordpress.com.
Mitesh S. is the author of the book, 'Implementing DevOps with Microsoft Azure'. He occasionally contributes to https://clean-clouds.com and https://etutorialsworld.com. Book link: https://www.amazon.com/DevOps-Microsoft-Visual-Studio-Services-ebook/dp/B01MSQWO4w.



Cloud Foundry is an industry standard cloud application platform.


Developers can use it to build apps without having to worry about the nitty
gritty of hardware and software maintenance. By focusing solely on the
application, they can be more productive.

Cloud Foundry is an open source, Platform as a Service (PaaS) offering, governed by the Cloud Foundry Foundation. You can deploy it on AWS, Azure, GCP, vSphere or your own computing infrastructure.

The different landscapes for applications
Let's take a step back and quickly check out all the landscapes for applications. If you want an application to cater to one of your needs, there are several ways of getting it to do so.
1. Traditional IT: In a traditional landscape, you can procure your infrastructure, manage all the servers, handle the data and build applications on top of it. This gives you the most control, but also adds operational complexity and cost.
2. Infrastructure as a Service (IaaS): In this case, you can buy or lease the infrastructure from a service provider, install your own operating system, programming runtimes, databases, etc, and build your custom applications on top of it. Examples include AWS, Azure, etc.
3. Platform as a Service (PaaS): With this, you get a complete platform from the service provider, with the hardware, operating system and runtimes managed by the service provider; you can build applications on top of it. Examples include Cloud Foundry, OpenShift, etc.
4. Software as a Service (SaaS): Here, the service provider already has a pre-built application running on the cloud; if it suits your needs, you just get a subscription and use it. There might be a pre-existing application to meet your needs, but if there isn't, this offering provides very little room for customisation. Examples include Gmail, Salesforce, etc.

Why opt for a PaaS offering like Cloud Foundry?
Choosing a PaaS offering has multiple benefits. It abstracts away the hardware and infrastructure details, so your workforce can concentrate more on application development, and you require very few operations to be managed by the IT team. This leads to faster turnaround times for your applications and better cost optimisation. It also helps in rapid prototyping: as you have the platform taken care of, you can build prototypes around your business problems more rapidly.

Cloud Foundry: A brief description
Cloud Foundry is multi-cloud, open source software that can be hosted on AWS, Azure, GCP or your own stack. Since Cloud Foundry is open source, you get application portability out-of-the-box, i.e., you are not locked in to a vendor. You can build your apps on it and move them across any of the Cloud Foundry providers.
The Cloud Foundry project is managed by the Cloud Foundry Foundation, whose mission is to establish and sustain the development of the platform and to provide continuous innovation and value to the users and operators of Cloud Foundry. The members of the Cloud Foundry Foundation include Cisco, Dell, EMC, GE, Google, IBM, Microsoft, Pivotal, SAP, SUSE and VMware.
From a developer's point of view, Cloud Foundry has support for buildpacks and services. Buildpacks provide the framework and runtime support for apps. Typically, they examine your apps to determine what dependencies to download, and how to configure the apps to communicate with bound services.

Services are often externally managed components that may or may not be hosted on the Cloud Foundry stack (examples include databases, caches, etc). They are available in the marketplace, and can be consumed by the application by binding to them.

Figure 1: Cloud landscapes

Figure 2: Cloud Foundry architecture (routing, authentication, app life cycle, app storage and execution, services, messaging, and metrics and logging)
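To make the buildpack and service workflow concrete, a typical deploy-and-bind sequence with the cf command line client looks roughly like the sketch below; the app name, service offering, plan and instance names are placeholders:

    cf push myapp                          # detect a buildpack, stage and run the app
    cf marketplace                         # list services offered by this installation
    cf create-service p-mysql 100mb mydb   # provision a service instance from the marketplace
    cf bind-service myapp mydb             # inject the service credentials into the app
    cf restage myapp                       # restage so the binding takes effect

The buildpack takes care of the runtime during 'cf push', while binding hands the app its service credentials, so the application code never needs hard-wired connection details.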

Cloud Foundry architecture
Cloud Foundry components can be broadly classified under the following categories.
1. Routing
• Router: This is the entry point into the Cloud Foundry (CF) instance. CF provides REST services for administration purposes too. So, the call is initially received by the router and redirected to the Cloud Controller, if it's an administration call, or to an application running on the stack.
2. Authentication
• User account and authentication (UAA) server: The role of the UAA server is to log in users and issue OAuth tokens for those logged in, which can be used by the applications. It can also provide SSO services, and has endpoints for registering OAuth clients and user management functions.
3. Application life cycle
• Cloud Controller: The Cloud Controller is responsible for the deployment of applications. When you push an app to CF, it reaches the Cloud Controller, which coordinates with other components and deploys the application on individual cells in your space.
• Application availability components (nsync, BBS and Cell Reps): These components are responsible for the health management of the applications. They constantly monitor an application's state and reconcile it with the expected state, starting and stopping processes as required.
4. App storage and execution
• BLOB storage: This is binary storage, which stores your application binaries and the buildpacks that are used to run the applications.
• Application execution framework (Diego): Application instances, application tasks and staging tasks all run as Garden containers on the Diego cell VMs. The Diego cell rep component manages the life cycle of those containers and the processes running in them. It reports their status to the Diego BBS, and emits their logs and metrics to Loggregator.
5. Services
• Service broker: Cloud Foundry has external components like a database, SaaS applications and platform features (e.g., the platform can offer services for analytics or machine learning), which are classified as services. These services can be bound to an application and be consumed by it. The service broker is responsible for provisioning an instance of the service and binding it to the application.
6. Messaging
• The platform's component VMs communicate with each other and share messages over HTTP and HTTPS. This component is responsible for sharing messages, and also for storing long-lived data (like the IP address of a container in a Consul server) and short-lived data (like application status and heartbeat messages) on Diego's bulletin board system.
7. Metrics and logging
• Metrics collector: This collects statistics from the components, which are used for book-keeping and health management by the framework as well as by the operators managing the infrastructure.
• Loggregator: Applications built on top of the Cloud Foundry stack need to write their logs to the system output streams. These streams are received by the Loggregator, which can be used to redirect them to file systems, databases or external log management services.
Cloud Foundry is a highly scalable, easy-to-manage, open source platform that can be used to develop applications of all types and sizes. To get further information about the ecosystem, you can visit https://www.cloudfoundry.org.

By: Shiva Saxena

The author is a FOSS enthusiast. He currently works as a consultant, and is involved in developing enterprise application and Software-as-a-Service (SaaS) products. He has hands-on development experience with Android, Apache Camel, C#, .NET, Hadoop, HTML5, Java, OData, PHP, React, etc, and loves to explore new and bleeding-edge technologies. He can be reached at shivasaxena@outlook.com.


DokuWiki: An Ace Among the Wikis


A wiki could be defined as a simple, collaborative content management system which
allows content editing. DokuWiki can be used for several purposes like a discussion
forum for members of a team, providing tutorials and guides, providing knowledge
about a topic, and so on.

DokuWiki is PHP powered, modest but versatile wiki software that handles all the data in plain text format, so no database is required. It has a clear and understandable structure for your data, which allows you to manage your wiki without hassles. DokuWiki is really flexible and offers various customisation options at different levels. Since DokuWiki is open source, it has a huge collection of plugins and templates which extend its functionality. It is also well documented and supported by a vast community.
Although DokuWiki has numerous features, this article focuses only on the basics, in order to get readers started.

Pages
Pages can be easily created in DokuWiki by simply launching it in your browser. The first time, you will be shown a page like the one in Figure 1.

Figure 1: Sample blank page

Find the pencil icon on the right side; clicking on it will open up an editor, and that's it. You can start writing content on that page. The various sections of the page can be identified by the headings provided on it. The sections are listed out as a 'Table of Contents' on the top right side of the page.
Once you have finished typing the content on the page, you can preview it by clicking the 'Preview' button at the bottom of the editor. This will show you how your page will be displayed after it is published.
Pages can be removed by purging all their contents. Once a page is removed, all its interlinked pages too get removed. But we can restore a page by choosing its 'Old revisions' option. This option stores snapshots of the page over different time periods, so it is easy to restore a page with its contents from a particular time period.

Figure 2: Old revisions of a page

Namespaces
Typically, a wiki may contain lots of pages. It is important to organise the pages so that the information the user seeks can be found easily. Namespaces serve this purpose by keeping all the relevant pages in one place. The following namespaces are bundled along with the standard DokuWiki installation:
• wiki
• playground
It is recommended that if you want to try anything before getting into a live production environment, use the 'playground' namespace.
To create a namespace, use the following syntax:

namespace_name:page_name

If the defined namespace doesn't exist, DokuWiki automatically creates it without any break in linkage with the rest of the wiki.
To delete a namespace, simply erase all of its pages, which leads to empty namespaces; DokuWiki automatically deletes these.

Links
The linkage between the pages is vital for any wiki site. This 'linkability' keeps the information organised and easily accessible. By the effective use of links, the pages are organised in a concise manner. DokuWiki supports the following types of links in a page.
External links: These links deal with external resources, i.e., websites. You can use a complete URL for a website such as www.duckduckgo.com, or you can add alternative text for that website like [[https://www.duckduckgo.com | search with DuckDuckGo]]. Also, we can link an email ID by enclosing it with angled brackets, for example <admin@localhost.com>.
Internal links: Internal links point to the pages within the wiki. To create an internal link, just enclose the page name within square brackets. The colour of the page link shows the availability of the page. If the link is in green, then the page is available. And if the link is red, then the page is unavailable.
Sectional links: Sectional links are used to link different sections of a page by using the '#' character followed by the section name, enclosed by double square brackets on both sides. For example: [[#section|current section]]

Figure 3: Examples of links
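Putting these pieces together, a small hypothetical page in a 'projects' namespace might combine the different link types like this (the page, namespace and section names are made up for illustration):

====== Project Notes ======
See the [[projects:webapp|web app page]] in this namespace,
an external resource at [[https://www.dokuwiki.org|dokuwiki.org]],
a section on this page such as [[#tasks|the tasks section]],
and a contact address: <admin@localhost.com>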


Media
Media files can be added to your wiki by clicking the 'picture frame' icon (next to the smiley in the editor). On clicking the icon, a new dialogue box will pop up, and by clicking the 'Select files' button you can select the files for your wiki. When you're finished, click the 'Upload' button and, once it is uploaded, the image will show up in the dialogue box. You can further customise the image by clicking on it. This will display the 'Link settings' option. You can define how the image is linked to your wiki, the alignment, and the size of your image. After customising the image, click on the 'Insert' button to insert the image into the page.
You can add metadata to your image by using the 'Media manager' option and choosing the 'Edit' option for the particular image.
DokuWiki supports the following media formats:

Images: gif, jpg, png
Video: webm, ogv, mp4
Audio: ogg, mp3, wav
Flash: swf

Sometimes, the media file you added may not be recognised by your browser; so it is recommended that you add media files in the multiple formats prescribed above, so that any one of the formats can be recognised by your browser.

Figure 4: Image uploaded

Figure 5: Link settings

Access control lists
ACL or Access Control Lists are one of the core features of DokuWiki. They define the access rights of the wiki for its users. There are seven types of permissions that can be assigned to the users, which are Read, Edit, Create, Upload, Delete, None and Admin. Of these, 'Delete' is the highest permission and 'Read' is the lowest, so if the 'Delete' permission is assigned to users, they can have the 'Read', 'Edit', 'Create' and 'Upload' permissions as well.
Admin permissions are assigned to the members of the admin group, and surpass all the other permissions. Also, the members of the admin group are considered 'super users', so regardless of the ACL restrictions, they can have unlimited access to the wiki.
Please note that the 'Create', 'Upload' and 'Delete' permissions can be applied to namespaces only.
To change the access rules for a page or a namespace, follow the steps given below:
1. Log in to DokuWiki as the admin.
2. Now click on the gear wheel icon next to the admin label at the top of the page.
3. Choose the 'Access control list management' option on the administration page.
4. On the left side you will see the available namespaces and their pages. You just have to click on your choice and select the group or the user by supplying the respective name.
5. For each page or namespace, the current permissions for the selected group or user will be displayed. Below that, you can change the permissions and save them. These will get updated in the ACL rules table.
DokuWiki determines the access rights for each user by the following constraints:
1. DokuWiki will check all the permission rules against a person or the group to which s/he belongs, but the catch lies with the permission rule: the rule closer to the namespace:page level takes precedence, and this will determine the access for that person.
2. When more than one rule is matched on the same level, the rule with the highest permission will be chosen.
Please note that users in DokuWiki are compiled into groups. Before a user is manually added to any group by the administrator, all users will belong to the following groups:


• @user: All registered users will belong to this group.
• @ALL: Every user of DokuWiki falls into this group, including registered and non-registered users.
Let's assume you want to create a namespace for your team in Washington and name it washington_team. Now we want this namespace to be accessible only to your team members. To achieve that, you will add the team members to a user group @wa_dc. Now let's analyse the ACL rule table definition for the namespace washington_team.

Figure 6: ACL rule table for the Washington namespace

The first line of the table tells us that any user, including the registered and non-registered users, can only read the contents of the root namespace.
The second line of the table shows us that the registered users of the wiki can have 'Upload' access, which enables additional accesses for them such as 'Read', 'Edit' and 'Create'.
The third line tells us that users in the group @ALL cannot access this namespace. This line specifically restricts access for all, including the intended users.
The fourth line shows that the user group @wa_dc can have 'Upload' access. This line is the continuation of the third line, since we have now successfully made the namespace washington_team exclusive to the user group @wa_dc.
The fifth line shows us that 'Admin' has been given 'Delete' access to the namespace washington_team; hence, the admin can have full and unrestricted access to the namespace.
Now let's create two pages—tasks and swift_private—in washington_team to see how ACL rules are applied to different users (Figure 7).

Figure 7: ACL rule table for the Washington namespace and its pages

User 1:
Name: Stella Ritchie, a non-registered user
User group: @ALL
For this user, in the rule table, the first rule and the third one match, but the third rule matches on the namespace level, so her access to washington_team is None.

User 2:
Name: Priscilla Hilpert, a registered user
User group: @wa_dc
For Priscilla, Rules 1, 2, 4, 6 and 8 match. Priscilla can have access to the namespace washington_team via Rule 4.
Rule 8 shows that Priscilla has 'Read only' access to the page washington_team:tasks, since this permission was set at the namespace level.
Rule 6 shows that Priscilla is prohibited from accessing the page washington_team:swift_private, since the permission for accessing this page is set to None.

User 3:
Name: Sylvester Swift
User group: @wa_dc
For Sylvester, Rules 1, 2, 4, 7 and 8 match. Rule 4 enables Sylvester to access washington_team. Rule 7 gives him Edit and Read access to the page washington_team:swift_private. Rule 8 shows that Sylvester has 'Read only' access to the page washington_team:tasks.
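For reference, DokuWiki keeps these rules in the conf/acl.auth.php file, one rule per line. A rule table along the lines of the one described above might look roughly like this; the group names follow the example, and the numbers are DokuWiki's permission levels (0 = none, 1 = read, 2 = edit, 4 = create, 8 = upload, 16 = delete):

    *                  @ALL     1
    *                  @user    8
    washington_team:*  @ALL     0
    washington_team:*  @wa_dc   8
    washington_team:*  @admin   16

The ACL management screen described earlier simply edits this file for you, so the table you see in the admin UI and the file contents always stay in step.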
Customisation
The default look of DokuWiki can be changed by choosing a template from the template collection that is available for download. It is recommended that you download the template version that is equal to or more recent than your DokuWiki version.
DokuWiki offers a huge collection of plugins. As of now, it has more than 900 plugins that are available for download, each of which extends the functions of DokuWiki. Plugin installation can be automated by using the extension manager; or you could do it manually by downloading the package to your computer and uploading it via the extension manager. Please note that the plugins are contributed by the user community and may not be properly reviewed by the DokuWiki team. Always look out for the warnings and update information on a plugin's page to avoid problems.

By: Magimai Prakash

The author has a B.E. degree in computer science. As he is deeply interested in Linux, he spends most of his leisure time exploring open source.


The Connect Between Deep Learning and AI
Deep learning is a sub-field of machine learning and is
related to algorithms. Machine learning is a kind of artificial
intelligence that provides computers with the ability to learn,
without explicitly programming them.

Deep learning is a new area of machine learning research, which has been introduced with the objective of moving machine learning closer to one of its original goals—artificial intelligence (AI). Deep learning is the sub-field of machine learning that is concerned with algorithms. Its structure and function is inspired by that part of the human brain called neural networks. It is the work of well-known researchers like Andrew Ng, Geoff Hinton, Yann LeCun, Yoshua Bengio and Andrej Karpathy which has brought deep learning into the spotlight. If you follow the latest tech news, you may have even heard about how important deep learning has become among big companies, such as:
• Google buying DeepMind for US$ 400 million
• Apple and its self-driving car
• NVIDIA and its GPUs
• Toyota's billion dollar AI research investments
All of this tells us that deep learning is really gaining in importance.

Neural networks
The first thing you need to know is that deep learning is about neural networks. The structure of a neural network is like any other kind of network; there is an interconnected web of nodes, which are called neurons, and there are edges that join them together. A neural network's main function is to receive a set of inputs, perform progressively complex calculations, and then use the output to solve a problem. This series of events, starting from the input, where each activation is sent to the next layer and then the next, all the way to the output, is known as forward propagation, or forward prop.
The first neural nets were born out of the need to address the inaccuracy of an early classifier, the perceptron. It was shown that by using a layered web of perceptrons, the accuracy of predictions could be improved. This new breed of neural nets was called a multi-layer perceptron or MLP.
You may have guessed that the prediction accuracy of a neural net depends on its weights and biases. We want the accuracy to be high, i.e., we want the neural net to predict a value that is as close to the actual output as possible, every single time. The process of improving a neural net's accuracy is called training, just like with other machine learning methods. Here's that forward prop again: to train the net, the output from forward prop is compared to the output that is known to be correct, and the cost is the difference of the two. The point of training is to make that cost as small as possible, across millions of training examples. Once trained well, a neural net has the potential to make accurate predictions each time. This is a neural net in a nutshell (refer to Figure 1).
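As a rough sketch of this idea (the inputs, target and learning rate below are made-up numbers, and a real net would have many neurons and examples), forward prop and a cost-reducing weight update for a single sigmoid neuron can be written in a few lines of Python with NumPy:

    import numpy as np

    # One training example: three inputs and one known correct output
    x = np.array([0.5, 0.1, 0.9])
    target = 1.0

    np.random.seed(0)
    w = np.random.randn(3)   # weights
    b = 0.0                  # bias

    for step in range(100):
        y = 1 / (1 + np.exp(-(w @ x + b)))     # forward prop through a sigmoid neuron
        cost = (y - target) ** 2               # cost: squared difference from the known output
        grad = 2 * (y - target) * y * (1 - y)  # gradient of the cost w.r.t. the pre-activation
        w -= 0.5 * grad * x                    # nudge weights and bias to shrink the cost
        b -= 0.5 * grad

    print(round(float(cost), 6))  # the cost shrinks towards zero as training proceeds

Repeating exactly this compare-and-nudge loop over millions of examples, and over many layers at once, is what 'training a neural net' means in practice.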


Simple Neural Network Deep Learning Neural Network net, a platform is the best way to go. We’ll also look at
two machine learning software platforms called H2O, and
GraphLab Create, both of which offer deep learning tools.
H2O: This started out as an open source machine learning
platform, with deep nets being a recent addition. Besides a set
of machine learning algorithms, the platform offers several
useful features, such as data pre-processing. H2O has built-
Input Layer Hidden Layer Output Layer
in integration tools for platforms like HDFS, Amazon S3,
Figure 1: Deep learning and neural networks SQL and NoSQL. It also provides a familiar programming
environment like R, Python, JSON, and several others to
Still, as the patterns get even more complex, neural access the tools, as well as to model or analyse data with
networks with a small number of layers can become unusable. Tableau, Microsoft Excel, and R Studio. H2O also provides
The reason is that the number of nodes necessary in each a set of downloadable software packages, which you’ll need
layer grows exponentially with the number of possible to deploy and manage on your own hardware infrastructure.
patterns in the data. Eventually, training becomes way too H2O offers a lot of interesting features, but the website can be
expensive and the accuracy starts to suffer. So for an intricate a bit confusing to navigate.
pattern – like an image of a human face, for example – basic GraphLab: The deep learning project requires graph
classification engines and shallow neural nets simply aren’t analytics and other vital algorithms, and hence Dato’s
good enough. The only practical choice is a deep net. GraphLab Create can be a good choice. GraphLab is a
But what enables a deep net to distinguish these complex software platform that offers two different types of deep
patterns? The key is that deep nets are able to break the nets depending on the nature of your input data – one is a
multifaceted patterns down into a series of simpler patterns. convolutional net and the other is a multi-layer perceptron.
For example, let’s say that a net has to decide whether or The convolutional net is the default one. It also provides
not an image contains a human face. A deep net would first graph analytics tools, which is unique among deep net
use edges to detect different parts of the face – the lips, platforms. Just like the H2O platform, GraphLab provides
nose, eyes, ears, and so on – and would then combine the a great set of data mugging features. It provides built-in
results together to form the whole face. This important integration for SQL databases, Hadoop, Spark, Amazon S3,
feature – using simpler patterns as building blocks to Pandas data frames, and many others. GraphLab also offers
detect complex patterns – is what gives deep nets their an intuitive UI for model management. A deep net platform
strength. These nets have now become very accurate and, can be selected based on your project.
in fact, a deep net from Google recently beat a human at a
pattern recognition challenge. Deep learning is gaining popularity
Deep learning is a topic that is making big waves at the
What is a deep net platform? moment. It is basically a branch of machine learning that
A platform is a set of tools that other people can build on top uses algorithms to, among other things, recognise objects and
of. For example, think of the applications that can be built off understand human speech. Scientists have used deep learning
the tools provided by iOS, Android, Windows, MacOS, IBM algorithms with multiple processing layers to make better
Websphere and even Oracle BEA. Deep learning platforms models from large quantities of unlabelled data (such as photos
come in two different forms – software platforms and full with no descriptions, voice recordings or videos on YouTube).
platforms. A deep learning platform provides a set of tools The three main reasons why deep learning is gaining
and an interface for building custom deep nets. Typically, it popularity are accuracy, efficiency and flexibility. Deep
provides a user with a selection of deep nets to choose from, learning automatically extracts features by which to classify
along with the ability to integrate data from different sources, data, as opposed to most traditional machine learning
manipulate data, and manage models through a UI. Some algorithms, which require intense time and effort on the part
platforms also help with performance if a net needs to be of data scientists. The features that it manages to extract are
trained with a large data set. more complex, because of the feature hierarchy possible in a
There are some advantages and disadvantages of using a deep net. They are also more flexible and less brittle, because
platform rather than using a software library. A platform is an the net is able to continue to learn on unsupervised data.
out-of-the-box application that lets you configure a deep net’s
hyper-parameters through an intuitive UI. With a platform,
By: Neetesh Mehrotra
you don’t need to know anything about coding in order to
use the tools. The downside is that you are constrained by the The author works at TCS as a systems engineer. His areas of
interest are Java development and automation testing. You can
platform’s selection of deep nets as well as the configuration
contact him at mehrotra.neetesh@gmail.com.
options. But for anyone looking to quickly deploy a deep


What Can BIG DATA Do For You?

In today’s world, there is a proliferation of data. So much so that the one who
controls data today, holds the key to wealth creation. Let’s take a long look at what
Big Data means and what it can do for us.

Big Data has undoubtedly gained much attention within academia and the IT industry. In the current digital and computing world, information is generated and collected at an alarming rate that is rapidly exceeding storage capabilities. About 4 billion people across the globe are connected to the Internet, and over 5 billion individuals own mobile phones, out of which more than 3.39 billion users use the mobile Internet. Several social networking platforms like WhatsApp, Facebook, Instagram, Twitter, etc, have a big hand in the indiscriminate increase in the production of data. Apart from the social media giants, there is a large amount of data being generated by different devices such as sensors, actuators, etc, which are used as part of the IoT and in robots as well.
By 2020, it is expected that more than 50 billion devices will be connected to the Internet. At this juncture, predicted data production will be almost 44 times greater than that in 2010. As a result of the tech advances, all these millions of people are actually generating tremendous amounts of data through the increased use of smart devices. Remote sensors, in particular, continuously produce an even greater volume of heterogeneous data that can be either structured or unstructured. All such data is referred to as Big Data.
We all know that this high volume of data is shared and transferred at great speed on different optical fibres. However, the fast growth rate of such huge data volumes generates challenges in the following areas:
• Searching, sharing and transferring data
• Analysis and capturing of data
• Data curation
• Storing, updating and querying data
• Information privacy
Big Data is broadly identified by three aspects:
96 | DECEMBER 2017 | OPEN SOURCE FOR YOU | www.OpenSourceForU.com


www.IndiaElectronicsWeek.com

DRIVING
TECHNOLOGY,
INNOVATION &
INVESTMENTS
Colocated
shows

Profit from IoT India’s Electronics Showcasing the Technology


Manufacturing Show that Powers Light
India’s #1 IoT show. At Electronics For You, we Is there a show in India that showcases the Our belief is that the LED bulb is the culmination
strongly believe that India has the potential to latest in electronics manufacturing such as of various advances in technology. And such a
become a superpower in the IoT space, in the rapid prototyping, rapid production and table product category and its associated industry
upcoming years. All that's needed are platforms top manufacturing? cannot grow without focusing on the latest
for different stakeholders of the ecosystem to technologies. But, while there are some good
come together. Yes, there is now - EFY Expo 2018. With this B2B shows for LED lighting in India, none has a
show’s focus on the areas mentioned and it focus on ‘the technology that powers lights’.
We’ve been building one such platform: being co-located at India Electronics Week, it Thus, the need for LEDAsia.in.
IoTshow.in--an event for the creators, the has emerged as India's leading expo on the
enablers and customers of IoT. In February latest manufacturing technologies and Who should attend?
2018, the third edition of IoTshow.in will bring electronic components. • Tech decision makers: CEOs, CTOs, R&D
together a B2B expo, technical and business and design engineers and those developing
conferences, the Start-up Zone, demo sessions Who should attend? the latest LED-based products
of innovative products, and more. • Manufacturers: CEOs, MDs, and those • Purchase decision makers: CEOs,
involved in firms that manufacture purchase managers and production
Who should attend? electronics and technology products managers from manufacturing firms that use
• Creators of IoT solutions: OEMs, design • Purchase decision makers: CEOs, LEDs
houses, CEOs, CTOs, design engineers, purchase managers, production managers • Channel partners: Importers, distributors,
software developers, IT managers, etc and those involved in electronics resellers of LEDs and LED lighting products
• Enablers of IoT solutions: Systems manufacturing • Investors: Startups, entrepreneurs,
integrators, solutions providers, distributors, • Technology decision makers: Design investment consultants interested in this
resellers, etc engineers, R&D heads and those involved sector
• Business customers: Enterprises, SMEs, in electronics manufacturing • Enablers: System integrators, lighting
the government, defence establishments, • Channel partners: Importers, distributors, consultants and those interested in smarter
academia, etc resellers of electronic components, tools lighting solutions (thanks to the co-located
and equipment IoTshow.in)
Why you should attend • Investors: Startups, entrepreneurs,
• Get updates on the latest technology trends investment consultants and others Why you should attend
that define the IoT landscape interested in electronics manufacturing • Get updates on the latest technology trends
• Get a glimpse of products and solutions that defining the LED and LED lighting sector
enable the development of better IoT Why you should attend • Get a glimpse of the latest components,
solutions • Get updates on the latest technology equipment and tools that help manufacture
• Connect with leading IoT brands seeking trends in rapid prototyping and production, better lighting products
channel partners and systems integrators and in table top manufacturing • Get connected with new suppliers from
• Connect with leading suppliers/service • Get connected with new suppliers from across India to improve your supply chain
providers in the electronics, IT and telecom across India to improve your supply chain • Connect with OEMs, principals, lighting
domain who can help you develop better IoT • Connect with OEMs, principals and brands brands seeking channel partners and
solutions, faster seeking channel partners and distributors systems integrators
• Network with the who’s who of the IoT world • Connect with foreign suppliers and • Connect with foreign suppliers and principals
and build connections with industry peers principals to represent them in India to represent them in India
• Find out about IoT solutions that can help • Explore new business ideas and • Explore new business ideas and investment
you reduce costs or increase revenues investment opportunities in this sector opportunities in the LED and lighting sector
• Get updates on the latest business trends • Get an insider’s view of ‘IoT + Lighting’
shaping the demand and supply of IoT solutions that make lighting smarter
solutions

www.IndiaElectronicsWeek.com
Colocated
shows

The themes
• Profit from IoT • Rapid prototyping and production
• Table top manufacturing • LEDs and LED lighting

The co-located shows

Why exhibit at IEW 2018?


More technology India’s only test Bag year-end orders;
decision makers and and measurement meet prospects in early
influencers attend IEW show is also a February and get orders
than any other event part of IEW before the FY ends

It’s a technology- 360-degree promotions The world’s No.1 IoT


centric show and not via the event, publications show is a part of IEW and
just a B2B event and online! IoT is driving growth

Over 3,000 visitors The only show in It’s an Electronics


are conference Bengaluru in the FY For You Group
delegates 2017-18 property

Besides purchase Your brand and solutions IEW is being held at a


orders, you can bag will reach an audience of venue (KTPO) that’s
‘Design Ins’ and over 500,000 relevant and closer to where all the
‘Design-Wins’ too interested people tech firms are

Co-located events IEW connects you with Special packages for


offer cross-pollination customers before the ‘Make in India’, ‘Design in
of business and event, at the event, and India’, ‘Start-up India’ and
networking even after the event ‘LED Lighting’ exhibitors
opportunities

Why you should risk being an early bird


1. The best locations sell out first
2. The earlier you book—the better the rates; and the more the deliverables
3. We might just run out of space this year!

To get more details on how exhibiting at IEW 2018 can help you achieve your sales and marketing goals,

Contact us at +91-9811155335 Or Write to us at growmybiz@efy.in


EFY Enterprises Pvt Ltd | D-87/1, Okhla Industrial Area, Phase -1, New Delhi– 110020
Colocated
shows

Reasons Why You Should NOT Attend IEW 2018


We spoke to a few members of the Where most talks will not be by people
tech community to understand why trying to sell their products? How
they had not attended earlier editions of boring! I can't imagine why anyone
India Electronics Week (IEW). Our aim would want to attend such an event. I
India’s Mega Tech Conference was to identify the most common love sales talks, and I am sure
reasons and share them with you, so everybody else does too. So IEW is a
The EFY Conference (EFYCON) started out as a tiny that if you too had similar reasons, you big 'no-no' for me.
900-footfall community conference in 2012, going by the may choose not to attend IEW 2018.
name of Electronics Rocks. Within four years, it grew into 'India's largest, most exciting engineering conference', and was ranked 'the most important IoT global event in 2016' by Postscapes.

In 2017, 11 independent conferences covering IoT, artificial intelligence, cyber security, data analytics, cloud technologies, LED lighting, SMT manufacturing, PCB manufacturing, etc, were held together over three days, as part of EFY Conferences.

Key themes of the conferences and workshops in 2018
• Profit from IoT: How suppliers can make money and customers save it by using IoT
• IT and telecom tech trends that enable IoT development
• Electronics tech trends that enable IoT development
• Artificial intelligence and IoT
• Cyber security and IoT
• The latest trends in test and measurement equipment
• What's new in desktop manufacturing
• The latest in rapid prototyping and production equipment

Who should attend
• Investors and entrepreneurs in tech
• Technical decision makers and influencers
• R&D professionals
• Design engineers
• IoT solutions developers
• Systems integrators
• IT managers

SPECIAL PACKAGES FOR
• Academicians • Defence personnel • Bulk/Group bookings

This is what they shared…

#1. Technologies like IoT, AI and embedded systems have no future
Frankly, I have NO interest in new technologies like the Internet of Things (IoT), artificial intelligence, etc. I don't think these will ever take off, or become critical enough to affect my organisation or my career.

#2. I see no point in attending tech events
What's the point in investing energy and resources to attend such events? I would rather wait and watch—let others take the lead. Why take the initiative to understand new technologies, their impact and business models?

#3. My boss does not like me
My boss is not fond of me and doesn't really want me to grow professionally. And when she came to know that IEW 2018 is an event that can help me advance my career, she cancelled my application to attend it. Thankfully, she is attending the event! I look forward to a holiday at work.

#4. I hate innovators!
Oh my! Indian startups are planning to give LIVE demonstrations at IEW 2018? I find that hard to believe. Worse, if my boss sees these, he will expect me to create innovative stuff too. I had better find a way to keep him from attending.

#5. I am way too BUSY
I am just too busy with my ongoing projects. They just don't seem to be getting over. Once I catch up, I'll invest some time in enhancing my knowledge and skills, and figure out how to meet my deadlines.

#6. I only like attending vendor events
Can you imagine an event where most of the speakers are not vendors?

#7. I don't think I need hands-on knowledge
I don't see any value in the tech workshops being organised at IEW. Why would anyone want hands-on knowledge? Isn't browsing the Net and watching YouTube videos a better alternative?

#8. I love my office!
Why do people leave the comfort of their offices and weave through that terrible traffic to attend a technical event? They must be crazy. What's the big deal in listening to experts or networking with peers? I'd rather enjoy the coffee and the cool comfort of my office, and learn everything by browsing the Net!

#9. I prefer foreign events
While IEW's IoTshow.in was voted the 'World's No.1 IoT event' on Postscapes.com, I don't see much value in attending such an event in India—and that, too, one that's being put together by an Indian organiser. Naah! I would rather attend such an event in Europe.

Hope we've managed to convince you NOT to attend IEW 2018! Frankly, we too have NO clue why 10,000-plus techies attended IEW in March 2017. Perhaps there's something about the event that we've not figured out yet. But, if we haven't been able to dissuade you from attending IEW 2018, then you may register at http://register.efy.in.

Conference Pass Pricing
One day pass: INR 1999
PRO pass: INR 7999

Special privileges and packages for...
• Defence and defence electronics personnel
• Academicians
• Group and bulk bookings

www.IndiaElectronicsWeek.com

Figure 2: Major sources of Big Data – business systems and transactions, social media (Facebook, Twitter, blogs), unstructured data and sensor data (Image source: googleimages.com)

Figure 1: Challenges of Big Data (Image source: googleimages.com)

1. The data is of very high volume.
2. It is generated, stored and processed very quickly.
3. The data cannot be categorised into regular relational databases.

Big Data has a lot of potential in business applications. It plays a role in the manufacture of healthcare machines, social media, banking transactions and satellite imaging. Traditionally, data is stored in a structured format so that it can be easily retrieved and analysed. However, present data volumes comprise both unstructured and semi-structured data. Hence, end-to-end processing can be impeded during the translation between the structured data in a relational database management system and the unstructured data kept for analytics. Among the problems linked to the staggering volumes of data being generated are the transfer speed of data, the diversity of data, and security issues. There have been several advances in data storage and mining technologies, which enable the preservation of such increased amounts of data. Also, during this preservation process, the nature of the original data generated by organisations is modified.

Some big sources of Big Data
Let's have a quick look at some of the main sources of data, along with some statistics (Data source: http://microfocus.com).
1. Social media: There are around 1,209,600 (1.2 million) new data-producing social media users every day.
2. Twitter: There are approximately 656 million tweets per day!
3. YouTube: More than 4 million hours of content are uploaded to YouTube every day, with all its users watching around 5.97 billion hours of YouTube videos each day.
4. Instagram: There are approximately 67,305,600 (67.3 million) Instagram posts uploaded each day.
5. Facebook: There have been more than 2 billion monthly active Facebook users in 2017 so far, compared to 1.44 billion at the start of 2015 and around 1.65 billion at the start of 2016. On average, there are approximately 1.32 billion daily active users as of June 2017. Every day, 4.3 billion Facebook messages get posted, and there are around 5.75 billion Facebook likes every day.
6. Mobile text messages: Almost 22 billion text messages are sent every day (for personal and commercial purposes).
7. Google: On average, in 2017, more than 5.2 billion Google searches were initiated daily.
8. IoT devices: Connected devices are a huge source of the 2.5 quintillion bytes of data that we create every day – this includes not only mobile devices, but smart TVs, airplanes, cars, etc. Hence, the Internet of Things is producing an ever-increasing amount of data.

Characteristics of Big Data
The main characteristics of Big Data are listed below.
Volume: This refers to the quantity of generated and stored data. The size of the data helps in determining its value and potential insights; hence, it helps us decide whether a specific set of data can actually be considered Big Data or not.
Variety: This property deals with the different types and nature of the data. It helps those who analyse large data sets to effectively use the insights obtained after analysis. If a specific set of data contains many different varieties of data, we can consider it Big Data.
Velocity: The speed of data generation also plays a big role when we classify something as Big Data. The speed at which data is generated and then processed to arrive at analysable results is one of the major properties of Big Data.
Variability: There is always some inconsistency associated with Big Data. We consider a data set inconsistent if it does not have a specific pattern or structure. This can hamper the processes required to handle and manage the data.
Veracity: The quality of the captured data can also vary a lot, which affects the accuracy of the analysis of large data sets. If the captured data's quality is not good enough, it needs to be cleaned up before analysis.




Figure 3: Different types of Big Data (Image source: googleimages.com)

How is Big Data analysed?
We all know that we cannot analyse Big Data manually, as that would be a highly challenging and tedious task. To make the task easier, there are several techniques that help us analyse large data sets. Let us look at some of the well-known techniques used for data analysis.
1. Association rule learning: This is a rule-based Big Data analysis technique used to discover interesting relations between the variables present in large databases. It is intended to identify the strong rules found in databases, using different measures of what is considered 'interesting'. It makes use of a set of techniques for discovering such interesting relationships, also called 'association rules', among the variables in large databases. All these techniques use a variety of algorithms to generate and then test candidate rules. One of the most common applications is market basket analysis. This helps a retailer determine which products are frequently bought together, and use that information for more focused marketing (like the classic discovery that many supermarket shoppers who buy diapers also buy beer). Association rules are widely used nowadays in continuous production, Web usage mining, bioinformatics and intrusion detection. These rules do not take into consideration the order of items, either within the same transaction or across different transactions.
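To make these 'interestingness' measures concrete, here is a minimal Python sketch that scores the classic diapers-and-beer rule by its support and confidence. The tiny transaction list and the thresholds are invented purely for illustration; real systems mine millions of baskets with algorithms such as Apriori or FP-Growth.

# Toy market basket analysis: scoring one association rule by hand.
# The transactions below are hypothetical sample data.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "bread"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

antecedent = {"diapers"}
rule = {"diapers", "beer"}

rule_support = support(rule)                      # 3/5 = 0.60
confidence = support(rule) / support(antecedent)  # 0.60 / 0.80 = 0.75
print(f"support = {rule_support:.2f}, confidence = {confidence:.2f}")

A rule is kept as 'interesting' only if both values clear chosen thresholds, for example, a support of at least 0.5 and a confidence of at least 0.7.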
2. A/B testing: This is a technique that compares two versions of an application to determine which one performs better. It is also called split testing or bucket testing. It refers to a specific type of randomised experiment in which a set of users is presented with two variations of the same product (advertisements, emails, Web pages, etc) – let's call them Variation A and Variation B. The users exposed to Variation A are referred to as the control group, since their behaviour is considered the baseline against which any improvement observed from presenting Variation B is measured. At times, Variation A is simply the original version of the product, tested against what existed before the test. The users exposed to Variation B are referred to as the treatment group. The technique optimises a conversion rate by measuring the performance of the treatment against that of the control using some mathematical calculations.
This testing methodology removes the guesswork from the website optimisation process, and hence enables data-informed decisions which shift business conversations from what 'we think' to what 'we know'. We can make sure that each change produces positive results simply by measuring the impact that the changes have on our metrics.
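The 'mathematical calculations' mentioned above usually amount to a significance test on the two conversion rates. Below is a minimal Python sketch of a two-proportion z-test; all the visitor and conversion counts are hypothetical.

# Hedged sketch: comparing the conversion rates of two page variations.
from math import sqrt, erf

visitors_a, conversions_a = 5000, 400   # control (Variation A)
visitors_b, conversions_b = 5000, 460   # treatment (Variation B)

p_a = conversions_a / visitors_a        # 8.0 per cent
p_b = conversions_b / visitors_b        # 9.2 per cent

# Pooled rate under the null hypothesis that A and B convert equally
p = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = sqrt(p * (1 - p) * (1 / visitors_a + 1 / visitors_b))

z = (p_b - p_a) / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided

print(f"z = {z:.2f}, p-value = {p_value:.3f}")

A small p-value (say, below 0.05) suggests that the lift from Variation B is unlikely to be random noise, which is exactly what shifts the conversation from what 'we think' to what 'we know'.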
3. Natural language processing: This area of computational linguistics deals with the interactions between computers and human languages. In particular, it is concerned with programming computers to process large natural language corpora. The main challenges in natural language processing are natural language generation, natural language understanding, connecting machine and language perception, and combinations thereof. Natural language processing research has mostly relied on machine learning. Initially, many language-processing tasks involved the direct hand-coding of rules. Nowadays, machine learning methods based on statistical inference are used instead, to automatically learn such rules by analysing large sets of real-life examples. Many different classes of machine learning algorithms have been used for NLP tasks. These algorithms take as input large sets of 'features' developed from the input data. Recent research has focused on statistical models, which make probabilistic decisions based on attaching real-valued weights to each input feature. Such models have an edge because they can express the relative certainty of several possible answers rather than committing to just one, which produces more reliable results when such a model is included as one of the components of a larger system.
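As a small illustration of such feature-weighted statistical models, here is a hypothetical scikit-learn sketch that learns a real-valued weight per word and reports its relative certainty for each possible answer. The five training sentences are invented for the example.

# Hedged sketch: a bag-of-words sentiment classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "I loved this movie", "What a great film",
    "Terrible plot and bad acting", "I hated it",
    "Great cast but a bad script",
]
labels = [1, 1, 0, 0, 0]            # 1 = positive, 0 = negative

vectorizer = CountVectorizer()      # turns each text into word-count features
X = vectorizer.fit_transform(texts)

model = LogisticRegression()        # learns one real-valued weight per feature
model.fit(X, labels)

test = vectorizer.transform(["a great movie with a bad ending"])
print(model.predict_proba(test))    # probability for each possible answer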
How can Big Data benefit your business?
Big Data may seem out of reach for the non-profit and government agencies that do not have the funds to buy into this new trend. We all have the impression that 'big' usually means expensive, but Big Data is not really about using more resources; rather, it is about the effective use of the resources at hand. Hence, organisations with limited financial resources can also stay competitive and grow.

Figure 4: Different processes involved in a Big Data system (Image source: googleimages.com)




For that, we need to understand where we can find this data and what we can do with it.
Let us see how Big Data can really help organisations in their business.
1. Targeted marketing: There are several small businesses which cannot compete with the huge advertising budgets that large organisations have at their disposal. In order to remain in the game, they have to spend less, yet reach qualified buyers. This is where the analysis and measurement of data comes in, in order to target the person most likely to turn into a customer. There is a huge amount of data that is freely accessible through tools like Google Insights. Organisations can find out exactly what people are looking for, when they are looking for it, and also their locations. For instance, the CDC (Centers for Disease Control and Prevention, USA) uses the Big Data provided by Google to analyse a large number of searches relating to the flu. With the obtained data, researchers are able to focus their efforts where there is a greater need for flu vaccines. The same technique can be applied to other products as well.

Figure 5: Different ways in which Big Data can help any business. The figure arranges eight questions around 'Using Data to Improve Performance': How do I define my target market? Which offer generates the greatest response? Which channel is most effective? Which demographic responds to my offer? How can I measure marketing results? What is the lifetime value of my customer? How do I improve customer retention? How can I optimise my marketing budget? (Image source: googleimages.com)

2. Actionable insights: Big Data can feel like drinking from a fire hose if we do not know how to turn facts and figures into usable information. But as soon as an organisation learns to master the analytical tools that turn its metrics into readable reports, graphs and charts, it can make decisions that are more proactive and targeted. That is when it gains a clear understanding of the 'big problems' affecting the business.
3. Social eavesdropping: A large chunk of the information in Big Data is obtained from social chatter on networking sites like Twitter and Facebook. By keeping an eagle eye on what is being said on different social channels, organisations can understand how the public perceives them and what to do if they need to improve their reputation. For example, the Twitter mood predicts the stock market: Johan Bollen once tracked how the collective mood from large sets of Twitter feeds correlated with the Dow Jones Industrial Average. The algorithm used by Bollen and his group actually predicted market changes with 87.6 per cent accuracy.

Applications of Big Data
There is a huge demand for Big Data nowadays, and there are numerous areas where it is already being implemented. Let's have a look at some of them.
1. Big Data is used in different government sectors for tasks like power theft investigation, fraud detection and environmental protection. Big Data is also used by the FDA to examine different food-borne infections.
2. It is widely used in the healthcare industry by physicians and doctors to keep track of their patients' history.
3. Big Data is also used in the education sector, through techniques such as adaptive learning and problem control, to reform educational courses.
4. Big Data is used for fraud detection in the banking sector.
5. It is used by search engines to provide the best search results.
6. Price comparison websites make use of Big Data to come up with the best options for their users.
7. Big Data is also used for analysing and processing the data obtained from the sensors and actuators connected to the IoT.
8. Speech recognition products such as Google Voice and Siri also make use of Big Data to recognise the speech patterns of the user.
9. Big Data and data science have taken the gaming experience to new heights. Games are now designed using various Big Data and machine learning algorithms, and can improve themselves as a player moves up to higher levels.
10. Big Data is of great help to the recommender and suggestion tools that prompt us about similar products to purchase on online shopping platforms like Amazon, Flipkart, etc.

References
[1] 'Data Science for Business' by Tom Fawcett
[2] http://www.wikipedia.org/
[3] http://bigdata-madesimple.com/
[4] https://bigdata.wayne.edu/

By: Vivek Ratan
The author has completed his B.Tech in electronics and instrumentation engineering. He is currently working as an automation test engineer at Infosys, Pune. He can be reached at ratanvivek14@gmail.com.



TIPS & TRICKS

Tips you can use daily on a Linux computer
1. Shortcut for opening a terminal in Ubuntu
To open a terminal in Ubuntu, press the Ctrl+Alt+T keys. This creates a new terminal.

2. Running the previous command with 'sudo' in the terminal
In case you have forgotten to run a command with 'sudo', you need not re-type the whole command. Just type 'sudo !!' and the last command will run with sudo.

3. How to change a file permission
An admin can change file permissions by executing the following on the terminal:

chmod u+<permission> filename

…where <permission> can be r (read), w (write) or x (execute). The admin can change the permissions granted to the group or to other users with the same command, by replacing 'u' with 'g' for group access, or with 'o' for others.
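For instance (the file names here are hypothetical):

chmod u+x deploy.sh    (lets the file's owner execute the script)
chmod g+w report.txt   (lets the group write to the file)
chmod o-r secret.txt   (revokes read access from others)

Note that '+' grants a permission, while '-' revokes it.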
—Anirudh Kalwa, anirudh.3194@gmail.com

Moving between the current and last working directories easily
Everyone knows that typing 'cd' in the terminal in Ubuntu takes the user to the home directory. However, if you want to go to the last working directory, instead of entering the full path:

$ cd <directory path>

…directly type the command shown below in the terminal:

$ cd -

The command takes you to the last working directory. This helps in moving directly to the last working directory from the current one, instead of remembering and typing the whole path.

—Abhinay Badam, ohyesabhi2393@gmail.com

Execute parallel ssh on multiple hosts
Here are the steps to do a parallel ssh on multiple hosts. We are going to use pssh, which is a program for executing ssh in parallel on a number of hosts. It provides features such as sending input to all the processes, passing a password to ssh, saving output to files, and timing out. You can access the complete manual of pssh at https://linux.die.net/man/1/pssh.
First off, let us look at how to install it on a CentOS 7 system:

# yum install epel-release

Now install pssh, as follows:

# yum install pssh

Create a pssh_hosts.txt file and enter the hosts you need to target:

# cat pssh_hosts.txt
# write one host per line, as follows
#user@target_ip
root@192.168.100.100

We should create a key-pair between the master host and the targets -- this is the only way to get things done. Simply log in to the target from the master node once, for host key verification:

# ssh root@192.168.100.100



Next, test with single commands:

# pssh -h /path/to/pssh_hosts.txt -A -O PreferredAuthentications=password -i "hostname"
# pssh -h /path/to/pssh_hosts.txt -A -O PreferredAuthentications=password -i "uptime"

The output is:

[root@master pssh]# pssh -h ./pssh_hosts.txt -A -O PreferredAuthentications=password -i "uptime"
Warning: Do not enter your password if anyone else has superuser privileges or access to your account.
Password:
[1] 16:27:59 [SUCCESS] root@192.168.100.100
21:27:57 up 1 day, 1:30, 1 user, load average: 0.00, 0.01, 0.05

To execute scripts on the target machines, type:

# cat pssh/tst.sh
#!/bin/bash
touch /root/CX && echo "File created"

# pssh -h ./pssh_hosts.txt -A -O PreferredAuthentications=password -I< ./tst.sh

Now let us make it simpler, targeting a single host inline:

# pssh -H '192.168.100.101' -l 'root' -A -O PreferredAuthentications=password -I< ./tst.sh

The output is:

[root@master pssh]# pssh -H '192.168.100.101' -l 'root' -A -O PreferredAuthentications=password -I< ./tst.sh
Warning: Do not enter your password if anyone else has superuser privileges or access to your account.
Password:
[1] 16:24:30 [SUCCESS] 192.168.100.101

To execute commands without password prompting, we need to create a key-pair between the servers. Let us look at how to do that, taking the case of logging in to serverB from serverA.
Create SSH keys on serverA using ssh-keygen, as follows:

# ssh-keygen -t rsa

Copy the id_rsa.pub file from serverA to serverB:

# ssh-copy-id root@<serverB-ip>

Now try logging in to the machine with ssh root@<serverB-ip>, and check to make sure that only the key(s) you wanted were added.
Now, try using pssh without the password option on the command line.
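For example, once the key-pair is in place, the earlier command should run without the -A flag or any password options (this assumes the same hosts file created above):

# pssh -h ./pssh_hosts.txt -i "uptime"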
—Ranjithkumar T., ranjith.stc@gmail.com

Find out what an unknown command does, by using whatis
If you are new to the Linux terminal, then you will probably wonder what each command does, and you are most likely to do a Google search for each command you come across.
To avoid doing that, use the whatis command, followed by the command you don't know. You will get a short description of the command.
Here is an example:

$ whatis ls
ls (1) - list directory contents

Now you'll know what the command does, and won't have to open your browser and search.

—Siddharth Dushantha, siddharth.dushantha@gmail.com

Know how many times a user has logged in
One way to find out the number of times users have logged into a multi-user Linux system is to execute the following command:

$ last | grep pts | awk '{print $1}' | sort | uniq -c

The above command provides the list of users who recently logged into the system. The grep utility removes the unnecessary information, and the result is sent to awk through a shell pipe. awk, which is used for processing text-based data, extracts only the user names from the text. This list of names is then sorted by piping it to the sort command. The sorted list is then piped to the uniq command, which merges adjacent matching lines into the first occurrence. The -c option of uniq, which displays the number of times a line is repeated, gives you the number of logins of each user along with the user's name.
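On a typical system, the output might look like this (the user names are illustrative):

$ last | grep pts | awk '{print $1}' | sort | uniq -c
     12 alice
      3 bob
     27 root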
—Sathyanarayanan S., ssathyanarayanan@sssihl.edu.in

Share Your Open Source Recipes!
The joy of using open source software is in finding ways to get around problems—take them head on, defeat them! We invite you to share your tips and tricks with us for publication in OSFY so that they can reach a wider audience. Your tips could be related to administration, programming, troubleshooting or general tweaking. Submit them at www.opensourceforu.com. The sender of each published tip will get a T-shirt.



OSFY DVD

DVD OF THE MONTH


Linux for your desktop

Solus 3 Gnome
Solus is an operating system that is designed for
home computing. It ships with a variety of software
out-of-the-box, so you can set it up without too
much fuss. It comes with the latest version of the
free LibreOffice suite, which allows you to work on
your documents, spreadsheets and presentations right
away. It has many useful tools for content creators.
Whether you are animating in Synfig Studio, producing music with MuseScore or Mixxx, trying out graphic design with GIMP or Inkscape, or editing video with Avidemux, Kdenlive or Shotcut, Solus provides software to help you express your creativity.

A collection of open source software


This month, we also have a collection of open source software that can be installed on a computer running the Microsoft Windows operating system. The collection includes a browser, email clients, integrated development environments (IDEs) and various productivity tools.

What is a live DVD?


A live CD/DVD or live disk contains a bootable operating system, the core program of any computer, which is designed to run all your programs and manage all your hardware and software.
Live CDs/DVDs have the ability to run a complete, modern OS on a computer even without secondary storage, such as a hard disk drive. The CD/DVD runs the OS and other applications directly from the optical drive itself. Thus, a live disk allows you to try the OS before you install it, without erasing or installing anything on your current system. Such disks are used to demonstrate features or try out a release. They are also used for testing hardware functionality before actual installation. To run a live DVD, you need to boot your computer using the disk in the CD/DVD-ROM drive. To figure out how to set a boot device in the BIOS, refer to the hardware documentation for your computer/laptop.


