REGULAR FEATURES
07 FossBytes 16 New Products 104 Tips & Tricks
Ph: (011) 26810602, 26810603; Fax: 26817563
E-mail: info@efy.in
MISSING ISSUES
E-mail: support@efy.in
BACK ISSUES
Kits ‘n’ Spares
New Delhi 110020
Ph: (011) 26371661, 26371662
E-mail: info@kitsnspares.com
NEWSSTAND DISTRIBUTION
Ph: 011-40596600
E-mail: efycirc@efy.in
ADVERTISEMENTS
MUMBAI
Ph: (022) 24950047, 24928520
E-mail: efymum@efy.in
BENGALURU
Ph: (080) 25260394, 25260023
JAPAN
Tandem Inc., Ph: 81-3-3541-4166
E-mail: japan@efy.in
SINGAPORE
Publicitas Singapore Pte Ltd
Ph: +65-6836 2272
E-mail: singapore@efy.in
TAIWAN
J.K. Media, Ph: 886-2-87726780 ext. 10
E-mail: taiwan@efy.in
UNITED STATES
E & Tech Media
Ph: +1 860 536 6677
E-mail: usa@efy.in
December 2017

…licence. Although every effort is made to ensure accuracy, no responsibility whatsoever is taken for any loss due to publishing errors. Articles … No responsibility is taken for any loss or delay in returning the material. Disputes, if any, …

DVD: Solus 3 GNOME, and a collection of open source software for Windows. Solus comes with plenty of software out-of-the-box, so you can set it up without too much fuss.
Recommended system requirements: P4, 1GB RAM, DVD-ROM drive.
Note: Any objection to the material is unintended, and should be attributed to the complex nature of Internet data. If this DVD does not work properly, e-mail us at support@efy.in for a free replacement. DVD team e-mail: cdteam@efy.in

SUBSCRIPTION RATES
Kindly add ₹ 50/- for outside Delhi cheques.
The highlight of the conference was the keynote by John Willis (of 'The DevOps Handbook' fame), who travelled all the way from the US for the event. He talked about 'DevOps in a Serverless World', covering the best practices and how they manifest in a serverless environment. He also conducted a post-conference workshop on DevOps principles and practices.
Serverless technology is an interesting shift in the architecture of digital solutions, where there is a convergence of serverless architecture, containers, microservices, events and APIs in the delivery of modular, flexible and dynamic solutions. This is what Gartner calls the 'Mesh App and Services Architecture' (or MASA, for short). With that theme, there were sessions on serverless frameworks and platforms like the open source Fn platform and Kubernetes frameworks (especially Fission), Adobe's I/O runtime, and Microsoft's Azure platform. Serverless technology applications covered at the event included sessions like 'Serverless and IoT (Internet of Things) devices', 'Serverless and Blockchain', etc. The hands-on sessions included building chatbots and artificial intelligence (AI) applications with serverless architectures. The conference ended with an interesting panel discussion between Anand Gothe (Prowareness), Noora (Euromonitor), John Willis (SJ Technologies), Sandeep Alur (Microsoft) and Vidyasagar Machupalli (IBM).
Open Source For You (OSFY) was the media partner and the Cloud Native Computing Foundation (CNCF) was the community partner for the conference.

Linux support comes to Arduino Create
The Arduino team has announced a new update to the Arduino Create Web platform. The initial release has been sponsored by Intel and supports X86/X86_64 boards. This enables fast and easy development and deployment of Internet of Things (IoT) applications with integrated cloud services on Linux-based devices. With Arduino Create supporting Linux on Intel chips, users are now able to program their Linux devices as if these were regular Arduinos.
The new Arduino Create platform features a Web editor, as well as cloud-based sharing and collaboration tools. The software provides a browser plugin, letting developers upload sketches to any connected Arduino board from the browser.
Arduino Create now allows users to manage individual IoT devices, and configure them remotely and independently from where they are located. To further simplify the user journey, the Arduino team has also developed a novel out-of-the-box experience that will let anyone set up a new device from scratch via the cloud, without any previous knowledge, by following an intuitive Web-based wizard. In the coming months, the team plans to expand support for Linux-based IoT devices running on other hardware architectures too.

Microsoft announces new AI, IoT and machine learning tools for developers
At Connect(); 2017, Microsoft's annual event for professional developers, executive VP Scott Guthrie announced Microsoft's new data platform technologies and cross-platform developer tools. These tools will help increase developer productivity and simplify app development for intelligent cloud and edge technologies, across devices, platforms or data sources. Guthrie outlined the company's vision and shared what is next for developers across a broad range of Microsoft and open source technologies. He also touched on key application scenarios and ways developers can use built-in artificial intelligence (AI) to support continuous innovation and continuous deployment of today's intelligent applications.
"With today's intelligent cloud, emerging technologies like AI have the potential to change every facet of how we interact with the world," Guthrie said. "Developers are in the forefront of shaping that potential. Today at Connect(); we're announcing new tools and services that will help developers build applications and services for the AI-driven future, using the platforms, languages and collaboration tools they already know and love," he added.
Microsoft is continuing its commitment to delivering open technologies and contributing to and partnering with the open source community.
My name seems simple enough but it gets routinely misspelled. All too often it does not matter and you, like me, may choose to ignore the incorrect spelling. We tend to be more particular that the address is correct with an online shopping store, even if the name is misspelled. On the other hand, you will want to make sure that the name is correct on a travel document, even if there is a discrepancy in the address.
Reconciling data can be a very difficult process. Hence, it comes as a surprise that once PAN is linked to the bank accounts and Aadhaar is linked to the PAN, why create the potential for discrepancies by forcing banks to link the accounts with Aadhaar, especially as companies do not have Aadhaar and one person can create many companies?
This set me thinking about some areas where the UID would have saved me a lot of effort and possibly, at no risk to my identity in the virtual world. Obviously, the use cases are for digitally comfortable citizens and should never be mandatory.

When registering a will
While an unregistered will may be valid, the formalities become much simpler if the will is registered. A local government office told me that registering a will is simple: just bring two witnesses, one of whom should be a gazetted officer (I wonder if there is an app to find one).
It would be much simpler if I could register the will using Aadhaar verification, using biometrics. No witnesses needed. Now, no one needs to know what I would like to happen after I am no longer around.

When registering a nominee
If the Aadhaar ID of a nominee is mentioned, the nominee does not need to provide any documentation or witnesses other than the death certificate, for the nomination formalities to be completed. Even the verification of the person can be avoided if any money involved is transferred to the account linked to that Aadhaar ID.

Notarisation of documents
The primary purpose of notarisation is to ensure that the document is an authentic copy and the person signing could be prosecuted if the information therein is incorrect. This requires you to submit physical paper documents, whereas the desire is to be online and paperless.
An easy option is to seed the public key with the Aadhaar database. It does not need to be issued by a certification authority. Any time a person digitally signs a digital document with his private key, it can be verified using the public key. There is no need to worry about securing access to the public key as, by its very nature, it is public.
This can save on court time as well. In case of any disputes, no witnesses need be called; and even after many years, there will be no need to worry about the fallibility of the human mind.

Elimination of life certificates
Even after linking the Aadhaar number to the pension account, one still needs to go through the annual ritual of physically proving that one is alive!
I am reminded of an ERP installation where the plant manager insisted on keeping track of the production status at various stages, against our advice. It took a mere week for him to realise his error. While his group was spending more time creating data, he himself was drowning in it. He had less control over production than he had before the ERP system was installed. Since we had anticipated this issue, it did not take us long to change the process to capture the minimum data, as we had recommended.
Pension-issuing authorities should assume that the pensioner is alive till a death certificate is issued. The amount of data needed in the latter case is considerably less!

Lessons from programming
Most programmers learn from experience that exception handling is the crucial part of a well written program. More often than not, greater effort is required to design and handle the exceptions.
Efficient programming requires that the common transactions take minimal resources. The design and implementation must minimise the interactions needed with a user and not burden the user by providing unnecessary data.
One hopes that any new usage of UID will keep these lessons in mind.

By: Dr Anil Seth
The author has earned the right to do what interests him. You can find him online at http://sethanil.com, http://sethanil.blogspot.com, and reach him via email at anil@sethanil.com.
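The sign-with-the-private-key, verify-with-the-public-key flow described above can be sketched in a few lines of Python. This is a toy illustration only: it uses textbook RSA with tiny primes to show the mechanics, and is in no way secure or representative of how Aadhaar or any real e-sign service is implemented.

```python
# Toy illustration of the sign/verify flow: NOT real cryptography.
import hashlib

# Key generation with tiny primes (a real system uses 2048+ bit keys).
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

def digest(doc: bytes) -> int:
    # Hash the document and reduce it into the modulus range.
    return int.from_bytes(hashlib.sha256(doc).digest(), "big") % n

def sign(doc: bytes) -> int:
    # Only the holder of the private key d can produce this value.
    return pow(digest(doc), d, n)

def verify(doc: bytes, signature: int) -> bool:
    # Anyone holding the public key (n, e) can check the signature.
    return pow(signature, e, n) == digest(doc)

will = b"I leave everything to my cat."
sig = sign(will)
print(verify(will, sig))              # True: the signature checks out
print(verify(will, (sig + 1) % n))    # False: an altered signature is rejected
```

Note that verification needs only the public half of the key, which is exactly why, as argued above, there is no need to worry about securing access to it.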
While we have been discussing many questions in machine learning (ML) and natural language processing (NLP), I had a number of requests from our readers to take up a real life ML/NLP problem with a sufficiently large data set, discuss the issues related to this specific problem and then go into designing a solution. I think it is a very good suggestion. Hence, over the next few columns, we will be focusing on one specific real life NLP problem, which is detecting duplicate questions in community question-answering (CQA) forums.
There are a number of popular CQA forums such as Yahoo Answers, Quora and StackExchange where netizens post their questions and get answers from domain experts. CQA forums serve as a common means of distilling crowd intelligence and sharing it with millions of people. From a developer perspective, sites such as StackOverflow fill an important need by providing guidance and help across the world, 24x7. Given the enormous number of people who use such forums, and their varied skill levels, many questions get asked again and again.
Since many users have similar informational needs, answers to new questions can typically be found either in whole or part from the existing question-answer archive of these forums. Hence, given a new incoming question, these forums typically display a list of similar or related questions, which could immediately satisfy the information needs of users, without them having to wait for their new question to be answered by other users. Many of these forums use simple keyword/tag based techniques for detecting duplicate questions.
However, often, these automated lists returned by the forums are not accurate, frustrating users looking for answers. Given the challenges in identifying duplicate questions, some forums put in manual effort to tag duplicate questions. However, this is not scalable, given the rate at which new questions get generated, and the need for specific domain expertise to tag a question as duplicate. Hence, there is a strong requirement for automated techniques that can help in identifying questions that are duplicates of an incoming question.
Note that identifying duplicate questions is different from identifying 'similar/related' questions. Identifying similar questions is somewhat easier as it only requires that there should be considerable similarity between a question pair. On the other hand, in the case of duplicate questions, the answer to one question can serve as the answer to the second question. This identification requires stricter and more rigorous analysis.
At first glance, it appears that we can use various text similarity measures in NLP to identify duplicate questions. Given that people express their information needs in widely different forms, it is a big challenge to identify the exact duplicate questions automatically. For example, let us consider the following two questions:
Q1: I am interested in trying out local cuisine. Can you please recommend some local cuisine restaurants that are wallet-friendly in Paris?
Q2: I like to try local cuisine whenever I travel. I would like some recommendations for restaurants which are not too costly, but serve authentic local cuisine in Athens?
Now consider applying different forms of text similarity measures. The above two questions score very high on various similarity measures: lexical, syntactic and semantic similarity. While it is quite easy for humans to focus on the one dissimilarity, which is that the locations discussed in the two questions are different, it is not easy to teach machines that 'some dissimilarities are more important than other dissimilarities.' It also raises the question of whether the two words 'Paris' and 'Athens' would be considered as extremely dissimilar.
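The surface-similarity point is easy to check in code. The sketch below (plain Python, no NLP library; the tokeniser and the cosine measure are minimal stand-ins, not any specific toolkit's implementation) scores the Paris/Athens pair highly on bag-of-words cosine similarity, even though their answers would differ:

```python
import math
import re
from collections import Counter

def tokens(text: str) -> Counter:
    # Lowercase word tokens; a real pipeline would also remove stopwords.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_sim(a: str, b: str) -> float:
    # Cosine similarity between the bag-of-words vectors of two texts.
    ta, tb = tokens(a), tokens(b)
    dot = sum(ta[w] * tb[w] for w in ta)
    na = math.sqrt(sum(c * c for c in ta.values()))
    nb = math.sqrt(sum(c * c for c in tb.values()))
    return dot / (na * nb) if na and nb else 0.0

q1 = ("I am interested in trying out local cuisine. Can you please recommend "
      "some local cuisine restaurants that are wallet-friendly in Paris?")
q2 = ("I like to try local cuisine whenever I travel. I would like some "
      "recommendations for restaurants which are not too costly, but serve "
      "authentic local cuisine in Athens?")

# Scores well above the 0.0 of an unrelated pair, despite the different cities.
print(round(cosine_sim(q1, q2), 2))
```

In other words, lexical overlap alone cannot tell these apart: the one dissimilarity that matters, the city, carries no extra weight in such a measure.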
Given that one of the popular techniques for word similarity measures these days is the use of word-embedding techniques such as Word2Vec, it is highly probable that 'Paris' and 'Athens' end up getting mapped as reasonably similar by the word-embedding techniques, since they are both European capital cities and often appear in similar contexts.
Let us consider another example.
Q1: What's the fastest way to get from Los Angeles to New York?
Q2: How do I get from Los Angeles to New York in the least amount of time?
While there may not be good word-based text similarity between the above two questions, the information needs of both the questions are satisfied by a common answer and hence this question pair needs to be marked as a duplicate.
Let us consider yet another example.
Q1: How do I invest in the share market?
Q2: How do I invest in the share market in India?
Though Q1 and Q2 have considerable text similarity, they are not duplicates since Q2 is a more specific form of the question and, hence, cannot share the same answer as Q1.
These examples are meant to illustrate the challenges involved in identifying duplicate questions. Having chosen our task and defined it, let us now decide what our data set would be. Last year, the CQA forum Quora released a data set for the duplicate question detection task. This data set was also used in a Kaggle competition involving the same task. Hence, let us use this data set for our exploration. It is available at https://www.kaggle.com/c/quora-question-pairs. So please download the train.csv and test.csv files for your exploratory data analysis.
Given that this was run as a Kaggle competition, there are a lot of forum discussions on Kaggle regarding the various solutions to this task. While I would encourage readers to go through them to enrich their knowledge, we are not going to use any non-text features as we attempt to solve this problem. For instance, many of the winners have used question ID as a feature in their solution. Some others have used graph features, such as learning the number of neighbours that a duplicate question pair would have compared to a non-duplicate question pair. However, we felt that these are features extraneous to the text and quite dependent on the data. Hence, in order to arrive at a reliable solution, we will only look at text based features in our approaches.
As with any ML/NLP task, let us begin with some exploratory data analysis. Here are a few questions for our readers (Note: most of these tasks are quite easy, and can be done with simple commands in Python using Pandas, so I urge you to try them out).
1. How many entries are there in train.csv?
2. What are the columns present in train.csv?
3. Can you find out whether this is a balanced data set or not? How many of the question pairs are duplicates?
4. Are there any NaNs present in the entries for the Question 1 and Question 2 columns?
5. Create a Bag of Words classifier and report the accuracy.
I suggest that our readers (specifically those who have just started exploring ML and NLP) try these experiments and share the results in a Python Jupyter notebook. Please do send me the pointer to your notebook and we can discuss it in this column. Another exercise that is usually recommended is to go over the actual data and see what types of questions are marked as duplicate and what are not.
It would also be good to do some initial text exploration of the data set. I suggest that readers use the Stanford CoreNLP tool kit for this purpose because it is more advanced in its text analysis compared to NLTK. Since Stanford CoreNLP is Java based, you need to run it as a server and use a client Python package such as https://pypi.python.org/pypi/stanford-corenlp/. Please try the following experiments on the Quora data set.
1. Identify the different Named Entities present in the Quora train data set and the test data set. Can you cluster these entities?
2. Stanford CoreNLP supports the parse tree. Can you use it for different types of questions such as 'what', 'where', 'when' and 'how' questions?
While we can apply many of the classical machine learning techniques after identifying the appropriate features, I thought it would be more interesting to focus on some of the neural network based approaches, since the data set is sufficiently large (Quora actually used a random forest classifier initially). Next month, we will focus on some of the simple neural network based techniques to attack this problem.
I also wanted to point out a couple of NLP problems related to this task. One is the task of textual entailment recognition where, given a premise statement and a hypothesis statement, the task is to recognise whether the hypothesis follows from the premise, contradicts the premise or is neutral to the premise. Note that textual entailment is a 3-class classification problem. Another closely related task is that of paraphrase identification: given two statements S1 and S2, the task is to identify whether S1 and S2 are paraphrases. Some of the techniques that have been applied for paraphrase identification and textual entailment recognition can be leveraged for our task of duplicate question identification. I'll discuss more on this in next month's column.
If you have any favourite programming questions/software topics that you would like to discuss on this forum, please send them to me, along with your solutions and feedback, at sandyasm_AT_yahoo_DOT_com.

By: Sandya Mannarswamy
The author is an expert in systems software and is currently working as a research scientist at Conduent Labs India (formerly Xerox India Research Centre). Her interests include compilers, programming languages, file systems and natural language processing. If you are preparing for systems software interviews, you may find it useful to visit Sandya's LinkedIn group 'Computer Science Interview Training (India)' at http://www.linkedin.com/groups?home=&gid=2339182.
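As a starting point for the exploratory questions in the column above, here is a minimal Pandas sketch. It runs on a tiny in-memory stand-in with the same columns as the Quora train.csv (id, qid1, qid2, question1, question2, is_duplicate); swap the hand-built frame for pd.read_csv('train.csv') to get the real answers.

```python
import pandas as pd

# A tiny stand-in with the same columns as the Quora train.csv;
# replace this with pd.read_csv("train.csv") for the real data set.
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "qid1": [1, 3, 5, 7],
    "qid2": [2, 4, 6, 8],
    "question1": [
        "How do I invest in the share market?",
        "What is the fastest way to get from LA to New York?",
        "How do I learn Python?",
        None,                                   # simulate a missing entry
    ],
    "question2": [
        "How do I invest in the share market in India?",
        "How do I get from LA to New York in the least time?",
        "What is the best way to learn Python?",
        "Is this a question?",
    ],
    "is_duplicate": [0, 1, 1, 0],
})

print(len(df))                                      # Q1: number of entries
print(list(df.columns))                             # Q2: the columns present
print(df["is_duplicate"].value_counts())            # Q3: is the data set balanced?
print(df[["question1", "question2"]].isna().sum())  # Q4: NaNs per question column
```

The value_counts() output directly answers how many of the question pairs are duplicates; on the real train.csv it will show the class imbalance the column asks about.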
The prices, features and specifications are based on information provided to us, or as available
on various websites and portals. OSFY cannot vouch for their accuracy. Compiled by: Aashima Sharma
The global security threat scenario has changed radically in recent times. If hackers of yore were mainly hobbyists testing the security limits of corporate systems as an intellectual challenge, the new threat comes from well-concerted plans hatched by criminal gangs working online with an eye to profit, or to compromise and damage information technology systems.
The widespread hack attacks have also become possible because of the high degree of connectivity of devices, like smartphones, laptops and tablets, that run a variety of operating systems.
When consumer data gets compromised it has an immediate impact on the brand and reputation of the affected company, as was evident when Verizon cut its purchase price for Yahoo by US$ 350 million, after an online portal revealed that it had been repeatedly hacked. When the data of a company gets compromised and is followed by frequent attempts to conceal the fact after the incident, it can seriously impact whether customers will continue to deal with the company in any way. In the final analysis, customers are not willing to put their data at risk with a vendor who does not value and protect their personal information.
India has not been spared in this regard. Recent reports allege that customer data at telecom giant Reliance Jio was compromised and, previously, this occurred at online restaurant guide Zomato.
Companies need to team up with the right kind of hackers. Organisations cannot on their own match the wiles of the thousands of very smart hackers. This battle cannot be fought with internal resources alone. Companies need to build a culture of information-sharing on security issues with government CERTs (computer emergency response teams), security companies and security researchers.
Countering malicious hackers needs a large number of 'ethical hackers', also known as 'white hats', who will probe your systems just as any hacker would, but responsibly report to you any vulnerabilities in your system. Many of them do this work for recognition, so don't hesitate to name the person who helped you. Do appreciate the fact that they are spending a lot of their time identifying the security holes in your systems.
This concept is not new. It has been tried by a number of Internet, information technology, automobile and core industry companies. Google, Facebook, Microsoft, ABN AMRO, Deutsche Telekom and the US Air Force are some of the many organisations that have set up their own reward programmes. And it has helped these companies spot bugs in their systems that were not evident to their own in-house experts, because the more pairs of eyes checking your code, the better.
Some companies might hesitate to work with hobbyist researchers, since it is difficult to know, for example, whether they are encouraging criminal hackers or not. What if the hobbyists steal company data?
As more and more organisations are becoming digital, startups now offer their services through Web or mobile applications, so their only assets are the software apps and customer data. Once a data breach happens, customer credentials get stolen or denial of services attacks occur, leading to huge losses in revenue, reputation and business continuity. By becoming part of a bug bounty platform, companies can create a security culture within their organisations.
Indian companies have a unique advantage if they decide to crowdsource the identification of security vulnerabilities in their IT infrastructure, since the country has one of the largest numbers of security researchers, who are part of the crowd that is willing to help organisations spot a bug before a criminal does.
The 2017 Bugcrowd report cited 11,663 researchers in India who worked on bug bounty programmes, behind only the US with about 14,244 white hat hackers. While most of them have jobs or identified themselves as students, 15 per cent of bug hunters were fully engaged in the activity, with this number expected to increase, according to Bugcrowd.
Although Indian hackers earned over US$ 1.8 million in bounties in 2016-17, the bounties paid by Indian companies added up to a paltry US$ 50, according to HackerOne, indicating that local firms are not taking advantage of the crowdsourcing option.
Part of the reason is that Indian companies are still wary of having their security infrastructure and any vulnerability in it exposed to the public. This over-cautious approach could backfire in the long term, as it is always better to look for bugs cooperatively with responsible hackers in a controlled environment, rather than have the vulnerabilities eventually spotted and exploited by criminals.
Companies also take cover behind a smokescreen of denial when they are actually hit by cyber attacks, as Indian law does not make it mandatory to report security incidents to the CERT or any government agency. However, the regulatory framework is expected to change with the Reserve Bank of India, for example, making it mandatory for banks to report cyber security incidents within two to six hours of the attacks being noticed.
Indian organisations also do not have a local platform for engaging with researchers, which would define the financial, technical and legal boundaries for the interaction in compliance with local regulations. Such a platform would give these companies the confidence that they can engage safely with people who are not on their payroll, even if their main objective is to hack for bugs.
Bug bounty platforms like SafeHats are connecting enterprises with white hacker communities in India. Safehats.com, powered by Instasafe Technologies, a leading Security as a Service provider, offers a curated platform that helps organisations to create a responsible vulnerability disclosure policy that lays down the rules of engagement, empanels reputed researchers, and makes sure that the best and the safest white hackers get to your systems before the bad guys do.
SafeHats has been working with some leading banking organisations and e-commerce players in securing their applications. Once vulnerabilities are discovered, SafeHats helps to fix them and to launch secure apps to the market. The key difference with this kind of platform is that the organisations pay the security researchers only if a bug is found, and the amount paid is based on the severity of the bug.
A large number of Indian enterprises are in dire need of tightening up on their security, as the compute infrastructures of an increasing number of organisations are being breached. On the other hand, we see an opportunity for Indian companies to leverage the large talent pool of white hackers from India. SafeHats in Bengaluru was born out of the need to bring Indian companies and hackers together, in a safe environment.
More organisations are now aware of their security needs after the high-profile Wannacry and Petya ransomware attacks. A lot of growth-stage startups have shown interest in adopting bug bounty programmes, as they have realised that application security is key to their next round of funding.
Sandip Panda, CEO of Instasafe, says, "Security is now an important topic in every organisation's board room discussions. Investment in security is as important as investment in the product itself. Bug bounty platforms will create an entirely new security culture in India."

By: Shasanka Sahu
The author works at Instasafe Technologies Pvt Ltd.
Arjun Vishwanathan, associate director, emerging technologies, IDC India

"AI must be viewed in a holistic manner"

Artificial intelligence (AI) is touching new heights across all verticals, including consumer services, e-commerce, mobile phones, life sciences and manufacturing, among others. But how will AI transform itself over time? Arjun Vishwanathan, associate director, emerging technologies, IDC India, discusses the transformation of AI in an exclusive conversation with Jagmeet Singh of OSFY. Edited excerpts...

Q …on the evolution of AI?
Global spending on cognitive and AI solutions will continue to see significant corporate investment over the next several years, achieving a compound annual growth rate (CAGR) of 54.4 per cent through 2020, when revenues will be more than US$ 46 billion. Around 59 per cent of organisations plan to make new software investments in cognitive or AI technologies, whereas 45 per cent will make new investments in hardware, IT services and business services. Data services have the lowest rank in all the categories. Overall, IDC forecasts that worldwide revenues for cognitive and AI systems will reach US$ 12.5 billion in 2017, an increase of 59.3 per cent over 2016.

Q In what ways can AI become the icing on the cake for enterprises moving towards digital transformation (DX)?
The adoption status of cognitive/AI correlates highly with the information DX … certainly have the largest mindshare in the Asia-Pacific region. But domestic platforms, such as Alibaba PAI and Baidu PaddlePaddle, are even better known in local markets. Also, the IBM Watson platform has a larger mindshare compared with that of other bigger organisations.

Q There is a belief that AI will one day become a major reason for unemployment in the IT world. What is your take on this?
AI must be viewed in a holistic manner. Having said as much, AI and cognitive developments are expected to make significant inroads into hitherto uncharted and mostly manual/human domains. IDC predicts that by 2022, nearly 40 per cent of operational processes will be self-healing and self-learning, minimising the need for human intervention or adjustments. Additionally, as IDC recently forecasted, as much as 5 per cent of business revenues will come through interaction with a customer's digital assistant by 2019. All this merely proves that AI will increasingly complement businesses in driving new and more authentic experiences while also driving business value.

Q Why should enterprises focus on enabling AI advances to move towards a profitable future?
Increased employee productivity and greater process automation are the most common expectations among organisations adopting or planning to adopt AI solutions. AI is presumed to bring significant business value to half of the organisations in the APAC region within two years. Customer service and support are the business processes that receive the most immediate benefits, whereas supply chain and physical assets management see the least urgency.
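The IDC spending figures quoted in the interview are mutually consistent and easy to check: US$ 12.5 billion in 2017, compounded at a 54.4 per cent CAGR for three years, lands at roughly US$ 46 billion in 2020. A quick sketch (figures from the interview; the variable names are mine):

```python
# Check the IDC projection: US$ 12.5 billion in 2017 growing at a
# 54.4 per cent CAGR should exceed US$ 46 billion by 2020.
base_2017 = 12.5          # cognitive/AI revenues in 2017, US$ billion
cagr = 0.544              # compound annual growth rate
years = 2020 - 2017       # three compounding periods

projected_2020 = base_2017 * (1 + cagr) ** years
print(round(projected_2020, 1))   # roughly 46 (US$ billion)
```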
osi 2017:
The Show Continues
to Grow
T
he beautiful city of Bengaluru, threatened by thick dark
clouds, kept convention delegates anxious about reaching the
venue on time, since they also knew they would be braving
the city’s legendary traffic. Thankfully, the weather gods heard
the OSI team’s prayers and the clouds refrained from drenching
KEY FACTS
the visitors, allowing many from the open source industry as
Show dates: October 13-14, 2017
well as enthusiasts from the community to reach the NIMHANS
Convention Center well before the 8:30 a.m. registration time. Location: NIMHANS Convention
While there were a lot of familiar faces — participants who have Centre, Bengaluru, Karnataka, India
been loyally attending the event over the past years, it was also
heartwarming to welcome new enthusiasts, who’d come on account Number of exhibitors: 27
of the positive word-of-mouth publicity the event has been building
Brands represented: 33
up over the years.
In terms of numbers, the 14th edition of the event, which Unique visitors: 2,367
happened on October 13 and 14, 2017, witnessed 2,367 unique
visitors over the two days, breaking all previous attendance Number of conferences: 09
records. The event boasted of 70+ industry and community experts
Number of workshops: 14
coming together to speak in the nine conference tracks and 14
hands-on workshops. Number of speakers: 70+
The visitors, as usual, comprised a cross-section of people in terms of their expertise and experience. Since the tracks for the conferences are always planned with this diversity in mind, there were tracks for the creators (developers, project managers, R&D teams, etc) as well as for the implementers of open source software.

The star-studded speakers' list included international experts like Tony Wasserman (professor of the software management practice at Carnegie Mellon University, Silicon Valley), Joerg Simon (ISECOM and Fedora Project), and Soh Hiong (senior consultant, NetApp). There was also active participation from the government of India, with Debabrata Nayak (project director, NeGD, MeitY) and K. Rajasekhar (deputy director general, NIC, MeitY) delivering speeches at the event.

Industry experts like Andrew Aitken (GM and global open source practice leader, Wipro Technologies), Sandeep Alur (director, technical engagements (partners), Microsoft Corporation India), Sanjay Manwani (MySQL India director, Oracle), Rajdeep Dua (director, developer relations, Salesforce), Gagan Mehra (director, information strategy, MongoDB), Rajesh Jeyapaul (architect, mentor and advocate, IBM), Valluri Kumar (chief architect, Huawei India), and Ramakrishna Rama (director - software, Dell India R&D) were amongst the 70+ experts who spoke at the event. The many intriguing topics covered compelled visitors to stay glued to their seats till late in the evening on both days.

A few topics that solicited special interest from the audience included a panel discussion on 'Open Source vs Enterprise Open Source. Is this a Key Reason for the Success of Open Source?', 'Accelerating the Path to Digital with Cloud Data Strategy', 'Open Source - A Blessing or a Curse?', 'Intelligent Cloud and AI – The Next Big Leap' and 'IoT Considerations - Decisive Open Source Policy'.

"We owe our success to the active participation of the community and the industry. It's exciting to see how this event, which had just a handful of exhibitors in its initial stages, has grown to the stage at which, today, the who's who of open source are demonstrating their solutions to the audience. I hope that we continue to make this event even more exciting, and ensure it becomes one of the biggest open source events across the globe," said Rahul Chopra, editorial director, EFY Group.

Delegates looking to buy workshop passes on the spot were disappointed, as all the workshops had sold out online, even before the commencement of the event. Workshops like 'Self-Service Automation in OpenStack Cloud', 'Building Machine Learning Pipelines with PredictionIO', 'Analyzing Packets using Wireshark', 'OpenShift DevOps Solution', 'Make your First Open Source IoT Product' and 'Tools and Techniques to Dive into the Mathematics of Machine Learning' drew a lot of interest from the techie audience.

"We would like to thank our sponsors, Microsoft, IBM, Oracle, Salesforce, 2nd Quadrant, Wipro Technologies, Zoho Corporation, DigitalOcean, SUSE, Siemens, Huawei and others for their valuable participation and support for the event. We look forward to having an even bigger showcase of open source technologies with the support of our existing partners as well as many new stalwarts from the tech industry," said Atul Goel, vice president of events at EFY.

Adding to this, Rahul Chopra said, "With the overwhelming response of the tech audience and the demand for more sessions, we have decided to expand the event, starting from the 2018 edition. The event will now become a three-day affair instead of a two-day one. This means more tracks, more speakers, more workshops and more knowledge-sharing on open source. We have already announced the dates for the next edition. It's going to happen right here, at the same venue, on October 11, 12 and 13, 2018."
Obstacle Avoidance Robot with Open Hardware (by Shamik Chakraborty, Amrita School of Engineering)
This workshop explored the significance of robotics in Digital India and looked at how Make in India can be galvanised by robotics. It also propagated better STEM education for a better global job market for skilled professionals, and created a space for participants to envision the tech future.

Microservices Architecture with Open Source Framework (by Dibya Prakash, founder, ECD Zone)
This workshop was designed for developers, architects and engineering managers. The objective was to discuss the high-level implementation of the microservices architecture using Spring Boot and the JavaScript (Node.js) stack.

Hacking, Security and Hardening Overview for Developers – on the Linux OS and Systems Applications (by Kaiwan Billimoria, Linux consultant and trainer, kaiwanTECH)
The phenomenal developments in technology, and especially in software-driven products (in domains like networking, telecom, embedded-automotive, infotainment, and now IoT, ML and AI), beg for better security on end products. Hackers are currently enjoying a field day and are only getting better at it, while product developers lag behind. This workshop was geared towards helping participants understand where software vulnerabilities exist, both while programming and after; OS hardening techniques; and what tools and methodologies help prevent and mitigate security issues.

Tools and Techniques to Dive Into the Mathematics of Machine Learning (by Monojit Basu, founder and director, TechYugadi IT Solutions and Consulting)
In order to build an accurate model for a machine learning problem, one needs better insights into the mathematics behind these models. For those primarily focused on the programming aspects of machine learning initiatives, this workshop gave the opportunity to regain a bit of mathematical context on some of the models and algorithms frequently used, and to learn about a few open source tools that come in handy when performing deeper mathematical analysis of machine learning algorithms.

By: Omar Farooq
The author is product head at Open Source India.
[Photo captions] Asheem Bakhtawar (regional director, India, Middle East and Africa, 2ndQuadrant India Pvt Ltd); Divyanshu Verma (senior engineering manager, Intel R&D); Balaji Kesavaraj (head, marketing, India and SAARC, Autodesk); Janardan Revuru (open source evangelist); Dibya Prakash (founder, ECDZone); Dhiraj Khare (national alliance manager, Liferay India)
…with SELinux

Discover SELinux, a security module that provides extra protection to ensure access control security. It supports mandatory access control (MAC) and is an integral part of RHEL's security policy.

Security-Enhanced Linux, or SELinux, is an advanced access control built into most modern Linux distributions. It was initially developed by the US National Security Agency to protect computer systems from malicious tampering. Over time, SELinux was placed in the public domain and various distributions have incorporated it in their code. To many systems administrators, SELinux is uncharted territory. It can seem quite daunting and, at times, even confusing. However, when properly configured, SELinux can greatly reduce a system's security risks, and knowing a bit about it can help you troubleshoot access related error messages.

Basic SELinux security concepts
Security-Enhanced Linux is an additional layer of system security. The primary goal of SELinux is to protect the users' data from system services that have been compromised. Most Linux administrators are familiar with the standard user/group/other permissions security model: a user and group based model known as discretionary access control. SELinux provides an additional layer of security that is object based and controlled by more sophisticated rules, known as mandatory access control. To allow remote anonymous access to a Web server, firewall ports must be opened. However, this gives malicious users an opportunity to crack the system through a security exploit: if they compromise the Web server process, they gain its permissions, that is, the permissions of the Apache user and Apache group, which has read/write access to things like the document root (/var/www/html), as well as write access to /var, /tmp and any other directories that are world writable.

Under discretionary access control, every process can access any object. But when SELinux enables mandatory access control, a particular context is given to an object. Every file, process, directory and port has a special security label, called an SELinux context. A context is a name that is used by the SELinux policy to determine whether a process is allowed to access a file, directory or port. By default, the policy does not allow any interaction unless an explicit rule grants access. If there is no rule, no access is allowed.

SELinux labels have several contexts: user, role, type and sensitivity. The targeted policy, which is the default policy in Red Hat Enterprise Linux, bases its rules on the third context, the type context. The type context normally ends with _t. The type context for the Web server is httpd_t. The type context for files and directories normally found in /var/www/html is httpd_sys_content_t, and for files and directories normally found in /tmp and /var/tmp it is tmp_t. The type context for Web server ports is httpd_port_t.

There is a policy rule that permits Apache to access files and directories with a context normally found in /var/www/html and other Web server directories. There is no 'allow' rule for files found in the /tmp and /var/tmp directories, so access is not permitted. With SELinux, a malicious user cannot access the /tmp directory. SELinux also has rules for remote file systems such as NFS and CIFS, although all files on such file systems are labelled with the same context.

SELinux modes
For troubleshooting purposes, SELinux protection can be temporarily disabled using SELinux modes. SELinux works in three modes: enforcing mode, permissive mode and disabled mode.
Enforcing mode: In the enforcing mode, SELinux actively denies access to Web servers attempting to read files with the tmp_t type context. In this mode, SELinux both logs the interactions and protects files.
Permissive mode: This mode is often used to troubleshoot issues. In permissive mode, SELinux allows all interactions, even if there is no explicit rule, and it logs the interactions that it would have denied in enforcing mode. This mode can be used to temporarily allow access to content that SELinux is restricting. No reboot is required to go from enforcing mode to permissive mode.
Disabled mode: This mode completely disables SELinux. A system reboot is required to disable SELinux entirely, or to go from disabled mode to enforcing or permissive mode.

SELinux status
To check the present status of SELinux, run the sestatus command on a terminal. It will tell you the mode of SELinux (Figure 1: Checking the status of SELinux).

# sestatus

Changing the current SELinux mode
Run the command setenforce with either 0 or 1 as the argument. A value of 1 specifies enforcing mode; 0 specifies permissive mode (Figure 2: Changing the SELinux mode to enforcing mode).

# setenforce

Setting the default SELinux mode
The configuration file that determines the SELinux mode at boot time is /etc/selinux/config. Note that it contains some useful comments. Use /etc/selinux/config to change the default SELinux mode at boot time. In the example shown in Figure 3 (Default configuration file of SELinux), it is set to enforcing mode.

Initial SELinux context
Typically, the SELinux context of a file's parent directory determines its initial SELinux context: the context of the parent directory is assigned to newly created files. This works for commands like vim, cp and touch (Figure 4: Checking the context of files). However, if a file is created elsewhere and the permissions are preserved (as with mv or cp -a), the original SELinux context will be unchanged.

Changing the SELinux context of a file
There are two commands that are used to change the SELinux context of a file: chcon and restorecon. The chcon command changes the context of a file to the context specified as an argument to the command. Often, the -t option is used to specify only the type component of the context.
The restorecon command is the preferred method for changing the SELinux context of a file or directory. Unlike chcon, the context is not explicitly specified when using this command; it uses rules in the SELinux policy to determine what the context of a file should be (Figure 5: Restoring the context of a file from the parent directory).
As shown in Figure 6, the context is preserved when using the mv command, while the cp command does not preserve it: the copied file receives the context of its new parent directory. To restore the expected context, run restorecon, which applies the parent directory's default context to the files.

Defining SELinux default file context rules
The semanage fcontext command can be used to display or modify the rules that the restorecon command uses to set the default file context. It uses extended regular expressions to specify the path and filenames. The most common extended regular expression used in fcontext rules is (/.*)?, which means "optionally match a / followed by any number of characters." It matches the directory listed before the expression and everything in that directory, recursively.
The restorecon command is part of the policycoreutils package, and semanage is part of the policycoreutils-python package.
Figure 7 shows how to use semanage to add a context for a new directory. First, add a context rule for the new directory using the semanage command, and then use the restorecon command to apply it to all the files contained in it.

SELinux Booleans
SELinux Booleans are switches that change the behaviour of the SELinux policy. These are rules that can be enabled or disabled, and can be used by security administrators to tune the policy to make selective adjustments.
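The (/.*)? expression mentioned earlier is an ordinary extended regular expression, so its behaviour can be explored with plain grep -E, without touching semanage at all. In this sketch, /virtual is a hypothetical directory used purely for illustration:

```shell
# '(/.*)?' optionally matches a '/' followed by anything, so the
# fcontext-style pattern '/virtual(/.*)?' covers /virtual itself and
# everything beneath it, but not lookalike names such as /virtualbox.
pattern='^/virtual(/.*)?$'

for path in /virtual /virtual/web/index.html /virtualbox; do
    if printf '%s\n' "$path" | grep -qE "$pattern"; then
        echo "$path: matched"
    else
        echo "$path: not matched"
    fi
done
```

Running this prints "matched" for /virtual and /virtual/web/index.html, and "not matched" for /virtualbox, which is exactly the recursive-but-exact matching behaviour restorecon relies on.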
DevOps teams rely on 'Infrastructure as Code' (IaC) for productivity gains. Automation and speed are prerequisites in the cloud environment, where resources are identified by cattle nomenclature rather than pet nomenclature, due to the sheer volumes involved.
Ansible is one of the leading technologies in the IaC space. Its declarative style, ease of parameterisation and the availability of numerous modules make it the preferred framework to work with.
Any code, if not tested regularly, gets outdated and becomes irrelevant over time, and the same applies to Ansible. Daily testing is a best practice and must be introduced for Ansible scripts too; for example, keeping track of the latest version of a particular piece of software during provisioning. Similarly, the dependency management repositories used by apt and Yum may have broken dependencies due to changes. Scripts may also fail if a dependent URL is not available. Anyone working with Ansible would have faced these challenges.
This is where unit testing comes in: it can run during the nightly build and detect these failures well in advance. The Molecule project is a useful framework for introducing unit testing into Ansible code. One can use a container to test an individual role, or an array of containers to test complex deployments. Docker containers are useful, as they save engineers from spawning multiple instances or using resource-hogging VMs in the cloud or on test machines. Docker is a lightweight technology that is used to verify the end state of the system; after the test, the provisioned resources are destroyed, thus cleaning up the environment.
Installing and working with Molecule is simple. Follow the steps shown below. First, get the OS updated:

sudo apt-get update && sudo apt-get -y upgrade

Next, install Docker:

sudo apt install docker.io

Now install Molecule with the help of Python pip:

sudo apt-get install python-pip python-dev build-essential
sudo pip install --upgrade pip
sudo pip install molecule

After the install, do a version check of Molecule. If the Molecule version is not the latest, upgrade it as follows:

sudo pip install --upgrade molecule

It is always good to work with the latest version of Molecule, as there are significant changes compared to earlier versions. Enabling or disabling modules is also more effective in the latest version. For example, a common problem is forced audit errors that make Molecule fail; when starting to test with Molecule, audit errors can pose a roadblock. Disabling the Lint module during the initial phase can give you some speed to concentrate on writing tests rather than trying to fix the audit errors.

Here are a few of Molecule's features, though the full toolset offers more:
1. Create: Creates a virtualised provider, which in our case is the Docker container.
2. Converge: Uses the provisioner and runs the Ansible scripts against the target Docker containers.
3. Idempotency: Uses the provisioner to check the idempotency of the Ansible scripts.
4. Lint: Performs code audits of the Ansible scripts, test code, test scripts, etc.
5. Verify: Runs the test scripts written.
6. Test: Runs the full sequence of steps needed by Molecule, i.e., create, converge, lint, verify and destroy.

The roles need to be initialised by the following command:

This will create all the folders needed to create the role, and is similar to the following command:

Table 1: A comparison of folders created by Ansible and Molecule

Folders created by ansible-galaxy init | Folders created by molecule init | Remarks
defaults | defaults | No changes are needed; the folder structure is identical
handlers | handlers |
meta | meta |
README.md | README.md |
tasks | tasks |
vars | vars |
files | files | Folders need to be created manually to take advantage of the file and template modules of Ansible
templates | templates |
— | molecule | All Molecule-related scripts and test scripts are placed in this folder

The molecule folder has files in which one can put a pre-created Molecule playbook and test scripts. The environment that is created is named default; this can be changed as per the project's requirements.
One file that will be of interest is molecule.yml, which is placed at:

./molecule/default/molecule.yml

Another file, which describes the playbook for the role, is playbook.yml, placed at:

./molecule/default/playbook.yml

Note: Molecule initialises the environment called default, but engineers can use a name as per the environment used in the project.
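For reference, here is a minimal sketch of what a molecule.yml for a Docker-backed scenario looked like in the Molecule releases current at the time of writing; the platform name and base image are illustrative, and newer Molecule versions have changed this layout, so check the documentation for the version you have installed:

```yaml
driver:
  name: docker
platforms:
  - name: instance        # the container the role is converged against
    image: ubuntu:16.04
provisioner:
  name: ansible           # runs the role's playbook inside the container
lint:
  name: yamllint          # remove or disable to skip audit (lint) errors initially
verifier:
  name: testinfra         # executes the test scripts you have written
```

Editing the lint and verifier sections here is how you disable the audit step or swap the test framework, as discussed above.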
DevOps Series
Using Ansible to Deploy a Piwigo Photo Gallery
Piwigo is a Web-based photo gallery software package written in PHP. In this tenth article in our DevOps series, we will use Ansible to install and configure a Piwigo instance.
Piwigo requires a MySQL database for its back-end, and has a number of extensions and plugins developed by the community. You can install it on any shared Web hosting service provider, or on your own GNU/Linux server. It basically uses the (G)LAMP stack. In this article, we will use Ansible to install and configure a Piwigo instance, which is released under the GNU General Public License (GPL).

You can add photos using the Piwigo Web interface, or use an FTP client to synchronise the photos with the server. Each photo is made available in nine sizes, ranging from XXS to XXL. A number of responsive UI themes are available that make use of these different photo sizes, depending on whether you are viewing the gallery on a phone, tablet or computer. The software also allows you to add a watermark to your photos, and you can create nested albums. You can also tag your photos, and Piwigo stores metadata about the photos too. You can even use access control to make photos and albums private. My Piwigo gallery is available at https://www.shakthimaan.in/gallery/.

Linux
The Piwigo installation will be on an Ubuntu 15.04 image running as a guest OS using KVM/QEMU. The host system is a Parabola GNU/Linux-libre x86_64 system. Ansible is installed on the host system using the distribution package manager. The version of Ansible used is:

$ ansible --version
ansible 2.4.1.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/shakthi/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.14 (default, Sep 20 2017, 01:25:59) [GCC 7.2.0]

The /etc/hosts file should have an entry for the guest "ubuntu" VM as indicated below:

192.168.122.4 ubuntu

You should be able to issue commands from Ansible to the guest OS. For example:

$ ansible ubuntu -m ping
ubuntu | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

On the host system, we will create a project directory structure to store the Ansible playbooks:

The Ansible playbook updates the software package repository by running apt-get update, and then proceeds to install the Apache2 package. The playbook waits for the server to start and listen on port 80:

tasks:
  - name: Update the software package repository
    apt:
      update_cache: yes

  - wait_for:
      port: 80

An execution of the playbook is shown below:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/piwigo.yml --tags web -K
SUDO password:

PLAY [Install Apache web server] ******************

TASK [setup] ***********************************
ok: [ubuntu]

TASK [Update the software package repository] *************
changed: [ubuntu]

MySQL
Piwigo requires a MySQL database server for its back-end, at least version 5.0. As the second step, you can install it using the following Ansible playbook:

- name: Install MySQL database server
  hosts: ubuntu
  become: yes
  become_method: sudo
  gather_facts: true
  tags: [database]

  tasks:
    - name: Update the software package repository
      apt:
        update_cache: yes

    - name: Install MySQL
      package:
        name: "{{ item }}"
        state: latest
      with_items:
        - mysql-server
        - mysql-client
        - python-mysqldb

    - name: Start the server
      service:
        name: mysql
        state: started

    - wait_for:
        port: 3306

    - mysql_user:
        name: guest
        password: '*F7B659FE10CA9FAC576D358A16CC1BC646762FB2'
        encrypted: yes
        priv: '*.*:ALL,GRANT'
        state: present

The APT software repository is updated first, and the required MySQL packages are then installed. The database server is started, and the Ansible playbook waits for the server to listen on port 3306. For this example, a guest database user account with osfy as the password is chosen for the gallery Web application. In production, please use a stronger password. The hash for the password can be computed from the MySQL client as indicated below:

+-------------------------------------------+
1 row in set (0.00 sec)

Also, the default MySQL root password is empty. You should change it after installation. The playbook can be invoked as follows:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/piwigo.yml --tags database -K

PHP
Piwigo is written using PHP (PHP Hypertext Preprocessor), and it requires version 5.0 or later. The documentation website recommends version 5.2. The Ansible playbook to install PHP is given below:

- name: Install PHP
  hosts: ubuntu
  become: yes
  become_method: sudo
  gather_facts: true
  tags: [php]

  tasks:
    - name: Update the software package repository
      apt:
        update_cache: yes

    - name: Install PHP
      package:
        name: "{{ item }}"
        state: latest
      with_items:
        - php5
        - php5-mysql

Update the software package repository, and install PHP5 and the php5-mysql database connectivity package. The Ansible playbook for this can be invoked as follows:

$ ansible-playbook -i inventory/kvm/inventory playbooks/configuration/piwigo.yml --tags php -K

Piwigo
The final step is to download, install and configure Piwigo. The playbook for this is given below:
Two backup files that were created from executing the above playbook are piwigo-1510053932.sql and piwigo-backup-1510053932.tar.bz2.

Cleaning up
You can uninstall the entire Piwigo installation using an Ansible playbook. This has to happen in the reverse order: you have to remove Piwigo first, followed by PHP, MySQL and Apache. A playbook to do this is included in the playbooks/admin folder, and is given below for reference:

---
- name: Uninstall Piwigo
  hosts: ubuntu
  become: yes
  become_method: sudo
  gather_facts: true
  tags: [uninstall]

  vars:
    piwigo_dest: "/var/www/html"

  tasks:
    - name: Delete piwigo folder

    - name: Stop the web server
      service:
        name: apache2
        state: stopped

    - name: Uninstall apache2
      package:
        name: "{{ item }}"
        state: absent
      with_items:
        - apache2

The above playbook can be invoked as follows:

$ ansible-playbook -i inventory/kvm/inventory playbooks/admin/uninstall-piwigo.yml -K

You can visit http://piwigo.org/ for more documentation.

By: Shakthi Kannan
The author is a free software enthusiast and blogs at shakthimaan.com.
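All the ansible-playbook invocations in this article read their hosts from inventory/kvm/inventory, a file the article does not reproduce. A minimal sketch consistent with the /etc/hosts entry for the guest would look like the line below; the SSH user and password shown are illustrative assumptions, not values from the article:

```ini
; inventory/kvm/inventory (sketch) - connection details are hypothetical
ubuntu ansible_host=192.168.122.4 ansible_connection=ssh ansible_user=user ansible_password=password
```

With such a file in place, `ansible ubuntu -m ping` and the tagged playbook runs shown earlier resolve the name "ubuntu" to the guest VM.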
Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarise Big Data, and makes querying and analysing easy.

A little history about Apache Hive will help you understand why it came into existence. When Facebook started gathering data and ingesting it into Hadoop, the data was coming in at the rate of tens of GBs per day, back in 2006. Then, in 2007, it grew to 1TB a day and, within a few years, increased to around 15TB a day. Initially, Python scripts were written to ingest the data into Oracle databases, but with the increasing data rate and also the diversity in the sources/types of incoming data, this was becoming difficult. The Oracle instances were getting filled pretty fast, and it was time to develop a new kind of system that handled large amounts of data. It was Facebook that first built Hive, so that most people who had SQL skills could use the new system with minimal changes, compared to what was required with other RDBMSs.

The main features of Hive are:
- It stores schema in a database and processes data into HDFS.
- It is designed for OLAP.
- It provides an SQL-type language for querying, called HiveQL or HQL.
- It is familiar, fast, scalable and extensible.

Hive architecture is shown in Figure 1. The components of Hive are listed in Table 1.
[Figure 1: Hive architecture, showing the user interfaces (Web UI, Hive command line, HD Insight), the HiveQL process engine, the metastore, the execution engine and MapReduce.]

Table 1
Unit name | Operation
User interface | Hive is data warehouse infrastructure software that can create interactions between the user and HDFS; the user interfaces it supports include the Hive Web UI, the Hive command line and HD Insight.
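The features above mention HiveQL but the excerpt does not show any of it. For a flavour of the language, here is a small illustrative sketch; the table, columns and data layout are hypothetical, not taken from the article:

```sql
-- Define a Hive table over tab-separated files in HDFS (illustrative).
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING,
  ts      TIMESTAMP
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

-- A familiar SQL-style aggregation, which Hive compiles into MapReduce jobs.
SELECT url, COUNT(*) AS hits
FROM page_views
GROUP BY url
ORDER BY hits DESC
LIMIT 10;
```

The point of the example is the one the article makes: anyone with SQL skills can express Hadoop-scale queries with minimal changes.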
THE MAKING OF A 'MADE IN INDIA' LINUX DISTRO
BackSlash Linux is one of the newest Linux distributions developed in India, and that too, by
a 20-year-old. The operating system has a mix of Ubuntu and Debian platforms, and offers
two different worlds under one roof, with KDE and GNOME integration.
"It is not very hard to build a Linux distribution," says Kumar Priyansh, the 20-year-old developer who has single-handedly created BackSlash Linux. As a child, Priyansh had always been curious about how operating systems worked. But instead of being merely curious and dreaming of developing an operating system, he started making OpenSUSE-based distributions in 2011 to step into the world of open source platforms. He used SUSE Studio to release three versions of his very own operating system, which he called Blueberry. All that experience helped Priyansh in bringing out a professional Linux platform that debuted as BackSlash Linux in November 2016.

"To try and build a new Linux distro, I decided to dedicate a lot of time to development, and started attending online tutorials," says Priyansh. Going through many online tutorial sessions, the Madhya Pradesh resident observed that he needed to combine multiple parts of different tutorials to understand the basics. "I started connecting parts of different tutorials from the Web, making an authentic and working tutorial for myself that allowed me to build the necessary applications and compile the very first version of my own distribution," he says.

It's Ubuntu behind the scenes, but with many tweaks
Priyansh picked Ubuntu as the Linux distribution on which to build his platform. But to give users an even more advanced experience, he deployed Hardware Enablement (HWE) support, which allows the operating system to work with newer hardware and provides an up-to-date delivery of the Linux kernel. The developer also added a proprietary repository channel that helps users achieve better compatibility with their hardware. "I ship the platform with a proprietary repository channel enabled, which Ubuntu does not offer by default. This is because I don't want
A Brief Introduction to
Puppet
The burgeoning demand for scalability has driven technology into a new era with a focus on distributed and virtual resources, as opposed to the conventional hardware that drives most systems today. Virtualisation is a method for logically dividing the computing resources of a system between different applications. Tools offer either full virtualisation or para-virtualisation for a system, driving away from the 'one server one application' model that typically under-utilises resources, towards a model that is focused on more efficient use of the system. In hardware virtualisation, a virtual machine gets created and behaves like a real computer with an operating system. The terms 'host' and 'guest' machine are used for the real and virtual systems, respectively. With this paradigm shift, software and service architectures have undergone a transition to virtual machines, laying the groundwork for distributed and cloud computing.

Enterprise infrastructure
The classical definition of enterprise infrastructure, the data centre, a crucial piece of the puzzle serving to bolster the operations of the company, is evolving in a manner that would have been hard to fathom a decade ago. It was originally built as isolated chunks of machinery pieced together into a giant to provide storage and network support for day-to-day operations. Archetypal representations include a mass of tangled wires and rolled up cables connecting monster racks of servers churning data minute after minute. A few decades ago, this was the norm; today, companies require much more flexibility, to scale up and down!

With the advent of virtualisation, enterprise infrastructure has been cut down, eliminating unnecessary pieces of hardware, and managers have opted for cloud storage networks that they can scale on the fly. Today, the business of cloud service providers is booming because not only startups but corporations, too, are switching to a virtual internal infrastructure to avoid the hassles of preparing and maintaining their own set of servers, especially considering the man hours required for the task. Former US Chief Information Officer (CIO) Vivek Kundra's paper on Federal Cloud Computing states: "It allows users to control the

[Figure: enterprise infrastructure in the 1990s (mainframe running CRM, ERP and payroll applications on bare metal with dedicated storage) versus today (applications on virtualised, shared storage).]
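The excerpt above sets the scene but shows no Puppet code. As a taste of the declarative style Puppet brings to managing such infrastructure, here is a minimal manifest sketch; the package and service names are illustrative assumptions, not taken from the article:

```puppet
# Declare the desired state; Puppet converges the system to match it,
# rather than executing imperative install/start steps by hand.
package { 'httpd':
  ensure => installed,
}

service { 'httpd':
  ensure  => running,
  enable  => true,
  require => Package['httpd'],  # start the service only after the package exists
}
```

Re-applying the same manifest is safe: resources already in the declared state are left untouched, which is what makes this approach suit fleets that scale up and down.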
The cloud is currently very popular across businesses. It has been stable for some time and many industries are moving towards cloud technologies. However, a major challenge in the cloud environment is privacy and the security of data. Organisations and individuals host their professional or personal data on the cloud, and there are many providers who promise a 99 per cent guarantee for the safety of user data. Yet, there are chances of security breaches and privacy concerns. Therefore, many companies have pulled their data from public clouds and started creating their own private cloud storage. Using ownCloud, one can create services similar to Dropbox and iCloud. We can sync and share files, calendars and more. Let's take a look at how we can do this.

ownCloud
ownCloud is free and open source software that operates like any cloud storage system on your own domain. It is very quick and easy to set up compared to other similar software. It can be used not only for file sharing but also to leverage many features like text editors, to-do lists, etc. ownCloud can be integrated with any desktop or mobile calendar and contact apps. So now, we don't really require a Google Drive or Dropbox account.

Requirements for ownCloud

Web hosting
You will need hosting space and a domain name so that you can access your storage publicly, anywhere and anytime. You can register a domain and hosting space with any service provider like GoDaddy or BigRock. The only requirement is that the hosting service should support PHP and MySQL, which most hosting providers usually do.

ownCloud server
There are different ways to install the ownCloud server but here, we will use the easiest and quickest method. It will not take more than five minutes to get your cloud ready. We need to download the latest version of the ownCloud server and install it.
To install and configure ownCloud, go to the link https://owncloud.org/install/. From this link, download the ownCloud server, https://owncloud.org/install/#instructions-server. The current version is 10.0.3. Click on Download ownCloud Server and select the Web installer from the options in the left side menu. Check Figure 1 for more details. Figure 1 mentions the installation steps. We need to download the setup-owncloud.php file and upload it into the Web space.
We will upload the file using WinSCP or any similar kind of software. Here, I am using WinSCP. If you don't have WinSCP, you can download it first from the link https://winscp.net/eng/download.php.
The next step is to get to know your FTP credentials in order to connect to your Web space. For that, you need to log in to your hosting provider portal. From the menu, you will find the FTP options, where you see your credentials. This approach varies based on your hosting provider. Since I bought my hosting from ipage.com, this was how I found my FTP credentials. If you are not able to find them, then Google the specific steps for your hosting provider, for which you will get lots of documentation. Figure 2 shows my FTP connection to my Web space.
Transfer your setup-owncloud.php file to your Web space by dragging it. Now you can load the URL Yourdomainname.com/setup-owncloud.php into your browser. In my case, the domain name is coolstuffstobuy.com and hence my URL is coolstuffstobuy.com/setup-owncloud.php. Figure 3 demonstrates this step. As you can see, ownCloud installation has started.
While clicking on the Next button, if we get the error mentioned in Figure 4 ('Error while installing ownCloud'), it means we don't have proper access. This happens because of the root directory. In shared hosting, we don't have read-write access to the root folder, so I have created one folder and put setup-owncloud.php into it. Hence, the current path is coolstuffstobuy.com/cloud/setup-owncloud.php.
While creating an account, you can click on Database selection and provide your user name and password for the database. Click Next and then you will come to the home page of ownCloud. Figure 7 demonstrates that.
There are different options: you can download your own cloud storage as an Android or iPhone app, and you can also connect it to your Calendar or Contacts. Now everything else is self-explanatory, just as in Google Drive or Dropbox. You can now upload any file and share it with friends and colleagues, like you would with any other cloud storage service.

Reference
[1] https://owncloud.org/install/

By: Maulik Parekh
The author works at Cisco as a consulting engineer and has an M.Tech degree in cloud computing from VIT University, Chennai. He constantly strives to learn, grow and innovate. He can be reached at maulikparekh2@gmail.com. Website: https://www.linkedin.com/in/maulikparekh2.
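The WinSCP upload step described above can also be scripted. Below is a minimal sketch using Python's standard ftplib; the host name, credentials and the remote folder name are placeholders that you would replace with the FTP details from your own hosting provider.

```python
from ftplib import FTP


def upload_setup_file(host, user, password,
                      local_path="setup-owncloud.php",
                      remote_dir="cloud"):
    """Upload the ownCloud Web installer to a writable folder on the host."""
    ftp = FTP(host)
    ftp.login(user, password)
    # Shared hosts often deny writes to the root folder, so change into
    # a subfolder first (as done in the article with the 'cloud' folder).
    ftp.cwd(remote_dir)
    with open(local_path, "rb") as f:
        ftp.storbinary("STOR setup-owncloud.php", f)
    ftp.quit()


# Example call (placeholder credentials):
# upload_setup_file("ftp.example.com", "myuser", "mypassword")
```

After the upload, loading Yourdomainname.com/cloud/setup-owncloud.php in the browser continues the installation exactly as described above.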
Spark’s MLlib:
Scalable Support for
Machine Learning
Designated as Spark’s scalable machine learning library,
MLlib consists of common algorithms and utilities as well as
underlying optimisation primitives.
The world is being flooded with data from all sources. The hottest trend in technology is related to Big Data, and the evolving field of data science is a way to cope with this data deluge. Machine learning is at the heart of data science. The need of the hour is to have efficient machine learning frameworks and platforms to process Big Data. Apache Spark is one of the most powerful platforms for analysing Big Data, and MLlib is its machine learning library, potent enough to process Big Data and apply all machine learning algorithms to it efficiently.

Apache Spark
Apache Spark is a cluster computing framework based on Hadoop's MapReduce framework. Spark performs in-memory cluster computing, which helps to speed up computation by reducing the I/O transfer time. It is widely used to deal with Big Data problems because of its distributed architectural support and parallel processing capabilities. Users prefer it to Hadoop on account of its stream processing and interactive query features. To provide a wide range of services, it has built-in libraries like GraphX, SparkSQL and MLlib. Spark supports Python, Scala, Java and R as programming languages, of which Scala is the most preferred.

MLlib
MLlib is Spark's machine learning library. It is predominantly used in Scala, but it is compatible with Python and Java as well. MLlib was initially contributed by AMPLab at UC Berkeley. It makes machine learning scalable, which provides an advantage when handling large volumes of incoming data.
The main features of MLlib are listed below.
Machine learning algorithms: Regression, classification, collaborative filtering, clustering, etc
Featurisation: Selection, dimensionality reduction, transformation, feature extraction, etc
Pipelines: Construction, evaluation and tuning of ML pipelines
Persistence: Saving/loading of algorithms, models and pipelines
Utilities: Statistics, linear algebra, probability, data handling, etc
Some lower level machine learning primitives, like the generic gradient descent optimisation algorithm, are also present in MLlib. In the latest releases, the MLlib API is based on DataFrames instead of RDDs, for better performance.

The advantages of MLlib
The true power of Spark lies in its vast libraries, which are capable of performing every data analysis task imaginable. MLlib is at the core of this functionality. It has several advantages.
Ease of use: MLlib integrates well with four languages: Java, R, Python and Scala. The APIs of all four provide ease of use to programmers of those languages, as they don't need to learn a new one.
Easy to deploy: No preinstallation or conversion is required to use a Hadoop based data source such as HBase, HDFS, etc. Spark can also run standalone or on an EC2 cluster.
Scalability: The same code can work on small or large volumes of data without the need to change it to suit the volume. As businesses grow, it is easy to expand vertically or horizontally without breaking the code down into modules for performance.
Performance: The ML algorithms run up to 100x faster than MapReduce because the framework allows iterative computation. MLlib's algorithms take advantage of iterative computing properties to deliver better performance, surpassing that of MapReduce. The performance gain is attributed to in-memory computing, which is a speciality of Spark.
Algorithms: The main ML algorithms included in the MLlib module are classification, regression, decision trees, recommendation, clustering, topic modelling, frequent item sets, association rules, etc. ML workflow utilities included are feature transformation, pipeline construction, ML persistence, etc. Singular value decomposition, principal component analysis, hypothesis testing, etc, are also possible with this library.
Community: Spark is open source software under the Apache Foundation now. It gets tested and updated by the vast contributing community. MLlib is the most rapidly expanding component and new features are added every day. People submit their own algorithms, and the resources available are unparalleled.

Basic modules of MLlib
SciKit-Learn: This module contains many basic ML algorithms that perform the various tasks listed below.
Classification: Random forest, nearest neighbour, SVM, etc
Regression: Ridge regression, support vector regression, lasso, logistic regression, etc
Clustering: Spectral clustering, k-means, fuzzy k-means, etc
Decomposition: PCA, SVD, randomised SVD, non-negative matrix factorisation, independent component analysis, etc

Spark MLlib use cases
Spark's MLlib is used frequently in marketing optimisation, security monitoring, fraud detection, risk assessment, operational optimisation, preventative maintenance, etc. Here are some popular use cases.
NBC Universal: International cable TV has tons of data. To reduce costs, NBC takes its media offline when it is not in use. Spark's MLlib is used to implement SVM to predict which files should be taken down.
ING: MLlib is used in its data analytics pipeline for anomaly detection. Decision trees and k-means are implemented with MLlib to enable this.
Toyota: Toyota's Customer 360 insights platform uses social media data in real-time to prioritise customer reviews and categorise them for business insights.

ML vs MLlib
There are two main machine learning packages: spark.mllib and spark.ml. The former is the original version and has its API built on top of RDDs. The latter has a newer, higher-level API built on top of DataFrames to construct ML pipelines. The newer version is recommended because of the DataFrames, which make it more versatile and flexible. The newer releases support the older version as well, due to backward compatibility. spark.mllib, being older, has more features as it was in development longer. Spark ML allows you to create pipelines that use machine learning to transform the data. In short, ML is new, has pipelines and DataFrames, and is easier to work with; MLlib is old, has RDDs and has more features.
MLlib is the main reason for the popularity and widespread use of Apache Spark in the Big Data world. Its compatibility, scalability, ease of use, and good features and functionality have led to its success. It provides many inbuilt functions and capabilities, which makes life easy for machine learning programmers. Virtually all known machine learning algorithms in use can be easily implemented using either version of MLlib. In this era of data deluge, such libraries certainly are a boon to data science.

References
[1] spark.apache.org/
[2] www.tutorialspoint.com/apache_spark/
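The generic gradient descent primitive mentioned above can be illustrated in a few lines of plain Python. This is an illustrative sketch of the idea behind the primitive, not MLlib's distributed implementation.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Generic gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


# Minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

In MLlib, the same loop runs over partitioned data on a cluster, which is where the scalability discussed above comes from.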
Apache CloudStack:
A Reliable and Scalable Cloud
Computing Platform
Apache CloudStack is yet another outstanding project that has contributed many
tools and projects to the open source community. The author has selected the
relevant and important extracts from the excellent documentation provided by the
Apache CloudStack project team for this article.
Apache CloudStack is one among the highly visible projects from the Apache Software Foundation (ASF). The project focuses on deploying open source software for public and private Infrastructure as a Service (IaaS) clouds. Listed below are a few important points about CloudStack.
It is designed to deploy and manage large networks of virtual machines, as highly available and scalable Infrastructure as a Service (IaaS) cloud computing platforms.
CloudStack is used by a number of service providers to offer public cloud services, and by many companies to provide on-premises (private) cloud offerings or as part of a hybrid cloud solution.
CloudStack includes the entire 'stack' of features that most organisations desire in an IaaS cloud: compute orchestration, Network as a Service, user and account management, a full and open native API, resource accounting, and a first-class user interface (UI).
It currently supports the most popular hypervisors: VMware, KVM, Citrix XenServer, Xen Cloud Platform (XCP), Oracle VM server and Microsoft Hyper-V.
Users can manage their cloud with an easy-to-use Web interface, command line tools and/or a full-featured RESTful API. In addition, CloudStack provides an API that is compatible with AWS EC2 and S3 for organisations that wish to deploy hybrid clouds.
It provides an open and flexible cloud orchestration platform to deliver reliable and scalable private and public clouds.

Features and functionality
Some of the features and functionality provided by CloudStack are:
Works with hosts running XenServer/XCP, KVM, Hyper-V, and/or VMware ESXi with vSphere
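The full-featured RESTful API mentioned above authenticates requests by signing them with an account's API key pair: the sorted query string is lower-cased, HMAC-SHA1 signed with the secret key and base64-encoded. Below is a sketch of that documented signing scheme using only Python's standard library; the command and both keys are placeholder values, and a real call would be sent to your management server's /client/api endpoint.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlencode


def sign_request(params, secret_key):
    """Build a signed CloudStack API query string from request parameters."""
    # Sort the parameters by name and URL-encode them.
    query = urlencode(sorted(params.items()), quote_via=quote)
    # Sign the lower-cased query string with HMAC-SHA1, then base64-encode.
    digest = hmac.new(secret_key.encode(), query.lower().encode(),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return query + "&signature=" + quote(signature)


# Placeholder keys; real values come from the CloudStack account settings.
params = {"command": "listVirtualMachines", "response": "json",
          "apikey": "PLACEHOLDER_API_KEY"}
signed = sign_request(params, "PLACEHOLDER_SECRET_KEY")
```

The resulting string is appended to the API endpoint URL; the server recomputes the same HMAC with the stored secret key to authenticate the caller.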
Keras:
Building Deep Learning Applications
with High Levels of Abstraction
Keras is a high-level API for neural networks. It is written in Python and its biggest advantage is its ability to run on top of state-of-the-art deep learning libraries/frameworks such as TensorFlow, CNTK or Theano. If you are looking for fast prototyping with deep learning, then Keras is the optimal choice.
Deep learning is the new buzzword among machine learning researchers and practitioners. It has certainly opened the doors to solving problems that were almost unsolvable earlier. Examples of such problems are image recognition, speaker-independent voice recognition, video understanding, etc. Neural networks are at the core of deep learning methodologies for solving problems. The improvements in these networks, such as convolutional neural networks (CNN) and recurrent networks, have certainly raised expectations, and the results they yield are also promising.
To make the approach simple, there are already powerful frameworks/libraries such as TensorFlow from Google and CNTK (Cognitive Toolkit) from Microsoft. The TensorFlow approach has already simplified the implementation of deep learning for coders. Keras is a high-level API for neural networks written in Python, which makes things even simpler. The uniqueness of Keras is that it can be executed on top of libraries such as TensorFlow and CNTK. This article assumes that the reader is familiar with the fundamental concepts of machine learning.
The primary reasons for using Keras are:
Instant prototyping: the ability to implement deep learning concepts with higher levels of abstraction and a 'keep it simple' approach.
Keras has the potential to execute without any barriers on CPUs and GPUs.
Keras supports convolutional and recurrent networks; combinations of both can also be used with it.

Keras: The design philosophy
As stated earlier, the ability to move into action with instant prototyping is an important characteristic of Keras. Apart from this, Keras is designed with the following guiding principles:
It is an API designed with user friendliness as the core principle. The API is designed to be simple and consistent, and it minimises the effort programmers are required to put in to convert theory into action.
Keras' modular design is another important feature. The primary idea of Keras is layers, which can be connected seamlessly.
Keras is extensible. If you are a researcher trying to bring in your own novel functionality, Keras can accommodate such extensions.
Keras is all Python, so there is no need for tricky declarative configuration files.

[Figure: the Keras design principles (user friendliness, instant prototyping, Python at the core) and the Keras flow: #1 define model, #2 compile model, #3 fit model, #4 perform prediction]
Installation
It has to be remembered that Keras is not a standalone library. It is an API and works on top of existing libraries (TensorFlow, CNTK or Theano). Hence, the installation of Keras requires any one of these backend engines. The official documentation suggests a TensorFlow backend. Detailed installation instructions for TensorFlow are available at https://www.tensorflow.org/install/. From this link, you can infer that TensorFlow can be easily installed on all major operating systems such as MacOS X, Ubuntu and Windows (7 or later).
After the successful installation of any one of the backend engines, Keras can be installed using Pip, as shown below:

$ sudo pip install keras

An alternative approach is to install Keras from the source (GitHub):

# 1. Clone the source from GitHub
$ git clone https://github.com/fchollet/keras.git

# 2. Move to the source directory
$ cd keras

# 3. Install using setup.py
$ sudo python setup.py install

The three optional dependencies that are required for specific features are:
cuDNN (CUDA Deep Neural Network library): for running Keras on the GPU
HDF5 and h5py: for saving Keras models to disk
Graphviz and Pydot: for visualisation tasks

The way Keras works
The basic building block of Keras is the model, which is a way to organise layers. The sequence of tasks to be carried out while using Keras models is:
Model definition
Compilation of the model
Model fitting
Performing predictions

Figure 3: The sequence of tasks

The basic type of model is sequential. It is simply a linear stack of layers. The sequential model can be built as shown below:

from keras.models import Sequential
model = Sequential()

The stacking of layers can be done with the add() method:

from keras.layers import Dense, Activation
model.add(Dense(units=64, input_dim=100))
model.add(Activation('relu'))
model.add(Dense(units=10))
model.add(Activation('softmax'))

Keras has various types of pre-built layers. Some of the prominent types are:
Regular Dense
Recurrent layers: LSTM, GRU, etc
One- and two-dimension convolutional layers
Dropout
Noise
Pooling
Normalisation, etc

Similarly, Keras supports most of the popularly used activation functions. Some of these are:
Sigmoid
ReLU
Softplus
ELU
LeakyReLU, etc
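To make the activation list above concrete, here is what a few of those functions compute, written with Python's maths module only. These are the standard textbook definitions of the functions rather than Keras code, and the alpha/slope defaults are illustrative.

```python
import math


def sigmoid(x):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Passes positive inputs through, zeroes out negative ones."""
    return max(0.0, x)


def softplus(x):
    """A smooth approximation of ReLU."""
    return math.log(1.0 + math.exp(x))


def elu(x, alpha=1.0):
    """Like ReLU for positives; saturates smoothly to -alpha for negatives."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def leaky_relu(x, slope=0.01):
    """ReLU variant that keeps a small gradient for negative inputs."""
    return x if x > 0 else slope * x
```

In Keras these are selected by name (for example, Activation('relu') in the listing above) rather than implemented by hand.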
The model can be compiled with compile( ). The predictions on novel data can be done with the predict( ) function:

classes = model.predict(x_test, batch_size=128)

The methods of Keras layers
The important methods of Keras layers are shown in Table 1.

Table 1
Method: get_weights( )
Description: This method is used to return the weights of the layer

from __future__ import print_function

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

model.summary()
Using jq to Consume
JSON in the Shell
This article is a tutorial on using jq as a JSON parser and fetching information
about the weather from different cities.
JSON has become the most prevalent way of consuming Web APIs. If you try to find the API documentation of a popular service, chances are that the API will respond in JSON format. Many mainstream languages even have JSON parsers built in. But when it comes to shell scripting, there is no inbuilt JSON parser, and the only hacky way of processing JSON is with a combination of awk and sed, which are very painful to use. There are many JSON parsers apart from jq but, in this article, we will focus only on this option.

Installation
jq is a single binary program with no dependencies, so installation is as simple as downloading the binary from https://stedolan.github.io/jq/, copying it into /bin or /usr/bin and setting permissions. Many Linux distributions provide jq in their repositories, so installing it is as easy as using the distribution's package manager.

Usage
For this demonstration, version 1.5 of jq was used. All the code examples are available at https://github.com/jatindhankhar/jq-tutorial. jq can be used in conjunction with other tools like cat and curl, by piping, or it can read directly from a file, although the former is more popular in practice. When working with jq, two fantastic resources can be used: the first is the documentation at https://stedolan.github.io/jq/manual/, and the second is the online playground (https://jqplay.org/), where one can play with jq and even share snippets.
Throughout this article, we will use different API endpoints of the MetaWeather API (https://www.metaweather.com/api). The simplest use of jq is to pretty format JSON data. Let's fetch the list of cities that contain the word 'new' in them, and then use this information to further fetch details of a particular city, as follows:
:"28.643999,77.091003"},{"title":"New Orleans","location_type":"City","woeid":2458833,"latt_long":"29.953690,-90.077713"},{"title":"Newcastle","location_type":"City","woeid":30079,"latt_long":"54.977940,-1.611620"},{"title":"Newark","location_type":"City","woeid":2459269,"latt_long":"40.731972,-74.174179"}]

Let's pretty format by piping the curl output to jq, as follows:

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq

The screenshot shown in Figure 1 compares the output of both commands.
Now that we have some data to work upon, we can use jq to filter the keys. The simplest filter available is '.', which does nothing and filters the whole document as it is. Filters are passed to jq in single quotes. By looking at the output, we can see that all the objects are trapped inside a JSON array. To filter out the array, we use .[], which will display all the items inside the array. To target a specific item by index, we place the index number inside the brackets, like .[0].
To display the first item, use the following code:

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[0]'

{
  "title": "New York",
  "location_type": "City",
  "woeid": 2459115,
  "latt_long": "40.71455,-74.007118"
}

To display only the available cities, we add another filter, which is the key name itself (in our case, .title). We can combine multiple filters using the | (pipe) operator. Here we combine the .[] filter with .title in this way: .[] | .title. For simple queries, we can avoid the | operator and rewrite it as .[] .title, but we will use the | operator to combine queries.

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[] | .title'

"New York"
"New Delhi"
"New Orleans"
"Newcastle"
"Newark"

But what if we want to display multiple keys together? Just separate them by ','. Now, let's display the city along with its ID (woeid):

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[] | .title,.woeid'

"New York"
2459115
"New Delhi"
28743736
"New Orleans"
2458833
"Newcastle"
30079
"Newark"
2459269

The output looks good, but what if we format the output and print it on a single line? For that, we can use string interpolation. To use keys inside a string pattern, we prefix them with a backslash and wrap them in parentheses so that they are not treated as literal text.

curl -sS https://www.metaweather.com/api/location/search/\?query\=new | jq '.[] | "For \(.title) code is \(.woeid)"'

The JSON structure for this endpoint looks like what's shown in Figure 3. consolidated_weather contains an array of JSON objects with weather information, and the sources key contains an array of JSON objects from which particular weather information was fetched.
This time, let's store the JSON in a file named weather.json instead of directly piping data. This will help us
{
  "title": "New Delhi",
  "location_type": "City",
  "woeid": 28743736,
  "latt_long": "28.643999,77.091003"
}

There are so many functions and filters, but we will use the sort_by and date functions, and end this article by printing the forecast for each day in ascending order.

# Format Date
# This function takes a value via the pipe (|) operator
def format_date(x):
  x | strptime("%Y-%m-%d") | mktime | strftime("%a - %d, %B");

def print_location:
  . | "
Location: \(.title)
Coordinates : \(.latt_long) ";

Save the above code as filter.txt. sort_by sorts the values by date. format_date takes a date as a parameter and extracts the short day name, date and month. print_location and print_data do not take any parameters and can be applied after the pipe operator; the default parameter for a parameterless function is '.'.

jq -f filter.txt weather.json -r

-r will return raw strings. The output is shown in Figure 4.
I hope this article has given you an overview of all that jq can achieve. If you are looking for a tool that is easy to use in shell scripts, jq can help you out; so give it a try.
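If jq is not available, the title extraction shown earlier can also be done with Python's standard json module. The snippet below is a comparison sketch; the data here is an inline sample shaped like the MetaWeather location search response rather than a live API call.

```python
import json

# Inline sample shaped like the MetaWeather location search response.
raw = '''[
  {"title": "New York", "location_type": "City", "woeid": 2459115},
  {"title": "New Delhi", "location_type": "City", "woeid": 28743736}
]'''

cities = json.loads(raw)

# Equivalent of the jq filter '.[] | .title':
titles = [city["title"] for city in cities]

# Equivalent of '.[] | "For \\(.title) code is \\(.woeid)"':
lines = ["For {} code is {}".format(c["title"], c["woeid"]) for c in cities]
```

For one-off shell pipelines jq stays far more convenient; the Python version mainly pays off once the logic grows beyond a single filter expression.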
Developing Real-Time
Notification and Monitoring Apps
in IBM Bluemix
IBM Bluemix is a cloud PaaS that supports numerous programming languages
and services. It can be used to build, run, deploy and manage applications on the
cloud. This article guides the reader in building a weather app as well as an app to
remotely monitor vehicle drivers.
Cloud computing is one of the emerging research technologies today. With the wide use of sensor and wireless based technologies, the cloud has expanded to the Cloud of Things (CoT), which is the merger of cloud computing and the Internet of Things (IoT). These technologies provide for the transmission and processing of huge amounts of information on different channels with different protocols. This integration of technologies is generating volumes of data, which then has to be disseminated for effective decision making in multiple domains like business analytics, weather forecasting, location maps, etc.
A number of cloud service providers deliver cloud based services and application programming interfaces (APIs) to users and developers. Cloud computing has different paradigms and delivery models including IaaS, PaaS, SaaS, Communication as a Service (CaaS) and many others. A cloud computing environment that uses a mix of cloud services is known as a hybrid cloud. There are many hybrid clouds, which differ on the basis of the types of services and features in their cloud environments.
The prominent cloud service providers include IBM Bluemix, Amazon Web Services (AWS), Red Hat OpenShift, Google Cloud Platform, etc.

IBM Bluemix and cloud services
IBM Bluemix is a powerful, multi-featured, hybrid cloud environment that delivers assorted cloud services, APIs and development tools without any complexities. In general, IBM Bluemix is used as a Platform as a Service (PaaS), as it has programming platforms for almost all applications. It provides programming platforms for PHP, Java, Go, Ruby, Node.js, ASP.NET, Tomcat, Swift, Ruby Sinatra, Python, Scala, SQL databases, NoSQL platforms, and many others. Function as a Service (FaaS) is also integrated in IBM Bluemix along with serverless computing, leading to a higher degree of performance and accuracy. IBM Bluemix's services began in 2014 and gained popularity within three years.

Creating real-time monitoring apps in IBM Bluemix
IBM Bluemix presents high performance cloud services with the integration of the Internet of Things (IoT), so that real-time applications can be developed for corporate as well as personal use. Different types of applications can be programmed for remote monitoring by using IBM and third party services. In the following scenarios, the implementation aspects of weather monitoring and vehicle driver behaviour analysis are covered.

Creating a weather notification app
A weather notification app can be easily created with IBM Bluemix so that real-time messages can be delivered to the client effectively. It can be used in real-life scenarios during trips and for other purposes. Those planning to travel to a particular place can get real-time weather notifications.

Figure 5: Editing the Node-RED editor
Figure 8: Selecting the driver behaviour service in IoT
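The notification flow described above (fetch a weather reading, compare it against a threshold, send an alert to the client) can be sketched independently of the Bluemix tooling. The fetch step is stubbed here with an inline reading, since the real app would pull live data from the platform's weather service; the city, field names and threshold are illustrative assumptions.

```python
def build_alert(reading, threshold_c=40.0):
    """Return an alert message when the temperature crosses a threshold,
    or None when no notification is needed."""
    temp = reading["temperature_c"]
    if temp >= threshold_c:
        return "Weather alert for {}: {} degrees C".format(reading["city"], temp)
    return None


# Stubbed reading; a real app would fetch this from the weather service.
alert = build_alert({"city": "Chennai", "temperature_c": 42.0})
```

In the Bluemix/Node-RED version of the app, the same decision logic sits in a flow node between the weather input and the client notification output.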
In the previous issue of OSFY, we tackled pattern matching in PHP using regular expressions. PHP is most often used as a server-side scripting language, but what if your client doesn't want to bother the server with all the work? Well, then you have to process regular expressions at the client side with JavaScript, which is almost synonymous with client-side scripting. So, in this article, we'll discuss regular expressions in JavaScript.
Though, technically, JavaScript is a general-purpose programming language, it is often used as a client-side scripting language to create interactive Web pages. With the help of JavaScript runtime environments like Node.js, JavaScript can also be used on the server side. However, in this article, we will discuss only the client-side scripting aspects of JavaScript, because we have already covered regular expressions in a server-side scripting language, PHP. Just as with PHP in the previous article in this series, you will mostly see JavaScript code embedded inside HTML script. As mentioned earlier in the series, limited knowledge of HTML syntax will in no way affect the understanding of the regular expressions used in JavaScript.
Though we are mostly interested in the use of regular expressions, as always, let's begin with a brief discussion on the syntax and history of JavaScript. JavaScript is an interpreted programming language. ECMAScript is a scripting language specification from the European Computer Manufacturers Association (ECMA) and the International Organization for Standardization (ISO), standardised in ECMA-262 and ISO/IEC 16262 for JavaScript. JavaScript was introduced in the Netscape Navigator browser (now defunct) in 1995; soon Microsoft followed with its own version of JavaScript, which was officially named JScript. The first edition of ECMAScript was released in June 1997 in an effort to settle the disputes between Netscape and Microsoft regarding the standardisation of JavaScript. The latest edition of ECMAScript, version 8, was released in June 2017. All modern Web browsers support JavaScript with the help of a JavaScript engine that is based on the ECMAScript specification. Chrome V8, often known simply as V8, is an open source JavaScript engine developed by Google for the Chrome browser. Even though JavaScript has borrowed a lot of syntax from Java, do remember that JavaScript is not Java.
regular expressions in C++, you will come across some subtle differences between PCRE and the ECMAScript style of regular expressions. JavaScript also uses ECMAScript style regular expressions. JavaScript's support for regular expressions is built in and is available for direct use. Since we have already dealt with the syntax of the ECMAScript style of regular expressions, we can directly work with a simple JavaScript file containing regular expressions.

JavaScript with regular expressions
Consider the script called regex1.html shown below. To save some space, I have only shown the JavaScript portion of the script and not the HTML code. But the complete file is available for download.

<script>
var str = "Working with JavaScript";
var pat = /Java/;
if(str.search(pat) != -1) {
document.write('<b>MATCH FOUND</b>');
} else {
document.write('<b>NO MATCH</b>');
}
</script>

Open the file regex1.html in any Web browser and you will see the message 'MATCH FOUND' displayed on the Web page in bold text. Well, this is an anomaly, since we did not expect a match. So, now let us go through the JavaScript code in detail to find out what happened. The following line of code stores a string in the variable str:

var str = "Working with JavaScript";

The line of code shown below creates a regular expression pattern and stores it in the variable pat:

var pat = /Java/;

Regular expression patterns are specified as characters within a pair of forward slash ( / ) characters. Here, the regular expression pattern specifies the word Java. The RegExp object is used to specify regular expression patterns in JavaScript. This regular expression can also be defined with the RegExp( ) constructor using the following line of code:

var pat = new RegExp("Java");

This is instead of the line of code:

var pat = /Java/;

A script called regex2.html with this modification is available for download. The output of the script regex2.html is the same as that of regex1.html. The next few lines of code involve an if-else block. The following line of code uses the search( ) method provided by the String object:

if(str.search(pat) != -1)

The search( ) method takes a regular-expression pattern as an argument, and returns either the position of the start of the first matching substring or -1 if there is no match. If a match is found, the following line of code inside the if block prints the message 'MATCH FOUND' in bold:

document.write('<b>MATCH FOUND</b>');

Otherwise, the following line of code inside the else block prints the message 'NO MATCH' in bold:

document.write('<b>NO MATCH</b>');

Remember, the search( ) method searches for a substring match and not for a complete word. This is the reason why the script reports a match. If you are interested in a literal search for the word Java, then replace the line of code:

var pat = /Java/;

with:

var pat = /\sJava\s/;

The script with this modification, regex3.html, is also available for download. The notation \s is used to denote a whitespace; this pattern makes sure that the word Java is present in the string on its own and not just as a substring of words like JavaScript, Javanese, etc. If you open the script regex3.html in a Web browser, you will see the message 'NO MATCH' displayed on the Web page.

Methods for pattern matching
In the last section, we saw the search( ) method provided by the String object. The String object also provides three other methods for regular expression processing: replace( ), match( ) and split( ). Consider the script regex4.html shown below, which uses the method replace( ):

<html>
<body>
<form id="f1">
ENTER TEXT HERE: <input type="text" name="data" >
</form>
<button onclick="check( )">CLICK</button>
<script>
function check( ) {
var x = document.getElementById("f1");
Open the file regex4.html in a Web browser and you will see a text box to enter data and a Submit button. If you enter a string like 'I am good', you will see the output message 'we are good' displayed on the Web page. Let us analyse the code in detail to understand how it works. There is an HTML form which contains the text box to enter data, with a button that, when pressed, will call a JavaScript method called check( ). The JavaScript code is placed inside the <script> tags. The following line of code gets the elements in the HTML form:

var x = document.getElementById("f1");

In this case, there is only one element in the HTML form, the text box. A line of code in the script then reads the content of the text box into the variable text. The following line of code uses the replace( ) method to test for a regular expression pattern and, if a match is found, the matched substring is replaced with the replacement string:

text = text.replace(/I am/i,"We are");

In this case, the regular expression pattern is /I am/i and the replacement string is We are. If you observe carefully, you will see that the regular expression pattern is followed by an 'i'. Well, we came across similar constructs throughout the series. This 'i' is an example of a regular expression flag, and this particular one instructs the regular expression engine to perform a case-insensitive match. So, you will get a match whether you enter 'I AM', 'i am' or even 'i aM'. There are other flags also, like g, m, etc. The flag g will result in a global match rather than stopping after the first match. The flag m is used to enable the multi-line mode. Also note the fact that the replace( ) method did not replace the contents of the variable text; instead, it returned the modified string, which was then explicitly stored in the variable text. The following line of code writes the contents on to the Web page:

document.write(text);

Figure 3 shows the input for the script regex4.html and Figure 4 shows the output.

Figure 3: Input to regex4.html
Figure 4: Output of regex4.html

A method called match( ) is also provided by the String object for regular expression processing. Search( ) returns the starting index of the matched substring, whereas the match( ) method returns the matched substring itself. What will happen if we replace the line of code:

text = text.replace(/I am/i,"We are");

…in regex4.html with the following code?

text = text.match(/\d+/);

If you open the file regex5.html having this modification, enter the string article part 5 in the text box and press the Submit button. You will see the number '5' displayed on the Web page. Here the regular expression pattern is /\d+/, which matches one or more occurrences of a decimal digit.
Another method provided by the String object for regular expression processing is the split( ) method. This breaks the string on which it was called into an array of substrings, using the regular expression pattern as the separator. For example, replace the line of code:

text = text.replace(/I am/i,"We are");

…in regex4.html with the code:

text = text.split(".");

…to obtain regex6.html. If you open the file regex6.html, enter the IPv4 address 192.100.50.10 in dotted-decimal notation in the text box and press the Submit button. The IPv4 address will be displayed as '192, 100, 50, 10'. The IPv4 address string is split into substrings based on the separator '.' (dot).
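As a brief aside (not part of the original scripts), the behaviour of all four String methods discussed above can also be verified outside the browser, for example with Node.js, using console.log in place of document.write:

```javascript
// search( ) returns the index of the first match, or -1 if there is none.
var str = "Working with JavaScript";
console.log(str.search(/Java/));      // 13: 'Java' found inside 'JavaScript'
console.log(str.search(/\sJava\s/));  // -1: no stand-alone word 'Java'

// replace( ) returns a new string; the /i flag makes the match case-insensitive.
console.log("i AM good".replace(/I am/i, "We are"));  // We are good

// match( ) returns the matched substring (as the first array element).
console.log("article part 5".match(/\d+/)[0]);  // 5

// split( ) breaks a string into an array using the separator.
console.log("192.100.50.10".split("."));  // [ '192', '100', '50', '10' ]
```

Note that replace( ), match( ) and split( ) all leave the original string untouched and return a new value, just as the article describes for replace( ).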
String processing of regular expressions
In previous articles in this series we mostly dealt with regular expressions that processed numbers. For a change, in this article, we will look at some regular expressions to process strings. Nowadays, computer science professionals from India face difficulties in deciding whether to use American English spelling or British English spelling while preparing technical documents. I always get confused with colour/color, programme/program, centre/center, pretence/pretense, etc. Let us look at a few simple techniques to handle situations like this.
For example, the regular expression /colo(?:u)?r/ will match both the spellings 'color' and 'colour'. The question mark symbol ( ? ) is used to denote zero or one occurrence of the preceding group of characters. The notation (?:u) groups u with the grouping operator ( ) and the notation ?: makes sure that the matched substring is not stored in memory unnecessarily. So, here a match is obtained with and without the letter u.
What about the spellings 'programme' and 'program'? The regular expression /program(?:me)?/ will accept both these spellings. The regular expression /cent(?:re|er)/ will accept both the spellings, 'center' and 'centre'. Here the pipe symbol ( | ) is used as an alternation operator.
What about words like 'biscuit' and 'cookie'? In British English the word 'biscuit' is preferred over the word 'cookie', and the reverse is the case in American English. The regular expression /(?:cookie|biscuit)/ will accept both the words 'cookie' and 'biscuit'. The regular expression /preten[cs]e/ will match both the spellings, 'pretence' and 'pretense'. Here the character class operator [ ] is used in the regular expression pattern to match either the letter c or the letter s.
I have only discussed specific solutions to the problems mentioned here, so as to keep the regular expressions very simple. But with the help of more complicated regular expressions, it is possible to solve many of these problems in a general way rather than solving individual cases. As mentioned earlier, C++ also uses ECMAScript-style regular expressions; so any regular expression pattern we developed in the article on regular expressions in C++ can be used in JavaScript without any modification.
Just like the pattern followed in the previous articles in this series, after a brief discussion on the specific programming language, in this case JavaScript, we moved on to the use of the regular expression syntax in that language. This should be enough for practitioners of JavaScript who are willing to get their hands dirty by practising more regular expressions. In the next part of this series on regular expressions, we will discuss the very powerful programming language Java, a distant cousin of JavaScript.

By: Deepu Benson
The author is a free software enthusiast and his area of interest is theoretical computer science. He maintains a technical blog at www.computingforbeginners.blogspot.in. He can be reached at deepumb@hotmail.com.
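The spelling patterns above can be exercised with the standard RegExp test( ) method (not otherwise used in this article), which returns true when the pattern matches a string; a minimal sketch:

```javascript
// Each pattern accepts both the British and the American spelling.
var colour   = /colo(?:u)?r/;     // optional 'u'
var program  = /program(?:me)?/;  // optional 'me'
var centre   = /cent(?:re|er)/;   // alternation with |
var pretence = /preten[cs]e/;     // character class [cs]

console.log(colour.test("color"), colour.test("colour"));          // true true
console.log(program.test("program"), program.test("programme"));   // true true
console.log(centre.test("center"), centre.test("centre"));         // true true
console.log(pretence.test("pretence"), pretence.test("pretense")); // true true
```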
Machine learning is a set of methods by which computers make decisions autonomously. Using certain techniques, computers make decisions by considering or detecting patterns in past records and then predicting future occurrences. Different types of predictions are possible, such as about weather conditions and house prices. Apart from predictions, machines have learnt how to recognise faces in photographs, and even filter out email spam. Google, Yahoo, etc, use machine learning to detect spam emails. Machine learning is widely implemented across all types of industries. If programming is used to achieve automation, then we can say that machine learning is used to automate the process of automation.
In traditional programming, we use data and programs on computers to produce the output, whereas in machine learning, data and output are run on the computer to produce a program. We can compare machine learning with farming or gardening, where the seeds are the algorithms, the nutrients are the data, and the gardener and the plants are the programs.
We can say machine learning enables computers to learn to perform tasks even though they have not been explicitly programmed to do so. Machine learning systems crawl through the data to find patterns and, when found, adjust the program's actions accordingly. With the help of pattern recognition and computational learning theory, one can study and develop algorithms (which can be built by learning from the sets of available data), on the basis of which the computer takes decisions. These algorithms are driven by building a model from sample records. These models are used in developing decision trees, through which the system takes all the decisions. Machine learning programs are also structured in such a way that, when exposed to new data, they learn and improve over time.

Implementing machine learning
Before we understand how machine learning is implemented in real life, let's look at how machines are taught. The process of teaching machines is divided into three steps.
1. Data input: Text files, spreadsheets or SQL databases are fed as input to machines. This is called the training data for a machine.
2. Data abstraction: Data is structured using algorithms to represent it in simpler and more logical formats. Elementary learning is performed in this phase.
3. Generalisation: An abstract of the data is used as input to develop insights. Practical application happens at this stage.
The success of the machine depends on two things:
How well the generalisation of the abstracted data happens.
The accuracy of the machine when translating its learning into practical usage for predicting a future set of actions.
In this process, every stage helps to construct a better version of the machine.
Now let's look at how we utilise the machine in real life. Before letting a machine perform any unsupervised task, the five steps listed below need to be followed.
Figure 1: Traditional programming vs machine learning (data + program producing output, vs data + output producing a program)
Figure 2: The process of teaching machines (practical application happens at the generalisation stage, which is used to generalise the real-time data to derive new insights)
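The contrast drawn in Figure 1 (data plus output producing a "program") can be shown with a toy sketch; the sample records, labels and threshold rule below are invented purely for illustration and are not from the article:

```javascript
// Toy illustration: "learning" a rule from data and outputs
// instead of hand-coding it. Sample records: house sizes
// (sq. m) labelled 'small' or 'big' (invented training data).
var sizes  = [30, 40, 50, 110, 120, 130];
var labels = ["small", "small", "small", "big", "big", "big"];

// "Training": pick a threshold midway between the two classes.
var maxSmall = Math.max.apply(null,
  sizes.filter(function (s, i) { return labels[i] === "small"; }));
var minBig = Math.min.apply(null,
  sizes.filter(function (s, i) { return labels[i] === "big"; }));
var threshold = (maxSmall + minBig) / 2;  // 80

// The learnt "program" can now label unseen data.
function predict(size) { return size < threshold ? "small" : "big"; }
console.log(predict(70));   // small
console.log(predict(95));   // big
```

The data and output determine the rule; the programmer never wrote the number 80 anywhere, which is the essence of the contrast in Figure 1.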
[Figure: the machine learning pipeline. Training documents, images and sounds are converted into features (vectors) and fed to a machine learning algorithm to build a model; a new document, image or sound is then vectorised and passed to the model to obtain a likelihood, a cluster ID or a better representation.]
Ever wondered why different commercial software applications such as eBay, Amazon or various social platforms like Facebook, Twitter, etc, were initially developed as Web applications? The obvious answer is that users can easily access or use different Web applications whenever they feel like, with only the Internet. This is what helps different online retail applications lure their customers to their products. There is no need to install Web applications specifically on a given system, and the user need not even worry about the platform dependency associated with that application. Apart from these, there are many other factors that make Web applications very user friendly, which we will discuss as we go along.
A Web application is any client-server software application that makes use of a website as its interface or front-end. The user interface for any Web application runs in an Internet browser. The main function of any browser is to display the information received from a server and also to send the user's data back to the server. Let's consider the instance of Microsoft Word and Google Docs. The former is a common desktop-based word-processing application which uses the MS Word software installed on the desktop. Google Docs is also a word processing application, but all its users perform the word processing functions using the Web browser on which it runs, instead of using software installed on their computers.
Different Web applications use Web documents, which are written in a standard format such as JavaScript and HTML. All these formats are supported by a number of Web browsers. Web applications can actually be considered as variants of the client-server software model, where the client software is downloaded to the client system when the relevant Web page is visited, using standard procedures such as HTTP. Client Web software updates may take place whenever we visit the Web page. During any session, the Web browser interprets and then displays the pages, and hence acts as the universal client for any Web application.
KompoZer
This is an open source HTML editor, which is based on the Nvu editor. It's maintained as a community-driven software development project, and is hosted on Sourceforge. KompoZer's WYSIWYG (what you see is what you get) editing capabilities are among its main attractions. The latest of its pre-release versions is KompoZer 0.8 beta 3. Its stable version was KompoZer 0.7.10, released in August 2007. It complies with the Web standards of W3C. By default, the Web pages are created in accordance with HTML 4.01 Strict. It uses Cascading Style Sheets (CSS) for styling purposes, but the user can even change the settings and choose between the following styling options:
HTML 4.01 and XHTML 1.0
Strict and transitional DTD
CSS styling and the old <font> based styling.
KompoZer can actually call on the W3C HTML validator, which uploads different Web pages to the W3C Markup Validation Service and then checks for compliance.

Features
1. Available free of cost.
2. Easy to use, hence even non-techies can work with it.
3. Combines Web file management and easy-to-use WYSIWYG Web page editing.
4. Allows direct code editing.
5. Supports a split code/graphic view.

phpMyAdmin
phpMyAdmin is an open source software tool written in PHP. It handles the administration of MySQL over the Web.

Features
1. Has an intuitive Web interface.
2. Imports data from SQL and CSV.
3. Can administer multiple servers.
4. Has the ability to search globally in a database or a subset of it.
5. Can create complex queries using QBE (Query-by-Example).
6. Can create graphics of our database layout in various formats.
7. Can export data to various formats like SQL, CSV, PDF, etc.
8. Supports most of the MySQL features such as tables, views, indexes, etc.

XAMPP
In XAMPP, the X denotes 'cross-platform', A stands for the Apache HTTP server, M for MySQL, and the two Ps for PHP and Perl. This platform is very popular, and is widely preferred for open source Web application development. Developing a Web application with XAMPP makes it easy to stack together a number of programs to constitute an application as desired. The best part of Web applications developed using XAMPP is that these are open source with no licensing required, and are free to use. They can be customised according to one's requirements. Although XAMPP can be installed on all the different platforms, its installation file is specific to a platform.

Features
1. Can be installed on all operating systems.
Platform as a Service or PaaS is a cloud computing service model that reduces the complexity of building and maintaining the computing infrastructure. It gives an easy and accessible environment to create, run and deploy applications, saving developers all the chaotic work such as setting up, configuring and managing resources like servers and databases. It speeds up app development, allowing users to focus on the application itself rather than worry about the infrastructure and runtime environment.
Initially, PaaS was available only on the public cloud. Later, private and hybrid PaaS options were created. Hybrid PaaS is typically a deployment consisting of a mix of public and private deployments. PaaS services available in the cloud can be integrated with resources available on the premises.
PaaS offerings can also include facilities for application design, application development, testing and deployment. PaaS services may include team collaboration, Web service integration, marshalling, database integration, security, scalability, storage, persistence, state management, application versioning and developer community facilitation, as well as mechanisms for service management, such as monitoring, workflow management, discovery and reservation.
There are some disadvantages of using PaaS. Every user may not have access to the full range of tools, or to high-end tools like a relational database. Another problem is that PaaS is open only for certain platforms. Users need to depend on the cloud service providers to update the tools and to stay in sync with other changes in the platform. They don't have control over this aspect.

OpenShift
OpenShift is an example of a PaaS and is offered by Red Hat. It provides an API to manage its services. OpenShift Origin allows you to create and manage containers. OpenShift helps you to develop, deploy and manage applications which are container-based, and enables faster development and release life cycles. Containers are standalone processes, with their own environment, and are not dependent on the operating system or the underlying infrastructure on which they run.
Figure 5: OpenShift Online – sign in to GitHub
Click on Login and log in by using any social media account.
Sign in to GitHub.
Click on Authorize redhat-developer.
Provide your account details. Then verify the email address using your email account.
Next, select a starter plan, followed by the region you want. Then confirm the subscription.
Now your account will be provisioned.
On the OpenShift online dashboard, click on Create

By: Bhagyashri Jain and Mitesh S.
Bhagyashri Jain is a systems engineer and loves Android development. She likes to read and share daily news on her blog at http://bjlittlethings.wordpress.com.
Mitesh S. is the author of the book, 'Implementing DevOps with Microsoft Azure'. He occasionally contributes to https://clean-clouds.com and https://etutorialsworld.com. Book link: https://www.amazon.com/DevOps-Microsoft-Visual-Studio-Services-ebook/dp/B01MSQWO4w.
Cloud Foundry is an open source Platform as a Service (PaaS) offering, governed by the Cloud Foundry Foundation. You can deploy it on AWS, Azure, GCP, vSphere or your own computing infrastructure.

The different landscapes for applications
Let's take a step back and quickly check out all the landscapes for applications. If you want an application to cater to one of your needs, there are several ways of getting it to do so.
1. Traditional IT: In a traditional landscape, you procure your infrastructure, manage all the servers, handle the data and build applications on top of it. This gives you the most control, but also adds operational complexity and cost.
2. Infrastructure as a Service (IaaS): In this case, you buy or lease the infrastructure from a service provider, install your own operating system, programming runtimes, databases, etc, and build your custom applications on top of it. Examples include AWS, Azure, etc.
3. Platform as a Service (PaaS): With this, you get a complete platform from the service provider, with the hardware, operating system and runtimes managed by the service provider; you can build applications on top of it. Examples include Cloud Foundry, OpenShift, etc.
4. Software as a Service (SaaS): Here, the service provider already has a pre-built application running on the cloud; if it suits your needs, you just get a subscription and use it. There might be a pre-existing application to meet your needs, but if there isn't, this offering provides very little room for customisation. Examples include Gmail, Salesforce, etc.

Why opt for a PaaS offering like Cloud Foundry?
Choosing a PaaS offering has multiple benefits. It abstracts away the hardware and infrastructure details so your workforce can concentrate more on application development, and you require very few operations to be managed by the IT team. This leads to faster turnaround times for your applications and better cost optimisation. It also helps in rapid prototyping as you have the platform taken care of; so you can build prototypes around your business problems more rapidly.

Cloud Foundry: A brief description
Cloud Foundry is multi-cloud, open source software that can be hosted on AWS, Azure, GCP or your own stack. Since Cloud Foundry is open source, you get application portability out-of-the-box, i.e., you are not locked in to a vendor. You can build your apps on it and move them across any of the Cloud Foundry providers.
The Cloud Foundry project is managed by the Cloud Foundry Foundation, whose mission is to establish and sustain the development of the platform and to provide continuous innovation and value to the users and operators of Cloud Foundry. The members of the Cloud Foundry Foundation include Cisco, Dell, EMC, GE, Google, IBM, Microsoft, Pivotal, SAP, SUSE and VMware.
From a developer's point of view, Cloud Foundry has support for buildpacks and services. Buildpacks provide the framework and runtime support for apps. Typically, they examine your apps to determine what dependencies
[Figure: Cloud Foundry architecture, showing routing (Router), app storage and execution (blob store, application execution) and the framework layer]
DokuWiki is PHP powered, modest but versatile wiki software that handles all the data in plain text format, so no database is required. It has a clear and understandable structure for your data, which allows you to manage your wiki without hassles. DokuWiki is really flexible and offers various customisation options at different levels. Since DokuWiki is open source, it has a huge collection of plugins and templates which extend its functionality. It is also well documented and supported by a vast community.
Although DokuWiki has numerous features, this article focuses only on the basics, in order to get readers started.

Pages
Pages can be easily created in DokuWiki by simply launching it in your browser. The first time, you will be shown a page like the one in Figure 1. Find the pencil icon on the right side; clicking on it will open up an editor, and that's it. You can start writing content on that page. The various sections of the page can be identified by the headings provided on it. The sections are listed out as a 'Table of Contents' on the top right side of the page.

Namespaces
Typically, a wiki may contain lots of pages. It is important to organise the pages so that the information the user seeks can be found easily. Namespaces serve this purpose by keeping all the relevant pages in one place. The following namespaces are bundled along with the standard DokuWiki installation:
wiki
playground
It is recommended that if you want to try anything before getting into a live production environment, you use the 'playground' namespace.
To create a namespace, use the following syntax:

namespace_name:page_name

If the defined namespace doesn't exist, DokuWiki automatically creates it without any break in linkage with the rest of the wiki. To delete a namespace, simply erase all of its pages; this leads to an empty namespace, which DokuWiki automatically deletes.

Links
The linkage between the pages is vital for any wiki site. This 'linkability' keeps the information organised and easily accessible. By the effective use of links, the pages are organised in a concise manner. DokuWiki supports the following types of links in a page.
External links: These links deal with external resources, i.e., websites. You can use a complete URL for a website, such as www.duckduckgo.com, or you can add alternative text for that website, like [[https://www.duckduckgo.com | search with DuckDuckGo]]. Also, we can link an email ID by enclosing it in angled brackets, for example <admin@localhost.com>.
User 1:
Name: Stella Ritchie, a non-registered user
User group: @ALL
For this user, the first and the third rules in the rule table match, but the third rule applies because it matches at the namespace level; her access to washington_team is None.
Deep learning is a new area of machine learning research, which has been introduced with the objective of moving machine learning closer to one of its original goals: artificial intelligence (AI). Deep learning is the sub-field of machine learning concerned with algorithms whose structure and function are inspired by the neural networks of the human brain. It is the work of well-known researchers like Andrew Ng, Geoff Hinton, Yann LeCun, Yoshua Bengio and Andrej Karpathy which has brought deep learning into the spotlight. If you follow the latest tech news, you may have even heard about how important deep learning has become among big companies, for example:
Google buying DeepMind for US$ 400 million
Apple and its self-driving car
NVIDIA and its GPUs
Toyota's billion dollar AI research investments
All of this tells us that deep learning is really gaining in importance.

Neural networks
The first thing you need to know is that deep learning is about neural networks. The structure of a neural network is like any other kind of network; there is an interconnected web of nodes, which are called neurons, and there are edges that join them together. A neural network's main function is to receive a set of inputs, perform progressively complex calculations, and then use the output to solve a problem. This series of events, starting from the input, where each activation is sent to the next layer and then the next, all the way to the output, is known as forward propagation, or forward prop.
The first neural nets were born out of the need to address the inaccuracy of an early classifier, the perceptron. It was shown that by using a layered web of perceptrons, the accuracy of predictions could be improved. This new breed of neural nets was called a multi-layer perceptron, or MLP.
You may have guessed that the prediction accuracy of a neural net depends on its weights and biases. We want the accuracy to be high, i.e., we want the neural net to predict a value that is as close to the actual output as possible, every single time. The process of improving a neural net's accuracy is called training, just like with other machine learning methods. Here's that forward prop again: to train the net, the output from forward prop is compared to the output that is known to be correct, and the cost is the difference of the two. The point of training is to make that cost as small as possible, across millions of training examples. Once trained well, a neural net has the potential to make accurate predictions each time. This is a neural net in a nutshell (refer to Figure 1).

Three reasons to consider deep learning
When the patterns get really complex, neural nets start to outperform all of their competition. Neural nets truly have the potential to revolutionise the field of artificial intelligence. We all know that computers are very good with repetitive calculations and detailed instructions, but they've historically been bad at recognising patterns. Thanks to deep learning, this is all about to change. If you only need to analyse simple patterns, a basic classification tool like an SVM or logistic regression is typically good enough. But when your data has tens of different inputs or more, neural nets start to win out over the other methods.
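Forward propagation, as described in the 'Neural networks' section, can be sketched in a few lines of JavaScript; the network shape, weights and biases below are arbitrary illustrative values, not taken from the article:

```javascript
// Minimal forward propagation sketch: 2 inputs, one hidden
// layer of 2 neurons, 1 output neuron.
function sigmoid(x) { return 1 / (1 + Math.exp(-x)); }

// Each neuron computes a weighted sum of its inputs plus a
// bias, then passes it through an activation function.
function neuron(inputs, weights, bias) {
  var sum = bias;
  for (var i = 0; i < inputs.length; i++) sum += inputs[i] * weights[i];
  return sigmoid(sum);
}

function forwardProp(inputs) {
  var h1 = neuron(inputs, [0.5, -0.6], 0.1);   // hidden layer
  var h2 = neuron(inputs, [-0.3, 0.8], 0.0);
  return neuron([h1, h2], [1.2, -0.7], 0.2);   // output layer
}

console.log(forwardProp([1.0, 0.5]));  // some value between 0 and 1
```

Training would then adjust the weights and biases so that this output moves closer to the known correct answer, which is exactly the cost-reduction process described above.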
Figure 1: Deep learning and neural networks (a simple neural network vs a deep learning neural network: input layer, hidden layer, output layer)

Still, as the patterns get even more complex, neural networks with a small number of layers can become unusable. The reason is that the number of nodes necessary in each layer grows exponentially with the number of possible patterns in the data. Eventually, training becomes way too expensive and the accuracy starts to suffer. So for an intricate pattern, like an image of a human face, for example, basic classification engines and shallow neural nets simply aren't good enough. The only practical choice is a deep net.
But what enables a deep net to distinguish these complex patterns? The key is that deep nets are able to break the multifaceted patterns down into a series of simpler patterns. For example, let's say that a net has to decide whether or not an image contains a human face. A deep net would first use edges to detect different parts of the face, the lips, nose, eyes, ears, and so on, and would then combine the results together to form the whole face. This important feature, using simpler patterns as building blocks to detect complex patterns, is what gives deep nets their strength. These nets have now become very accurate and, in fact, a deep net from Google recently beat a human at a pattern recognition challenge.

What is a deep net platform?
A platform is a set of tools that other people can build on top of. For example, think of the applications that can be built off the tools provided by iOS, Android, Windows, MacOS, IBM Websphere and even Oracle BEA. Deep learning platforms come in two different forms: software platforms and full platforms. A deep learning platform provides a set of tools and an interface for building custom deep nets. Typically, it provides a user with a selection of deep nets to choose from, along with the ability to integrate data from different sources, manipulate data, and manage models through a UI. Some platforms also help with performance if a net needs to be trained with a large data set.
There are some advantages and disadvantages of using a platform rather than using a software library. A platform is an out-of-the-box application that lets you configure a deep net's hyper-parameters through an intuitive UI. With a platform, you don't need to know anything about coding in order to use the tools. The downside is that you are constrained by the platform's selection of deep nets as well as the configuration options. But for anyone looking to quickly deploy a deep net, a platform is the best way to go. We'll also look at two machine learning software platforms called H2O and GraphLab Create, both of which offer deep learning tools.
H2O: This started out as an open source machine learning platform, with deep nets being a recent addition. Besides a set of machine learning algorithms, the platform offers several useful features, such as data pre-processing. H2O has built-in integration tools for platforms like HDFS, Amazon S3, SQL and NoSQL. It also provides a familiar programming environment like R, Python, JSON and several others to access the tools, as well as to model or analyse data with Tableau, Microsoft Excel and R Studio. H2O also provides a set of downloadable software packages, which you'll need to deploy and manage on your own hardware infrastructure. H2O offers a lot of interesting features, but the website can be a bit confusing to navigate.
GraphLab: The deep learning project requires graph analytics and other vital algorithms, and hence Dato's GraphLab Create can be a good choice. GraphLab is a software platform that offers two different types of deep nets depending on the nature of your input data: one is a convolutional net and the other is a multi-layer perceptron. The convolutional net is the default one. It also provides graph analytics tools, which is unique among deep net platforms. Just like the H2O platform, GraphLab provides a great set of data munging features. It provides built-in integration for SQL databases, Hadoop, Spark, Amazon S3, Pandas data frames, and many others. GraphLab also offers an intuitive UI for model management. A deep net platform can be selected based on your project.

Deep learning is gaining popularity
Deep learning is a topic that is making big waves at the moment. It is basically a branch of machine learning that uses algorithms to, among other things, recognise objects and understand human speech. Scientists have used deep learning algorithms with multiple processing layers to make better models from large quantities of unlabelled data (such as photos with no descriptions, voice recordings or videos on YouTube).
The three main reasons why deep learning is gaining popularity are accuracy, efficiency and flexibility. Deep learning automatically extracts the features by which to classify data, as opposed to most traditional machine learning algorithms, which require intense time and effort on the part of data scientists. The features that it manages to extract are more complex, because of the feature hierarchy possible in a deep net. They are also more flexible and less brittle, because the net is able to continue to learn on unsupervised data.

By: Neetesh Mehrotra
The author works at TCS as a systems engineer. His areas of interest are Java development and automation testing. You can contact him at mehrotra.neetesh@gmail.com.
What Can BIG DATA Do For You?
In today's world there is a proliferation of data, so much so that whoever controls data today holds the key to wealth creation. Let's take a long look at what Big Data means and what it can do for us.
Big Data has undoubtedly gained much attention within academia and the IT industry. In the current digital and computing world, information is generated and collected at an alarming rate that is rapidly exceeding storage capabilities. About 4 billion people across the globe are connected to the Internet, and over 5 billion individuals own mobile phones, of which more than 3.39 billion use the mobile Internet. Social networking platforms like WhatsApp, Facebook, Instagram, Twitter, etc, have a big hand in the indiscriminate increase in the production of data. Apart from the social media giants, a large amount of data is generated by devices such as sensors and actuators, which are used as part of the IoT and in robots as well.
By 2020, it is expected that more than 50 billion devices will be connected to the Internet. At that point, predicted data production will be almost 44 times greater than it was in 2010. As a result of these tech advances, millions of people are generating tremendous amounts of data through the increased use of smart devices. Remote sensors, in particular, continuously produce an even greater volume of heterogeneous data that can be either structured or unstructured. All such data is referred to as Big Data.
We all know that this high volume of data is shared and transferred at great speed over optical fibre. However, the fast growth of such huge data volumes creates challenges in the following areas:
• Searching, sharing and transferring data
• Analysing and capturing data
• Data curation
• Storing, updating and querying data
• Information privacy
Big Data is broadly identified by three aspects:
[Figure: sources of Big Data – social media (Facebook, Twitter, blogs), business systems and transactions, sensor data and other unstructured data]
Tips you can use daily on a Linux computer

1. Shortcut for opening a terminal in Ubuntu
To open a terminal in Ubuntu, press the Ctrl+Alt+T keys. This creates a new terminal.

2. Running the previous command with 'sudo' in the terminal
In case you have forgotten to run a command with 'sudo', you need not re-type the whole command. Just type 'sudo !!' and the last command will run with sudo.

3. How to change a file's permissions
An admin can change a file's permissions by executing the following on the terminal:

chmod u+<permission> filename

…where <permission> can be r (read), w (write) or x (execute). The admin can change the permissions granted to the group or to other users by executing the above command with 'u' replaced by 'g' for group access or by 'o' for others.

Moving between the current and last working directories easily
Everyone knows that typing 'cd' in the terminal in Ubuntu takes the user to the home directory. However, if you want to go to the last working directory, instead of entering the following:

$ cd <directory path>

…directly type the command shown below in the terminal:

$ cd -

The command takes you to the last working directory. This helps in moving directly to the last working directory from the current one, instead of remembering and typing the whole path.
—Abhinay Badam, ohyesabhi2393@gmail.com

Execute parallel ssh on multiple hosts
Here are the steps to do a parallel ssh on multiple hosts. We are going to use pssh, which is a program for executing ssh in parallel on a number of hosts. It provides features such as sending input to all the processes, passing a password to ssh, saving the output to files, and timing out. You can access the complete manual of pssh at https://linux.die.net/man/1/pssh.
First off, let us look at how to install it on a CentOS 7 system:

# yum install epel-release

Now install pssh, as follows:

# yum install pssh

Create a pssh_hosts.txt file and enter the hosts you need to target:

# cat pssh_hosts.txt
# write hosts one per line, like:
#user@target_ip
root@192.168.100.100

We should create a key-pair between the master host and the targets -- this is the only way to get things done. Simply log in to the target from the master node for host key verification:
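As a quick local sanity check (not part of the original tip), the hosts file from the step above can be created non-interactively and its real entries counted before pointing pssh at it. The file name and example IP are the ones used above:

```shell
# Recreate the hosts file in the same format as above:
# one [user@]host per line, with '#' lines serving as comments.
cat > pssh_hosts.txt <<'EOF'
# write hosts one per line, like:
#user@target_ip
root@192.168.100.100
EOF

# Count the non-comment host entries before running pssh against them.
grep -cv '^#' pssh_hosts.txt   # prints 1 (one real host entry)
```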
# cat pssh/tst.sh
#!/bin/bash
touch /root/CX && echo "File created"

# pssh -h ./pssh_hosts.txt -A -O PreferredAuthentications=password -I< ./tst.sh

Now let us make it simple:

# pssh -H '192.168.100.101' -l 'root' -A -O PreferredAuthentications=password -I< ./tst.sh

The output is:

[root@master pssh]# pssh -H '192.168.100.101' -l 'root' -A -O PreferredAuthentications=password -I< ./tst.sh
Warning: Do not enter your password if anyone else has superuser privileges or access to your account.
Password:
[1] 16:24:30 [SUCCESS] 192.168.100.101

To execute commands without password prompting, we need to create a key-pair between the servers. Let us look at how to do that. We will attempt to log in to serverB from serverA.
Create ssh-keygen keys on serverA, as follows:

# ssh-keygen -t rsa

Copy the id_rsa.pub file from the master, serverA, to serverB:

# ssh-copy-id root@<serverB-ip>

Now you'll know what the command does, and won't have to open your browser and search.
—Siddharth Dushantha, siddharth.dushantha@gmail.com

Know how many times a user has logged in
One way to find out the number of times users have logged into a multi-user Linux system is to execute the following command:

$ last | grep pts | awk '{print $1}' | sort | uniq -c

The above command provides the list of users who recently logged into the system. The grep utility removes the unnecessary information, and its result is sent to awk using a shell pipe. awk, which is used for processing text-based data, extracts only the user names from the text. This list of names is then sorted by piping it to the sort command. The sorted list is then piped to the uniq command, which merges adjacent matching lines into the first occurrence. The -c option of uniq, which displays the number of times a line is repeated, gives you the number of logins of each user along with the user's name.
—Sathyanarayanan S., ssathyanarayanan@sssihl.edu.in

Share Your Open Source Recipes!
The joy of using open source software is in finding ways to get around problems – take them head on, defeat them! We invite you to share your tips and tricks with us for publication in OSFY so that they can reach a wider audience. Your tips could be related to administration, programming, troubleshooting or general tweaking. Submit them at www.opensourceforu.com. The sender of each published tip will get a T-shirt.
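If you want to experiment with the login-counting pipeline from the tip above on a machine without a populated login history, you can feed it canned `last`-style output. The sample records below are invented for illustration; every stage after the echo is exactly the pipeline from the tip:

```shell
# Fake 'last' output: two pts logins for alice, one for bob, plus a tty
# session and the wtmp footer that the 'grep pts' stage filters out.
sample='alice pts/0 192.168.1.5 Mon Dec 4 10:00 still logged in
bob pts/1 192.168.1.6 Mon Dec 4 09:30 - 09:45 (00:15)
alice pts/2 192.168.1.5 Sun Dec 3 22:10 - 23:00 (00:50)
root tty1 Sun Dec 3 20:00 - 20:05 (00:05)
wtmp begins Fri Dec 1 00:00:00 2017'

# Keep pts sessions, take the user name column, sort so duplicates are
# adjacent, then count the repeats per user.
echo "$sample" | grep pts | awk '{print $1}' | sort | uniq -c
# prints each user with a login count: 2 for alice, 1 for bob
```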
Solus 3 GNOME
Solus is an operating system that is designed for
home computing. It ships with a variety of software
out-of-the-box, so you can set it up without too
much fuss. It comes with the latest version of the
free LibreOffice suite, which allows you to work on
your documents, spreadsheets and presentations right
away. It has many useful tools for content creators.
Whether animating in Synfig Studio, producing
music with MuseScore or Mixxx, trying out graphics
design with GIMP or Inkscape, or editing video with
Avidemux, Kdenlive and Shotcut, Solus provides
software to help express your creativity.