Amazon Web Services (AWS) describes both a technology and a company. The company AWS is
a subsidiary of Amazon.com and provides on-demand cloud computing platforms to individuals,
companies, and governments on a paid subscription basis, with a free-tier option available for 12
months. The technology allows subscribers to have at their disposal a full-fledged virtual cluster of
computers, available 24/7/365, through the internet. AWS's virtual computers have most of
the attributes of a real computer, including hardware (CPUs and GPUs for processing, local/RAM
memory, hard-disk/SSD storage); a choice of operating systems; networking; and pre-loaded
application software such as web servers, databases, and CRM systems. Each AWS system
also virtualizes its console I/O (keyboard, display, and mouse), allowing AWS subscribers to connect
to their AWS system using a modern browser. The browser acts as a window into the virtual
computer, letting subscribers log in to, configure, and use their virtual systems just as they would a real
physical computer. They can choose to deploy their AWS systems to provide internet-based
services for their own and their customers' benefit.
The AWS technology is implemented at server farms throughout the world, and maintained by the
Amazon subsidiary. Fees are based on a combination of usage, the
hardware/OS/software/networking features chosen by the subscriber,
required availability, redundancy, security, and service options. Based on what the subscriber needs
and pays for, they can reserve a single virtual AWS computer, a cluster of virtual computers, a
physical (real) computer dedicated for their exclusive use, or even a cluster of dedicated physical
computers. As part of the subscription agreement,[3] Amazon manages, upgrades, and provides
industry-standard security to each subscriber's system. AWS services operate from many global
geographical regions including 6 in North America.[4]
In 2016, AWS comprised more than 70 services spanning a wide range
including compute, storage, networking, database, analytics, application services, deployment, management, mobile, developer tools, and tools for
the Internet of Things. The most popular include Amazon Elastic Compute Cloud (EC2) and Amazon
Simple Storage Service (S3). Most services are not exposed directly to end users, but instead offer
functionality through APIs for developers to use in their applications. Amazon Web Services
offerings are accessed over HTTP, using the REST architectural style and SOAP protocol.
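Most AWS services are consumed through exactly this REST pattern: a stored object is a resource identified by a URL, and standard HTTP verbs act on it. The following sketch is illustrative only; the bucket and key names are invented, and real requests additionally require signed authentication headers.

```python
# Sketch of REST-style addressing: each stored object is a resource
# identified by a URL, and standard HTTP verbs act on it.
# The bucket/key layout mirrors S3's virtual-hosted style; the names
# below are invented for illustration, not a real account.

def object_url(bucket: str, key: str) -> str:
    """Build the virtual-hosted-style URL for an object."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"

# GET fetches the object, PUT uploads it, DELETE removes it --
# the same URL, different HTTP verbs.
url = object_url("example-bucket", "reports/2016/q1.csv")
```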
Amazon markets AWS to subscribers as a way of obtaining large scale computing capacity more
quickly and cheaply than building an actual physical server farm.[5] All services are billed based on
usage, but each service measures usage in varying ways.
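As an illustration of usage-based billing, the sketch below totals a hypothetical month of usage across three metered dimensions; the rates are invented for the example and do not reflect actual AWS pricing.

```python
# Usage-based billing sketch. Rates here are invented for illustration;
# real AWS prices vary by service, region, and tier.
HYPOTHETICAL_RATES = {
    "ec2_instance_hours": 0.10,   # $ per instance-hour
    "s3_gb_months": 0.023,        # $ per GB-month stored
    "data_transfer_gb": 0.09,     # $ per GB transferred out
}

def monthly_bill(usage: dict) -> float:
    """Sum each metered dimension times its rate."""
    return round(sum(HYPOTHETICAL_RATES[k] * v for k, v in usage.items()), 2)

bill = monthly_bill({"ec2_instance_hours": 720,
                     "s3_gb_months": 100,
                     "data_transfer_gb": 50})
```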
Map showing Amazon Web Services' availability zones within geographic regions around the world.
To power its data centers with renewable energy, AWS has announced partnerships with:
Community Energy of Virginia, a solar farm coming online in 2016, to support the US East
region.[12]
Pattern Development, in January 2015, to construct and operate Amazon Wind Farm Fowler
Ridge.
Iberdrola Renewables, LLC, in July 2015, to construct and operate Amazon Wind Farm US East.
EDP Renewables, in November 2015, to construct and operate Amazon Wind Farm US
Central.[13]
Tesla Motors, to apply battery storage technology to address power needs in the US West
(Northern California) region.[12]
Region and region names table[14][edit]
Region name       Region
EU (Frankfurt)    eu-central-1
EU (Ireland)      eu-west-1
EU (London)       eu-west-2
History[edit]
Further information: Timeline of Amazon Web Services
AWS Summit 2013 event in NYC.
The AWS platform was launched in July 2002 to "expose technology and product data from Amazon
and its affiliates, enabling developers to build innovative and entrepreneurial applications on their
own."[2] In the beginning, the platform consisted of only a few disparate tools and services. Then in
late 2003, the AWS concept was publicly reformulated when Chris Pinkham and Benjamin Black
presented a paper describing a vision for Amazon's retail computing infrastructure that was
completely standardized, completely automated, and would rely extensively on web services for
services such as storage and would draw on internal work already underway. Near the end of their
paper, they mentioned the possibility of selling access to virtual servers as a service, proposing the
company could generate revenue from the new infrastructure investment.[15] In November 2004, the
first AWS service launched for public usage: Simple Queue Service (SQS).[16] Thereafter Pinkham
and lead developer Christopher Brown developed the Amazon EC2 service, with a team in Cape
Town, South Africa.[17]
Amazon Web Services was officially re-launched on March 14, 2006,[2] combining the three initial
service offerings of Amazon S3 cloud storage, SQS, and EC2. The AWS platform finally provided an
integrated suite of core online services, as Chris Pinkham and Benjamin Black had proposed back in
2003,[15] as a service offered to other developers, web sites, client-side applications, and
companies.[1] Andy Jassy, AWS founder and vice president in 2006, said at the time that Amazon S3
(one of the first and most scalable elements of AWS) "helps free developers from worrying about
where they are going to store data, whether it will be safe and secure, if it will be available when they
need it, the costs associated with server maintenance, or whether they have enough storage
available. Amazon S3 enables developers to focus on innovating with data, rather than figuring out
how to store it."[2] His quote marks a milestone in the Internet's history, when massive managed
resources became available to developers worldwide, allowing them to offer new scalable web-
enabled technologies. In 2016 Jassy was promoted to CEO of the division.[18] Reflecting the success
of AWS, his annual compensation in 2017 hit nearly $36 million.[19]
To support industry-wide training and skills standardization, AWS began offering a certification
program for computer engineers, on April 30, 2013, to highlight expertise in cloud computing.[20]
James Hamilton, an AWS engineer, wrote a retrospective article in 2016 to highlight the ten-year
history of the online service from 2006 to 2016. As an early fan and outspoken proponent of the
technology, he had joined the AWS engineering team in 2008.[21]
Customer base[edit]
AWS adoption has increased since launch in 2002.
On March 14, 2006, Amazon said in a press release:[2] "More than 150,000 developers have
signed up to use Amazon Web Services since its inception."
In June 2007, Amazon claimed that more than 180,000 developers had signed up to use
Amazon Web Services.[31]
In November 2012, AWS hosted its first customer event in Las Vegas.[32]
On May 13, 2013, AWS was awarded an Agency Authority to Operate (ATO) from the U.S.
Department of Health and Human Services under the Federal Risk and Authorization
Management Program.[33]
In October 2013, it was revealed that AWS was awarded a $600M contract with the CIA.[34]
During August 2014, AWS received Department of Defense-Wide provisional authorization for
all U.S. Regions.[35]
During the 2015 re:Invent keynote, AWS disclosed that it had more than a million active
customers every month in 190 countries, including nearly 2,000 government agencies, 5,000
educational institutions, and more than 17,500 nonprofits.
On April 5, 2017, AWS and DXC Technology (formed from a merger of CSC and HPE)
announced an expanded alliance to increase access of AWS features for enterprise clients in
existing data centers.[36]
Notable customers include NASA,[37] the Obama presidential campaign of 2012,[38] Kempinski
Hotels,[39] and Netflix.[40]
On October 22, 2012, a major outage occurred, affecting many sites such
as Reddit, Foursquare, Pinterest, and others. The cause was a memory leak bug in an
operational data collection agent.[43]
On December 24, 2012, AWS suffered another outage causing websites such as Netflix to be
unavailable for customers in the Northeastern United States.[44] AWS cited their Elastic Load
Balancing (ELB) service as the cause.[45]
On February 28, 2017, AWS experienced a massive outage of S3 services in its Northern
Virginia data center. A majority of websites that relied on AWS S3 either hung or stalled, and
Amazon reported within five hours that AWS was fully online again.[46] No data was reported
lost due to the outage, which was caused by a human error made while debugging: more server
capacity was removed than intended, causing a domino effect of outages.[47]
List of products[edit]
Compute[edit]
Amazon Elastic Compute Cloud (EC2) is an IaaS service providing virtual servers controllable by
an API, based on the Xen hypervisor. Equivalent remote services include Microsoft
Azure, Google Compute Engine, and Rackspace; on-premises equivalents include OpenStack
and Eucalyptus.
Amazon Elastic Beanstalk provides a PaaS service for hosting applications; equivalent services
include Google App Engine, Heroku, and OpenShift (for on-premises use).
Amazon Lambda (AWS Lambda) runs code in response to AWS internal or external events such
as HTTP requests, transparently provisioning the required resources.[48] Lambda is tightly integrated
with AWS but similar services such as Google Cloud Functions and open solutions such
as OpenWhisk are becoming competitors.
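The event-driven model these function services share can be sketched as follows. The handler receives an event and returns a response while the platform, not the programmer, provisions the compute that runs it; the event shape below is invented for illustration.

```python
# Minimal sketch of an event-driven handler in the style Lambda
# popularized: an event (e.g. a parsed HTTP request) goes in, a
# response comes out. The event fields here are hypothetical.

def handler(event: dict, context=None) -> dict:
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}!"}

# The platform would invoke the handler once per incoming event.
response = handler({"name": "AWS"})
```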
Networking[edit]
Amazon Route 53 provides a scalable managed Domain Name System (DNS) service.
Amazon Virtual Private Cloud (VPC) creates a logically isolated set of AWS resources which can
be connected using a VPN connection. This competes against on-premises solutions such
as OpenStack or HPE Helion Eucalyptus used in conjunction with PaaS software.
AWS Direct Connect provides dedicated network connections into AWS data centers.
Amazon Elastic Load Balancing (ELB) automatically distributes incoming traffic across
multiple Amazon EC2 instances.
AWS Elastic Network Adapter (ENA) provides up to 20 Gbit/s of network bandwidth to
an Amazon EC2 instance.[49]
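The traffic-distribution idea behind ELB can be sketched with plain round-robin rotation. Real load balancers also health-check instances and support other algorithms; the instance IDs below are hypothetical.

```python
import itertools

# Sketch of load balancing: incoming requests are spread across a pool
# of instances in rotation. Instance IDs are invented for illustration.
instances = ["i-aaa", "i-bbb", "i-ccc"]
pool = itertools.cycle(instances)

def route(request_id: int) -> str:
    """Assign the next instance in rotation to this request."""
    return next(pool)

assignments = [route(r) for r in range(5)]
```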
Content delivery[edit]
Amazon CloudFront, a content delivery network (CDN) for distributing objects to so-called "edge
locations" near the requester.
Storage and content delivery[edit]
Amazon Simple Storage Service (S3) provides scalable object storage accessible from a Web
Service interface. Applicable use cases include backup/archiving, file (including media) storage
and hosting, static website hosting, application data hosting, and more.
Amazon Glacier provides long-term storage options (compared to S3) with high redundancy and
availability but infrequent, slow access; it is intended for archiving data.
AWS Storage Gateway, an iSCSI block storage virtual appliance with cloud-based backup.
Amazon Elastic Block Store (EBS) provides persistent block-level storage volumes for EC2.
AWS Import/Export accelerates moving large amounts of data into and out of AWS using
portable storage devices for transport.
Amazon Elastic File System (EFS) is a file storage service for Amazon Elastic Compute Cloud
(Amazon EC2) instances.
Database[edit]
Amazon DynamoDB provides a scalable, low-latency NoSQL online Database Service backed
by SSDs.
Amazon ElastiCache provides in-memory caching for web applications.[50] This is Amazon's
implementation of Memcached and Redis.[51]
Amazon Relational Database Service (RDS) provides scalable database servers
with MySQL, Oracle, SQL Server, and PostgreSQL support.[52]
Amazon Redshift provides petabyte-scale data warehousing with column-based storage and
multi-node compute.
Amazon SimpleDB allows developers to run queries on structured data. It operates in concert
with EC2 and S3.
AWS Data Pipeline provides a reliable service for data transfer between different AWS compute
and storage services (e.g., Amazon S3, Amazon RDS, Amazon DynamoDB, Amazon EMR). In
other words, it is a data-driven workload management system, providing a management API for
managing and monitoring data-driven workloads in cloud applications.[53]
Amazon Aurora provides a MySQL-compatible relational database engine created specifically
for the AWS infrastructure, which Amazon claims delivers faster speeds and lower costs,
realized chiefly in larger databases.
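The key-value access pattern DynamoDB exposes, items fetched by primary key rather than by SQL joins, can be sketched with an in-memory table; the item fields below are invented.

```python
# Sketch of key-value access in the DynamoDB style: items live in a
# table and are fetched by primary key. A plain dict stands in for the
# hosted service; item contents are hypothetical.
table = {}   # primary key -> item

def put_item(item: dict, key_attr: str = "id") -> None:
    table[item[key_attr]] = item

def get_item(key):
    return table.get(key)

put_item({"id": "user-1", "name": "Ada", "plan": "free"})
item = get_item("user-1")
```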
Deployment[edit]
AWS Identity and Access Management (IAM) is an implicit service, providing the
authentication infrastructure used to authenticate access to the various services.
AWS Directory Service is a managed service that allows connecting AWS resources to an
existing on-premises Microsoft Active Directory or setting up a new, stand-alone directory in the
AWS Cloud.
Amazon CloudWatch, provides monitoring for AWS cloud resources and applications, starting
with EC2.
AWS Management Console (AWS Console) is a web-based point-and-click interface to manage
and monitor the Amazon infrastructure suite, including (but not limited
to) EC2, EBS, S3, SQS, Amazon Elastic MapReduce, and Amazon CloudFront. A mobile
application for Android supports some of the management features of the console.
Amazon CloudHSM helps to meet corporate, contractual, and regulatory compliance
requirements for data security by using dedicated Hardware Security Module (HSM) appliances
within the AWS cloud.
AWS Key Management Service (KMS) is a managed service to create and control encryption keys.
Amazon EC2 Container Service (ECS) is a highly scalable and fast container management service
using Docker containers.
Application services[edit]
Amazon API Gateway is a service for publishing, maintaining and securing web service APIs.
Amazon CloudSearch provides basic full-text search and indexing of textual content.
Amazon DevPay, currently in limited beta version, is a billing and account management system
for applications that developers have built atop Amazon Web Services.
Amazon Elastic Transcoder (ETS) provides video transcoding of S3 hosted videos, marketed
primarily as a way to convert source files into mobile-ready versions.
Amazon Simple Email Service (SES) provides bulk and transactional email sending.
Amazon Simple Queue Service (SQS) provides a hosted message queue for web applications.
Amazon Simple Notification Service (SNS) provides hosted multi-protocol "push" messaging
for applications.
Amazon Simple Workflow (SWF) is a workflow service for building scalable, resilient
applications.
Amazon Cognito is a user identity and data synchronization service that securely manages and
synchronizes app data for users across their mobile devices.[55]
Amazon AppStream 2.0 is a low-latency service that streams resource-intensive
applications and games from the cloud, using NICE DCV technology.[56]
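The hosted-queue semantics SQS provides can be sketched as follows: producers enqueue messages, and even with several competing consumers each message is delivered to only one of them. Real SQS adds visibility timeouts, retries, and at-least-once delivery; the message names are invented.

```python
from collections import deque

# Sketch of hosted message-queue semantics in the SQS style:
# each message is consumed by exactly one reader.
queue = deque()

def send(msg: str) -> None:
    queue.append(msg)

def receive():
    return queue.popleft() if queue else None

send("order-1")
send("order-2")
consumer_a = receive()   # one consumer takes "order-1"
consumer_b = receive()   # another takes "order-2", not a second copy
```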
Analytics[edit]
Amazon Athena is a serverless query service launched in November 2016. It allows
querying of S3 content using standard SQL.[57]
Amazon Elastic MapReduce (EMR) provides a PaaS service delivering the Hadoop framework for
running MapReduce queries on the web-scale infrastructure
of EC2 and Amazon S3.
Amazon Machine Learning is a service that assists developers of all skill levels in using
machine-learning technology.
Amazon Kinesis is a cloud-based service for real-time data processing over large, distributed
data streams. It streams data in real time with the ability to process thousands of data streams
on a per-second basis. The service, designed for real-time apps, allows developers to pull any
amount of data, from any number of sources, scaling up or down as needed. It has some
similarities in functionality to Apache Kafka.[58]
Amazon Elasticsearch Service provides fully managed Elasticsearch and Kibana services.[59]
Amazon QuickSight is a business intelligence, analytics, and visualization tool launched in
November 2016.[60] It provides ad-hoc services by connecting to AWS or non-AWS data sources.
Miscellaneous[edit]
Amazon Marketplace Web Service (MWS) allows users to manage the complete shipment process,
from creating a listing to downloading the shipment label, using an API.
Amazon Fulfillment Web Service provides a programmatic web service for sellers to ship items
to and from Amazon using Fulfillment by Amazon. This service is no longer supported by
Amazon; all of its functionality has been transferred to Amazon Marketplace Web
Service.
Amazon Historical Pricing provides access to Amazon's historical sales data from its affiliates. (It
appears that this service has been discontinued.)
Amazon Mechanical Turk (Mturk) manages small units of work distributed among many persons.
Amazon Product Advertising API formerly known as Amazon Associates Web Service (A2S) and
Amazon E-Commerce Service (ECS), provides access to Amazon's product data and electronic
commerce functionality.
Amazon Gift Code On Demand (AGCOD) for Corporate Customers[61] enables companies to
distribute Amazon gift cards (gift codes) instantly in any denomination, integrating Amazon's gift-
card technology into customer loyalty, employee incentive and payment disbursement platforms.
AWS Partner Network (APN) provides technology partners and consulting partners with the
technical information and sales and marketing support to increase business opportunities
through AWS and with businesses using AWS. Launched in April 2012, the APN is made up of
Technology Partners including Independent Software Vendors (ISVs), tool providers, platform
providers, and others.[62][63][64] Consulting Partners include System Integrators (SIs), agencies,
consultancies, Managed Service Providers (MSPs), and others. Potential Technology and
Consulting Partners must meet technical and non-technical training requirements set by AWS.[65]
Amazon Lumberyard is a freeware triple-A game engine that is integrated with AWS.[66]
Amazon Chime is an enterprise collaboration service that organizations can use for voice, video
conferencing, and instant messaging.[67]
Pop-up lofts[edit]
In June 2014 AWS opened its first temporary Pop-up Loft, in San Francisco, to help businesses
discover its services.[68] In May 2015 it expanded to New York City,[69][70] and in September 2015
to Berlin.[71] AWS opened its fourth location, in Tel Aviv, from March 1 to March 22,
2016.[72] A Pop-up Loft was open in London from September 10 to October 29, 2015.[73]
Charitable work[edit]
In 2017 AWS launched a program in the United Kingdom to help young adults and military veterans
retrain in technology-related skills. In partnership with the Prince's Trust and the Ministry of Defence
(MoD), AWS will help to provide re-training opportunities for young people from disadvantaged
backgrounds and former soldiers who have left the military. AWS is working alongside a number of
partner companies including Cloudreach, Sage, EDF Energy and Tesco Bank.[74]
Microsoft Azure is a cloud computing service created by Microsoft for building, deploying,
and managing applications and services through a global network of Microsoft-managed data
centers. It provides software as a service, platform as a service and infrastructure as a service and
supports many different programming languages, tools and frameworks, including both Microsoft-
specific and third-party software and systems.
Azure was announced in October 2008 and released on February 1, 2010 as Windows Azure,
before being renamed to Microsoft Azure on March 25, 2014.[1][2]
Services[edit]
Microsoft lists over 600 Azure services,[3] of which some are covered below:
Compute[edit]
Virtual machines, infrastructure as a service (IaaS) allowing users to launch general-
purpose Microsoft Windows and Linux virtual machines, as well as preconfigured machine
images for popular software packages.[4]
App services, platform as a service (PaaS) environment letting developers easily publish and
manage Web sites.
Websites, high-density hosting of websites, allows developers to build sites
using ASP.NET, PHP, Node.js, or Python, which can be deployed using FTP, Git, Mercurial, Team
Foundation Server or uploaded through the user portal. This feature was announced in preview
form in June 2012 at the Meet Microsoft Azure event.[5] Customers can create websites in PHP,
ASP.NET, Node.js, or Python, or select from several open-source applications from a gallery to
deploy. This comprises one aspect of the platform as a service (PaaS) offerings for the Microsoft
Azure Platform. It was renamed Web Apps in April 2015.[1][6]
WebJobs, applications that can be deployed to a Web App to implement background
processing. They can be invoked on a schedule, on demand, or run continuously. The Blob,
Table, and Queue services can be used to communicate between Web Apps and WebJobs and
to provide state.[citation needed]
Mobile services[edit]
Mobile Engagement collects real-time analytics that highlight users' behavior. It also provides
push notifications to mobile devices.[7]
HockeyApp can be used to develop, distribute, and beta-test mobile apps.[8]
Storage services[edit]
Storage Services provides REST and SDK APIs for storing and accessing data on the cloud.
Table Service lets programs store structured text in partitioned collections of entities that are
accessed by partition key and primary key. It is a NoSQL non-relational database.
Blob Service allows programs to store unstructured text and binary data as blobs that can be
accessed by an HTTP(S) path. Blob Service also provides security mechanisms to control access
to data.
Queue Service lets programs communicate asynchronously by message using queues.
File Service allows storing and access of data on the cloud using the REST APIs or the SMB
protocol.[9]
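The (partition key, row key) addressing of the Table Service can be sketched with a nested map; the entity contents below are invented.

```python
# Sketch of Table-Service-style addressing: each entity lives in a
# partition and is located by (partition key, row key). A nested dict
# stands in for the hosted store; entity fields are hypothetical.
store = {}

def insert_entity(partition: str, row: str, entity: dict) -> None:
    store.setdefault(partition, {})[row] = entity

def get_entity(partition: str, row: str):
    return store.get(partition, {}).get(row)

insert_entity("orders-2016", "0001", {"item": "disk", "qty": 2})
entity = get_entity("orders-2016", "0001")
```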
Data management[edit]
Azure Search provides text search and a subset of OData's structured filters using REST or
SDK APIs.
DocumentDB is a NoSQL database service that implements a subset of the SQL SELECT
statement on JSON documents.
Redis Cache is a managed implementation of Redis.
StorSimple manages storage tasks between on-premises devices and cloud storage.[10]
SQL Database, formerly known as SQL Azure Database, works to create, scale and extend
applications into the cloud using Microsoft SQL Server technology. It also integrates with Active
Directory and Microsoft System Center and Hadoop.[11]
SQL Data Warehouse is a data warehousing service designed to handle compute- and data-
intensive queries on datasets exceeding 1 TB.
Messaging[edit]
The Microsoft Azure Service Bus allows applications running on Azure or on off-premises
devices to communicate with Azure. This helps to build scalable and reliable applications in
a service-oriented architecture (SOA). The Azure Service Bus supports four different types of
communication mechanisms:[citation needed]
Event Hubs, which provide event and telemetry ingress to the cloud at massive scale, with low
latency and high reliability. For example, an event hub can be used to track data from cell
phones, such as GPS location coordinates, in real time.[citation needed]
Queues, which allow one-directional communication. A sender application sends a
message to the Service Bus queue, and a receiver reads from that queue. Though a queue can
have multiple readers, only one of them processes each message.
Topics, which provide one-directional communication using a subscriber pattern. A topic is
similar to a queue, but each subscriber receives a copy of every message sent to the topic.
Optionally, a subscriber can filter out messages based on criteria it defines.
Relays, which provide bi-directional communication. Unlike queues and topics, a relay doesn't
store in-flight messages in its own memory. Instead, it just passes them on to the destination
application.
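The queue/topic distinction described above can be sketched in a few lines: every subscriber to a topic receives its own, optionally filtered, copy of each message. Subscription names and the filter criterion are invented.

```python
# Sketch of the topic/subscriber pattern: each subscriber gets a copy
# of every published message that passes its own filter.
subscribers = {}   # name -> (filter predicate, received messages)

def subscribe(name, predicate=lambda msg: True):
    subscribers[name] = (predicate, [])

def publish(msg: dict) -> None:
    for predicate, inbox in subscribers.values():
        if predicate(msg):          # each subscriber filters independently
            inbox.append(msg)       # ...and receives its own copy

subscribe("audit")                                   # receives everything
subscribe("alerts", lambda m: m["level"] == "error") # filtered subscription
publish({"level": "info", "text": "started"})
publish({"level": "error", "text": "disk full"})
audit_inbox = subscribers["audit"][1]
alerts_inbox = subscribers["alerts"][1]
```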
Media services[edit]
A PaaS offering that can be used for encoding, content protection, streaming, or analytics.[citation needed]
CDN[edit]
A global content delivery network (CDN) for audio, video, applications, images, and other static files.
Can be used to cache static assets of websites geographically closer to users to increase
performance. The network can be managed by a REST-based HTTP API.[citation needed]
Azure has 30 points of presence worldwide (also known as edge locations) as of December
2016.[12]
Developer[edit]
Application Insights[citation needed]
Visual Studio Team Services[citation needed]
Management[edit]
Azure Automation provides a way for users to automate the manual, long-running, error-prone,
and frequently repeated tasks that are commonly performed in cloud and enterprise
environments. It saves time and increases the reliability of regular administrative tasks, and can
schedule them to run automatically at regular intervals. Processes can be automated
using runbooks, and configuration management can be automated using Desired State Configuration.[1]
Microsoft SMA (software)
Machine Learning[edit]
The Microsoft Azure Machine Learning (Azure ML) service is part of the Cortana Intelligence Suite
and enables predictive analytics and interaction with data using natural language and speech
through Cortana.[13]
Regions[edit]
Azure is generally available in 38 regions around the world.[14]
Design[edit]
Microsoft Azure uses a specialized operating system, called Microsoft Azure, to run its "fabric
layer":[citation needed] a cluster hosted at Microsoft's data centers that manages computing and storage
resources of the computers and provisions the resources (or a subset of them) to applications
running on top of Microsoft Azure. Microsoft Azure has been described as a "cloud layer" on top of a
number of Windows Server systems, which use Windows Server 2008 and a customized version
of Hyper-V, known as the Microsoft Azure Hypervisor to provide virtualization of services.[citation needed]
Scaling and reliability are controlled by the Microsoft Azure Fabric Controller,[citation needed] which keeps
the services and environment from failing if one of the servers within the Microsoft data center
crashes, and which manages the user's Web application, handling concerns such as memory
allocation and load balancing.[citation needed]
Azure provides an API built on REST, HTTP, and XML that allows a developer to interact with the
services provided by Microsoft Azure. Microsoft also provides a client-side managed class library
that encapsulates the functions of interacting with the services. It also integrates with Microsoft
Visual Studio, Git, and Eclipse.[citation needed]
In addition to interacting with services via API, users can manage Azure services using the Web-
based Azure Portal, which reached general availability in December 2015.[15] The portal allows users
to browse active resources, modify settings, launch new resources, and view basic monitoring data
from active virtual machines and services. More advanced Azure management services are
available.[16]
Deployment models[edit]
Microsoft Azure offers two deployment models for cloud resources: the "classic" deployment model
and the Azure Resource Manager.[17] In the classic model, each Azure resource (virtual machine,
SQL database, etc.) was managed individually. The Azure Resource Manager, introduced in
2014,[17] enables users to create groups of related services so that closely coupled resources can be
deployed, managed, and monitored together.[18]
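The Resource Manager model can be sketched as grouping resources so that closely coupled items deploy and tear down together, in contrast to the classic model's one-at-a-time management; the resource names below are invented.

```python
# Sketch of the resource-group idea: related resources are created and
# removed as a unit. Names are hypothetical.
resource_groups = {}

def create_group(name: str, resources: list) -> None:
    resource_groups[name] = list(resources)

def delete_group(name: str) -> list:
    """Removing the group removes every resource in it."""
    return resource_groups.pop(name)

create_group("web-app-prod", ["vm-frontend", "sql-db", "storage-acct"])
removed = delete_group("web-app-prod")
```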
Timeline[edit]
Privacy[edit]
Microsoft has stated that, per the USA Patriot Act, the US government could have access to the data
even if the hosted company is not American and the data resides outside the USA.[23] However,
Microsoft Azure is compliant with the E.U. Data Protection Directive (95/46/EC)[24][25][contradictory]. To
manage privacy and security-related concerns, Microsoft has created the Microsoft Azure Trust
Center,[26] and several Microsoft Azure services comply with compliance
programs including ISO 27001:2005 and HIPAA. A full and current listing can be found on the
Microsoft Azure Trust Center Compliance page.[27] Of special note, Microsoft Azure has been granted
JAB Provisional Authority to Operate (P-ATO) from the U.S. government in accordance with
guidelines spelled out under the Federal Risk and Authorization Management Program (FedRAMP),
a U.S. government program that provides a standardized approach to security assessment,
authorization, and continuous monitoring for cloud services used by the federal government.[28]
Significant outages[edit]
Documented Microsoft Azure outages and service disruptions.
2014-11-18: An Azure storage upgrade caused reduced capacity across several regions;[35] Xbox
Live, Windows Store, MSN, Search, and Visual Studio Online, among others, were affected.[36]
As of December 4, 2015, Azure has been available for 99.9936% of the past year.[37]
Certifications[edit]
Microsoft Azure certifications
Business analytics (BA) refers to the skills, technologies, and practices for continuous iterative
exploration and investigation of past business performance to gain insight and drive business
planning.[1] Business analytics focuses on developing new insights and understanding of business
performance based on data and statistical methods. In contrast, business intelligence traditionally
focuses on using a consistent set of metrics to both measure past performance and guide business
planning, which is also based on data and statistical methods.[citation needed]
Business analytics makes extensive use of statistical analysis, including explanatory and predictive
modeling,[2] and fact-based management to drive decision making. It is therefore closely related
to management science. Analytics may be used as input for human decisions or may drive fully
automated decisions. Business intelligence is querying, reporting, online analytical
processing (OLAP), and "alerts".
In other words, querying, reporting, OLAP, and alert tools can answer questions such as what
happened, how many, how often, where the problem is, and what actions are needed. Business
analytics can answer questions such as why this is happening, what will happen if these trends
continue, what will happen next (prediction), and what is the best that can happen (optimization).[3]
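The "what happened, how many, how often" questions are simple aggregations over historical records, as this sketch with invented sales data shows:

```python
from collections import Counter

# BI-style aggregation over past records; the sales data is invented.
sales = [
    {"region": "north", "units": 3},
    {"region": "south", "units": 5},
    {"region": "north", "units": 2},
]

how_many = sum(s["units"] for s in sales)         # how many in total
how_often = Counter(s["region"] for s in sales)   # how often, by region
```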
Examples of application[edit]
Banks, such as Capital One, use data analysis (or analytics, as it is also called in the business
setting) to differentiate among customers based on credit risk, usage, and other characteristics, and
then to match customer characteristics with appropriate product offerings. Harrah's, the gaming firm,
uses analytics in its customer loyalty programs. E. & J. Gallo Winery quantitatively analyses and
predicts the appeal of its wines. Between 2002 and 2005, Deere & Company saved more than $1
billion by employing a new analytical tool to better optimise inventory.[3] A telecoms company that
pursues efficient call center usage over customer service may save money.
Types of analytics
Decision analytics: supports human decisions with visual analytics that the user models to
reflect reasoning.[4]
Descriptive analytics: gains insight from historical data with reporting, scorecards, clustering, etc.
Predictive analytics: employs predictive modelling using statistical and machine
learning techniques.
Prescriptive analytics: recommends decisions using optimisation, simulation, etc.
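The contrast between descriptive and predictive analytics can be made concrete with a minimal sketch. The quarterly sales figures below are invented for illustration, and the least-squares trend line stands in for the far richer models used in practice:

```python
# Toy sketch: descriptive vs predictive analytics on made-up quarterly
# sales figures (all numbers are illustrative assumptions).

quarterly_sales = [100.0, 110.0, 121.0, 133.0]

# Descriptive analytics: summarize what already happened.
mean_sales = sum(quarterly_sales) / len(quarterly_sales)

# Predictive analytics: fit a least-squares trend line on the quarter
# index and extrapolate one quarter ahead.
n = len(quarterly_sales)
xs = range(n)
x_mean = sum(xs) / n
y_mean = mean_sales
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, quarterly_sales))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean
next_quarter_forecast = intercept + slope * n
```

The descriptive step only summarizes the past; the predictive step commits to a model of how the past extends into the future, which is exactly where the two types of analytics diverge.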
History
Analytics have been used in business since the management exercises put into place
by Frederick Winslow Taylor in the late 19th century. Henry Ford measured the time of each
component in his newly established assembly line. But analytics began to command more attention
in the late 1960s when computers were used in decision support systems. Since then, analytics
have evolved with the development of enterprise resource planning (ERP)
systems, data warehouses, and a large number of other software tools and processes.[3]
In later years business analytics has exploded with the spread of computing, bringing the discipline
to a whole new level and opening up seemingly endless possibilities. Given how far the field has
come, many people would never suspect that modern analytics began with the time studies of
Taylor and Ford at the turn of the 20th century.
Challenges
Business analytics depends on sufficient volumes of high-quality data. The difficulty in ensuring data
quality lies in integrating and reconciling data across different systems, and then deciding what
subsets of data to make available.[3]
Previously, analytics was considered an after-the-fact method of forecasting consumer
behavior by examining the number of units sold in the last quarter or the last year. This type of data
warehousing demanded far more storage space than speed. Now business analytics is
becoming a tool that can influence the outcome of customer interactions.[5] When a specific customer
type is considering a purchase, an analytics-enabled enterprise can modify the sales pitch to appeal
to that consumer. This means the storage holding all that data must respond extremely fast to
provide the necessary data in real time.
Competing on analytics
Thomas Davenport, professor of information technology and management at Babson College,
argues that businesses can optimize a distinct business capability via analytics and thus compete
better. He identifies these characteristics of an organization that is apt to compete on analytics:[3]
One or more senior executives who strongly advocate fact-based decision making and,
specifically, analytics
Widespread use of not only descriptive statistics but also predictive modeling and
complex optimization techniques
Substantial use of analytics across multiple business functions or processes
Movement toward an enterprise-level approach to managing analytical tools, data, and
organizational skills and capabilities
See also
Analytics
Business analysis
Business analyst
Business intelligence
Business process discovery
Customer dynamics
Data mining
OLAP
Statistics
Test and learn
DevOps
From Wikipedia, the free encyclopedia
Contents
1 History
2 Overview
3 DevOps toolchain
4 Relationship to agile and continuous delivery
4.1 Agile
4.2 Continuous delivery
5 Goals
5.1 Benefits of DevOps
6 Cultural change
6.1 Building a DevOps culture
7 Deployment
8 DevOps and architecture
9 Scope of adoption
9.1 Incremental adoption
9.1.1 The first way: systems thinking
9.1.2 The second way: amplify feedback loops
9.1.3 The third way: culture of continual experimentation and learning
10 References
11 Further reading
History
At the Agile 2008 conference, Andrew Clay Shafer and Patrick Debois discussed "Agile
Infrastructure".[6] The term DevOps was popularized through a series of "devopsdays" starting in
2009 in Belgium.[7] Since then, there have been devopsdays conferences, held in many countries,
worldwide.[8]
The popularity of DevOps has grown in recent years, inspiring many other tangential movements
including OpsDev and WinOps.[9] WinOps embodies the same set of practices and emphasis on
culture as DevOps, but is specific for a Microsoft-centric view.[10]
Overview
Because DevOps is a cultural shift and collaboration (between development, operations and testing),
there is no single "DevOps tool": it is rather a set (or "DevOps toolchain") consisting of multiple
tools.[12] Generally, DevOps tools fit into one or more of these categories, reflecting key
aspects of the software development and delivery process:[13][14]
1. Code: code development and review, version control tools, code merging
2. Build: continuous integration tools, build status
3. Test: continuous testing tools that provide feedback on business risks
4. Package: artifact repository, application pre-deployment staging
5. Release: change management, release approvals, release automation
6. Configure: infrastructure configuration and management, Infrastructure as Code tools
7. Monitor: application performance monitoring, end-user experience
Though there are many tools available, certain categories of them are essential in the DevOps
toolchain setup for use in an organization. Some attempts to identify those basic tools can be found
in the existing literature.[15]
Tools such as Docker (containerization), Jenkins (continuous integration), Puppet (Infrastructure as
Code) and Vagrant (virtualization platform), among many others, are often used and frequently
referenced in DevOps tooling discussions.[16]
Goals
The specific goals of DevOps span the entire delivery pipeline; among them is improved deployment
frequency.
Cultural change
DevOps is more than just a tool or a process change; it inherently requires an organizational culture
shift.[25] This cultural change is especially difficult because of the conflicting nature of departmental
roles.
Deployment
Companies with very frequent releases may require a DevOps awareness or orientation program.
For example, the company that operates the image-hosting website Flickr developed a DevOps
approach to support a business requirement of ten deployments per day;[30] this daily deployment
cycle can be much higher at organizations producing multi-focus or multi-function applications.
This is referred to as continuous deployment[31] or continuous delivery[32] and has been associated
with the lean startup methodology.[33] Working groups, professional associations and blogs have
formed on the topic since 2009.[5][34][35]
Scope of adoption
Some articles in the DevOps literature assume, or recommend, significant participation in DevOps
initiatives from outside an organization's IT department, e.g.: "DevOps is just the agile principle,
taken to the full enterprise."[38]
According to a survey published in January 2016 by the SaaS cloud-computing company RightScale,
DevOps adoption increased from 66 percent in 2015 to 74 percent in 2016; among larger enterprise
organizations, adoption is even higher, at 81 percent.[39]
Adoption of DevOps is being driven by many factors.
Jira
Written in: Java
Website: atlassian.com/software/jira
Jira (pronounced JEE-rah)[5] (stylized JIRA) is a proprietary issue tracking product developed
by Atlassian. It provides bug tracking, issue tracking, and project management functions. Although
normally styled JIRA, the product name is not an acronym, but a truncation of Gojira, the Japanese
name for Godzilla,[6] itself a reference to Jira's main competitor, Bugzilla. It has been under
development since 2002.[1] According to one ranking method, as of June 2017, Jira is the most
popular issue management tool.[7]
Contents
1 Description
2 License
3 Security
4 See also
5 References
6 External links
Description
According to Atlassian, Jira is used for issue tracking and project management by over 25,000
customers in 122 countries around the globe.[8] Some of the organizations that have used Jira at
some point in time for bug-tracking and project management include Fedora
Commons,[9] Hibernate,[10] JBoss,[11] Skype Technologies,[12] Spring Framework,[13] and The Apache
Software Foundation, which uses both Jira and Bugzilla.[14] Jira includes tools allowing migration from
competitor Bugzilla.[15]
Jira is offered in three packages.[citation needed]
License
Jira is a commercial software product that can be licensed to run on-premises or is available as a
hosted application. Pricing depends on the maximum number of users.[21]
Atlassian provides Jira for free to open source projects meeting certain criteria, and to organizations
that are non-academic, non-commercial, non-governmental, non-political, non-profit, and secular.
For academic and commercial customers, the full source code is available under a developer source
license.[21]
Security
In April 2010 a cross-site scripting vulnerability in Jira led to the compromise of two Apache Software
Foundation servers. The Jira password database was compromised. The database
contained unsalted password hashes, which are vulnerable to dictionary lookups and cracking tools.
Apache advised users to change their passwords.[22] Atlassian themselves were also targeted as part
of the same attack and admitted that a legacy database with passwords stored in plain text had been
compromised.[23]
Machine learning
Machine learning is the subfield of computer science that, according to Arthur Samuel in 1959,
gives "computers the ability to learn without being explicitly programmed."[1] Evolved from the study
of pattern recognition and computational learning theory in artificial intelligence,[2] machine learning
explores the study and construction of algorithms that can learn from and make predictions
on data;[3] such algorithms overcome following strictly static program instructions by making data-
driven predictions or decisions,[4]:2 through building a model from sample inputs. Machine learning is
employed in a range of computing tasks where designing and programming explicit algorithms with
good performance is difficult or infeasible; example applications include email filtering, detection of
network intruders or malicious insiders working towards a data breach,[5] optical character
recognition (OCR),[6] learning to rank, and computer vision.
Machine learning is closely related to (and often overlaps with) computational statistics, which also
focuses on prediction-making through the use of computers. It has strong ties to mathematical
optimization, which delivers methods, theory and application domains to the field. Machine learning
is sometimes conflated with data mining,[7] where the latter subfield focuses more on exploratory data
analysis and is known as unsupervised learning.[4]:vii[8] Machine learning can also be
unsupervised[9] and be used to learn and establish baseline behavioral profiles for various
entities[10] and then used to find meaningful anomalies.
Within the field of data analytics, machine learning is a method used to devise complex models and
algorithms that lend themselves to prediction; in commercial use, this is known as predictive
analytics. These analytical models allow researchers, data scientists, engineers, and analysts to
"produce reliable, repeatable decisions and results" and uncover "hidden insights" through learning
from historical relationships and trends in the data.[11]
As of 2016, machine learning is a buzzword and, according to the Gartner hype cycle of 2016, at its
peak of inflated expectations.[12] Because finding patterns is hard, often not enough training data is
available, and expectations run high, machine learning often fails to deliver.[13][14]
Contents
1 Overview
1.1 Types of problems and tasks
2 History and relationships to other fields
2.1 Relation to statistics
3 Theory
4 Approaches
4.1 Decision tree learning
4.2 Association rule learning
4.3 Artificial neural networks
4.4 Deep learning
4.5 Inductive logic programming
4.6 Support vector machines
4.7 Clustering
4.8 Bayesian networks
4.9 Reinforcement learning
4.10 Representation learning
4.11 Similarity and metric learning
4.12 Sparse dictionary learning
4.13 Genetic algorithms
4.14 Rule-based machine learning
4.15 Learning classifier systems
5 Applications
6 Model assessments
7 Ethics
8 Software
8.1 Free and open-source software
8.2 Proprietary software with free and open-source editions
8.3 Proprietary software
9 Journals
10 Conferences
11 See also
12 References
13 Further reading
14 External links
Overview
Tom M. Mitchell provided a widely quoted, more formal definition: "A computer program is said to
learn from experience E with respect to some class of tasks T and performance measure P if its
performance at tasks in T, as measured by P, improves with experience E."[15] This definition is
notable for its defining machine learning in fundamentally operational rather than cognitive terms,
thus following Alan Turing's proposal in his paper "Computing Machinery and Intelligence", that the
question "Can machines think?" be replaced with the question "Can machines do what we (as
thinking entities) can do?".[16] In the proposal he explores the various characteristics that could be
possessed by a thinking machine and the various implications in constructing one.
Types of problems and tasks
Machine learning tasks are typically classified into three broad categories, depending on the nature
of the learning "signal" or "feedback" available to a learning system. These are:[17]
Supervised learning: The computer is presented with example inputs and their desired outputs,
given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs.
Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to find
structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns
in data) or a means towards an end (feature learning).
Reinforcement learning: A computer program interacts with a dynamic environment in which it
must perform a certain goal (such as driving a vehicle or playing a game against an
opponent[4]:3). The program is provided feedback in terms of rewards and punishments as it
navigates its problem space.
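A minimal sketch of the supervised case: a 1-nearest-neighbour classifier learns a general input-to-output rule directly from labelled examples provided by a "teacher". The two-dimensional points and their labels below are invented toy data:

```python
# Supervised learning in miniature: classify a new point by the label of
# its nearest labelled training example (1-nearest-neighbour).

def nearest_neighbour(train, query):
    """train: list of ((x, y), label) pairs; returns the label of the
    training point closest to query in squared Euclidean distance."""
    def sq_dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    best_point, best_label = min(train, key=lambda pl: sq_dist(pl[0], query))
    return best_label

# Toy labelled data: two clusters, one per class.
labelled = [((0.0, 0.0), "black"), ((0.1, 0.2), "black"),
            ((1.0, 1.0), "white"), ((0.9, 1.1), "white")]
```

For example, `nearest_neighbour(labelled, (0.2, 0.1))` returns "black", because the query sits inside the first cluster; the "rule" learned here is simply the training set itself plus a distance function.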
Between supervised and unsupervised learning is semi-supervised learning, where the teacher
gives an incomplete training signal: a training set with some (often many) of the target outputs
missing. Transduction is a special case of this principle where the entire set of problem instances is
known at learning time, except that part of the targets are missing.
A support vector machine is a classifier that divides its input space into two regions, separated by a linear
boundary. Here, it has learned to distinguish black and white circles.
Among other categories of machine learning problems, learning to learn learns its own inductive
bias based on previous experience. Developmental learning, elaborated for robot learning,
generates its own sequences (also called curriculum) of learning situations to cumulatively acquire
repertoires of novel skills through autonomous self-exploration and social interaction with human
teachers and using guidance mechanisms such as active learning, maturation, motor synergies, and
imitation.
Another categorization of machine learning tasks arises when one considers the desired output of a
machine-learned system:[4]:3
In classification, inputs are divided into two or more classes, and the learner must produce a
model that assigns unseen inputs to one or more (multi-label classification) of these classes.
This is typically tackled in a supervised way. Spam filtering is an example of classification, where
the inputs are email (or other) messages and the classes are "spam" and "not spam".
In regression, also a supervised problem, the outputs are continuous rather than discrete.
In clustering, a set of inputs is to be divided into groups. Unlike in classification, the groups are
not known beforehand, making this typically an unsupervised task.
Density estimation finds the distribution of inputs in some space.
Dimensionality reduction simplifies inputs by mapping them into a lower-dimensional
space. Topic modeling is a related problem, where a program is given a list of human
language documents and is tasked to find out which documents cover similar topics.
History and relationships to other fields
As a scientific endeavour, machine learning grew out of the quest for artificial intelligence. Already in
the early days of AI as an academic discipline, some researchers were interested in having
machines learn from data. They attempted to approach the problem with various symbolic methods,
as well as what were then termed "neural networks"; these were mostly perceptrons and other
models that were later found to be reinventions of the generalized linear models of
statistics.[18] Probabilistic reasoning was also employed, especially in automated medical
diagnosis.[17]:488
However, an increasing emphasis on the logical, knowledge-based approach caused a rift between
AI and machine learning. Probabilistic systems were plagued by theoretical and practical problems
of data acquisition and representation.[17]:488 By 1980, expert systems had come to dominate AI, and
statistics was out of favor.[19] Work on symbolic/knowledge-based learning did continue within AI,
leading to inductive logic programming, but the more statistical line of research was now outside the
field of AI proper, in pattern recognition and information retrieval.[17]:708-710, 755 Neural networks research
had been abandoned by AI and computer science around the same time. This line, too, was
continued outside the AI/CS field, as "connectionism", by researchers from other disciplines
including Hopfield, Rumelhart and Hinton. Their main success came in the mid-1980s with the
reinvention of backpropagation.[17]:25
Machine learning, reorganized as a separate field, started to flourish in the 1990s. The field changed
its goal from achieving artificial intelligence to tackling solvable problems of a practical nature. It
shifted focus away from the symbolic approaches it had inherited from AI, and toward methods and
models borrowed from statistics and probability theory.[19] It also benefited from the increasing
availability of digitized information, and the possibility to distribute that via the Internet.
Machine learning and data mining often employ the same methods and overlap significantly, but
while machine learning focuses on prediction, based on known properties learned from the training
data, data mining focuses on the discovery of (previously) unknown properties in the data (this is the
analysis step of Knowledge Discovery in Databases). Data mining uses many machine learning
methods, but with different goals; on the other hand, machine learning also employs data mining
methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. Much
of the confusion between these two research communities (which do often have separate
conferences and separate journals, ECML PKDD being a major exception) comes from the basic
assumptions they work with: in machine learning, performance is usually evaluated with respect to
the ability to reproduce known knowledge, while in Knowledge Discovery and Data Mining (KDD) the
key task is the discovery of previously unknown knowledge. Evaluated with respect to known
knowledge, an uninformed (unsupervised) method will easily be outperformed by other supervised
methods, while in a typical KDD task, supervised methods cannot be used due to the unavailability
of training data.
Machine learning also has intimate ties to optimization: many learning problems are formulated as
minimization of some loss function on a training set of examples. Loss functions express the
discrepancy between the predictions of the model being trained and the actual problem instances
(for example, in classification, one wants to assign a label to instances, and models are trained to
correctly predict the pre-assigned labels of a set of examples). The difference between the two fields
arises from the goal of generalization: while optimization algorithms can minimize the loss on a
training set, machine learning is concerned with minimizing the loss on unseen samples.[20]
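The "learning as loss minimization" view can be sketched concretely. Here gradient descent minimizes the mean squared error of a one-parameter model y = w * x on a toy training set; the data, learning rate and step count are illustrative choices, not a prescription:

```python
# Learning as loss minimization: gradient descent on the mean squared
# error of a one-parameter linear model y = w * x. Toy data generated
# from the true relationship y = 3x.

train = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

def mse(w):
    # mean squared discrepancy between predictions and targets
    return sum((w * x - y) ** 2 for x, y in train) / len(train)

def mse_grad(w):
    # d/dw of mean((w*x - y)^2) = mean(2 * x * (w*x - y))
    return sum(2 * x * (w * x - y) for x, y in train) / len(train)

w = 0.0
for _ in range(200):
    w -= 0.05 * mse_grad(w)   # step against the gradient
```

The parameter w converges to 3, the minimizer of the training loss; as the paragraph above notes, machine learning additionally cares whether that minimizer generalizes to unseen samples, which this sketch does not test.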
Relation to statistics
Machine learning and statistics are closely related fields. According to Michael I. Jordan, the ideas of
machine learning, from methodological principles to theoretical tools, have had a long pre-history in
statistics.[21] He also suggested the term data science as a placeholder to call the overall field.[21]
Leo Breiman distinguished two statistical modelling paradigms: data model and algorithmic
model,[22] wherein 'algorithmic model' means more or less the machine learning algorithms
like Random forest.
Some statisticians have adopted methods from machine learning, leading to a combined field that
they call statistical learning.[23]
Theory
Main article: Computational learning theory
A core objective of a learner is to generalize from its experience.[24][25] Generalization in this context is
the ability of a learning machine to perform accurately on new, unseen examples/tasks after having
experienced a learning data set. The training examples come from some generally unknown
probability distribution (considered representative of the space of occurrences) and the learner has
to build a general model about this space that enables it to produce sufficiently accurate predictions
in new cases.
The computational analysis of machine learning algorithms and their performance is a branch
of theoretical computer science known as computational learning theory. Because training sets are
finite and the future is uncertain, learning theory usually does not yield guarantees of the
performance of algorithms. Instead, probabilistic bounds on the performance are quite common.
The bias-variance decomposition is one way to quantify generalization error.
For the best performance in the context of generalization, the complexity of the hypothesis should
match the complexity of the function underlying the data. If the hypothesis is less complex than the
function, then the model has underfit the data. If the complexity of the model is increased in
response, then the training error decreases. But if the hypothesis is too complex, then the model is
subject to overfitting and generalization will be poorer.[26]
In addition to performance bounds, computational learning theorists study the time complexity and
feasibility of learning. In computational learning theory, a computation is considered feasible if it can
be done in polynomial time. There are two kinds of time complexity results. Positive results show
that a certain class of functions can be learned in polynomial time. Negative results show that certain
classes cannot be learned in polynomial time.
Approaches
Main article: List of machine learning algorithms
Decision tree learning
Main article: Decision tree learning
Decision tree learning uses a decision tree as a predictive model, which maps observations about an
item to conclusions about the item's target value.
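The decision-tree idea can be reduced to its smallest case, a one-split "stump" on a single numeric feature. The labelled sample below is invented, and the threshold search is exhaustive rather than the entropy-based splitting real tree learners use:

```python
# Smallest possible decision tree: a one-split stump that picks the
# threshold on one numeric feature minimizing training misclassifications.

data = [(1.0, "no"), (2.0, "no"), (3.0, "yes"), (4.0, "yes")]

def errors(threshold):
    # predict "yes" when the feature value exceeds the threshold;
    # count disagreements with the true labels
    return sum((x > threshold) != (label == "yes") for x, label in data)

candidates = [x for x, _ in data]
best_threshold = min(candidates, key=errors)
```

A full decision tree simply repeats this kind of split recursively on the subsets each branch produces.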
Association rule learning
Main article: Association rule learning
Association rule learning is a method for discovering interesting relations between variables in large
databases.
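The two standard measures behind association rules, support and confidence, can be computed directly. The transactions below are invented shopping baskets, and the rule {bread} -> {butter} is just an example:

```python
# Association rule basics: support and confidence of the rule
# {bread} -> {butter} over a toy transaction database.

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]

def support(itemset):
    # fraction of transactions containing every item in itemset
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    # of the transactions containing the antecedent, the fraction that
    # also contain the consequent
    return support(antecedent | consequent) / support(antecedent)
```

Here {bread, butter} appears in 2 of 4 transactions (support 0.5), and 2 of the 3 bread-containing transactions also contain butter (confidence 2/3); algorithms such as Apriori search for all rules exceeding chosen support and confidence thresholds.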
Artificial neural networks
Main article: Artificial neural network
An artificial neural network (ANN) learning algorithm, usually called "neural network" (NN), is a
learning algorithm that is inspired by the structure and functional aspects of biological neural
networks. Computations are structured in terms of an interconnected group of artificial neurons,
processing information using a connectionist approach to computation. Modern neural networks
are non-linear statistical data modeling tools. They are usually used to model complex relationships
between inputs and outputs, to find patterns in data, or to capture the statistical structure in an
unknown joint probability distribution between observed variables.
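The smallest artificial neural network is a single neuron. The sketch below trains one with the classic perceptron rule to reproduce the logical AND function; the learning rate and epoch count are arbitrary illustrative choices:

```python
# A single artificial neuron (perceptron) learning logical AND.

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.0, 0.0]   # connection weights
b = 0.0          # bias
lr = 0.1         # learning rate

def predict(x):
    # threshold activation on the weighted sum of inputs
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

for _ in range(20):                 # a few passes over the data suffice
    for x, target in data:
        err = target - predict(x)   # 0 when correct, +/-1 otherwise
        w[0] += lr * err * x[0]     # nudge weights toward the target
        w[1] += lr * err * x[1]
        b += lr * err
```

After training, the neuron classifies all four inputs correctly. Modern networks stack many such units in layers and replace the threshold with differentiable activations so that backpropagation can apply.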
Deep learning
Main article: Deep learning
Falling hardware prices and the development of GPUs for personal use in the last few years have
contributed to the development of the concept of deep learning which consists of multiple hidden
layers in an artificial neural network. This approach tries to model the way the human brain
processes light and sound into vision and hearing. Some successful applications of deep learning
are computer vision and speech recognition.[27]
Inductive logic programming
Main article: Inductive logic programming
Inductive logic programming (ILP) is an approach to rule learning using logic programming as a
uniform representation for input examples, background knowledge, and hypotheses. Given an
encoding of the known background knowledge and a set of examples represented as a logical
database of facts, an ILP system will derive a hypothesized logic program that entails all positive and
no negative examples. Inductive programming is a related field that considers any kind of
programming languages for representing hypotheses (and not only logic programming), such
as functional programs.
Support vector machines
Main article: Support vector machines
Support vector machines (SVMs) are a set of related supervised learning methods used
for classification and regression. Given a set of training examples, each marked as belonging to one
of two categories, an SVM training algorithm builds a model that predicts whether a new example
falls into one category or the other.
Clustering
Main article: Cluster analysis
Cluster analysis is the assignment of a set of observations into subsets (called clusters) so that
observations within the same cluster are similar according to some predesignated criterion or
criteria, while observations drawn from different clusters are dissimilar. Different clustering
techniques make different assumptions on the structure of the data, often defined by some similarity
metric and evaluated for example by internal compactness (similarity between members of the same
cluster) and separation between different clusters. Other methods are based on estimated
density and graph connectivity. Clustering is a method of unsupervised learning, and a common
technique for statistical data analysis.
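One widely taught clustering method is k-means, which alternates between assigning points to their nearest centroid and recomputing each centroid as its cluster mean. The sketch below uses invented one-dimensional data, k = 2, and deterministic initial centroids so the run is reproducible:

```python
# Compact k-means (k = 2) on made-up one-dimensional data.

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
centroids = [points[0], points[3]]          # simple deterministic init

for _ in range(10):                         # alternate assign / update
    clusters = [[], []]
    for p in points:
        # assignment step: nearest centroid by absolute distance
        nearest = min((0, 1), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # update step: each centroid becomes its cluster's mean
    centroids = [sum(c) / len(c) for c in clusters]
```

On this toy data the two well-separated groups are recovered after the first iteration; real implementations add random restarts and guard against empty clusters.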
Bayesian networks
Main article: Bayesian network
A Bayesian network, belief network or directed acyclic graphical model is a probabilistic graphical
model that represents a set of random variables and their conditional independencies via a directed
acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic
relationships between diseases and symptoms. Given symptoms, the network can be used to
compute the probabilities of the presence of various diseases. Efficient algorithms exist that
perform inference and learning.
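The disease-symptom example above can be worked through for the smallest possible network, a single Disease -> Symptom edge. The probabilities below are invented for illustration; inference here is just Bayes' rule:

```python
# Two-node Bayesian network (Disease -> Symptom) with made-up
# probabilities, inverted with Bayes' rule.

p_disease = 0.01                  # prior P(D = true)
p_symptom_given_d = {True: 0.9,   # P(S = true | D = true)
                     False: 0.1}  # P(S = true | D = false)

def p_disease_given_symptom():
    # enumerate the two joint outcomes consistent with S = true,
    # then normalize
    joint_true = p_disease * p_symptom_given_d[True]
    joint_false = (1 - p_disease) * p_symptom_given_d[False]
    return joint_true / (joint_true + joint_false)
```

Even with a 90% sensitive symptom, the posterior is only about 8% here because the disease is rare; general networks perform the same enumeration (or smarter equivalents) over many variables.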
Reinforcement learning
Main article: Reinforcement learning
Reinforcement learning is concerned with how an agent ought to take actions in an environment so
as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find
a policy that maps states of the world to the actions the agent ought to take in those states.
Reinforcement learning differs from the supervised learning problem in that correct input/output pairs
are never presented, nor sub-optimal actions explicitly corrected.
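A minimal sketch of the policy-finding idea, assuming a trivial invented environment: one state, two actions, and a reward only for "right". The Q-learning update used here is standard, but the environment, learning rate and episode count are toy choices:

```python
# Q-learning in miniature: learn which of two actions earns reward.

q = {("start", "left"): 0.0, ("start", "right"): 0.0}
alpha = 0.5   # learning rate

def step(action):
    # deterministic toy dynamics: "right" reaches the goal (reward 1),
    # "left" does not (reward 0); both end the episode
    return 1.0 if action == "right" else 0.0

for _ in range(50):
    for action in ("left", "right"):
        reward = step(action)
        # terminal transition, so no discounted future-value term
        q[("start", action)] += alpha * (reward - q[("start", action)])

# the learned policy picks the action with the highest Q-value
policy = max(("left", "right"), key=lambda a: q[("start", a)])
```

Note that no correct input/output pairs were ever shown: the agent discovers that "right" is better purely from the reward signal, which is the distinction drawn in the paragraph above.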
Representation learning
Main article: Representation learning
Several learning algorithms, mostly unsupervised learning algorithms, aim at discovering better
representations of the inputs provided during training. Classical examples include principal
components analysis and cluster analysis. Representation learning algorithms often attempt to
preserve the information in their input but transform it in a way that makes it useful, often as a pre-
processing step before performing classification or predictions, allowing reconstruction of the inputs
coming from the unknown data generating distribution, while not being necessarily faithful for
configurations that are implausible under that distribution.
Manifold learning algorithms attempt to do so under the constraint that the learned representation is
low-dimensional. Sparse coding algorithms attempt to do so under the constraint that the learned
representation is sparse (has many zeros). Multilinear subspace learning algorithms aim to learn
low-dimensional representations directly from tensor representations for multidimensional data,
without reshaping them into (high-dimensional) vectors.[28] Deep learning algorithms discover multiple
levels of representation, or a hierarchy of features, with higher-level, more abstract features defined
in terms of (or generating) lower-level features. It has been argued that an intelligent machine is one
that learns a representation that disentangles the underlying factors of variation that explain the
observed data.[29]
Similarity and metric learning
Main article: Similarity learning
In this problem, the learning machine is given pairs of examples that are considered similar and
pairs of less similar objects. It then needs to learn a similarity function (or a distance metric function)
that can predict whether new objects are similar. It is sometimes used in recommendation systems.
Sparse dictionary learning
Main article: Sparse dictionary learning
In this method, a datum is represented as a linear combination of basis functions, and the
coefficients are assumed to be sparse. Let x be a d-dimensional datum and D a d-by-n matrix where
each column represents a basis function; r is the sparse coefficient vector that represents x using D.
Genetic algorithms
Main article: Genetic algorithm
A genetic algorithm (GA) is a search heuristic that mimics the process of natural selection and uses
methods such as mutation and crossover to generate new genotypes in the hope of finding good
solutions to a given problem. In machine learning, genetic algorithms found some uses in the 1980s
and 1990s.[32][33] Vice versa, machine learning techniques have been used to improve the
performance of genetic and evolutionary algorithms.[34]
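The mutation/crossover/selection loop can be sketched on the classic "OneMax" toy problem (maximize the number of 1-bits in a string). Population size, rates, generation count and the random seed are all arbitrary illustrative choices:

```python
# Minimal genetic algorithm for the OneMax toy problem.
import random

random.seed(0)
LENGTH, POP, GENS = 12, 20, 60

def fitness(bits):
    return sum(bits)              # number of 1-bits

def crossover(a, b):
    # single-point crossover: splice a prefix of one parent onto the
    # suffix of the other
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(bits):
    # flip one randomly chosen bit
    i = random.randrange(LENGTH)
    bits = bits.copy()
    bits[i] ^= 1
    return bits

pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]      # truncation selection keeps the fittest
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children

best = max(pop, key=fitness)
```

Because the fittest half survives unchanged each generation, the best fitness never decreases, and on this easy problem the population climbs quickly toward the all-ones string.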
Rule-based machine learning[edit]
Rule-based machine learning is a general term for any machine learning method that identifies,
learns, or evolves rules to store, manipulate, or apply knowledge. The defining characteristic of a
rule-based machine learner is the identification and utilization of a set of relational rules that
collectively represent the knowledge captured by the system. This is in contrast to other machine
learners that commonly identify a singular model that can be universally applied to any instance in
order to make a prediction.[35] Rule-based machine learning approaches include learning classifier
systems, association rule learning, and artificial immune systems.
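The contrast with a single global model can be sketched as follows: knowledge is held as a set of small if-then rules matched against each instance. The rules and attribute names below are invented for illustration and are not learned by any particular algorithm.

```python
# each rule: (conditions that must all hold, outcome if the rule fires)
rules = [
    ({"outlook": "sunny", "humidity": "high"}, "stay in"),
    ({"outlook": "sunny", "humidity": "normal"}, "go out"),
    ({"outlook": "rainy"}, "stay in"),
]

def predict(instance, default="go out"):
    for conditions, outcome in rules:
        if all(instance.get(k) == v for k, v in conditions.items()):
            return outcome               # first matching rule fires
    return default                       # no rule covers this instance

print(predict({"outlook": "sunny", "humidity": "high"}))  # stay in
```

A rule-based learner would induce or evolve such a rule set from data; here the prediction for an instance comes from whichever rule covers it, not from one universally applied model.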
Learning classifier systems[edit]
Main article: Learning classifier system
Learning classifier systems (LCS) are a family of rule-based machine learning algorithms that
combine a discovery component (typically a genetic algorithm) with a learning component
(performing either supervised, reinforcement, or unsupervised learning). They seek to identify a set
of context-dependent rules that collectively store and apply knowledge in a piecewise manner in
order to make predictions.[36]
Applications[edit]
Applications for machine learning include:
Adaptive websites
Affective computing
Bioinformatics
Brain-machine interfaces
Cheminformatics
Classifying DNA sequences
Computational anatomy
Computer vision, including object recognition
Detecting credit card fraud
Game playing
Information retrieval
Internet fraud detection
Marketing
Machine learning control
Machine perception
Medical diagnosis
Economics
Natural language processing
Natural language understanding
Optimization and metaheuristic
Online advertising
Recommender systems
Robot locomotion
Search engines
Sentiment analysis (or opinion mining)
Sequence mining
Software engineering
Speech and handwriting recognition
Financial market analysis
Structural health monitoring
Syntactic pattern recognition
User behavior analytics
Translation[37]
In 2006, the online movie company Netflix held the first "Netflix Prize" competition to find a program
to better predict user preferences and improve the accuracy on its existing Cinematch movie
recommendation algorithm by at least 10%. A joint team made up of researchers from AT&T Labs-
Research in collaboration with the teams Big Chaos and Pragmatic Theory built an ensemble
model to win the Grand Prize in 2009 for $1 million.[38] Shortly after the prize was awarded, Netflix
realized that viewers' ratings were not the best indicators of their viewing patterns ("everything is a
recommendation") and they changed their recommendation engine accordingly.[39]
In 2012, co-founder of Sun Microsystems Vinod Khosla predicted that 80% of medical doctors' jobs
would be lost in the following two decades to automated machine learning medical diagnostic software.[40]
In 2014, it was reported that a machine learning algorithm had been applied in the field of art history
to study fine art paintings, and that it may have revealed previously unrecognized influences among
artists.[41]
Model assessments[edit]
Classification machine learning models can be validated by accuracy-estimation techniques such as
the holdout method, which splits the data into a training set and a test set (conventionally a 2/3
training and 1/3 test designation) and evaluates the performance of the trained model on the test
set. In comparison, the K-fold cross-validation method randomly partitions the data into K subsets;
each of K rounds trains the model on K−1 of the subsets and tests its predictive ability on the
remaining subset. In addition to the holdout and cross-validation methods, the bootstrap, which
samples n instances with replacement from the dataset, can be used to assess model accuracy.[42]
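The three splitting schemes can be sketched on index lists in place of a real dataset and model; the sizes and seed below are arbitrary illustrations.

```python
import random

random.seed(1)
data = list(range(12))               # stand-in for 12 labelled examples

# Holdout: conventional 2/3 train, 1/3 test split
random.shuffle(data)
cut = 2 * len(data) // 3
train, test = data[:cut], data[cut:]

# K-fold cross-validation: K subsets; each round tests on one held-out subset
K = 3
folds = [data[i::K] for i in range(K)]
splits = [(sum(folds[:i] + folds[i + 1:], []), folds[i]) for i in range(K)]

# Bootstrap: sample n instances with replacement from the dataset
boot = [random.choice(data) for _ in data]
```

Each element of `splits` pairs a training portion (K−1 subsets) with its test subset, so every example is tested exactly once across the K rounds.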
In addition to overall accuracy, investigators frequently report sensitivity and specificity, meaning the
True Positive Rate (TPR) and True Negative Rate (TNR) respectively. Similarly, investigators
sometimes report the False Positive Rate (FPR) as well as the False Negative Rate (FNR). However,
these rates are ratios that fail to reveal their numerators and denominators. The Total Operating
Characteristic (TOC) is an effective method to express a model's diagnostic ability. TOC shows the
numerators and denominators of the previously mentioned rates, thus providing more information
than the commonly used Receiver Operating Characteristic (ROC) and the ROC's associated Area
Under the Curve (AUC).
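Computing these rates from an explicit confusion matrix keeps the numerators and denominators the text mentions visible; the counts below are made up for illustration.

```python
TP, FN, FP, TN = 40, 10, 5, 45     # invented confusion-matrix counts

TPR = TP / (TP + FN)               # sensitivity: 40 / 50
TNR = TN / (TN + FP)               # specificity: 45 / 50
FPR = FP / (FP + TN)               # = 1 - TNR
FNR = FN / (FN + TP)               # = 1 - TPR
accuracy = (TP + TN) / (TP + TN + FP + FN)

print(TPR, TNR, accuracy)  # 0.8 0.9 0.85
```

Reporting only TPR = 0.8 hides whether it came from 4/5 or 40/50, which is exactly the information the TOC retains.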
Ethics[edit]
Machine learning poses a host of ethical questions. Systems that are trained on datasets
collected with biases may exhibit these biases upon use, thus digitizing cultural
prejudices.[43] Responsible collection of data is thus a critical part of machine learning.
Because language contains biases, machines trained on language corpora will necessarily also
learn bias.[44]
See Machine ethics for additional information.
Software[edit]
Software suites containing a variety of machine learning algorithms include the following:
Free and open-source software[edit]
Deeplearning4j
dlib
ELKI
GNU Octave
H2O
Mahout
Mallet
mlpy
MLPACK
MOA (Massive Online Analysis)
MXNet
ND4J: ND arrays for Java
NuPIC
OpenAI Gym
OpenAI Universe
OpenNN
Orange
R
scikit-learn
Shogun
SMILE
TensorFlow
Torch
Yooreeka
Weka
Proprietary software with free and open-source editions[edit]
KNIME
RapidMiner
Proprietary software[edit]
Journals[edit]
Journal of Machine Learning Research
Machine Learning
Neural Computation
Conferences[edit]
Conference on Neural Information Processing Systems
International Conference on Machine Learning
See also[edit]
Automatic reasoning
Big data
Computational intelligence
Computational neuroscience
Data science
Ethics of artificial intelligence
Existential risk from advanced artificial intelligence
Explanation-based learning
Quantum machine learning
Important publications in machine learning
List of machine learning algorithms
List of datasets for machine learning research
Similarity learning
Machine Learning Applications in Bioinformatics
Key Skills - Skilling up for Salesforce
The return on investment for mastering Salesforce applications can prove to
be very attractive for IT professionals.
John Kittle, Eurostaff Technology, November 23, 2012
What is the skill? We've seen demand for professionals who can implement Salesforce applications
nearly triple in the past couple of years. Salesforce, the enterprise cloud computing leader, offers
customer relationship management and sales applications. Salesforce roles can be divided into four
distinct job types that cover the different implementation stages of its products: Salesforce
Administrator, Force.com Developer, Implementation Expert and Architect.
The administrators keep Salesforce working on a day-to-day basis after the implementation team puts the
right solutions in place. The architects and developers design and build the new applications that make
businesses grow and stay at the forefront of technological advances.
Many existing developers and IT professionals will have the skills required to be able to perform in a
Salesforce type role, but for experts to really achieve success (and the highest salaries) they need to
become cloud certified. This consists of taking part in classroom training and/or self-study, as well as
exams.
As well as CRM, the top related IT skills that will help you are:
Apex Code
Java
SaaS
.NET
Visualforce
Where did it come from? Salesforce was founded 13 years ago by former Oracle executive Marc
Benioff and is best known for its customer relationship software. In the last three years, Salesforce
has made over 15 acquisitions as it expands into the social enterprise arena, which has created the
six distinct products below.
What is it for? In its current form, Salesforce can be split into six pre-defined products (mainly driven by
acquisitions). The first, and the original area, is the Sales Cloud, which aims to boost revenues and
productivity within the business. The Service Cloud is about connecting with customers from a service
point of view via social media communities. Marketing Cloud aligns the sales, service and marketing
functions of a business by listening to social chat and connecting with customers. Salesforce Platform is
used by a wide range of companies to build real-time apps for their customers. Chatter acts much like
a social intranet for businesses, allowing colleagues across divisions, offices and countries to
collaborate in real time, in context, from anywhere. Finally, Work.com is an internal sales performance
management platform; it is designed to motivate the sales team and drive performance.
What is unique about it? The intuitive, easy-to-use and well-designed aspects of the Salesforce platform
mean there are far more options from a development point of view. The Sales Cloud also offers features
that are unique in the CRM sector. What is also unique about Salesforce is that implementation is fast
and can be achieved with a relatively small team: typically a medium-sized implementation could take
around a year and require only half a dozen consultants. Candidates with experience in implementing
Siebel will be well suited to Salesforce roll-outs, although our clients that request Siebel roll-outs are
usually large and the implementation takes a couple of years. Our clients requesting Salesforce are
sometimes small or medium sized and looking for a shorter-term contract. This offers variety, but not
necessarily long-term stability, for candidates.
Where is it used? Salesforce is not ring-fenced into any particular industry and is common across
verticals such as Communications (Telefonica), Financial Services (ING), Government, Healthcare &
Life Sciences, Hi-Tech (Dell & Cisco), Manufacturing (Toyota) and Retail (Burberry).
How do I acquire the skill, and is it difficult to master? Salesforce offers a range of training programmes
at different prices; for example, the Administration Essentials for New Admins virtual class costs £3,125
for 25 hours. There are also free online resources and a range of books, but you will need to be
motivated to teach yourself and to invest the time (and sometimes money) to become certified and/or at
least be able to demonstrate your expertise in the Salesforce platform. We also work closely with
training providers and often put candidates in touch with consultancies to improve their skill set.
A knowledge of development/architecture is needed for the more advanced Salesforce courses, and the
best approach for career development is to specialise.
The return on investment on your Salesforce training could be very attractive, as the cost of the training
courses is relatively low, yet the day rates you can command are high.
Pay and prospects? Salaries are up 10% on last year, averaging around £46,000 per annum (though do
note that salaries are sensitive to location). Application support staff can expect £30,000 with up to two
years' experience, rising potentially to £50,000 for those with five years. Database developers would look
at similar levels, with potentially a few thousand extra for those more experienced. Business analysts
take a bit more home: £34,000 at the start of their career, rising to £58,000 with five years' experience.
Development managers could potentially earn around £83,000 with five years of experience. On the
contract side, typical daily rates for Salesforce developers (technical resources) with two to three years'
experience would be upwards of £330 a day. Salesforce BAs/architects can earn upwards of £410 a day
with two to three years' experience and £450 or more with five-plus years.
What's next/where does it lead? With so many acquisitions and a constantly changing IT industry,
people with Salesforce skills will have to evolve with the company and platform to stay at the forefront of
the market.
[See also Key Skills - Working with Workday]
John Kittle is Contracts Director at Eurostaff Technology, a specialist recruiter for the technology sector.
Selenium (software)
From Wikipedia, the free encyclopedia
Selenium
Repository: github.com/SeleniumHQ/selenium
Written in: Java
Website: www.seleniumhq.org
History[edit]
Selenium was originally developed by Jason Huggins in 2004 as an internal tool at ThoughtWorks.
Huggins was later joined by other programmers and testers at ThoughtWorks, before Paul Hammant
joined the team and steered the development of the second mode of operation that would later
become "Selenium Remote Control" (RC). The tool was open sourced that year.
In 2005 Dan Fabulich and Nelson Sproul (with help from Pat Lightbody) made an offer to accept a
series of patches that would transform Selenium-RC into what it became best known for. In the
same meeting, the steering of Selenium as a project would continue as a committee, with Huggins
and Hammant being the ThoughtWorks representatives.
In 2007, Huggins joined Google. Together with others like Jennifer Bevan, he continued with the
development and stabilization of Selenium RC. At the same time, Simon Stewart at ThoughtWorks
developed a superior browser automation tool called WebDriver. In 2009, after a meeting between
the developers at the Google Test Automation Conference, it was decided to merge the two projects,
and call the new project Selenium WebDriver, or Selenium 2.0.[1]
In 2008, Philippe Hanrigou (then at ThoughtWorks) made "Selenium Grid", which provides a hub
allowing the running of multiple Selenium tests concurrently on any number of local or remote
systems, thus minimizing test execution time. Grid offered, as open source, a similar capability to the
internal/private Google cloud for Selenium RC. Pat Lightbody had already made a private cloud for
"HostedQA" which he went on to sell to Gomez, Inc.
The name Selenium comes from a joke made by Huggins in an email, mocking a competitor
named Mercury, saying that you can cure mercury poisoning by taking selenium supplements. The
others that received the email took the name and ran with it.[2]
Components[edit]
Selenium is composed of several components, each taking on a specific role in aiding the
development of web application test automation.
Selenium IDE[edit]
Selenium IDE is a complete integrated development environment (IDE) for Selenium tests. It is
implemented as a Firefox Add-On, and allows recording, editing, and debugging tests. It was
previously known as Selenium Recorder. Selenium-IDE was originally created by Shinya Kasatani
and donated to the Selenium project in 2006. It is little-maintained and is compatible with Selenium
RC, which was deprecated.[3]
Scripts may be recorded automatically and edited manually, with autocompletion support and
the ability to move commands around quickly. Scripts are recorded in Selenese, a special test
scripting language for Selenium. Selenese provides commands for performing actions in a browser
(click a link, select an option), and for retrieving data from the resulting pages.
Selenium client API[edit]
As an alternative to writing tests in Selenese, tests can also be written in various programming
languages. These tests then communicate with Selenium by calling methods in the Selenium Client
API. Selenium currently provides client APIs for Java, C#, Ruby, JavaScript and Python.
With Selenium 2, a new Client API was introduced (with WebDriver as its central component).
However, the old API (using class Selenium) is still supported.
Selenium Remote Control[edit]
Selenium Remote Control (RC) is a server, written in Java, that accepts commands for the browser
via HTTP. RC makes it possible to write automated tests for a web application in any programming
language, which allows for better integration of Selenium into existing unit test frameworks. To make
writing tests easier, the Selenium project currently provides client drivers
for PHP, Python, Ruby, .NET, Perl and Java. The Java driver can also be used with JavaScript (via
the Rhino engine). An instance of the Selenium RC server is needed to launch HTML test cases, which
means that the port should be different for each parallel run.[citation needed] However, for Java/PHP test
cases only one Selenium RC instance needs to be running continuously.[citation needed]
Selenium Remote Control was a refactoring of Driven Selenium or Selenium B designed by Paul
Hammant, credited with Jason as co-creator of Selenium. The original version directly launched a
process for the browser in question, from the test language of Java, .Net, Python or Ruby. The wire
protocol (called 'Selenese' in its day) was reimplemented in each language port. After the refactor by
Dan Fabulich, and Nelson Sproul (with help from Pat Lightbody) there was an intermediate daemon
process between the driving test script, and the browser. The benefits included the ability to drive
remote browsers, and the reduced need to port every line of code to an increasingly growing set of
languages. Selenium Remote Control completely took over from the Driven Selenium code-line in
2006. The browser pattern for 'Driven'/'B' and 'RC' was response/request, which subsequently
became known as Comet.
With the release of Selenium 2, Selenium RC has been officially deprecated in favor of Selenium
WebDriver.
Selenium WebDriver[edit]
Selenium WebDriver is the successor to Selenium RC. Selenium WebDriver accepts commands
(sent in Selenese, or via a Client API) and sends them to a browser. This is implemented through a
browser-specific browser driver, which sends commands to a browser, and retrieves results. Most
browser drivers actually launch and access a browser application (such
as Firefox, Chrome or Internet Explorer); there is also an HtmlUnit browser driver, which simulates a
browser using HtmlUnit.
Unlike in Selenium 1, where the Selenium server was necessary to run tests, Selenium WebDriver
does not need a special server to execute tests. Instead, the WebDriver directly starts a browser
instance and controls it. However, Selenium Grid can be used with WebDriver to execute tests on
remote systems (see below). Where possible, WebDriver uses native operating system level
functionality rather than browser-based JavaScript commands to drive the browser. This bypasses
problems with subtle differences between native and JavaScript commands, including security
restrictions.[4]
In practice, this means that the Selenium 2.0 API has significantly fewer calls than does the
Selenium 1.0 API. Where Selenium 1.0 attempted to provide a rich interface for many different
browser operations, Selenium 2.0 aims to provide a basic set of building blocks from which
developers can create their own Domain Specific Language. One such DSL already exists:
the Watir project in the Ruby language has a rich history of good design. Watir-webdriver
implements the Watir API as a wrapper for Selenium-Webdriver in Ruby. Watir-webdriver is created
entirely automatically, based on the WebDriver specification and the HTML specification.
As of early 2012, Simon Stewart (inventor of WebDriver), who was then with Google and now with
Facebook, and David Burns of Mozilla were negotiating with the W3C to make WebDriver an internet
standard. In July 2012, the working draft was released. Selenium-Webdriver (Selenium 2.0) is fully
implemented and supported in Python, Ruby, Java, and C#.
Selenium Grid[edit]
Selenium Grid is a server that allows tests to use web browser instances running on remote
machines. With Selenium Grid, one server acts as the hub. Tests contact the hub to obtain access to
browser instances. The hub has a list of servers that provide access to browser instances
(WebDriver nodes), and lets tests use these instances. Selenium Grid allows running tests in parallel
on multiple machines, and to manage different browser versions and browser configurations
centrally (instead of in each individual test).
The ability to run tests on remote browser instances is useful to spread the load of testing across
several machines, and to run tests in browsers running on different platforms or operating systems.
The latter is particularly useful in cases where not all browsers to be used for testing can run on the
same platform.
See also[edit]
Acceptance testing
Capybara (software)
Given-When-Then
List of web testing tools
MediaWiki Selenium extension
MediaWiki Selenium Framework extension
Regression testing
Robot Framework
Sikuli
Apache Spark
From Wikipedia, the free encyclopedia
Apache Spark
Original author(s): Matei Zaharia
Repository: github.com/apache/spark
Website: spark.apache.org
Overview[edit]
Apache Spark provides programmers with an application programming interface centered on a data
structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed
over a cluster of machines, that is maintained in a fault-tolerant way.[2] It was developed in response
to limitations in the MapReduce cluster computing paradigm, which forces a particular
linear dataflow structure on distributed programs: MapReduce programs read input data from
disk, map a function across the data, reduce the results of the map, and store reduction results on
disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately)
restricted form of distributed shared memory.[3]
The availability of RDDs facilitates the implementation of both iterative algorithms, that visit their
dataset multiple times in a loop, and interactive/exploratory data analysis, i.e., the
repeated database-style querying of data. The latency of such applications (compared to a
MapReduce implementation, as was common in Apache Hadoop stacks) may be reduced by several
orders of magnitude.[2][4] Among the class of iterative algorithms are the training algorithms
for machine learning systems, which formed the initial impetus for developing Apache Spark.[5]
Apache Spark requires a cluster manager and a distributed storage system. For cluster
management, Spark supports standalone (native Spark cluster), Hadoop YARN, or Apache
Mesos.[6] For distributed storage, Spark can interface with a wide variety of systems, including Hadoop
Distributed File System (HDFS),[7] MapR File System (MapR-FS),[8] Cassandra,[9] OpenStack
Swift, Amazon S3, and Kudu, or a custom solution can be implemented. Spark also supports a
pseudo-distributed local mode, usually used only for development or testing purposes, where distributed
storage is not required and the local file system can be used instead; in such a scenario, Spark is
run on a single machine with one executor per CPU core.
Spark Core[edit]
Spark Core is the foundation of the overall project. It provides distributed task dispatching,
scheduling, and basic I/O functionalities, exposed through an application programming interface
(for Java, Python, Scala, and R) centered on the RDD abstraction (the Java API is available for other
JVM languages, but is also usable for some other non-JVM languages, such as Julia,[10] that can
connect to the JVM). This interface mirrors a functional/higher-order model of programming: a
"driver" program invokes parallel operations such as map, filter or reduce on an RDD by passing a
function to Spark, which then schedules the function's execution in parallel on the cluster.[2] These
operations, and additional ones such as joins, take RDDs as input and produce new RDDs. RDDs
are immutable and their operations are lazy; fault-tolerance is achieved by keeping track of the
"lineage" of each RDD (the sequence of operations that produced it) so that it can be reconstructed
in the case of data loss. RDDs can contain any type of Python, Java, or Scala objects.
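The two properties just described, lazy operations and lineage-based recovery, can be illustrated with a plain-Python stand-in (this is a conceptual sketch, not Spark's implementation; the class and method names are invented).

```python
class ToyRDD:
    """Toy stand-in for an RDD: immutable data plus a recorded lineage."""
    def __init__(self, data, lineage=()):
        self.data, self.lineage = data, lineage   # lineage: ops to replay

    def map(self, f):                             # lazy: records the op only
        return ToyRDD(self.data, self.lineage + (("map", f),))

    def filter(self, p):
        return ToyRDD(self.data, self.lineage + (("filter", p),))

    def collect(self):                            # only now is work done
        out = self.data
        for op, f in self.lineage:
            out = [f(x) for x in out] if op == "map" else [x for x in out if f(x)]
        return out

rdd = ToyRDD([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)
print(rdd.collect())  # [20, 30, 40]
```

Calling `map` or `filter` merely extends the recorded lineage; `collect` replays it, and the same replay is what would rebuild a partition lost to failure.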
Aside from the RDD-oriented functional style of programming, Spark provides two restricted forms of
shared variables: broadcast variables reference read-only data that needs to be available on all
nodes, while accumulators can be used to program reductions in an imperative style.[2]
A typical example of RDD-centric functional programming is the following Scala program that
computes the frequencies of all words occurring in a set of text files and prints the most common
ones. Each map, flatMap (a variant of map) and reduceByKey takes an anonymous function that
performs a simple operation on a single data item (or a pair of items), and applies its argument to
transform an RDD into a new RDD.
val conf = new SparkConf().setAppName("wiki_test")   // create a Spark config object
val sc = new SparkContext(conf)                      // create a Spark context
val data = sc.textFile("/path/to/somedir")           // read files from "somedir" into an RDD of lines
val tokens = data.flatMap(_.split(" "))              // split each line into a list of tokens (words)
val wordFreq = tokens.map((_, 1)).reduceByKey(_ + _) // add a count of one to each token, then sum the counts per word type
wordFreq.sortBy(s => -s._2).map(x => (x._2, x._1)).top(10) // get the top 10 words; swap word and count to sort by count
Spark SQL[edit]
Spark SQL is a component on top of Spark Core that introduced a data abstraction called
DataFrames,[a] which provides support for structured and semi-structured data. Spark SQL provides
a domain-specific language (DSL) to manipulate DataFrames in Scala, Java, or Python. It also
provides SQL language support, with command-line interfaces and ODBC/JDBC server. Although
DataFrames lack the compile-time type-checking afforded by RDDs, as of Spark 2.0, the strongly
typed DataSet is fully supported by Spark SQL as well.
import org.apache.spark.sql.SQLContext

val url = "jdbc:mysql://yourIP:yourPort/test?user=yourUsername;password=yourPassword" // URL for your database server
val sqlContext = new org.apache.spark.sql.SQLContext(sc) // create a SQL context object
val df = sqlContext
  .read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "people")
  .load()
Spark Streaming[edit]
Spark Streaming leverages Spark Core's fast scheduling capability to perform streaming analytics. It
ingests data in mini-batches and performs RDD transformations on those mini-batches of data. This
design enables the same set of application code written for batch analytics to be used in streaming
analytics, thus facilitating easy implementation of lambda architecture.[11][12] However, this
convenience comes with the penalty of latency equal to the mini-batch duration. Other streaming
data engines that process event by event rather than in mini-batches include Storm and the
streaming component of Flink.[13] Spark Streaming has support built-in to consume
from Kafka, Flume, Twitter, ZeroMQ, Kinesis, and TCP/IP sockets.[14]
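The mini-batch idea, applying one batch function repeatedly to small slices of the stream, can be sketched in plain Python (this is a conceptual illustration, not the Spark Streaming API; the word-count function and batch size are invented).

```python
def batch_word_count(lines):
    # the same function that a batch job would run over a full dataset
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

stream = ["a b a", "b c", "a a", "c c b"]   # incoming records
batch_size = 2                              # stand-in for the mini-batch duration
results = [batch_word_count(stream[i:i + batch_size])
           for i in range(0, len(stream), batch_size)]
```

Because the batch function is unchanged, code written for batch analytics is reused directly on the stream; the cost is that no result appears until its mini-batch is complete.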
MLlib Machine Learning Library[edit]
Spark MLlib is a distributed machine learning framework on top of Spark Core that, due in large part
to the distributed memory-based Spark architecture, is as much as nine times as fast as the disk-
based implementation used by Apache Mahout (according to benchmarks done by the MLlib
developers against the Alternating Least Squares (ALS) implementations, and before Mahout itself
gained a Spark interface), and scales better than Vowpal Wabbit.[15] Many common machine learning
and statistical algorithms have been implemented and are shipped with MLlib, which simplifies large-
scale machine learning pipelines.
History[edit]
Spark was initially started by Matei Zaharia at UC Berkeley's AMPLab in 2009, and open sourced in
2010 under a BSD license.
In 2013, the project was donated to the Apache Software Foundation and switched its license
to Apache 2.0. In February 2014, Spark became a Top-Level Apache Project.[22]
In November 2014, Spark founder M. Zaharia's company Databricks set a new world record in large
scale sorting using Spark.[23][third-party source needed]
Spark had in excess of 1000 contributors in 2015,[24] making it one of the most active projects in the
Apache Software Foundation[25] and one of the most active open source big data projects.[26]
Given the popularity of the platform by 2014, paid programs like General Assembly and free
fellowships like The Data Incubator had started offering customized training courses.[27]
Tableau Software
From Wikipedia, the free encyclopedia
Tableau Software
Type: Public
Industry: Software
Website: tableau.com
Public company[edit]
On May 17, 2013, Tableau launched an initial public offering (IPO) on the New York Stock
Exchange,[14] raising more than $250 million USD.[15] Prior to its IPO, Tableau raised over $45 million
in venture capital investment from investors such as NEA and Meritech.[15]
The company's 2013 revenue reached $232.44 million, an 82% growth over 2012's $128 million.[16] In
2010, Tableau reported revenue of $34.2 million. That figure grew to $62.4 million in 2011 and
$127.7 million in 2012. Profit during the same periods came to $2.7 million, $3.4 million, and $1.6
million, respectively.[17] The founders moved the company to Seattle, Washington, in October 2003,
where it remains headquartered today.[18] In August 2016, Tableau announced the appointment of
Adam Selipsky as president and CEO, effective September 16, 2016, replacing co-founder Christian
Chabot as CEO.[19]
Wikileaks and policy changes[edit]
On December 2, 2010, Tableau withdrew its visualizations from the contents of the United States
diplomatic cables leak by WikiLeaks, with Tableau stating that it was directly due to political pressure
from US Senator Joe Lieberman.[20][21]
On February 21, 2011, Tableau posted an updated data policy.[22] The accompanying blog post cited
the two main changes as (1) creating a formal complaint process and (2) using freedom of speech
as a guiding principle.[23] In addition, the post announced the creation of an advisory board to help the
company navigate future situations that "push the boundaries of the policy."[23] Tableau likened the
new policy to the model set forth in the Digital Millennium Copyright Act, and opined that under the
new policy, the Wikileaks cables would not have been removed.[24]
Awards[edit]
Tableau Software has won awards including "Best Overall in Data Visualization" by DM Review,
"Best of 2005 for Data Analysis" by PC Magazine,[25] and "2008 Best Business Intelligence Solution
(CODiE award)" by the Software & Information Industry Association.[26]
References[edit]