
29/08/2019 Data Science Terminology Flashcards | Quizlet

Data Science Terminology



Created by obscrivn (TEACHER)

Terms in this set (88)

Data Analyst: Collects data, visualizes data with various tools and looks for
patterns and insights. Knows basic statistics and has business/domain knowledge.

Data Engineer: Develops and manages infrastructure that deals with big data. A
specialist in Data Wrangling. Well versed with tools such as Hadoop, NoSQL and
MapReduce. Sets up data pipelines.

https://quizlet.com/355671477/data-science-terminology-flash-cards/

Data Wrangling: Transform raw data into a form suitable for analysis. For
example, combining multiple datasets, removing inconsistencies, converting into
a specific format.

Data Cleansing: Raw data with missing values, bad delimiters or inconsistent
records is repaired to syntactic and semantic correctness.
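A minimal sketch of such a repair step, assuming hypothetical "name,age" records in which a stray semicolon stands for a bad delimiter and an empty age stands for a missing value. The `cleanse` helper and the sample lines are illustrative only.

```python
# Hypothetical raw records; the expected format is "name,age".
raw_lines = ["alice;30", "bob,", " carol ,41"]

def cleanse(lines):
    """Repair delimiters, trim names, and mark missing ages as None."""
    records = []
    for line in lines:
        # Syntactic repair: treat a stray ';' as the intended ',' delimiter.
        name, _, age = line.replace(";", ",").partition(",")
        # Semantic repair: an empty age becomes an explicit missing value.
        records.append((name.strip(), int(age) if age.strip() else None))
    return records

clean = cleanse(raw_lines)
```

Real cleansing rules depend on the data source; the point here is only that each repair is explicit and repeatable.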

EDA: Exploratory Data Analysis, a first step in exploring data without
statistical modeling and inference.

Aggregation: the process through which data is searched, gathered and presented.

Algorithm: a mathematical process that can perform a specific analysis or
transformation on a piece of data.

Analytics: the discovery and communication of insights derived from data, or the
use of software-based algorithms and statistics to derive meaning from data.

Analytics Platform: software and/or hardware that provide the tools and
computational power needed to build and perform many different analytical
queries.

Anomaly Detection: the systematic search for data items in a dataset that
deviate from a projected pattern or expected behavior.
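One simple way to make "deviates from expected behavior" concrete is a z-score rule. The sketch below assumes roughly bell-shaped data and a hypothetical cutoff of 2 standard deviations; both the `find_anomalies` helper and the readings are illustrative, and other detection methods exist.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one obvious deviation.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
anomalies = find_anomalies(readings)
```

The threshold is a judgment call: a lower value flags more items, a higher one fewer.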

Artificial Intelligence (A.I.): the field of computer science related to the
development of machines and software that are capable of perceiving their
environment and taking appropriate action when required (in real-time), even
learning from those actions.

Behavioral Analytics: investigates patterns of human behavior in the data.

Big Data: data sets with sizes beyond the ability of commonly used software
tools to capture, curate, manage and process.

Business Intelligence: the theories, methodologies and processes used to make
data understandable and more actionable.

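Byte (B): a sequence of bits, today almost always eight, that can represent a
character. The name is sometimes explained as short for "binary term", but it
was coined as a deliberate respelling of "bite" and is not an acronym.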

Central Processing Unit (CPU): the brains of an information processing system;
the processing component that controls the interpretation and execution of
instructions in a computer.


Classification Analysis: a systematic process for obtaining important and
relevant information about data using classification algorithms.

Cloud: a broad term that refers to any Internet-based application or service
that is hosted remotely.

Cloud Computing: a computing system whose processing is distributed over a
network that uses server farms to store data in a distant location (see also,
data centers).

Clustering Analysis: the process of identifying objects that are similar to each
other and grouping them in order to understand the differences and the
similarities within the data.

Comparative Analysis: a step-by-step procedure of comparisons and calculations
used to detect patterns within very large data sets.

Correlation Analysis: a statistical technique for determining a relationship
between variables and whether that relationship is negative or positive.
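As an illustration, the Pearson correlation coefficient is one common measure of such a relationship: it ranges from -1 (perfectly negative) to +1 (perfectly positive). The sketch below computes it from first principles; the `pearson_r` helper and the sample data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant sequence makes sxx or syy zero and r undefined.
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]
score = [2, 4, 6, 8, 10]    # rises with hours
errors = [10, 8, 6, 4, 2]   # falls with hours

r_positive = pearson_r(hours, score)
r_negative = pearson_r(hours, errors)
```

The sign of the result gives the direction of the relationship and the magnitude gives its linear strength.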

Customer Relationship Management (CRM): managing a company's interactions with
its customers, including sales and related business processes.


Dashboard: a graphical representation of the analyses performed by algorithms,
usually in the form of plots and gauges.

Data: a quantitative or qualitative value.

Data Access: the act or method of viewing or retrieving stored data.

Data Aggregation Tools: methods for transforming scattered data from numerous
sources into a new, single source.

Data Analytics: the application of software to derive information or meaning
from data. The end result might be a report, an indication of status or an
action taken automatically based on the information received.

Data Analyst: someone who analyzes, models, cleanses, and/or processes data.

Database: a digital collection of data and the structure in which the data is
organized (structured).

Database Management System (DBMS): collecting, storing and providing access to
data through integrated software that is practical to use even by
non-specialists.


Data Cleansing: the process of reviewing and revising data in order to delete
duplicates, correct errors and provide consistency.

Data Mining: the process of finding certain patterns or information from data
sets in an automated way. This is one popular way to perform data exploration.

Data Modeling: development of a graphic representation defining the structure
of data. It is used either to communicate the data needed for business
processes between functional and technical people, or to communicate among
application development team members a plan for how data is stored and
accessed.

Data Science: a recent term that has multiple definitions but is generally
accepted as a discipline that incorporates statistics, data visualization,
computer programming, data mining, machine learning and database engineering to
solve complex problems.

Discriminant Analysis: a statistical analysis that takes advantage of known
groups or clusters in data to derive the classification rule. It involves
cataloguing the data as well as distributing it into groups, classes or
categories.

Event Analytics: a process that shows the series of steps that led to an action.

Exploratory Analysis: finding patterns within data without standard procedures
or methods. It is a means of discovering the data and finding the data set's
main characteristics, and it constitutes an important part of the data science
process.

Extract, Transform and Load (ETL): a process for populating data in a database
and data warehouse.
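The three ETL stages can be sketched in a few lines. The example below assumes a hypothetical CSV source with Fahrenheit temperatures and uses an in-memory SQLite database as a stand-in for the target database; the table and column names are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read a file or an API.
SOURCE_CSV = "city,temp_f\nOslo,41\nCairo,95\n"

def run_etl(source_text, conn):
    """Extract rows from CSV text, convert units, load into SQLite."""
    # Extract: parse rows out of the raw CSV text.
    rows = list(csv.DictReader(io.StringIO(source_text)))
    # Transform: convert Fahrenheit strings to Celsius numbers.
    records = [
        (row["city"], round((float(row["temp_f"]) - 32) * 5 / 9, 1))
        for row in rows
    ]
    # Load: write the transformed records into the target table.
    conn.execute("CREATE TABLE IF NOT EXISTS weather (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO weather VALUES (?, ?)", records)
    conn.commit()
    return len(records)

conn = sqlite3.connect(":memory:")
loaded = run_etl(SOURCE_CSV, conn)
stored = conn.execute(
    "SELECT city, temp_c FROM weather ORDER BY city"
).fetchall()
```

Keeping the three stages as separate steps makes it easier to swap the source or the target without touching the transformation logic.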

Hypertext: a technology that links text in one part of a document with related
text in another part of the document or in other documents. A user can quickly
find the related text by clicking on the appropriate keyword, key phrase, icon
or button.

Hypertext Transfer Protocol (HTTP): the protocol used on the World Wide Web that
permits Web clients (Web browsers) to communicate with Web servers. Documents
connected by hyperlinks, which authors embed using Hypertext Markup Language
(HTML), are retrieved over this protocol.

Internet of Things (IoT): ordinary devices that are connected to the Internet at
any time and anywhere via sensors. IoT is expected to contribute substantially
to the growth of big data.

Machine Learning (ML): the field of computer science related to the development
and use of algorithms to enable machines to learn from what they are doing and
become better over time.

Natural Language Processing (NLP): a field of computer science involved with
interactions between computers and human languages.

Online Analytical Processing (OLAP): the process of analyzing multidimensional
data using three operations: consolidation (the aggregation of available data),
drill-down (the ability for users to see the underlying details) and slice and
dice (the ability for users to select subsets and view them from different
perspectives).
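The three operations can be illustrated on a tiny in-memory fact table. This is a sketch only: the `sales` rows and the `consolidate` and `slice_rows` helpers are hypothetical, and real OLAP engines work on pre-built cubes rather than Python lists.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions and one measure (units).
sales = [
    {"year": 2018, "region": "EU", "product": "A", "units": 10},
    {"year": 2018, "region": "US", "product": "A", "units": 7},
    {"year": 2019, "region": "EU", "product": "B", "units": 4},
    {"year": 2019, "region": "EU", "product": "A", "units": 6},
]

def consolidate(rows, *dims):
    """Aggregate units over the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["units"]
    return dict(totals)

def slice_rows(rows, **fixed):
    """Select the subset of rows matching fixed dimension values."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]

# Consolidation: total units per year.
by_year = consolidate(sales, "year")
# Drill-down: the same totals broken out by region.
by_year_region = consolidate(sales, "year", "region")
# Slice and dice: fix region to EU, then view that subset by product.
eu_by_product = consolidate(slice_rows(sales, region="EU"), "product")
```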

Ontology: ontology represents knowledge as a set of concepts within a domain and
the relationships between those concepts. Very useful when designing a database.

Outlier: an object that deviates significantly from the general average within a
dataset or a combination of data. It is numerically distant from the rest of the
data and therefore indicates that something is going on that requires additional
analysis.

Parallel Data Analysis: breaking up an analytical problem into smaller
components and running algorithms on each of those components at the same time.
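A minimal sketch of the split-then-combine pattern, using a sum of squares as a stand-in for the analytical problem. The helper names and chunk size are hypothetical. A thread pool is used here only to keep the example short; in standard CPython builds, CPU-bound work would usually need `ProcessPoolExecutor` to run truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def sum_of_squares(chunk):
    """The per-component analysis: runs independently on each chunk."""
    return sum(v * v for v in chunk)

def parallel_sum_of_squares(values, chunk_size=3, workers=4):
    chunks = chunked(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is analyzed concurrently; results keep input order.
        partials = list(pool.map(sum_of_squares, chunks))
    # Combine the partial results into the final answer.
    return sum(partials)

total = parallel_sum_of_squares(list(range(1, 11)))
```

The approach works because the partial results can be combined without any chunk needing to see another chunk's data.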


Predictive Analysis (Predictive Analytics): often described as one of the most
valuable analyses within big data, as it helps predict what someone is likely to
buy, visit or do as well as how someone will behave in the (near) future. It
uses a variety of different data sets such as historical, transactional, social,
or customer profile data to identify risks and opportunities.

Predictive Modeling: the process of developing a model to predict a trend or
outcome.

R: an open-source programming language and software environment for statistical
computing and graphics. The R language is widely used among statisticians and
data miners for developing statistical software and data analysis. R's
popularity has increased substantially in recent years.

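Regression Analysis: a statistical technique for modeling how a response
variable depends on one or more predictor variables, classically with a
continuous response. The dependence is modeled as one-way, from predictors to
response, but a fitted regression does not by itself establish causation.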
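As an illustration, simple linear regression with one predictor fits a line y = intercept + slope * x by ordinary least squares. The `fit_line` helper and the data below are hypothetical, chosen so the fit is exact.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # undefined if all x values are identical
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
prediction = intercept + slope * 5  # predicted response at x = 5
```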

Risk Analysis: the application of statistical methods on one or more datasets to
determine the likely risk of a project, action or decision.

Root-Cause Analysis: the process of determining the main cause of an event or
problem.


Semi-Structured Data: a form of structured data that does not conform to a
formal structure the way structured data does. It contains tags or other markers
to enforce a hierarchy of records. Usually found in JSON objects.
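A short sketch of what this looks like in practice: the JSON document below is hypothetical, its keys act as the markers that give the records a hierarchy, and one item omits a field that another item has. Treating a missing `qty` as 1 is an assumption made for the example.

```python
import json

# Hypothetical JSON document: nested records, no fixed schema.
doc = """
{"order": {"id": 7,
           "items": [{"sku": "A1", "qty": 2},
                     {"sku": "B9"}]}}
"""

order = json.loads(doc)["order"]
# Fields may be absent, so read them defensively (default qty of 1 assumed).
quantities = [(item["sku"], item.get("qty", 1)) for item in order["items"]]
```

Unlike a table with fixed columns, the reader of the data has to decide how to handle fields that are present in some records and absent in others.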

Signal Analysis: the analysis of measurements of time-varying or spatially
varying physical quantities, for example to analyze the performance of a
product.

Structured Data: data that is identifiable because it is organized in a
structure such as rows and columns. The data resides in fixed fields within a
record or file, or the data is tagged correctly and can be accurately
identified.

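Structured Query Language (SQL): a programming language for managing and
retrieving data in a relational database. Traditional relational SQL engines are
not always directly applicable in the big data domain, although SQL-like
interfaces such as Hive and Spark SQL exist.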

Text Analytics: the application of statistical, linguistic and machine learning
techniques on text-based sources to derive meaning or insight.

Thread: a series of posted messages that represents an ongoing discussion of a
specific topic in a bulletin board system, a newsgroup or a Web site.


Time Series Analysis: the process of analyzing well-defined data obtained
through repeated measurements over time. The data has to be well-defined and
measured at successive points in time spaced at identical time intervals.

Topological Data Analysis: focusing on the shape of complex data and identifying
clusters and any statistical significance that is present within that data.

Unstructured Data: data that is text heavy, in general, but may also contain
dates, numbers and facts.

Variable: A characteristic or quantity of interest that can take on different
values.

Observation: A set of values corresponding to a set of variables.

Variation: Differences in values of a variable over observations.

Random variable: A quantity whose values are not known with certainty.

Population: The set of all elements of interest in a particular study.


Sample: A subset of the population.

Random sampling: The act of collecting a sample that ensures that (1) each
element selected comes from the same population and (2) each element is selected
independently.
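A brief sketch using Python's standard `random` module, with a hypothetical population of 100 numbered elements. Drawing with replacement (`choices`) makes each selection independent in the strict sense; drawing without replacement (`sample`) is the more common practical variant and is close to independent when the population is much larger than the sample.

```python
import random

# Hypothetical population of 100 numbered elements.
population = list(range(1, 101))

# A fixed seed keeps the sketch repeatable; omit it for real sampling.
rng = random.Random(42)

# Without replacement: no element can be picked twice.
sample_without_replacement = rng.sample(population, k=10)

# With replacement: every draw is independent of the others.
sample_with_replacement = rng.choices(population, k=10)
```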

Quantitative data: Data where numerical values are used to indicate magnitude,
such as how many or how much. Arithmetic operations such as addition,
subtraction, and multiplication can be performed on quantitative data.

Categorical data: Data where categories of like items are identified by labels
or names. Arithmetic operations cannot be performed on categorical data.

Frequency distribution: A tabular summary of data showing the number (frequency)
of data values in each of several non-overlapping bins.

Relative frequency distribution: A tabular summary of data showing the fraction
or proportion of data values in each of several non-overlapping bins.

Percent frequency distribution: A tabular summary of data showing the percentage
of data values in each of several non-overlapping bins.
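The three distributions above differ only in how each bin's count is expressed. The sketch below builds all three for hypothetical ages using equal-width bins; the `frequency_tables` helper and the bin width of 10 are assumptions made for the example.

```python
from collections import Counter

def frequency_tables(values, bin_width):
    """Frequency, relative and percent frequency per equal-width bin."""
    # Each value falls in the half-open bin [lower, lower + bin_width).
    counts = Counter((v // bin_width) * bin_width for v in values)
    n = len(values)
    table = []
    for lower in sorted(counts):
        freq = counts[lower]
        table.append({
            "bin": (lower, lower + bin_width),
            "frequency": freq,
            "relative": freq / n,
            "percent": 100 * freq / n,
        })
    return table

# Hypothetical data.
ages = [21, 25, 34, 38, 39, 42, 47, 51, 58, 59]
table = frequency_tables(ages, bin_width=10)
```

Relative frequencies always sum to 1 and percent frequencies to 100, which is a quick sanity check on any such table.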

Skewness: A measure of the lack of symmetry in a distribution.


Cumulative frequency distribution: A tabular summary of quantitative data
showing the number of data values that are less than or equal to the upper class
limit of each bin.
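A cumulative frequency column is a running total of the per-bin frequencies. The sketch below assumes hypothetical bins given as (upper class limit, frequency) pairs.

```python
from itertools import accumulate

# Hypothetical bins as (upper class limit, frequency).
bins = [(29, 2), (39, 3), (49, 2), (59, 3)]

upper_limits = [upper for upper, _ in bins]
# Running total: values less than or equal to each upper class limit.
cumulative = list(accumulate(freq for _, freq in bins))
cumulative_table = list(zip(upper_limits, cumulative))
```

The last cumulative value equals the total number of observations.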

Mean (Arithmetic Mean): A measure of central location computed by summing the
data values and dividing by the number of observations.

Median: A measure of central location provided by the value in the middle when
the data are arranged in ascending order.

Mode: A measure of location, defined as the value that occurs with greatest
frequency.

Geometric Mean: A measure of location that is calculated by finding the nth root
of the product of n values.
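The four measures of location above are available in Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). The data set is hypothetical.

```python
from statistics import geometric_mean, mean, median, mode

# Hypothetical data values.
data = [2, 4, 4, 8]

data_mean = mean(data)        # (2 + 4 + 4 + 8) / 4
data_median = median(data)    # average of the two middle values, 4 and 4
data_mode = mode(data)        # 4 occurs most often
data_gmean = geometric_mean(data)  # 4th root of 2 * 4 * 4 * 8 = 256
```

With an even number of observations there is no single middle value, so the median is taken as the average of the two middle values.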

Range: A measure of variability, defined to be the largest value minus the
smallest value.

Variance: A measure of variability based on the squared deviations of the data
values about the mean.

Standard deviation: A measure of variability computed by taking the positive
square root of the variance.
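A short sketch of these three measures with the standard `statistics` module, on hypothetical data. The definitions above do not say whether the sum of squared deviations is divided by n (population) or by n - 1 (sample), so both versions are shown.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data values with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)

# Population versions: squared deviations divided by n.
pop_variance = pvariance(data)
pop_std = pstdev(data)

# Sample versions: squared deviations divided by n - 1.
sample_variance = variance(data)
sample_std = stdev(data)
```

The sample versions are slightly larger than the population versions, and the gap shrinks as the number of observations grows.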


Percentile: A value such that approximately p percent of the observations have
values less than the pth percentile; hence, approximately (100 - p) percent of
the observations have values greater than the pth percentile. The 50th
percentile is the median.

Quartile: The 25th, 50th, 75th percentiles, referred to as the first quartile,
the second quartile (median), and third quartile, respectively. The quartiles
can be used to divide a data set into four parts, with each part containing
approximately 25 percent of the data.

Interquartile range: The difference between the third and first quartiles.
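Quartiles and the interquartile range can be computed with `statistics.quantiles` (Python 3.8 or later). Several conventions exist for interpolating percentiles, so results can differ slightly between tools; the sketch below uses the module's "inclusive" method on hypothetical data.

```python
from statistics import median, quantiles

# Hypothetical data values, already in ascending order.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1
```

The second quartile matches the median, as the definition of the 50th percentile requires.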

Covariance: A measure of linear association between two variables. Positive
values indicate a positive relationship; negative values indicate a negative
relationship.
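A minimal sketch of the sample covariance, which averages the products of paired deviations from the means and divides by n - 1. The `sample_covariance` helper and the data are hypothetical.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4]
y_up = [2, 4, 6, 8]     # rises with x
y_down = [8, 6, 4, 2]   # falls with x

cov_positive = sample_covariance(x, y_up)
cov_negative = sample_covariance(x, y_down)
```

Only the sign is easy to interpret here; the magnitude depends on the units of both variables.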
