
DEVELOPING INTIMACY WITH YOUR DATA

This exercise involves you working with a dataset of your choosing. Visit the Kaggle website, browse
through the options and find a dataset of interest, then follow the simple instructions to download it.
With acquisition completed, work through the remaining key steps of examining, transforming and
exploring your data to develop a robust familiarisation with its potential offering:

Examination: Thoroughly examine the physical properties (type, size, condition) of your dataset, noting
down useful observations or descriptions where relevant.

Transformation: What could you do/would you need to do to clean or modify the existing data to create
new values to work with? What other data could you imagine would be valuable to consolidate the
existing data?

Exploration: Use a tool of your choice (such as Excel, Tableau, R) to visually explore the dataset in
order to deepen your appreciation of the physical properties and their discoverable qualities (insights)
to help you cement your understanding of their respective value. If you don’t have scope or time to use
a tool, use your imagination to consider what angles of analysis you might explore if you had the
opportunity. What piques your interest about this subject?

(You can, of course, repeat this exercise on any subject and any dataset of your choice, not just those on
Kaggle.)

Assignment Link: http://book.visualisingdata.com/chapter/chapter-4


My data of choice is:

New York City Airbnb Open Data

Airbnb listings and metrics in NYC, NY, USA (2019)


Link: https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data

Examination

Size - 2MB
Type - CSV file
Description - Summary information and metrics for Airbnb listings in New York City. It is well suited to
exploration, visualization and prediction.
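
A minimal sketch of how the examination step could be carried out in Python, using only the standard library. The sample rows are made up for illustration, and the column names (name, latitude, longitude, price, number_of_reviews) are assumptions about the file; they should be checked against the real header before use.

```python
import csv
import io

# Made-up sample standing in for the downloaded CSV.
# Column names are assumptions, not taken from the real file.
SAMPLE_CSV = """name,latitude,longitude,price,number_of_reviews
Cozy room near park,40.64749,-73.97237,149,9
,40.75362,-73.98377,225,45
Sunny loft,40.80902,-73.9419,150,
"""

def examine(csv_text):
    """Return the row count, column names and number of empty cells per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = {col: 0 for col in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for col in columns:
            # Treat empty or whitespace-only cells as missing values.
            if not (row[col] or "").strip():
                missing[col] += 1
    return {"rows": row_count, "columns": columns, "missing": missing}

report = examine(SAMPLE_CSV)
print(report)
```

For the real file, the text passed to examine would come from reading the downloaded CSV instead of the sample string.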

Transformation:
I would clean the data by shortening the “name” field to a clear, precise name rather than the whole
paragraph it currently contains. To make the data easier to work with, a single short place name per
listing would be appropriate; since the latitude and longitude are given, the shortened names would not
be confused with one another.
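
One possible way to shorten the “name” field, sketched in Python. The helper short_name and its max_words parameter are hypothetical, and keeping the first few words is only one assumption about what a clear and precise name could mean.

```python
def short_name(name, max_words=4):
    """Collapse whitespace and keep only the first max_words words of a name."""
    words = (name or "").split()
    if not words:
        # Placeholder for listings with no name at all (an assumption).
        return "Unnamed listing"
    shortened = " ".join(words[:max_words])
    # Drop punctuation left dangling at the cut point.
    return shortened.rstrip(",.;:!-")

print(short_name("Clean & quiet apt home by the park"))
print(short_name("  Skylit   Midtown Castle, with views  ", max_words=3))
```

Since the shortened names may collide, the latitude and longitude columns would still be needed to tell listings apart.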

Exploration:
Because the data set is large (more than 48,000 records), I have not carried out a full analysis at
this point; a clear and accurate analysis needs a lot of time. My main interest would be to
analyze the data by location, name, price and the reviews given by clients who have visited the
premises. This would be very useful for any customer planning to travel to a place,
because it gives accurate information on the price and location of a place and shows
how previous clients experienced it through their reviews.
Here is also a summary of what I found interesting in the data set.
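
The kind of analysis by location, price and reviews described above could be sketched as follows. The rows are made-up examples, and the keys neighbourhood_group, price and number_of_reviews are assumed column names rather than confirmed ones.

```python
from collections import defaultdict
from statistics import mean

# Made-up example rows; the keys are assumed column names.
listings = [
    {"neighbourhood_group": "Manhattan", "price": 200, "number_of_reviews": 10},
    {"neighbourhood_group": "Manhattan", "price": 100, "number_of_reviews": 30},
    {"neighbourhood_group": "Brooklyn", "price": 90, "number_of_reviews": 5},
]

def summarise_by_location(rows, location_key="neighbourhood_group"):
    """Return listing count, mean price and total reviews for each location."""
    prices = defaultdict(list)
    reviews = defaultdict(int)
    for row in rows:
        location = row[location_key]
        prices[location].append(row["price"])
        reviews[location] += row["number_of_reviews"]
    return {
        location: {
            "listings": len(values),
            "mean_price": mean(values),
            "total_reviews": reviews[location],
        }
        for location, values in prices.items()
    }

summary = summarise_by_location(listings)
print(summary)
```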

For the most visited groups, Manhattan and Brooklyn, the analysis of the most
visited room is below.
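
A comparison like this could be reproduced with a short Python sketch. It counts listings per room type, which is only a rough proxy for "most visited", and the keys neighbourhood_group and room_type as well as the sample rows are assumptions made for illustration.

```python
from collections import Counter

# Made-up example rows; the keys are assumed column names.
rows = [
    {"neighbourhood_group": "Manhattan", "room_type": "Entire home/apt"},
    {"neighbourhood_group": "Manhattan", "room_type": "Entire home/apt"},
    {"neighbourhood_group": "Manhattan", "room_type": "Private room"},
    {"neighbourhood_group": "Brooklyn", "room_type": "Private room"},
    {"neighbourhood_group": "Queens", "room_type": "Shared room"},
]

def room_type_counts(rows, groups=("Manhattan", "Brooklyn")):
    """Count listings per room type, restricted to the given groups."""
    counts = {group: Counter() for group in groups}
    for row in rows:
        group = row["neighbourhood_group"]
        if group in counts:
            counts[group][row["room_type"]] += 1
    return counts

counts = room_type_counts(rows)
for group, counter in counts.items():
    print(group, counter.most_common(1))
```

Weighting each listing by its number of reviews instead of counting it once would be another reasonable choice if reviews are taken as a sign of visits.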

Other visited groups also
