Step 1: Scope the Project and Gather Data

Scope
The purpose of this project is to provide a deep dive into US immigration, focusing primarily on the types of visas being issued and the profiles associated with them. The scope of this project is limited to the data sources listed below, with data aggregated across several features such as visatype, gender, port_of_entry, nationality and month.

Data Description & Sources

- I94 Immigration Data: This data comes from the US National Tourism and Trade Office, found here. Each report contains international visitor arrival statistics by world regions and select countries (including the top 20), type of visa, mode of transportation, age groups, states visited (first intended address only), and the top ports of entry (for select countries).
- World Temperature Data: This dataset comes from Kaggle, found here.
- U.S. City Demographic Data: This dataset contains information about the demographics of all US cities and census-designated places with a population greater than or equal to 65,000. It comes from OpenSoft, found here.
- Airport Code Table: This is a simple table of airport codes and their corresponding cities. An airport code may refer to either the IATA airport code, a three-letter code used in passenger reservation, ticketing and baggage-handling systems, or the ICAO airport code, a four-letter code used by ATC systems and for airports that do not have an IATA code (from Wikipedia). It comes from here.

Step 2: Preprocessing Data

Note: preprocessing was performed prior to storing the CSV files in the S3 buckets, e.g. converting/expanding columns, capitalizing/lowercasing text, etc.

Explore Data

- Identify missing values
- Identify duplicate values (both checks are sketched below)
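
Both checks are quick to run in pandas. The sketch below assumes the I94 extract has already been pulled down locally as a CSV; the file name is a placeholder, not the project's actual path.

```python
import pandas as pd

# Hypothetical local extract; the real files live in the S3 immigration bucket.
df = pd.read_csv("immigration_sample.csv")

# Missing values: percentage of nulls per column.
missing_pct = df.isnull().mean() * 100
print(missing_pct[missing_pct > 0].sort_values(ascending=False))

# Duplicate values: count of fully duplicated rows.
print(f"Duplicate rows: {df.duplicated().sum()}")
```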

Cleaning Steps

- Either drop rows or fill missing data with median values, where appropriate
- Expand coordinates into Latitude & Longitude columns
- Expand locations into City & State columns, e.g. the data provided for port_of_entry_codes was originally code and location; these have since been expanded into city and state_or_country, as shown in the sketch below:
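
The following is a minimal pandas sketch of these cleaning steps. The file names and column names (avg_temp, coordinates, location) are assumptions made for illustration; the actual layouts come from the source files listed in Step 1.

```python
import pandas as pd

# Hypothetical file and column names, used purely for illustration.
temps = pd.read_csv("temperatures.csv")          # assumed columns: avg_temp, coordinates ("lat, long")
ports = pd.read_csv("port_of_entry_codes.csv")   # assumed columns: code, location ("City, State")

# Fill numeric gaps with the column median (or drop the rows where filling makes no sense).
temps["avg_temp"] = temps["avg_temp"].fillna(temps["avg_temp"].median())

# Expand a "lat, long" string into separate latitude and longitude columns.
temps[["latitude", "longitude"]] = temps["coordinates"].str.split(",", expand=True)

# Expand "City, State" locations into city and state_or_country columns.
ports[["city", "state_or_country"]] = ports["location"].str.split(",", n=1, expand=True)
ports["city"] = ports["city"].str.strip().str.title()
ports["state_or_country"] = ports["state_or_country"].str.strip().str.upper()
```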

Step 3: Data Model

Step 4: Run Pipelines to Model the Data


4.1 Create the data model
Creating the data model involves various steps, which are made significantly easier through the use of Airflow. The process of extracting files from the S3 buckets, transforming the data, and writing CSV and Parquet files to Redshift is accomplished through the tasks highlighted below in the ETL DAG graph. These steps include (a minimal DAG sketch follows the list):

- Extracting data from the SAS documents and writing it as CSV files to the S3 immigration bucket
- Extracting the remaining CSV and Parquet files from the S3 immigration bucket
- Writing the CSV and Parquet files from S3 to Redshift
- Performing data quality checks on the newly created tables
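
The sketch below shows roughly how the DAG wiring might look. The DAG id, schedule, and task callables are placeholders standing in for the project's actual custom operators; only the task ordering mirrors the steps above.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


# Placeholder callables standing in for the project's custom ETL logic.
def extract_sas_to_csv(**context):
    """Read the SAS documents and write them as CSV files to the S3 immigration bucket."""

def load_s3_to_redshift(**context):
    """Copy the CSV and Parquet files from S3 into Redshift tables."""

def run_quality_checks(**context):
    """Run row-count and integrity checks on the newly created tables."""


with DAG(
    dag_id="immigration_etl",          # hypothetical DAG id
    start_date=datetime(2016, 1, 1),   # hypothetical start date
    schedule_interval="@monthly",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_sas_to_csv", python_callable=extract_sas_to_csv)
    load = PythonOperator(task_id="load_s3_to_redshift", python_callable=load_s3_to_redshift)
    checks = PythonOperator(task_id="run_quality_checks", python_callable=run_quality_checks)

    extract >> load >> checks
```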

4.2 Data Quality Checks
Data quality checks include:

- Integrity constraints on the relational database (e.g. unique key, data type, etc.)
- Unit tests for the scripts to ensure they are doing the right thing
- Source/count checks to ensure completeness (a minimal count check is sketched below)
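
As an example, a source/count check can be expressed as a small helper that queries each target table and fails the run when a table comes back empty. The sketch below assumes Airflow's PostgresHook (Redshift speaks the Postgres protocol); the connection id and table names are placeholders, not the project's actual identifiers.

```python
from airflow.providers.postgres.hooks.postgres import PostgresHook


def check_table_not_empty(table, redshift_conn_id="redshift"):
    """Source/count check: fail loudly if a target table loaded zero rows."""
    hook = PostgresHook(postgres_conn_id=redshift_conn_id)
    records = hook.get_records(f"SELECT COUNT(*) FROM {table}")
    if not records or not records[0] or records[0][0] < 1:
        raise ValueError(f"Data quality check failed: {table} contains no rows")
    return records[0][0]


# Hypothetical table names; the real list comes from the data model in Step 3.
for table in ["immigration", "demographics", "airport_codes", "temperature"]:
    check_table_not_empty(table)
```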
