0. COVID19 - Consumption Pattern Change in Itaewon
* There was an administrative order after a large scale of infection of COVID19 that started from a club located in Itaewon.
* The order was about the refrain from operating entertainment facilities (clubs, entertainment bars, etc.)
* Effective period: 2020.05.08 ~ 2020.06.07 (1 month, can be extended)
1. Data Preprocessing
* First of all, I collected the modules needed for the analysis
!sudo apt-get install -y fonts-nanum
!sudo fc-cache -fv
!rm ~/.cache/matplotlib -rf
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
import seaborn as sns
import math
from datetime import date, timedelta
%matplotlib inline
from google.colab import drive
drive.mount('/content/drive')
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
fonts-nanum
0 upgraded, 1 newly installed, 0 to remove and 17 not upgraded.
Need to get 9,604 kB of archives.
After this operation, 29.5 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 fonts-nanum all 20170925-1 [9,604 kB]
Fetched 9,604 kB in 0s (39.6 MB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Selecting previously unselected package fonts-nanum.
(Reading database ... 146425 files and directories currently installed.)
Preparing to unpack .../fonts-nanum_20170925-1_all.deb ...
Unpacking fonts-nanum (20170925-1) ...
Setting up fonts-nanum (20170925-1) ...
Processing triggers for fontconfig (2.12.6-0ubuntu2) ...
/usr/share/fonts: caching, new cache contents: 0 fonts, 1 dirs
/usr/share/fonts/truetype: caching, new cache contents: 0 fonts, 3 dirs
/usr/share/fonts/truetype/humor-sans: caching, new cache contents: 1 fonts, 0 dirs
/usr/share/fonts/truetype/liberation: caching, new cache contents: 16 fonts, 0 dirs
/usr/share/fonts/truetype/nanum: caching, new cache contents: 10 fonts, 0 dirs
/usr/local/share/fonts: caching, new cache contents: 0 fonts, 0 dirs
/root/.local/share/fonts: skipping, no such directory
/root/.fonts: skipping, no such directory
/var/cache/fontconfig: cleaning cache directory
/root/.cache/fontconfig: not cleaning non-existent cache directory
/root/.fontconfig: not cleaning non-existent cache directory
fc-cache: succeeded
Mounted at /content/drive
1) Read the Card data and check the columns in the csv file
card = pd.read_csv('/content/drive/Shared drives/경제정보분석/KT_data_20200717/card_20200717.csv')
card
receipt_dttm | adstrd_code | adstrd_nm | mrhst_induty_cl_code | mrhst_induty_cl_nm | selng_cascnt | salamt | |
---|---|---|---|---|---|---|---|
0 | 20200104 | 1174066000 | 성내3동 | 7041 | 약국 | 463 | 5843230 |
1 | 20200104 | 1174066000 | 성내3동 | 7022 | 치과의원 | 33 | 7835550 |
2 | 20200104 | 1174066000 | 성내3동 | 7021 | 한의원 | 53 | 4589800 |
3 | 20200104 | 1174066000 | 성내3동 | 7020 | 의원 | 339 | 9267240 |
4 | 20200104 | 1174066000 | 성내3동 | 6110 | 자동차정비 | 19 | 4441000 |
... | ... | ... | ... | ... | ... | ... | ... |
3713947 | 20200614 | 1121586000 | 구의2동 | 4004 | 대형할인점 | 136 | 1195640 |
3713948 | 20200614 | 1121584700 | 자양4동 | 9010 | 인테 리어 | 2 | 13100 |
3713949 | 20200614 | 1162052500 | 보라매동 | 6140 | 주 차 장 | 2 | 14500 |
3713950 | 20200614 | 1135056000 | 월계1동 | 9998 | 기타전문점 | 1 | 38000 |
3713951 | 20200614 | 1159060500 | 흑석동 | 9204 | 사무통신기기수리 | 1 | 20000 |
3713952 rows × 7 columns
* Here, we have the receipt date, adress code, town name, industry code, industry category name, and the sales amount
* We need to pull out the data for which the town name corresponds to "Itaewon"
* Then, the remaining columns would be the date, town, category, and the sales amount.
itaewon = card[(card.adstrd_nm == '이태원1동') | (card.adstrd_nm == '이태원2동')]
itaewon_need = itaewon[['receipt_dttm', 'adstrd_nm', 'mrhst_induty_cl_nm', 'salamt']]
itaewon_need.columns = ['date', 'town', 'category', 'sale']
itaewon_need
date | town | category | sale | |
---|---|---|---|---|
7911 | 20200104 | 이태원2동 | 제과점 | 719750 |
7912 | 20200104 | 이태원2동 | 스넥 | 840500 |
7913 | 20200104 | 이태원2동 | 주점 | 1766300 |
7914 | 20200104 | 이태원2동 | 서양음식 | 11295450 |
7915 | 20200104 | 이태원2동 | 중국음식 | 2352500 |
... | ... | ... | ... | ... |
3713099 | 20200614 | 이태원1동 | 주 차 장 | 619200 |
3713107 | 20200614 | 이태원1동 | 악세 사리 | 329200 |
3713516 | 20200614 | 이태원2동 | 사진관 | 35000 |
3713565 | 20200614 | 이태원1동 | 시 계 | 2494750 |
3713800 | 20200614 | 이태원1동 | 기타전기제품 | 8250 |
14420 rows × 4 columns
2) Group the rows by their dates and categories
itaewon_category = itaewon_need.groupby(['date', 'category'])
itaewon_category = itaewon_category.agg({'sale': ['sum']})
itaewon_category = itaewon_category.reset_index()
itaewon_category
date | category | sale | |
---|---|---|---|
sum | |||
0 | 20200104 | 기타숙박업 | 737000 |
1 | 20200104 | 기타음료식품 | 3276100 |
2 | 20200104 | 미 용 원 | 2193000 |
3 | 20200104 | 서양음식 | 83769020 |
4 | 20200104 | 슈퍼 마켓 | 4088100 |
... | ... | ... | ... |
9977 | 20200614 | 피부미용실 | 35000 |
9978 | 20200614 | 한의원 | 125100 |
9979 | 20200614 | 헬스 크럽 | 867000 |
9980 | 20200614 | 화 원 | 246000 |
9981 | 20200614 | 화 장 품 | 2305800 |
9982 rows × 3 columns
2. Comparison of Before & After the Itaewon Infection
itaewon_before = itaewon_category[(itaewon_category.date >= 20200424) & (itaewon_category.date <= 20200508)]
itaewon_after = itaewon_category[(itaewon_category.date >= 20200509) & (itaewon_category.date <= 20200523)]
print(itaewon_before)
print()
print(itaewon_after)
date category sale
sum
6331 20200424 1급 호텔 659500
6332 20200424 건강식품(회원제형태) 30000
6333 20200424 관광여행 43500
6334 20200424 기계 공구 5500
6335 20200424 기타가구 854000
... ... ... ...
7386 20200508 한의원 114900
7387 20200508 헬스 크럽 1328000
7388 20200508 화 랑 426000
7389 20200508 화 원 1681100
7390 20200508 화 장 품 598000
[1060 rows x 3 columns]
date category sale
sum
7391 20200509 1급 호텔 805000
7392 20200509 건강식품(회원제형태) 50000
7393 20200509 관광여행 6900
7394 20200509 기타 교육 315000
7395 20200509 기타건축자재 79500
... ... ... ...
8420 20200523 한의원 450500
8421 20200523 헬스 크럽 146000
8422 20200523 화 원 140000
8423 20200523 화 장 품 528150
8424 20200523 화방표구점 100000
[1034 rows x 3 columns]
1) Before the Administrative Order (2 weeks data)
itaewon_before_sum = itaewon_before.groupby(['category']).sum()
itaewon_before_sum = itaewon_before_sum['sale']
itaewon_before_sum
sum | |
---|---|
category | |
1급 호텔 | 23049955 |
가 방 | 7092500 |
가정용품수리 | 599000 |
건강식품(회원제형태) | 148000 |
골프연습장 | 1100000 |
... | ... |
헬스 크럽 | 15848300 |
화 랑 | 8553000 |
화 원 | 9326970 |
화 장 품 | 22929210 |
화방표구점 | 80000 |
102 rows × 1 columns
2) After the Administrative Order(2 weeks data)
itaewon_after_sum = itaewon_after.groupby(['category']).sum()
itaewon_after_sum = itaewon_after_sum['sale']
itaewon_after_sum
sum | |
---|---|
category | |
1급 호텔 | 5041220 |
가 방 | 4977000 |
가정용품수리 | 340000 |
건강식품(회원제형태) | 132000 |
골프연습장 | 1455000 |
... | ... |
혼수전문점 | 1000000 |
화 랑 | 7050000 |
화 원 | 6520100 |
화 장 품 | 18422470 |
화방표구점 | 100000 |
107 rows × 1 columns
3) Merge the Before & After data
-
I addded two columns that represent the difference and percentage change of the sales amount
itaewon_all = itaewon_before_sum.merge(itaewon_after_sum, left_index=True, right_index=True)
itaewon_all.columns = ['before', 'after']
itaewon_all['difference'] = itaewon_all.after - itaewon_all.before
itaewon_all['pct_ch'] = itaewon_all.difference / itaewon_all.before
itaewon_all
before | after | difference | pct_ch | |
---|---|---|---|---|
category | ||||
1급 호텔 | 23049955 | 5041220 | -18008735 | -0.781292 |
가 방 | 7092500 | 4977000 | -2115500 | -0.298273 |
가정용품수리 | 599000 | 340000 | -259000 | -0.432387 |
건강식품(회원제형태) | 148000 | 132000 | -16000 | -0.108108 |
골프연습장 | 1100000 | 1455000 | 355000 | 0.322727 |
... | ... | ... | ... | ... |
헬스 크럽 | 15848300 | 11846280 | -4002020 | -0.252520 |
화 랑 | 8553000 | 7050000 | -1503000 | -0.175728 |
화 원 | 9326970 | 6520100 | -2806870 | -0.300941 |
화 장 품 | 22929210 | 18422470 | -4506740 | -0.196550 |
화방표구점 | 80000 | 100000 | 20000 | 0.250000 |
99 rows × 4 columns
4) Check the Higher & Lower rank of Industry Categories
* According to their difference & percentage change
print(itaewon_all.loc[itaewon_all['difference'].idxmax()])
print()
print(itaewon_all.loc[itaewon_all['pct_ch'].idxmax()])
before 4.954647e+07
after 7.629203e+07
difference 2.674556e+07
pct_ch 5.398076e-01
Name: 일반 가구, dtype: float64
before 5.080000e+04
after 2.068000e+06
difference 2.017200e+06
pct_ch 3.970866e+01
Name: 문화취미기타, dtype: float64
print(itaewon_all.loc[itaewon_all['difference'].idxmin()])
print()
print(itaewon_all.loc[itaewon_all['pct_ch'].idxmin()])
before 1.191118e+09
after 2.858069e+08
difference -9.053115e+08
pct_ch -7.600517e-01
Name: 서양음식, dtype: float64
before 1.081342e+07
after 8.486000e+04
difference -1.072856e+07
pct_ch -9.921523e-01
Name: 통신 기기, dtype: float64
3. Sort by Difference
itaewon_sort_diff = itaewon_all.sort_values(by=['difference'])
itaewon_sort_diff
before | after | difference | pct_ch | |
---|---|---|---|---|
category | ||||
서양음식 | 1191118370 | 285806872 | -905311498 | -0.760052 |
일반한식 | 617036568 | 207923536 | -409113032 | -0.663029 |
주점 | 238128744 | 32349380 | -205779364 | -0.864152 |
유흥주점 | 114567350 | 981500 | -113585850 | -0.991433 |
편 의 점 | 222432993 | 154366340 | -68066653 | -0.306010 |
... | ... | ... | ... | ... |
기타회원제형태업소4 | 2580100 | 5318490 | 2738390 | 1.061350 |
수입자동차 | 43268046 | 57520226 | 14252180 | 0.329393 |
의료 용품 | 10909800 | 27868800 | 16959000 | 1.554474 |
유아원 | 8457000 | 31322320 | 22865320 | 2.703715 |
일반 가구 | 49546470 | 76292030 | 26745560 | 0.539808 |
99 rows × 4 columns
Categories of Lower Rank (Negative Value)
* Western food (서양음식)
* Korean food (일반한식)
* Bar (주점)
* Entertainment Bar (유흥주점)
* Convenient Store (편의점)
Categories of Higher Rank (Positive Value)
* Furniture (일반가구)
* Children Care Center (유아원)
* Health Care Goods (의료용품)
* Imported Car (수입자동차)
4. Sort by Percentage Change
itaewon_sort_pct = itaewon_all.sort_values(by=['pct_ch'])
itaewon_sort_pct
before | after | difference | pct_ch | |
---|---|---|---|---|
category | ||||
통신 기기 | 10813420 | 84860 | -10728560 | -0.992152 |
유흥주점 | 114567350 | 981500 | -113585850 | -0.991433 |
완 구 점 | 224400 | 2650 | -221750 | -0.988191 |
내의판매점 | 321000 | 15000 | -306000 | -0.953271 |
노래방 | 8881500 | 949500 | -7932000 | -0.893092 |
... | ... | ... | ... | ... |
사무서비스 | 824100 | 2996830 | 2172730 | 2.636488 |
유아원 | 8457000 | 31322320 | 22865320 | 2.703715 |
페 인 트 | 21000 | 264550 | 243550 | 11.597619 |
기타건강식 | 45000 | 910000 | 865000 | 19.222222 |
문화취미기타 | 50800 | 2068000 | 2017200 | 39.708661 |
99 rows × 4 columns
Categories of Lower Rank (Negative Value)
* Telephone System (통신기기)
* Entertainment Bar (유흥주점)
* Toy Store (완구점)
* Innerwear Store (내의판매점)
* Karaoke (노래방)
Categories of Higher Rank (Positive Value)
* Culture, Hobby (문화취미기타)
* Health Food (기타건강식)
* Painting (페인트)
* Children Care Center (유아원)
* Desk Service (사무서비스)
5. Plot by Matplotlib
plt.rc('font', family='NanumBarunGothic')
itaewon_plot = pd.concat([itaewon_sort_pct.head(), itaewon_sort_pct.tail()])
itaewon_plot.index = ['Telephone', 'Bar', 'Toy', 'Inner', 'Karaoke', 'Desk',
'Children Care', 'Painting', 'Health Food', 'Culture']
itaewon_plot
before | after | difference | pct_ch | |
---|---|---|---|---|
Telephone | 10813420 | 84860 | -10728560 | -0.992152 |
Bar | 114567350 | 981500 | -113585850 | -0.991433 |
Toy | 224400 | 2650 | -221750 | -0.988191 |
Inner | 321000 | 15000 | -306000 | -0.953271 |
Karaoke | 8881500 | 949500 | -7932000 | -0.893092 |
Desk | 824100 | 2996830 | 2172730 | 2.636488 |
Children Care | 8457000 | 31322320 | 22865320 | 2.703715 |
Painting | 21000 | 264550 | 243550 | 11.597619 |
Health Food | 45000 | 910000 | 865000 | 19.222222 |
Culture | 50800 | 2068000 | 2017200 | 39.708661 |
import seaborn as sns
import matplotlib.font_manager as fm
plt.rcParams["figure.figsize"] = (10, 10)
color = []
for i in np.arange(len(itaewon_plot.difference)) :
if (itaewon_plot.difference[i] > 0) :
color.append('lightblue')
else :
color.append('lightpink')
ax = sns.scatterplot(itaewon_plot.difference, itaewon_plot.pct_ch, s = np.abs(itaewon_plot.pct_ch) * 1000,
color = color, alpha=0.4, edgecolors="grey", linewidth=2, legend = 'full')
#For each point, add text inside the bubble
for line in range(0,itaewon_plot.shape[0]):
ax.text(itaewon_plot.difference[line], itaewon_plot.pct_ch[line], itaewon_plot.index[line],
horizontalalignment='center', size='large', color='black', weight='semibold', rotation = 40)
plt.title("\n<Itaewon> Strongly Affected Categories\n", fontsize = 20)
plt.xlabel("Difference in sale", fontsize = 15)
plt.ylabel("Percentage change in sale", fontsize = 15)
#plt.xticks(color = 'w')
#plt.yticks(color = 'w')
axes = plt.gca()
axes.yaxis.grid()
plt.show()
findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.
findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.
findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.
findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.
* The size of the Bubble represents the scale of sales percentage change
* The blue bubble represents a positive change, while the red one refers to a negative change
* A lot of people spent their money on their personal hobby
* The 'Bar' Category decreased, but the change was not as substantial as the increase in the 'Culture, Hobby' category
plt.bar(itaewon_plot.index, itaewon_plot.pct_ch, color = color)
plt.title("\n<Itaewon> Strongly Affected Categories\n", fontsize=20)
plt.xlabel('Category', fontsize=18)
plt.ylabel('Percentage Change in Sales', fontsize=18)
plt.xticks(fontsize=15, rotation = 45)
axes = plt.gca()
axes.yaxis.grid()
plt.show()
findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.
* Another type of graph(bar graph) that shows the percentage changes of each category
'Python Programming > Projects' 카테고리의 다른 글
WebPage Content Crawling & Preprocessing (0) | 2021.08.03 |
---|---|
기업의 ESG 요소와 수익률 간의 상관관계 분석(2) - Clustering Analysis (1) | 2021.06.02 |
기업의 ESG 요소와 수익률 간의 상관관계 분석(1) - Data Prep & Exploration (0) | 2021.06.02 |
국가 및 연도 별 총생산량(GDP)과 1인당 GDP에 대한 비교분석(2) - 시각화와 결과 (1) | 2021.02.18 |
국가 및 연도 별 총생산량(GDP)과 1인당 GDP에 대한 비교분석(1) (0) | 2021.02.14 |