본문 바로가기

Python Programming/Projects

COVID19 - Consumption Pattern Change in Itaewon

0. COVID19 - Consumption Pattern Change in Itaewon

* There was an administrative order after a large scale of infection of COVID19 that started from a club located in Itaewon.

* The order was about the refrain from operating entertainment facilities (clubs, entertainment bars, etc.)

* Effective period: 2020.05.08 ~ 2020.06.07 (1 month, can be extended)

1. Data Preprocessing

* First of all, I collected the modules needed for the analysis

!sudo apt-get install -y fonts-nanum
!sudo fc-cache -fv
!rm ~/.cache/matplotlib -rf

import warnings
warnings.filterwarnings("ignore")

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
import seaborn as sns
import math
from datetime import date, timedelta

%matplotlib inline

from google.colab import drive
drive.mount('/content/drive')
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  fonts-nanum
0 upgraded, 1 newly installed, 0 to remove and 17 not upgraded.
Need to get 9,604 kB of archives.
After this operation, 29.5 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 fonts-nanum all 20170925-1 [9,604 kB]
Fetched 9,604 kB in 0s (39.6 MB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Selecting previously unselected package fonts-nanum.
(Reading database ... 146425 files and directories currently installed.)
Preparing to unpack .../fonts-nanum_20170925-1_all.deb ...
Unpacking fonts-nanum (20170925-1) ...
Setting up fonts-nanum (20170925-1) ...
Processing triggers for fontconfig (2.12.6-0ubuntu2) ...
/usr/share/fonts: caching, new cache contents: 0 fonts, 1 dirs
/usr/share/fonts/truetype: caching, new cache contents: 0 fonts, 3 dirs
/usr/share/fonts/truetype/humor-sans: caching, new cache contents: 1 fonts, 0 dirs
/usr/share/fonts/truetype/liberation: caching, new cache contents: 16 fonts, 0 dirs
/usr/share/fonts/truetype/nanum: caching, new cache contents: 10 fonts, 0 dirs
/usr/local/share/fonts: caching, new cache contents: 0 fonts, 0 dirs
/root/.local/share/fonts: skipping, no such directory
/root/.fonts: skipping, no such directory
/var/cache/fontconfig: cleaning cache directory
/root/.cache/fontconfig: not cleaning non-existent cache directory
/root/.fontconfig: not cleaning non-existent cache directory
fc-cache: succeeded
Mounted at /content/drive

 

1) Read the Card data and check the columns in the csv file

card = pd.read_csv('/content/drive/Shared drives/경제정보분석/KT_data_20200717/card_20200717.csv')
card
  receipt_dttm adstrd_code adstrd_nm mrhst_induty_cl_code mrhst_induty_cl_nm selng_cascnt salamt
0 20200104 1174066000 성내3동 7041 약국 463 5843230
1 20200104 1174066000 성내3동 7022 치과의원 33 7835550
2 20200104 1174066000 성내3동 7021 한의원 53 4589800
3 20200104 1174066000 성내3동 7020 의원 339 9267240
4 20200104 1174066000 성내3동 6110 자동차정비 19 4441000
... ... ... ... ... ... ... ...
3713947 20200614 1121586000 구의2동 4004 대형할인점 136 1195640
3713948 20200614 1121584700 자양4동 9010 인테 리어 2 13100
3713949 20200614 1162052500 보라매동 6140 주 차 장 2 14500
3713950 20200614 1135056000 월계1동 9998 기타전문점 1 38000
3713951 20200614 1159060500 흑석동 9204 사무통신기기수리 1 20000

3713952 rows × 7 columns

 

* Here, we have the receipt date, adress code, town name, industry code, industry category name, and the sales amount

* We need to pull out the data for which the town name corresponds to "Itaewon"

* Then, the remaining columns would be the date, town, category, and the sales amount.

itaewon = card[(card.adstrd_nm == '이태원1동') | (card.adstrd_nm == '이태원2동')]
itaewon_need = itaewon[['receipt_dttm', 'adstrd_nm', 'mrhst_induty_cl_nm', 'salamt']]
itaewon_need.columns = ['date', 'town', 'category', 'sale']
itaewon_need
  date town category sale
7911 20200104 이태원2동 제과점 719750
7912 20200104 이태원2동 스넥 840500
7913 20200104 이태원2동 주점 1766300
7914 20200104 이태원2동 서양음식 11295450
7915 20200104 이태원2동 중국음식 2352500
... ... ... ... ...
3713099 20200614 이태원1동 주 차 장 619200
3713107 20200614 이태원1동 악세 사리 329200
3713516 20200614 이태원2동 사진관 35000
3713565 20200614 이태원1동 시 계 2494750
3713800 20200614 이태원1동 기타전기제품 8250

14420 rows × 4 columns

 

2) Group the rows by their dates and categories

itaewon_category = itaewon_need.groupby(['date', 'category'])
itaewon_category = itaewon_category.agg({'sale': ['sum']})

itaewon_category = itaewon_category.reset_index()
itaewon_category
  date category sale
      sum
0 20200104 기타숙박업 737000
1 20200104 기타음료식품 3276100
2 20200104 미 용 원 2193000
3 20200104 서양음식 83769020
4 20200104 슈퍼 마켓 4088100
... ... ... ...
9977 20200614 피부미용실 35000
9978 20200614 한의원 125100
9979 20200614 헬스 크럽 867000
9980 20200614 화 원 246000
9981 20200614 화 장 품 2305800

9982 rows × 3 columns

 

2. Comparison of Before & After the Itaewon Infection

itaewon_before = itaewon_category[(itaewon_category.date >= 20200424) & (itaewon_category.date <= 20200508)]
itaewon_after = itaewon_category[(itaewon_category.date >= 20200509) & (itaewon_category.date <= 20200523)]

print(itaewon_before)
print()
print(itaewon_after)
          date     category     sale
                                 sum
6331  20200424        1급 호텔   659500
6332  20200424  건강식품(회원제형태)    30000
6333  20200424         관광여행    43500
6334  20200424        기계 공구     5500
6335  20200424         기타가구   854000
...        ...          ...      ...
7386  20200508          한의원   114900
7387  20200508        헬스 크럽  1328000
7388  20200508        화   랑   426000
7389  20200508        화   원  1681100
7390  20200508        화 장 품   598000

[1060 rows x 3 columns]

          date     category    sale
                                sum
7391  20200509        1급 호텔  805000
7392  20200509  건강식품(회원제형태)   50000
7393  20200509         관광여행    6900
7394  20200509        기타 교육  315000
7395  20200509       기타건축자재   79500
...        ...          ...     ...
8420  20200523          한의원  450500
8421  20200523        헬스 크럽  146000
8422  20200523        화   원  140000
8423  20200523        화 장 품  528150
8424  20200523        화방표구점  100000

[1034 rows x 3 columns]

1) Before the Administrative Order (2 weeks data)

itaewon_before_sum = itaewon_before.groupby(['category']).sum()
itaewon_before_sum = itaewon_before_sum['sale']
itaewon_before_sum
  sum
category  
1급 호텔 23049955
가 방 7092500
가정용품수리 599000
건강식품(회원제형태) 148000
골프연습장 1100000
... ...
헬스 크럽 15848300
화 랑 8553000
화 원 9326970
화 장 품 22929210
화방표구점 80000

102 rows × 1 columns

 

2) After the Administrative Order(2 weeks data)

itaewon_after_sum = itaewon_after.groupby(['category']).sum()
itaewon_after_sum = itaewon_after_sum['sale']
itaewon_after_sum
  sum
category  
1급 호텔 5041220
가 방 4977000
가정용품수리 340000
건강식품(회원제형태) 132000
골프연습장 1455000
... ...
혼수전문점 1000000
화 랑 7050000
화 원 6520100
화 장 품 18422470
화방표구점 100000

107 rows × 1 columns

 

3) Merge the Before & After data

  • I addded two columns that represent the difference and percentage change of the sales amount

itaewon_all = itaewon_before_sum.merge(itaewon_after_sum, left_index=True, right_index=True)
itaewon_all.columns = ['before', 'after']
itaewon_all['difference'] = itaewon_all.after - itaewon_all.before
itaewon_all['pct_ch'] = itaewon_all.difference / itaewon_all.before

itaewon_all
  before after difference pct_ch
category        
1급 호텔 23049955 5041220 -18008735 -0.781292
가 방 7092500 4977000 -2115500 -0.298273
가정용품수리 599000 340000 -259000 -0.432387
건강식품(회원제형태) 148000 132000 -16000 -0.108108
골프연습장 1100000 1455000 355000 0.322727
... ... ... ... ...
헬스 크럽 15848300 11846280 -4002020 -0.252520
화 랑 8553000 7050000 -1503000 -0.175728
화 원 9326970 6520100 -2806870 -0.300941
화 장 품 22929210 18422470 -4506740 -0.196550
화방표구점 80000 100000 20000 0.250000

99 rows × 4 columns

 

4) Check the Higher & Lower rank of Industry Categories

* According to their difference & percentage change

print(itaewon_all.loc[itaewon_all['difference'].idxmax()])
print()
print(itaewon_all.loc[itaewon_all['pct_ch'].idxmax()])
before        4.954647e+07
after         7.629203e+07
difference    2.674556e+07
pct_ch        5.398076e-01
Name: 일반 가구, dtype: float64

before        5.080000e+04
after         2.068000e+06
difference    2.017200e+06
pct_ch        3.970866e+01
Name: 문화취미기타, dtype: float64
print(itaewon_all.loc[itaewon_all['difference'].idxmin()])
print()
print(itaewon_all.loc[itaewon_all['pct_ch'].idxmin()])
before        1.191118e+09
after         2.858069e+08
difference   -9.053115e+08
pct_ch       -7.600517e-01
Name: 서양음식, dtype: float64

before        1.081342e+07
after         8.486000e+04
difference   -1.072856e+07
pct_ch       -9.921523e-01
Name: 통신 기기, dtype: float64

 

3. Sort by Difference

itaewon_sort_diff = itaewon_all.sort_values(by=['difference'])
itaewon_sort_diff
  before after difference pct_ch
category        
서양음식 1191118370 285806872 -905311498 -0.760052
일반한식 617036568 207923536 -409113032 -0.663029
주점 238128744 32349380 -205779364 -0.864152
유흥주점 114567350 981500 -113585850 -0.991433
편 의 점 222432993 154366340 -68066653 -0.306010
... ... ... ... ...
기타회원제형태업소4 2580100 5318490 2738390 1.061350
수입자동차 43268046 57520226 14252180 0.329393
의료 용품 10909800 27868800 16959000 1.554474
유아원 8457000 31322320 22865320 2.703715
일반 가구 49546470 76292030 26745560 0.539808

99 rows × 4 columns

Categories of Lower Rank (Negative Value)

* Western food (서양음식)

* Korean food (일반한식)

* Bar (주점)

* Entertainment Bar (유흥주점)

* Convenient Store (편의점)

 

Categories of Higher Rank (Positive Value)

* Furniture (일반가구)

* Children Care Center (유아원)

* Health Care Goods (의료용품)

* Imported Car (수입자동차)

 

4. Sort by Percentage Change

itaewon_sort_pct = itaewon_all.sort_values(by=['pct_ch'])
itaewon_sort_pct
  before after difference pct_ch
category        
통신 기기 10813420 84860 -10728560 -0.992152
유흥주점 114567350 981500 -113585850 -0.991433
완 구 점 224400 2650 -221750 -0.988191
내의판매점 321000 15000 -306000 -0.953271
노래방 8881500 949500 -7932000 -0.893092
... ... ... ... ...
사무서비스 824100 2996830 2172730 2.636488
유아원 8457000 31322320 22865320 2.703715
페 인 트 21000 264550 243550 11.597619
기타건강식 45000 910000 865000 19.222222
문화취미기타 50800 2068000 2017200 39.708661

99 rows × 4 columns

Categories of Lower Rank (Negative Value)

* Telephone System (통신기기)

* Entertainment Bar (유흥주점)

* Toy Store (완구점)

* Innerwear Store (내의판매점)

* Karaoke (노래방)

Categories of Higher Rank (Positive Value)

* Culture, Hobby (문화취미기타)

* Health Food (기타건강식)

* Painting (페인트)

* Children Care Center (유아원)

* Desk Service (사무서비스)

 

5. Plot by Matplotlib

plt.rc('font', family='NanumBarunGothic') 
itaewon_plot = pd.concat([itaewon_sort_pct.head(), itaewon_sort_pct.tail()])
itaewon_plot.index = ['Telephone', 'Bar', 'Toy', 'Inner', 'Karaoke', 'Desk', 
                      'Children Care', 'Painting', 'Health Food', 'Culture']
itaewon_plot
  before after difference pct_ch
Telephone 10813420 84860 -10728560 -0.992152
Bar 114567350 981500 -113585850 -0.991433
Toy 224400 2650 -221750 -0.988191
Inner 321000 15000 -306000 -0.953271
Karaoke 8881500 949500 -7932000 -0.893092
Desk 824100 2996830 2172730 2.636488
Children Care 8457000 31322320 22865320 2.703715
Painting 21000 264550 243550 11.597619
Health Food 45000 910000 865000 19.222222
Culture 50800 2068000 2017200 39.708661
import seaborn as sns
import matplotlib.font_manager as fm

plt.rcParams["figure.figsize"] = (10, 10)

color = []
for i in np.arange(len(itaewon_plot.difference)) :
    if (itaewon_plot.difference[i] > 0) : 
        color.append('lightblue')
    else : 
        color.append('lightpink')

ax = sns.scatterplot(itaewon_plot.difference, itaewon_plot.pct_ch, s = np.abs(itaewon_plot.pct_ch) * 1000, 
                     color = color, alpha=0.4, edgecolors="grey", linewidth=2, legend = 'full')


#For each point, add text inside the bubble
for line in range(0,itaewon_plot.shape[0]):
     ax.text(itaewon_plot.difference[line], itaewon_plot.pct_ch[line], itaewon_plot.index[line], 
             horizontalalignment='center', size='large', color='black', weight='semibold', rotation = 40)

plt.title("\n<Itaewon> Strongly Affected Categories\n", fontsize = 20)
plt.xlabel("Difference in sale", fontsize = 15)
plt.ylabel("Percentage change in sale", fontsize = 15)
#plt.xticks(color = 'w')
#plt.yticks(color = 'w')
axes = plt.gca()
axes.yaxis.grid()

plt.show()
findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.
findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.
findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.
findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.

 

* The size of the Bubble represents the scale of sales percentage change

* The blue bubble represents a positive change, while the red one refers to a negative change

* A lot of people spent their money on their personal hobby

* The 'Bar' Category decreased, but the change was not as substantial as the increase in the 'Culture, Hobby' category

plt.bar(itaewon_plot.index, itaewon_plot.pct_ch, color = color)
plt.title("\n<Itaewon> Strongly Affected Categories\n", fontsize=20)
plt.xlabel('Category', fontsize=18)
plt.ylabel('Percentage Change in Sales', fontsize=18)
plt.xticks(fontsize=15, rotation = 45)

axes = plt.gca()
axes.yaxis.grid()
plt.show()
findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.

 

* Another type of graph(bar graph) that shows the percentage changes of each category