0. COVID19 - Consumption Pattern Change in Itaewon

* There was an administrative order after a large scale of infection of COVID19 that started from a club located in Itaewon.

* The order was about the refrain from operating entertainment facilities (clubs, entertainment bars, etc.)

* Effective period: 2020.05.08 ~ 2020.06.07 (1 month, can be extended)

1. Data Preprocessing

* First of all, I collected the modules needed for the analysis

!sudo apt-get install -y fonts-nanum
!sudo fc-cache -fv
!rm ~/.cache/matplotlib -rf

import warnings
warnings.filterwarnings("ignore")

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
import seaborn as sns
import math
from datetime import date, timedelta

%matplotlib inline

from google.colab import drive
drive.mount('/content/drive')

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  fonts-nanum
0 upgraded, 1 newly installed, 0 to remove and 17 not upgraded.
Need to get 9,604 kB of archives.
After this operation, 29.5 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 fonts-nanum all 20170925-1 [9,604 kB]
Fetched 9,604 kB in 0s (39.6 MB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Selecting previously unselected package fonts-nanum.
(Reading database ... 146425 files and directories currently installed.)
Preparing to unpack .../fonts-nanum_20170925-1_all.deb ...
Unpacking fonts-nanum (20170925-1) ...
Setting up fonts-nanum (20170925-1) ...
Processing triggers for fontconfig (2.12.6-0ubuntu2) ...
/usr/share/fonts: caching, new cache contents: 0 fonts, 1 dirs
/usr/share/fonts/truetype: caching, new cache contents: 0 fonts, 3 dirs
/usr/share/fonts/truetype/humor-sans: caching, new cache contents: 1 fonts, 0 dirs
/usr/share/fonts/truetype/liberation: caching, new cache contents: 16 fonts, 0 dirs
/usr/share/fonts/truetype/nanum: caching, new cache contents: 10 fonts, 0 dirs
/usr/local/share/fonts: caching, new cache contents: 0 fonts, 0 dirs
/root/.local/share/fonts: skipping, no such directory
/root/.fonts: skipping, no such directory
/var/cache/fontconfig: cleaning cache directory
/root/.cache/fontconfig: not cleaning non-existent cache directory
/root/.fontconfig: not cleaning non-existent cache directory
fc-cache: succeeded
Mounted at /content/drive

1) Read the Card data and check the columns in the csv file

card = pd.read_csv('/content/drive/Shared drives/경제정보분석/KT_data_20200717/card_20200717.csv')
card

	receipt_dttm	adstrd_code	adstrd_nm	mrhst_induty_cl_code	mrhst_induty_cl_nm	selng_cascnt	salamt
0	20200104	1174066000	성내3동	7041	약국	463	5843230
1	20200104	1174066000	성내3동	7022	치과의원	33	7835550
2	20200104	1174066000	성내3동	7021	한의원	53	4589800
3	20200104	1174066000	성내3동	7020	의원	339	9267240
4	20200104	1174066000	성내3동	6110	자동차정비	19	4441000
...	...	...	...	...	...	...	...
3713947	20200614	1121586000	구의2동	4004	대형할인점	136	1195640
3713948	20200614	1121584700	자양4동	9010	인테 리어	2	13100
3713949	20200614	1162052500	보라매동	6140	주 차 장	2	14500
3713950	20200614	1135056000	월계1동	9998	기타전문점	1	38000
3713951	20200614	1159060500	흑석동	9204	사무통신기기수리	1	20000

3713952 rows × 7 columns

* Here, we have the receipt date, adress code, town name, industry code, industry category name, and the sales amount

* We need to pull out the data for which the town name corresponds to "Itaewon"

* Then, the remaining columns would be the date, town, category, and the sales amount.

itaewon = card[(card.adstrd_nm == '이태원1동') | (card.adstrd_nm == '이태원2동')]
itaewon_need = itaewon[['receipt_dttm', 'adstrd_nm', 'mrhst_induty_cl_nm', 'salamt']]
itaewon_need.columns = ['date', 'town', 'category', 'sale']
itaewon_need

	date	town	category	sale
7911	20200104	이태원2동	제과점	719750
7912	20200104	이태원2동	스넥	840500
7913	20200104	이태원2동	주점	1766300
7914	20200104	이태원2동	서양음식	11295450
7915	20200104	이태원2동	중국음식	2352500
...	...	...	...	...
3713099	20200614	이태원1동	주 차 장	619200
3713107	20200614	이태원1동	악세 사리	329200
3713516	20200614	이태원2동	사진관	35000
3713565	20200614	이태원1동	시 계	2494750
3713800	20200614	이태원1동	기타전기제품	8250

14420 rows × 4 columns

2) Group the rows by their dates and categories

itaewon_category = itaewon_need.groupby(['date', 'category'])
itaewon_category = itaewon_category.agg({'sale': ['sum']})

itaewon_category = itaewon_category.reset_index()
itaewon_category

	date	category	sale
			sum
0	20200104	기타숙박업	737000
1	20200104	기타음료식품	3276100
2	20200104	미 용 원	2193000
3	20200104	서양음식	83769020
4	20200104	슈퍼 마켓	4088100
...	...	...	...
9977	20200614	피부미용실	35000
9978	20200614	한의원	125100
9979	20200614	헬스 크럽	867000
9980	20200614	화 원	246000
9981	20200614	화 장 품	2305800

9982 rows × 3 columns

2. Comparison of Before & After the Itaewon Infection

itaewon_before = itaewon_category[(itaewon_category.date >= 20200424) & (itaewon_category.date <= 20200508)]
itaewon_after = itaewon_category[(itaewon_category.date >= 20200509) & (itaewon_category.date <= 20200523)]

print(itaewon_before)
print()
print(itaewon_after)

          date     category     sale
                                 sum
6331  20200424        1급 호텔   659500
6332  20200424  건강식품(회원제형태)    30000
6333  20200424         관광여행    43500
6334  20200424        기계 공구     5500
6335  20200424         기타가구   854000
...        ...          ...      ...
7386  20200508          한의원   114900
7387  20200508        헬스 크럽  1328000
7388  20200508        화   랑   426000
7389  20200508        화   원  1681100
7390  20200508        화 장 품   598000

[1060 rows x 3 columns]

          date     category    sale
                                sum
7391  20200509        1급 호텔  805000
7392  20200509  건강식품(회원제형태)   50000
7393  20200509         관광여행    6900
7394  20200509        기타 교육  315000
7395  20200509       기타건축자재   79500
...        ...          ...     ...
8420  20200523          한의원  450500
8421  20200523        헬스 크럽  146000
8422  20200523        화   원  140000
8423  20200523        화 장 품  528150
8424  20200523        화방표구점  100000

[1034 rows x 3 columns]

1) Before the Administrative Order (2 weeks data)

itaewon_before_sum = itaewon_before.groupby(['category']).sum()
itaewon_before_sum = itaewon_before_sum['sale']
itaewon_before_sum

	sum
category
1급 호텔	23049955
가 방	7092500
가정용품수리	599000
건강식품(회원제형태)	148000
골프연습장	1100000
...	...
헬스 크럽	15848300
화 랑	8553000
화 원	9326970
화 장 품	22929210
화방표구점	80000

102 rows × 1 columns

2) After the Administrative Order(2 weeks data)

itaewon_after_sum = itaewon_after.groupby(['category']).sum()
itaewon_after_sum = itaewon_after_sum['sale']
itaewon_after_sum

	sum
category
1급 호텔	5041220
가 방	4977000
가정용품수리	340000
건강식품(회원제형태)	132000
골프연습장	1455000
...	...
혼수전문점	1000000
화 랑	7050000
화 원	6520100
화 장 품	18422470
화방표구점	100000

107 rows × 1 columns

3) Merge the Before & After data

I addded two columns that represent the difference and percentage change of the sales amount

itaewon_all = itaewon_before_sum.merge(itaewon_after_sum, left_index=True, right_index=True)
itaewon_all.columns = ['before', 'after']
itaewon_all['difference'] = itaewon_all.after - itaewon_all.before
itaewon_all['pct_ch'] = itaewon_all.difference / itaewon_all.before

itaewon_all

	before	after	difference	pct_ch
category
1급 호텔	23049955	5041220	-18008735	-0.781292
가 방	7092500	4977000	-2115500	-0.298273
가정용품수리	599000	340000	-259000	-0.432387
건강식품(회원제형태)	148000	132000	-16000	-0.108108
골프연습장	1100000	1455000	355000	0.322727
...	...	...	...	...
헬스 크럽	15848300	11846280	-4002020	-0.252520
화 랑	8553000	7050000	-1503000	-0.175728
화 원	9326970	6520100	-2806870	-0.300941
화 장 품	22929210	18422470	-4506740	-0.196550
화방표구점	80000	100000	20000	0.250000

99 rows × 4 columns

4) Check the Higher & Lower rank of Industry Categories

* According to their difference & percentage change

print(itaewon_all.loc[itaewon_all['difference'].idxmax()])
print()
print(itaewon_all.loc[itaewon_all['pct_ch'].idxmax()])

before        4.954647e+07
after         7.629203e+07
difference    2.674556e+07
pct_ch        5.398076e-01
Name: 일반 가구, dtype: float64

before        5.080000e+04
after         2.068000e+06
difference    2.017200e+06
pct_ch        3.970866e+01
Name: 문화취미기타, dtype: float64

print(itaewon_all.loc[itaewon_all['difference'].idxmin()])
print()
print(itaewon_all.loc[itaewon_all['pct_ch'].idxmin()])

before        1.191118e+09
after         2.858069e+08
difference   -9.053115e+08
pct_ch       -7.600517e-01
Name: 서양음식, dtype: float64

before        1.081342e+07
after         8.486000e+04
difference   -1.072856e+07
pct_ch       -9.921523e-01
Name: 통신 기기, dtype: float64

3. Sort by Difference

itaewon_sort_diff = itaewon_all.sort_values(by=['difference'])
itaewon_sort_diff

	before	after	difference	pct_ch
category
서양음식	1191118370	285806872	-905311498	-0.760052
일반한식	617036568	207923536	-409113032	-0.663029
주점	238128744	32349380	-205779364	-0.864152
유흥주점	114567350	981500	-113585850	-0.991433
편 의 점	222432993	154366340	-68066653	-0.306010
...	...	...	...	...
기타회원제형태업소4	2580100	5318490	2738390	1.061350
수입자동차	43268046	57520226	14252180	0.329393
의료 용품	10909800	27868800	16959000	1.554474
유아원	8457000	31322320	22865320	2.703715
일반 가구	49546470	76292030	26745560	0.539808

99 rows × 4 columns

Categories of Lower Rank (Negative Value)

* Western food (서양음식)

* Korean food (일반한식)

* Bar (주점)

* Entertainment Bar (유흥주점)

* Convenient Store (편의점)

Categories of Higher Rank (Positive Value)

* Furniture (일반가구)

* Children Care Center (유아원)

* Health Care Goods (의료용품)

* Imported Car (수입자동차)

4. Sort by Percentage Change

itaewon_sort_pct = itaewon_all.sort_values(by=['pct_ch'])
itaewon_sort_pct

	before	after	difference	pct_ch
category
통신 기기	10813420	84860	-10728560	-0.992152
유흥주점	114567350	981500	-113585850	-0.991433
완 구 점	224400	2650	-221750	-0.988191
내의판매점	321000	15000	-306000	-0.953271
노래방	8881500	949500	-7932000	-0.893092
...	...	...	...	...
사무서비스	824100	2996830	2172730	2.636488
유아원	8457000	31322320	22865320	2.703715
페 인 트	21000	264550	243550	11.597619
기타건강식	45000	910000	865000	19.222222
문화취미기타	50800	2068000	2017200	39.708661

99 rows × 4 columns

Categories of Lower Rank (Negative Value)

* Telephone System (통신기기)

* Entertainment Bar (유흥주점)

* Toy Store (완구점)

* Innerwear Store (내의판매점)

* Karaoke (노래방)

Categories of Higher Rank (Positive Value)

* Culture, Hobby (문화취미기타)

* Health Food (기타건강식)

* Painting (페인트)

* Children Care Center (유아원)

* Desk Service (사무서비스)

5. Plot by Matplotlib

plt.rc('font', family='NanumBarunGothic') 
itaewon_plot = pd.concat([itaewon_sort_pct.head(), itaewon_sort_pct.tail()])
itaewon_plot.index = ['Telephone', 'Bar', 'Toy', 'Inner', 'Karaoke', 'Desk', 
                      'Children Care', 'Painting', 'Health Food', 'Culture']
itaewon_plot

	before	after	difference	pct_ch
Telephone	10813420	84860	-10728560	-0.992152
Bar	114567350	981500	-113585850	-0.991433
Toy	224400	2650	-221750	-0.988191
Inner	321000	15000	-306000	-0.953271
Karaoke	8881500	949500	-7932000	-0.893092
Desk	824100	2996830	2172730	2.636488
Children Care	8457000	31322320	22865320	2.703715
Painting	21000	264550	243550	11.597619
Health Food	45000	910000	865000	19.222222
Culture	50800	2068000	2017200	39.708661

import seaborn as sns
import matplotlib.font_manager as fm

plt.rcParams["figure.figsize"] = (10, 10)

color = []
for i in np.arange(len(itaewon_plot.difference)) :
    if (itaewon_plot.difference[i] > 0) : 
        color.append('lightblue')
    else : 
        color.append('lightpink')

ax = sns.scatterplot(itaewon_plot.difference, itaewon_plot.pct_ch, s = np.abs(itaewon_plot.pct_ch) * 1000, 
                     color = color, alpha=0.4, edgecolors="grey", linewidth=2, legend = 'full')


#For each point, add text inside the bubble
for line in range(0,itaewon_plot.shape[0]):
     ax.text(itaewon_plot.difference[line], itaewon_plot.pct_ch[line], itaewon_plot.index[line], 
             horizontalalignment='center', size='large', color='black', weight='semibold', rotation = 40)

plt.title("\n<Itaewon> Strongly Affected Categories\n", fontsize = 20)
plt.xlabel("Difference in sale", fontsize = 15)
plt.ylabel("Percentage change in sale", fontsize = 15)
#plt.xticks(color = 'w')
#plt.yticks(color = 'w')
axes = plt.gca()
axes.yaxis.grid()

plt.show()

findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.
findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.
findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.
findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.

* The size of the Bubble represents the scale of sales percentage change

* The blue bubble represents a positive change, while the red one refers to a negative change

* A lot of people spent their money on their personal hobby

* The 'Bar' Category decreased, but the change was not as substantial as the increase in the 'Culture, Hobby' category

plt.bar(itaewon_plot.index, itaewon_plot.pct_ch, color = color)
plt.title("\n<Itaewon> Strongly Affected Categories\n", fontsize=20)
plt.xlabel('Category', fontsize=18)
plt.ylabel('Percentage Change in Sales', fontsize=18)
plt.xticks(fontsize=15, rotation = 45)

axes = plt.gca()
axes.yaxis.grid()
plt.show()

findfont: Font family ['NanumBarunGothic'] not found. Falling back to DejaVu Sans.

* Another type of graph(bar graph) that shows the percentage changes of each category

'Python Programming > Projects' 카테고리의 다른 글

WebPage Content Crawling & Preprocessing (0)	2021.08.03
기업의 ESG 요소와 수익률 간의 상관관계 분석(2) - Clustering Analysis (1)	2021.06.02
기업의 ESG 요소와 수익률 간의 상관관계 분석(1) - Data Prep & Exploration (0)	2021.06.02
국가 및 연도 별 총생산량(GDP)과 1인당 GDP에 대한 비교분석(2) - 시각화와 결과 (1)	2021.02.18
국가 및 연도 별 총생산량(GDP)과 1인당 GDP에 대한 비교분석(1) (0)	2021.02.14

Data-est-her

COVID19 - Consumption Pattern Change in Itaewon

0. COVID19 - Consumption Pattern Change in Itaewon

1. Data Preprocessing

1) Read the Card data and check the columns in the csv file

2) Group the rows by their dates and categories

2. Comparison of Before & After the Itaewon Infection

1) Before the Administrative Order (2 weeks data)

2) After the Administrative Order(2 weeks data)

3) Merge the Before & After data

4) Check the Higher & Lower rank of Industry Categories

3. Sort by Difference

Categories of Lower Rank (Negative Value)

Categories of Higher Rank (Positive Value)

4. Sort by Percentage Change

Categories of Lower Rank (Negative Value)

Categories of Higher Rank (Positive Value)

5. Plot by Matplotlib

'Python Programming > Projects' 카테고리의 다른 글

티스토리툴바

COVID19 - Consumption Pattern Change in Itaewon

0. COVID19 - Consumption Pattern Change in Itaewon

1. Data Preprocessing

1) Read the Card data and check the columns in the csv file

2) Group the rows by their dates and categories

2. Comparison of Before & After the Itaewon Infection

1) Before the Administrative Order (2 weeks data)

2) After the Administrative Order(2 weeks data)

3) Merge the Before & After data

4) Check the Higher & Lower rank of Industry Categories

3. Sort by Difference

Categories of Lower Rank (Negative Value)

Categories of Higher Rank (Positive Value)

4. Sort by Percentage Change

Categories of Lower Rank (Negative Value)

Categories of Higher Rank (Positive Value)

5. Plot by Matplotlib

'Python Programming > Projects' 카테고리의 다른 글

'Python Programming/Projects' Related Articles

티스토리툴바