

R Programming (2) - Basic Usage and Vectors

 

Anything after " >> " is the output produced by the input on the line above.

02 Basic R Usage

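1. Basic operators

4 / 2          # division

>> [1] 2

5 %/% 2        # integer division (quotient)

>> [1] 2

5 %% 2         # remainder (modulo)

>> [1] 1

(3 + 2)^3      # exponentiation

>> [1] 125

# Variable names (rules: letters, digits, periods and underscores; must start with a letter, or with a period not followed by a digit; case-sensitive). Values are assigned with <-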
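The naming rules above can be tried directly. This is a minimal sketch; the variable names are made up for illustration. `make.names()` is the base R helper that turns arbitrary strings into syntactically valid names.

```r
# Assignment uses <-; names are case-sensitive, so these are two variables
height <- 170
Height <- 180
print(height == Height)   # FALSE: different variables

# Letters, digits, '.' and '_' may be mixed, but a name cannot start with a digit
my.height_cm <- 170

# make.names() converts strings into syntactically valid names
valid <- make.names(c("2nd", "my var", "ok.name"))
print(valid)   # "X2nd" "my.var" "ok.name"
```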

 

2. Data types and the class function

class(TRUE)

>> [1] "logical"

class(T)          # T is shorthand for TRUE

>> [1] "logical"

class(12L)        # the L suffix makes an integer literal

>> [1] "integer"

class(3 + 2i)

>> [1] "complex"

class(12.3)

>> [1] "numeric"

as.numeric(12L)

>> [1] 12

class('a')

>> [1] "character"

class("good")

>> [1] "character"

class('2 + 4')    # inside quotes this is text, not an expression

>> [1] "character"
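`class()` is not the only way to inspect a value. The sketch below, using throwaway variable names, compares it with `typeof()`, which reports how a value is stored. It also shows why `12` and `12L` are not the same object.

```r
# class() reports the high-level class; typeof() reports the storage type
x_dbl <- 12.3
x_int <- 12L

print(class(x_dbl))   # "numeric"
print(typeof(x_dbl))  # "double"
print(class(x_int))   # "integer"
print(typeof(x_int))  # "integer"

# 12 and 12L compare equal but are stored differently
print(12 == 12L)           # TRUE
print(identical(12, 12L))  # FALSE
print(is.integer(12))      # FALSE: a plain number literal is a double
```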

 

3. Data Structure

a <- c('red', 'green', 'yellow')

class(a)

>> [1] "character"

str(a)

>> chr [1:3] "red" "green" "yellow"

is.vector(a)

>> [1] TRUE

b <- c(12, 13.5, 0)

class(b)

>> [1] "numeric"

str(b)

>> num [1:3] 12 13.5 0

f <- factor(c('green', 'green', 'yellow', 'red', 'red', 'red', 'green'))

class(f)

>> [1] "factor"

str(f)     # levels are sorted alphabetically; the numbers are the level codes

>> Factor w/ 3 levels "green","red","yellow": 1 1 3 2 2 2 1

is.factor(f)

>> [1] TRUE

g <- data.frame(gender = c('Male', 'Male', 'Female'), height = c(152, 171, 165))

class(g)

>> [1] "data.frame"

str(g)

>> 'data.frame': 3 obs. of 2 variables: …

is.data.frame(g)

>> [1] TRUE

dim(g)     # number of rows and columns

>> [1] 3 2
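To see what the factor and data frame above hold, here is a short sketch that rebuilds both objects from this section. `levels()`, `nrow()`, `ncol()` and `$` are standard base R.

```r
f <- factor(c('green', 'green', 'yellow', 'red', 'red', 'red', 'green'))

# Levels are sorted alphabetically by default; values are stored as level codes
print(levels(f))      # "green" "red" "yellow"
print(as.integer(f))  # 1 1 3 2 2 2 1
print(table(f))       # count per level

g <- data.frame(gender = c('Male', 'Male', 'Female'), height = c(152, 171, 165))

# Each column is a vector; $ extracts one column by name
print(g$height)                 # 152 171 165
print(nrow(g)); print(ncol(g))  # 3 and 2, the same numbers dim(g) returns
print(mean(g$height))
```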

 

4. Removing variables

rm(a)              # removes the variable a (no output)

rm(list = ls())    # removes every variable in the workspace

 

 

03 Vectors

1. Creating vectors and vector arithmetic

1) Types

is.vector("apple")     # a single value is a vector of length 1

>> [1] TRUE

str("apple")

>> chr "apple"

str(1.25)

>> num 1.25

str(3L)

>> int 3

str(TRUE)

>> logi TRUE

str(2 + 3i)

>> cplx 2+3i

 

2) Creating vectors

1                                                             

>> [1] 1

c(1)                                                         

>> [1] 1

c(1, 2, 3)                                               

>> [1] 1 2 3

1:5                                                         

>> [1] 1 2 3 4 5

class(a); class(b)

>> [1] "numeric"

    [1] "character"

 

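* A vector can contain more than one element. With :, the sequence advances in steps of 1 and stops at the last value that does not exceed the end point.

c <- 5.5:20.4; c

>> [1]  5.5  6.5 … 19.5

d <- 5.5:20.6; d

>> [1]  5.5  6.5 … 20.5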

 

3) Vector arithmetic

x <- c(1, 3, 5, 7, 9)

y <- c(2, 4, 6, 8, 10)

x + y                                                       

>> [1]  3  7 11 15 19

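x * c(2, 4, 5)      # the shorter vector is recycled as 2, 4, 5, 2, 4

>> [1]  2 12 25 14 36

     Warning message: longer object length is not a multiple of shorter object length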
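Recycling is not always an error case. A small sketch with made-up vectors shows when it is silent and when R warns. `tryCatch()` is used only to capture the warning text instead of printing it.

```r
# When the longer length is a multiple of the shorter one, recycling is silent
x <- c(1, 2, 3, 4)
print(x * c(10, 100))   # 10 200 30 400, no warning

# A single number is recycled over the whole vector
print(x + 1)            # 2 3 4 5

# Otherwise R still recycles but warns; tryCatch() returns the warning message
res <- tryCatch(c(1, 3, 5, 7, 9) * c(2, 4, 5),
                warning = function(w) conditionMessage(w))
print(res)
```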

 

4) Combining character vectors

x <- c('A', 'B', 'C')

y <- c("a", "b", "c")     # single and double quotes are equivalent

z <- c(x, y)

z

>> [1] "A" "B" "C" "a" "b" "c"

 

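5) A vector holds values of a single type

a <- c(1, 2, "3")

a

>> [1] "1" "2" "3"

   When types are mixed, every element is converted (coerced) to the most general type, here character.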
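The coercion order can be checked with a few literals. This sketch uses only `c()` and `class()`: logical values become integer, integers become double (numeric), and anything mixed with text becomes character.

```r
# Mixed types are coerced to the most general one:
# logical < integer < double (numeric) < character
print(class(c(TRUE, 2L)))    # "integer": TRUE becomes 1L
print(class(c(TRUE, 2.5)))   # "numeric"
print(class(c(1, "a")))      # "character"
print(c(TRUE, FALSE, 2))     # 1 0 2
```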

 

2. Vector indexing and comparison operators

1) Built-in constants

letters

>> [1] "a" "b" … "z"

LETTERS

>> [1] "A" "B" … "Z"

month.name

>> [1] "January" "February" … "December"

month.abb

>> [1] "Jan" "Feb" … "Dec"

 

2) Vector and Indexing

month.abb[1]                                          

>> [1] "Jan"

month.abb[1:3]

>> [1] "Jan" "Feb" "Mar"

month.abb[c(1, 3, 5)]

>> [1] "Jan" "Mar" "May"

month.abb[c(2, 1, 1, 3)]          # indices can repeat and come in any order

>> [1] "Feb" "Jan" "Jan" "Mar"

month.abb[c(-1, -3, -5, -12)]     # negative indices exclude elements

>> [1] "Feb" "Apr" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov"

month.abb[-c(1, 3, 5, 12)]

>> same as above

month.abb[-c(1:5)]

>> [1] "Jun" … "Dec"

month.abb[-1:5]     # -1:5 is c(-1, 0, 1, 2, 3, 4, 5): unary minus is applied before :

>> Error: only 0's may be mixed with negative subscripts

 

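month.abb[1:3][c(TRUE, FALSE, TRUE)]

>> [1] "Jan" "Mar"

month.abb[c(TRUE, FALSE, TRUE)]     # the logical index is recycled over all 12 elements

>> [1] "Jan" "Mar" "Apr" "Jun" "Jul" "Sep" "Oct" "Dec"

month.abb[1:3][c(1, 0, 1)]

>> [1] "Jan" "Jan"

   In other words, 0 and 1 act here as numeric positions, not as logical values: 1 selects the first element (twice), and 0 selects nothing.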
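The difference between logical and numeric indices shows up with a couple of extra cases. This sketch uses only `month.abb` and `as.logical()`.

```r
# A logical index shorter than the vector is recycled:
# c(TRUE, FALSE) keeps every other month
print(month.abb[c(TRUE, FALSE)])   # "Jan" "Mar" "May" "Jul" "Sep" "Nov"

# Index 0 selects nothing, so this returns an empty character vector
print(month.abb[0])                # character(0)

# Converting 1/0 to logical first switches to logical indexing
print(month.abb[1:3][as.logical(c(1, 0, 1))])   # "Jan" "Mar"
```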

 

3) Comparison/logical operators

&                   and

|                   or

>, <, >=, <=        greater than, less than, greater than or equal, less than or equal

!=, ==              not equal, equal

 

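month.abb == 'Feb' | month.abb == 'Jan'

>> [1] TRUE TRUE FALSE FALSE FALSE …

month.abb == 'Feb' | 'Jan'      # parsed as (month.abb == 'Feb') | 'Jan'

>> Error: operations are possible only for numeric, logical or complex types

month.abb != 'Feb' | month.abb != 'Jan'

>> [1] TRUE TRUE TRUE … (all TRUE: every month differs from at least one of the two)

month.abb != 'Feb' & month.abb != 'Jan'

>> [1] FALSE FALSE TRUE TRUE … (TRUE for every month except Jan and Feb)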

 

4) Indexing with comparison/logical operators

month.abb[month.abb == 'Feb']

>> [1] "Feb"

month.abb[month.abb == 'Feb' | month.abb == 'Jan']

>> [1] "Jan" "Feb"

month.abb['Jan']          # a character index is matched against element names; month.abb has none

>> [1] NA

month.abb[c('Jan', 'Mar')]

>> [1] NA NA

month.abb[month.abb[1:2]]

>> [1] NA NA
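Character indexing does work once the elements have names. The sketch below gives names to a copy, `m`, a name chosen for illustration, so the built-in constant is left untouched. It also shows `match()` as a way to turn values into positions.

```r
# Copy month.abb and name each element after itself
m <- month.abb
names(m) <- month.abb

# Now character indices are matched against the names
print(m[c('Jan', 'Mar')])

# Without names, match() converts values into positions first
print(month.abb[match(c('Jan', 'Mar'), month.abb)])   # "Jan" "Mar"
```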

 

3. Vectors and functions (1)

a <- 1:5

length(a)                                                                           

>> [1] 5

sum(a)                                                                                

>> [1] 15

mean(a)                                                                             

>> [1] 3

 

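1) The sample function

data <- 1:3

sample(data, size = 5, replace = T)     # random sampling with replacement; results differ on each run

>> [1] 2 3 1 1 3

sample(data, 5, T)

>> [1] 2 2 3 3 3

sample(data, 5, T, prob = c(0.2, 0.2, 0.8))     # prob gives sampling weights; they need not sum to 1

>> [1] 3 3 2 3 3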
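Random draws can be made reproducible, which helps when comparing results. This is a sketch using `set.seed()`; the seed value 42 is arbitrary. It also shows that sampling without replacement cannot draw more items than there are.

```r
data <- 1:3

# set.seed() fixes the random number generator, so the draws repeat exactly
set.seed(42)
s1 <- sample(data, size = 5, replace = TRUE)
set.seed(42)
s2 <- sample(data, size = 5, replace = TRUE)
print(identical(s1, s2))   # TRUE

# Without replacement, size cannot exceed length(data), so this is an error
err <- tryCatch(sample(data, 5), error = function(e) "error")
print(err)
```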

 

2) The str function

x <- sample(10)      # a random permutation of 1:10

x

>> [1] 7 3 6 2 9 10 5 4 1 8

y <- sample(letters, 10, replace = F)     # 10 distinct letters

y

>> [1] "l" "n" "v" … "g"

str(x)

>> int [1:10] 7 3 6 2 9 10 5 4 1 8

str(y)

>> chr [1:10] "l" "n" "v" … "g"

 

3) The rep function

rep(c(1, 2, 3), 4)

>> [1] 1 2 3 1 2 3 1 2 3 1 2 3

rep(sample(3), 4)      # sample(3) is drawn once, then the result is repeated

>> [1] 1 2 3 1 2 3 1 2 3 1 2 3

rep(sample(3), 4)

>> [1] 2 1 3 2 1 3 2 1 3 2 1 3

 

rep(c(1,2,3), times = 4)                               

>> [1] 1 2 3 1 2 3 1 2 3 1 2 3

rep(c(1,2,3), each = 4)                                

>> [1] 1 1 1 1 2 2 2 2 3 3 3 3

rep(1:3, 1:3)                                                     

>> [1] 1 2 2 3 3 3

rep(1:3, 1:2)      # times must have length 1 or the same length as the first argument

>> Error: invalid 'times' argument

rep(1:3, 3:1)                                                     

>> [1] 1 1 1 2 2 3

rep(1:3, c(2,4,6))                                           

>> [1] 1 1 2 2 2 2 3 3 3 3 3 3

 

rep(c(1,2,3), times = 1:3)

>> [1] 1 2 2 3 3 3

rep(c(1,2,3), each = 1:3)      # each uses only its first element (1)

>> [1] 1 2 3

     Warning message: first element used of 'each' argument
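Two more `rep()` arguments are worth knowing. The sketch below shows `length.out`, which stops at a target length, and `each` combined with `times`.

```r
# length.out repeats until the requested length is reached, even mid-cycle
print(rep(1:3, length.out = 7))            # 1 2 3 1 2 3 1

# each and times can be combined: each is applied first
print(rep(c(1, 2), each = 2, times = 2))   # 1 1 2 2 1 1 2 2

# Character values are repeated the same way
print(rep(c("a", "b"), times = c(3, 1)))   # "a" "a" "a" "b"
```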

 

4) The seq function

seq(1, 10)                                                          

>> [1] 1 2 3 4 5 6 7 8 9 10

seq(from = 1, to = 10)                                   

>> same as above

seq(1, 10, 1)                                                     

>> same as above

seq(from = 1, to = 10, by = 1)                    

>> same as above

seq(1, 10, 2)                                                     

>> [1] 1 3 5 7 9

seq(by = 2, to = 10, from = 3)                

>> [1] 3 5 7 9

seq(10, 2, 3)      # counting down requires a negative by

>> Error: wrong sign in 'by' argument

seq(10, -10, -2)                                             

>> [1] 10 8 6 4 2 0 -2 -4 -6 -8 -10

 

seq(1, 8, length = 5)      # length is partially matched to length.out

>> [1] 1.00 2.75 4.50 6.25 8.00

seq(1, 8, length.out = 5)                             

>> same as above

seq(1, by = 3, length = 5)                            

>> [1] 1 4 7 10 13

seq(1, by = 3, length.out = 5)                    

>> same as above

 

- Exercises

letters[rep(1:length(letters), times = 1:length(letters))]      # the i-th letter repeated i times

>> [1] "a" "b" "b" "c" "c" "c" …

letters[seq(1, length(letters), 2)]      # every other letter

>> [1] "a" "c" "e" …
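The exercises build indices with `1:length(letters)`. Base R's `seq_along()` produces the same positions and behaves better on an empty vector, as this sketch shows. The name `empty` is just for illustration.

```r
# seq_along(x) gives 1, 2, ..., length(x), like 1:length(x)
ex <- letters[rep(seq_along(letters), times = seq_along(letters))]
print(identical(ex, letters[rep(1:length(letters), times = 1:length(letters))]))  # TRUE
print(length(ex))   # 351 = 1 + 2 + ... + 26

# The difference shows up with an empty vector
empty <- character(0)
print(1:length(empty))    # 1 0: two bogus positions
print(seq_along(empty))   # integer(0): nothing to loop over
```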

 

4. Vectors and functions (2)

1) Type conversion

x <- 1:5

as.numeric(x)

>> [1] 1 2 3 4 5

 *** as.numeric() returns a converted copy; x itself stays integer unless you reassign it (x <- as.numeric(x))

class(as.numeric(x))

>> [1] "numeric"

str(as.numeric(x))

>> num [1:5] 1 2 3 4 5

as.character(x)

>> [1] "1" "2" "3" "4" "5"

class(as.character(x))

>> [1] "character"

str(as.character(x))

>> chr [1:5] "1" "2" "3" "4" "5"

 

y <- seq(1.5, 5, 1); y

>> [1] 1.5 2.5 3.5 4.5

as.integer(y)      # the fractional part is dropped (truncation toward zero)

>> [1] 1 2 3 4

 

z <- letters[1:5]; z

>> [1] "a" "b" "c" "d" "e"

as.numeric(z)

>> [1] NA NA NA NA NA
   Warning message: NAs introduced by coercion
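A few conversion edge cases follow from the examples above. This sketch uses made-up inputs; `suppressWarnings()` only hides the expected coercion warning.

```r
# as.integer() truncates toward zero; round() first to get the nearest integer
print(as.integer(c(-1.7, 1.7)))          # -1 1
print(as.integer(round(c(-1.7, 1.7))))   # -2 2

# Strings that look like numbers convert; others become NA
v <- suppressWarnings(as.numeric(c("1.5", "10", "abc")))
print(v)          # 1.5 10 NA
print(is.na(v))   # FALSE FALSE TRUE

# Logical to numeric: TRUE is 1, FALSE is 0
print(as.numeric(c(TRUE, FALSE)))   # 1 0
```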

 

2) The names function

x <- 1:3

names(x) <- c("one", "two", "three")

x

>>   one   two three
         1     2     3

class(x)

>> [1] "integer"

str(x)

>> Named int [1:3] 1 2 3
    - attr(*, "names")= chr [1:3] "one" "two" "three"

names(x)

>> [1] "one" "two" "three"

unname(x)

>> [1] 1 2 3

*** unname() returns an unnamed copy; x itself keeps its names

x[1]

>> one
     1

x[1:2]

>> one two
     1   2

x[c('one', 'three')]

>> one three
     1     3

x['one':'two']      # : only works with numbers, not with names

>> Error: NA/NaN argument
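Names can also be set at creation time and edited one at a time. In this sketch the vector `scores` and its values are hypothetical. `setNames()` is the base R function that returns a named copy.

```r
# Hypothetical example: names set when the vector is created
scores <- c(math = 90, english = 85, science = 78)
print(names(scores))

# Replace a single name in place
names(scores)[2] <- "eng"

# Named lookup works with any subset of names, in any order
print(scores[c("science", "math")])   # science 78, math 90

# setNames() returns a named copy in one call
s2 <- setNames(1:3, c("one", "two", "three"))
print(s2)
```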

 

3) print vs. cat

print(x)

>> one two three
     1   2     3

print(names(x))

>> [1] "one" "two" "three"

print(unname(x))

>> [1] 1 2 3

 

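cat(x, '\n')      # cat() prints only the values; names are dropped

>> 1 2 3

cat(names(x), '\n')

>> one two three

cat(unname(x), '\n')

>> 1 2 3

cat(as.vector(x), '\n')

>> 1 2 3

*** Without '\n', cat() does not move to the next line

a <- 1:3

b <- print(a)

>> [1] 1 2 3

c <- cat(a)

>> 1 2 3

d <- str(a)

>> int [1:3] 1 2 3

b

>> [1] 1 2 3

c

>> NULL

d

>> NULL

*** cat() and str() only display their input and return NULL, so c and d hold NULL. print() also returns its argument (invisibly), so b holds 1 2 3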
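The return values can be checked directly. This sketch uses `capture.output()` to collect what `cat()` writes, which makes the trailing space before `'\n'` visible. The variable names are illustrative.

```r
a <- 1:3

# print() shows the value and returns it invisibly, so it can be reused
b <- print(a)

# cat() only writes text and returns NULL
c_val <- cat("")

# capture.output() collects printed text as a character vector
out <- capture.output(cat(a, "\n"))
print(out)             # "1 2 3 " (note the space before the newline)

# sep controls what goes between the values
cat(a, sep = ", "); cat("\n")   # 1, 2, 3
```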

 

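4) The round function

x <- seq(3.4, 3.49, 0.01)

x

>> [1] 3.40 3.41 … 3.49

round(x, 1)

>> [1] 3.4 3.4 … 3.5 3.5

     *** Here the values from 3.46 up round to 3.5. R's round() follows the IEC 60559 (IEEE 754) standard, which rounds an exact tie to the even digit, and the result for a value like 3.45 also depends on how it is stored in binary.

round(seq(1.1, 1.19, 0.01), 1)

>> [1] 1.1 1.1 … 1.2 1.2

*** Because decimals such as x.x5 cannot be stored exactly in binary, rounding up sometimes starts at the 5th element and sometimes at the 6th.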
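The round-half-to-even rule is easiest to see on halves that binary *can* store exactly. This sketch also prints 0.15 with extra digits to show the value R actually works with. `signif()`, which rounds to significant digits, is included for contrast.

```r
# Exact halves (representable in binary) round to the nearest even digit
print(round(c(0.5, 1.5, 2.5, 3.5)))   # 0 2 2 4

# 0.15 is not stored exactly; more digits reveal the stored value
print(sprintf("%.20f", 0.15))
print(round(0.15, 1))

# signif() rounds to significant digits instead of decimal places
print(signif(123456, 2))   # 120000
```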

 

5) The which function

x <- 10:1

x == 4

>> [1] FALSE … TRUE FALSE FALSE FALSE

which(x == 4)      # position(s) of the TRUE values

>> [1] 7

which(x > 3 & x < 6)                                       

>> [1] 6 7

x[x > 3 & x < 6]                                         

>> [1] 5 4

x[which(x > 3 & x < 6)]                           

>> [1] 5 4

 

6) The length function

length(letters)

>> [1] 26

length(which(letters == "a" | letters == "b"))

>> [1] 2

length(letters == 'a' | letters == 'b')      # length of the whole logical vector, not the number of TRUEs

>> [1] 26

length(letters != 'a' & letters != 'b')

>> [1] 26

 

7) The sum function

x <- 10:1

sum(x)

>> [1] 55

sum(x == 4)

>> [1] 1

sum(x > 8 | x < 3)

>> [1] 4                 *** counts the TRUE values (TRUE counts as 1, FALSE as 0)
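Because TRUE counts as 1, other summary functions work on logical vectors too. This sketch reuses `x <- 10:1` and adds `which.max()`/`which.min()`, which return the position of the extreme value.

```r
x <- 10:1

# sum() of a logical vector counts TRUEs; mean() gives their proportion
print(sum(x > 8 | x < 3))    # 4
print(mean(x > 8 | x < 3))   # 0.4

# which.max()/which.min() give the position of the largest/smallest value
print(which.max(x))   # 1
print(which.min(x))   # 10
```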

 

8) The table function

table(x)                   # how many times each value occurs

>> x
     1  2  3  4  5  6  7  8  9 10
     1  1  1  1  1  1  1  1  1  1

class(table(x))

>> [1] "table"

table(x == 4)

>> FALSE  TRUE
        9     1

table(x > 8 | x < 3)

>> FALSE  TRUE
        6     4
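`table()` is most useful on repeated values. In this sketch the `colors` vector is made up. It shows how to read a single count and find the most frequent value.

```r
colors <- c("red", "green", "red", "blue", "red", "green")
counts <- table(colors)

print(counts)                    # blue 1, green 2, red 3 (names sorted alphabetically)
print(counts[["red"]])           # 3: one count looked up by name
print(names(which.max(counts)))  # "red": the most frequent value
print(sort(counts, decreasing = TRUE))
```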

 

9) Editing values

x <- 10:1

x[which(x > 8)] <- NA

x                                                                             

>> [1] NA NA 8 7 6 5 4 3 2 1

x[x < 3] <- NA

x                                                                             

>> [1] NA NA 8 7 6 5 4 3 NA NA
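Once values are replaced with NA, they need special handling. This sketch repeats the two edits above, then uses `is.na()` and `na.rm = TRUE`. Note that `x == NA` cannot find missing values, because any comparison with NA is NA.

```r
x <- 10:1
x[x > 8] <- NA
x[x < 3] <- NA

print(is.na(x))                # TRUE where a value is missing
print(sum(is.na(x)))           # 4 missing values
print(mean(x))                 # NA: one NA makes the result NA
print(mean(x, na.rm = TRUE))   # mean of 8, 7, ..., 3

# Positions of the missing values
print(which(is.na(x)))         # 1 2 9 10
```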

 

10) Value Matching

x <- c("a", "b", "c", "d")

y <- c("g", "x", "d", "e", "f", "a", "c")

match(x, y)      # position of each element of x in y; NA if absent

>> [1] 6  NA  7  3

x %in% y         # TRUE if the element of x occurs anywhere in y

>> [1] TRUE FALSE TRUE TRUE

x <- c("a", "b", "c", "d")

y <- c("g", "a", "d", "e", "c", "a", "c")

match(x, y)                                                       

>> [1] 2 NA 5 3

x %in% y                                                             

>> [1] TRUE FALSE TRUE TRUE

which(y %in% x)                                             

>> [1] 2 3 5 6 7
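Matching has a few more variants. This sketch uses the second `x`/`y` pair above. It shows the `nomatch` argument of `match()`, and that `match()` reports only the first hit while `which()` finds them all.

```r
x <- c("a", "b", "c", "d")
y <- c("g", "a", "d", "e", "c", "a", "c")

# nomatch sets the value returned for elements of x that are absent from y
print(match(x, y, nomatch = 0))   # 2 0 5 3

# match() returns only the first position; which() finds every position
print(which(y == "a"))            # 2 6

# Keep only the elements of x that occur in y
print(x[x %in% y])                # "a" "c" "d"
```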

 

11) Set functions

unique(x)

>> [1] "a" "b" "c" "d"

unique(y)

>> [1] "g" "a" "d" "e" "c"

union(x, y)

>> [1] "a" "b" "c" "d" "g" "e"

union(y, x)

>> [1] "g" "a" "d" "e" "c" "b"

intersect(x, y)

>> [1] "a" "c" "d"

intersect(y, x)

>> [1] "a" "d" "c"

setdiff(x, y)

>> [1] "b"

setdiff(y, x)

>> [1] "g" "e"

 

x <- 1:10

any(x > 8)                                                           

>> [1] TRUE

any(x > 10)                                                        

>> [1] FALSE

all(x > 8)                                                             

>> [1] FALSE

all(x > 0)                                                             

>> [1] TRUE

 

12) Sorting vectors

x <- c("a", "b", "c", "d")

y <- c("g", "a", "d", "e", "c", "a", "c")

sort(x)

>> [1] "a" "b" "c" "d"

sort(x, decreasing = T)

>> [1] "d" "c" "b" "a"

sort(y)

>> [1] "a" "a" "c" "c" "d" "e" "g"

order(x)      # the positions that would put the vector in sorted order

>> [1] 1 2 3 4

order(x, decreasing = T)                             

>> [1] 4 3 2 1

order(y)                                                              

>> [1] 2 6 5 7 3 4 1

order(y, decreasing = T)                             

>> [1] 1 4 3 5 7 2 6
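`order()` is useful when one vector should be sorted by another. In this sketch the student names and scores are hypothetical.

```r
# Hypothetical data: sort names by the matching scores
student <- c("Kim", "Lee", "Park")
score <- c(82, 95, 70)

idx <- order(score, decreasing = TRUE)
print(idx)            # 2 1 3
print(student[idx])   # "Lee" "Kim" "Park"

# x[order(x)] gives the same result as sort(x)
y <- c("g", "a", "d", "e", "c", "a", "c")
print(identical(y[order(y)], sort(y)))   # TRUE
```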

 

5. Loading text files

1) Reading from the clipboard

Open the text file in Notepad or another editor, press Ctrl + A (select all) and Ctrl + C, then run:

TEXT <- scan(file = 'clipboard', what = 'char', quote = NULL)

 

2) Reading by file name

First set the working directory from the menu (File > Change dir...) or with setwd(), then run:

TEXT <- scan(file = '03_WhatIsR.txt', what = 'char', quote = NULL)

 

3) Choosing a file in the file dialog

TEXT <- scan(file = file.choose(), what = 'char', quote = NULL)

>> Read 486 items

 

6. Viewing the loaded data

1) Viewing the first/last elements of a vector

head(TEXT); tail(TEXT)

>> 6 elements each (the default)

head(TEXT, 10)

tail(TEXT, 10)

 

2) Searching and extracting with conditions

TEXT[TEXT == 'a']

>> [1] "a" "a" "a" …

length(TEXT[TEXT == "the"])      # number of occurrences of "the"

>> [1] 14
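Counting one word at a time extends naturally to counting all of them. In this sketch `TEXT` is a short hypothetical stand-in for the vector loaded with `scan()`. `table()` plus `sort()` gives a simple word-frequency list.

```r
# Hypothetical stand-in for the TEXT vector loaded with scan()
TEXT <- c("R", "is", "a", "language", "and", "an", "environment", "R", "is", "free")

# table() counts each word; sort() puts the most frequent first
freq <- sort(table(TEXT), decreasing = TRUE)
print(head(freq, 3))

# Counting one word: sum() over a logical vector
print(sum(TEXT == "R"))   # 2
```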

 

3) Editing values

TEXT[TEXT == 'an'] <- 'a'      # replace every "an" with "a"

TEXT[TEXT == 'an']

>> character(0)      (no "an" remains)

 

4) Saving the vector to a file

cat(TEXT, file = "vector.txt", sep = '\n')      # one element per line
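A quick way to check the saved file is to read it back. This sketch uses a small hypothetical `TEXT` vector and writes to a `tempfile()`, so no real file is overwritten. It then reads the file with `scan()` and with `readLines()`.

```r
# Hypothetical small vector; tempfile() gives a throwaway path
TEXT <- c("R", "is", "a", "language")
path <- tempfile(fileext = ".txt")

cat(TEXT, file = path, sep = "\n")   # one element per line

# Read it back as character data
back <- scan(file = path, what = "character", quiet = TRUE)
print(identical(back, TEXT))   # TRUE

# readLines() reads whole lines; warn = FALSE skips the missing-final-newline warning
print(readLines(path, warn = FALSE))
```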
