
[LIKELION] AI SCHOOL 5th Cohort_ Day 16

by hk713 2022. 3. 29.

NumPy

NumPy (Numerical Python) is an open-source Python library used in almost every field of science and engineering.

The NumPy library provides multidimensional array and matrix data structures.

import numpy as np  # conventionally abbreviated to np

# Create an array
v1 = np.array([1, 2, 3, 4, 5])

# Want to know the data type of the items inside the array?
v1.dtype

A numpy array cannot hold items of different data types: every element shares a single dtype!! (This is where it differs from a list.)
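To see what "one dtype per array" means in practice, here is a minimal sketch. The sample values are made up for illustration and are not from the class material. When the inputs are mixed, NumPy converts them all to one common type rather than raising an error.

```python
import numpy as np

# Hypothetical sample inputs, chosen only to show the type conversion
ints_and_float = np.array([1, 2, 3.5])     # the ints are upcast to float
ints_and_str = np.array([1, 2, "three"])   # everything becomes a string
py_list = [1, 2.5, "three"]                # a list keeps each item's own type

print(ints_and_float.dtype)                 # float64
print(ints_and_str.dtype.kind)              # U (a Unicode string dtype)
print([type(x).__name__ for x in py_list])  # ['int', 'float', 'str']
```

So "cannot hold different types" really means "silently converts to one type", which is worth keeping in mind when numbers unexpectedly turn into strings.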

# Create a matrix
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Check the shape
matrix.shape

# Entire 2nd column - [row, column]
matrix[:, 1]  # result - array([2, 5, 8])

# Entire 3rd row
matrix[2]  # result - array([7, 8, 9])
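A small follow-up sketch on the indexing above, using the same matrix. It shows how [row, column] picks a single item, and how an integer index drops a dimension while a slice keeps it. The variable names col and col_2d are only for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(matrix.shape)   # (3, 3): 3 rows, 3 columns
print(matrix[1, 2])   # 6: row index 1, column index 2 (both start at 0)

col = matrix[:, 1]       # an integer index drops that axis
col_2d = matrix[:, 1:2]  # a slice keeps the axis
print(col.shape)      # (3,)
print(col_2d.shape)   # (3, 1)
```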

arange ('a'rray + 'range') creates an array that covers the given range.

# arange can be used just like range
v2 = np.arange(1, 10, 3)
v2  # result - array([1, 4, 7])

# the array it returns supports element-wise operations in one go
v3 = np.arange(1, 10, 3) ** 2
v3  # result - array([ 1, 16, 49])

# reshape - changes the shape
v = np.arange(12)

# Change to 2 rows and 6 columns
v4 = v.reshape(2, 6)

# Fix the number of rows at 3 and let NumPy work out the rest
v5 = v.reshape(3, -1)

# Fill column by column instead of row by row
v6 = v.reshape(2, 6, order="F")
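The difference between the default order and order="F" is easier to see side by side. This is a minimal sketch with the same v as above; the names row_major and col_major are just labels chosen for this example.

```python
import numpy as np

v = np.arange(12)
row_major = v.reshape(2, 6)             # default order="C": fill row by row
col_major = v.reshape(2, 6, order="F")  # fill column by column

print(row_major)
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]]
print(col_major)
# [[ 0  2  4  6  8 10]
#  [ 1  3  5  7  9 11]]
print(v.reshape(3, -1).shape)  # (3, 4): -1 is inferred as 12 / 3
```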

Method chaining is a programming pattern in which a method returns an object and you call another method directly on that returned object. Since arange returns an array, method chaining works on its result too.

np.arange(12).reshape(2,6).reshape(4,-1)
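To check what the chain actually does, it can be unrolled into one step per line. A rough sketch, where step1 to step3 are hypothetical names for the intermediate results:

```python
import numpy as np

# The same work, one step at a time
step1 = np.arange(12)         # 12 items, shape (12,)
step2 = step1.reshape(2, 6)   # shape (2, 6)
step3 = step2.reshape(4, -1)  # shape (4, 3), -1 is inferred as 3

# The chained version
chained = np.arange(12).reshape(2, 6).reshape(4, -1)

print(chained.shape)                   # (4, 3)
print(np.array_equal(chained, step3))  # True
```

The chained form saves the intermediate variables; the step-by-step form is handier when you want to inspect a shape along the way.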

Basic arithmetic operations can be applied to numpy arrays

v1 = np.arange(1, 5).reshape(2, 2)

# Maximum
np.max(v1)  # 4

# Minimum
np.min(v1)  # 1

# Mean
np.mean(v1) # 2.5

# Standard deviation
np.std(v1)  # 1.118033988749895

# Addition
np.add(v1, v1)

# Subtraction
np.subtract(v1, v1)

# Multiplication (element-wise)
np.multiply(v1, v1)

# Matrix multiplication (dot product)
np.dot(v1, v1)
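The notes above do not show the results, so here is a short sketch that prints them for the same v1. It also shows that the usual operators give the same answers as the function calls, and that element-wise multiplication and matrix multiplication are different things.

```python
import numpy as np

v1 = np.arange(1, 5).reshape(2, 2)  # [[1, 2], [3, 4]]

print(np.add(v1, v1).tolist())       # [[2, 4], [6, 8]]
print(np.subtract(v1, v1).tolist())  # [[0, 0], [0, 0]]
print(np.multiply(v1, v1).tolist())  # [[1, 4], [9, 16]]    element-wise
print(np.dot(v1, v1).tolist())       # [[7, 10], [15, 22]]  matrix product

# The operators are equivalent to the functions above
print(np.array_equal(v1 + v1, np.add(v1, v1)))       # True
print(np.array_equal(v1 * v1, np.multiply(v1, v1)))  # True
print(np.array_equal(v1 @ v1, np.dot(v1, v1)))       # True
```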

 

Pandas

Pandas is a Python library for data processing.

To work with data, it provides the Series class and the DataFrame class.

The Series class is similar to the one-dimensional array that NumPy provides, but you can attach an index that labels what each piece of data means.

The data itself is called the value. In other words, Series = values + index.

import pandas as pd # conventionally abbreviated to pd

a = pd.Series([1, 3, 5, 7])

a.values  # result - array([1, 3, 5, 7])

a.index   # result - RangeIndex(start=0, stop=4, step=1)
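The example above uses the default 0, 1, 2, ... index. To illustrate the "index that labels each value" idea, here is a minimal sketch with a custom index. The names and scores are made-up sample data, not from the class.

```python
import pandas as pd

# Hypothetical sample data: one score per person
scores = pd.Series([90, 85, 77], index=["kim", "lee", "park"])

print(scores["lee"])           # 85: look a value up by its label
print(scores.index.tolist())   # ['kim', 'lee', 'park']
print(scores.values.tolist())  # [90, 85, 77]
```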

 

4 ways to get the value of a single cell from a DataFrame

1️⃣ df['column name'][index]

2️⃣ df.loc[index]['column name']

3️⃣ df.at[index, 'column name'] - the most recommended way

4️⃣ df.iloc[row, column] (both given as integer positions)
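The four approaches above can be tried on a tiny DataFrame. This is only a sketch: the column names, index labels, and values are hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample data with a labelled index
df = pd.DataFrame(
    {"name": ["Kim", "Lee", "Park"], "score": [90, 85, 77]},
    index=["a", "b", "c"],
)

print(df["score"]["b"])      # 85: pick the column, then the row label
print(df.loc["b"]["score"])  # 85: pick the row, then the column
print(df.at["b", "score"])   # 85: one lookup by row and column label
print(df.iloc[1, 1])         # 85: row position 1, column position 1
```

The first two forms do the lookup in two steps, while df.at does it in a single call, which is one reason it is a good default for reading or setting one cell.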

 


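In Pandas, once you have counted the items in each category, you can plot the result right away without calling matplotlib yourself (pandas draws the chart through matplotlib behind the scenes by default, so it still has to be installed).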
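A minimal sketch of that count-then-plot flow, assuming value_counts() is the counting step meant here. The colors Series is made-up sample data, and the try/except is only there so the snippet still runs where matplotlib is missing.

```python
import pandas as pd

# Hypothetical sample data
colors = pd.Series(["red", "blue", "red", "green", "blue", "red"])

# Count how many times each item appears (sorted from most to least frequent)
counts = colors.value_counts()
print(counts["red"], counts["blue"], counts["green"])  # 3 2 1

try:
    # No explicit matplotlib import needed: pandas calls it internally
    counts.plot(kind="bar")
except ImportError:
    print("matplotlib is not installed, so the plot is skipped")
```

In a notebook the bar chart shows up directly under the cell; in a plain script you would still need to save or show the figure.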

 
