
TIL🔥 / Likelion AI School 5th cohort (20 posts)

[Likelion] AI SCHOOL 5th cohort_ Day 17
Descriptive statistics analysis
These are the descriptive statistics functions of a DataFrame.

    import pandas as pd

    # Load the CSV file of cosmetics purchase data into a DataFrame
    df = pd.read_csv('cosmetices_csv')

    # All of the following use the 'average cost per purchase' column
    # Maximum
    df['amount'].max()
    # Minimum
    df['amount'].min()
    # Sum
    df['amount'].sum()
    # Mean
    df['amount'].mean()
    # Variance
    df['amount'].var()
    # Standard deviation
    df['amount'].std()

Descriptive statistics are often reported together with skewness and kurtosis. Skewness: how far the distribution leans to the left or right. The closer it is to 0, the more symmetric the distribution, which is consistent with assuming a normal distribution..
2022. 3. 30.

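The excerpt above stops before it shows how skewness and kurtosis are computed. A minimal sketch, assuming pandas is installed and using made-up purchase amounts instead of the post's CSV file (the column name `amount` comes from the post; the values and variable names are hypothetical):

```python
import pandas as pd

# Hypothetical purchase amounts; the original post reads them from a CSV file.
symmetric = pd.DataFrame({"amount": [10, 20, 30, 40, 50]})
right_skewed = pd.DataFrame({"amount": [10, 12, 11, 13, 100]})

summary = {
    "max": symmetric["amount"].max(),
    "min": symmetric["amount"].min(),
    "sum": symmetric["amount"].sum(),
    "mean": symmetric["amount"].mean(),
    "var": symmetric["amount"].var(),    # sample variance (ddof=1)
    "std": symmetric["amount"].std(),    # sample standard deviation (ddof=1)
    "skew": symmetric["amount"].skew(),  # sample skewness
    "kurt": symmetric["amount"].kurt(),  # excess kurtosis (normal distribution -> 0)
}

# A single large purchase pulls the right tail out, so skewness turns positive.
skew_with_outlier = right_skewed["amount"].skew()

print(summary)
print(skew_with_outlier)
```

Skewness near 0 only says the data are roughly symmetric; kurtosis, a histogram, or a normality test is still worth checking before assuming a normal distribution.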
[Likelion] AI SCHOOL 5th cohort_ Day 16
NumPy
NumPy (Numerical Python) is an open-source Python library used in almost every field of science and engineering. The NumPy library provides multidimensional array and matrix data structures.

    import numpy as np  # conventionally abbreviated to np

    # Create an array
    v1 = np.array([1, 2, 3, 4, 5])

    # Check the data type of the items in the array
    v1.dtype

A NumPy array cannot hold items of different data types (unlike a list).

    # Create a matrix
    matrix = np.array([[1, 2, 3],
                       [4, 5, 6],
                       [7, 8, 9]])

    # Check the shape
    matrix.shape

    # The entire 2nd column - [row,..
2022. 3. 29.

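The excerpt is cut off just as it reaches `[row, column]` indexing. A short sketch of what that indexing usually looks like, assuming NumPy is installed; the variable names `mixed`, `second_column` and `second_row` are illustrative and not from the post:

```python
import numpy as np

v1 = np.array([1, 2, 3, 4, 5])

# Mixing ints and a float does not raise an error: NumPy converts every
# item to one common type, so the array still holds a single dtype.
mixed = np.array([1, 2.5, 3])

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Indexing is [row, column], zero-based; ":" selects everything on that axis.
second_column = matrix[:, 1]  # every row, column index 1
second_row = matrix[1, :]     # row index 1, every column

print(v1.dtype, mixed.dtype, matrix.shape)
print(second_column, second_row)
```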
[Likelion] AI SCHOOL 5th cohort_ Day 15
Web scraping Tips

1️⃣ Exception handling
When crawling to collect a large amount of data, an unexpected error can abort the run, so that the data gathered so far is lost or the data becomes inconsistent from some point onward. The try/except statement is there for exactly that case.

    try:
        # code to attempt
    except:
        # code to run when an error occurs

2️⃣ time.sleep()
If a crawler sends requests to a server without any pause, the server may recognize it as a bot and block it. Use time.sleep to pause briefly between requests.

3️⃣ pickle
Python provides the pickle module for saving objects to a file. Almost anything can be saved as a pickle and loaded back..
2022. 3. 28.

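The three tips can be combined into one loop. A minimal sketch using only the standard library; `fetch_page`, `crawl`, the simulated failure on page 3, and the checkpoint file name are all hypothetical stand-ins, not code from the post:

```python
import os
import pickle
import tempfile
import time


def fetch_page(page_number):
    """Hypothetical stand-in for a real request; page 3 fails on purpose."""
    if page_number == 3:
        raise ValueError("simulated scraping failure")
    return {"page": page_number, "text": f"content {page_number}"}


def crawl(page_numbers, delay_seconds=0.01):
    """Collect pages, record failures instead of aborting, pause between requests."""
    collected, failed = [], []
    for page_number in page_numbers:
        try:
            collected.append(fetch_page(page_number))
        except Exception as error:
            # Keep going so one bad page does not discard the earlier results.
            failed.append((page_number, str(error)))
        time.sleep(delay_seconds)  # be gentle with the server
    return collected, failed


collected, failed = crawl(range(1, 6))

# Save the results with pickle so they survive a crash or restart.
checkpoint_path = os.path.join(tempfile.gettempdir(), "crawl_checkpoint.pkl")
with open(checkpoint_path, "wb") as f:
    pickle.dump(collected, f)

with open(checkpoint_path, "rb") as f:
    restored = pickle.load(f)

print(len(restored), failed)
```

Catching `Exception` instead of using a bare `except:` leaves Ctrl+C (`KeyboardInterrupt`) working, which matters for long crawls. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.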
[Likelion] AI SCHOOL 5th cohort_ Day 12
The process of data analysis for text data
Tokenize → POS Tagging → Stopword removal → Build a vocabulary → Vocabulary-based data visualization → Apply machine learning/deep learning models

NLTK
NLTK stands for Natural Language Toolkit, a Python package for natural language processing and document analysis. (Natural language is the language people use in everyday life.) For analysis, a long string has to be split into smaller units; such a unit is called a token, and the splitting step is called tokenizing. word_tokenize() tokenizes a sentence.

Part of speech (POS)
pos_tag() takes a tokenized sentence and ..
2022. 3. 25.

[Likelion] AI SCHOOL 5th cohort_ Day 11
Data visualization

GeoJSON
GeoJSON is a geospatial data interchange format based on JSON (JavaScript Object Notation). It uses the World Geodetic System 1984 (WGS 84) as its geographic coordinate reference system.
https://geojson.org/
GeoJSON is a format for encoding a variety of geographic data structures.

    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [125.6, 10.1]
      },
      "properties": {
        "name": "Dinagat Islands"
      }
    }

GeoJSON supp..
2022. 3. 24.

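To make the structure of the sample Feature concrete, here is a small sketch that builds it with Python's standard `json` module, wraps it in a FeatureCollection, and reads the coordinates back. The variable names are illustrative; the Feature itself is the one quoted from geojson.org.

```python
import json

# The sample Feature from geojson.org, written as a Python dict.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [125.6, 10.1]},
    "properties": {"name": "Dinagat Islands"},
}

# Several features are usually shipped together as a FeatureCollection.
feature_collection = {"type": "FeatureCollection", "features": [feature]}

encoded = json.dumps(feature_collection)  # dict -> GeoJSON text
decoded = json.loads(encoded)             # GeoJSON text -> dict

# GeoJSON positions are ordered [longitude, latitude], not [latitude, longitude].
longitude, latitude = decoded["features"][0]["geometry"]["coordinates"]
name = decoded["features"][0]["properties"]["name"]

print(name, longitude, latitude)
```

The coordinate order is a common source of bugs when GeoJSON is passed to mapping libraries that expect latitude first.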
[Likelion] AI SCHOOL 5th cohort_ Day 10
Pandas

Setting the index
To use a DataFrame column as the index, call df.set_index('column name', inplace=True). To reset an index that was set earlier, call df.reset_index(inplace=True). If the column you want to use as the index contains duplicate values, pivot_table is recommended instead.

    pd.pivot_table(dataframe, index='column name', aggfunc=np.sum)

Filling missing values (N/A)
To fill NaN (Not a Number) missing values with a specific value, use df.fillna('desired value').

Conditional search

    df[ (df['column name 1'] 100) ]

Two or more..
2022. 3. 23.
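The calls in this excerpt can be tried on a small table. A minimal sketch, assuming pandas and NumPy are installed; the DataFrame contents, column names and variable names are hypothetical. The comparison operators in the excerpt's conditional search were lost, so the `>` and `<` used here are assumptions chosen only to show how two conditions are combined:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a duplicated key ("Seoul") and one missing value.
df = pd.DataFrame({
    "city": ["Seoul", "Busan", "Seoul", "Daegu"],
    "sales": [120.0, 80.0, np.nan, 150.0],
    "visits": [30, 45, 10, 60],
})

# Fill missing values with a chosen value (0 here).
filled = df.fillna(0)

# "city" has duplicates, so aggregate per city instead of calling set_index.
by_city = pd.pivot_table(filled, index="city", aggfunc="sum")

# Combine conditions with & (and) or | (or); each condition needs parentheses.
selected = filled[(filled["sales"] > 100) & (filled["visits"] < 50)]

# set_index / reset_index without inplace=True return new DataFrames.
indexed = filled.set_index("visits")
restored = indexed.reset_index()

print(by_city)
print(selected)
```

Passing the string `"sum"` to `aggfunc` does the same aggregation as `np.sum` and avoids a deprecation warning in recent pandas versions.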