Python / Requests and Beautiful Soup (Web Scraping)

YUYA 2021. 1. 30. 16:04

Course followed

nomadcoders.co/python-for-beginners/lobby

Requests and Beautiful Soup

Both are libraries/packages used to scrape the parts you want from a website.

1) Requests

- A Python HTTP library that makes sending HTTP requests simple.

- Today's use : fetching the HTML from the URL of the web page we want to scrape

- Docs : requests.readthedocs.io/en/master/
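As a rough sketch of how Requests can be used for this, the snippet below builds the search URL from a parameter dictionary and wraps the download in a small function. The helper names build_search_url and fetch_html are made up for illustration and are not part of Requests; the timeout value is also just an assumption. The URL is built without touching the network, so the last lines are safe to run offline.

```python
import requests


def build_search_url(query, start=0):
    """Return the URL Requests would send, without making a network call."""
    request = requests.Request(
        "GET",
        "https://www.indeed.com/jobs",
        params={"q": query, "start": start},
    )
    return request.prepare().url


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as a string (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
    return response.text


search_url = build_search_url("python")
print(search_url)
```

Passing the query as params lets Requests take care of URL encoding, which is probably easier to maintain than pasting the query string together by hand.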

2) Beautiful Soup

- A Python package for parsing HTML and XML documents.

- Today's use : extracting the information we need from the HTML fetched with Requests

- Docs : www.crummy.com/software/BeautifulSoup/bs4/doc/
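To give a feel for the extraction step, here is a minimal sketch that parses a small HTML string and pulls out titles and links. The markup, the class names "job" and "ad", and the paths are all invented for this example; a real page would need selectors matched to its own structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page fetched with Requests.
sample_html = """
<ul>
  <li class="job"><a href="/jobs/1">Backend Engineer</a></li>
  <li class="job"><a href="/jobs/2">Data Analyst</a></li>
  <li class="ad"><a href="/promo">Sponsored</a></li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

jobs = []
# find_all returns every matching tag; class_ filters on the CSS class.
for item in soup.find_all("li", class_="job"):
    link = item.find("a")
    jobs.append({"title": link.get_text(strip=True), "href": link["href"]})

print(jobs)
```

The "ad" item is skipped because only the li tags with class "job" are selected.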

3) Looking at the difference in code

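import requests
from bs4 import BeautifulSoup

# Fetch the first page of Indeed search results for "python".
indeed_result = requests.get("https://www.indeed.com/jobs?q=python&start=0")

# Uncomment to check the HTTP status; 200 means the request succeeded.
# print(indeed_result.status_code)

# Parse the HTML string into a BeautifulSoup object.
indeed_soup = BeautifulSoup(indeed_result.text, "html.parser")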

- Import requests and BeautifulSoup.

- The get method of requests fetches the HTML source of the URL you pass in.

- To check that the request worked, print the commented-out status_code.

If everything is working, it prints 200.

- The fetched source is read as text. (indeed_result.text)

The result is just a string, so it is not yet an object structure that Python can work with.

- BeautifulSoup parses the HTML source fetched by requests into a form Python can work with.
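The difference between the raw string and the parsed object can be seen in a small sketch like the one below. The html_text value is a hypothetical stand-in for indeed_result.text, not real Indeed markup, so the example runs without a network connection.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for indeed_result.text.
html_text = "<html><body><h2 class='title'>Python Developer</h2></body></html>"

# As a plain string, only string operations are available.
print(type(html_text).__name__)   # str
print(html_text.find("<h2"))      # a character offset, not an element

# After parsing, the same markup becomes a tree that can be searched by tag.
soup = BeautifulSoup(html_text, "html.parser")
heading = soup.find("h2")
print(type(soup).__name__)        # BeautifulSoup
print(heading.get_text())         # Python Developer
print(heading["class"])           # ['title']
```

Note that the class attribute comes back as a list, since an HTML element can carry several classes.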

Reference

beomi.github.io/gb-crawling/posts/2017-01-20-HowToMakeWebCrawler.html
