[Python] ML - Unsupervised text classification for word labeling / topic modeling in Python (sharing my word-labeling notes!)



IT파스칼 2023. 10. 18. 13:24

I put this together as reference material for an assignment. The goal is to group a very large set of words.

Documents on similar topics use similar sets of words.
Groups are formed by finding sets of words that frequently appear together in documents. (I will use an existing implementation rather than writing one.)
The user must provide the value of K, i.e., the number of topics to extract from the document collection.
Each document is modeled as a probability distribution over topics.
Each topic is modeled as a probability distribution over the words used in the documents.

The method is explained well here, and it is easy to follow.

Once you set the number of groups, similar words get grouped together.

Each group of similar words is then treated as a category (a topic).

The topics found above then have to be named by hand, one by one. (Step 1)

I still need to look into whether this naming step can be automated.
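One naive way to automate the naming, which is my own assumption and not a method from the linked page, is to name each topic after its few highest-weight words. `auto_topic_names` is a hypothetical helper and the weight matrix below is made up; in practice you would pass `lda.components_` or `nmf.components_` together with the vectorizer's vocabulary.

```python
# Naive automatic topic naming: join each topic's top words (a heuristic, not a standard method).
import numpy as np

def auto_topic_names(components, vocab, n_words=3):
    """Return {topic index: name} where the name joins the topic's n_words highest-weight words."""
    vocab = np.asarray(vocab)
    names = {}
    for k, row in enumerate(components):
        top = vocab[np.argsort(row)[::-1][:n_words]]  # indices sorted by descending weight
        names[k] = "_".join(top)
    return names

# Hypothetical topic-word weights: 2 topics x 6 words.
vocab = ["cat", "dog", "market", "pet", "shares", "stock"]
components = np.array([
    [3.0, 2.5, 0.1, 2.0, 0.0, 0.2],
    [0.1, 0.2, 3.0, 0.05, 1.5, 2.8],
])

names = auto_topic_names(components, vocab)
print(names)  # {0: 'cat_dog_pet', 1: 'market_stock_shares'}
```

The names are only as readable as the top words themselves, so a quick human check is probably still needed.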

This model is useful when you have a large number of words!

It seems to be used once the vocabulary goes past roughly 10,000 words (?).

Preview of the results from the model above ▼

Unsupervised Text Classification In Python - Home

Unsupervised text classification in Python using LDA (Latent Dirichlet Allocation) and NMF (Non-negative Matrix Factorization)

www.herevego.com
