How do I compute document similarity using Python?
This presentation combines video with Python code. It was written by Dr. Jonathan Mugan, who specializes in artificial intelligence and machine learning.
How do I find documents similar to a particular document?
We will use a library in Python called gensim.
Let’s create some documents.
We will use NLTK to tokenize.
A document will now be a list of tokens.
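As a minimal sketch of the tokenization step, the snippet below turns a few raw strings into lists of tokens. The sample sentences are made up for illustration, and `wordpunct_tokenize` is an assumption chosen because it needs no downloaded NLTK data; the presentation may well use a different NLTK tokenizer such as `word_tokenize`.

```python
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample documents, invented for this illustration.
raw_documents = [
    "I like machine learning.",
    "Machine learning is fun.",
    "Python is a programming language.",
]

# Lowercase each document and split it into word and punctuation tokens.
tokenized_documents = [
    wordpunct_tokenize(text.lower()) for text in raw_documents
]

for tokens in tokenized_documents:
    print(tokens)
```

Each entry of `tokenized_documents` is now a list of strings, which is the form the later steps expect.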
We will create a dictionary from a list of documents.
A dictionary maps every word to a number.
What you will find in the full presentation:
- Create corpus
- Create tf-idf model
- Similarity measure object
- Convert query document
- Similar documents
- Exercises