
## Machine Learning Guide and Tutorial for Software Engineers


What is it?

This is my multi-month study plan for going from mobile developer (self-taught, no CS degree) to machine learning engineer. My main goal was to find an approach to studying machine learning that is mainly hands-on and abstracts away most of the math for the beginner. This approach is unconventional because it is a top-down, results-first approach designed for software engineers.

Please feel free to make any contributions you feel will make it better.

Top DSC Resources

## 6 Easy Steps to Learn Naive Bayes Algorithm (with code in Python)


## Introduction

Here’s a situation you might find yourself in:

You are working on a classification problem and you have generated your set of hypotheses, created features, and discussed the importance of variables. Within an hour, stakeholders want to see the first cut of the model.

What will you do? You have hundreds of thousands of data points and quite a few variables in your training data set. In such a situation, if I were in your place, I would use Naive Bayes, which can be extremely fast relative to other classification algorithms. It works on Bayes’ theorem of probability to predict the class of an unknown data set.

In this article, I’ll explain the basics of this algorithm, so that the next time you come across large data sets, you can bring this algorithm into action. In addition, if you are new to Python, the ready-to-use code in this article should help you follow along.

1. What is the Naive Bayes algorithm?
2. How does the Naive Bayes algorithm work?
3. What are the pros and cons of using Naive Bayes?
4. 4 applications of the Naive Bayes algorithm
5. Steps to build a basic Naive Bayes model in Python
6. Tips to improve the power of a Naive Bayes model
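As a concrete starting point for step 5, here is a minimal sketch of a Naive Bayes classifier in Python. It uses scikit-learn's `GaussianNB` on a toy two-feature data set; the library choice and the data are illustrative assumptions, not the article's own code:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy training data: two well-separated clusters with binary labels.
X = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.0],
              [6.0, 6.2], [5.8, 6.1], [6.1, 5.9]])
y = np.array([0, 0, 0, 1, 1, 1])

# Fit a Gaussian Naive Bayes model: it learns per-class feature
# means and variances, then applies Bayes' theorem at predict time.
model = GaussianNB()
model.fit(X, y)

preds = model.predict([[1.1, 2.0], [6.0, 6.0]])
print(preds)  # -> [0 1]
```

Because the model only has to estimate per-class, per-feature statistics, training is a single pass over the data, which is why it is so fast on large data sets.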


## New Approaches to Unsupervised Domain Adaptation


The cost of large-scale data collection and annotation often makes the application of machine learning algorithms to new tasks or datasets prohibitively expensive. One approach to circumventing this cost is training models on synthetic data, where annotations are provided automatically.

However, despite their appeal, models trained on synthetic data often fail to generalize from synthetic images to real images, necessitating domain adaptation algorithms to adapt these models before they can be successfully applied. Dilip Krishnan, Research Scientist at Google, is working on two approaches to the problem of unsupervised visual domain adaptation, both of which outperform current state-of-the-art methods.
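To make "domain adaptation" concrete: many unsupervised approaches align the feature distributions of the source (synthetic) and target (real) domains by minimizing a statistic that measures how far apart the two distributions are. As an illustrative example of such a statistic, not Krishnan's specific method, here is a maximum mean discrepancy (MMD) estimate in plain NumPy:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel values k(x, y) = exp(-gamma * ||x - y||^2).
    sq_dists = (np.sum(X**2, 1)[:, None]
                + np.sum(Y**2, 1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq_dists)

def mmd(X_src, X_tgt, gamma=1.0):
    # Biased estimate of squared maximum mean discrepancy between
    # the source and target feature distributions.
    k_ss = rbf_kernel(X_src, X_src, gamma).mean()
    k_tt = rbf_kernel(X_tgt, X_tgt, gamma).mean()
    k_st = rbf_kernel(X_src, X_tgt, gamma).mean()
    return k_ss + k_tt - 2.0 * k_st

rng = np.random.default_rng(0)
same = mmd(rng.normal(0, 1, (100, 2)), rng.normal(0, 1, (100, 2)))
shifted = mmd(rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2)))
print(same, shifted)  # distributions that differ give a larger MMD
```

In adversarial variants of domain adaptation, a discriminator network plays an analogous role: the feature extractor is trained so that the discriminator cannot tell which domain a feature came from.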

What you can find in the full article:

• What started your work in deep learning?
• What are the key factors that have enabled recent advancements in deep learning?
• Which industries do you think deep learning will benefit the most and why?
• What advancements in deep learning would you hope to see in the next 3 years?


## How do I compute document similarity using Python?


This presentation combines a video walkthrough with Python code. It was written by Jonathan Mugan. Dr. Mugan specializes in artificial intelligence and machine learning.

How do I find documents similar to a particular document?

We will use a library in Python called gensim.

Let’s create some documents.

We will use NLTK to tokenize.

A document will now be a list of tokens.

We will create a dictionary from a list of documents.

A dictionary maps every word to a number.
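The tokenize-and-dictionary steps above can be sketched in plain Python. This mirrors what gensim's `Dictionary` does; the toy documents and the simple whitespace tokenizer are my own (the presentation tokenizes with NLTK):

```python
documents = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "the generation of random binary trees",
]

# Tokenize: each document becomes a list of lowercase tokens.
# (A whitespace split stands in for NLTK's word_tokenize here.)
tokenized = [doc.lower().split() for doc in documents]

# Build the word -> id mapping in first-seen order, so every
# unique word across the corpus gets its own integer.
token2id = {}
for tokens in tokenized:
    for token in tokens:
        if token not in token2id:
            token2id[token] = len(token2id)

print(token2id["computer"])  # -> 4
```

Once every word has an id, each document can be re-expressed as a list of (word id, count) pairs, which is the bag-of-words corpus the later steps build on.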

What you will find in the full presentation:

• Create corpus
• Create tf-idf model
• Similarity measure object
• Convert query document
• Similar documents
• Exercises


## How well do facial recognition algorithms cope with a million strangers?


This article was written by . Co-authors include UW computer science and engineering professor Steve Seitz, undergraduate student and web developer Evan Brossard, and former student Daniel Miller.

The MegaFace dataset contains 1 million images representing more than 690,000 unique people. It is the first benchmark that tests facial recognition algorithms at the million-image scale. (Image: University of Washington)

In the last few years, several groups have announced that their facial recognition systems have achieved near-perfect accuracy rates, performing better than humans at picking the same face out of a crowd. But those tests were performed on a dataset with only 13,000 images, fewer people than attend an average professional U.S. soccer game. What happens to their performance as those crowds grow to the size of a major U.S. city?

University of Washington researchers answered that question with the MegaFace Challenge, the world’s first competition aimed at evaluating and improving the performance of face recognition algorithms at the million-person scale. All of the algorithms suffered in accuracy when confronted with more distractors, but some fared much better than others.

“We need to test facial recognition on a planetary scale to enable practical applications — testing on a larger scale lets you discover the flaws and successes of recognition algorithms,” said Ira Kemelmacher-Shlizerman, a UW assistant professor of computer science and the project’s principal investigator. “We can’t just test it on a very small scale and say it works perfectly.”

The UW team first developed a dataset with one million Flickr images from around the world that are publicly available under a Creative Commons license, representing 690,572 unique individuals. Then they challenged facial recognition teams to download the database and see how their algorithms performed when they had to distinguish between a million possible matches.