Design Thinking: Future-proof Yourself from AI

It’s all over for us humans. It may not have been “The Matrix”[1], but the machines look like they are finally poised to take our jobs. Machines powered by artificial intelligence and machine learning process data faster, aren’t hindered by stupid human biases, don’t waste time with gossip on social media and don’t demand raises or more days off.

Figure 1:  Is Artificial Intelligence Putting Humans Out of Work?

While there is a high probability that machine learning and artificial intelligence will play an important role in whatever job you hold in the future, there is one way to “future-proof” your career…embrace the power of design thinking.

I have written about design thinking before, but I want to use this blog to provide more specifics about how design thinking can help you to harness the power of machine learning…instead of machine learning (and The Matrix) harnessing you.

The Power of Design Thinking

Design thinking is defined as human-centric design that builds upon a deep understanding of your users (e.g., their tendencies, propensities, inclinations, behaviors) to generate ideas, build prototypes, share what you’ve made, embrace the art of failure (i.e., fail fast but learn faster) and eventually put your innovative solution out into the world.  And fortunately for us humans (who really excel at human-centric things), there is a tight correlation between design thinking and machine learning (see Figure 2).

Figure 2: Integrating Machine Learning and Design Thinking

In fact, integrating design thinking and machine learning can give you “super powers” that future-proof whatever career you decide to pursue. To meld these two disciplines together, one must:

  1. Understand where and how machine learning can impact your business initiatives. While you won’t need to write machine learning algorithms (though I wouldn’t be surprised given the progress in “Tableau-izing” machine learning), business leaders do need to learn how to “think like a data scientist” in order to understand how machine learning can optimize key operational processes, reduce security and regulatory risks, and uncover new monetization opportunities.
  2. Understand how design thinking techniques, concepts and tools can create a more compelling and empathetic user experience, with “delightful” user engagement driven by superior insights into your customers’ usage objectives, operating environment and impediments to success.

Let’s jump into the specifics about what business leaders need to know about integrating design thinking and machine learning in order to provide lifetime job security (my career adviser bill will be in the mail)!

Step 1:  Empathize and Analyze

The objective of the “Empathize and Analyze” step is to really, and I mean really, understand your users, and to build a sense of empathy for the challenges and constraints that get in their way: Who is my user? What matters to this person? What are they trying to accomplish? What are their impediments to success? What frustrates them today? This step captures what the user is trying to accomplish (i.e., tasks, roles, responsibilities and expectations) versus what they are doing. Walk in your users’ shoes by shadowing them, and where possible, actually become a user of the product or service.

A useful tool for the “Empathize and Analyze” step is the Persona. A Persona is a template for capturing key user operational and usage requirements, including the job to be done, barriers to consumption and hurdles to user satisfaction (see Figure 3).

Figure 3:  Persona Template

The Persona template in Figure 3 is one that we use in our Vision Workshop engagements to capture the decisions that the key business stakeholders are trying to make – and the associated pain points – in support of the organization’s key business initiatives.

How does the “Empathize and Analyze” step apply to Design Thinking and Machine Learning?

  • Design Thinking – understand and capture the user’s task objectives, operational requirements and impediments to success; learn as much as possible about the users for whom you are designing.
  • Machine Learning – Capture and prioritize the user’s key decisions; capture the variables and metrics that might be better predictors of those decisions.

Step 2:  Define and Synthesize

The “Define and Synthesize” step starts to assemble an initial Point of View (POV) regarding the user’s needs: What capabilities are the user going to need? In what type of environment will the user be working? What is likely to impede the execution of their job?  What is likely to hinder adoption?

Sharing your POV and getting feedback from key constituencies is critical to ensuring that you have properly defined and synthesized the requirements and potential impediments.  Use the Opportunity Report to document the story, gather feedback and input from your constituencies, and refine your thinking regarding the solution (see Figure 4).

Figure 4:  Opportunity Report

How does the “Define and Synthesize” step apply to Design Thinking and Machine Learning?

  • Design Thinking – define, document and validate your understanding of the user’s task objectives, operational requirements and potential impediments. Don’t be afraid of being wrong.
  • Machine Learning – synthesize your understanding of the decisions (e.g., latency, granularity, frequency, governance, sequencing) in order to flesh out the potential variables and metrics, and assess potential analytic algorithms and approaches.

Step 3:  Ideate and uh… Ideate

The “Ideate and Ideate” step is all about… ideate! This is a chance to gather all impacted parties, stakeholders and other key constituents and leverage facilitation techniques to brainstorm as many creative solutions to the users’ needs and impediments as possible.  Exploit group brainstorming techniques to ideate, validate and prioritize the usage and operational requirements, document those requirements in a Strategy Report and identify supporting operational and performance metrics (see Figure 5).

Figure 5:  Strategy Map

Storyboards are a useful way to refine your thinking in Step 3. A storyboard is a graphic rendition and sequencing of the usage of the solution, in the form of illustrations displayed as a story (see Figure 6).

Storyboards are an effective and efficient way to communicate a potential experience and approach for your users to review and provide feedback. Storyboarding can provide invaluable insights into usage behaviors and potential impediments without writing any code (said as if coding is something evil)!

While it would be nice to be an accomplished sketcher, even rough sketches can be an invaluable – and fast – way to gather feedback on the user experience and product design (see Figure 7).

Figure 7:  Power of Sketching

How does the “Ideate and Ideate” step apply to Design Thinking and Machine Learning?

  • Design Thinking – brainstorm as many potential solutions as possible. Diverge in your brainstorming (“all ideas are worthy of consideration”) before you converge (prioritize the best ideas based upon potential business and customer value and implementation feasibility).
  • Machine Learning – start piloting potential analytic models and algorithms with small sample data sets to see what types of insights and relationships are buried in the data. Capture and refine the hypotheses that you want to test.

Step 4:  Prototype and Tune

The “Prototype and Tune” step starts to build the product and supporting analytics. Start to model your ideas so that you can validate usage and navigational effectiveness, and identify the metrics against which usage and navigational effectiveness will be measured.  Wireframes and mockups are useful tools for validating product usage and navigation effectiveness (see Figure 8).

Figure 8:  Interactive Mockups

How does the “Prototype and Tune” step apply to Design Thinking and Machine Learning?

  • Design Thinking – create one or more interactive mockups with which your key constituents can “play”. Study users’ interactions with the mockups to see what works and where they struggle.  Identify what additional design guides and/or analytics insights could be provided to improve the user experience.
  • Machine Learning – Identify where analytic insights or recommendations are needed – and what additional data can be captured – as the users “play” with the mockups. Explore opportunities to deliver real-time actionable insights to help “guide” the user experience.  Fail fast, but learn faster!  Embrace the “Art of Failure.”

Step 5:  Test and Validate

The “Test and Validate” step seeks to operationalize both the design and analytics. But step 5 is also the start of the continuous improvement process from the user experience and analytic model tuning perspectives. Instrumenting or tagging the product or solution becomes critical so that one can constantly monitor its usage: What features get used the most? What paths are the most common? Are there usage patterns that indicate that users are confused? Are there usage paths from which users “eject” and never return?

Step 5 is also where product usage and decision effectiveness metrics can be used to monitor and ultimately improve the user experience. Web Analytics packages (like Google Analytics in Figure 9) provide an excellent example of the type of metrics that one could capture in order to monitor the usage of the product or solution.

Figure 9:  Google Web Analytics

Web analytic metrics like New Visits, Bounce Rate and Time On Site are very relevant metrics if one is trying to measure and improve the usage and navigational effectiveness of the product or solution.

How does the “Test and Validate” step apply to Design Thinking and Machine Learning?

  • Design Thinking – monitor usage and navigational metrics to determine the effectiveness of the product or solution. Create a continuous improvement environment where usage and performance feedback can be acted upon quickly to continuously improve the product’s design.
  • Machine Learning – exploit the role of “Recommendations” to improve or guide the user experience. Leverage the “wisdom of crowds” to continuously fine-tune and re-tune the supporting analytic models’ predictive and prescriptive effectiveness.

Design Thinking + Machine Learning = Game Changing Potential

I know that I am probably preaching to the choir here, but I am advising my students and my own kids about the power of integrating Design Thinking and Machine Learning. As an example, my son Max is creating the “Strong by Science” brand by integrating the disciplines of Kinesiology with Data Analytics. Heck, he’s even written his first book on the topic (which is probably one more book than he actually read his entire high school career).

But it’s not too late for us old codgers to also embrace the power of integrating design thinking with machine learning. If you don’t, well, then enjoy being a human battery powering “The Matrix”…

 

[1] Sentient machines created “The Matrix” to subdue the human population and use humans as a source of energy.

Using Bayesian Kalman Filter to predict positions of moving particles / objects in 2D (in R)

In this article, we shall see how the Bayesian Kalman Filter can be used to predict positions of some moving particles / objects in 2D.

This article is inspired by a programming assignment from the Coursera course Robotics Learning by the University of Pennsylvania, where the goal was to implement a Kalman filter for ball tracking in 2D space. Part of the problem description is taken from the assignment description.

  • The following equations / algorithms are going to be used to compute the Bayesian state updates for the Kalman Filter.

    [Images: Kalman filter algorithm and state-update equations]
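For reference, the Bayesian state-update equations referred to above are the standard Kalman filter prediction and correction steps for a linear-Gaussian state-space model (with motion model A, measurement model H, process noise Q and measurement noise R):

```latex
% Prediction (motion model):
\hat{x}_{t|t-1} = A\,\hat{x}_{t-1|t-1}, \qquad
P_{t|t-1} = A\,P_{t-1|t-1}\,A^{\top} + Q

% Correction (measurement update):
K_t = P_{t|t-1} H^{\top}\left(H P_{t|t-1} H^{\top} + R\right)^{-1}
\hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t\left(z_t - H\,\hat{x}_{t|t-1}\right)
P_{t|t} = \left(I - K_t H\right) P_{t|t-1}
```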

  • For the first set of experiments, a few 2D Brownian-motion-like movements are simulated for a particle.
    • The noisy position measurements of the particle are captured at different time instants (by adding random Gaussian noise).
    • Next, the Kalman Filter is used to predict the particle’s position at different time instants, assuming different position, velocity and measurement uncertainty parameter values.
    • Both the actual trajectory and KF-predicted trajectory of the particle are shown in the following figures / animations.
    • The positional uncertainty (as 2D-Gaussian distribution) assumed by the Kalman Filter is also shown as gray / black contour (for different values of uncertainties).

      [Figures / animations: actual vs. KF-predicted particle trajectories]

  • The next set of figures / animations shows how the position of a moving bug is tracked using the Kalman Filter.
    • First the noisy measurements of the positions of the bug are obtained at different time instants.
    • Next, the prediction steps and then the correction steps are applied, after assuming Gaussian uncertainties in the position, velocity and the measurements.

      [Animation: bug tracking with the Kalman Filter]

      • The position of the bug shown in the figure above moves randomly in the x and y directions within the grid defined by the rectangle [-100,100]x[-100,100].
      • The next figure shows how successive Kalman Filter iterations predict and correct the noisy position observations. The uncertain motion model p(x_t|x_{t-1}) increases the spread of the contour. We then observe a noisy position estimate z_t. The contour of the corrected position p(x_t) has less spread than both the observation p(z_t|x_t) and the motion-adjusted state p(x_t|x_{t-1}).

      [Animation: prediction and correction contours across iterations]

  • Next the GPS dataset from the UCI Machine Learning Repository is used to get the geospatial positions of some vehicles at different times.
    • Again, noisy measurements are simulated by adding random noise to the original data.
    • Then the Kalman Filter is again used to predict the vehicle’s position at different time instants, assuming different position, velocity and measurement uncertainties.
    • The position and measurement uncertainties (σ_p,  σ_m) are in terms of latitude / longitude values, where uncertainty in the motion model is σ_v.
    • Both the actual trajectory and KF-predicted trajectory of the vehicle are shown in the following figures / animations.
    • As expected, the more the uncertainties in the position / motion model, the more the actual trajectory differs from the KF-predicted one.

      [Figures / animations: actual vs. KF-predicted vehicle trajectories for different uncertainty settings]
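The predict/correct workflow used throughout these experiments can be sketched as follows. The original implementation is in R; this is a minimal NumPy re-implementation of a constant-velocity Kalman filter, with all parameter values and the simulated trajectory chosen purely for illustration:

```python
import numpy as np

def kalman_2d_track(zs, sigma_p=0.1, sigma_v=0.1, sigma_m=2.0, dt=1.0):
    """Constant-velocity Kalman filter for a 2D position track.
    State vector: [x, y, vx, vy]; only the position is observed."""
    A = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], float)               # motion model
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], float)               # measurement model
    Q = np.diag([sigma_p**2, sigma_p**2, sigma_v**2, sigma_v**2])  # process noise
    R = sigma_m**2 * np.eye(2)                        # measurement noise

    x = np.array([zs[0][0], zs[0][1], 0.0, 0.0])      # start at first measurement
    P = 10.0 * np.eye(4)                              # initial uncertainty
    estimates = []
    for z in zs:
        # prediction step: propagate the state and its uncertainty
        x = A @ x
        P = A @ P @ A.T + Q
        # correction step: blend in the noisy measurement z
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
        x = x + K @ (np.asarray(z) - H @ x)
        P = (np.eye(4) - K @ H) @ P
        estimates.append(x[:2].copy())
    return np.array(estimates)

# simulate a particle moving at constant velocity, observed with Gaussian noise
rng = np.random.default_rng(0)
true_pos = np.cumsum(np.tile([1.0, 0.5], (50, 1)), axis=0)
zs = true_pos + rng.normal(scale=2.0, size=true_pos.shape)
est = kalman_2d_track(zs)
```

After the filter converges, the estimated positions should sit closer to the true trajectory than the raw noisy measurements do, which is exactly the smoothing behavior shown in the animations above.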

Python Overtakes R for Data Science and Machine Learning

This article summarizes a trend in programming language usage, based on a number of proxy metrics. The change became more pronounced in early 2017: Python overtook R as the language of choice for data science and machine learning applications.

Statistics from Google

Google has an app called Google Trends that surfaces trends for specific subjects and compares interest in a number of search topics, broken down by region or time period.

Search index for Python Data Science (blue) versus R Data Science (red) over the last 5 years, in US

We used the app to compare search interest in R Data Science versus Python Data Science; see the chart above.  Until December 2016, R dominated, but it fell below Python by early 2017. The chart displays an interest index, 100 being maximum and 0 being minimum. Click here to access this interactive chart on Google, and check the results for countries other than the US, or even for specific regions such as California or New York.

Note that overall, Python has always dominated R by a long shot, because it is a general-purpose language while R is a specialized one. Here, however, we compare R and Python in the niche context of data science. The map below shows interest in Python (general purpose) per region, using the same Google interest index.

Interest for Python, by region (last 12 months; source: Google)

Indeed statistics

Indeed is a job aggregator. The jobs listed there might have expired or could be duplicates or irrelevant, but it is still worth a quick look:

A search for Python Data Science returns 15,741 full-time jobs. The top US cities are:

  • New York, NY (1401)
  • Seattle, WA (1141)
  • San Francisco, CA (1052)
  • Chicago, IL (469)
  • Boston, MA (410)

A search for R Data Science returns 7,533 full-time jobs. The top US cities are:

  • New York, NY (734)
  • San Francisco, CA (402)
  • Seattle, WA (375)
  • Boston, MA (269)
  • Chicago, IL (260)

Our internal statistics

We have 83 fresh, active job ads relevant to data science (mostly in the US and London) for Python: you can check them out here. For R, we have 66, and you can check them out here. It would be interesting to compare these stats with job numbers from LinkedIn.

Another metric of interest is the number of articles written about each language in the context of data science. On Data Science Central, we have 19,500 documents mentioning R (since 2008) versus 11,500 mentioning Python. However, when you check out the top results of these two searches, 9 out of 10 are from 2017 for Python, versus 7 out of 10 for R. In short, R is starting to show its age.  A Google search for R or Python (on Data Science Central) yields similar conclusions.

It would be interesting to check what is happening with Java and C++, as they have been the workhorses of software development for a long time. 


Data Science Simplified Part 9: Interactions and Limitations of Regression Models

In the last few blog posts of this series, we discussed regression models at length. Fernando has built a multivariate regression model. The model takes the following shape:

price = -55089.98 + 87.34 x engineSize + 60.93 x horsePower + 770.42 x width

The model predicts or estimates price (target) as a function of engine size, horse power, and width (predictors).

Recall that the multivariate regression model assumes independence between the predictors: it treats horsepower, engine size, and width as if they are unrelated.

In practice, variables are rarely independent.

What if there are relations between horsepower, engine size and width? Can these relationships be modeled?

This blog post will address this question. It will explain the concept of interactions.

The Concept:

Independence between predictors means that when one predictor changes, its impact on the target does not depend on the existence of, or changes to, the other predictors. The relationship between the target and the predictors is additive and linear.

Let us take an example to illustrate it. Fernando’s equation is:

price = -55089.98 + 87.34 x engineSize + 60.93 x horsePower + 770.42 x width

It is interpreted as: a unit change in engine size changes the price by $87.34.

This interpretation never takes into consideration that engine size may be related to the width of the car.

Can’t it be the case that the wider the car, the bigger the engine?

A third predictor can capture the interaction between engine size and width. This third predictor is called the interaction term.

With the interaction term between engine size and the width, the regression model takes the following shape:

price = β0 + β1 x engineSize + β2 x horsePower + β3 x width + β4 x (engineSize x width)

The part of the equation (β1 x engineSize + β3 x width) is called the main effect.

The term engineSize x width is the interaction term.

How does this term capture the relation between engine size and width? We can rearrange this equation as:

price = β0 + (β1 + β4 x width) x engineSize + β2 x horsePower + β3 x width

Now, β4 can be interpreted as the change in the effect of engine size on price for each 1-unit increase in width.
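To make the interaction term concrete, here is a small simulation sketch. The coefficients and data below are made up for illustration (they are not Fernando’s actual data): an ordinary least squares fit whose design matrix includes an engineSize x width column recovers the interaction coefficient.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
engine = rng.uniform(60, 300, n)     # engine size (hypothetical units)
hp = rng.uniform(50, 250, n)         # horse power
width = rng.uniform(60, 75, n)       # car width

# simulated "true" relationship, including an interaction effect of 15
price = (-55000 + 80 * engine + 60 * hp + 700 * width
         + 15 * engine * width + rng.normal(0, 100, n))

# design matrix: intercept, main effects, and the interaction column
X = np.column_stack([np.ones(n), engine, hp, width, engine * width])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
# beta recovers roughly [-55000, 80, 60, 700, 15]; the marginal effect of
# engine size is beta[1] + beta[4] * width, exactly as in the rearranged equation
```

The same fit could be expressed with a formula interface (e.g., `price ~ engineSize * width + horsePower` in R or statsmodels), which adds the interaction and main-effect columns automatically.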

Model Building:

Fernando inputs these data into his statistical package. The package computes the parameters. The output is the following:

The equation becomes:

price = 51331.363 - 1099.953 x engineSize + 45.896 x horsePower - 744.953 x width + 17.257 x engineSize:width

which can be rearranged as:

price = 51331.363 - (1099.953 - 17.257 x width) x engineSize + 45.896 x horsePower - 744.953 x width

Let us interpret the coefficients:

  • The engine size, horse power and engineSize:width (the interaction term) are significant.
  • The width of the car is not significant.
  • Because of the interaction term, the effect of engine size depends on width: increasing the engine size by 1 unit changes the price by (17.257 x width - 1099.953) dollars.
  • Increasing the horse power by 1 unit increases the price by $45.9.
  • The interaction term being significant implies that the true relationship is not additive.
  • The adjusted r-squared is 0.8358 => the model explains 83.6% of the variation.

Note that the width of the car is not significant. Does it then make sense to include it in the model?

Here comes a principle called the hierarchical principle.

Hierarchical Principle: When interactions are included in the model, the main effects need to be included as well, even if the individual variables are not significant on their own.

Fernando now runs the model and tests the model performance on test data.

The model performs well on the testing data set. The adjusted r-squared on test data is 0.8176 => the model explains 81.76% of the variation on unseen data.

Fernando now has an optimal model to predict the car price and buy a car.

Limitations of Regression Models

Regression models are the workhorse of data science and an amazing tool in a data scientist’s toolkit. When employed effectively, they solve a lot of real-life data science problems. Yet they do have their limitations. Three limitations of regression models are explained briefly:

Non-linear relationships:

Linear regression models assume linearity between variables. If the relationship is not linear then the linear regression models may not perform as expected.

Practical Tip: Use transformations like log to transform a non-linear relationship into a linear one.
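As a sketch of this tip (with made-up data), an exponential relationship becomes a straight line after a log transform, so an ordinary linear fit recovers the parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 200)
# non-linear (exponential) relationship with multiplicative noise
y = 2.0 * np.exp(0.8 * x) * rng.lognormal(mean=0.0, sigma=0.05, size=x.size)

# log(y) = log(2.0) + 0.8 * x + noise, so a plain linear fit now works
slope, intercept = np.polyfit(x, np.log(y), 1)
# slope is close to 0.8, and exp(intercept) is close to 2.0
```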

Multi-Collinearity:

Collinearity refers to a situation where two predictor variables are correlated with each other. When there are many predictors and they are correlated with each other, it is called multi-collinearity. If the predictors are correlated, the impact of a specific predictor on the target is difficult to isolate.

Practical Tip: Make the model simpler by choosing predictors carefully. Avoid choosing too many correlated predictors. Alternatively, use techniques like principal components that create new, uncorrelated variables.
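Both parts of this tip can be sketched on simulated data: first measure the correlation between two near-duplicate predictors, then use principal components (here computed via SVD) to obtain uncorrelated variables:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly a copy of x1: multi-collinearity
X = np.column_stack([x1, x2])

corr = np.corrcoef(X.T)[0, 1]             # close to 1 => impacts hard to isolate

# principal components: project the centered data onto the right singular vectors
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pcs = Xc @ Vt.T                           # new, uncorrelated predictor columns
pc_corr = np.corrcoef(pcs.T)[0, 1]        # essentially zero
```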

Impact of outliers:

An outlier is a point far from the value predicted by the model. If there are outliers in the target variable, the model is stretched to accommodate them: too much adjustment is made for a few outlier points, which skews the model toward the outliers and hurts the fit for the majority of the data.

Practical Tip: Remove the outlier points before modeling. If there are too many outliers in the target, there may be a need for multiple models.
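A toy sketch of the pull a single target outlier exerts on an ordinary least squares fit, together with a crude residual-based screen (a simple illustration, not a full outlier-treatment recipe):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)   # true slope = 3
y[-1] += 100.0                       # one extreme outlier in the target

slope_with, _ = np.polyfit(x, y, 1)  # the fit is stretched toward the outlier

# crude screen: drop points with unusually large absolute residuals
resid = np.abs(y - np.polyval(np.polyfit(x, y, 1), x))
keep = resid < 3.0 * resid.std()
slope_without, _ = np.polyfit(x[keep], y[keep], 1)
# slope_without lands much closer to the true slope of 3
```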

Conclusion:

It has been quite a journey. In the last few blog posts, we explained the simple linear regression model, dabbled in multivariate regression models, discussed model selection methods, and covered the treatment of qualitative variables and interactions.

In the next post of this series, we will discuss another type of supervised learning model: Classification.


Originally published at datascientia.blog 


Data and Analytics; Don’t Trust Numbers Blindly

Data & analytics have become mainstream. Executives and their boards are increasingly starting to question whether their organizations are truly realizing the full value of their insights. A study suggests that 58% of organizations have difficulty evaluating the quality and reliability of their data, raising a big question for stakeholders: can you trust your data? On one hand, there is a set of people who are worried about the authenticity of their organizational data, or the data they intend to use.

On the other hand, you may encounter people coming up with lame excuses and claiming that they are happy with their data sets and find their data trustworthy: they need no data cleansing, data processing or assistance from data management experts. They are not completely wrong in what they feel and say. Still, the recent reports by Gizmodo, The Independent, the New York Post and various others about “Balls have zero to me to me” (where Facebook’s AI chatbots Bob and Alice appeared to create their own language) are enough to send chills down your spine.

Investigations into the aforementioned incident are ongoing, and the root cause is most likely bad data or the absence of a data cleansing process. Don’t get us wrong: we advocate data-driven decisions. However, on a thoughtful note, all this and much more can happen only if your data is in place. When we talk about the trustworthiness of your data, it is its appropriateness and accuracy that we are referring to.

Evolving from common sense to data sense

We as a society have moved away from decisions based on limited information or gut feel, toward a data- and information-driven society where the applicability of common sense is minimal or nil. The challenge, however, is that though society has evolved, people have not. Businesses and enterprises are still being led by baby boomers who are better suited to hunting mammoths than to taking financial decisions based on accurate data and the insights derived from it.

Now that everyone has realized that human judgement in a business context is poor, organizations are increasingly basing decisions on data-driven facts. But is their data trustworthy? Let’s see why they should not trust numbers blindly.

1. Question the data tracking set-up

Believe it or not, a lot of things can go wrong. Even Google Analytics is prone to mistakes, as this discussion on GA data shows. Anything and everything, from data collection to data integration, and from data interpretation to data reporting, should be questioned rigorously. For example, events that are not named in an explanatory fashion, or the inclusion of the start date, can lead analysts to commit errors while calculating results.

2. Question the interpretation of numbers

Yes, numbers can be misinterpreted if the context is not understood completely. A sales manager would agonize over why the conversion rate was not going up even after improvements to the purchase funnel, until someone questioned it and discovered that the sales team had started an acquisition campaign. The campaign brought a higher volume of visitors who were “less qualified” than before, and hence fewer conversions.

In the opposite situation, if the conversion rate had skyrocketed, no one would have questioned the positive numbers, and the sales manager would have taken pride in the hike in the conversion rate.
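The traffic-mix effect is easy to verify with a back-of-the-envelope calculation. The visitor counts and per-segment rates below are invented for illustration: neither segment’s own conversion rate changes, yet the blended rate collapses.

```python
# per-segment (visitors, conversion rate): an acquisition campaign shifts
# the mix toward a high-volume, less-qualified segment
before = {"qualified": (1000, 0.10)}
after = {"qualified": (1000, 0.10), "campaign": (4000, 0.02)}

def blended_rate(segments):
    """Overall conversion rate across all traffic segments."""
    visitors = sum(n for n, _ in segments.values())
    conversions = sum(n * r for n, r in segments.values())
    return conversions / visitors

# before: 100 / 1000 = 10%; after: (100 + 80) / 5000 = 3.6%
```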

3. Question the successful metric

It’s time to move away from the one-size-fits-all belief: one key success metric for all will certainly not work. For example, in the publishing industry, user behavior varies depending on the device used, and so do the metrics. Publishers do not have the easy task of measuring against a purchase funnel, as in the aforementioned case, and it can be challenging to find the right KPIs. The most important aspect here is content consumption, which tends to show low performance on mobile, wrongly suggesting that there is a problem.

Also, at times the metric is either neglected or not adapted completely. The same scroll-depth target of 75% for both desktop and mobile users is as good as neglecting the metric: if, on mobile, all the left-hand and right-hand elements of the desktop page are stacked one on another under the main content, a user who reads the full article may scroll only 50% of the way down.

4. Question the good numbers first

If you witness a huge drop in conversion from page 4 to page 5 of the purchase flow, you will obviously check the user experience. But what if conversion goes up between those two pages because your users missed a crucial piece of information and all they saw was the “next” button? Numbers have a tendency to make you feel all is well when actually it is not. Irrespective of the kind of analytics one uses (predictive, descriptive or prescriptive), it cannot replace the value of regularly watching customers use your services.

5. Question the KPIs

One of the big culprits (thankfully slowly becoming irrelevant) is page views. If set as a target, someone will surely find ways to grow this KPI without any improvement in customer behavior. This is probably how those endless galleries of images were born, where each picture counts as a page view, and why articles get broken into multiple pages, with no benefit to the users.

Don’t take numbers for granted

Check and verify numbers, both good and bad, as if you were a quality assurance manager testing code. The most glorified numbers are the ones that can damage your business the most. Be critical and challenge the numbers. Also make sure to adapt the metrics as you enhance your own product. And lastly, do not forget to combine them with qualitative insights.


Comprehensive Repository of Data Science and ML Resources

Here are 29 resources, mostly tutorials, covering the most important topics in data science: regression, clustering, neural networks, deep learning, Hadoop, decision trees, ensembles, correlation, outliers, Python, R, TensorFlow, SVM, data reduction, feature selection, experimental design, time series, cross-validation, model fitting, dataviz, AI and many more. To keep receiving these articles, sign up on DSC.
