Simplifying AI and Machine Learning with IBM Watson Studio
Co-authors: Henry Zou, Guo Chen
IBM Watson is a question-answering computer system capable of answering questions posed in natural language. It was developed at IBM’s Thomas J. Watson Research Center and named after IBM’s founder, Thomas J. Watson. Watson was created for open-domain question answering: building a system that automatically answers questions that humans pose in natural language (any language that humans use without conscious planning). Some of the core technologies behind IBM Watson are Natural Language Processing (concerned with the interactions between computers and human languages, especially how to analyze large amounts of language data); Information Retrieval (obtaining relevant information from a collection of data resources); Knowledge Representation and Reasoning (representing information in a format that computers and artificial intelligence can use to solve tasks); Automated Reasoning (developing computer programs that can reason automatically); and Machine Learning (computer algorithms that improve automatically through the use of data).
As part of IBM’s push into the artificial intelligence industry and its effort to regain a leading position in tech, IBM has introduced the latest version of Watson Studio. This blog gives a brief overview of Watson Studio, the problems it can solve, its potential application to a movie recommendation system, its strengths and limitations, and some supporting data.
Watson Studio provides a suite of tools and a collaborative environment for data scientists, developers, and domain experts. It is a cloud environment in which users build, run, and manage AI models. Backed by IBM Cloud and IBM’s data offerings, it aims to shorten the time needed to set up a flexible AI architecture. A key feature of Watson Studio is its automation of the AI life cycle through a ModelOps pipeline. ModelOps focuses on the governance and life cycle management of AI and decision models, which include machine learning, graphs, rules, optimization, and linguistic models. It sits at the heart of an enterprise AI strategy. Watson Studio also lets users build models both visually and programmatically, which helps explain the AI components in the pipeline. It supports multiple open-source frameworks, such as PyTorch, TensorFlow, and scikit-learn. Support for other development tools and languages, such as IDEs, JupyterLab, Python, R, and Scala, further strengthens Watson Studio’s flexibility.
To develop an industrial-strength ML system, developers commonly rely on a large number of tools: databases, monitoring systems, data exploration, data cleaning, testing, and so on. An update to any one tool in the pipeline that generates predictions may break the entire system. For example, if the latest release of a machine learning framework tweaks its broadcasting rules, problems may show up in many places in your workflow. It is not easy to keep track of everything that is happening on every platform or tool used in your ML system, let alone to visit every one of them to fix the problems.
It is only natural for developers to wonder: what if there were a tool that could incorporate and unify the entire development cycle, or at least part of it? IBM Watson Studio is built to answer this question, with remarkable user-friendliness. We will take you through a crash course on how to move from data loading to deployment in less than one hour, all on IBM Watson Studio.
First things first: you have some data (be it from an API or on your local hard drive), and you want to extract insights from it. After creating your project in the Studio, you can upload the data directly into your project workspace, where it is stored on IBM cloud storage. Alternatively, if you need to get data from online sources, you can do that by adding an “asset” to your deployment space, all within a few clicks.
Figure 1. Adding connected data for analysis
After you load your data, the Studio provides plenty of powerful tools to visualize, summarize, clean, refine, and engineer it. When you open a data file, the “refine” section automatically profiles the data by feature and provides summary statistics and visualizations. In the “visualizations” tab, you can customize what to present and how you want to explore your data, as shown in Figure 2. Its ease of use rivals that of other visualization tools such as Tableau. If you notice something wrong with your data, e.g. users somehow gave the movie Inception a rating of 100,000 stars, you can easily remove those outliers too.
Figure 2. Easy data visualization
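For readers who prefer to see the outlier cleanup in code, here is a minimal sketch of the same idea in plain Python. It is not what the Studio runs internally; the record layout, the field names, and the assumed 1-to-5 rating scale are all hypothetical and chosen only for illustration.

```python
# Hypothetical rating records; the field names are assumptions for illustration.
ratings = [
    {"user_id": 1, "movie": "Inception", "rating": 5},
    {"user_id": 2, "movie": "Inception", "rating": 100000},  # impossible value
    {"user_id": 3, "movie": "Inception", "rating": 4},
    {"user_id": 4, "movie": "Inception", "rating": 0},  # below the assumed minimum
]

MIN_RATING, MAX_RATING = 1, 5  # assumed valid range of the rating scale


def drop_out_of_range(rows, low=MIN_RATING, high=MAX_RATING):
    """Keep only the rows whose rating lies within [low, high]."""
    return [row for row in rows if low <= row["rating"] <= high]


clean_ratings = drop_out_of_range(ratings)
print(len(ratings), "rows before,", len(clean_ratings), "rows after")
```

A fixed valid range works here because a star rating has known bounds; for unbounded features, a statistical rule would be a more natural choice.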
Once your data is ready, the Studio gives you tools such as an IDE for R and Python notebooks preloaded with most mainstream data analysis and machine learning packages. And here is another great advantage of the Studio: it has many machine learning models built in for you to use without writing a single line of code! The only thing you need to do is choose a feature to predict; the Studio then automatically infers what kind of task this is and chooses a number of appropriate models to train. Although the Studio does not provide pre-packaged tools for problems that involve sparse data (such as movie recommendation), we gave it a try by predicting movie ratings from rating-history data. The Studio trained 8 models that it considered suitable for the task and gave the results shown in Figure 3, with the best accuracy of 37.1% achieved by XGBoost. This automation significantly shortens the life cycle of building production-ready ML models.
Figure 3. Predicting movie ratings
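To make the term “sparse data” concrete: in a rating history, most users have rated only a small fraction of all movies, so most user-movie pairs have no value at all. The sketch below measures that fraction on a tiny, made-up set of rating triples; the data and the function name are hypothetical and are not taken from our project.

```python
# Hypothetical (user_id, movie_id, rating) triples, made up for illustration.
observed = [
    (1, "m1", 5),
    (1, "m2", 3),
    (2, "m1", 4),
    (3, "m3", 2),
]


def sparsity(triples):
    """Return the fraction of user-movie pairs that have no rating."""
    users = {user for user, _, _ in triples}
    movies = {movie for _, movie, _ in triples}
    rated_pairs = {(user, movie) for user, movie, _ in triples}
    possible_pairs = len(users) * len(movies)
    if possible_pairs == 0:
        raise ValueError("need at least one rating to measure sparsity")
    return 1 - len(rated_pairs) / possible_pairs


# 3 users x 3 movies = 9 possible pairs, of which 4 are rated.
print(f"sparsity: {sparsity(observed):.2f}")
```

Real rating datasets are typically far sparser than this toy example, which is one reason general-purpose tabular models can struggle with recommendation tasks.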
Finally, you’ll find that deploying the model is also a breeze: by creating a deployment space and promoting your model there, you are one click away from deploying the model (Figure 4). You can find the link to our deployment at the end of the article.
Figure 4. Ready to deploy
Hopefully, this quick run-through gives you a sense of how much IBM Watson Studio can unify the workflow of developing an ML system and shorten the development cycle.
We plan to develop a movie recommendation system that integrates Watson Studio. Our system lets users input their <userid> and outputs a ranked list of recommended movies. The system has a database of movie names, online ratings, genres, etc. Each category is assigned a certain percentage weight, and the weighted values add up to a final score for each film, which is the primary indicator for the recommendation ranking. Our movie recommendation system involves several artificial intelligence techniques, such as Machine Learning, Regression, and Neural Networks (a class of machine learning models). The system currently manages different tasks in various spaces, both locally and in the cloud. Processes for data input, training, testing, and evaluation are kept in separate pipelines. The whole system will benefit from Watson Studio, because its design allows us to manage all our tasks, from start to end, on a single platform. It saves us the time and energy of checking different spaces and makes the development process much easier.
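The weighted scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category names, the weights, and the feature values below are hypothetical, not the ones our system uses, and every feature is assumed to be pre-scaled to the range 0 to 1.

```python
# Hypothetical category weights; they sum to 1 so the final score stays in 0-1.
WEIGHTS = {"online_rating": 0.5, "genre_match": 0.3, "popularity": 0.2}

# Hypothetical per-movie feature values for one user, each scaled to 0-1.
movies = {
    "Inception": {"online_rating": 0.9, "genre_match": 0.8, "popularity": 0.7},
    "Titanic": {"online_rating": 0.8, "genre_match": 0.4, "popularity": 0.9},
    "Up": {"online_rating": 0.85, "genre_match": 0.9, "popularity": 0.5},
}


def final_score(features, weights=WEIGHTS):
    """Weighted sum of the category values for one movie."""
    return sum(weights[name] * features[name] for name in weights)


def rank_movies(catalog, top_n=3):
    """Return up to top_n movie titles ordered by descending final score."""
    ordered = sorted(catalog, key=lambda title: final_score(catalog[title]), reverse=True)
    return ordered[:top_n]


for title in rank_movies(movies):
    print(title, round(final_score(movies[title]), 3))
```

Keeping the weights in one dictionary makes it easy to retune the percentages without touching the ranking logic.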
One of our worries about integrating Watson Studio was that we use different languages for testing, training, etc. With Watson Studio’s support for multiple programming languages and frameworks, however, this problem is solved. With an all-in-one platform design, our monitoring of the recommendation system will be more comprehensive and explainable.
From our perspective, Watson Studio’s strength lies mainly in its effort to integrate and simplify the artificial intelligence and machine learning process. By unifying tools and processes, and by monitoring models for bias, trust, and transparency, it shortens deployment time. Features such as automatically built model pipelines, open-source notebooks, and customized model monitors and metrics increase productivity when building models. As a result, development is faster and costs are lower. In the short term, this minimizes development errors. In the long term, with flexible models and easier building and application of artificial intelligence, Watson Studio can help optimize a company’s AI strategy and improve its cloud economics.
While Watson Studio has announced its commercial launch with several market players, it has drawbacks that could be addressed. The mobile experience for Watson Studio is not satisfactory. Since many developers may need to monitor the platform and retrieve information from it on their phones, a user-friendly mobile app would be appreciated. Watson Studio Desktop requires a relatively large amount of disk space and takes a long time to install. Watson Studio has not provided enough tutorials and technical support for users, which causes a lot of confusion when interacting with the platform. Nevertheless, Watson Studio is still evolving. With the updated 2021 version coming soon, we believe that many of these drawbacks will be resolved.
We obtained some actual results by running our movie recommendation model on IBM Watson Studio:
Model deployment endpoint:
https://jp-tok.ml.cloud.ibm.com/ml/v4/deployments/e680d2dd-c026-4ecf-8275-4e45861bcce3/predictions?version=2021-03-23
Code written (Python, authenticating with a bearer token):
import requests
# NOTE: you must manually set API_KEY below using information retrieved from your IBM Cloud account.
API_KEY = "<your API key>"
# Exchange the API key for an IAM access token.
token_response = requests.post("https://iam.cloud.ibm.com/identity/token", data={"apikey": API_KEY, "grant_type": "urn:ibm:params:oauth:grant-type:apikey"})
mltoken = token_response.json()["access_token"]
# Send the token as a bearer token with every scoring request.
header = {"Content-Type": "application/json", "Authorization": "Bearer " + mltoken}
# NOTE: manually define and pass the array(s) of values to be scored in the next line
payload_scoring = {"input_data": [{"fields": [array_of_input_fields], "values": [array_of_values_to_be_scored, another_array_of_values_to_be_scored]}]}
# Call the deployment endpoint and print the model's response.
response_scoring = requests.post("https://jp-tok.ml.cloud.ibm.com/ml/v4/deployments/e680d2dd-c026-4ecf-8275-4e45861bcce3/predictions?version=2021-03-23", json=payload_scoring, headers=header)
print("Scoring response")
print(response_scoring.json())
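The placeholders in the scoring payload can be hard to picture, so here is a small sketch of how the payload might be filled in and how the answer might be read back. The field names (user_id, movie_id), the helper functions, and the sample response are all hypothetical. In particular, the response layout is an assumption that it mirrors the request with a "predictions" list of "fields" and "values"; check the actual output of response_scoring.json() before relying on it.

```python
def build_payload(fields, rows):
    """Wrap rows of feature values in the input_data structure shown above."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row needs exactly one value per field")
    return {"input_data": [{"fields": list(fields), "values": [list(row) for row in rows]}]}


def extract_predictions(response_json):
    """Return the first value of each scored row (assumed to be the prediction)."""
    results = []
    for block in response_json.get("predictions", []):
        results.extend(row[0] for row in block["values"])
    return results


# Hypothetical field names and values for a rating-prediction model.
payload = build_payload(["user_id", "movie_id"], [[42, 1001], [42, 1002]])
print(payload)

# Hand-written stand-in for response_scoring.json(); not real model output.
sample_response = {"predictions": [{"fields": ["prediction"], "values": [[4], [3]]}]}
print(extract_predictions(sample_response))
```

Validating the row length before sending the request catches the most common mistake, a mismatch between the number of fields and the number of values per row.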
Thank you for reading this blog. Should you have any questions, please contact one of the authors: cunhanz@andrew.cmu.edu or guochen2@andrew.cmu.edu.