Founder and CEO of DotData, Ryohei Fujimaki, explains how automation can help the data science industry become more efficient.
Of the many technologies that will shape how we work in the future, automation is one of the most hotly debated. Some look forward to the new avenues it will open up while others fear it will make their skills redundant. Dr Ryohei Fujimaki, founder and CEO of data science company DotData, believes that data scientists are among those who will benefit the most.
Fujimaki’s team at DotData is helping companies accelerate their data science process. Its clients include global financial institution SMBC, where the number of data science projects has increased 40-fold and 2m features are developed per year. Another is US Electrical Services, which DotData helped to build and deploy working AI models in six weeks, even though the company had no data scientists or prior data science experience.
Here, Fujimaki explains how automation platforms can make AI and machine learning processes more efficient and give data scientists time for higher-value, mission-critical projects.
‘With automation platforms, data scientists can become more productive by accelerating the development of AI models and automating the development of feature tables’
– RYOHEI FUJIMAKI
Why do you believe automation is particularly useful for data scientists?
The data science process is made up of several highly manual steps that require multiple skillsets as well as a great deal of subject-matter expertise. Broadly speaking, there are six distinct phases in the data science development process. Of these, two take the most time: the selection and optimisation of machine learning algorithms, and what’s known as ‘feature engineering’.
Feature engineering is far and away the most time-consuming process, often requiring months to complete. By automating these two steps, as well as other processes, AutoML 2.0 platforms can cut AI and machine learning development time from months to as little as a few days.
What are the kinds of data science processes that can be automated?
Data science automation platforms can be classified into two categories: AutoML platforms and AutoML 2.0 platforms. AutoML platforms traditionally focus on automating the process of selecting and optimising machine learning algorithms. This cuts development time by allowing developers to test feature tables against multiple algorithms to select the optimal one.
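To make the idea of testing a feature table against multiple algorithms concrete, here is a minimal sketch in plain Python. It is not DotData's method or any particular AutoML product's API: the toy feature table, the two candidate algorithms and the function names are all hypothetical, chosen only to illustrate the pattern of scoring each candidate with cross-validation and keeping the best one.

```python
# Hypothetical toy feature table: each row is (features, label).
FEATURE_TABLE = [
    ((1.0, 5.0), 0), ((1.5, 4.0), 0), ((2.0, 6.0), 0), ((2.5, 5.5), 0),
    ((6.0, 1.0), 1), ((6.5, 2.0), 1), ((7.0, 1.5), 1), ((7.5, 0.5), 1),
]

def fit_majority(rows):
    """Baseline candidate: always predict the most common training label."""
    labels = [y for _, y in rows]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def fit_nearest_centroid(rows):
    """Candidate: predict the label whose feature centroid is closest."""
    centroids = {}
    for label in {y for _, y in rows}:
        points = [x for x, y in rows if y == label]
        centroids[label] = tuple(sum(col) / len(col) for col in zip(*points))

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
    return predict

# Illustrative candidate set; a real platform would search far more widely.
CANDIDATES = {
    "majority_class": fit_majority,
    "nearest_centroid": fit_nearest_centroid,
}

def cross_val_accuracy(fit, rows, folds=4):
    """Mean accuracy over interleaved k-fold splits of the feature table."""
    scores = []
    for k in range(folds):
        test = rows[k::folds]
        train = [r for i, r in enumerate(rows) if i % folds != k]
        model = fit(train)
        correct = sum(1 for x, y in test if model(x) == y)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

def select_best(rows, candidates=CANDIDATES):
    """Score every candidate on the same table and return the winner."""
    scores = {name: cross_val_accuracy(fit, rows) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores

best, scores = select_best(FEATURE_TABLE)
print(best, scores)
```

The point of the sketch is the loop in `select_best`: the feature table stays fixed while the algorithm varies, which is the part of the process that the first generation of AutoML platforms automates.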
Feature engineering, however, is a far more time-consuming, repetitive and complex process. A new generation of AutoML 2.0 platforms is now making it possible to automate the entire data science lifecycle, including feature engineering and AI data preparation as well as making operationalisation possible – even in real-time applications like IoT.
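As a rough illustration of what a feature table is, the sketch below turns raw transaction records into one row of features per customer. Everything in it is an assumption made for the example: the transaction layout, the feature names and the 30-day window are hypothetical and do not describe how any AutoML 2.0 platform works internally. In practice, much of the effort goes into deciding which aggregations like these are worth building in the first place.

```python
from collections import defaultdict

# Hypothetical raw transactions: (customer_id, amount, days_ago).
TRANSACTIONS = [
    ("c1", 20.0, 3), ("c1", 35.0, 40), ("c1", 5.0, 10),
    ("c2", 100.0, 75),
    ("c3", 12.5, 1), ("c3", 7.5, 2),
]

def build_feature_table(transactions, recent_window_days=30):
    """Aggregate raw transactions into one feature row per customer."""
    grouped = defaultdict(list)
    for customer_id, amount, days_ago in transactions:
        grouped[customer_id].append((amount, days_ago))

    table = {}
    for customer_id, rows in grouped.items():
        amounts = [amount for amount, _ in rows]
        # Spend restricted to the recent window (an illustrative choice).
        recent = [amount for amount, days in rows if days <= recent_window_days]
        table[customer_id] = {
            "txn_count": len(rows),
            "total_spend": sum(amounts),
            "avg_spend": sum(amounts) / len(amounts),
            "recent_spend": sum(recent),
            "days_since_last": min(days for _, days in rows),
        }
    return table

features = build_feature_table(TRANSACTIONS)
for customer_id, row in features.items():
    print(customer_id, row)
```

Each hand-written feature here encodes a guess about what might predict the outcome of interest, such as churn. Generating and evaluating many such guesses is the manual work that the interview describes as taking months.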
What kinds of projects will automation help free data scientists up for?
There are two huge benefits to automating the data science lifecycle. First, it allows data scientists to spend more time on productive experiments to create optimal AI models rather than spending months on developing the complex feature tables necessary for model development. With AutoML 2.0 platforms, data scientists can accelerate their output from a handful of models per year to hundreds or more models in the same timeframe.
More importantly, AutoML 2.0 platforms make it possible for an entirely new class of users to develop AI and machine learning models. The ease of use and automation of AutoML 2.0 platforms allow business-intelligence developers and analysts to build models for just about any type of predictive analytics use case, ranging from forecast optimisation to managing customer churn and far more.
Should data scientists be worried about losing their jobs to automation?
AutoML 2.0 platforms and data science automation are not going to eliminate the need for data scientists. In fact, with data science automation platforms, data scientists can become more productive by accelerating the development of AI models and by automating the development of feature tables.
In addition, because these platforms enable a self-service approach to AI and machine learning development for business intelligence professionals and data analysts, the data scientists in the organisation can focus their efforts on more time-consuming, higher-value projects that are critical to the organisation.
What advice would you give people working in this industry to embrace automation?
The most important part of embracing automation is to understand where it is likely to provide the greatest benefit and what the greatest risks of failure are. In our experience, the biggest risk of failure comes when companies try to experiment with AI and begin without clear, compelling, measurable use cases.
For example, do you want to predict customers likely to churn? Do you want to improve your forecasts? How will you measure the success or failure of your AI models? These are all pre-conditions to beginning your AI and machine learning development efforts that must be thought through.
In addition, data science automation through AutoML 2.0 can provide huge benefits by enabling an entirely new class of users – specifically business intelligence developers and business analysts – to build predictive analytics systems faster and more efficiently.