Adriana Romero

A Deep-Dive into 2015’s Hottest Profession: The Data Scientist

Blog Post created by Adriana Romero on Dec 1, 2015

At HDS and just about every tech company across the globe, the Data Scientist role has reached celebrity status. It has been called “the sexiest job of the 21st century,” among many other things.

 


Ashok Nirsoe is a Data Scientist on the Social Innovation and Global Industries team at HDS. He started his 15-year-long Data Science career at Liberty Media as an instrumental member of their BI Team. He learned the basics at Liberty Media and learned to apply them at Shell. With a BA in Software Engineering, Ashok was always interested in numbers and intrigued by the business challenges Data Scientists are tasked with solving on a day-to-day basis. Because there is no golden model out there, solving business-specific use cases is always a challenge.

 

What Does a Data Scientist Do?


Many people understand that Data Scientists solve business problems by making sense of big data, but the question becomes: “What exactly do Data Scientists do on a day-to-day basis?”

 

I tracked down Ashok Nirsoe to understand what fills a Data Scientist’s day. Read below to get a sense of what keeps him occupied.

 


06:30AM – Alarm goes off, but his biological clock already woke him up at 06:15AM. He’s getting ready to get into the groove!

06:45AM – Of course the daily routine kicks in (Shower, brushing teeth, etc.)


07:30AM – The daily commute begins. Oh, the joys of living in Silicon Valley

08:00AM – Arrives in the office. First things first: reads emails and catches up on the latest technology news (what's hot today and what's not), including news from back home in the Netherlands/Europe (Ashok is an NL native!)

08:30AM – Contemplates ideas to predict workload on VSP/G1000/G200 using performance data

09:00AM – Answers emails and joins daily calls on potential Hitachi Live Insight for IT Operations POCs, including demos, all while going through Spark documentation

10:00AM – Another day, another meeting. Answers questions from Sales and Pre-Sales Teams

11:00AM – Starts playing with a newly deployed Hadoop cluster, focusing primarily on data ingestion and movement as part of Hitachi Live Insight for IT Operations

11:30AM – Clones the Hadoop cluster for a colleague and tries to raise awareness of Spark within the team


11:45AM – Enjoys lunch with colleagues in the Hicafe; eats a sandwich and a bowl of the soup of the day while working on his tan in the California sunshine. Sadly, his natural tan is starting to fade away since the move from NL…

12:30PM – Cloning the Hadoop cluster is done and he hands it over to his colleague. She wants to try Spark… another win for the pro-Spark fan base

1:00PM – Samples statistics tools in the lab, primarily focusing on Logistic Regression. Runs a batch with 7 days of VSP data as input

2:30PM – Dedicates some time to POC reporting and open issues. This includes general housekeeping, automation and providing answers on some analysis questions

3:00PM – Statistical batch finished; now analyzes the output to determine its accuracy

3:15PM – Not happy with the batch output and therefore tweaks parameters to improve the results

3:30PM – Syncs up with colleague on power consumption use case using Hitachi sensors (temperature, humidity) attached to a Hitachi Unified Compute Platform (UCP) 19" rack with smart PDU (Power Distribution Unit)

3:45PM – Sets up data collection at an agreed 5-minute interval for now, using IPMI (Intelligent Platform Management Interface) data obtained from VMware and the smart PDU

4:00PM – Focuses on automated data collection. Statistical batch finished, but still not happy with the output, as it’s not usable in its current format. Maybe a different approach is required, such as PCA (Principal Component Analysis), CCA (Canonical Correlation Analysis), or KCCA (Kernel Canonical Correlation Analysis)? Needs to sync up with the team

4:05PM – Chats with friends / ex-colleagues on Lync about data pipelines in AWS (Amazon Web Services)

4:30PM – Reads about Azure IoT platform and its architecture. Meanwhile, he reverse-engineers ChangePoint so that colleagues in GSS can use Pentaho

5:00PM – Because parsing storage data is a pain, reviews options to make it faster without losing accuracy. A native API would be the best solution, but he knows a certain team member will not like the idea.


6:00PM – Time to focus on a high-priority POC. Next step: converts all data acquisition tools to PowerShell, as the customer wants to keep the number of third-party tools to a bare minimum. PowerShell to the rescue! Fights with PowerShell to extract tuples from JSON strings. End result: finds out that the BOM character was killing his regular expression

7:00PM – Tests PowerShell scripts in lab prior to sharing with team

7:30PM – Dinner time

8:30PM – Replies to emails and sends updates to colleagues in EMEA & APAC regarding POCs

9:00PM – Works on documentation

10:00PM – Reads about Open Source ML libraries (AKA: working ahead, trying to stay one or two steps ahead of Sales / Pre-Sales)

11:30PM – Signs off, another day gone by so quickly
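
For the curious, the 6:00PM BOM surprise is easy to reproduce. Ashok's scripts were in PowerShell and are not shown here, so the following is only a rough Python sketch of the same effect; the payload, the field names ("array", "iops") and the pattern are all made up for illustration. A byte order mark at the start of a file survives plain UTF-8 decoding as an invisible character, so a regular expression anchored at the start of the string silently stops matching.

```python
import json
import re

# Hypothetical payload: UTF-8 bytes that begin with a byte order mark (BOM).
raw_bytes = b'\xef\xbb\xbf{"array": "VSP-01", "iops": 1200}'

# Hypothetical pattern, anchored at the very start of the string.
starts_with_object = re.compile(r'^\{"array":\s*"([^"]+)"')

# Plain UTF-8 decoding keeps the BOM as U+FEFF, so the anchored pattern misses.
text_with_bom = raw_bytes.decode("utf-8")
match_with_bom = starts_with_object.match(text_with_bom)

# The "utf-8-sig" codec drops a leading BOM while decoding.
text_clean = raw_bytes.decode("utf-8-sig")
match_clean = starts_with_object.match(text_clean)

# With the BOM gone, a JSON parser is usually simpler than a regex
# for pulling out (key, value) tuples.
record = json.loads(text_clean)
pairs = sorted(record.items())
```

Stripping the BOM at decode time, rather than patching the pattern, keeps the rest of the parsing code unaware that the mark ever existed.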

 

 

 

Learn more about our current offering, Hitachi Live Insight for IT Operations here and be sure to view our Solution Overview Video!

 

Have any questions for Ashok Nirsoe? If so, send them my way!
