Michael Hay

Hitachi Data Systems Works with Maxeler Technologies

Blog Post created by Michael Hay on Jan 3, 2017

At the end of the post that Shmuel and I published last month (The Many Core Phenomena), we hinted at some upcoming news:

Hitachi has demonstrated a functional prototype running with HNAS and VSP to capture finance data and report on things like currency market movements, etc. (more on this in the near future).

Well, there is obviously more to the story than this somewhat vague statement, and Maxeler Technologies has announced our collaboration on a high-fidelity packet capture and analytics solution. To provide a bit more detail, I'm embedding a video, narrated by Maxeler's Itay Greenspan, within this post.


Video: Joint Maxeler HDS CPU-less Packet Capture Plus Analytics

Commentary
As HDS and Maxeler set out on our collaborative R&D journey, we were initially inspired by market intelligence related to an emerging EU financial services directive called MiFID II. This EU directive, and its associated regulation, was designed to help regulators better handle High Frequency Trading (HFT) and so-called Dark Pools; in other words, to increase transparency in the markets. Shmuel Shottan, Scott Nacey, Karl Kholmoos, and I were all aware of HFT efforts because we ran into folks from a "captive startup" who "spilled the beans." Essentially, some financial services firms were employing these "captive startups" to build FPGA-based HFT solutions, enabling money to be made in less time than the blink of an eye. So as Maxeler and HDS approached our R&D, we assumed a hypothetical use case: capturing and decoding packets at speeds equivalent to HFT. We then took the prototype on the road to validate or invalidate the hypothesis and to see where our R&D would fit in the market. Our findings were surprising, and while the prototype did its job of getting us in the door, we ultimately ended up moving in a different direction.


As the reader/viewer can see in the video, we leveraged many off-the-shelf components and technologies (in fact, we used previous-generation technology, but heck, who's counting). As stated in the video, we built our operational prototype using Maxeler's DFE (Data Flow Engine) network cards, Dataflow-based capture/decode capability executing on Dataflow hardware, a hardware-accelerated NFS client, Hitachi's CB500, Pentaho, and Hitachi Unified Storage (HUS). A point made in the video is worth restating: all of the implemented hardware-accelerated software functions fit on about 20% to 30% of the available Dataflow hardware resources, and since we're computing in space, the remaining 70% to 80% is free for future novel functions. Furthermore, the overall system, from packet capture to NFS write, does not use a single server-side CPU cycle! (Technically, the NFS server, file system, and object/file-aware sector caches also run on FPGAs, so even on the HUS the general-purpose CPUs are augmented by FPGAs.)


As serial innovators, we picked previous-generation off-the-shelf technologies for two primary reasons. The first and most important was to make the resulting system fit an accelerated market-availability model: we wanted the results to be visible and reachable without deep and lengthy R&D cycles. The second was a deliberate choice to make the prototype mirror our UCP (Unified Compute Platform) system so that, when revealed, it would be congruent with our current portfolio and field skill sets. Beyond these key points, a secondary, emergent benefit is that the architecture could readily be extended to support almost any packet analysis problem. (Although we did not know it at the time, the architecture also resembles the Azure FPGA-accelerated networking stack and is close to a private version of Amazon EC2 F1, lending further credibility to it being leading edge and general purpose.) Something that became readily visible during our rapid R&D cycle is Maxeler's key innovation: lowering the bar for programming an FPGA from needing to be an electrical or computer engineer to being a mere mortal developer with knowledge of C and Java. For reference, we have historically observed FPGA development cycles of no less than six months for a component-level functional prototype, whereas with Maxeler's DFEs and development toolchain we witnessed 3 to 4 weeks of development time for a fully functional prototype system. That is dramatic! For a view on Maxeler's COTS-derived FPGA computing elements (DFEs) and our collaboration, let me quote Oskar Mencer, Maxeler's CEO:

Multi-scale Dataflow Computing looks at computing in a vertical way and multiple scales of abstraction: the math level, the algorithm level, the architecture level all the way down to the bit level. Efficiency gained from dataflow lies in maximizing the number of arithmetic unit workers inside a chip, as well as a distributed buffer architecture to replace the traditional register file bottleneck of a microprocessor. Just as in the industrial revolution where highly skilled artisans get replaced by low-wage workers in a factory, the super high end arithmetic units of a high end microprocessor get replaced by tiny ultra-low energy arithmetic units that are trained to maximize throughput rather than latency. As such we are building latency tolerant architecture and achieve maximum performance per Watt and per unit of space.


The key to success in such an environment is data, and therefore partnership between Maxeler and Hitachi Data Systems is a natural opportunity to maximize value of storage and data lakes, as well as bring dataflow computing closer to the home of data. (Oskar Mencer)

Looking at the present and a bit ahead: first, we're "open for business." Maxeler is in the HDS TAP program, so we can meet in the market and engage through HDS, and when it makes sense, we (HDS) are keen to help users directly realize extreme benefits. As for targets, we need a tough network programming or computing problem where the user is willing to reimagine what they are doing. In the case of the packet capture solution we have already built, we could, with some effort, extend it from financial packet analysis to, say, cyber packet forensics, telco customer assurance, low-attack-surface network defenses, and so on. For other potential problems (especially those in the computing space), please reach out.

Looking a bit further into the future, I want to pull forward a few of Oskar's words to make a point: "performance ... per unit of space." This is something I really had to wrap my head around, and I think it is worth calling out and explaining a bit; the futurism aspect will come into focus shortly. Unlike CPUs, which compute in time using complex queuing methodologies, Maxeler's DFEs, and FPGAs more generally, compute in space. This means that data can be processed as it flows through the computing element, at ultra-low latency and little cost. This is nothing short of profound, because in the case of networking it means the otherwise valueless act of moving data from system A to system B can now provide value. This is, in fact, what Microsoft's Azure FPGA acceleration efforts for neural networks, compression, encryption, search acceleration, and so on are all about. To drive the point home further, what if you could put a networking card in a production database system that performs ETL on the live database log via a read operation, putting the data immediately into your data warehouse?
This would completely remove the need for an entire Hadoop infrastructure or a separate ELT step, which means computing in space frees up data center space. Putting ETL on a programmable card is my projection ahead, to tease the reader with use cases that are now possible; ETL logic executing on a Dataflow card comes down to computing in space, not time!
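
To make the "ETL as data flows" idea a little more concrete, here is a minimal software analogy in Python. It is not Maxeler's toolchain or a DFE program, and everything in it is hypothetical: the comma-separated log format (op,table,key,value), the stage names, and the in-memory list standing in for the data warehouse. Chained generators capture the structure of a dataflow pipeline, since each log record streams through the extract, transform, and load stages without being staged in an intermediate store. On dataflow hardware, each stage would instead occupy its own region of the chip with all stages working concurrently; the generators here only interleave lazily on a single CPU thread.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Row(NamedTuple):
    """A hypothetical warehouse row extracted from a database log entry."""
    table: str
    key: int
    value: float


def extract(log_lines: Iterable[str]) -> Iterator[List[str]]:
    # Stage 1: split each raw log line into fields as it arrives.
    for line in log_lines:
        yield line.strip().split(",")


def transform(records: Iterable[List[str]]) -> Iterator[Row]:
    # Stage 2: keep only INSERT entries and convert field types.
    for op, table, key, value in records:
        if op == "INSERT":
            yield Row(table, int(key), float(value))


def load(rows: Iterable[Row], warehouse: List[Row]) -> int:
    # Stage 3: append each row to the in-memory stand-in for a warehouse.
    count = 0
    for row in rows:
        warehouse.append(row)
        count += 1
    return count


def run_pipeline(log_lines: Iterable[str], warehouse: List[Row]) -> int:
    # Chaining generators means each record passes through every stage
    # before the next record is read; no intermediate dataset is built.
    return load(transform(extract(log_lines)), warehouse)


if __name__ == "__main__":
    log = ["INSERT,trades,1,101.5", "DELETE,trades,1,0", "INSERT,quotes,7,99.25"]
    warehouse: List[Row] = []
    loaded = run_pipeline(log, warehouse)
    print(loaded, warehouse)
```

With the sample log, the two INSERT entries reach the warehouse and the DELETE entry is filtered out in the transform stage, so `run_pipeline` returns 2.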