2) Loading Data into H2O with Flow

This Blog entry is from the Deep Learning section in Learn R.

In this Blog entry a logistic regression model will be created, using Flow, achieving the same results as achieved in the GLM functions of R and Exhaustive.

In the Flow user interface, start by navigating:

Flow >>> New Flow


If prompted to create a new workbook, affirm this:


To add a cell for the importing of data, navigate to:

Data >>> Import Files


It can be seen that Import Files Cell has been added to the Flow:


In the Search dialog box, enter the location of the FraudRisk.csv file until a drop down is populated, for example:


Click on the Search Icon to bring back the contents of this directory:


Click on the file or plus sign to add the file to the cell:


Click the Import Button to import the file to H2O:


Note that the file is not parsed to the H2O column compressed format, known as Hex.  To achieve parsing, simply click the button titled 'Parse These Files':


The next screen allows for the specification and data types to be more robustly configured.  In this example, a cursory check to ensure that the data types are correct is sufficient:


Upon satisfaction, click parse to mount the dataset in H20 as Hex:


A background job will start the process of transforming the data from FraudRisk.csv to the H2O hex format:


H2O supports the concept of training and validation datasets robustly, henceforth the hex file needs to be split into training and validation.  To split a Hex frame, navigate to:

Data >>> Split Frame


Click on the menu item to create the split data frame cell:


Select the frame to be split, in this case FraudRisk.hex:


The default frame split is 75% by 25%, confirm this by clicking the Create button:


There now exists two frames in the flow, the smaller of which will be used for validation: