Modules#

vTarget is made up of 4 modules designed to speed up and simplify the discovery of insights in Analytics and/or Machine Learning use cases.

  • DataPrep: Use drag & drop nodes for data preparation or descriptive analytics

  • DataViz: Convert data and tables into graphs with simple settings

  • AutoML: Model binary, multi-class, and regression problems in minutes

  • AutoTS: Forecast time series comparing multiple models and pipelines with minimal setup

DataPrep#

Shortcuts#

  • Open Folder Ctrl+Shift+O

  • Open File Ctrl+O

  • (DataPrep) Group Nodes Ctrl+G

  • (DataPrep) UnGroup Nodes Ctrl+Shift+G

Nodes#

IN/OUT#

Input#

Add data sources to your stream, connecting csv, xlsx, json, and txt files from your file system.

Output#

Exports the output table of any node to a local file in csv or excel format

vOutput#

Connects the output table of a node to the DataViz, AutoML, and AutoTS modules without the need to export a file and read it again. This way, the memory (RAM) already occupied by that data is reused.

PREPARATION#

Filter#

Gets a subset of rows by applying a condition on any one column. The available operators vary depending on the selected data type.

Note

If you need to write a more complex condition, you can use the Custom Filter area, following pandas syntax. Note that the variable containing the dataframe (output table) is called df.

(df["COLUMN_A"] > 2) & (df["COLUMN_B"] < 10)

In code terms, the above represents this part of the structure

df = df[<YOUR_CONDITION>]

Formula#

It allows performing operations on one or multiple columns, either updating the value of a column, or creating a new column.

  • Output Column: Choose the column on which you want to operate. By default New Column is selected to add a new column with the expression you enter.

  • New Column Name: Assigns a name to the new column, only if Output Column has the New Column option selected.

  • Sentence: lets you enter any expression following pandas syntax.

Note

Note that the variable containing the dataframe (output table) is called df. Here you can also use any Python built-in function. Some examples of what you could write:

# Math operation
df['ANY_NUM_COLUMN'] * 20 + df['ANOTHER_NUM_COLUMN'] * .5

# Rounding a numeric column
round(df['ANY_NUM_COLUMN'], 2)

# Extract the start of a month
df['DATE_COL'].dt.strftime('%Y-%m-01')

# Fill null values with an empty str
df['ANY_COLUMN'].fillna('')

# Incremental counter per group
df.groupby(['ANY_COLUMN']).cumcount()+1

# Anonymous function to categorize positive and negative numbers
df['ANY_COLUMN'].apply(lambda x: 'POS' if x >= 0 else 'NEG')

In code terms, what you write represents this part of the structure:

df[OUTPUT_COLUMN or NEW_COLUMN] = <YOUR_SENTENCE>

Sort#

Sort values by one or multiple columns with ascending or descending option

Unique#

Extracts all unique (different) values from a column

Dtype#

Change data types or rename one or multiple columns at once

Column#

Select or remove columns from an output table. It also allows you to rename columns

IsIn#

Returns all rows where the value of the selected column is contained in a given list of values
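As a sketch, this node corresponds to pandas' Series.isin (the column and data names below are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"COUNTRY": ["AR", "CL", "BR", "UY"],
                   "SALES": [10, 20, 30, 40]})

# Keep only the rows whose COUNTRY value is contained in the given list
subset = df[df["COUNTRY"].isin(["AR", "CL"])]
```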

Drop Duplicates#

Remove duplicate rows by one or more columns
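In pandas terms this maps to DataFrame.drop_duplicates; a minimal sketch with illustrative column names:

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 2], "CITY": ["NY", "NY", "LA"]})

# Drop rows duplicated on the ID column, keeping the first occurrence
deduped = df.drop_duplicates(subset=["ID"], keep="first")
```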

Cleansing#

Perform preconfigured cleanup tasks on selected columns

  • Replace Nulls: with blanks or zeros

  • Remove Unwanted Chars: leading or trailing spaces, tabs, line breaks, duplicate whitespace, all whitespace, letters, numbers

  • Modify Case: Upper|Title|Lower case
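The cleanup tasks above roughly correspond to pandas string methods; a sketch (column and data names are illustrative, and the exact replacements the node applies may differ):

```python
import pandas as pd

df = pd.DataFrame({"NAME": ["  alice ", None, "bob\tsmith"]})

# Replace Nulls: with blanks
df["NAME"] = df["NAME"].fillna("")
# Remove Unwanted Chars: trim ends, collapse tabs and duplicate whitespace
df["NAME"] = df["NAME"].str.strip().str.replace(r"\s+", " ", regex=True)
# Modify Case: Title
df["NAME"] = df["NAME"].str.title()
```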

Switch#

Allows you to define multiple conditions and assign a value to a new column based on the first matching expression.

Warning

The conditions are evaluated in order: as soon as one is fulfilled, the remaining conditions are not evaluated.

Note

This node is equivalent to writing the following function in pandas (Python):

def rule(x):
  if x <= 10:
    return '1 to 10'
  elif x <= 20:
    return '11 to 20'
  elif x <= 25:
    return '21 to 25'
  elif x <= 30:
    return '26 to 30'
  else:
    return '30+'
df['NEW_COLUMN'] = df['ANY_NUM_COL'].apply(rule)

JOIN#

Cross#

Generate a matched join of each row in the first table with each row in the second table. This type of join is also known as a Cartesian join.
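In pandas this corresponds to a merge with how="cross" (table and column names below are illustrative):

```python
import pandas as pd

colors = pd.DataFrame({"COLOR": ["red", "blue"]})
sizes = pd.DataFrame({"SIZE": ["S", "M", "L"]})

# Every row of the first table paired with every row of the second
cross = colors.merge(sizes, how="cross")
```

With 2 rows on one side and 3 on the other, the result has 2 × 3 = 6 rows.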

Multi Join#

Warning

This functionality is not yet implemented; it will be available soon.

Merge#

Join tables database-style, specifying one or more keys to join on and selecting which type of join you want to perform: left (L), inner (J), right (R), and outer (F). You can select more than one at a time.

Note

The outer (F) option adds an indicator column called _merge containing left_only, right_only, or both, depending on where each row occurs in the input tables.
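A sketch of the equivalent pandas call (key and column names are illustrative; indicator=True produces the _merge column):

```python
import pandas as pd

left = pd.DataFrame({"KEY": [1, 2], "A": ["a1", "a2"]})
right = pd.DataFrame({"KEY": [2, 3], "B": ["b2", "b3"]})

# Outer join with the _merge indicator column
merged = left.merge(right, on="KEY", how="outer", indicator=True)
```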

Concat#

Concatenates two tables on the index axis, that is, it appends the rows of input B after the rows of input A
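The equivalent pandas sketch (table names are illustrative):

```python
import pandas as pd

table_a = pd.DataFrame({"X": [1, 2]})
table_b = pd.DataFrame({"X": [3, 4]})

# Append the rows of B after the rows of A on the index axis
stacked = pd.concat([table_a, table_b], ignore_index=True)
```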

AGGREGATION#

Group By#

Groups an input table by one or multiple columns, letting you select a preset aggregation method, according to the data type, for each column you want.

Note

Available aggregation actions

  • Summary: sum, count, count distinct, count null, min, max, first, last

  • String: count blank, count non blank, mode
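As a sketch, a grouping with two of the aggregation actions above (sum and count) could be written in pandas like this (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"REGION": ["N", "N", "S"], "SALES": [10, 20, 5]})

# Group by REGION, aggregating SALES with sum and count
summary = df.groupby("REGION", as_index=False).agg(
    TOTAL=("SALES", "sum"),
    N_ROWS=("SALES", "count"),
)
```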

Pivot#

Rotate the orientation of a table by moving fields from the vertical axis to a horizontal position, and add an aggregation method for a given column.

Note

Aggregation methods vary by data type

  • Numeric: sum, avg, count (without nulls), count (with nulls), percent row, percent column, total column, total row

  • String: count blank, count non blank, mode

  • The Add Margin option adds full summarization for rows and columns
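A sketch of the equivalent pandas call (column names are illustrative; margins=True plays the role of the Add Margin option):

```python
import pandas as pd

df = pd.DataFrame({"YEAR": [2022, 2022, 2023, 2023],
                   "REGION": ["N", "S", "N", "S"],
                   "SALES": [10, 20, 30, 40]})

# Move REGION values to the horizontal axis, summing SALES;
# margins=True adds "All" totals for rows and columns
table = pd.pivot_table(df, index="YEAR", columns="REGION",
                       values="SALES", aggfunc="sum", margins=True)
```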

Melt#

Converts wide-format data to long-format data, by compressing columns into a single list
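In pandas terms this corresponds to pd.melt; a minimal sketch with illustrative column names:

```python
import pandas as pd

wide = pd.DataFrame({"ID": [1, 2], "Q1": [10, 30], "Q2": [20, 40]})

# Compress the Q1/Q2 columns into variable/value pairs (long format)
long_df = pd.melt(wide, id_vars=["ID"], value_vars=["Q1", "Q2"],
                  var_name="QUARTER", value_name="SALES")
```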

Cumsum#

Performs cumulative operations on numeric variables, with the option of segregating the accumulation by any grouping column. Specifically, you can perform these operations:

  • cum count: incremental numbering

  • cum sum: incremental sum

  • cum pct: cumulative percentage

  • pct: percentage of the row with respect to the total
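Three of the operations above sketched in pandas (column names are illustrative, with the accumulation segregated by a grouping column):

```python
import pandas as pd

df = pd.DataFrame({"GROUP": ["A", "A", "B"], "AMOUNT": [10, 20, 30]})

# cum count: incremental numbering per group
df["CUM_COUNT"] = df.groupby("GROUP").cumcount() + 1
# cum sum: incremental sum per group
df["CUM_SUM"] = df.groupby("GROUP")["AMOUNT"].cumsum()
# pct: percentage of each row with respect to the total
df["PCT"] = df["AMOUNT"] / df["AMOUNT"].sum()
```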

PARSER#

DateTime FORMAT#

Transforms the data type of a column with date format, using pre-established formats. Performs two operations depending on the type of input data:

  • (In) String: returns a column of type Datetime

  • (In) Datetime: returns a column of type String

Note

If you require a custom setting not present in the default formats, you can add your own following Python's format-code nomenclature.
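Both directions sketched in pandas (column names and formats are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"DATE_STR": ["2023-01-15", "2023-02-20"]})

# (In) String -> returns a column of type Datetime
df["DATE"] = pd.to_datetime(df["DATE_STR"], format="%Y-%m-%d")
# (In) Datetime -> returns a column of type String
df["LABEL"] = df["DATE"].dt.strftime("%B %Y")
```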

Most common date format codes#

  • %a: Mon (abbreviated weekday name)

  • %A: Monday (full weekday name)

  • %d: 01 (day of month | 01-31)

  • %B: January (full month name)

  • %b: Jan (abbreviated month name)

  • %m: 01 (month | 01-12)

  • %y: 23 (year without century)

  • %Y: 2023 (year with century)

  • %H: 23 (hour, 24-hour clock | 00-23)

  • %I: 10 (hour, 12-hour clock | 01-12)

  • %M: 59 (minutes | 00-59)

  • %S: 59 (seconds | 00-59)

DateTime Take#

Extracts parts of a date or datetime field; the default options are date, time, year, month, day, hour, minute, and second.
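A sketch of these extractions using pandas datetime accessors (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"TS": pd.to_datetime(["2023-05-01 10:30:45"])})

# Extract individual parts of the datetime column
df["YEAR"] = df["TS"].dt.year
df["MONTH"] = df["TS"].dt.month
df["HOUR"] = df["TS"].dt.hour
df["MINUTE"] = df["TS"].dt.minute
```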

SCRIPT#

Code#

Lets you write snippets of Python code, keeping a few considerations in mind:

Note

  • The input ports of this node are dynamic: if you connect a link to the first IN port, a second one named IN2 is created automatically, and so on up to a maximum of 5 input ports.

  • Each input port is available in your script as a pandas DataFrame named df_in, df_in1, df_in2, df_in3, df_in4, or df_in5.

  • Only one output can be written, and it must be a pandas DataFrame; to send it to the output port, call the vtg_codeout(my_df) function, which already exists in the Python variable scope.

  • If you require a library that is not included, it will be installed automatically when you import it (only the first run will be slower due to the installation).

# Get 10 random samples from the dataframe linked to the first input port
df = df_in.sample(10)
# The samples are written to the output of the code node
vtg_codeout(df)

TIMESERIES#

Inter Row#

Performs operations that involve multiple rows at once, such as shift, diff, and pct_change. You also have the option of performing the operation grouped by any column.
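The three operations sketched in pandas, grouped by a column (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"STORE": ["A", "A", "B", "B"],
                   "SALES": [10, 15, 100, 120]})

# shift/diff/pct_change computed within each STORE group
grouped = df.groupby("STORE")["SALES"]
df["PREV"] = grouped.shift(1)
df["DIFF"] = grouped.diff()
df["PCT_CHANGE"] = grouped.pct_change()
```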

SUMMARY#

shape#

Returns the number of rows and columns in an output table.

Value Count#

Counts duplicate values, either from a single column or from a combination of multiple columns

Describe#

Generates descriptive statistics, including a summary of central tendency, dispersion, and the shape of the distribution for single or multiple columns (excluding NaN values).

Note

Additional options to complement and speed up data exploration.

  • Group by a certain column.

  • Add custom percentiles
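Both options sketched with pandas describe (column names and percentiles are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"SEGMENT": ["A", "A", "B", "B"],
                   "SALES": [10, 20, 30, 50]})

# Descriptive statistics grouped by a column, with custom percentiles
stats = df.groupby("SEGMENT")["SALES"].describe(percentiles=[0.1, 0.9])
```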

CHART#

Warning

The in-flow graphs cannot plot more information than what is shown in the output table, which is limited to 50 rows by default (you can change this limit in each node's configuration). If you need to graph all the data, use the DataViz module.

Bar#

Generates a bar graph within the flow; it can be moved and positioned like any other node

Line#

Generates a line graph within the flow; it can be moved and positioned like any other node

Pie#

Generates a pie graph within the flow; it can be moved and positioned like any other node

Scatter#

Generates a scatter graph within the flow; it can be moved and positioned like any other node

DataViz#

Warning

Coming soon

AutoML#

Warning

Coming soon

AutoTS#

Warning

Coming soon