Modules#
vTarget is made up of four modules designed to speed up and simplify the discovery of insights in Analytics and Machine Learning use cases:
DataPrep: Use drag & drop nodes for data preparation or descriptive analytics
DataViz: Convert data and tables into graphs with simple settings
AutoML: Model binary, multi-class and regression problems in minutes
AutoTS: Forecast time series comparing multiple models and pipelines with minimal setup
DataPrep#
Shortcuts#
Open Folder: Ctrl+Shift+O
Open File: Ctrl+O (DataPrep)
Group Nodes: Ctrl+G (DataPrep)
UnGroup Nodes: Ctrl+Shift+O
Nodes#
IN/OUT#
Input#
Add data sources to your stream by connecting csv, xlsx, json, and txt files from your file system.
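In pandas terms, an Input node behaves like the library's file readers. A minimal sketch, using an in-memory buffer as a stand-in for a csv file on disk (the file content is hypothetical):

```python
import pandas as pd
from io import StringIO

# Stand-in for a small csv file on disk (illustrative content)
csv_file = StringIO("a,b\n1,2\n3,4")

# Equivalent of connecting a csv source to the stream
df = pd.read_csv(csv_file)
```

For xlsx, json, and txt sources the equivalents would be `pd.read_excel`, `pd.read_json`, and `pd.read_csv` with a custom separator.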
Output#
Exports the output table of any node to a local file in csv or excel format
vOutput#
Lets you connect the output table of a node to the DataViz, AutoML and AutoTS modules without needing to export a file and read it again. This way, the RAM already occupied by that data is reused.
PREPARATION#
Filter#
Gets a subset of rows by applying a condition on any one column. The available operators vary depending on the selected data type.
Note
If you need to write a more complex condition, you can use the Custom Filter area, keeping pandas syntax. Note that the variable containing the dataframe (output table) is called df.
(df["COLUMN_A"] > 2) & (df["COLUMN_B"] < 10)
In code terms, the above represents this part of the structure
df = df[<YOUR_CONDITION>]
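Run against illustrative data, the custom condition from the note behaves like this (column names and values are hypothetical):

```python
import pandas as pd

# Sample data standing in for a node's output table
df = pd.DataFrame({"COLUMN_A": [1, 3, 5], "COLUMN_B": [4, 8, 12]})

# The custom filter condition shown above
df = df[(df["COLUMN_A"] > 2) & (df["COLUMN_B"] < 10)]
```

Only the row with COLUMN_A = 3 and COLUMN_B = 8 satisfies both conditions, so a single row remains.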
Formula#
Performs operations on one or multiple columns, either updating the value of an existing column or creating a new one.
Output Column: Choose the column on which you want to operate. By default New Column is selected to add a new column with the expression you enter.
New Column Name: Assigns a name to the new column, only if Output Column has the New Column option selected.
Sentence: allows you to enter any expression following pandas syntax.
Note
Note that the variable containing the dataframe (output table) is called df. Here you can use any built-in Python library. Some examples of what you could write:
# Math operation
df['ANY_NUM_COLUMN'] * 20 + df['ANOTHER_NUM_COLUMN'] * .5
# Rounding a numeric column
round(df['ANY_NUM_COLUMN'], 2)
# Extract the start of a month
df['DATE_COL'].dt.strftime('%Y-%m-01')
# Fill null values with an empty str
df['ANY_COLUMN'].fillna('')
# Incremental counter per group
df.groupby(['ANY_COLUMN']).cumcount()+1
# Anonymous function to categorize positive and negative numbers
df['ANY_COLUMN'].apply(lambda x: 'POS' if x >= 0 else 'NEG')
In code terms, what you write represents this part of the structure:
df[OUTPUT_COLUMN or NEW_COLUMN] = <YOUR_SENTENCE>
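A runnable sketch of that structure, using the rounding sentence from the examples above (column names and values are illustrative):

```python
import pandas as pd

# Sample output table with a numeric column
df = pd.DataFrame({"ANY_NUM_COLUMN": [1.234, -2.5]})

# Output Column = New Column, Sentence = round(df['ANY_NUM_COLUMN'], 2)
df["NEW_COLUMN"] = round(df["ANY_NUM_COLUMN"], 2)
```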
Sort#
Sort values by one or multiple columns with ascending or descending option
Unique#
Extracts all unique (different) values from a column
Dtype#
Change data types or rename one or multiple columns at once
Column#
Select or remove columns from an output table. It also allows you to rename columns
IsIn#
Returns all rows where the value of the selected column is contained in a given list of values.
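This matches the semantics of pandas `Series.isin`. A minimal sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"c": ["a", "b", "c"]})

# Keep rows whose value in column c appears in the given list
out = df[df["c"].isin(["a", "c"])]
```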
Drop Duplicates#
Remove duplicate rows by one or more columns
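In pandas this corresponds to `DataFrame.drop_duplicates`. A sketch with hypothetical columns, where only column `k` is considered when detecting duplicates:

```python
import pandas as pd

df = pd.DataFrame({"k": [1, 1, 2], "v": ["a", "b", "c"]})

# Remove duplicates considering only column k; the first occurrence is kept
out = df.drop_duplicates(subset=["k"])
```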
Cleansing#
Perform preconfigured cleanup tasks on selected columns
Replace Nulls: with blanks or zeros
Remove Unwanted Chars: leading or trailing spaces, tabs, line breaks, duplicated whitespace, all whitespace, letters, numbers
Modify Case: Upper|Title|Lower case
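In pandas terms, several of these cleanup tasks chain together as string operations. A sketch combining null replacement, trimming, duplicated-whitespace removal and Title case (column name and data are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"TXT": ["  hello  world ", None]})

# Replace nulls with blanks, trim the edges, collapse duplicated
# whitespace, then apply Title case
clean = (
    df["TXT"]
    .fillna("")
    .str.strip()
    .str.replace(r"\s+", " ", regex=True)
    .str.title()
)
```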
Switch#
Allows you to evaluate multiple conditions and assign a value to a new column based on the first matching expression.
Warning
The conditions are evaluated in order; once a condition is met, the remaining ones are not evaluated.
Note
This node is equivalent to writing the following function in pandas (Python):
def rule(x):
    if x <= 10:
        return '1 to 10'
    elif x <= 20:
        return '11 to 20'
    elif x <= 25:
        return '21 to 25'
    elif x <= 30:
        return '26 to 30'
    else:
        return '30+'

df['NEW_COLUMN'] = df['ANY_NUM_COL'].apply(rule)
JOIN#
Cross#
Generate a matched join of each row in the first table with each row in the second table. This type of join is also known as a Cartesian join.
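A minimal sketch of a Cartesian join in pandas (column names and data are illustrative; `how="cross"` requires pandas 1.2 or later):

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"y": ["a", "b", "c"]})

# Every row of A paired with every row of B: 2 x 3 = 6 rows
crossed = a.merge(b, how="cross")
```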
Multi Join#
Warning
This functionality is not yet implemented; it will be available soon.
Merge#
Join tables database-style, specifying one or more keys to join and selecting which type of join you want to perform: left (L), inner (J), right (R) and outer (F). You can select more than one at a time.
Note
The outer (F) option will add an indicator column called _merge containing a left_only, right_only or both categorization, depending on the occurrence in each of the input tables.
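The _merge column described above corresponds to the `indicator=True` parameter of pandas `merge`. A sketch with illustrative keys:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2], "l": ["a", "b"]})
right = pd.DataFrame({"key": [2, 3], "r": ["c", "d"]})

# outer join; indicator=True adds the _merge categorization column
merged = left.merge(right, on="key", how="outer", indicator=True)
```

Key 1 exists only in the left table, key 2 in both, and key 3 only in the right table, so the three _merge categories all appear.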
Concat#
Concatenates two tables on the index axis, that is, it appends the rows of input B after the rows of input table A.
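In pandas this is `pd.concat` along the index axis. A sketch with illustrative data:

```python
import pandas as pd

a = pd.DataFrame({"v": [1, 2]})
b = pd.DataFrame({"v": [3]})

# Rows of B appended after rows of A; ignore_index renumbers the index
out = pd.concat([a, b], ignore_index=True)
```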
AGGREGATION#
Group By#
Groups an input table by one or multiple columns, allowing you to select a preset aggregation method, according to the data type, for the columns you want.
Note
Available aggregation actions
Summary: sum, count, count distinct, count null, min, max, first, last
String: count blank, count non blank, mode
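A sketch of the sum and count aggregations from the Summary list, expressed with pandas `groupby` (column names and data are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b"], "v": [1, 2, 3]})

# sum and count of v per group g
agg = df.groupby("g")["v"].agg(["sum", "count"]).reset_index()
```

Group "a" sums to 3 with a count of 2; group "b" sums to 3 with a count of 1.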
Pivot#
Rotate the orientation of a table by moving fields from the vertical axis to a horizontal position, and add an aggregation method for a given column.
Note
Aggregation methods vary by data type
Numeric: sum, avg, count (without nulls), count (with nulls), percent row, percent column, total column, total row
String: count blank, count non blank, mode
The Add Margin option adds full summarization for rows and columns.
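In pandas this corresponds to `pivot_table`, where `margins=True` plays the role of the Add Margin option. A sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"row": ["x", "x", "y"],
                   "col": ["a", "b", "a"],
                   "v":   [1, 2, 3]})

# Pivot with sum aggregation; margins=True adds the All row and column
pv = pd.pivot_table(df, index="row", columns="col", values="v",
                    aggfunc="sum", margins=True)
```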
Melt#
Converts wide-format data to long-format data, by compressing columns into a single list
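The pandas equivalent is `pd.melt`. A sketch with hypothetical quarter columns being compressed into variable/value pairs:

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "q1": [10, 20], "q2": [30, 40]})

# Wide to long: the q1/q2 columns become rows of a variable/value pair
long = pd.melt(wide, id_vars="id", value_vars=["q1", "q2"])
```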
Cumsum#
Performs operations associated with accumulating numeric variables, with the option of segregating the accumulation by any grouping column. Specifically, you can perform these operations:
cum count: incremental numbering
cum sum: incremental sum
cum pct: cumulative percentage
pct: percentage of the row with respect to the total
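A sketch of cum count, cum sum and pct in pandas, segregated by a hypothetical grouping column g:

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b"], "v": [10, 30, 5]})

# cum count: incremental numbering per group
df["cum_count"] = df.groupby("g").cumcount() + 1
# cum sum: incremental sum per group
df["cum_sum"] = df.groupby("g")["v"].cumsum()
# pct: each row's share of the overall total
df["pct"] = df["v"] / df["v"].sum()
```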
PARSER#
DateTime FORMAT#
Transforms the data type of a column with date format, using pre-established formats. Performs two operations depending on the type of input data:
(In) String: returns a column of type Datetime
(In) Datetime: returns a column of type String
Note
If you require any custom settings not present in the default formats you can add your own by following Python nomenclature
| Key | Value |
|---|---|
| %a | Mon (abbreviated day) |
| %A | Monday (day) |
| %d | 01 (day, 01-31) |
| %B | January (month) |
| %b | Jan (abbreviated month) |
| %m | 01 (month, 01-12) |
| %y | 23 (year without century) |
| %Y | 2023 (year with century) |
| %H | 23 (hour, 24h format, 00-23) |
| %I | 10 (hour, 12h format, 01-12) |
| %M | 59 (minutes, 00-59) |
| %S | 59 (seconds, 00-59) |
DateTime Take#
Extracts parts of a date or datetime field; the default options are date, time, year, month, day, hour, minute and second.
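These parts correspond to the pandas `.dt` accessor. A sketch extracting each one from an illustrative timestamp:

```python
import pandas as pd

dt = pd.to_datetime(pd.Series(["2023-05-07 13:45:20"]))

# Each part the node exposes, via the .dt accessor
year, month, day = dt.dt.year[0], dt.dt.month[0], dt.dt.day[0]
hour, minute, second = dt.dt.hour[0], dt.dt.minute[0], dt.dt.second[0]
```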
SCRIPT#
Code#
Lets you write snippets of Python code, keeping a few considerations in mind:
Note
The input ports of this node are dynamic: when you connect a link to the first IN port, a second one named IN2 is automatically created, up to a maximum of 5 input ports.
Each input port is available in your script as a pandas DataFrame, named df_in, df_in1, df_in2, df_in3, df_in4 and df_in5.
Only one output can be written, and it must be a pandas DataFrame; to send it to the output port, call the vtg_codeout(my_df) function already existing in the scope of Python variables.
If you require a library that is not included, it will be installed automatically when you import it (only the first time will execution be slower, due to the installation).
# Get 10 samples from the dataframe linked to the first input port
df = df_in.sample(10)
# And the samples are written to the output of the code node
vtg_codeout(df)
TIMESERIES#
Inter Row#
Performs operations involving multiple rows at once, such as shift, diff and pct_change. You also have the option of performing the operation grouped by any column.
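A sketch of the three operations computed within each group of a hypothetical column g:

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b", "b"], "v": [10, 20, 5, 15]})

# shift, diff and pct_change, each restarted per group g
df["prev"] = df.groupby("g")["v"].shift(1)
df["diff"] = df.groupby("g")["v"].diff()
df["pct_change"] = df.groupby("g")["v"].pct_change()
```

The first row of each group has no predecessor, so all three results are null there.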
SUMMARY#
shape#
Returns the number of rows and columns in an output table.
Value Count#
Counts duplicate values, either from a single column or from the combination of multiple columns.
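In pandas terms this matches `DataFrame.value_counts` over a subset of columns. A sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "x", "y"], "b": [1, 1, 2]})

# Counts per combination of columns a and b
vc = df.value_counts(["a", "b"])
```

The pair (x, 1) appears twice and (y, 2) once, giving two distinct combinations.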
Describe#
Generates descriptive statistics, including a summary of central tendency, dispersion, and the shape of the distribution for single or multiple columns (excluding NaN values).
Note
Additional options to complement and speed up data exploration.
Group by a certain column.
Add custom percentiles
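A sketch of pandas `describe` with the custom-percentiles option, using illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3, 4, 5]})

# Descriptive statistics with custom 10th and 90th percentiles added
desc = df["v"].describe(percentiles=[0.1, 0.9])
```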
CHART#
Warning
In-flow charts cannot graph more information than what is shown in the output table, which by default is limited to 50 rows (you can change this limit in the configuration of each node). If you need to graph all of the information, use the DataViz module.
Bar#
Generates a bar graph within the same flow; the chart node can be moved and positioned like any other node.
Line#
Generates a line graph within the same flow; the chart node can be moved and positioned like any other node.
Pie#
Generates a pie graph within the same flow; the chart node can be moved and positioned like any other node.
Scatter#
Generates a scatter graph within the same flow; the chart node can be moved and positioned like any other node.
DataViz#
Warning
Coming soon
AutoML#
Warning
Coming soon
AutoTS#
Warning
Coming soon