Software
I'm using Pentaho Data Integration 5.4
Input data & explanation
Input data from a file (simplified, there are more columns):
number name
1009 ProductA
2150 ProductB
3235 ProductC
ProductD
ProductE
1234 ProductF
7765 ProductG
4566 ProductH
ProductI
9907 ProductJ
The issue is that the source was an Excel (xlsx) file with merged cells, so for one value of number there are 1..n rows of values.
After converting that file to CSV, the values in the rows after the first one are missing, except in the one column that was not merged (see e.g. id=3 and id=6).
I'm generating a sequence number with the Add sequence step; the input keeps the order in which it was originally stored in the file.
Steps to achieve the goal
Basically what I need to do is:
Find the nearest preceding row (sequence_number less than current_row.sequence_number) whose number is not null
Append the value of the name field to that matching row
Keep scanning the following rows (sequence_number higher than the last scanned one)
As stated before, there can be 1..n rows of values for such case.
Expected output
number name
1009 ProductA
2150 ProductB
3235 ProductC; ProductD; ProductE
1234 ProductF
7765 ProductG
4566 ProductH; ProductI
9907 ProductJ
My approach
I believe I could do this in a loop: use the Analytic Query step to calculate LAG(1), concatenate the name of a row with a null number onto the previous row, discard the other column values from the null row, and then repeat this (about 20 times, assuming that is the maximum). However, I consider this a bad idea.
There are probably better ways to achieve this result, for example with a JavaScript step that scans rows backward from the current one (based on the sequence number), but I'm not aware of such functions, if they exist.
How can I achieve this using the Modified Java Script Value step, or in any other efficient way, without looping over the entire content of the file until there are no empty rows?
To solve this, I would use Modified Java Script Value to save the last seen (non-empty) number and use it for all following rows where number is empty, and then use Group By on number to concatenate the name values.
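A plain-JavaScript sketch of that idea, run on the sample rows from the question: carry the last non-empty number down onto the empty rows, then concatenate name per number. It is not Modified Java Script Value step code; the function names are hypothetical and the rows are assumed to be an in-memory array of { number, name } objects.

```javascript
// Carry the last non-empty value of `field` down onto rows where it is empty.
function fillForward(rows, field) {
  let lastSeen = null; // last non-empty value seen so far
  return rows.map((row) => {
    const value = row[field];
    if (value !== null && value !== undefined && value !== "") {
      lastSeen = value;
      return row;
    }
    return { ...row, [field]: lastSeen };
  });
}

// Concatenate valueField per keyField; a Map keeps keys in first-seen order.
function groupConcat(rows, keyField, valueField, separator) {
  const groups = new Map();
  for (const row of rows) {
    const key = row[keyField];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row[valueField]);
  }
  return Array.from(groups, ([key, values]) => ({
    [keyField]: key,
    [valueField]: values.join(separator),
  }));
}

const input = [
  { number: 1009, name: "ProductA" },
  { number: 2150, name: "ProductB" },
  { number: 3235, name: "ProductC" },
  { number: null, name: "ProductD" },
  { number: null, name: "ProductE" },
  { number: 1234, name: "ProductF" },
  { number: 7765, name: "ProductG" },
  { number: 4566, name: "ProductH" },
  { number: null, name: "ProductI" },
  { number: 9907, name: "ProductJ" },
];

const result = groupConcat(fillForward(input, "number"), "number", "name", "; ");
for (const row of result) console.log(row.number, row.name);
```

Because a Map collects all equal keys wherever they occur, this sketch does not need sorted input, unlike a row-by-row Group by step.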
Introduction
Merged adjacent cells in Excel files are shown in the image below.
When the file is opened as plain text, merged cells leave gaps: the value of a merged cell appears only in the first row it spans and is missing from every following row.
number name
1000/P um6p1
um1p2
um1p3
1500 um2p1
9823 um3p1
83424 um4p1
um4p2
um4p3
um4p4
21390 um5p1
While @bolav's answer addresses the problem, there is a simpler and probably more efficient approach to this issue in Kettle.
Approach
In the Microsoft Excel Input step, go to the Fields tab and set the Repeat option to Y for the columns that store values in merged cells
Use Sort rows on the number column, because the Group by step needs sorted input
Use Group by on the number field and aggregate name with the type "Concatenate strings separated by" and ";" as the value
From Pentaho User Guide:
Repeat If set to Y, will repeat this value if the field in the next row is empty.
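To see why the sort matters, here is a simplified model of a streaming group-by, assuming (as a sketch, not the step's actual source) that it only compares each row's key with the previous row's key. Equal keys are merged only when they are adjacent; the function name groupAdjacent is hypothetical.

```javascript
// Merge a row into the previous output row only when the keys are equal.
function groupAdjacent(rows, keyField, valueField, separator) {
  const out = [];
  for (const row of rows) {
    const last = out[out.length - 1];
    if (last && last[keyField] === row[keyField]) {
      last[valueField] += separator + row[valueField];
    } else {
      out.push({ [keyField]: row[keyField], [valueField]: row[valueField] });
    }
  }
  return out;
}

const unsorted = [
  { number: 3235, name: "ProductC" },
  { number: 1009, name: "ProductA" },
  { number: 3235, name: "ProductD" },
];
// Array.prototype.sort is stable, so rows keep their order within a group.
const sorted = [...unsorted].sort((a, b) => a.number - b.number);

console.log(groupAdjacent(unsorted, "number", "name", ";").length); // 3: 3235 is split
console.log(groupAdjacent(sorted, "number", "name", ";").length); // 2
```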
Related
The formatting is slightly complex, but I hope someone can point me in the right direction.
Refer to this Demo Sheet before reading as I'll be referring to it throughout this post.
What I'm Trying to Accomplish: I'm trying to 'filter' the "Ongoing Sales" sheet based on which store is selected (G13). I have a formula that automatically populates the rows in each brand section, and I would like to add another condition to it, stating the following:
If 'G13' is found in 'Current Markdown Sheet'!$P$16:AC16, display/filter the corresponding column if the values in that column are greater than 0.
My current formula (which filters SKUs by brand name and by whether they are men's, women's, or kids'):
=FILTER('Current Markdown Sheet'!$B$16:$B, REGEXMATCH('Current Markdown Sheet'!$A$16:A, "ASICS"), REGEXMATCH(LEFT('Current Markdown Sheet'!$C$16:C,2), " M"))
Here are some visuals in case my explanation wasn't clear enough:
^ Where I want the condition to go. (You can replace the &T(N("INSERT FILTER CONDITION HERE"))) ^
Some Rules to Follow:
You cannot alter the "Current Markdown Sheet" in any way (add data, remove data, etc.).
I don't necessarily need it to show "All Stores", but if it's possible, bonus points.
Best of luck everyone, and I thank you in advance!
Cheers!
Try:
=FILTER('Current Markdown Sheet'!$B$16:$B, REGEXMATCH('Current Markdown Sheet'!$A$16:A, "ASICS"), REGEXMATCH(LEFT('Current Markdown Sheet'!$C$16:C,2), " M"),INDIRECT("'Current Markdown Sheet'!$"&regexextract(ADDRESS(16,MATCH(G13,'Current Markdown Sheet'!16:16,0)),"[A-Z]+")&"$16:$"&regexextract(ADDRESS(16,MATCH(G13,'Current Markdown Sheet'!16:16,0)),"[A-Z]+"))>0)
Result:
Explanation:
-Use MATCH() to find the column of the word selected in the dropdown; this returns the column index. Then use ADDRESS() to get its exact cell address, and REGEXEXTRACT() to keep only the column letter:
regexextract(ADDRESS(16,MATCH(G13,'Current Markdown Sheet'!16:16,0)),"[A-Z]+")
-Now that you have the column letter, you can use it to filter for values greater than 0. The same letter is what INDIRECT() uses to refer to the column matching the dropdown selection.
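For anyone who needs the same number-to-letter conversion outside a formula (for example in an Apps Script custom function), here is a minimal JavaScript sketch of what ADDRESS() followed by REGEXEXTRACT() computes. The function name columnToLetter is hypothetical; the input is the 1-based index that MATCH() returns.

```javascript
// Convert a 1-based column index to A1-style letters: 1 -> "A", 27 -> "AA".
function columnToLetter(column) {
  if (!Number.isInteger(column) || column < 1) {
    throw new RangeError("column must be a positive integer");
  }
  let letters = "";
  let n = column;
  while (n > 0) {
    const remainder = (n - 1) % 26; // 0..25 maps to A..Z
    letters = String.fromCharCode(65 + remainder) + letters;
    n = Math.floor((n - 1) / 26);
  }
  return letters;
}

console.log(columnToLetter(16)); // "P", the first store column in $P$16:AC16
```

The "n - 1" shift is needed because column letters are a base-26 system without a zero digit (Z is followed by AA, not by "A0").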
References:
MATCH()
Convert Column number to Letter
INDIRECT()
I am new to Pentaho. I am using the "Merge rows (diff)" step to compare two tables. The problem is that I don't know the key and value fields of my source tables in advance; I can only read them in a "Javascript" step. Do you know of any way to pass such parameters to the "Merge rows (diff)" step? I am especially interested in "Values to compare", because I need to compare all columns in these two tables, and the structure of the tables (for example, the column names) can change in the database at any time, so the number of fields in "Values to compare" will always be different.
Thank you for your help.
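The comparison itself does not need a fixed field list. As an illustration only (plain JavaScript, not a PDI API and not a way to configure the step), here is a sketch that compares two tables on key columns across every column present in either row. The flag values mirror the identical/changed/new/deleted flags that Merge rows (diff) produces; diffTables is a hypothetical name.

```javascript
// Compare rows by key; any column present on either side is checked.
function diffTables(reference, compare, keyFields) {
  const keyOf = (row) => JSON.stringify(keyFields.map((f) => row[f]));
  const refByKey = new Map(reference.map((row) => [keyOf(row), row]));
  const cmpByKey = new Map(compare.map((row) => [keyOf(row), row]));
  const result = [];

  for (const [key, cmpRow] of cmpByKey) {
    const refRow = refByKey.get(key);
    if (!refRow) {
      result.push({ flag: "new", row: cmpRow });
      continue;
    }
    // Union of column names, so added or dropped columns also count as changes.
    const columns = new Set([...Object.keys(refRow), ...Object.keys(cmpRow)]);
    const changed = [...columns].some((c) => refRow[c] !== cmpRow[c]);
    result.push({ flag: changed ? "changed" : "identical", row: cmpRow });
  }
  for (const [key, refRow] of refByKey) {
    if (!cmpByKey.has(key)) result.push({ flag: "deleted", row: refRow });
  }
  return result;
}

const before = [
  { id: 1, name: "A", price: 10 },
  { id: 2, name: "B", price: 20 },
];
const after = [
  { id: 1, name: "A", price: 10 },
  { id: 2, name: "B", price: 25 },
  { id: 3, name: "C", price: 30 },
];
console.log(diffTables(before, after, ["id"]).map((r) => `${r.row.id}:${r.flag}`));
```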
As we know, Firebase won't let you order by multiple children. I'm looking for a way to filter my data so that in the end I can limit it to a single result. If I only wanted the lowest price, it would be something like this:
ref.orderByChild("price").limitToFirst(1).on...
The problem is that I also need to filter by date (timestamp), and for that alone I would do:
.orderByChild("timestamp").startAt(startValue).endAt(endValue).on...
For now, that is my query; I then iterate over all results to find the one row with the lowest price. My data is fairly large (around 100,000 rows), and I can restructure it however I want.
The first query returns the lowest price across all timestamps, so the returned row may not fall within my date range. However, it takes ONLY 2 seconds, compared to 20 seconds for the second query (including my code that finds the lowest price).
So, what is the best way to do this? I know I could create another index combining the timestamp and the price, but they are different kinds of values, which seems to make that impossible.
Full data structure:
country
  store
    item
      price
      timestamp
To make it even clearer: I have two nested loops that run over all countries and then over all stores, so the real query is something like this:
ref.child(country[i]).child(store[j]).orderByChild("timestamp").startAt(startValue).endAt(endValue).on...
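If the data for all countries and stores is already loaded as a plain object (for example, what snapshot.val() returns), the client-side step described above, keeping rows inside the timestamp range and tracking the lowest price, can be done in one pass instead of one query per store. This is a sketch under that assumption; the function name and sample data are hypothetical, and it does not reduce how much data Firebase has to send.

```javascript
// Walk country -> store -> item once; keep the cheapest item in [startValue, endValue].
function lowestPriceInRange(data, startValue, endValue) {
  let best = null;
  for (const [country, stores] of Object.entries(data)) {
    for (const [store, items] of Object.entries(stores)) {
      for (const [item, { price, timestamp }] of Object.entries(items)) {
        if (timestamp < startValue || timestamp > endValue) continue; // outside the date range
        if (best === null || price < best.price) {
          best = { country, store, item, price, timestamp };
        }
      }
    }
  }
  return best; // null when nothing falls inside the range
}

const sample = {
  US: {
    store1: {
      itemA: { price: 5, timestamp: 100 },
      itemB: { price: 3, timestamp: 900 },
    },
  },
  DE: {
    store2: { itemC: { price: 4, timestamp: 150 } },
  },
};
console.log(lowestPriceInRange(sample, 50, 200)); // itemC: cheapest inside the range
```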
Thanks!
I have an AngularJS tooltip that works as a character counter, and I'm facing two problems.
1- Because of special characters in the string, the text is not saved to the DB when I copy and paste the text below into the textarea.
2- The textarea should accept only 4000 characters; anything beyond that should be cut off.
I have tried the code below, but if I add more than 4000 characters, the text goes beyond 4000, the characters-left count shows negative numbers, and saving also fails.
How can I resolve these problems?
Ctrl.js
$scope.$watch('processDTO.processStatementText', function () {
    if (!$scope.processDTO.processStatementText) {
        $scope.processStatementTextTooltip = '4000';
    }
    else {
        $scope.processStatementTextTooltip = 4000 - $scope.processDTO.processStatementText.replace(/[&<>"'\/]/g,'\r\n').length;
    }
});
main.html
<textarea rows="2" class="form-control"
ng-model="processDTO.processStatementText"
name="processStatement" id="processStatement"
placeholder="Process Statement" maxlength="4000" required
tooltip="{{processStatementTextTooltip}} characters left"
tooltip-trigger="{{{true: 'focus', false: 'never'}[processDTO.processStatementText.length >= 0 || processDTO.processStatementText.length == null ]}}"
tooltip-placement="top" tooltip-class="bluefill">
</textarea>
Text.txt
2. Upload Template Header – Risk Causal and Impact comments are color coded as mandatory but they are optional. Similary the Originating Source System Process/Risk/control ID
3. In the upload template, for any of the multi value fields, I use an invalid delimiter “:”, Eg: 13:7 , for some reason this changes the cell format to time format and thereafter I am not able to give any single values, its always converted to time format. Not sure if this is a training issue, but want to put it on the table to see if it requires any fix at all
4. In Upload, for the grid field length validations, the filter doesn’t works on Row Number and Max Allowed columns.
5. ERH All levels added in the View End to End ERH screen – Sort and Filter doesn’t works, After clicking on this field, none of the other filter/sort works on the page. Also the sort indicator(black triangle) is not visible, I believe the column width needs to be adjusted.
6. The Risk/Control reference id is seen on the Process Search grids, but when I search by Risk/Control, I don’t see the corresponding control/risk grid having the ref id column however tool tip has it
7. The label change “Originating Source System Process/Risk/Control ID” is not done in View/Search Inventory tool tips of Process, Risk, Control grids, It has to be fixed for all the 3 searches “By Process/Risk/Control”
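A possible direction, sketched under assumptions: count the real string length (the replace(/[&<>"'\/]/g,'\r\n') call turns each matched character into two, so the count drifts) and never let the counter go below zero. If the save fails because the database column is limited to 4000 bytes rather than 4000 characters (an assumption; the question doesn't say which database is used), characters like ’ and “ in the sample text take three bytes each in UTF-8, so the text may also need cutting by bytes. All function names are hypothetical.

```javascript
const MAX_CHARS = 4000;

// Characters left, never below zero; uses the plain string length.
function charactersLeft(text, limit = MAX_CHARS) {
  return Math.max(0, limit - (text || "").length);
}

// Number of bytes the text needs when encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// ASSUMPTION: the DB column limits bytes. Cut the text to at most maxBytes
// UTF-8 bytes without splitting a character.
function truncateToBytes(text, maxBytes) {
  let bytes = 0;
  let out = "";
  for (const ch of text) { // for...of walks code points, keeping surrogate pairs intact
    const size = utf8ByteLength(ch);
    if (bytes + size > maxBytes) break;
    bytes += size;
    out += ch;
  }
  return out;
}

// In the controller's $watch one could then write, for example:
//   $scope.processStatementTextTooltip =
//     charactersLeft($scope.processDTO.processStatementText);
const sample = "Header – Risk";
console.log(sample.length, utf8ByteLength(sample)); // 13 15
```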
I have an html table with cells that can be edited when clicked on. I am trying to figure out the best method to change cell data in cells following an edited cell.
For example, say the table is populated with random numbers or letters. When I change a cell to "14", I want the cells after it to change automatically to 15, 16, 17, ..., n+1, etc. Or, if I enter "h", the following cells would change to i, j, k, l, ..., z, stopping at z.
The number one seems pretty easy as I could just create a loop and i++ for each cell. However, the letter one doesn't seem as simple. Would I need to create an alphabet array and find the edited cell letter within it then proceed to the end of the array inserting each into the follow cells?
This can actually be done with a fairly simple function call like this one:
function NextChar(c){
    return String.fromCharCode(c.charCodeAt(0) + 1);
}
where c is the alphabetic character that is entered into the cell, passed as a parameter.
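Building on NextChar, here is a sketch of computing the values for the cells that follow an edited cell: numbers count up by one, and a single lowercase letter advances until it reaches "z". The function name followingValues and its count parameter (how many cells follow the edited one) are hypothetical, and writing the values into the table is left out.

```javascript
function NextChar(c) {
  return String.fromCharCode(c.charCodeAt(0) + 1);
}

// Values for the `count` cells after the edited one.
function followingValues(start, count) {
  const values = [];
  if (/^-?\d+$/.test(start)) {
    let n = parseInt(start, 10);
    for (let i = 0; i < count; i++) values.push(String(++n));
  } else if (/^[a-z]$/.test(start)) {
    let c = start;
    while (values.length < count && c < "z") { // stop after "z"
      c = NextChar(c);
      values.push(c);
    }
  }
  return values; // cells beyond the returned values are left unchanged
}

console.log(followingValues("14", 3)); // [ '15', '16', '17' ]
console.log(followingValues("x", 5)); // [ 'y', 'z' ]
```

Uppercase letters could be handled the same way with a /^[A-Z]$/ check and "Z" as the stop character.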
I can see this question was asked quite some time ago, so this answer is more for people who may come looking for answers later.
I would make arrays with the character sequences, as you said, and use the jQuery.inArray() API to detect which sequence the edited cell's content is in.
Check it out: http://api.jquery.com/jQuery.inArray/