I can currently scrape JavaScript data from a POST request I send using requests and then parse with BeautifulSoup. But I only want to scrape the product plu, sku, description and brand, and I am struggling to find a way to print just the data I need rather than the whole script. This is the text that is printed after I extract the script with BeautifulSoup. I will be scraping more than one product from multiple POST requests, so the chunk idea is not really suitable.
<script type="text/javascript">
var dataObject = {
platform: 'desktop',
pageType: 'basket',
orderID: '',
pageName: 'Basket',
orderTotal: '92.99',
orderCurrency: 'GBP',
currency: 'GBP',
custEmail: '',
custId: '',
items: [
{
plu: '282013',
sku: '653460',
category: 'Footwear',
description: 'Mayfly Lite Pinnacle Women's',
colour: '',
brand: 'Nike',
unitPrice: '90',
quantity: '1',
totalPrice: '90',
sale: 'false'
} ]
};
As you can see it is far too much information.
How about this:
1. Assign the captured text to a new multiline string variable called "chunk".
2. Make a list of the keys you are looking for.
3. Loop over each line, check whether the line starts with one of those keys, and if so print it:
chunk = '''
<script type="text/javascript">
var dataObject = {
.........blah blah.......
plu: '282013',
sku: '653460',
category: 'Footwear',
description: 'Mayfly Lite Pinnacle Women's',
colour: '',
brand: 'Nike',
..... blah .......
};'''
keys = ['plu', 'sku', 'description', 'brand']
for line in chunk.splitlines():
    # The text before the first colon is the key, e.g. "plu" in "plu: '282013',"
    if line.split(':')[0].strip() in keys:
        print(line.strip())
Result:
plu: '282013',
sku: '653460',
description: 'Mayfly Lite Pinnacle Women's',
brand: 'Nike',
You could obviously clean up the result using similar applications of split, strip, replace, etc.
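Building on that, here is a minimal sketch of the cleanup step that collects the wanted fields into one dict per product, so several items in one basket (or across several POST responses) come out as separate records. It assumes, as in the printed dataObject, that each field sits on its own line and that every item starts with a `plu:` line; `parse_items` is a hypothetical helper name. The data is a JavaScript object literal rather than JSON (unquoted keys, single quotes, an unescaped apostrophe in `Women's`), so `json.loads` cannot parse it directly, which is why this stays line-based.

```python
KEYS = ("plu", "sku", "description", "brand")


def parse_items(script_text, keys=KEYS):
    """Return a list of dicts, one per item, holding only the wanted keys.

    Assumption: each item's fields are one per line and each item begins
    with a `plu:` line, as in the printed dataObject.
    """
    items = []
    for line in script_text.splitlines():
        # Split on the first colon only, so values containing ':' survive.
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in keys:
            continue
        # Drop the trailing comma, then the surrounding single quotes.
        value = value.strip().rstrip(",").strip()
        if len(value) >= 2 and value[0] == value[-1] == "'":
            value = value[1:-1]
        if key == "plu":  # a new item begins
            items.append({})
        if items:
            items[-1][key] = value
    return items
```

For each response you could call `parse_items(script.string)` on the `<script>` tag you already located with BeautifulSoup and extend one running list with the results.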
I want to get only those elements of the notes array whose tags array contains a given tag.
For example, I want only the notes that have the lrn tag in their tags array, but I get either the whole document or only the first matching note.
My schema:
const mongoose = require('mongoose');
const noteSchema = new mongoose.Schema({
user : {
type: String,
required: true,
unique : true
},
notes : {
type : [{
title : {
type: String,
required: true
},
tags : {
type : [String],
required: true
},
note: {
type: String,
required: true
}
}],
required: true
}
});
const Notebook = mongoose.model('NOTEBOOK', noteSchema);
module.exports = Notebook;
My sample data:
{ _id: ObjectId("621c3a41fa2fe3c07f43edc9"),
user: '620c81434d8a65c1aa36e0d4',
notes:
[ { title: 'Eraning',
tags: [ 'ern', 'lrn', 'dik', 'pik', 'sik' ],
note: 'Prior to his return, Craig worked at NeXT, followed by Apple, and then spent a decade at Ariba, an internet e-commerce pioneer where he held several roles including chief technology officer.\nPrior to his return, Craig worked at NeXT, followed by Apple, and then spent a decade at Ariba, an internet e-commerce pioneer where he held several roles including chief technology officer.\n',
_id: ObjectId("62205a32bab7aed6315195e9") },
{ title: 'Lerningngs',
tags: [ 'lrn', 'eat', 'drink' ],
note: 'Prior to his return, Craig worked at NeXT, followed by Apple, and then spent a decade at Ariba, an internet e-commerce pioneer where he held several roles including chief technology officer.\n',
_id: ObjectId("6220583cbab7aed63151958b") },
{ title: 'Learning',
tags: [ 'learn', 'eran', 'buy', 'sell' ],
note: 'Prior to his return, Craig worked at NeXT, followed by Apple, and then spent a decade at Ariba, an internet e-commerce pioneer where he held several roles including chief technology officer.Solving DSA from Maths.in and SolveMaths.org using great techniques by INDIA and RUSSIA.\n',
_id: ObjectId("6220532dbab7aed631519531") },
{ title: 'Biology',
tags: [ 'tissue', 'cell', 'myto', 'energy', 'glycogen', 'hydrogen' ],
note: 'Prior to his return, Craig worked at NeXT, followed by Apple, and then spent a decade at Ariba, an internet e-commerce pioneer where he held several roles including chief technology officer.Solving DSA from Maths.in and SolveMaths.org using great techniques by INDIA and RUSSIA.\n\n',
_id: ObjectId("621f88d4c189c6b5501c3d5f") },
{ title: 'Chemistry',
tags: [ 'this', 'is', 'great', 'thing', 'to', 'do', 'by', 'time' ],
note: ' Prior to his return, Craig worked at NeXT, followed by Apple, \n and then spent a decade at Ariba, an internet e-commerce pioneer \n where he held several roles including chief technology officer.\n',
_id: ObjectId("621f85e9c189c6b5501c3d49") },
{ title: 'Physics',
tags:
[ 'newtonlawsofmotion',
'ktg',
'thermo',
'fluid',
'mechanice',
'bernaulii' ],
note: 'P\' + hrg + 1/2rv^2 = constant',
_id: ObjectId("621f8520c189c6b5501c3d3f") },
{ title: 'Maths',
tags: [ 'trignometry', 'llp', 'continuity', 'tags', 'circle' ],
note: 'Solving DSA from Maths.in and SolveMaths.org using great techniques by INDIA and RUSSIA.',
_id: ObjectId("621f84d7c189c6b5501c3d37") },
{ title: 'DSA from Net',
tags: [ 'dp', 'stack', 'queue', 'heaps', 'sorting' ],
note: 'Solving DSA from Leetcode.in and GeeksforGeeks.org',
_id: ObjectId("621c3a41fa2fe3c07f43edca") } ],
__v: 7 }
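You can use $filter in the projection to return only the notes whose tags array contains a given tag (aggregation expressions such as $filter in a find/findOne projection require MongoDB 4.4 or later; on older versions, use the same $filter inside an aggregate() $project stage):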
Notebook.findOne({
user: "620c81434d8a65c1aa36e0d4" // user ID
},
{
notes: {
$filter: {
input: "$notes",
as: "note",
cond: {
$in: [
"ern", // Tag to search
"$$note.tags"
]
}
}
}
})
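To check what that projection should return for the sample document, here is a small plain-Python model of the same logic. It is an illustration only, not a MongoDB or Mongoose call, and `filter_notes_by_tag` is a hypothetical name: `$filter` keeps every element of `notes` for which `cond` is true, and `$in` tests whether the tag is an element of that note's `tags` array, so all matching notes are returned rather than just the first.

```python
def filter_notes_by_tag(notebook, tag):
    """Model of {$filter: {input: "$notes", as: "note", cond: {$in: [tag, "$$note.tags"]}}}.

    Keeps every note whose tags array contains `tag` as a whole element
    (exact match, so 'ern' does not match 'eran').
    """
    return [note for note in notebook["notes"] if tag in note["tags"]]
```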
I have been having a hard time sorting out a problem in Tag Manager. I would be very grateful for your help; I am totally lost.
I have to implement a sales tracking tag for a price comparison website we want to be listed on.
I found the tracking tag template provided by the company, but I cannot get the same result. This is the tag from their website that I have been asked to implement:
var _kkstrack = {
merchantInfo : [{ country:"COUNTRY_CODE", merchantId:"COMID_VALUE" }],
orderValue: 'ORDER_VALUE',
orderId: 'ORDER_ID',
basket: [{ productname: 'PRODUCT1_NAME',
productid: 'PRODUCT1_ID',
quantity: 'PRODUCT1_QUANTITY',
price: 'PRODUCT1_PRICE'
},
{ productname: 'PRODUCT2_NAME',
productid: 'PRODUCT2_ID',
quantity: 'PRODUCT2_QUANTITY',
price: 'PRODUCT2_PRICE'
}
]
};
(function() {
var s = document.createElement('script');
s.type = 'text/javascript';
s.async = true;
s.src = 'https://s.kk-resources.com/ks.js';
var x = document.getElementsByTagName('script')[0];
x.parentNode.insertBefore(s, x);
})();
I implemented this tag and created my variables based on my dataLayer, for example "ecommerce.purchase.products.0.id" for productid.
However, it did not work: I got duplicates. For an order with 2 different products, both items ended up with the same title and ID; the 2nd item had the same values as the first.
So I tried a second solution:
For each variable I used a Custom JavaScript variable ("Kelkoo - purchase" corresponds to the dataLayer variable "ecommerce.purchase"). This is what I wrote:
function() {
var products = {{Kelkoo - purchase}};
return products.map(function(prod) { return prod.price; });
}
Unfortunately, another problem arose: all the products' data ended up next to each other, separated by commas (productid: 34756, 8546), like below:
basket: [{ productname: 'PRODUCT1_NAME', 'PRODUCT2_NAME',
productid: 'PRODUCT1_ID', 'PRODUCT2_ID',
quantity: 'PRODUCT1_QUANTITY', 'PRODUCT2_QUANTITY',
price: 'PRODUCT1_PRICE','PRODUCT2_PRICE'
}
]
};
I would like to return the data in this format:
basket: [{ productname: 'PRODUCT1_NAME',
productid: 'PRODUCT1_ID',
quantity: 'PRODUCT1_QUANTITY',
price: 'PRODUCT1_PRICE'
},
{ productname: 'PRODUCT2_NAME',
productid: 'PRODUCT2_ID',
quantity: 'PRODUCT2_QUANTITY',
price: 'PRODUCT2_PRICE'
}
]
How can I do this? Do I have to build an array?
I thank you for your attention on this matter.
Thanks
As far as I understand, you already have a dataLayer with product data that matches the content and data structure required by Enhanced Ecommerce for Google Analytics. You would like to transform the product data into the format used in the basket key of the _kkstrack variable.
According to the Enhanced Ecommerce developer guide, purchase data has the following format:
dataLayer.push({
'ecommerce': {
'purchase': {
'actionField': {
'id': 'T12345', // Transaction ID. Required for purchases and refunds.
'affiliation': 'Online Store',
'revenue': '35.43', // Total transaction value (incl. tax and shipping)
'tax':'4.90',
'shipping': '5.99',
'coupon': 'SUMMER_SALE'
},
'products': [{ // List of productFieldObjects.
'name': 'Triblend Android T-Shirt', // Name or ID is required.
'id': '12345',
'price': '15.25',
'brand': 'Google',
'category': 'Apparel',
'variant': 'Gray',
'quantity': 1,
'coupon': '' // Optional fields may be omitted or set to empty string.
},
{
'name': 'Donut Friday Scented T-Shirt',
'id': '67890',
'price': '33.75',
'brand': 'Google',
'category': 'Apparel',
'variant': 'Black',
'quantity': 1
}]
}
}
});
You have two options for referencing this data: point a GTM dataLayer variable at the whole ecommerce object (and reference its keys later in your code), or point it directly at the products array. I'm using the latter, so the dataLayer variable name will be ecommerce.purchase.products.
If you name that GTM variable {{DLV - Ecommerce purchase products}}, you can create a Custom JavaScript variable that performs the necessary transformation:
function() {
return {{DLV - Ecommerce purchase products}}.map(function(e) {
return {
productname: e.name,
productid: e.id,
quantity: e.quantity,
price: e.price
};
});
}
The function loops through all elements of the products array and returns the existing data under the keys this other tool expects. Because map returns one new object per product, you get an array of objects rather than one object holding per-field lists. You can now use this Custom JavaScript variable in your Custom HTML tag to set the value of the basket key.
Of course, you can use your own variable naming convention, as long as you reference it correctly in your Custom JavaScript variable.
I hope I understood your question correctly and that this helps you solve it.
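To see why this fixes the duplication, here is the same transformation modelled in Python (illustration only; GTM runs the JavaScript version, and `to_basket` is a hypothetical name). Mapping over the products once and building a whole object per product yields a list with one entry per product, which is the shape `basket` expects; mapping once per field, as in the question, yields separate per-field lists such as `['15.25', '33.75']` instead.

```python
def to_basket(products):
    """Rename Enhanced Ecommerce productFieldObject keys to the basket keys.

    Returns one dict per product, mirroring the Custom JavaScript variable.
    """
    return [
        {
            "productname": p.get("name"),
            "productid": p.get("id"),
            "quantity": p.get("quantity"),
            "price": p.get("price"),
        }
        for p in products
    ]
```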
I'm using the Moltin JavaScript SDK and can retrieve products and add them to my cart. However, when I try to check out the cart and process a payment, I get a 406 Not Acceptable response.
I have stripped out all my two-way binding and used the code snippet from Moltin's site, but still no luck.
Link to the entire project: https://github.com/humbm0/ecommerce-site
Thanks in advance!
angular.module('ecommerceSite2App')
.controller('CheckoutCtrl', function ($scope, checkout, moltin) {
$scope.items = checkout.cart.contents;
$scope.createOrder = function() {
var order = moltin.Cart.Complete({
gateway: 'dummy',
customer: {
first_name: 'Jon',
last_name: 'Doe',
email: 'jon.doe#gmail.com'
},
bill_to: {
first_name: 'Jon',
last_name: 'Doe',
address_1: '123 Sunny Street',
address_2: 'Sunnycreek',
city: 'Sunnyvale',
county: 'California',
country: 'US',
postcode: 'CA94040',
phone: '6507123124'
},
ship_to: 'bill_to',
shipping: '1305214549095350548'
});
console.log(order);
var checkout = moltin.Checkout.Payment('purchase', order.id, {
data: {
number: '4242424242424242',
expiry_month: '02',
expiry_year: '2017',
cvv: '123'
}
});
};
});
Have you checked that the dummy gateway is enabled in the Forge dashboard? https://forge.moltin.com/gateway
Second, instead of the shipping method ID, could you try using the shipping method slug?
If you need more help, you can also request an invite to our community Slack channel, where you can talk to other users and the rest of the Moltin team: https://moltin.com/community
I have a complex JSON response that is iterated over using ng-repeat. Only a relatively small subset of the attributes in the result set is displayed on the screen, so filtering should be restricted to values the user can actually see; otherwise the filtering behavior would be confusing to the end user.
Since one of the attributes I want to filter on is a deeply nested array, I needed a custom filter: to the best of my knowledge, the built-in AngularJS filterFilter does not iterate over array elements.
I got this working some time ago in AngularJS v1.2.28, but it breaks after migrating to v1.4.3. I have not yet isolated the release in which it broke.
I have not found any helpful information in the migration guides that would indicate what has changed. All I know is that the actual/expected parameters to the filter receive different values in the latest major version of AngularJS, which leads to the failure.
ng-repeat filter expression:
<li ng-repeat="user in users | list_filter:{establishment: {id: filterText, names: [{name: filterText}], locations: [{streetAddress1: filterText, streetAddress2: filterText, city: filterText, stateProvince: filterText, postalCode: filterText}]}}">
Example data structure of a single element:
data = [{
id: 234567,
name: 'John Doe',
establishment: {
id: 067915959,
locations: [{
id: '134B030365F5204EE05400212856E994',
type: 'postal',
streetAddress1: 'P O BOX 900',
city: 'Grover',
stateProvince: 'CA',
postalCode: '902340900',
isoCountryCode: 'US',
region: 'MONROE'
}, {
id: '999B030365F4204EE05400212856E991',
type: 'postal',
streetAddress1: '2590 Atlantic Ave',
city: 'Fredricks',
stateProvince: 'VA',
postalCode: '45487',
isoCountryCode: 'US',
region: 'MONROE'
}],
names: [{
name: 'Grover Central School Dst',
type: 'PRIMARY'
}, {
name: 'Grover Central School Dst',
type: 'MARKETING'
}, {
name: 'Grover CENTRAL SCHOOL DISTRICT',
type: 'LEGAL'
}]
}
}];
Supporting Plunker Examples:
Plunker for version 1.2.28: http://plnkr.co/edit/KD1MmNMBEhO7X2v9yK4S?p=info
Plunker for version 1.4.3: http://plnkr.co/edit/OmPOOwRWCHuPutUtWOcC?p=info
Edit:
The issue appears to be directly related to the changes introduced in v1.3.6.
The issue appears to be that an implicit AND condition is now applied, whereas previously it was an implicit OR, which is what I want in this case. If you need the old behavior, you can import the old version of the filter as a separate filter.
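A minimal sketch of the implicit-OR behavior described above, written in Python for illustration (it is not AngularJS code, and `matches_any`, `build_pattern` and `list_filter` are hypothetical names). Assumed semantics: for an object pattern an item matches if any listed property matches; for an array pattern `[p]` it matches if any element of the array matches `p`; leaves are case-insensitive substring matches. Properties absent from the pattern, such as the top-level `name`, are never searched, which keeps filtering limited to what the user can see.

```python
def matches_any(value, pattern):
    """Return True if `value` matches `pattern` under implicit-OR semantics."""
    if isinstance(pattern, dict):
        # Object pattern: OR across the listed properties.
        if not isinstance(value, dict):
            return False
        return any(matches_any(value.get(key), sub) for key, sub in pattern.items())
    if isinstance(pattern, list):
        # Array pattern [p]: true if any element of the array matches p.
        if not isinstance(value, list) or not pattern:
            return False
        return any(matches_any(item, pattern[0]) for item in value)
    if value is None:
        # Missing property (e.g. no streetAddress2) never matches.
        return False
    # Leaf: case-insensitive substring match on the text form of the value.
    return str(pattern).lower() in str(value).lower()


def build_pattern(text):
    """Mirror of the ng-repeat filter expression, with `text` in every leaf."""
    location_fields = ("streetAddress1", "streetAddress2", "city",
                       "stateProvince", "postalCode")
    return {"establishment": {
        "id": text,
        "names": [{"name": text}],
        "locations": [{field: text for field in location_fields}],
    }}


def list_filter(items, text):
    """Keep the items that match; an empty search text keeps everything."""
    if not text:
        return list(items)
    pattern = build_pattern(text)
    return [item for item in items if matches_any(item, pattern)]
```

Under AND semantics a search for "grover" would fail, because fields such as postalCode do not contain it; with OR, any one visible field is enough.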
I'm building a web scraper in Node.js that uses request and cheerio to parse the DOM. While I am using Node, I believe this is more of a general JavaScript question.
tl;dr: creating ~60,000 to 100,000 objects uses up all my computer's RAM, and I get an out-of-memory error in Node.
Here's how the scraper works. It's loops within loops; I've never designed anything this complex before, so there may be much better ways to do this.
Loop 1: Creates 10 objects in an array called 'sitesArr'. Each object represents one website to scrape.
var sitesArr = [
{
name: 'store name',
baseURL: 'www.basedomain.com',
categoryFunct: '(function(){ // do stuff })();',
gender: 'mens',
currency: 'USD',
title_selector: 'h1',
description_selector: 'p.description'
},
// ... x10
]
Loop 2: Loops through 'sitesArr'. For each site it goes to the homepage via 'request' and gets a list of category links, usually 30-70 URLs. Appends these URLs to the current 'sitesArr' object to which they belong, in an array property whose name is 'categories'.
var sitesArr = [
{
name: 'store name',
baseURL: 'www.basedomain.com',
categoryFunct: '(function(){ // do stuff })();',
gender: 'mens',
currency: 'USD',
title_selector: 'h1',
description_selector: 'p.description',
categories: [
{
name: 'shoes',
url: 'www.basedomain.com/shoes'
},{
name: 'socks',
url: 'www.basedomain.com/socks'
} // x 50
]
},
// ... x10
]
Loop 3: Loops through each 'category'. For each URL it gets a list of product links and puts them in an array, usually ~300-1000 products per category.
var sitesArr = [
{
name: 'store name',
baseURL: 'www.basedomain.com',
categoryFunct: '(function(){ // do stuff })();',
gender: 'mens',
currency: 'USD',
title_selector: 'h1',
description_selector: 'p.description',
categories: [
{
name: 'shoes',
url: 'www.basedomain.com/shoes',
products: [
'www.basedomain.com/shoes/product1.html',
'www.basedomain.com/shoes/product2.html',
'www.basedomain.com/shoes/product3.html',
// x 300
]
},// x 50
]
},
// ... x10
]
Loop 4: Loops through each 'products' array, goes to each URL and creates an object for each product.
var product = {
infoLink: "www.basedomain.com/shoes/product1.html",
description: "This is a description for the object",
title: "Product 1",
Category: "Shoes",
imgs: ['http://foo.com/img.jpg','http://foo.com/img2.jpg','http://foo.com/img3.jpg'],
price: 60,
currency: 'USD'
}
Then I ship each product object off to a MongoDB function that does an upsert into my database.
THE ISSUE
This all worked fine until the process got large. I'm creating about 60,000 product objects every time this script runs, and after a while all of my computer's RAM is used up. What's more, about halfway through the process I get the following error in Node:
FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory
I strongly suspect this is a code design issue. Should I be "deleting" the objects once I'm done with them? What's the best way to tackle this?
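One common way to keep memory flat in a pipeline like this is to produce product objects lazily and upsert them in fixed-size batches, so only one batch is alive at a time instead of all ~60,000; objects become collectible as soon as nothing references them, so explicit deletion is rarely needed. The sketch below models that control flow in Python only; `fetch_product` and `upsert_batch` are hypothetical stand-ins for the request/cheerio scrape and the MongoDB upsert, and the analogous idea in Node would be to await each batch (bounding concurrent requests) rather than start every request at once.

```python
from itertools import islice


def iter_products(sites, fetch_product):
    """Yield one product dict at a time; nothing accumulates in a big list."""
    for site in sites:
        for category in site.get("categories", []):
            for url in category.get("products", []):
                yield fetch_product(site, category, url)


def process_in_batches(products, upsert_batch, batch_size=100):
    """Consume `products` in chunks of `batch_size`, passing each chunk to `upsert_batch`.

    Returns the total number of products processed.
    """
    it = iter(products)
    total = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return total
        upsert_batch(batch)  # once this returns, the batch is no longer referenced
        total += len(batch)
```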