April 8, 2020
Grofers web scraping using Python

Hello, hello. Okay. Hello friends, good morning, Ronnie again. As discussed, today I'm going to show you how you can do web scraping from the grofers.com website. It is one of the popular websites where customers buy groceries, household items, and other grocery-related products, mostly for domestic and commercial purposes. If I open the website over here, it looks like this. They have a large number of products in their inventory, so it should be interesting for us to build up our future dataset or data model for when we start some decision-making tasks with machine learning, deep learning, or other data analytics activities.
So what can we scrape? First, we are going to scrape the products and their prices; I think this will be interesting. Under each product there can be multiple variants. For example, this one has two variants, and this product has only one variant. So it will also be interesting to see how the variants are segregated within one product category. What we'll scrape is maybe the name and the brand. I'm not sure whether all products have a brand; some of them do, and for some there is no information about the brand. Then the discount, the actual price, the displayed price, and maybe the unit. Whatever is possible, we are going to scrape from this website. So let's see how it actually looks.
Let me minimize things a bit. By the way, I see students are already in the class. Okay, fine. If you have any question, just send me a chat over the live meeting. Okay?
Okay, perfect. I believe this is a kind of infinite scrolling, and behind it there will be some JavaScript which sends HTTP requests with headers and parameters, and this is what does the actual work. So it will not be like our last video, where Myntra.com had static pagination with page equals one, page equals two, and so on. This will be a bit different, and that makes it interesting. Yes, we will face some difficulties; we will hit more errors and exceptions along the way. But anyway, I am going to do this, so don't worry.
Let's first see if something is there or not. I believe it will not be there. Okay, it's there, but not like that; maybe it is coming from somewhere else. Then, going here, something will come here. You see, we got some requests. So this is the main one, the request by which the data is actually loaded. This is the request URL, these are the headers, and these are the query parameters. Let's check the response. See, everything came over here. I clicked the wrong place. Okay, so let's target this one: copy as cURL.
This is the curl command. You can write your own HTTP request using the requests library in Python, but I'm not going to write the whole code by hand. There is one very good website, I'll show you: if you just put the curl command there, it converts it into Python code, just to avoid all this typing. Everything is here, actually. If you see, the encoding and all of these headers are from Chrome. It just puts them inside headers equals, and the rest is a plain piece of code. See, this is normal, so we are going to use that. Don't worry. So let's just start. Fine. Okay, I will also import another module that we'll need for the later part. Okay, fine.
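What a curl-to-Python conversion produces can be sketched roughly as below. This is only a sketch under assumptions: the URL, the header values, and the parameter names (start, next) are placeholders, not the real Grofers endpoint, and it uses the standard library's urllib instead of the requests library from the video so that it runs without installing anything. It only builds the request; it does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: the real request URL is the one copied from the
# browser's Network tab, which is not reproduced here.
BASE_URL = "https://example.com/api/products"

# A browser sends many headers; as the video finds out, most are optional.
HEADERS = {
    "Accept": "application/json",
    "User-Agent": "Mozilla/5.0",
}


def build_request(start, size):
    """Build (but do not send) a GET request for one slice of products."""
    # 'start' and 'next' are assumed names for the paging parameters.
    params = {"start": start, "next": size}
    url = BASE_URL + "?" + urlencode(params)
    return Request(url, headers=HEADERS, method="GET")


req = build_request(start=0, size=48)
print(req.full_url)
print(req.get_method())
```

With the requests library, the same headers and parameters would be passed as the headers and params arguments of requests.get.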
Let's start. Maybe print the response; it should return 200. Let's run it first. It means the response is coming. Okay, strange. Some encoding needs to be done. Something strange, right? Okay, cool. I think some of the headers are not required at all, so let me check. This one is also not required. Yes, the data is still coming. I think this one is not required, and this one is not required either. Let us keep the other three; these will be important, but I believe only this one is required. We could comment out the others as well, but I'm not sure, so maybe we will discuss this later.
As of now, this is our whole data. But this is category 16, and it starts from product number 96 and goes until the next 48. The interesting thing is that we got our count, so the range should be 0 to 307. We need to run our request another time, just to make sure that with a single request we get all the products. We should get the whole data.
So what I will do is load the response text. Let's print this data first; I think this will be interesting. Okay, I haven't defined this thing, that is why it is throwing an error here. I will do it now. I think I should make it a dict. It should return the result count, so count equals this value. I think we should use this one. This is fine, this will do our job.
Now we can run this. Copy this; we need to run it one more time, just to make sure that our parameters are updated with the count. This is another response, response1, and data1 equals json.loads of response1.text. Let's print data1 once more. Let's run it.
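The two-step idea here (read the total count from the first response, then ask for everything in one request) can be sketched like this. The JSON layout and the key names result, count, start, and next are assumptions for illustration; the real response may be shaped differently.

```python
import json

# A stand-in for response.text from the first request (assumed layout).
first_response_text = '{"result": {"count": 307, "products": []}}'


def full_range_params(response_text):
    """Return query parameters that cover every product in one request."""
    data = json.loads(response_text)   # JSON text -> Python dict
    count = data["result"]["count"]    # total number of products
    # Start from 0, not from wherever the browser happened to be (e.g. 96).
    return {"start": 0, "next": count}


params = full_range_params(first_response_text)
print(params)
```

Whether a server accepts such a large page size in a single request depends on the site; if it does not, the same count can drive a loop over smaller slices.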
Let's see. Yes, everything is fine. Now, the products part actually starts from here. See, the products are from here, so data2 is data1 and then the result key. Then print, and it should start from here. Let's check that. Print data2. Sorry, friends. Oh, I think this is not required anymore. We need all the data. Fine.
So now let me write this to a file and then we'll open it. But first let's check one thing: what is the length of this thing? It should return 307 items, but it is returning 211; not sure why. Something strange, I'm not sure what this is. Okay, I made a mistake: my start count is 96 over here, that is why. It should be zero. Sorry. Fine, so now it is returning all the products.
Let's write this to a file, then we'll open that. Maybe name it records.txt. Then write data2 to it, and then close the file. If we do not make it a string, we will get an error, I'll show you. This is not a string, right? This is a dictionary object. So let's run it. No, sorry, I should have passed my data2. See: write() argument must be str, not dict.
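The error seen here is easy to reproduce: a text file's write() accepts only strings. A minimal sketch, with a made-up record and a temporary file standing in for records.txt. json.dumps is not used in the video; it is shown only as an alternative to str() that produces text which can be parsed back later.

```python
import json
import os
import tempfile

# Made-up record standing in for the scraped data.
data2 = {"name": "Sample Atta", "price": 135}

path = os.path.join(tempfile.mkdtemp(), "records.txt")

with open(path, "w") as f:
    try:
        f.write(data2)          # a dict is not a string
    except TypeError as err:
        error_message = str(err)
    f.write(str(data2))         # works: convert to text first
    f.write("\n")
    f.write(json.dumps(data2))  # alternative: valid JSON text

with open(path) as f:
    lines = f.read().splitlines()

print(error_message)
print(lines)
```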
So we need to convert this to a string, and now this will be fine. Let's run it first. I think the file should be here somewhere. Let me refresh, because I'm using Eclipse, so we have to refresh every time. Okay, it is created inside grofers_scraping, so I can go to my Eclipse workspace here and open this text file.
This is the file. Now you can see what I told you: some of the products have 10 kg, 5 kg, maybe another variant like 2 kg, and some of them do not have any variants. So the important part for us is inside this thing: the variant will be the key, and inside that is the whole value. You'll see the name of the product, the discount, the price; everything is listed here. There is also something called explicit discount, and the MRP is here as well. If you go on like this, you get the next one: see, 44.4, and here 21. Okay, this is for 5 kg. This is 34. And for this one the discount is 20%; that is right, because 135 is the price and the discount is 20%, so the MRP should be there somewhere, and the discount amount. Yes, everything is there.
So our main part is this one: inside the variant we have to get everything. Some of the variants have one element, some have two elements, and some have three. Depending on that we need to scrape. Let me close this. Till now everything is fine, so I think we should not write this file anymore, and we should skip this part as well; as of now it is not required.
So, let's create another variable. I'm creating too many variables, but don't worry. I'm not following any modular approach for this video. Maybe in upcoming videos we are going to start the OOP concepts, classes, methods, and other things, and then you can apply those in each and every program. But please remember, for now it is only functional programming. The logic will be the same; we are only going to make those programs more modular, so they are easier to understand, better with respect to memory optimization, and need less code, because we'll use the same methods again and again by instantiating objects of different classes. But if I start now with classes, objects, methods, and constructors, it will be difficult for some of the students to understand, because some of them are very new to computer programming. So I am just focusing on Python as a functional programming language. Maybe next month I am going to start the object-oriented concepts in Python, and after that you can use your own approach to convert this program from normal functional programming to a modular approach. But the logic will always be the same.
Okay, so it will be data2. How many elements are there? Our loop will be: for i in range, I think it will be counter, and data3 is data2 at index i. I believe this will print all the elements one by one. Let's run it and see if it is actually doing it or not. See, it has completed this job. Now data4 equals data3 and then variant_info. It will narrow everything down, so now it will start from here, as you see. Let's print this first. I think we'll get some error, of course.
I have completed this part. See, the variant_info wrapper is gone. Now the interesting thing: counter1 equals len of data4, and we're going to run another for loop inside that: for j in range of counter1. Please understand why: there are different numbers of elements inside the variant, so we need to identify how many elements there are, and based on that we'll loop. For example, if I just randomly select an index, it will hit an index out of range error. So we are just making sure that the loop runs the exact number of times, the same as the number of elements inside the variant. Next, data5 equals data4 at index j. Now it will start listing them. Let me just print this and see what is happening over here. Fine.
Now we need to identify the things we need. If I open this file... okay, I need to look from here, no other option. One thing is rating, that is fine. Unit is one thing, that is the kgs. There is something called sbc_discount and sbc_price listed here; we can also take this one. Then I am thinking... I think this is not required. So one thing is unit, one thing is sbc_price, and one thing is the store price. Three things we have already listed. Most of the others they are maybe using for some internal purpose. Let me see if I can find something more. Yes, I have seen something called offer. If price is there, maybe MRP will be there also. Yes, MRP is also there. So fine, perfect. What will be the product name? Let me check first; I think this one is Grofers Mother's Choice flour. I think line_1 will be the product name.
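The nested loop described above can be sketched as follows. The sample data is made up, and the key names (variant_info, line_1, unit, price, mrp) are assumptions based on what is read out in the video, not a confirmed schema.

```python
# Made-up sample: one product with two variants, one with a single variant.
products = [
    {"variant_info": [
        {"line_1": "Sample Atta", "unit": "10 kg", "price": 300, "mrp": 350},
        {"line_1": "Sample Atta", "unit": "5 kg", "price": 160, "mrp": 180},
    ]},
    {"variant_info": [
        {"line_1": "Sample Salt", "unit": "1 kg", "price": 18, "mrp": 20},
    ]},
]

rows = []
for i in range(len(products)):            # outer loop: one pass per product
    variants = products[i]["variant_info"]
    for j in range(len(variants)):        # inner loop: exactly len(variants) passes
        v = variants[j]
        rows.append([v["line_1"], v["unit"], v["mrp"], v["price"]])

for row in rows:
    print(row)
```

Looping with range(len(...)) mirrors the video; iterating directly (for v in variants) does the same job and cannot run past the end of the list.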
Fine, we got everything, so let me copy this. So product is the item name, like this. Then rating; we got the rating first. Then I think unit. Then price, then MRP, then the actual price. Okay, fine. Now what else? Offer, and something called sbc_offer; maybe it is a special discount coupon or something. Let's check the field: sbc_offer. Okay, fine, great. So it will be offer and then sbc_offer.
Now, I don't want to just print these. For example, data_line equals all of these put together in one line. But again, you need to convert them to strings. This one is not required. So: name, rating, unit, MRP, price, offer, and then sbc_offer. I think this is fine. I think we need to make sure that each record is printed on a new line, so it will be like this.
If everything is fine... it is showing an unexpected indent. Not sure. Okay, the indentation is bad. I'm getting late, so let's keep it like this: name, rating, unit, MRP, price, actual price, offer, and sbc_offer, each wrapped in str. I think this should run now, but I believe everything will come on the same line. Okay, let's check. Okay, I made a mistake. Name, rating, weight, price, actual price, offer; I think this is printing. MRP, actual price, sbc_offer. Okay, fine. So data_line is here, and we just need to create a new file with the header. Let me open this.
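Building one comma-separated line per variant, with the str() conversion and the trailing newline mentioned above, might look like this. The record is made up and the key names are assumptions, not the site's confirmed field names.

```python
# Made-up variant record; key names are assumptions.
variant = {"line_1": "Sample Atta", "unit": "5 kg", "rating": 4.2,
           "mrp": 180, "price": 160, "offer": "11% OFF", "sbc_offer": None}

fields = ["line_1", "unit", "rating", "mrp", "price", "offer", "sbc_offer"]

# str() is needed because numbers and None cannot be joined as text directly.
data_line = ",".join(str(variant[name]) for name in fields) + "\n"

print(repr(data_line))
```

Joining by hand breaks if a product name itself contains a comma; the standard csv module handles that quoting automatically.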
Now we need this here. Let me create one file in write mode, and inside that I will put all my headers: item name, comma, unit, comma, rating, comma, MRP, then actual price. I think we missed the offer part; it was after actual price. Or no, it's there. Finally offer and sbc_offer, and a backslash n, just to make sure that the printing starts from the next line. Then the file will be closed. Then we're going to open the same file, but this time in append mode. Maybe we can do it over here. I think that will be data_line. So every time this loop runs, it will write the line, close the file, and go on with the loop. I believe this is done. Maybe, just to make sure, print "completed" at the end. I'm also deleting the old file, so that it will be easier for us to see whether it is actually scraping all the records or not.
Let's run this first. I think it completed; let me refresh. Okay, I got two files; not sure what that one was. It was 40 bytes. Not sure why the data didn't come over here. Actually, it wrote only the last line. Okay, I made a mistake: it should be append mode. So this is also a new learning, right, as I told you.
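The write-mode versus append-mode mistake can be reproduced in a few lines. This is a sketch with made-up rows and a temporary folder; the column names are assumed spellings of the header dictated above.

```python
import os
import tempfile

header = "item_name,unit,rating,mrp,actual_price,offer,sbc_offer\n"
data_lines = [
    "Sample Atta,10 kg,4.2,350,300,14% OFF,\n",
    "Sample Atta,5 kg,4.2,180,160,11% OFF,\n",
]


def save(path, row_mode):
    """Write the header once, then each row using the given file mode."""
    with open(path, "w") as f:           # "w" creates or truncates the file
        f.write(header)
    for line in data_lines:
        with open(path, row_mode) as f:  # reopened on every loop pass
            f.write(line)
    with open(path) as f:
        return f.read().splitlines()


folder = tempfile.mkdtemp()
wrong = save(os.path.join(folder, "wrong.csv"), "w")  # each pass truncates
right = save(os.path.join(folder, "right.csv"), "a")  # each pass appends

print(len(wrong))
print(len(right))
```

With "w" inside the loop, every pass wipes the file, so only the last row survives and even the header is lost. With "a", the header and both rows are kept.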
I need to open it as "a", so it will append. The first time, we open the file in write mode; if the file is not there, it will create a new file. After that we open it in append mode. Now I believe this is an actual CSV. But it didn't write the header; the header is gone. Not sure why; wait, the header should be there. We first wrote the header, and then we opened the same file in append mode, so ideally it should be there, but it is not coming. I'm not sure what happened. Let me try to run this again, but this time we'll just open the file, write the header, and close it. Let's see. It is not writing the header over here. Okay, the main problem is that I have another one open here, and that is actually creating the problem. I forgot to close that one. So this is fine, this will work now. Sometimes this will be messy; you have to understand why. Ideally it should work. See, everything came over here.
You can open it in Excel. I have a blank Excel here. Next time I'll show you how to open it directly using NumPy, pandas, openpyxl, and some other Python libraries. So: delimited, next, and I believe the delimiter will be comma. Okay, everything came over here. There is some problem: everything came on the same line. I was doing the correct thing, but it was showing the wrong thing in the console, not sure why. It should work now. Let me open the file again. Now everything looks nice. Finish. See: item name, unit, rating, MRP, actual price, offer, and sbc_offer. All the items inside Grocery & Staples came.
Why is this not 307? Because the count we took should be 307. No, this is right: as I told you, some of the products have multiple variants. For example, if you see, this one has two, this one has two, and this is also two. See, this is also 2.
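Why the file has more rows than the product count can be checked with a small sketch. The CSV text below is made up; it uses the standard csv module to group rows by item name, which assumes that variants of one product share the same name.

```python
import csv
import io
from collections import Counter

# Made-up CSV content: 3 products, 6 rows, because each variant gets a row.
csv_text = """item_name,unit,mrp
Sample Atta,10 kg,350
Sample Atta,5 kg,180
Sample Salt,1 kg,20
Sample Tea,200 g,60
Sample Tea,500 g,140
Sample Tea,1 kg,270
"""

reader = csv.DictReader(io.StringIO(csv_text))
variants_per_product = Counter(row["item_name"] for row in reader)

product_count = len(variants_per_product)
row_count = sum(variants_per_product.values())

print(product_count, row_count)
print(dict(variants_per_product))
```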
This is also 2. It depends: one is 10 kg and one is 5 kg. And that is why this one has 3. This is 2, with 200 gram packs; sorry, 200 gram, not 200 kg. See, 200 gram, 500 gram, 1 kg, so 3 variants, right? So pretty cool, guys. In this way, you can do your scraping over different websites. But again, this is completely for educational purposes. I'm not going to use this data anywhere, and I'm also going to delete it straight afterwards. But I will definitely publish the video for learning purposes. Make sure you do not use this data or distribute it; it is the intellectual property of the company or website. Use it only for learning purposes. Don't misuse it, guys.
Thanks for watching my channel. Please do not forget to subscribe and share with your friends. If you like the videos, you can share them whenever you get the opportunity, through a Facebook, WhatsApp, or Instagram link, whatever. As I told you last time, we are going to start a new tutorial on machine learning with Python; the course contents are almost ready. We are trying to start by the end of April or the first week of May. Please do visit our new website. I'll post everything in upcoming videos. As of now we have just started the channel. We are looking for your support and for more subscribers. It will help us and motivate us to create some unique content. As you can see, we are only giving unique content which is not available on YouTube. We know that there are multiple very good channels, and I also learned from YouTube. Six or seven months ago I was in the same place, new to Python. But now I know something; not everything, you could say I know one percent of Python. So I'm just sharing my whole journey with you by sharing the videos.
So this is not like a tutorial. We will definitely do a tutorial, but this is a live workshop on how you can do it. That is why these are long videos, and I hope you will enjoy them. For the next video, we are planning to do BigBasket and some other websites. If you have anything similar you want us to do, please let us know on our Facebook page, by comment, or anything like that, and we'll do it. Also make sure that you subscribe to our channel with the bell icon, so every time we upload a video you get the notification immediately. As of now we need your support. We would like to grow bigger, and we would like you to be part of the bigger audience. If we grow, you will grow too. So please make sure to like, subscribe, and share our videos. This is only the fourth week, I think we just started on the 14th, and I think we're getting good responses from the audience.
So guys, one more time, thanks for watching this video. Hopefully I'll be uploading more videos whenever I get time. This week there was a vacation going on in Germany, so I was not going to the office for the last 3-4 days, and I published all those videos. I'll make sure that I post more videos in the coming weeks, or maybe tomorrow or the day after if I get some time. My friend is also going to share more videos with you by today or tomorrow; he helped me with this idea as well. Unfortunately, he is in some customer meetings right now, but I have asked him to post his videos soon. So guys, once more, thank you very much, and over and out. Good day. Bye.

