Homework 3 on Using Grep
As I indicated in class, wget will not get the page you need for this homework because the page is dynamically generated. Download the displayed page manually, unless you know how to find the correct URL using a traffic analyzer such as Charles or whichever tool you are familiar with.
- Download the index page manually and save it.
- Extract the lines that contain the five pieces of information: titles, user, upload_time, duration, views
- Save those lines to clips.txt
- Construct a csv table with five columns: titles, user, upload_time, duration, views
- Use while, read, expr, cut and paste commands, as discussed in class.
- First extract the titles and save them to titles.txt
- Do the same for the remaining four fields, saving them to four files: users.txt, upload.txt, duration.txt, views.txt
- Now you have five files, each containing one field: titles, user, upload_time, duration, views
- Use the paste command to construct a CSV table of five columns: titles, user, upload_time, duration, views
- Construct a csv table with five columns using grep with look-ahead '(?=)', look-behind '(?<=)', and '\K'. Create five separate txt files using five separate grep commands. Use the paste command to construct a CSV table of five columns.
- Construct a csv table with five columns using a sed one-liner with backreferences, as demonstrated in class.
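The three approaches above can be sketched roughly as follows. The markup of the real page is not reproduced in this handout, so the two sample lines and the tag and class names (title, user, upload_time, duration, views) are made-up assumptions; adjust every pattern to what your own clips.txt contains. The sketch also assumes GNU grep built with -P (PCRE) support and a sed that accepts -E.

```shell
#!/bin/sh
# Work in a scratch directory so real homework files are not overwritten.
mkdir -p demo && cd demo || exit 1
rm -f titles.txt users.txt upload.txt duration.txt views.txt

# HYPOTHETICAL sample data: tag and class names are assumptions for illustration.
cat > clips.txt <<'EOF'
<div class="clip"><a class="title" href="/watch?v=1">Intro to grep</a><span class="user">alice</span><span class="upload_time">2 days ago</span><span class="duration">3:15</span><span class="views">1204 views</span></div>
<div class="clip"><a class="title" href="/watch?v=2">Sed one-liners</a><span class="user">bob</span><span class="upload_time">1 week ago</span><span class="duration">10:02</span><span class="views">98 views</span></div>
EOF

# Approach 1: while/read loop with expr.
# "expr STRING : REGEX" prints the text captured by \( \) (basic regex, anchored at the start).
while IFS= read -r line; do
    expr "$line" : '.*class="title"[^>]*>\([^<]*\)<' >> titles.txt
    expr "$line" : '.*class="user">\([^<]*\)<' >> users.txt
    expr "$line" : '.*class="upload_time">\([^<]*\)<' >> upload.txt
    expr "$line" : '.*class="duration">\([^<]*\)<' >> duration.txt
    # cut keeps only the number from a value such as "1204 views"
    expr "$line" : '.*class="views">\([^<]*\)<' | cut -d' ' -f1 >> views.txt
done < clips.txt
paste -d, titles.txt users.txt upload.txt duration.txt views.txt > expr.csv

# Approach 2: five grep commands with -o (print only the match) and -P (PCRE).
# (?<=...) is look-behind, (?=...) is look-ahead, \K discards what was matched so far.
grep -oP 'class="title"[^>]*>\K[^<]+' clips.txt > titles.txt
grep -oP '(?<=class="user">)[^<]+(?=</span>)' clips.txt > users.txt
grep -oP '(?<=class="upload_time">)[^<]+' clips.txt > upload.txt
grep -oP '(?<=class="duration">)[0-9:]+(?=<)' clips.txt > duration.txt
grep -oP '(?<=class="views">)[0-9]+(?= views)' clips.txt > views.txt
paste -d, titles.txt users.txt upload.txt duration.txt views.txt > grep.csv

# Approach 3: one sed command; \1..\5 are backreferences to the five ( ) groups.
sed -E 's/.*class="title"[^>]*>([^<]*)<.*class="user">([^<]*)<.*class="upload_time">([^<]*)<.*class="duration">([^<]*)<.*class="views">([0-9]+)[^<]*<.*/\1,\2,\3,\4,\5/' clips.txt > sed.csv

# All three tables should be identical; show one of them.
cmp expr.csv grep.csv && cmp grep.csv sed.csv && cat sed.csv
```

Note that plain comma-joining only gives a valid CSV when no field contains a comma itself; titles or view counts such as "1,204" would need quoting or cleanup first.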
For those live videos or movies that do not have all five fields, you can let them go for now. Collect as much data as you can. We'll come back with a sophisticated, air-tight method that can handle all these aberrations, outliers, missing cases, etc.
Again, as I said in class, this is just a quick way of checking, by rapid prototyping, whether something can be done with minimal effort. Then and only then can you make a decision on whether to proceed further.
What to submit:
- execute the expr-based script, display the CSV table, and take a screenshot of the command execution and the table
- repeat the first step using the grep-based script
- repeat the first step using the sed one-liner
- prepare a tgz file containing the three scripts and the three screenshots
Do not include the html file.
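One possible way to build the archive is sketched below. The file names (expr_csv.sh, grep_csv.sh, sed_csv.sh and the three .png screenshots) are placeholders invented for this example, and the empty files created with touch only stand in for your real scripts and screenshots. Listing the files by pattern, rather than archiving the whole directory, keeps the html file out.

```shell
#!/bin/sh
# Scratch directory with PLACEHOLDER files; substitute your own scripts and screenshots.
mkdir -p hw3_demo && cd hw3_demo || exit 1
touch expr_csv.sh grep_csv.sh sed_csv.sh expr.png grep.png sed.png index.html

# -c create, -z gzip-compress, -f archive name; only *.sh and *.png are added,
# so index.html is left out.
tar -czf hw3.tgz *.sh *.png

# -t lists the archive contents so you can check what will be submitted.
tar -tzf hw3.tgz
```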