Homework 1

Due Date: 9/28/09
Points: 50

(1) a) Build a data table as follows:   [5 points]

- Go to http://apps.who.int/whosis/data/

- Click on select all countries.

- Select two indicators:

Deaths due to tuberculosis among HIV negative people
Incidence of tuberculosis

- Select time period:

2004, 2005, 2006

- Click on Create table

- Write down the meaning of all the columns.  You will NEED them later.

- Export the file by clicking.

- Open the exported file with WORD, NOT with XCEL.

- Delete the header lines (before Andorra) and the footer lines (after Viet Nam).

- Save the file as .txt file.

- Now create a database table using the IMPORT feature of AQUA.

Invent the column names based on what you wrote down above.
Name the table TUBER.

I explained many details of the import process in the lecture.

- Show (print out and hand in) the complete table. 


b) Build a table as follows:  [5 points]

- Go to http://www.census.gov/ipc/www/idb/

- Click on World Population Information

- Click on Country Rankings

- Select the year 2006

- Make sure you have the selection of All Countries.

- Click Submit

- Write down the column names.

- Highlight the data from the web page (only the data) with the mouse
and do a COPY.

- Open a WORD file.

- Paste all the data using PASTE SPECIAL from the edit menu.
Look at the file, make sure it looks right.

- Note:  I had big problems with the next steps.
I solved them by doing the following before import, in WORD.

In the Word file, replace every Blank+Tab by ;
This was explained in the lecture.

- Save the file as .txt

- Import all the data into AQUA as a table called POPULATION2006.
Set ; as separator.
Set string marker to NONE.
This was explained in the lecture.

- Show (hand in) the complete table. 

WE WILL REUSE THESE TABLES IN LATER ASSIGNMENTS.

[Questions c) to h) below count 1 point each.]

c) Write a SELECT statement that will show all the countries that
are in the table TUBER but that are missing in the table POPULATION2006.

MAKE SURE THAT YOU HAND IN ALL SQL CODE AND ALSO THE RESULT
AS A PRINTOUT.  HERE AND EVERYWHERE ELSE.

d) Write a SELECT statement that will show all the countries that
are in the table POPULATION2006 but that are missing in the table TUBER.

e) LOOK AT THESE TWO LISTS.

Are the United States really missing?
Did you know that South Korea is the Republic of Korea?
What other problems do you see?

Write down a description of at least one problem country other than the US and Korea.
What is the problem?

I mean, TYPE the answer into a file.  I do NOT mean literally, write it by hand.
The point is that you don't need to program this question.
Don't forget to print the file and hand it in.

f) Write as many SQL UPDATE statements as needed that will correct the country
names so that they are the same in both tables.  In general,
use the shorter name.  So, use Korea, South instead of Republic
of Korea.

MAKE SURE THAT YOU HAND IN ALL SQL CODE AND ALSO THE RESULT
AS A PRINTOUT.  HERE AND EVERYWHERE ELSE.

g) Rerun question c) now.  Which countries are REALLY missing?

h) Rerun question d) now.  Which countries are REALLY missing?

i) Now that you have prepared the data, study it one more time, because
now comes the first "real" question.  [4 points]

Write a SELECT statement which will return a table of the number of
people that died of tuberculosis in each country (that exists in both tables).
Display the country name and the number of people that died, and do
this in descending order, so the country with the most deaths comes first.

Use data for 2006 only.

This question will require some thinking and a little bit of math.
This should be done with ONE single SELECT.

Also you have to do this question with a JOIN.  No credit otherwise.


(2) Write SQL queries to find the answers for the following questions.
[1 point each]

a) What is the country with the smallest population, display name and population.

b) What is the number of countries in the POPULATION table?

c) What is the average population over all countries?

d) What is the total world population?

e) What percentage of the population lives in China?


(3) Write a PL/SQL program, using an explicit or implicit cursor
that prints to the screen the following result:

What percentage of the world population lives in the top 5 countries?

Make sure you use a loop.
[10 points]

(4) Write a PL/SQL program, using an explicit or implicit cursor
that prints to the screen the following result:

Which countries (starting counting at the top) TOGETHER contain 90% of the world
population?

If you need several cursors, that's ok.
[10 points]

(5) Write SQL queries to find the answers for the following questions.
[1 point each]

a) Which country had the highest percentage of deaths from tuberculosis in 2006?

b) Which country had the lowest percentage of cases of tuberculosis in 2005?

c) A country that has relatively few deaths (relative to the number of cases)
of tuberculosis could be considered a country with an effective healthcare system.
Using data from 2006, display a list of all countries, ordered so that
the countries with few deaths relative to the number of cases come first.
Do this by writing a SELECT statement.  Not a program.

d) Alternatively, a country that has few cases of tuberculosis to begin with 
could be considered a country with an effective healthcare system.
Display all data in TUBER, sorted by CASES2006.

e) Write down by hand, WITHOUT programming:
Which countries are in the top 10 of both question d) and question c).

Does the answer make sense?

I mean, TYPE the answer into a file.  I do NOT mean literally, write it by hand.
The point is that you don't need to program this question.
Don't forget to print the file and hand it in.

===============================================================

VOLUNTARY QUESTION:  No extra credit.

Solve (5) e) with at most 3 SELECT statements (they may be nested).

================================================================

THE FOLLOWING APPLIES TO ALL ASSIGNMENTS:

- Create a nice cover page with 
  your name, assignment number, class number and a descriptive
  title of the assignment.
  
  All parts of your assignment need to be stapled 
  together, or held together by a folder.  Points will be taken off
  for not complying with these rules.

  If you use a folder, write your name ALSO on the folder.  Or use
  a folder with a transparent cover page.

  No handwriting anywhere.
  No handwriting anywhere.
  No handwriting anywhere.

  Make things look nice and professional.  

- Documentation:
  Do a reasonable amount of internal documentation.
  Internal documentation typically means that you explain
  your code or SQL statements.
  No external documentation unless I specifically ask for it.

- Test runs:
  Show your table after you build it.  Show testruns.
  Show as much as you can.  The burden of proof is up to you.
  NO TESTRUNS, NO POINTS. 
  (Or very, very few.)  You may record the results in a file.
  Thus, you don't need to do screen dumps.
  You will not be punished for too many test runs.

- Hardcopy:
  All code, documentation, and testruns will be accepted as Hardcopy
  (Printout) ONLY.  No email, No fax etc.

- Grading:
  Grading takes into account correctness of the results, but also
  how well they are documented, and what their visual appearance is.
  Things should look nice and PROFESSIONAL.  Not sloppy.  Use a spelling
  corrector for your comments or (later on) write-ups.  Some people do this
  very nicely.  Don't cut corners here, or you will look bad in comparison.
  PROGRAMS MUST BE INDENTED.

- Cheating:
  The usual policies apply.  0 points for "suspiciously similar" results.
  Report to the Dean in case of repetition.

- Files:
  You must keep all your files of every assignment up to the end of
  the semester.  If there is any doubt about your assignment I will
  ask you to demonstrate it online.

- Late policy:
  I will take off 10 points for being late up to 1 week.  Programs
  that are handed in more than a week late will receive no credit
  at all, unless you have a good reason with WRITTEN documentation
  (usually medical).