Homework 1

Due Date: 9/21/2010
Points: 50


1) Create a table of big US cities and their population counts.

For this purpose go to:

http://www.infoplease.com/ipa/A0763098.html

HINTS:

Copy and paste the table from the Web page into an MS Word file.

Use paste special (not the normal paste).

Use the first menu option in paste special, plain text.

Delete the header in the file by hand.

Replace all tabs by ;  
Make sure that there is no blank before the ; or it might become part
of a varchar2 value, which we don't want.

Replace all , between cities and states by ;

Clean up weird things by hand, such as foot note marks
and delete Raleigh for which there is almost no information.

Delete cities from Hawaii and Alaska.

Replace all state abbreviations by the standard two letter
abbreviations.  So Ariz. becomes AZ, for example.
(The standard abbreviations are used in the file in question
2)).


Save the whole thing as .txt file:  citiespop.txt

Now use the AQUA import feature explained in class to
create a table CITIES_POPULATIONS from the .txt file.

Print out the final resulting table.

[10 points]




2) Create a table of US cities and their latitudes and longitudes.

For this go to 

http://www.realestate3d.com/gps/latlong.htm

paste starting at Alabama

Remove the three letter IDs at the beginning of every line
Use \[???\] and replace by nothing (empty string).
This will be discussed in class in more detail.

Replace ^p^p (two paragraph marks) by one ^p
twice to get rid off empty lines.
This will be discussed in class in more detail.

Now get rid off the state names between rows of data.
(If necessary by hand.)  They are not necessary as
every row has the state in it already.

Delete cities from Hawaii and Alaska.

Replace the comma before each state by a ;
Replace pairs of spaces by a ;
This will be discussed in class in more detail.

Save the whole thing as .txt file:  citieslat.txt

Load the text file into a table using the import feature of AQUA.
Call the table CITIES_LAT_LONG.

(NEW EXPLANATION 9/8/2010)
Some of the cities exist several times, with slightly different
latitudes and longitudes.  

Let's call such lines duplicate lines. (Really only the city name
is a duplicate.)

Compute one single latitude and longitude for such cities,
which is the average of all latitudes and longitudes of all the lines
with the same city name.  Round the average values to two digits after
the decimal point.

Hint 1: You need a SELECT with a "GROUP BY" to do this.
Hint 2: You will need to alias the average values you compute.
(END NEW EXPLANATION)

Create a new table LAT_LONG that has all the information of CITIES_LAT_LONG
but without the duplicate lines.  Do this with approporiate Oracle SQL
commands.  Show the commands.

Print out the FIRST FIVE pages of the final resulting table LAT_LONG.

[10 points]

----------------------------------------------------------------------

3) Write SQL Queries for all of the following problems.
Show a printout of both the query and the result.

LONG RESULTS:  Whenever the results are longer than 3 screens,
print the first two screens and print the last screen.

We define Chicago and all cities above it as Northern Cities (N).
We define Cleveland and all cities to its right Eastern Cities (E).
We define Memphis and all cities below it as Southern Cities (S).
We define Tucson and all cities to its left as Western Cities (W).

Look at your data!  Memphis is not called just Memphis.
You need to clean the data by updating it and removing the NAS!

Write SQL queries that will find the following information:


Q1:

a: All Western Cities in alphabetical order.
b: All Northern Cities in alphabetical order.
c: All Eastern Cities in alphabetical order.
d: All Southern Cities in alphabetical order.

NOTE:  You do not "write down" the location of
Tucson, Chicago, Memphis, Cleveland.
Rather you need to get them with a nested query
inside your query.


[4 points]


Q2:
All "middle" cities, that is cities that are neither 
Western, Northern, Eastern, or Southern.
In alphabetical order.

Tucson, Chicago, Memphis, Cleveland should NOT
be in the solution, they belong into the solutions
of Q1.

[2 points]



Q3: 
Find cities all that lie in the "corners," ie 
a: NW
b: NE
c: SW
d: SE
all in alphabetical order.

This problem HAS TO BE SOLVED as a PL/SQL program with a cursor.
Implicit or explicit cursor is fine.

[4 points]

NOTE: 

You do NOT have to do queries for
N middle, S middle, E middle, W middle.


Q4: 
Now find only those 
a: Western Cities 
b: Northern Cities
c: Eastern Cities 
d: Southern Cities
for which we know the population count in 2009.
If there are several such cities, display them in DESCENDING order
by population count.

[4 points]

Q5:
a: Find the latitude of the "Nothern City" that is lowest above Chicago.
b: Find the latitude of the "Southern City" that is hightest below Memphis.
c: Find the longitude of the "Western City" that is rightmost left of Tucson.
d: Find the longitude of the "Eastern City" that is leftmost right of Cleveland.

Only the number is required. But it has to be a unique number.

[4 points]


Q6:

Now, using the numbers found in Q5, find the names and states of those
cities.  You can plug the numbers from Q5 into your queries.

a: Find the name and state of the "Nothern City" that is lowest above Chicago.
b: Find the name and state of the "Southern City" that is hightest below Memphis.
c: Find the name and state of the "Western City" that is rightmost left of Tucson.
d: Find the name and state of the "Eastern City" that is leftmost right of Cleveland.

[4 points]

Q7:

a: Find the distance of latitudes (difference) and longitudes for the following pairs of 

New York - Chicago
Chicago - Houston
Houston - Los Angeles
San Diego - Los Angeles

You may use separate queries for each city.
But the "subtraction" has to be done within Oracle,
not by hand and not with a calculator.

[4 points]


Q8:  
a: Find all the states that have ONLY northern cities.
b: Find all the states that have ONLY southern cities.
c: Find all the states that have ONLY eastern cities.
d: Find all the states that have ONLY western cities.

Each of these can be done with a single SQL query.

[4 points]




================================================================

THE FOLLOWING APPLIES TO ALL ASSIGNMENTS:

- Create a nice cover page with 
  your name, assignment number, class number and a descriptive
  title of the assignment.
  
  All parts of your assignment need to be stapled 
  together, or held together by a folder.  Points will be taken off
  for not complying with these rules.

  If you use a folder, write your name ALSO on the folder.  Or use
  a folder with a transparent cover page.

  No handwriting anywhere.
  No handwriting anywhere.
  No handwriting anywhere.

  Make things look nice and professional.  

- Documentation:
  Do a reasonable amount of internal documentation.
  Internal documentation typically means that you explain
  your code or SQL statements.
  No external documentation unless I specifically ask for it.

- Test runs:
  Show your table after you build it.  Show testruns.
  Show as much as you can.  The burden of proof is up to you.
  NO TESTRUNS, NO POINTS. 
  (Or very, very few.)  You may record the results in a file.
  Thus, you don't need to do screen dumps.
  You will not be punished for too many test runs.

- Hardcopy:
  All code, documentation, and testruns will be accepted as Hardcopy
  (Printout) ONLY.  No email, No fax etc.

- Grading:
  Grading takes into account correctness of the results, but also
  how well they are documented, and what their visual appearance is.
  Things should look nice and PROFESSIONAL.  Not sloppy.  Use a spelling
  corrector for your comments or (later on) write-ups.  Some people do this
  very nicely.  Don't cut corners here, or you will look bad in comparison.
  PROGRAMS MUST BE INDENTED.

- Cheating:
  The usual policies apply.  0 points for "suspiciously similar" results.
  Report to the Dean in case of repetition.

- Files:
  You must keep all your files of every assignment up to the end of
  the semester.  If there is any doubt about your assignment I will
  ask you to demonstrate it online.

- Late policy:
  I will take off 10 points for being late up to 1 week.  Programs
  that are handed in more than a week late will receive no credit
  at all, unless you have a good reason with WRITTEN documentation
  (usually medical).