covariance and correlation coefficient

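Covariance indicates the direction of the linear relationship between two random variables. However, covariance has no upper or lower bound, so its magnitude depends on the units of measurement and is hard to interpret on its own. A related measure, the correlation coefficient, is bounded between -1 and 1 and indicates the strength of the relationship. After watching the following video: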

1. Summarize the difference between covariance and the correlation coefficient.

2. Explain which one of the two measures would be more useful in describing the relationship between two variables. Justify your answer by giving an example.
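
As a small illustration of the difference between the two measures, the sketch below computes the sample covariance and the Pearson correlation coefficient by hand. The height and weight values are made-up numbers chosen only for this example. Rescaling heights from meters to centimeters multiplies the covariance by 100 but leaves the correlation unchanged.

```python
from math import sqrt
from typing import Sequence


def covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data, for illustration only.
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]
heights_cm = [100 * h for h in heights_m]
weights_kg = [50.0, 58.0, 65.0, 72.0, 81.0]

print("cov (m):  ", covariance(heights_m, weights_kg))
print("cov (cm): ", covariance(heights_cm, weights_kg))
print("corr (m): ", correlation(heights_m, weights_kg))
print("corr (cm):", correlation(heights_cm, weights_kg))
```

The covariance changes with the unit of measurement, while the correlation stays the same and always falls between -1 and 1.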

Directions to student: The final paragraph (three or four sentences) of your initial post should summarize the one or two key points that you are making in your initial response. You will be writing three or more discussion posts per week. Your main post must be two to three substantive paragraphs (150-200 words total) and include at least two APA-formatted citations/references. Please follow up with two subsequent replies to colleagues. Each reply should consist of a relevant paragraph containing 100 words or more.

Submit your posting to the Discussion Area by Day 3 of Week 2.

Assignment 2: LASA—Application of Biostatistics Concepts in Case Studies

A local government official, who happens to be up for reelection, has hired you to analyze the data from the Behavioral Risk Factor Surveillance System (BRFSS) surveys and develop a public health intervention plan based on the information.

The BRFSS is an ongoing telephone health survey system. It has been tracking health conditions and risk behaviors in the United States yearly since 1984. Data are collected monthly in all fifty states, the District of Columbia, Puerto Rico, the US Virgin Islands, and Guam. For this assignment, you need to do the following:

Part 1: Analyze the Data         

  1. View this link: http://www.cdc.gov/brfss/state_info/coordinators.htm
  2. Select your criteria for any state statistical area, for any year from 2002 to 2011, and pick a category.
  3. Click a subcategory (if offered) to retrieve the necessary data.
  4. In a detailed report, analyze the data by completing the following:
    • Summarize what data was collected, the method of collection, and the basic findings from the data.
    • Evaluate the data’s validity, quality, and reliability.
    • Identify and discuss strengths and weaknesses within the data.
    • Based on the data, develop a hypothesis about a public health problem that could be tested with a study.
    • Recommend one other form of data that should be collected in order to enhance the study.
    • Develop a study design to best collect your recommended data.
    • Identify the resources you would need to collect the data, including the people you would need to work with.
    • Identify and justify an appropriate method to display the findings.

Part 2: Public Health Intervention Plan

Based on your analysis of the provided data, create a public health intervention plan by completing the following:

  • Describe your chosen population and the health issue you are addressing.
  • Provide 2–3 specific, actionable recommendations to help improve health for your specific population and issue.
  • To whom would you present the results to achieve the desired outcomes?

Write an 8–12-page paper in Word format. Apply APA standards to citation of sources. Use the following file naming convention: LastnameFirstInitial_M5_A2.doc

By Monday, October 2, 2017, deliver your assignment to the M5: Assignment 2 LASA Dropbox.

Grading Criteria

Assignment Components (each listed with its Proficient descriptor and Maximum Points)

Component: Summarize what data were collected, the method of collection, and the basic findings from the data.
Proficient: Summary of the data and its method of collection is clear, complete, and accurate. Specific data points are used to highlight the metropolitan/micropolitan area, year, and categories of the research. Basic findings are clear and accurate, applying course content and skills to accurately generalize the data provided.
Maximum Points: 32

Component: Evaluate the data’s validity, quality, and reliability.
Proficient: Evaluation of the data’s validity, quality, and reliability is clear and accurate. Specific data from the research and evidence from scholarly sources is used to justify answers.
Maximum Points: 28

Component: Identify and discuss strengths and weaknesses within the data.
Proficient: Strengths and weaknesses identified are accurate, obvious, and specific. They represent all key strengths and weaknesses important to consider in this research analysis. Explanation uses specifics from the scenario as examples and applies content and concepts from the course. Scholarly resources are utilized in support.
Maximum Points: 28

Component: Develop a hypothesis about a public health problem that could be tested with a study.
Proficient: Hypothesis developed is clear, specific, and appropriate to test through a public health study. It is likely that testing the hypothesis will generate new information related to the chosen public health problem. Hypothesis is unique from the study researched.
Maximum Points: 24

Component: Recommend one other form of data that should be collected in order to enhance the study.
Proficient: Recommendation is clear, specific, and appropriate to collect data. Scholarly research is used in support.
Maximum Points: 16

Component: Identify the resources you would need to collect the data, including the people you would need for assistance.
Proficient: Resources identified are appropriate and carefully selected to gather academic data. They represent all human and material resources needed to complete the study.
Maximum Points: 24

Component: Identify and justify an appropriate method to display the findings.
Proficient: Method identified to display the findings is appropriate, organized, and clear. A complete justification is included as to why the method was chosen.
Maximum Points: 20

Component: Describe your chosen population and the health issue you are addressing.
Proficient: Description of the chosen population is specific, organized, and complete. The health issue addressed is appropriate for the given population.
Maximum Points: 24

Component: Provide 2–3 specific, actionable recommendations to help improve health for your specific population and issue.
Proficient: Two to three recommendations are made. Recommendations are appropriate, clear and specific. A justification as to why the recommendations are being made is included. The recommendations can be implemented to improve a specific health issue for targeted population.
Maximum Points: 32

Writing Components

Component: Organization. Write with clear organization, including introduction, thesis/main idea, transitions, and conclusion.
Proficient: Introduction has a clear opening, provides background information, and states the topic. The assignment (e.g., report, presentation) is organized around an arguable stated thesis or main idea. Transitions are appropriate and help the flow of ideas. Conclusion summarizes main argument and has a clear ending. Writing generally provides a consistent coherency among ideas.
Maximum Points: 16

Component: Usage and Mechanics. Write using proper grammar, spelling, usage, and mechanics to provide smooth readability.
Proficient: Writing follows conventions of spelling and grammar throughout. Writing skills are competent. Good command of language. Bullet points and/or sentence structures are accurate. Capitalization, punctuation, and indentation reasonably well followed. Spelling errors are very few. All errors are infrequent and do not interfere with readability or comprehension.
Maximum Points: 20

Component: APA Elements. Include proper attribution, paraphrasing, and quotations of all sources.
Proficient: Using APA format, accurately paraphrased, quoted, and cited throughout the presentation when appropriate or called for. Only a few minor errors present.
Maximum Points: 20

Component: Audience and Communication. Write specifically to key audience, using terminology and tone appropriate for the audience.
Proficient: Writing is focused. Tone is adequately formal in keeping with the audience.
Maximum Points: 16

Total: 300

Grading Rubric

This assignment is worth 300 points and will be graded using a rubric.

Design and analysis of algorithms

Instructions

1. Solution may not be submitted by students in pairs.

2. You may submit a pdf of the homework, either printed or hand-written and scanned, as long as it is easily readable.

3. If your solution is illegible or not clearly written, it might not be graded.

4. Unless otherwise stated, you should prove the correctness of your answer. A correct answer without justification may be worth less.

5. If you have discussed any problems with other students, mention their names clearly on the homework. These discussions are not forbidden and are actually encouraged. However, you must write your whole solution yourself.

6. Unless otherwise specified, all questions have the same weight.

7. You may refer to data structures or their properties studied in class without having to repeat details, and may reference theorems we have studied without proof. If your answer requires only modifications to one of the algorithms, it is enough to mention the required modifications and their effect (if any) on the running time and on other operations that the algorithm performs.

8. In general, a complete solution should contain the following parts:

(a) A high-level description of the data structures (if needed). E.g., we use a balanced binary search tree. Each node contains a key and pointers to its children. We augment the tree so each node also contains a field…

(b) A clear description of the main ideas of the algorithm. You may include pseudocode in your solution, but this may not be necessary if your description is clear.

(c) Proof of correctness (e.g. show that your algorithm always terminates with the desired output).

(d) A claim about the running time, and a proof showing this claim.

1. You are given k sorted arrays of keys A_1, ..., A_k. Each key is a floating-point number. The total number of keys is n. Your goal is to sort all these keys in time O(n log k).

(a) Suggest a solution under the assumption that there are no constraints on memory availability or on memory access time, and that access to each memory cell takes constant time.

(b) Suggest a solution under the following assumptions: each write operation is much slower than a read operation (as is commonly the case for solid-state drives). However, you have access to fast memory containing O(k) words. (Hint: you might need to borrow some ideas from CSC345.)

To elaborate, assume that your computer’s memory contains two major components:

i. SlowMem: a slower memory. Reads from and writes to this component are slow. The input arrays are stored in SlowMem, and the final output should be written to SlowMem. Your algorithm should keep the number of reads from and writes to SlowMem to a bare minimum.

ii. FastMem: a faster memory unit. It can hold only, say, 10k words.
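
One standard way to reach O(n log k) for question 1 is a k-way merge driven by a min-heap that holds one entry per array. The sketch below is an illustration of that idea and not necessarily the solution the course intends; the function name k_way_merge is made up for this example. The heap never holds more than k entries, and each key is appended to the output exactly once.

```python
import heapq
from typing import List, Sequence


def k_way_merge(arrays: Sequence[Sequence[float]]) -> List[float]:
    """Merge k sorted arrays into one sorted list in O(n log k) time."""
    # Each heap entry is (key, array index, position within that array).
    heap = [(arr[0], idx, 0) for idx, arr in enumerate(arrays) if len(arr) > 0]
    heapq.heapify(heap)  # O(k)

    merged: List[float] = []
    while heap:
        key, idx, pos = heapq.heappop(heap)  # O(log k)
        merged.append(key)  # the only write of this key to the output
        if pos + 1 < len(arrays[idx]):
            # Replace the popped key with the next key from the same array.
            heapq.heappush(heap, (arrays[idx][pos + 1], idx, pos + 1))
    return merged


print(k_way_merge([[1.0, 4.0], [2.5], [], [0.5, 3.0, 9.0]]))
```

Because the heap holds at most k entries, it is a natural candidate for the O(k)-word fast memory in part (b), while the input arrays and the output stay in slow memory.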

2. You insert n different keys into an initially empty SkipList L. The version of SkipList discussed in class assumes that if a key k participates in level i, then with probability p = 0.5 it is promoted to level i + 1. What is the probability that after all insertions take place, all keys only appear on level 2, and none appear on level 3?

3. When considering the probability of having a SkipList of height ≥ Z log_2 n, we discussed the case where only insertions of keys are performed.

Give a bound on the probability that the height is ≥ Z log_2 n if n is the current number of keys, but possibly a larger number of keys were present in the SkipList in the past and were later deleted.

4. (a) Let d ≥ 2 be a fixed positive integer. Consider a perfect SkipList constructed as follows: to create the ith level L_i of the SkipList, we scan the keys of level L_{i−1} and promote every dth key to L_i. For example, the perfect SkipList discussed in class uses the value d = 2. The case d = 3 means that every third key is promoted, and so on.

Express your answer as a function of n and d

i. What is the number of levels, as a function of n and d?

ii. What is the worst-case time for performing a find(x) operation? For delete(x)? For insert(x)?

iii. Assume n = 10^9. Compare the cases d = 2, d = 10, and d = 1000. When will the search time be optimal?
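
To get a feel for the trade-off in question 4(a), the sketch below counts the levels of a perfect SkipList in which level i keeps floor(n / d^i) keys. It then applies a rough cost model that charges up to d steps per level. Both the cost model and the reading of n as one billion (10^9) are assumptions made for this illustration, not results from the course material.

```python
def num_levels(n: int, d: int) -> int:
    """Count the non-empty levels when every d-th key is promoted."""
    levels = 1  # the base level holds all n keys
    size = n
    while size // d >= 1:
        size //= d  # the next level keeps every d-th key
        levels += 1
    return levels


def search_cost_bound(n: int, d: int) -> int:
    """Rough worst-case model: at most d steps on each level (assumption)."""
    return d * num_levels(n, d)


n = 10**9  # assumed value of n
for d in (2, 10, 1000):
    print(d, num_levels(n, d), search_cost_bound(n, d))
```

A larger d gives fewer levels but longer horizontal scans per level, which is the tension the question asks you to analyze.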

(b) Assume that we re-create a SkipList by inserting n keys, in the same order, but this time we are using the randomized insertion algorithm shown on the slides. However, a key that appears in level i is promoted to level i+1 with probability p = 0.1 (rather than p = 0.5, as discussed on the slides).

Will the expected time to perform a find(x) operation increase or decrease, compared to the expected time for the same operation in the original SkipList created with p = 0.5?

(c) Repeat the question, but now assume p = 0.9.

5. A vicious hacker just got access to the SkipList L you have built, which contains n keys. Show that the hacker could delete an expected number of n/2 keys from L, such that the operation search(x) would take Ω(n) in the worst case.

6. Suggest a modification of the SkipList structure such that, in addition to the operations insert(x), find(x) and delete(x), you can also answer the operation LesserThan(x), which returns the sum of all the keys in the SkipList that are strictly smaller than x.

The expected time for each operation should be O(log n).

Hint: Store at each node v of the SL another field, called size[v], containing the sum of the keys in the SL between v and the next node at the same level as v. Note that maintaining the values of these fields might require extra work while performing other operations.

7. In addition to the operations mentioned in the previous question, now also support range(x1, x2), which returns the sum of all keys that are ≥ x1 and ≤ x2. The expected time for every operation should be O(log n).

8. In addition to the operations in the last question, now also support avg(x1, x2), which returns the average value of all keys stored in the SL that are ≥ x1 and ≤ x2. The expected time for every operation should be O(log n).

9. (a) Explain possible drawbacks of using the following hashing method: m = 32, h(k) = (kA) mod m, where A is a large integer. All keys in U are integers whose last digit (in decimal representation) is 8.

(b) Does the drawback that you pointed out exist even if we use the multiplication method, as described in the slides (but with the same value of A)? If not, explain exactly why.

(c) Does the drawback that you pointed out exist even if we use the function h(k) = (kA) mod m, but A is a long floating-point number? If you have to use this method, which values of A would you prefer, and why?
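
A quick experiment can make the setting of question 9(a) concrete. The sketch below generates keys whose last decimal digit is 8 and records which of the m = 32 slots h(k) = (kA) mod m actually reaches. The two values of A are hypothetical choices for this illustration (one odd, one even), not values given in the assignment.

```python
from typing import Set


def slots_used(A: int, m: int = 32, num_keys: int = 1000) -> Set[int]:
    """Return the set of table slots hit by keys ending in the digit 8."""
    keys = [10 * i + 8 for i in range(num_keys)]  # 8, 18, 28, ...
    return {(k * A) % m for k in keys}


ODD_A = 2654435761  # hypothetical large odd multiplier
EVEN_A = 1024       # hypothetical even multiplier (a power of two)

print(sorted(slots_used(ODD_A)))
print(sorted(slots_used(EVEN_A)))
```

Every key ending in 8 is even, so kA is even as well, and with m a power of two the odd-numbered slots are never reached. An even A makes the spread narrower still.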

10. Assume a hash table T[0..20] (that is, m = 21) and open-addressing hashing where

h(x, i) = {x + i · (x mod 10)} mod m.

Assume you start with an empty table. Show an example of a set of 4 distinct keys {k1, k2, k3, k4} such that

(a) kj mod 10 > 0 for j = 1, 2, 3, 4. And in addition,

(b) You could not insert all of them into the table. That is, calling insert(k1), followed by insert(k2), followed by insert(k3) and insert(k4) would report that the last operation is unsuccessful.
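
To experiment with candidate key sets for question 10, the probe sequence can be simulated directly. The sketch below assumes that insert gives up after m probes, which is a common convention but is not stated in the question. The keys 3 and 24 are arbitrary examples that show how a collision is resolved.

```python
from typing import List, Optional

M = 21  # table size, T[0..20]


def probe(x: int, i: int, m: int = M) -> int:
    """h(x, i) = (x + i * (x mod 10)) mod m."""
    return (x + i * (x % 10)) % m


def insert(table: List[Optional[int]], x: int) -> Optional[int]:
    """Place x in the first free slot of its probe sequence.

    Returns the slot index, or None if all m probes hit occupied slots
    (assumed definition of an unsuccessful insertion).
    """
    m = len(table)
    for i in range(m):
        slot = probe(x, i, m)
        if table[slot] is None:
            table[slot] = x
            return slot
    return None


table: List[Optional[int]] = [None] * M
print(insert(table, 3))   # first probe lands on a free slot
print(insert(table, 24))  # first probe collides with 3, second probe succeeds
```

The step size x mod 10 determines how many distinct slots a key can ever reach, which is worth examining when searching for a set of keys that cannot all be inserted.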

11. You have stored a huge number of images, and you are running out of disk space. Explain how you would use hash functions to find out whether your computer contains two identical image files (possibly under different names). Give pseudocode for your solution. Specify which hash functions you use and how they are used. Do not use values provided by the file system.

Your algorithm should be as efficient (in space and running time) as possible.

Assume images are stored as raw data. That is, each image is given as a matrix of pixels, and for each pixel we are given the RGB values as numbers between 0 and 255.

If you prefer, think of these images as ASCII documents that you can read sequentially, one character after the other.

Assume that, besides the memory used for storing the files, you can use only 1 MB of fast-access memory. You have over 10,000 files, each over 2 GB long.

12. Repeat the previous question, but this time you could use values provided by the File System/Operating System (such as MD5). Could you use the MD5 value of the file as an index to a hash table?
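
For questions 11 and 12, the general pattern is to read each file sequentially in small chunks, keep only a fixed-size digest per file, and compare digests. The sketch below illustrates that pattern with SHA-256 from Python's hashlib as a stand-in hash; the choice of hash, the chunk size, and the function names are assumptions made for this example. In-memory streams are used so that the example runs without real image files. With real files, each stream would come from open(path, "rb").

```python
import hashlib
import io
from collections import defaultdict
from typing import BinaryIO, Dict, List

CHUNK_SIZE = 64 * 1024  # assumed read size, far below the 1 MB budget


def stream_digest(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Hash a stream sequentially without loading it fully into memory."""
    hasher = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
    return hasher.digest()  # 32 bytes per file


def duplicate_candidates(streams: Dict[str, BinaryIO]) -> List[List[str]]:
    """Group names whose contents produce the same digest."""
    by_digest: Dict[bytes, List[str]] = defaultdict(list)
    for name, stream in streams.items():
        by_digest[stream_digest(stream)].append(name)
    return [names for names in by_digest.values() if len(names) > 1]


demo = {
    "a.raw": io.BytesIO(b"\x00\x10\xff" * 1000),
    "b.raw": io.BytesIO(b"\x00\x10\xff" * 1000),
    "c.raw": io.BytesIO(b"\x01\x10\xff" * 1000),
}
print(duplicate_candidates(demo))
```

Equal digests only mark candidates. A byte-by-byte comparison of the candidate files is still needed to confirm that they are identical. The sketch also ignores Python's own per-object memory overhead, which matters under a strict 1 MB limit.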

Problem and Applications

Solve Problems and Applications, Chapter 13, Problem 7 [Klein Industries manufactures three types of portable air compressors] on page 451, at the end of Chapter 13 in your textbook (USA 5th edition). The data are given as part of the question. No external data set is provided. Use what is given in the problem.

13-7.  Klein Industries manufactures three types of portable air compressors: small, medium, and large, which have unit profits of $20.50, $34.00, and $42.00, respectively. The projected monthly sales are: 
           Small   Medium   Large
 Minimum  14,000    6,200   2,600
 Maximum  21,000   12,500   4,200

 The production process consists of three primary activities: bending and forming, welding, and painting. The amount of time in minutes needed to process each product in each department is shown below: 

                   Small   Medium   Large   Available Time
 Bending/forming    0.4     0.7      0.8       23,400
 Welding            0.6     1.0      1.2       23,400
 Painting           1.4     2.6      3.1       46,800

   
How many of each type of air compressor should the company produce to maximize profit? 
 a.  Formulate and solve a linear optimization model using the auxiliary variable cells method and write a short memo to the production manager explaining the sensitivity information. 
 b.  Solve the model without the auxiliary variables and explain the relationship between the reduced costs and the shadow prices found in part a. 
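
Before building the Solver model, it can help to write the problem data down in one place and check candidate production plans against it. The sketch below encodes the data from Problem 13-7 and tests a plan for feasibility. It does not solve the optimization problem, and the function and variable names are made up for this illustration.

```python
from typing import Dict

PRODUCTS = ("small", "medium", "large")

UNIT_PROFIT = {"small": 20.50, "medium": 34.00, "large": 42.00}
MIN_SALES = {"small": 14_000, "medium": 6_200, "large": 2_600}
MAX_SALES = {"small": 21_000, "medium": 12_500, "large": 4_200}

# Minutes needed per unit in each department.
MINUTES_PER_UNIT = {
    "bending_forming": {"small": 0.4, "medium": 0.7, "large": 0.8},
    "welding": {"small": 0.6, "medium": 1.0, "large": 1.2},
    "painting": {"small": 1.4, "medium": 2.6, "large": 3.1},
}
AVAILABLE_MINUTES = {"bending_forming": 23_400, "welding": 23_400, "painting": 46_800}


def minutes_used(plan: Dict[str, float]) -> Dict[str, float]:
    """Total minutes each department needs for the given plan."""
    return {
        dept: sum(rates[p] * plan[p] for p in PRODUCTS)
        for dept, rates in MINUTES_PER_UNIT.items()
    }


def is_feasible(plan: Dict[str, float]) -> bool:
    """Check the sales bounds and the department time limits."""
    within_sales = all(MIN_SALES[p] <= plan[p] <= MAX_SALES[p] for p in PRODUCTS)
    used = minutes_used(plan)
    within_time = all(used[d] <= AVAILABLE_MINUTES[d] for d in AVAILABLE_MINUTES)
    return within_sales and within_time


def total_profit(plan: Dict[str, float]) -> float:
    """Objective function: total profit of the plan."""
    return sum(UNIT_PROFIT[p] * plan[p] for p in PRODUCTS)


minimum_plan = dict(MIN_SALES)
print(is_feasible(minimum_plan), total_profit(minimum_plan))
```

The same three ingredients (objective, sales bounds, and department time limits) are what the Solver dialog box needs as the objective cell and the constraints.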


Solution Tip: You must use the Solver tool in Excel. Set your constraints correctly, and fill in the Solver dialog box properly. Be sure to review the links provided in the last lesson, or in one of the announcements in the first week of class. Here is a YouTube link you could view: http://www.youtube.com/watch?v=Oyc0k9kiD7o 
Additional slides explaining the basics, along with a basic example, are attached. The example and slides are for demonstration and illustration purposes only, to help you practice using the Solver tool.

Turn It In (TII):
TII is integrated in this course which means you will not need your own account for submitting papers.  Your faculty member will decide for each assignment if your paper needs to be submitted to TII.  Please submit each assignment only in the Assignments area.
Here’s supporting material

Slip Ring

                  Model 1   Model 2   Model 3
Number to
    Make              0         0         0
    Buy               0         0         0

Cost to                                          Total Cost
    Make            $50       $83      $130
    Buy             $61       $97      $145          $0

# Available           0         0         0
# Needed          3,000     2,000       900

Hours Required                                   Used   Available
    Wiring            2       1.5         3         0      10,000
    Harnessing        1         2         1         0       5,000

Content and Organization (columns: Points Available, Points Earned, Comments)
  Assignment details, requirements, and elements: The content is comprehensive, accurate, and/or persuasive. Results are interpreted and conclusions delivered. Excel work, if required, is an original sheet with all details of work and calculations included. Points available: 80
  Subtotal: points available 80, points earned 0
Readability and Style (columns: Points Available, Points Earned, Comments)
  Content is present, logical, and maintains flow throughout the answers. Points available: 10
  Work is identified and organized. Points available: 10
  Subtotal: points available 20
Late penalty: 0
Final Score: points available 100, points earned 0
Total: points available 100, points earned 0
 

 

a linear optimization

There are two parts to the assignment and you need to  complete both parts.
a.  Formulate and solve a linear optimization model using the auxiliary variable cells method and write a short memo to the production manager explaining the sensitivity information. 
 b.  Solve the model without the auxiliary variables and explain the relationship between the reduced costs and the shadow prices found in part a. 
Without auxiliary variables: ????
To solve the model without auxiliary variables, relax the conditions on the variables that you used in part (a) and solve the optimization problem again. For example, you may have a constraint a + b < 50 together with the condition 1 < a < 2. The first is the constraint. Drop the condition 1 < a < 2 and rerun the analysis for the second part.