Java Video Tutorial 1: Installing the Java Development Kit

Java Video Tutorial 8: Arrays

Java Video Tutorial 7: Switch Statements

Java Video Tutorial 6: Loops

Java Video Tutorial 5: Object Oriented Programming

Java Video Tutorial 4: If Statements

Java Video Tutorial 3: Variables and Arithmetic (Part 2)

Java Video Tutorial 3: Variables and Arithmetic (Part 1)

Java Video Tutorial 2: Hello World!

Java Tutorial #1 - Hello World

C Programming By OOP Web

Chapters:

Contents

2 Copyright Notice and Credits
3 Introduction
4 A Quick Overview of C
5 Using C with UNIX
6 Constant and Variable Types
7 Expressions and Operators
8 Control Statements
9 Functions in C
10 Input and Output
11 Handling Files in C
12 Structures in C
13 The C Preprocessor
14 Programs with Several Files
15 UNIX Library Functions
16 Precedence of C operators
17 Special Characters
18 Formatted Input and Output Function Types
19 Some Recommended Books
20 C Language Keywords
21 Usable SUN Systems
22 About this document ...

Tutorial:: Intro to Java

This video will teach you the basics of Java programming.

Graphics/Games Programming

TRICKS OF THE 3D GAME PROGRAMMING GURUS
ADVANCED 3D GRAPHICS AND RASTERIZATION
Andre LaMothe


Graphics Programming with Perl

C/C++ Programming BOOKS

ANSI C
PRENTICE HALL SOFTWARE SERIES
BRIAN W. KERNIGHAN
DENNIS M. RITCHIE

SAME BOOK AS ABOVE BUT .pdf FORMAT


C PROGRAMMING BY OOPWeb


C++ Programming through video
5 tutorials divided into 7 parts.

C++ By FunctionX

C# Programming

C# (pronounced "C Sharp") is a language used to create computer applications that tell the machine what to do and when. In the various lessons on this web site, we study the C# language by creating console applications, which are text-based programs that display their results in a black or gray window.

The lessons on this site use step-by-step instructions with a patient, detail-oriented approach, accentuated by various useful examples in every section. To make it easy to learn effectively, the lessons are organized by topic so you can jump to the particular part you are interested in.

If you are a beginner, we recommend that you follow the logical organization laid out below, from the top-left side (Fundamentals) to the right and down.

To follow the lessons on this web site, you should have installed Microsoft Visual C# 2005 (either the free Express Edition or the Professional edition), Borland C# Builder, SharpDevelop (which is free), or another environment or compiler that supports C#.

Fundamentals | Classes | Conditionals
:: Introduction to C#
:: Variables
:: Code Organization
:: Data Reading and Formatting
:: Introduction to Classes
:: The Methods of a Class
:: Classes Combinations
:: Partial Implementation
:: Introduction
:: Conditional Statements
:: Conditional Switches
:: Counting and Looping
Structures | Using Classes | Properties
:: Introduction
:: bool
:: int
:: float
:: Dates
:: Times
:: Inheritance
:: Abstraction
:: Interfaces
:: Delegates
:: Events
:: The Properties of a Class
:: Introduction to Indexers
:: Indexers and Classes
File Processing | Serialization | Built-In Classes
:: Introduction
:: Details on File Processing
:: Files Operations
:: Binary
:: SOAP
:: Details
:: Object
:: Random
Arrays | Collections | ADO
:: Introduction
:: Using Arrays
:: Arrays and Classes
:: Introduction
:: .NET Support
:: Intro to Built-In Classes
:: Generics
:: Introduction to ADO
:: Database Creation
:: Database Connection
:: The Tables of a Database
:: The Columns of a Table
:: The Records of a Table
Data Sets | ADO.NET | XML
:: Introduction
:: Data Tables
:: The Column of a Table
:: The Records of a Table
:: Records Management
:: Data Relationships
:: Console Data Display
:: Introduction to XML
:: XML Nodes
:: XML Elements Operations
:: The Attributes of an Element
:: Characteristics of XML Nodes
Topics | Built-In Collection Classes | Libraries
:: Strings
:: Recursion
:: Mathematics
:: The Main() Method
:: Exception Handling
:: Hashtable
:: Stack
:: Queue
:: Introduction
:: Using VB Functions
:: External Libraries (C++/CLI)

Sports Science

If you want to do a sports-related science fair project, you're in luck. We have projects related to soccer, baseball, football, tennis, hockey, and more. As the following project ideas illustrate, there are many interesting ways to apply science to sports. Who knows, by thinking scientifically about your favorite sport, your science fair project might even help you become a better athlete!

Most of the projects span multiple scientific categories. We've grouped the projects into the following categories:

  • Baseball
  • Basketball
  • Bicycling
  • Football
  • Golf
  • Soccer
  • Sports and Human Behavior
  • Tennis
  • Throwing, Kicking, Hitting, and Bouncing
  • Winter Sports (Skiing, Skating, and Hockey)

Keeping Up

Do you ever feel like you need to walk faster than your parents just to keep up with them? This is because of the difference in leg length between you and your parents. How much faster do you need to walk than your parents? Can you use a walking test to determine how tall a person is?

Jumping Distance

Mike Powell of the United States currently holds the world record for the long jump at 8.95 meters, which is almost 30 feet! How did he jump so far? In this experiment, learn how a long jumper uses momentum from running to jump farther than the competition.

Think Fast!

Are you a piano player or a video gamer? Then you might have a quick reaction time that can come in handy while playing sports. Find out how to measure your reaction time and compare it to your friends and family with this fun experiment.

The Brain-Body Connection: Can Exercise Really Make Our Brains Work Better?

"Use it or lose it!" Sure, we all know physical exercise is important to keeping our bodies fit. But how important is physical exercise to your brain? In other words, is there any connection between an active body and increased brain power? This is an easy project where you can test the effect of exercise on a critical brain function: memory.

Under Pressure: Ball Bouncing Dynamics

Many sports use a ball in some way or another. We throw them, dribble them, hit them, kick them, and they always bounce back! What makes a ball so bouncy? In this experiment you can investigate the effect of air pressure on ball bouncing.

Nothing But Net: The Science of Shooting Hoops

Swish! What a great sound when you hit the perfect shot and get nothing but net. Here's a project to get you thinking about how you can make that perfect shot more often.

Tee Time: How Does Tee Height Affect Driving Distance?

If you're an avid golfer, this might be a fun project for you. When you're setting up to tee off out on the course, how much attention do you pay to putting the tee in the ground? The height of the tee can affect both where in the swing the club makes contact and where on the clubface the ball makes contact. Are you placing your tees at the right height to get the most distance from your swing?

Golf Clubs, Loft Angle, and Distance

If your idea of a great weekend morning is taking some practice swings at a driving range, or heading out to the links to play a round, this could be a good project for you. This project is designed to answer the question: what is the relationship between club loft angle and the distance that the ball travels when struck?

Which Team Batting Statistic Predicts Run Production Best?

Here's a sports science project that shows you how to use correlation analysis to choose the best batting statistic for predicting run-scoring ability. You'll learn how to use a spreadsheet to measure correlations between two variables.

A Cure for Hooks and Slices? Asymmetric Dimple Patterns and Golf Ball Flight

Have you ever wondered why golf balls have a pattern of dimples on their surface? The dimples are important for determining how air flows around the ball when it is in flight. The dimple pattern, combined with the spin imparted to the ball when hit by the club, greatly influences the ball's flight path. For example, backspin generates lift, prolonging flight. When the ball is not hit squarely with the club, varying degrees of sidespin are imparted to the ball. A clockwise sidespin (viewed from the top) will cause the ball to veer right (or slice). A counterclockwise sidespin will cause the ball to veer left (or hook). This project attempts to answer the question, "Can an asymmetric dimple pattern decrease hooks and slices?"

Are More Expensive Golf Balls Worth It?

There is a bewildering selection of different golf balls to choose from for playing the game. Some are less expensive, some are more expensive, and all come with different claims for the advantages they will bring to your game. This project can help you determine which type of golf ball is right for you.

Are More Expensive Golf Balls Worth It?

Objective

The goal of this project is to test whether you can increase the distance and/or accuracy of your drives by switching to a different ball.

Introduction

To be a successful golfer, you need to combine distance and accuracy to get the ball from the tee to the cup with the fewest strokes possible. Drives on the fairway need to be long and straight. As you approach the green, you need to be a good judge of distance in order to select the right club to put the ball where you want. Once on the green, you need to be able to read its contours so that you can predict the ball's path in order to sink your putt.

There is a bewildering array of available golf balls. Some are two-piece balls with an outer covering over an inner rubber ball. Others are three-piece, with two internal layers made from different materials. The thickness of the cover layer can be varied. The dimple pattern, shape, and depth can be varied, affecting the aerodynamics of the ball. And of course, some balls are also more expensive than others.

For each of these changes, various claims are made by the manufacturers. Do some background research to find out about the characteristics of different types of golf balls. Which ball do you think will give you the longest shots, or the most accurate shots? Don't take anyone's word for it; find out for yourself with an experiment!

Terms, Concepts and Questions to Start Background Research

To do this project, you should do research that enables you to understand the following terms and concepts:

  • Types of golf balls:
    • Two-piece
    • Three-piece

More advanced students should also study:

  • Momentum
  • Elastic collisions
  • Inelastic collisions
  • Projectile motion

Questions

  • How does the initial launch angle affect the distance of a drive?
  • How does the initial launch speed affect the distance of a drive?
  • How does the initial spin affect the distance of a drive?
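
As a starting point for the first two questions, here is a minimal sketch of the textbook range formula for a projectile on level ground, R = v² sin(2θ) / g. It deliberately ignores drag, lift, and spin, all of which matter for a real golf ball, so treat the numbers as an idealized baseline and not as a prediction of real drives. The function name and the 60 m/s launch speed are assumptions chosen for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def ideal_range(speed_m_s: float, launch_angle_deg: float) -> float:
    """Range on level ground for a drag-free, spin-free projectile."""
    theta = math.radians(launch_angle_deg)
    return speed_m_s ** 2 * math.sin(2 * theta) / G


if __name__ == "__main__":
    speed = 60.0  # hypothetical launch speed in m/s, for illustration only
    for angle in (10, 20, 30, 45, 60):
        print(f"{angle:2d} degrees -> {ideal_range(speed, angle):6.1f} m")
```

In this idealized model the range grows with the square of the launch speed and peaks at a 45° launch angle. Comparing such a baseline with your measured drives is one way to see how much drag and spin change the picture.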

Bibliography

Materials and Equipment

To do this experiment you will need the following materials and equipment:

  • Golf club (driver)
  • At least 3 different types of golf ball to test
  • Golf tees
  • A large open space for hitting the ball
  • A means for measuring the distance of your drives:
    • For example, you could place meter (or yard) markers at regular intervals (e.g., 25 or 50 meters, measured with a long measuring tape, or a long pre-measured rope, or a pedometer) and use the markers for measuring shot distances.

Experimental Procedure

  1. Do your background research so that you are knowledgeable about the terms, concepts, and questions above.
  2. Select at least three different types of golf ball to test. Use a dozen of each type for your experiment.
  3. Set up at one end of the open area, in the center of its width.
  4. Use the same club for each shot, and do your best to use a consistent swing for all of the shots.
    1. Alternate between the three different ball types.
    2. For each shot, measure the distance in meters (or yards), and the accuracy (deviation, in degrees, from a straight-away shot).
    3. Because your swing is not likely to be the same each time, you will need to do a large number of trials for each type of ball (at least 20, more is better).
  5. You can pre-measure the area where you are taking your shots, and place markers at regular intervals. Use these to judge the distance of each shot.
  6. You can alternate which end of the open area you hit from to save walking.
  7. Since the wind can have an effect on the flight of the ball, you should note the wind speed and direction in your lab notebook (see NWS, 2007).
  8. Calculate the average flight distance for each type of ball, and the average amount of deviation from a straight line shot (i.e., hook or slice) for each type of ball.
  9. Calculate the standard deviation for the flight distance and the amount of hook or slice for each type of ball.
  10. Illustrate your results by making graphs that show the distribution of distances and deviations for each type of ball.
  11. More advanced students should also do a t-test (Kirkman, date unknown) to see if any differences in the flight characteristics of the different types of balls are statistically significant.
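
The averaging, standard deviation, and t-test steps above can be sketched in a few lines of Python. This is only one possible approach: it assumes Welch's version of the t statistic (which does not require equal variances), which may differ from the exact procedure in the cited reference. The distances are made-up numbers for illustration, and the function names are hypothetical.

```python
import math
import statistics


def summarize(distances):
    """Return (mean, sample standard deviation) for a list of shot distances."""
    return statistics.mean(distances), statistics.stdev(distances)


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances allowed)."""
    mean_a, sd_a = summarize(sample_a)
    mean_b, sd_b = summarize(sample_b)
    standard_error = math.sqrt(sd_a ** 2 / len(sample_a) + sd_b ** 2 / len(sample_b))
    return (mean_a - mean_b) / standard_error


if __name__ == "__main__":
    # Made-up distances in meters, for illustration only (not measured data).
    ball_a = [182.0, 175.5, 190.0, 178.0, 185.5]
    ball_b = [170.0, 168.5, 181.0, 172.0, 169.5]
    for name, shots in (("ball A", ball_a), ("ball B", ball_b)):
        mean, sd = summarize(shots)
        print(f"{name}: mean = {mean:.1f} m, standard deviation = {sd:.1f} m")
    print(f"Welch t statistic (A vs. B): {welch_t(ball_a, ball_b):.2f}")
```

The t statistic by itself is not a verdict. To judge significance you still need to compare it against a t distribution with the appropriate degrees of freedom, for example with a statistics table or a spreadsheet function.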

Variations

  • Ball launch monitor club fitting session. Three important variables that determine the flight of the ball are: the initial launch angle, the initial speed of the ball, and the spin of the ball. These parameters are all determined in the fraction of a millisecond that the club is in contact with the ball. How well and how fast you swing the club, and the angle of the club face are critical factors for these parameters. Many golf pro shops have "Ball Launch Monitor" technology (usually based on high-speed photography), that you can pay to use to analyze your swing. With this technology, you can get high-quality data on all three of the critical variables: launch angle, speed, and spin. Maybe you can think of ways to enhance your experiment using "Ball Launch Monitor" technology to measure your swing with different golf balls to select the one that is right for you.
  • For a more basic golf-related experiment focusing on club selection and distance, see the Science Buddies project Golf Clubs, Loft Angle and Distance.
  • For a project on the importance of tee height for drive distance, see the Science Buddies project Tee Time: How Does Tee Height Affect Driving Distance?
  • For another golf-related experiment that focuses more on the aerodynamics of the golf ball, see the Science Buddies project A Cure for Hooks and Slices? Asymmetric Dimple Patterns and Golf Ball Flight.

A Cure for Hooks and Slices? Asymmetric Dimple Patterns and Golf Ball Flight

Objective

The goal of this project is to test whether an asymmetric dimple pattern on golf balls can produce straighter flight.

Introduction

The dimples on the surface of a golf ball are there for a reason. A golf ball with a smooth surface would only travel about half as far as the dimpled ball. Why is this so? The answer has to do with the flow of air over the ball when it is in flight. When a solid object moves through a fluid (a gas or a liquid), the fluid pushes back on the solid. In aerodynamics (or fluid mechanics) this resistive force is called drag. The dimples on the surface of the golf ball are there because they reduce the drag force on the ball (Figure 1).

Figure 1. The dimpled surface of a golf ball decreases the drag force on the ball as it flies through the air (Scott, 2005).

How exactly does this work? In order to understand, we'll need to take a closer look at the pattern of airflow around a ball as it flies through the air. Figure 2 compares the airflow pattern for a smooth ball (top) vs. a dimpled ball (bottom), in horizontal flight (or in a wind tunnel). In the case of a ball with a smooth surface, the airflow in the thin layer right next to the ball (called the boundary layer) is very smooth. This type of flow is called laminar. For a ball with a smooth surface, the boundary layer separates from the ball's surface quite early, creating a wide, turbulent wake pattern behind the ball. The turbulent wake exerts a drag force on the ball. When dimples are added to the surface of the ball, they create turbulence within the boundary layer itself. The turbulent boundary layer has more energy than the laminar boundary layer, so it separates from the surface of the ball much later than the laminar boundary layer flowing over the smooth ball (Figure 2, bottom). Since flow separation occurs later, the turbulent wake behind the ball is narrower, resulting in less drag force.

Figure 2. Comparison of the airflow over a smooth ball vs. a ball with a dimpled surface. In the case of the smooth ball (top), the boundary layer has a laminar flow pattern which separates from the surface early, creating a wide turbulent wake behind the ball. In the case of the dimpled ball, there is a turbulent boundary layer which separates from the surface later, creating a narrower turbulent wake behind the ball. The narrower wake results in less drag. Thus, given the same initial launch force, the dimpled ball travels further than the smooth ball (Scott, 2005).

In the real world, the situation is more complex than shown in Figure 2. First of all, golf balls don't fly horizontally through the air. When the club hits the ball, it launches it at an angle, determined by the golfer's swing and the loft angle of the club. The ball's initial speed and angle will be determined by the speed and orientation of the club face at the moment it strikes the ball, and exactly where on the surface of the ball that contact is made.

To make things even more complicated, the club generally imparts a spin to the ball. How does spin affect the flight of the ball? Let's consider the simplest case first. If the club strikes the ball squarely, the spin that is induced is called backspin (because the ball is spinning backwards, from the golfer's viewpoint). To be more precise, backspin is a spin around the horizontal axis, in a clockwise direction if viewed from the left-hand side (as in Figure 2, above).

Let's consider the effect that backspin will have on airflow over the ball. Since the surface of the ball is now moving in a clockwise direction, the airflow over the top of the ball will be sped up, and the airflow over the bottom of the ball will be slowed down. This has the effect of decreasing the pressure above the ball, and increasing the pressure below the ball. In other words, a spinning ball acts like an airplane wing and creates lift. Figure 3 shows how backspin affects the airflow over a golf ball in a wind tunnel. The smoke lines in Figure 3 show the airflow pattern. Notice how the flow pattern behind the ball is warped downward. This is the same type of pattern you would see for an airfoil at an angle to the wind tunnel air flow (like an airplane wing at takeoff when the plane starts climbing). The spin rate used in Figure 3 was less than the average spin for a golf ball hit by a club. The lift effect with real-world spin rates would be even greater.

Figure 3. Spinning golf ball in a wind tunnel. The smoke lines show the pattern of airflow over a golf ball with backspin. The air moves faster over the top of the ball, and more slowly over the bottom of the ball. The flow field is curved downward, indicating that the spinning ball is generating lift (F.N.M. Brown, in Veilleux and Simonds, 2004).

What if you don't hit the ball squarely? For example, say the club face is angled outward (away from the golfer's body) as it strikes the ball. Then the induced spin will have a component about the vertical axis. In this case, the spin would be clockwise, as viewed from above. The spin would result in an aerodynamic force pushing the ball off to the right, away from a straight flight path. In addition, the initial launch angle would be off to the right instead of straight ahead. These two combine to create what golfers call a slice. Instead of sailing straight down the fairway, the ball curves off to the right, perhaps into the rough, or trees, or (in the worst case) off to an adjacent fairway.

If the club face is angled inward (toward the golfer's body) as it strikes the ball, then the ball tails off in the opposite direction. Golfers call this a hook.

The Polara golf ball has an asymmetric pattern of dimples. There are six rows of deeper dimples on either side of the equator. At each pole, the dimples are shallower. This creates an airflow that tends to correct sidespin, and reorient the ball toward straighter flight. Does it have a significant effect on where the ball ends up? That's what this project is designed to find out.

Terms, Concepts and Questions to Start Background Research

To do this project, you should do research that enables you to understand the following terms and concepts:

  • Newton's Third Law
  • Golf ball aerodynamics:
    • Drag
    • Lift
    • Boundary layer
    • Flow separation
    • Wake
  • Loft angle
  • Spin
  • Magnus effect

Questions

  • How does backspin on the golf ball create lift?
  • How does side spin cause the ball to hook or slice?

Bibliography

Materials and Equipment

To do this experiment you will need the following materials and equipment:

  • Golf club (nine iron or pitching wedge)
  • Polara golf ball with asymmetric dimple pattern (available from Polara Golf; see the Variations section for a less expensive alternative)
  • Regular golf ball (any brand)
  • Empty football field
  • Tape measure

Disclaimer: Science Buddies occasionally provides information (such as part numbers, supplier names, and supplier weblinks) to assist our users in locating specialty items for individual projects. The information is provided solely as a convenience to our users. We do our best to make sure that part numbers and descriptions are accurate when first listed. However, since part numbers do change as items are obsoleted or improved, please send us an email if you run across any parts that are no longer available. We also do our best to make sure that any listed supplier provides prompt, courteous service. Science Buddies receives no consideration, financial or otherwise, from suppliers for these listings. (The sole exception is the Amazon.com link.) If you have any comments (positive or negative) related to purchases you've made for science fair projects from recommendations on our site, please let us know. Write to us at scibuddy@sciencebuddies.org.

Experimental Procedure

  1. Do your background research so that you are knowledgeable about the terms, concepts, and questions above.
  2. Set up at one end of the football field to hit a ball from the center of the goal line.
  3. Use the same club for each shot, and do your best to use a consistent swing for all of the shots.
    1. Note that the Polara ball is designed to correct the aerodynamics of balls that are mis-hit. It should have little effect on balls that are hit squarely.
    2. It would be a good idea to collect data for two different types of swing for each ball:
      • hitting the ball squarely so that it goes straight,
      • hitting the ball with the club face angled to deliberately hook or slice the ball.
    3. The trick is to do each type of swing as consistently as you can.
    4. Because your swing is not likely to be the same each time, you will need to do a large number of trials for each type of ball and each type of swing (at least 20, more is better).
  4. Use the yard lines on the field to measure the distance of each shot, and use your tape measure to measure the distance away from the center of the field (amount of hook or slice). Keep track of these measurements for each type of ball.
  5. You can alternate which end of the field you hit from to save walking.
  6. Since the wind can have an effect on the flight of the ball, you should note the wind speed and direction in your lab notebook (see NWS, 2007).
  7. Calculate the average flight distance for each type of ball, and the average amount of hook or slice for each type of ball.
  8. Calculate the standard deviation for the flight distance and the amount of hook or slice for each type of ball.
  9. Illustrate your results by making graphs that show the distribution of the two types of balls with each type of swing.
  10. More advanced students should also do a t–test (Kirkman, date unknown) to see if any differences in the flight characteristics of the two types of balls are statistically significant.
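
For the graphing step, one way to look at the distribution of hooks and slices is to group the shots by ball and swing type and count how many fall into each range of lateral deviation. The sketch below does this with a simple text histogram. The record layout (ball, swing, signed deviation in yards), the 5-yard bin width, and the sample shots are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def bin_label(value, bin_width):
    """Return the lower edge of the bin that contains value."""
    return (value // bin_width) * bin_width


def deviation_histograms(shots, bin_width=5):
    """Group shots by (ball, swing) and count lateral deviations per bin.

    Each shot is a (ball, swing, deviation) tuple. The deviation is the signed
    distance from the center line (negative = left, positive = right).
    """
    histograms = defaultdict(Counter)
    for ball, swing, deviation in shots:
        histograms[(ball, swing)][bin_label(deviation, bin_width)] += 1
    return histograms


if __name__ == "__main__":
    bin_width = 5
    # Made-up deviations in yards, for illustration only (not measured data).
    shots = [
        ("Polara", "slice", 4), ("Polara", "slice", 7), ("Polara", "slice", 6),
        ("regular", "slice", 12), ("regular", "slice", 16), ("regular", "slice", 9),
        ("Polara", "square", -1), ("regular", "square", 2),
    ]
    for (ball, swing), counts in sorted(deviation_histograms(shots, bin_width).items()):
        print(f"{ball} ball, {swing} swing")
        for lower_edge in sorted(counts):
            bar = "#" * counts[lower_edge]
            print(f"  {lower_edge:>4} to {lower_edge + bin_width:>4} yd: {bar}")
```

With at least 20 shots per combination, the shape and spread of each histogram should show at a glance whether the two balls behave differently on mis-hit shots.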

Variations

  • Rather than purchasing a Polara golf ball, you could make your own asymmetric dimple pattern by increasing the depth of some of the dimples on a regular golf ball. (You could also try filling, or partially filling, some of the dimples to create a smoother surface. You'll need to use a material that will stay put even when the ball is whacked repeatedly with a golf club.) You shouldn't need to remove (or add) a lot of material to get an effect—just a few thousandths of an inch should be enough (Veilleux and Simonds, 2004). You could do it by hand with a drill bit of appropriate size, or you could use a drill press with a mechanism for setting the maximum travel (wear safety goggles, adult supervision required). This experimental approach will save you some money (the Polara golf balls are expensive), and give you the flexibility to explore different patterns. For example, do you think you could make a dimple pattern that would increase the amount of hook or slice?
  • This procedure assumes that you will make your shots on an empty football field. The clubs were selected to be appropriate for this distance. If you have access to a larger open space, you may want to modify the experiment accordingly.
  • For a more basic experiment on the flight of golf balls, see the Science Buddies project Golf Clubs, Loft Angle, and Distance.

Which Team Batting Statistic Predicts Run Production Best?

Objective

The objective of this experiment is to use correlation analysis to determine which team batting statistic is the best predictor of a baseball team's run-scoring ability.

Introduction

Baseball is an interesting combination of individual and team effort. For example, there is the one-on-one duel of pitcher against batter. But once the batter reaches base, he needs his teammates to follow with hits (or "productive outs") in order to move him up the bases so that he can score. From the scientific side, an interesting aspect of baseball is the rich trove of statistics on nearly every aspect of the game.

In this project, you will learn about correlation analysis, a statistical method for quantifying the relationship between two variables. As an example, consider as our two variables the age and height of male students in an elementary school. In general, individuals in this age range grow taller every year. If we made a scatterplot with height as our y-axis and age as our x-axis, we would expect the data points to show a consistent upward trend, with height increasing steadily along with age. The graph below shows simulated data (based on average growth charts).

Height vs. age graph for boys 5-12 years old (data simulated from average growth charts).

In this case, the two variables are strongly correlated. As one increases, so does the other.

As a second example, suppose that we graph height as a function of birth month instead of age. Would you expect to find a correlation? Here is the same simulated height data, graphed now as a function of birth month (randomly assigned).

The same simulated height data, now plotted as a function of a randomly-generated birth month.

Our scatterplot is now a random arrangement of dots, with no apparent relationship. In this case, the two variables are not correlated.

To convince you that it is the same data, here is the same graph, with the different age groups (shown by grade level, K–6) each assigned a different symbol. You can clearly see the difference in average height of the different grade levels.

Height vs. birth month, with different symbols for each 1-year age group (simulated data).

The statistic that describes this relationship between two variables is the correlation coefficient, r (or, more formally, the "Pearson product-moment correlation coefficient"). It is a scale-independent measure of how two measures co-vary (change together). The correlation coefficient ranges between −1 and +1.
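
To make the definition concrete, here is a minimal sketch of how the Pearson correlation coefficient can be computed from two lists of paired measurements. The function name is an assumption, and a spreadsheet's built-in correlation function should give the same result.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator) and sums of squares.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


if __name__ == "__main__":
    # Made-up ages (years) and heights (cm), for illustration only.
    ages = [5, 6, 7, 8, 9, 10, 11, 12]
    heights = [110, 114, 123, 126, 135, 137, 145, 149]
    print(f"r = {pearson_r(ages, heights):.2f}")
```

Because the deviations are divided by the spread of each variable, changing the units (for example, centimeters to inches) leaves r unchanged, which is what "scale-independent" means here.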

What do the values of the correlation coefficient mean? Well, the closer the correlation coefficient is to either +1 or −1, the more strongly the two variables are correlated. If the correlation coefficient is negative, the variables are inversely correlated (when one variable increases, the other decreases). If the correlation coefficient is positive, the variables are positively correlated (when one variable increases, the other increases also). How close to +1 or −1 does the correlation coefficient need to be in order for us to consider the correlation to be "strong"? A good method for deciding this is to calculate the square of the correlation coefficient (r²) and then multiply by 100. This gives you the percent variance in common between the two variables (Rummel, 1976). Let's see what this means by calculating r² over the range from 0 to +1. (Note: for the corresponding values of r between 0 and −1, r² will be the same, since squaring a negative number results in a positive number.)

Interpreting the Correlation Coefficient Using r²

r      r²     % variance in common
1.00   1.00   100
0.90   0.81   81
0.80   0.64   64
0.70   0.49   49
0.60   0.36   36
0.50   0.25   25
0.40   0.16   16
0.30   0.09   9
0.20   0.04   4
0.10   0.01   1
0.00   0.00   0

As you can see from the table, r² decreases much more rapidly than r. When r = 0.9, r² = 0.81, and the variables have 81% of their variance in common. When r = 0.7, that might seem like a fairly strong correlation, but r² has fallen to 0.49. The variables now have just less than half of their variance in common. By the time r has fallen to 0.5, r² = 0.25, so the variables have only one-fourth of their variance in common.
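
The table values are simple to regenerate or extend. The short sketch below squares each value of r and multiplies by 100. The function name is an assumption.

```python
def percent_variance_in_common(r: float) -> float:
    """Square the correlation coefficient and express it as a percentage."""
    return 100 * r ** 2


if __name__ == "__main__":
    print("   r    r^2   % variance in common")
    for tenths in range(10, -1, -1):
        r = tenths / 10
        print(f"{r:4.2f}   {r * r:4.2f}   {percent_variance_in_common(r):3.0f}")
```

The same function works for negative values of r, since squaring removes the sign.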

For our simulated height data, the correlation coefficient for height vs. age was 0.88, indicating that age and height share 77% of their variance in common. In other words, 77% of the "spread" (variance) of the height data is shared with the "spread" of the age data. For height vs. birth month, the correlation coefficient was 0.03, so, to two decimal places, r² = 0.00. There is no correlation between the variables (as we suspected).

It is important to remember that correlation does not imply that one variable causes the other to vary. Correlation between two variables is a way of measuring the relationship between the variables, but correlation is silent about the cause of the relationship.

If the correlation coefficient is exactly ±1, then the two variables are perfectly correlated. This means that their relationship can be described by a linear equation, of the form:

y = mx + b.

You've probably seen this equation before, and you may remember that m is the slope of the line, and b is the y-intercept of the line (where the line crosses the y-axis). If two variables are strongly correlated, it is sometimes valuable to use the linear equation as a method for predicting the value of the dependent variable (y) when we know the value of the independent variable (x). This method is called linear regression.

Let's look again at the scatterplot of simulated height vs. age for elementary school students. If we draw a "best fit" line through the points, our scatterplot looks like this:

Height vs. age, with regression line (simulated data).

A "best fit" line means the line that minimizes the overall distance between the line and the data points in the scatterplot (in the usual least-squares method, the sum of the squared vertical distances). If you wanted to predict a boy's height, and all you knew was his age, using this line to make a prediction would be your best guess. A spreadsheet program (like Excel) can do this "best fit" calculation for you, and help you get started with making a graph of the data and the regression line. You can also make a graph of the "residuals," which shows the distance of each data point from the regression line. Here is an example of a residuals graph, again using our simulated height vs. age data:

Height vs. age residuals plot (simulated data).

The residuals plot essentially redraws the linear regression plot with the regression line laid flat: each data point is plotted at its vertical distance above or below the line, so the regression line becomes a horizontal line at zero. It is easier to compare how the data points are distributed around the regression line when the line has a slope of zero. The vertical scale can also be expanded, since the data is now centered within the area of the graph. If you see patterns in the residuals plot, these are features of the data that are not explained by a linear relationship between the two variables.
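
The "best fit" line and its residuals can also be worked out by hand. The sketch below uses the standard least-squares formulas for the slope m and intercept b of y = mx + b, then subtracts each predicted value from the observed one. The data are the same made-up illustrative numbers as in any example of ours (not the simulated data set), and the function names are our own.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx            # slope
    b = mean_y - m * mean_x  # intercept: the line passes through the means
    return m, b

def residuals(xs, ys, m, b):
    """Observed value minus predicted value for each data point."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]

# Made-up illustrative numbers, NOT the simulated data from the text
ages = [6, 7, 8, 9, 10, 11]
heights_cm = [115, 122, 126, 134, 137, 145]

m, b = fit_line(ages, heights_cm)
res = residuals(ages, heights_cm, m, b)

print(f"height = {m:.2f} * age + {b:.2f}")
for age, e in zip(ages, res):
    print(f"age {age}: residual {e:+.2f} cm")
```

Plotting `res` against `ages` gives a residuals plot: the points scatter around zero instead of around a sloped line.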

This project will use correlation analysis to determine which team batting statistic is the best predictor of a baseball team's run-scoring ability (Albert, 2003). In addition to standard batting statistics, you'll also use batter's runs average (BRA), total average (TA), and runs created (RC). Each of these is defined in the Experimental Procedure section, where you can learn how to program them into a spreadsheet with a formula.

There are many possible variations to this project that could apply similar methods, or extend them further for a more advanced project. See the Variations section below for some ideas. No doubt you can also come up with your own. You can also check out the book on which this project is based, Teaching Statistics Using Baseball, by Jim Albert.

Terms, Concepts and Questions to Start Background Research

To do this project, you should do research that enables you to understand the following terms and concepts:

  • baseball batting statistics:
    • hits (H),
    • doubles (2B),
    • triples (3B),
    • walks (BB),
    • strikeouts (SO),
    • batting average (BA),
    • on-base percentage (OBP),
    • slugging percentage (SLG),
    • batter's runs average (BRA),
    • total average (TA),
    • runs created (RC).
  • correlation coefficient (or Pearson product-moment correlation coefficient),
  • linear regression.

Questions

  • If you find a correlation between two variables in your data set, can you conclude that one of the variables causes the other to change in a predictable way?

Bibliography

  • Batting statistics are defined here:
    Forman, S.L., 2006. "Batting Glossary," Baseball-Reference.com - Major League Statistics and Information [accessed March 3, 2006] http://www.baseball-reference.com/about/bat_glossary.shtml.
  • Here are two starting points for your background research on statistics:
  • The following sites are good sources for baseball statistics.
    • This project uses annual team batting statistics from baseball-reference.com:
      Forman, S.L., 2006. "League Index," Baseball-Reference.com - Major League Statistics and Information [accessed March 3, 2006] http://www.baseball-reference.com/leagues/.
    • Here is another site where you can download historical baseball statistics:
      Lahman, S., 2006. "The Baseball Archive," [accessed March 3, 2006] http://www.baseball1.com/.
  • Here is an Excel tutorial to get you started using a spreadsheet program:
    James, B., date unknown. "Excel 101," University of South Dakota, [accessed March 3, 2006] http://www.usd.edu/trio/tut/excel/.
  • If you'd like more ideas for exploring baseball statistics, check out the book this project is based on:
    Albert, Jim, 2003. Teaching Statistics Using Baseball. Washington, D.C.: The Mathematical Association of America.

Materials and Equipment

To do this experiment you will need the following materials and equipment:

  • a computer with Internet access and a spreadsheet program (the example below uses Microsoft Excel, Office 2003 version; similar functionality is probably available in other spreadsheet programs),
  • a printer.

Experimental Procedure

  1. Do your background research so that you are knowledgeable about the terms, concepts and questions above.
  2. If you are not familiar with using a spreadsheet program, be sure to take the time to go through the Excel tutorial listed in the Bibliography.
  3. Here is a short version of the data analysis steps you'll be following in order to find which batting statistic correlates best with run production. The detailed section for each step is named in parentheses:
    1. Download the historical data and import into Excel. (Downloading and Importing Data)
    2. Format the data for statistical analysis. (Formatting the Data)
    3. Add derived statistical measures (RC, TA, BRA or any others of your choosing) for testing. (Adding Derived Statistical Measures)
    4. Use the spreadsheet's correlation and linear regression analyses to see how the various batting statistics correlate with runs scored. (Running Correlation and Linear Regression Analysis)
    5. Compare the results. (Comparing the Results)
  4. If spreadsheets are something new for you, then the detailed explanations that follow should help. If you are already comfortable with using spreadsheets, then you should be in good shape on your own.

Downloading and Importing Data

  1. Download annual team batting statistics from Baseball-Reference.com: http://www.baseball-reference.com/leagues/.
    1. The data is arranged by year, and by league—National (NL) or American (AL).
    2. Choose the year and click on the league link (AL or NL) to get the team statistics for that league and year.
    3. Click and drag your mouse to highlight the batting statistics table, then copy and paste the table into Notepad.
    4. Repeat steps 2 and 3 for the other league. You can put both tables into the same text file; just include a blank line in between them.
    5. Save the tables as a text file (use the extension ".txt").
    6. The "Glossary" link (visible just above the table of team batting stats) has explanations of the batting statistics and their abbreviations.
  2. Import the saved batting data into your spreadsheet program. Here's how to do it in Excel:
    1. From the menu, select File/Open.... You'll see a dialog like the one below.

      Excel File Open dialog, with file type 'Text Files' selected.

    2. At the bottom of the File Open dialog, under "Files of type:" use the drop-down list to select "Text Files (*.prn, *.txt, *.csv)".
    3. Navigate to the directory where you saved your batting data file, select it, and click "Open."
    4. Excel now takes you through the "Text Import Wizard," a series of three dialogs. The first dialog looks like this (team batting data for 2005 shown):

      Excel Text Import Wizard Step 1 of 3, with file type 'Fixed width' selected.

    5. Make sure "Fixed width" file type is selected (as above), then click "Next."
    6. The second Text Import Wizard dialog is used to set the field widths. It looks like this:

      Excel Text Import Wizard Step 2 of 3, for setting field widths.

    7. The lines with arrows show where Excel will be breaking the data into columns. Check to make sure that each of the data columns has been recognized (use the horizontal and vertical scroll bars to view all of the columns).
    8. For the team batting data, you'll probably find that you need to add a column break for the third-from-last data column (the cursor points to the spot in the image below). The dialog box has instructions for adding, moving and deleting column breaks.

      Excel Text Import Wizard Step 2 of 3, showing the missing column break in the third-to-last column.

    9. When all of the data columns are set to your satisfaction, click "Next".
    10. The final step of the Text Import Wizard is to select data formats for each of the columns, as shown below.

      Excel Text Import Wizard Step 3 of 3, selecting data formats for each column.

    11. The default selection, "General", is what you want for any column with numerical data. For columns containing only text (like the team names), you can select "Text", but this is optional. Again you can use the horizontal and vertical scroll bars to examine the data columns and make sure that all the data types are set properly.
    12. When you are satisfied with your selections, click "Finish" to import the data.
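
If you would rather read the saved text file with a short program than with the Text Import Wizard, the sketch below shows one way to do it in Python. It makes several assumptions that you should check against your own file: the columns are separated by spaces, no field contains a space, each table starts with a row of column labels, and the two league tables are separated by a blank line. The sample text and team codes are made up so the example runs on its own.

```python
def parse_table(text):
    """Parse whitespace-separated table text into a list of dict rows.

    Assumes the first non-blank line of each table holds the column labels
    and that no field contains spaces. Divider lines are skipped.
    """
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # blank line: the next table brings its own labels
            continue
        if set(line.strip()) <= set("-+| "):
            continue       # dividing line made of dashes
        if header is None:
            header = fields
            continue
        if len(fields) != len(header):
            continue       # row of a different shape (e.g. a totals line)
        row = {}
        for name, value in zip(header, fields):
            try:
                row[name] = float(value)  # numeric column
            except ValueError:
                row[name] = value         # text column, such as the team name
        rows.append(row)
    return rows

# Made-up sample in the assumed layout (two tables, blank line between them)
sample = """\
Tm   R    H     BB   AB    OBP   SLG
------------------------------------
AAA  700  1400  500  5500  .330  .420
BBB  650  1350  450  5450  .320  .400

Tm   R    H     BB   AB    OBP   SLG
------------------------------------
CCC  800  1500  550  5600  .345  .450
"""

rows = parse_table(sample)
for row in rows:
    print(row["Tm"], row["R"], row["OBP"])
```

To use it on your own data, read the file you saved from Notepad (for example with `open(...).read()`) and pass the text to `parse_table`.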

Formatting the Data

Here are some tips for getting your data organized before analyzing it. You'll learn how to remove unwanted rows (or columns), how to change the order of data columns, and how to freeze the column labels, so that they remain visible even when vertically scrolling the data table.

  1. Removing unwanted rows (or columns). Sometimes an imported data file contains extra rows or columns that you don't need for your analysis. For example, with the team batting statistics, the second row is a dividing line made with characters. Since some of the analysis features we'll be using later require the data to be in contiguous blocks, we'll want to remove these extra rows. This is really simple. Here's how to delete rows (or columns) in Excel:
    1. Right-click on the number(s) of the row(s) (or letters of the columns) you want to delete.
    2. The entire row (or column) will be highlighted, indicating that it is selected, and you will see a popup menu, as shown below.

      Deleting an unwanted row.

    3. Select "Delete" from the popup menu (as shown above), and you're done.
    4. You'll want to delete any row that does not contain team batting data (except for row 1, which contains the column labels).
  2. Rearranging columns. For the correlation and linear regression analysis we'll be running later, Excel requires the data of interest to be in contiguous blocks. We are interested in measuring the correlation between runs scored and batting statistics such as hits, doubles, triples, batting average, etc. The columns are arranged so that two data columns, games (G) and at-bats (AB), separate runs per game (R/G) and runs (R) from the hitting stats of interest. If you want, you can simply delete the games and at-bats columns (as described in item 1, above). Alternatively, you can move the columns. Here's how to rearrange data columns in Excel:
    1. First, you need to create two new columns to move the data into. We'll add them to the left of the runs/game column.
    2. Right-click on the column letters to select columns B and C.
    3. The entire columns will be highlighted, indicating that they are selected, and you will see a popup menu, as shown below.

      Inserting additional columns.

    4. Select "Insert" from the popup menu (as shown above), and two empty columns are inserted to the left of your selection. (If you want to insert more columns, select more to start with.)
    5. Now select the games and at-bats columns, then cut-and-paste them to the new location.
    6. Finally, delete the now-empty columns and you're done.
  3. Freezing the header row (data labels). If your data table has too many rows to fit on your screen, it can be nice to have the column labels stay put when you scroll vertically. That way, you can still see what data is in each column. Here's how to freeze your column labels in Excel:
    1. Select the row below your column labels.
    2. From the menu, select "Window/Freeze Panes." That's it!

Adding Derived Statistical Measures

In this section you will be adding derived statistics—those that are calculated from other statistics in your table. In addition to the derived statistics below, you can include other measures that you found in your background research, or you can try to create your own derived statistic.

  1. First, insert at least three additional columns between the OPS+ and hmR/G columns. You'll be adding the three derived statistics mentioned in the Introduction: BRA, TA and RC.
  2. Batter's runs average (BRA) is the product of OBP and SLG. Here's how to enter it in Excel:
    1. All formulas in Excel start with the equals sign: "=".
    2. The arithmetic operators for formulas are: "+", "-", "*", and "/", for addition, subtraction, multiplication and division, respectively.
    3. On our example spreadsheet, OBP is in column "M" and SLG is in column "N".
    4. So to enter the formula for BRA for row 2, you would type: "=M2*N2", as shown below:

      Entering the formula for batter's runs average (BRA).

    5. Hit "Enter", and Excel calculates the value for you, as shown below.

      Excel automatically calculates formula values after you enter them.

    6. Next, copy and paste the formula into the rest of the column.
    7. Finally, you will want to change the formatting of the column so that only 3 decimal places are displayed (the same number of significant figures as in the operands, OBP and SLG). Right-click to select the entire BRA column, and select "Format cells..." from the popup menu.
    8. In the Format Cells dialog, choose the "Number" tab, then select "Number" from the Category list, and change the "Decimal places" value to 3, as shown below:

      Changing the number display format to 3 decimal places.

    9. Click "OK", and Excel displays BRA to 3 decimal places.
  3. Total Average (TA) is the ratio of the number of bases to the number of outs:
    TA = (TB + BB)/(AB - H).
    There is no column for TB (total bases) in the tables from Baseball-Reference.com, but you can easily get TB from SLG, because:
    SLG = TB/AB,

    so if we multiply both sides by AB, we can get TB:
    TB = SLG*AB.
  4. Substituting SLG*AB for TB, our formula becomes:
    TA = (SLG*AB + BB)/(AB - H).
    Find the corresponding data columns on your spreadsheet and enter the formula. As you did for BRA, above, copy and paste the TA formula to the rest of the column, and change the number display to 3 decimal places.
  5. Runs Created (RC), devised by Bill James:
    RC = (H + BB)*TB/(AB + BB).
    As with TA, use SLG*AB in place of TB. After the first two examples, you should be able to do this one on your own.
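
As a cross-check on your spreadsheet formulas, here is a small Python sketch that computes the same three derived statistics from one team's row. The function name and the input numbers are our own made-up illustration (they are not real team data); the formulas are the ones given above, with TB obtained from SLG*AB.

```python
def derived_stats(ab, h, bb, obp, slg):
    """Return BRA, TA and RC for one team, using the formulas in the text."""
    tb = slg * ab                    # total bases, since SLG = TB/AB
    bra = obp * slg                  # batter's runs average
    ta = (tb + bb) / (ab - h)        # total average: bases per out
    rc = (h + bb) * tb / (ab + bb)   # runs created
    return {"BRA": bra, "TA": ta, "RC": rc}

# Made-up illustrative numbers for a single team
stats = derived_stats(ab=5500, h=1400, bb=500, obp=0.330, slg=0.420)

print(f"BRA = {stats['BRA']:.3f}")
print(f"TA  = {stats['TA']:.3f}")
print(f"RC  = {stats['RC']:.1f}")
```

If a value from your spreadsheet disagrees with the value from a calculation like this one, check the column letters in your Excel formula first.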

Running Correlation and Linear Regression Analysis

  1. To run the correlation analysis, use the menu to select "Tools/Data Analysis...". [Note, if this choice is not available, select "Tools/Add-Ins...". Check the "Analysis Toolpak" box, and click "OK". The "Data Analysis..." choice should now be available on the "Tools" menu.]
  2. In the Data Analysis dialog, select "Correlation" from the list of Analysis Tools (as shown below) then press "OK".

    Selecting the Correlation Analysis Tool in the Data Analysis dialog.

  3. In the Correlation dialog, there are several pieces of information to fill in:
    1. You want to measure the correlations between runs scored (R) and each of the batting statistics (all of the columns from hits (H) to runs created (RC)). You should include the first row, which contains the column labels. Excel will use the labels to identify the correlation data. Enter this range of columns in the "Input Range" field.
    2. Make sure that "Columns" and "Labels in First Row" are both checked.
    3. In the "Output Options" section, it's a good idea to put the output on a new worksheet, which you can also name.
    4. When you are satisfied with your selections (see the image below as an example) hit "OK" and Excel will add a new worksheet with the correlation analysis results.

    Filling in the Correlation dialog box.

  4. Excel calculates the correlation coefficients for each pair of data columns in the range you supplied, and displays the results in a matrix on the new sheet you selected. Here is an example, using data from the 2005 baseball season:

    Matrix of correlation coefficients.

  5. The correlation matrix works like a mileage chart in a road atlas. To look up the correlation of batting average (BA) with runs (R), you look down the "R" column until you come to the value in the "BA" row (boxed coefficient in the image above).
  6. Next you'll do the linear regression analysis. This time you will need to run the analysis separately for each pair of variables (e.g., runs vs. OBP, runs vs. SLG, etc.) you want to test. You can use the results of the correlation analysis, above, to decide which pairs to explore further with linear regression.
  7. To run the linear regression analysis, use the menu to select "Tools/Data Analysis..." and then scroll down to choose "Regression" in the Data Analysis dialog, as shown below:

    Selecting the Regression Analysis Tool in the Data Analysis dialog.

  8. In the Regression dialog, there are several pieces of information to fill in:
    1. For the "Input Y Range," enter the range of cells for R (runs). Be sure to include the first row, with the column label.
    2. For the "Input X Range," enter the range of cells for the variable of interest (here, we're going to plot the regression line for R vs. OBP).
    3. Make sure that the "Labels" box is checked.
    4. Under "Output options," select "New Worksheet Ply" for the results, and give it a name (here, we're calling the new worksheet "2005_RvsOBP_Regr").
    5. Under "Residuals," make sure that boxes are checked as shown, so that Excel will automatically create regression and residuals plots for you.
    6. When you are satisfied with your selections (see the image below as an example) hit "OK" and Excel will add a new worksheet with the linear regression analysis results.

      Filling in the Regression dialog box.

  9. Excel does a lot for you automatically, but you will still need to tweak the formatting of the results. Here are some suggestions:
    1. Column widths. When Excel completes the Regression analysis, it will display the new worksheet containing the results, and all of the numerical results will be highlighted, indicating that they are selected. The first thing to do is to adjust the column widths so that you can read all of the headings. From the menu, select "Format/Column/AutoFit Selection," and the widths will be set so that you can read everything.
    2. Next, you will need to reformat both the regression plot and the residuals plot. Think of Excel's automatic graphs as just a starting point (see below).

      You need to reformat Excel's default graphs in order to really see what is going on in your regression and residuals plots.

      To really see what's going on, you will want to expand the size of the graphs, and adjust both the x- and y-axis scales (see below). You may want to make other tweaks as well.
  10. Here are some suggestions for formatting the graphs.
      To change the size of a graph window, click on the "Chart Area" (between the graph ("Plot Area") and the border around it). (If you hold the mouse still over the graph, a tooltip will pop up and tell you where you are.) The border of the Chart Area will be highlighted, and there will be "handles" to click and drag so that you can size the graph window to your liking.
    1. You can change the size of the "Plot Area" (the graph itself) in a similar manner.
    2. To make changes to the x- or y-axis, double-click on the axis labels in the graph. You should see a dialog like this one:

      The Format Axis dialog.

    3. On the "Patterns" tab (see above), change the "Major tick mark type" and "Minor tick mark type" to "Outside," so that the tick marks don't obscure your data points. (Note: when you are formatting the Residuals graph, under "Tick mark labels" select "Low.")
    4. On the "Scale" tab (see below), adjust the limits so that your data just fits within the graph area. Choose a value for the "Minimum" that is just below your lowest data point, and a value for the "Maximum" that is just above your highest data point. The "Major unit" is the step size for the major tick marks. This value determines the interval for labeling the axis with numbers. The "Minor unit" is the step size for the tick marks falling between numbered values.

      Setting the appropriate scale for the axis.

    5. On the "Number" tab (see below), first select "Number" in the "Category" list. Then you can set the number of decimal places to display on the axis labels. Here the axis is OBP (on-base percentage), so we've chosen 3 decimal places, which is the customary way of displaying OBP.

      Setting the appropriate number format for the axis.

    6. Push "OK" and you can see the updates on your graph. If you need to make more changes, just double-click on the axis label again. Make these changes for both the x- and y-axis.
  11. Finally, here are a few more tweaks you can make to the graph: refining the graph title and axis labels, removing the legend, and changing the regression line from a series of symbols to a solid line.
    1. Right-click on the graph and select "Chart Options...". You'll see a dialog like the one below:

      Setting graph and axis titles on the Chart Options Dialog.

    2. On the "Titles" tab (see above), you can name the graph, and change the labels for the x- and y-axis.
    3. On the "Legend" tab (see below), you can de-select "Show legend." Then push "OK."

      Turning off the legend display.

    4. Next, point your mouse at one of the data points for the regression line. (If you hold the mouse still over a data point, a tooltip will pop up to tell you which data series the point is from.) Double-click on the point, and the "Format Data Series" dialog will pop up (see below).

      Formatting the regression line.

    5. It's best to have the regression line appear as a simple line, not as data symbols. In the "Marker" section, select "None". In the "Line" section, select "Custom", choose a solid line style in black, and then push "OK."
    6. Here's the finished result (below). Now the data points are clearly visible, as is their relationship to the regression line.

      The finished regression plot.
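
The matrix that Excel's Correlation tool produces can be sketched in a few lines of Python, which may help you see what the tool is doing. The column values below are made up for five imaginary teams (they are not 2005 data), and the function names are our own. Looking up `matrix["R"]["OBP"]` works like the mileage-chart lookup described above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Correlation of every column with every other column, as nested dicts."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}

# Made-up values for five imaginary teams, NOT real season data
team_stats = {
    "R":   [700, 650, 800, 720, 610],
    "H":   [1400, 1350, 1500, 1450, 1380],
    "OBP": [0.330, 0.320, 0.345, 0.335, 0.318],
    "SO":  [1000, 1100, 950, 1050, 1020],
}

matrix = correlation_matrix(team_stats)

# Print the "R" column of the matrix: each statistic's correlation with runs
for name in team_stats:
    print(f"r(R, {name}) = {matrix['R'][name]:.3f}")
```

The diagonal of the matrix is always 1, because every column is perfectly correlated with itself.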

Comparing the Results

  1. Use your correlation analysis results to decide which batting statistics are more highly correlated with scoring runs.
  2. Make a table or graph of your results, using the r² statistic to interpret the strength of each correlation.
  3. For each of the highly-correlated statistics, work through the linear regression analysis, and make a regression plot and a residuals plot. Compare the graphs and see which measure is best at predicting the number of runs scored.
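
One way to build the comparison table is to square each correlation coefficient and sort the statistics from highest r² to lowest. The sketch below assumes you have already copied the coefficients from your correlation matrix. The labels STAT_A through STAT_D and their values are placeholders only (they are not results); replace them with your own statistics and numbers.

```python
def rank_by_r_squared(r_with_runs):
    """Return (name, r, r squared) tuples sorted from highest r squared to lowest."""
    table = [(name, r, r * r) for name, r in r_with_runs.items()]
    return sorted(table, key=lambda entry: entry[2], reverse=True)

# PLACEHOLDER values, not real results: substitute the coefficients
# from the "R" column of your own correlation matrix.
example_r = {
    "STAT_A": 0.60,
    "STAT_B": 0.85,
    "STAT_C": -0.30,
    "STAT_D": 0.75,
}

ranked = rank_by_r_squared(example_r)

print(f"{'statistic':<10}{'r':>8}{'r^2':>8}")
for name, r, r2 in ranked:
    print(f"{name:<10}{r:>8.2f}{r2:>8.2f}")
```

Sorting by r² rather than r means that a strong negative correlation is ranked by its strength, not pushed to the bottom because of its sign.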

Variations

Many variations of this project are possible. We're sure that you can think of more yourself, but here are a few ideas to get you started.

  • Do you get the same results if you run this analysis for a different year? For a different baseball era? Can you think of reasons to explain any differences you find?
  • Are there other derived statistics (besides RC, TA, and BRA) that might do a better job at predicting runs scored?
  • You have to score at least one run to win a baseball game, so we expect teams that score more runs to win more games. However, you also have to keep the other team from scoring more runs than you do. So how well does a team's run-scoring ability correlate with winning percentage?
  • Investigate correlations between team pitching statistics and winning percentage. Which pitching statistic is the best predictor of success?

    Baseball Economics

  • How well do player salaries correlate with offensive performance? In baseball it is generally expected that the three outfielders and the first and third basemen will produce runs for the team by being skilled with the bat. Assemble the individual batting and salary statistics for this group of players for a single season. How well does salary correlate with the various batting statistics used above? You can take this further by expanding your sample to multiple seasons.
  • How well does team payroll correlate with winning percentage?

    More Advanced Project Ideas

  • Baseball and Athletic Longevity. History tells us that, over a human lifetime, the trajectory for most individual accomplishments is an arc. We all start off pretty much helpless as infants, grow in physical and mental skill through childhood, teenage years and young adulthood. If we are fortunate enough to live into old age, we also, inevitably, start to notice a decline in those same skills as the body and mind age. Baseball statistics provide a way to measure the trajectory of athletic ability for large numbers of individuals. There are many, many questions you could explore along these lines. What is the "average" age for peak performance? How much variance is there in this age? Does it differ for pitchers and batters? Which position has the greatest longevity? The shortest? Has peak performance age changed over time? Use year-by-year career statistics for individual players to identify their peak years by some measure that you devise. Compile and analyze tables of peak performance data for groups of players to answer one of the questions above, or a similar question that interests you.
  • For more ideas, see Teaching Statistics Using Baseball, by Jim Albert (listed in the Bibliography).

Credits

Andrew Olson, Ph.D., Science Buddies

Sources

  • Albert, Jim, 2003. Teaching Statistics Using Baseball. Washington, D.C.: The Mathematical Association of America.
  • Rummel, R.J., 1976. "Understanding Correlation, Chapter 4.3, Interpreting the Correlation: Correlation Squared" Department of Political Science, University of Hawaii [accessed March 6, 2006] http://www.mega.nu:8080/ampp/rummel/uc.htm#S4.3.
