(Easy) Getting Started With R - Gapminder Dataset Part 1 (Free Tutorial)

This guide will get you started on the path to exploring and visualizing your own data with the R programming language. It introduces you to the tidyverse, a collection of data science tools within R for transforming and visualizing data. This is not the only set of tools in R, but it's a powerful and popular approach for exploring data. At every step, you'll be analyzing a real dataset called gapminder.

Gapminder tracks economic and social indicators like life expectancy and the GDP per capita of countries over time. The experience you gain on this example will help you in analyzing your own data. You'll learn to draw specific insights and communicate them through informative visualizations with the ggplot2 package. 

The first code you'll write is to load two R packages, which is done by writing library(packagename). R packages are tools that aren't built into the language, but were created later by other programmers. Each of them provides tools that you don't have to write yourself. The first package is gapminder, created by Jenny Bryan, which contains the dataset that you'll be analyzing. The second package is dplyr, created by Hadley Wickham, which provides step-by-step tools for transforming this data, such as filtering, sorting, and summarizing it. 
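As a minimal sketch of that first step, the snippet below installs the two packages only if they are missing and then loads them. It assumes the machine can reach CRAN and that neither package is guaranteed to be installed yet; if both are already installed, the two library() calls are all you need.

```r
# Assumption: the packages may not be installed yet on this machine.
# install.packages() downloads a package from CRAN and only needs to run once.
for (pkg in c("gapminder", "dplyr")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

# library() attaches an installed package for the current R session.
library(gapminder)
library(dplyr)
```

Installation happens once per machine, while library() has to be called again in every new R session.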

Once the packages are loaded, you type gapminder on its own line to display the contents of the gapminder object, which is structured as a data frame. A data frame keeps rectangular data in rows and columns, similar to a spreadsheet or a table in a SQL database. Most data analyses in R, and everything you'll do in this guide, are centered around data frames.


As described in the first line of the output, this is a special type of data frame called a tibble. R displays only the first ten rows so that you can get a glimpse of the data. That first line also tells you the tibble has 1,704 rows, each of which we call an observation, and six columns, each of which we call a variable.
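The same facts can be checked without reading the printed header. The following is a short sketch using base R functions plus glimpse(), which dplyr makes available.

```r
library(gapminder)
library(dplyr)

dim(gapminder)      # number of rows, then number of columns: 1704 and 6
nrow(gapminder)     # observations (rows)
ncol(gapminder)     # variables (columns)
names(gapminder)    # the column names
class(gapminder)    # includes "tbl_df", the class that marks a tibble

# glimpse() prints one line per column, which is handy for wide data.
glimpse(gapminder)
```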

It's important in an analysis to understand what each observation, or row, represents. Here, each represents a unique pair of a country and a year. For example, 

  • the first observation represents country statistics for Afghanistan in 1952, 
  • the second for Afghanistan in 1957, and so on. 
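One way to confirm that each row really is a unique country and year pair is to count the rows per pair and look for duplicates. This is a sketch using dplyr's count() and filter(); the name duplicates is just a variable introduced for this example.

```r
library(gapminder)
library(dplyr)

# The first two observations: Afghanistan in 1952 and in 1957.
head(gapminder, 2)

# Count the rows for each (country, year) pair and keep pairs seen more than once.
duplicates <- gapminder %>%
  count(country, year) %>%
  filter(n > 1)

# Zero rows here means every country-year pair appears exactly once.
nrow(duplicates)
```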

For each combination of a country and year, the dataset contains several variables, or columns, describing the country's demographics. We see the continent - in this case, Asia - the life expectancy in years, the population, and the GDP per capita. The GDP per capita is the country's total economic output (Gross Domestic Product) divided by its population, and it's a common measure of how wealthy a country is. 
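Because GDP per capita is total GDP divided by population, multiplying the two columns back together recovers the total. The sketch below uses dplyr's mutate() for this; total_gdp is a hypothetical column name chosen for the example and is not part of the dataset.

```r
library(gapminder)
library(dplyr)

# gdpPercap = total GDP / pop, so total GDP = gdpPercap * pop.
# total_gdp is a made-up column name used only in this example.
gapminder_gdp <- gapminder %>%
  mutate(total_gdp = gdpPercap * pop)

# Show the new column next to the two columns it was built from.
gapminder_gdp %>%
  select(country, year, pop, gdpPercap, total_gdp) %>%
  head(3)
```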

Each variable is of one consistent data type: some are numbers, like life expectancy and population, and some are categorical, like country and continent. Even with this small glimpse of the data, you can extract a few insights. For example, you can see that Afghanistan's life expectancy and population are both higher in 1997 than in 1952, but that its GDP per capita has wavered. In the rest of this guide, you'll learn to use R to draw many conclusions about the social and economic history of countries around the world.
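Both observations can be checked in code. This sketch first lists the class of every column, then pulls out the Afghanistan rows up to 1997 (the ten rows shown in the printed preview) using dplyr's filter() and select(); the name afghanistan is a variable introduced for this example.

```r
library(gapminder)
library(dplyr)

# One class per column: factors hold categories, integer and numeric hold numbers.
sapply(gapminder, class)

# The Afghanistan rows from 1952 through 1997.
afghanistan <- gapminder %>%
  filter(country == "Afghanistan", year <= 1997)

afghanistan %>%
  select(year, lifeExp, pop, gdpPercap)

# Compare the first and last of those years side by side.
afghanistan %>%
  filter(year %in% c(1952, 1997)) %>%
  select(year, lifeExp, pop, gdpPercap)
```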

Loading the gapminder and dplyr packages

Before you can work with the gapminder dataset, you'll need to load two R packages that contain the tools for working with it, then display the gapminder dataset so that you can see what it contains.

Exercise 1:

Use the library() function to load the dplyr package and the gapminder package.

Type gapminder, on its own line, to look at the gapminder dataset.
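A possible solution to the exercise looks like this. It assumes both packages are already installed.

```r
# Load the two packages used throughout this guide.
library(dplyr)
library(gapminder)

# Typing the object's name on its own line prints it.
gapminder
```

The printed output should start with a header describing a tibble, followed by the first ten rows for Afghanistan.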



Part 1 ends here. Stay tuned for Part 2.

Cheers!

 

 
