1  Introduction to R and Python

R Vs Python: What’s the Difference? https://www.guru99.com/r-vs-python.html

1.1 About R and Python

R and Python are both versatile, widely used open-source programming languages with applications in statistical computing and graphics. R is an environment designed specifically for statistical computing and graphics, in which statistical techniques can be easily implemented and adapted for different use cases. One of its main strengths is its extensive library of packages, which bundle functions, data, and documentation that can be readily shared and reused by other researchers and analysts. These packages make R an essential tool for data analysis, visualization, and modeling across many disciplines.

Python, on the other hand, is a general-purpose programming language that is also well suited to statistical computing and graphics. Its object-oriented approach lends itself well to organizing and manipulating complicated data structures, and its interactive nature enables rapid prototyping and experimentation. Python also has a robust ecosystem of third-party libraries and frameworks, including several for data processing and visualization. While Python’s statistical capabilities are not as extensive as those of R, its versatility makes it a popular option for many applications beyond statistical computing.

For data analysis and manipulation, R and Python are two of the most popular programming languages. While the two languages share similarities in syntax and functionality, several key differences can affect how useful each is for a given project or task. The list below covers some of these differences: data structures, function syntax, package ecosystems, object-oriented programming support, and error handling.

  1. Data structures: R has several built-in data structures, such as vectors, matrices, data frames, and lists, that are optimized for statistical analysis and data manipulation. Python, on the other hand, has a more diverse range of built-in data structures, including lists, tuples, dictionaries, and sets, that can be used for a variety of purposes.

  2. Function syntax: The syntax for defining functions is slightly different in R and Python. In R, the function keyword is used to define a function, while in Python the def keyword is used. Both languages allow arguments to be given default values and to be passed either by position or by name.

  3. Package ecosystems: Both R and Python have large ecosystems of packages and libraries that extend the functionality of the language. However, R’s ecosystem is more focused on statistical analysis and data manipulation, while Python’s ecosystem is more diverse and includes libraries for web development, machine learning, scientific computing, and more.

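  4. Object-oriented programming: In Python, everything is an object and classes are part of the core language, although Python also supports procedural and functional styles. R supports object-oriented programming through several systems (S3, S4, and Reference Classes), but much everyday R code is written in a functional style.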

  5. Error handling: Both languages have built-in mechanisms for handling errors. Python uses exceptions, which are raised with raise and handled with try/except blocks. R uses a condition system: errors are signaled with stop(), warnings with warning(), and both can be handled with tryCatch().
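As a rough illustration of points 1, 2, and 5 on the Python side, the sketch below builds the four built-in containers, defines a function with def, and catches an exception. The function name scale and all sample values are made up for this example.

```python
# Built-in data structures (point 1); the values are arbitrary examples.
scores = [88, 92, 79]                    # list: ordered, mutable
point = (3.0, 4.0)                       # tuple: ordered, immutable
ages = {"Ann": 31, "Bob": 27}            # dict: key-value mapping
tags = {"r", "python", "r"}              # set: duplicates are dropped


# Function definition with def (point 2). `scale` is a hypothetical helper.
def scale(values, factor=2):
    """Return a new list with every value multiplied by factor."""
    return [v * factor for v in values]


doubled = scale(scores)                  # uses the default factor
halved = scale(scores, factor=0.5)       # argument passed by name


# Exception handling (point 5): catch one specific error type.
try:
    ratio = 1 / 0
except ZeroDivisionError as err:
    ratio = None                         # fall back to a placeholder value
    message = str(err)                   # keep the error text for reporting

print(doubled, halved, sorted(tags), ratio, message)
```

Catching a specific exception class such as ZeroDivisionError, rather than every possible error, keeps unrelated bugs from being silently swallowed.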

Overall, the choice between R and Python depends on the specific needs of the project or task at hand. R is a good choice for statistical analysis and data manipulation, while Python is more versatile and can be used for a wider range of tasks. However, both languages have their strengths and weaknesses, and many projects may benefit from using both languages together in a complementary way.

1.2 History of R and Python

R is an implementation of the S programming language, which was created in 1976 by John Chambers and colleagues at Bell Labs. In 1991, Ross Ihaka and Robert Gentleman of the University of Auckland in New Zealand began work on an alternative implementation of S, and this implementation, R, was released in 1993. R has since grown in popularity and is now a leading tool for statistical computing and graphics.

Python, on the other hand, was started in 1989 by Guido van Rossum at Centrum Wiskunde & Informatica (CWI) in the Netherlands, as a successor to the ABC programming language. Python 2.0 was released in 2000, and Python 3.0, a major language revision that is not fully backward-compatible, was released in 2008. Today, many developers create libraries specifically for Python 3, and the language has grown in popularity for uses well beyond statistical computing, such as web development, machine learning, and scientific computing.

1.3 Story behind their names

1.3.1 R

R is named after its creators, Robert Gentleman and Ross Ihaka, whose first names both begin with the letter “R.” The name is also a play on S, the Bell Labs language that R implements. S itself was given a single-letter name in the style of the C language, with the “S” suggesting “statistics.”

1.3.2 Python

Guido van Rossum, the creator of Python, was a fan of the British comedy series Monty Python’s Flying Circus. Van Rossum selected the name Python as a working title for his project in December 1989, while working at CWI, the Netherlands’ National Research Institute for Mathematics and Computer Science. He wanted a short, distinct, and slightly mysterious name for his language. The name refers to the comedy series, not to the snake. Van Rossum kept the name when he released the first version of Python in February 1991, and it has since become one of the most widely used programming languages in the world.

1.5 Installation

1.5.1 R

To get started with R programming, you’ll need to follow two steps:

  1. Download R, a programming language for statistical computing and graphics. You can get the latest version for free from the Comprehensive R Archive Network (CRAN), for example through the mirror at https://cran.rstudio.com/. Make sure to choose the appropriate version for your computer’s operating system.

  2. Install RStudio, an integrated development environment (IDE) for R. RStudio makes it easier to write, debug, and organize your R code. There are two versions of RStudio available: RStudio Desktop and RStudio Server. RStudio Desktop is a standalone desktop application that you can download from https://posit.co/download/rstudio-desktop/. RStudio Server is a web-based version that runs on a remote server and can be accessed through a web browser.

1.5.2 Python

To get started with Python programming, you’ll need to follow these steps:

  1. Download Python: Python is an open-source programming language that is used for a wide range of tasks, including web development, data analysis, and machine learning. You can download Python for free from the official Python website at https://www.python.org/downloads/. Make sure to choose the appropriate version for your computer’s operating system.

  2. Install an Integrated Development Environment (IDE): An IDE is a software application that provides comprehensive facilities to programmers for software development. It typically consists of a code editor, a debugger, and a compiler or interpreter. There are several options for Python IDEs, including:

  • PyCharm: PyCharm is a popular Python IDE developed by JetBrains. It is available in both free and paid versions and provides advanced features for web development, scientific computing, and data analysis. You can download PyCharm from https://www.jetbrains.com/pycharm/download/.

  • Visual Studio Code: Visual Studio Code is a free editor developed by Microsoft and built on an open-source code base. It supports multiple programming languages, including Python through an extension, and provides a range of features such as debugging, syntax highlighting, and code completion. You can download Visual Studio Code from https://code.visualstudio.com/download.

  • Jupyter Notebook: Jupyter Notebook is a web-based interactive computing environment that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. It is commonly used for data analysis and machine learning tasks. Installation instructions for Jupyter Notebook are available at https://jupyter.org/install.

Once you have downloaded and installed Python and an IDE of your choice, you’re ready to start coding in Python!
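One way to confirm that the installation worked is to run a short script from the IDE you chose. The sketch below uses only the standard library; the file name check_install.py is just a suggestion.

```python
# check_install.py: report which Python interpreter is running this script.
import platform
import sys

version = platform.python_version()      # version string of this interpreter
executable = sys.executable              # path to the interpreter in use

print(f"Python {version} running from {executable}")
```

If the printed version is not the one you just installed, the IDE is probably pointing at a different interpreter, which can usually be changed in its settings.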

1.6 Install and Load Libraries

1.6.1 R

R Packages: A Beginner’s Guide https://www.datacamp.com/community/tutorials/r-packages-guide

An R package is a way to organize your own work and share it with others. Typically, a package contains code, documentation for the package and the functions inside, some tests to check everything works as it should, and data sets.

Three of the most popular repositories for R packages are CRAN, Bioconductor, and GitHub.

1.6.1.1 Installing Packages From CRAN

install.packages("package_name")

Example

install.packages("tidyverse")

After running this, some messages will be displayed on the console. They will depend on your operating system, the package’s dependencies, and whether the package was installed successfully.

To install more than one package at a time, pass a character vector:

install.packages(c("vioplot", "MASS"))

The function install.packages() will download the package from one of the CRAN mirrors and install it (and any dependencies) locally on your computer.

You have to install a package only once.

1.6.1.2 Load Packages

After a package is installed, you are ready to use its functionalities.

If you only need occasional access to a few functions or data sets inside a package, you can access them with the notation

packagename::functionname().

If you will use the package more intensively, it may be worth loading it into memory. The simplest way to do this is with the library() command.

Please note that install.packages() takes a character vector, so the package name must be in quotes, while library() accepts either a character string or a name, so you can write the package name without quotes.

Once the package is installed, you can load it into your R session. Any of its functions will then be available to call just as you would call any base function.

library(tidyverse)

1.6.2 Python

Use ‘import module’ or ‘from module import’? https://stackoverflow.com/questions/710551/use-import-module-or-from-module-import

Method 1: import module

Method 2: from module import foo

The choice between import module and from module import foo is largely a matter of preference. Pick one method and be consistent in using it.

import module

Pros

  • Less maintenance of the import statements.

  • No additional imports are needed to start using another item from the same module.

Cons

  • Typing module.foo throughout the code can be tedious. This can be minimized by using import module as mo and then typing mo.foo.

from module import foo

Pros

  • Less typing to use the foo function.

  • More control over which items of the module can be accessed.

Cons

  • To use new items from the module, the user has to update the import statement.

  • You lose context about foo. For example, it is less clear what ceil() does compared to math.ceil().
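The trade-off can be seen with the standard library’s math module, which the ceil() example above refers to. This is a minimal sketch; the alias m and the sample value are arbitrary.

```python
# Method 1: import the module and qualify every use.
import math
a = math.ceil(2.3)        # clear where ceil comes from

# Method 1 with an alias, to cut down on typing.
import math as m
b = m.ceil(2.3)

# Method 2: import only the name you need.
from math import ceil
c = ceil(2.3)             # shorter, but the origin is less obvious

print(a, b, c)
```

All three calls run the same function; only the amount of context visible at the call site differs.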

Don’t use

  • from module import *

    • It clutters the namespace with an untidy collection of names.

    • For any reasonably large body of code, if you import * you will likely be cementing it into the module, unable to remove it later.

    • This is because it becomes difficult to identify which items used in the code come from module.
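To get a feel for how much a star import pulls in, the sketch below lists the names it would bind without actually running one. The helper star_import_names is hypothetical, written for this example, and math and cmath are simply two convenient standard-library modules to compare.

```python
import cmath
import math


def star_import_names(module):
    """Names that `from module import *` would bind.

    Python uses the module's __all__ list when it exists; otherwise it
    takes every name that does not start with an underscore.
    """
    explicit = getattr(module, "__all__", None)
    if explicit is not None:
        return set(explicit)
    return {name for name in vars(module) if not name.startswith("_")}


from_math = star_import_names(math)
from_cmath = star_import_names(cmath)

# Names defined by both modules: a second star import would silently
# replace the first module's version of each of these.
clashes = sorted(from_math & from_cmath)

print(len(from_math), "names would come from math")
print("defined in both math and cmath:", clashes)
```

Because both modules define names such as sqrt, a reader who sees a bare sqrt() after two star imports cannot tell which version is being called without checking the import order.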